Scorching AIOps
Autonomous AI-Powered Kubernetes Incident Management Platform
Open-source AIOps: eBPF telemetry + causal graphs + LLM agents = automated incident lifecycle
Overview
Scorching automates the full incident lifecycle in Kubernetes: Observe → Analyze → Plan → Apply → Verify. It combines eBPF kernel telemetry (Tetragon), causal graph analysis (Neo4j), and LLM-driven decision-making (LangGraph + Ollama).
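As a mental model of the five-stage loop, here is a minimal plain-Python sketch in which each stage enriches a shared incident record and hands it to the next. It does not use LangGraph and is not the orchestrator's code; the stage bodies, the field names (`pod`, `restarts`, `root_cause`, `action`) and the restart threshold are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, List

# Hypothetical incident record shared by all stages; field names are illustrative.
Incident = Dict[str, Any]


def observe(incident: Incident) -> Incident:
    # Stand-in for eBPF/OTel/Prometheus collection: capture the raw signal.
    incident["signal"] = {"pod": incident["pod"], "restarts": incident["restarts"]}
    return incident


def analyze(incident: Incident) -> Incident:
    # Stand-in for causal RCA: a high restart count is read as a crash loop.
    restarts = incident["signal"]["restarts"]
    incident["root_cause"] = "crash_loop" if restarts >= 3 else "unknown"
    return incident


def plan(incident: Incident) -> Incident:
    # Stand-in for LLM planning: map the diagnosed cause to an action.
    incident["action"] = "restart" if incident["root_cause"] == "crash_loop" else "none"
    return incident


def apply(incident: Incident) -> Incident:
    # Stand-in for the remediation controller: record the action, run nothing.
    incident["applied"] = incident["action"] != "none"
    return incident


def verify(incident: Incident) -> Incident:
    # Stand-in for health checks: here, "resolved" simply means an action ran.
    incident["resolved"] = incident["applied"]
    return incident


STAGES: List[Callable[[Incident], Incident]] = [observe, analyze, plan, apply, verify]


def run_cycle(incident: Incident) -> Incident:
    # Copy once so the caller's dict is untouched, then run the stages in order.
    state = dict(incident)
    for stage in STAGES:
        state = stage(state)
    return state
```

Keeping every stage as a function of the same record is what makes the loop easy to model as a graph of nodes: each node reads what earlier stages wrote and adds its own fields.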
Features
- Observe — eBPF kernel telemetry (Tetragon) + OpenTelemetry + Prometheus
- Analyze — Causal RCA with Neo4j, neuro-symbolic reasoning, business impact prediction
- Plan — LLM-generated remediation via LangGraph orchestrator
- Apply — Automated K8s remediation (scale, restart, rollback, canary via Argo Rollouts)
- Verify — Health checks with retry, alert suppression, model drift detection
- AI Chat — DevInfra agent with model selection (qwen3.5, deepseek-r1, llama3.2)
- Dashboard — Incidents, metrics, forecasts, audit trail
- Governance — GDPR/SOC2 compliance before autonomous actions
- Security — eBPF threat detection, NetworkPolicy, RBAC
- GitOps — ArgoCD ApplicationSet with auto-sync
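The scale, restart, and rollback actions of the Apply stage correspond to standard kubectl subcommands. The sketch below maps a planned action to a kubectl argument list without executing it. The function name and action strings are hypothetical, the project's remediation-controller is written in Go rather than Python, and Argo Rollouts canaries are not covered.

```python
from typing import List, Optional


def remediation_command(
    action: str,
    deployment: str,
    namespace: str = "default",
    replicas: Optional[int] = None,
) -> List[str]:
    """Translate a planned action into a kubectl argv list (hypothetical mapping)."""
    target = f"deployment/{deployment}"
    if action == "scale":
        if replicas is None or replicas < 0:
            raise ValueError("scale needs a non-negative replica count")
        return ["kubectl", "-n", namespace, "scale", target, f"--replicas={replicas}"]
    if action == "restart":
        # Triggers a rolling restart that follows the Deployment's rollout strategy.
        return ["kubectl", "-n", namespace, "rollout", "restart", target]
    if action == "rollback":
        # Reverts the Deployment to its previous revision.
        return ["kubectl", "-n", namespace, "rollout", "undo", target]
    raise ValueError(f"unsupported action: {action!r}")
```

To execute a command, pass the list to `subprocess.run(cmd, check=True)`. Building an argv list instead of a shell string avoids quoting problems with user-controlled names.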
Comparison
| Feature | Commercial AIOps | Scorching |
|---|---|---|
| Cost | 500k/year | Free (open-source) |
| eBPF telemetry | Partial | Full (Tetragon) |
| Causal graph RCA | Proprietary | Neo4j (open) |
| LLM agent | Limited | LangGraph + local LLM |
| Self-hosted | Limited | Yes (kind/K8s) |
Quick Start
Prerequisites
- Docker with at least 8 GB RAM and 4 CPU cores available
- Linux kernel ≥ 5.8 (required for eBPF/Tetragon)
- kind, kubectl, helm (auto-installed if missing)
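The prerequisites can be checked before running the installer. The script below is a hypothetical preflight helper, not part of the repository. It compares the running kernel against 5.8 as a numeric tuple, so 5.10 correctly counts as newer than 5.8, and it reports which CLIs are on PATH. On macOS or Windows with Docker Desktop, the relevant kernel is presumably the one inside Docker's Linux VM, which this script does not inspect.

```python
import platform
import re
import shutil
from typing import Dict, Tuple

MIN_KERNEL: Tuple[int, int] = (5, 8)  # from the Prerequisites list (eBPF/Tetragon)
TOOLS = ("docker", "kind", "kubectl", "helm")


def parse_kernel_version(release: str) -> Tuple[int, int]:
    # Keep the leading "major.minor" of strings such as "6.5.0-35-generic".
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_ok(release: str, minimum: Tuple[int, int] = MIN_KERNEL) -> bool:
    # Tuple comparison is numeric per component, unlike comparing strings.
    return parse_kernel_version(release) >= minimum


def tool_status() -> Dict[str, bool]:
    # True if the CLI is found on PATH.
    return {tool: shutil.which(tool) is not None for tool in TOOLS}


if __name__ == "__main__":
    if platform.system() == "Linux":
        release = platform.release()
        verdict = "ok" if kernel_ok(release) else "too old for Tetragon (need >= 5.8)"
        print(f"kernel {release}: {verdict}")
    else:
        print("not a Linux host: check the kernel of the VM that runs Docker")
    for tool, found in tool_status().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```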
Deploy
A single command deploys, in order: kind cluster → Kafka → ArgoCD → ClickHouse → Neo4j → Prometheus → Grafana → Tetragon → 12 microservices → WebUI → LLM model. The full deployment takes roughly 15–30 minutes.
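Because the stack comes up over many minutes, follow-up scripts usually poll until a component is ready rather than checking once. A minimal sketch of polling with exponential backoff follows; `wait_until_healthy` and its parameters are hypothetical and not part of the project, though the same pattern fits the retrying health checks of the Verify stage.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` until it returns True or the attempts run out.

    Delays grow geometrically (1 s, 2 s, 4 s, ... with the defaults), and no
    sleep happens after the final failed attempt. `sleep` is injectable so
    tests can record delays instead of waiting.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            sleep(delay)
            delay *= backoff
    return False
```

Any zero-argument callable returning a bool works as `check`, for example a function that queries a pod's readiness or requests a service URL.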
Access
| Service | URL | Credentials |
|---|---|---|
| WebUI | http://localhost:9090 | — |
| Ingress | http://localhost | — |
| ArgoCD | https://localhost:8080 | admin / (see output) |
| Grafana | http://localhost:3000 | admin / aiops-admin |
| Neo4j | http://localhost:7474 | neo4j / neo4j-aiops-password |
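After deployment, a quick way to confirm that the endpoints in the Access table respond is to request each one. The following is a minimal standard-library sketch. It treats any HTTP response, including 401 or 404, as "up", and it disables certificate verification on the assumption that ArgoCD serves its default self-signed certificate on localhost.

```python
import http.client
import ssl
import urllib.error
import urllib.request
from typing import Dict

# URLs from the Access table; adjust them if you changed the port mappings.
ENDPOINTS: Dict[str, str] = {
    "WebUI": "http://localhost:9090",
    "Ingress": "http://localhost",
    "ArgoCD": "https://localhost:8080",
    "Grafana": "http://localhost:3000",
    "Neo4j": "http://localhost:7474",
}


def _opener() -> urllib.request.OpenerDirector:
    # Assumption: ArgoCD uses a self-signed certificate locally, so skip verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # An empty ProxyHandler bypasses HTTP(S)_PROXY settings for these local URLs.
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({}), urllib.request.HTTPSHandler(context=ctx)
    )


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    try:
        with _opener().open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An error status (401, 404, ...) still means a server answered.
        return True
    except http.client.HTTPException:
        # Something answered, but not with well-formed HTTP.
        return True
    except OSError:
        # Refused connection, DNS failure, timeout (URLError is an OSError subclass).
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        print(f"{name:8} {url:24} {'up' if is_reachable(url) else 'DOWN'}")
```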
Fresh Reinstall
Architecture
WebUI (Next.js) → Backend (FastAPI) → Neo4j + ClickHouse + Kafka
    ↓
aiops-orchestrator (LangGraph: Observe→Analyze→Plan→Apply→Verify)
├── causal-ai-correlator
├── remediation-controller (Go)
├── llm-router → Ollama
├── governance-agent
├── security-agent
├── business-impact-predictor
├── neuro-symbolic-reasoner
├── alert-suppression-service
├── model-maintenance
└── Tetragon (eBPF)
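The causal-ai-correlator builds RCA graphs in Neo4j; its actual queries are not shown in this README. To illustrate the underlying idea, the hypothetical sketch below uses a plain dict instead of Neo4j. Edges point from cause to effect, a root-cause candidate is an anomalous service that no other anomalous service explains, and candidates are ranked by how many anomalous services they explain. This is a simple heuristic that assumes an acyclic graph, not the project's algorithm.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical causal graph: an edge cause -> effect means "cause failing degrades effect".
CausalGraph = Dict[str, List[str]]


def downstream(graph: CausalGraph, start: str) -> Set[str]:
    # Breadth-first walk over every effect reachable from `start`.
    seen: Set[str] = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, []))
    return seen


def root_cause_candidates(graph: CausalGraph, anomalous: Set[str]) -> List[str]:
    # Anything downstream of an anomalous node is "explained" by it.
    explained: Set[str] = set()
    for node in anomalous:
        explained |= downstream(graph, node)
    # Candidates are anomalous nodes that no other anomaly explains (assumes no cycles).
    candidates = [node for node in anomalous if node not in explained]
    # Rank by how many anomalous nodes each candidate explains, then by name.
    return sorted(candidates, key=lambda n: (-len(downstream(graph, n) & anomalous), n))
```

For example, with `{"postgres": ["orders-api"], "orders-api": ["checkout"], "redis": ["checkout"], "checkout": ["frontend"]}` and anomalies on postgres, orders-api, checkout and frontend, the only candidate returned is `["postgres"]`, because every other alert lies downstream of it.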
12 Microservices
| Service | Lang | Role |
|---|---|---|
| aiops-orchestrator | Python | LangGraph agent running the full Observe → Analyze → Plan → Apply → Verify loop |
| causal-ai-correlator | Python | Builds RCA graphs in Neo4j |
| remediation-controller | Go | Executes kubectl (scale/restart/rollback) |
| llm-router | Python | Routes to Ollama/vLLM/API |
| governance-agent | Python | Policy compliance checks |
| security-agent | Python | Threat detection |
| business-impact-predictor | Python | Revenue impact estimation |
| neuro-symbolic-reasoner | Python | Hybrid RCA |
| alert-suppression-service | Python | Dynamic noise reduction |
| model-maintenance-service | Python | Drift detection |
| webui-backend | Python | FastAPI backend (47+ endpoints) |
| webui-frontend | TypeScript | Next.js dashboard + AI chat |
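The llm-router sends prompts to Ollama, vLLM, or a hosted API. The sketch below shows one hypothetical routing rule between two of those backends. Models listed in `LOCAL_MODELS` go to Ollama's `/api/generate` endpoint, and everything else goes to an OpenAI-compatible `/v1/completions` endpoint such as the one vLLM serves. The URLs use the usual default ports (11434 for Ollama, 8000 for vLLM) and are not values taken from this project's configuration.

```python
import json
import urllib.request
from typing import Any, Dict, Tuple

# Common default endpoints, not values read from this project's configuration.
OLLAMA_URL = "http://localhost:11434/api/generate"
OPENAI_COMPAT_URL = "http://localhost:8000/v1/completions"  # e.g. `vllm serve`

# Hypothetical rule: models pulled into Ollama are served locally by Ollama.
LOCAL_MODELS = {"qwen3.5:2b", "deepseek-r1", "llama3.2"}


def build_request(model: str, prompt: str) -> Tuple[str, Dict[str, Any]]:
    """Choose a backend for `model` and build the JSON body it expects."""
    if model in LOCAL_MODELS:
        # stream=False makes Ollama return one JSON object instead of a stream.
        return OLLAMA_URL, {"model": model, "prompt": prompt, "stream": False}
    # Everything else goes to an OpenAI-compatible completions endpoint.
    return OPENAI_COMPAT_URL, {"model": model, "prompt": prompt, "max_tokens": 512}


def complete(model: str, prompt: str, timeout: float = 120.0) -> str:
    url, body = build_request(model, prompt)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    # Ollama answers {"response": ...}; OpenAI-style servers use choices[0].text.
    if "response" in data:
        return data["response"]
    return data["choices"][0]["text"]
```

Separating `build_request` from the network call keeps the routing rule testable without a running model server; a hosted-API branch would be one more condition in the same function.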
Tech Stack
- Infrastructure: Kubernetes (kind) · Kafka (KRaft) · ArgoCD · Argo Rollouts · cert-manager
- Data: ClickHouse · Neo4j · OpenTelemetry Collector
- AI/ML: LangGraph · Ollama (qwen3.5:2b) · Neuro-symbolic reasoning
- Observability: Prometheus · Grafana · Tetragon (eBPF)
- Security: Tetragon TracingPolicy · NetworkPolicy · RBAC · Governance Agent
Testing
Overview
Scorching is a self-hosted AIOps platform for Kubernetes that covers the full incident-management cycle: Observe → Analyze → Plan → Apply → Verify. It combines eBPF (Tetragon), Neo4j (causal graph), LangGraph (LLM agent), and Ollama (qwen3.5:2b).
Quick Start
A single command deploys a kind cluster, Kafka, ArgoCD, ClickHouse, Neo4j, Prometheus, Grafana, Tetragon, 12 microservices, the WebUI, and an LLM model.
Features
- Full AIOps cycle: Observe → Analyze → Plan → Apply → Verify
- eBPF: Tetragon DaemonSet for agentless collection of kernel events
- Causal graph: Neo4j for cause-and-effect relationships
- LLM agent: LangGraph orchestrator with a local model
- AI chat: DevInfra agent with model selection and a reasoning mode
- GitOps: ArgoCD with auto-sync and self-healing
- 12 microservices: each one handles a specific AIOps subtask
Contributing
See CONTRIBUTING.md
Security
See SECURITY.md