
GigaEvo Platform

A machine learning experiment management system with a microservices architecture, featuring Kafka-based messaging and three-tier service separation.

Python 3.12+ · MIT License

🏗️ Architecture Overview

GigaEvo Platform consists of three main components:

🔧 Master API (Port 8000)

  • Role: Experiment orchestration and coordination
  • Technology: FastAPI, Kafka, PostgreSQL, Redis
  • Features:
    • Kafka integration for async messaging
    • Experiment lifecycle management
    • Configuration storage and retrieval
    • uv-based dependency management

🏃 Runner API (Port 8001)

  • Role: Task execution with GigaEvolve integration
  • Technology: FastAPI, GigaEvolve tools
  • Features:
    • Experiment code execution
    • Results visualization
    • Best program extraction
    • Background task processing

🌐 WebUI (Port 7860)

  • Role: Gradio-based user interface
  • Technology: Gradio, Plotly, Requests
  • Features:
    • Interactive experiment creation
    • Real-time progress monitoring
    • Results visualization
    • System status dashboard

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • Python 3.12+ (for local development)
  • uv (recommended) or pip

LLM configuration

GigaEvo Platform reads all LLM settings from a single repo-level file: `llm_models.yml`. Create `llm_models.yml` from the `llm_models.yml.example` template and fill in your credentials.
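The schema of `llm_models.yml` is defined by the example template, not reproduced in this README; a purely illustrative sketch (every key and value below is an assumption — copy the real template instead) might look like:

```yaml
# Illustrative only -- copy llm_models.yml.example for the real schema.
models:
  default:
    base_url: "https://api.example.com/v1"   # provider endpoint (placeholder)
    model: "my-model-name"                   # placeholder model id
    api_key: "<your-api-key>"
```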

Using the Deployment System

GigaEvo Platform uses the `deploy.sh` script with Docker Compose for service orchestration.

1. Deploy Complete Platform

This deploys the full stack with automated health checks:

  • Infrastructure: PostgreSQL, Kafka, Zookeeper, Redis (2 instances), MinIO
  • Applications: Master API, Runner API, WebUI
  • Networking: Docker network and shared volumes
  • Health Monitoring: Automatic service health verification
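A full deployment can be started with the `make deploy` target referenced later in this README (which presumably wraps `deploy.sh`; the script's own subcommands are not documented here, so check it for direct usage):

```shell
# Deploy infrastructure and application services with health checks.
# `make deploy` is referenced elsewhere in this README.
make deploy
```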

2. Deploy Development Environment
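The development environment is brought up with the `make dev` target mentioned in the runner-pool section below:

```shell
# Start the stack in development mode (target name confirmed elsewhere in this README).
make dev
```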

3. Individual Service Development
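For working on one service at a time, the per-service compose files (the `docker-compose.*.yml` pattern described under Service Orchestration) can be started individually; the exact file name for each application service is an assumption here:

```shell
# Start core infrastructure first (docker-compose.kafka.yml is named under
# Service Orchestration below).
docker compose -f docker-compose.kafka.yml up -d

# Then run a single application service with a rebuild.
# docker-compose.master.yml is an assumed name following the docker-compose.*.yml pattern.
docker compose -f docker-compose.master.yml up --build
```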

4. Service Management
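Day-to-day management can fall back on standard Docker Compose commands (add the same `-f` flags your stack was started with; service names follow those used throughout this README):

```shell
docker compose ps                      # list service status and health
docker compose logs -f master-api      # follow logs for one service
docker compose restart runner-api-1    # restart a single runner instance
docker compose down                    # stop everything
```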

Access Points

Once deployed, the services are available on the host ports listed in Troubleshooting:

  • WebUI: http://localhost:7860
  • Master API: http://localhost:8000
  • Runner API: http://localhost:8001
  • MinIO Console: http://localhost:9001

Runner pool size configuration

By default, the platform starts with a single runner instance. To run multiple experiments in parallel, increase the runner pool size:

The system automatically generates a `docker-compose.runner-pool.*.generated.yml` file with N runner services.

Runner pool instance controls (WebUI)

The WebUI “Runner Instances” tab calls the Master API (`/api/v1/instances/*`) to start/stop/restart runners and fetch container logs.

With a Compose-managed runner pool (`RUNNER__MANAGE_CONTAINERS=false`, the default in `make dev` / `make deploy`), Master controls the already-created `runner-api-N` containers via Docker.

Requirements:

  • `master-api` has Docker access: mount `/var/run/docker.sock` (and ensure the container user can read/write it; otherwise run `master-api` as root or align the socket group).
  • Runner containers are started by Docker Compose (Master finds them via `com.docker.compose.service=runner-api-N` labels; set `COMPOSE_PROJECT_NAME` if you have multiple stacks with the same service names).

Security note: mounting the Docker socket grants the `master-api` container root-equivalent control over the host Docker engine.

Quick manual checks (requires the stack running):
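For example (the `/api/v1/instances/` prefix is confirmed above; exact paths and response shapes may differ):

```shell
# List runner instances managed by the Master API.
curl -s http://localhost:8000/api/v1/instances/

# Confirm the experiments API responds.
curl -s http://localhost:8000/api/v1/experiments/
```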

📚 API Endpoints

Master API (as per docs/api_endpoints.md)

  • `POST /api/v1/experiments/` - Initialize experiment
  • `GET /api/v1/experiments/` - List experiments
  • `GET /api/v1/experiments/{experiment_id}/status` - Get experiment status
  • `POST /api/v1/experiments/{experiment_id}/start` - Start experiment
  • `POST /api/v1/experiments/{experiment_id}/stop` - Stop experiment
  • `GET /api/v1/experiments/{experiment_id}/results` - Get results
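A typical lifecycle driven from the shell might look like the following; the JSON payload is a placeholder, since the request schema lives in docs/api_endpoints.md rather than here:

```shell
# Initialize an experiment (payload is a placeholder, not the real schema).
curl -s -X POST http://localhost:8000/api/v1/experiments/ \
  -H 'Content-Type: application/json' \
  -d '{"name": "demo"}'

# Start it, then poll status and fetch results (substitute the returned id).
curl -s -X POST http://localhost:8000/api/v1/experiments/<experiment_id>/start
curl -s http://localhost:8000/api/v1/experiments/<experiment_id>/status
curl -s http://localhost:8000/api/v1/experiments/<experiment_id>/results
```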

Runner API (as per docs/api_endpoints.md)

  • `POST /api/v1/experiments/{experiment_id}/upload` - Upload experiment code
  • `POST /api/v1/experiments/{experiment_id}/start` - Start experiment
  • `POST /api/v1/experiments/{experiment_id}/stop` - Stop experiment
  • `GET /api/v1/experiments/{experiment_id}/status` - Get execution status
  • `GET /api/v1/experiments/{experiment_id}/visualization` - Get visualization
  • `GET /api/v1/experiments/{experiment_id}/best-program` - Get best program
  • `GET /api/v1/experiments/{experiment_id}/logs` - Get logs (optional)

🔄 Kafka Topics

The system uses these Kafka topics for coordination:

  • `experiment-config` - Experiment configuration received
  • `experiment-prepared` - Experiment prepared for execution
  • `experiment-started` - Experiment execution started
  • `experiment-stopped` - Experiment execution stopped
  • `runner-status` - Runner status updates
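For debugging, the topics can be tailed with any Kafka client; a minimal sketch using the third-party `kafka-python` package (an assumption — the platform's own client library may differ):

```python
from kafka import KafkaConsumer  # third-party: pip install kafka-python

# Follow runner status updates; the broker address uses the default host port
# listed in the Troubleshooting section.
consumer = KafkaConsumer(
    "runner-status",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.topic, message.value)
```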

🛠️ Development

Local Development Setup
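Since the project uses uv for dependency management, a local (non-container) loop for one service might look like this; the module path passed to uvicorn is an assumption:

```shell
# Install dependencies from the lockfile.
uv sync

# Run one service locally with auto-reload.
# The app module path "master_api.main:app" is an assumption -- check the
# service's actual entrypoint.
uv run uvicorn master_api.main:app --port 8000 --reload
```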

Container-Based Development
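When developing inside containers, images can be rebuilt and swapped one at a time (service names follow those used elsewhere in this README; add `-f` flags as needed):

```shell
# Rebuild the image for one service after code changes...
docker compose build master-api

# ...and recreate only that container, leaving its dependencies running.
docker compose up -d --no-deps master-api
```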

Code Quality
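The Makefile targets used in the Contributing section apply here as well:

```shell
# Run the test suite and linters (targets confirmed under Contributing).
make test && make lint
```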

Database Management
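The migration tooling is not documented in this README; at minimum, the database can be inspected directly (the service name and user below are assumptions — check the compose file):

```shell
# Open a psql shell inside the PostgreSQL container.
docker compose exec postgres psql -U postgres
```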

🐛 Troubleshooting

Common Issues

  1. Port Conflicts: Ensure these ports are free:

    • 5432: PostgreSQL
    • 6379, 6380: Redis (2 instances)
    • 7860: WebUI
    • 8000: Master API
    • 8001: Runner API
    • 9000, 9001: MinIO
    • 9092, 29092, 29093: Kafka
    • 2181: Zookeeper
  2. Deployment Issues:

  3. Service Health Check Failures:

  4. Database Connection Issues:
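Standard diagnostics for the issues above (adjust `-f` flags to match how the stack was started):

```shell
# Issue 1: find what is occupying a required port.
lsof -i :8000

# Issues 2-3: check container state and health, then inspect logs.
docker compose ps
docker compose logs --tail=100 master-api

# Issue 4: verify PostgreSQL is accepting connections
# (service name "postgres" is an assumption -- check the compose file).
docker compose exec postgres pg_isready
```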

Environment Variables

Key environment variables for Master API:

  • `DATABASE__URL` - PostgreSQL connection string
  • `KAFKA__BOOTSTRAP_SERVERS` - Kafka bootstrap servers
  • `REDIS_URL` - Redis connection URL
  • `STORAGE__ENDPOINT_URL` - MinIO endpoint
  • `STORAGE__ACCESS_KEY` - MinIO access key
  • `STORAGE__SECRET_KEY` - MinIO secret key
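An illustrative shell fragment (every value below is a placeholder; the double underscore is the nested-settings delimiter used by the variable names above):

```shell
# Placeholder values only -- substitute your real hosts and credentials.
export DATABASE__URL="postgresql://user:password@localhost:5432/gigaevo"
export KAFKA__BOOTSTRAP_SERVERS="localhost:9092"
export REDIS_URL="redis://localhost:6379/0"
export STORAGE__ENDPOINT_URL="http://localhost:9000"
export STORAGE__ACCESS_KEY="minioadmin"
export STORAGE__SECRET_KEY="minioadmin"
```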

📊 Architecture Details

Current Kafka-Based Architecture

The platform uses a modern microservices architecture with:

  1. Kafka Message Broker - Asynchronous service communication with topics for experiment coordination
  2. Separate Docker Compositions - Modular deployment with infrastructure and application services
  3. Health Monitoring - Automated service health checks and recovery
  4. Resource Isolation - Dedicated Redis instances and MinIO storage
  5. uv Dependency Management - Fast package installation and dependency caching

Service Orchestration

  • deploy.sh: Main deployment script with health checks and service management
  • docker-compose.kafka.yml: Core infrastructure services
  • docker-compose.*.yml: Individual application service configurations
  • Makefile: Development commands and shortcuts

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests and linting:
    make test && make lint
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.