Celery Control

A Prometheus monitoring library for Celery that implements a pull-based approach to collect metrics directly from workers, providing better performance, scalability, and flexibility compared to traditional event-based monitoring solutions.

Features

  • Pull-based metrics: Exposes a /metrics endpoint for Prometheus scraping.
  • Comprehensive task metrics:
    • Successful tasks counter
    • Error counter with exception labels
    • Retry counter with reason labels
    • Revoked tasks counter (expired/terminated)
    • Unknown tasks counter
    • Rejected tasks and messages counters
    • Internal Celery errors counter
    • Task runtime histogram
  • Worker state metrics:
    • Prefetched requests
    • Active requests
    • Waiting requests
    • Scheduled tasks
  • Worker health monitoring: Tracks worker online status.
  • Task publication metrics: Tracks tasks published by clients, beat, or other services.
  • Multiprocessing support: Compatible with both prefork and threads pools.
  • Django integration: Easy setup for Django + Celery + Beat projects.

Why Celery Control?

Traditional event-based monitoring (e.g., Flower) has limitations:

  • Additional load on the worker (17-46% performance impact)
  • Single point of failure
  • Difficulty scaling horizontally
  • Accumulation of stale metrics

Celery Control solves these by:

  • Reducing worker load (only 6-12% performance impact)
  • Enabling horizontal scaling
  • Providing real-time internal state metrics
  • Allowing application-level metrics (DB, cache, external services)

Explanation

Installation

Quick Start

For Celery Workers

For Task Publisher — Celery Beat

For Task Publisher — WSGI Server

Environment Variables

  • CELERY_CONTROL_SERVER_HOST: Metrics server host (default: 0.0.0.0)
  • CELERY_CONTROL_SERVER_PORT: Metrics server port (default: 5555)
  • CELERY_CONTROL_SERVER_DISABLE_COMPRESSION: Disable compression (default: False)
  • CELERY_CONTROL_TRACKER_DAEMON_INTERVAL: Daemon update interval (default: 15)
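As a rough illustration of how the variables above and their documented defaults fit together, here is a minimal sketch of reading them in Python. It is not the library's own loader: `ServerSettings`, `load_settings`, and `_as_bool` are hypothetical names, the boolean parsing rule is an assumption, and the interval is assumed to be in seconds.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSettings:
    host: str
    port: int
    disable_compression: bool
    tracker_interval: float  # assumed to be seconds


def _as_bool(raw: str) -> bool:
    # Assumption: common truthy spellings; the library may parse this differently.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=os.environ) -> ServerSettings:
    """Read the documented variables, falling back to the documented defaults."""
    return ServerSettings(
        host=env.get("CELERY_CONTROL_SERVER_HOST", "0.0.0.0"),
        port=int(env.get("CELERY_CONTROL_SERVER_PORT", "5555")),
        disable_compression=_as_bool(
            env.get("CELERY_CONTROL_SERVER_DISABLE_COMPRESSION", "False")
        ),
        tracker_interval=float(
            env.get("CELERY_CONTROL_TRACKER_DAEMON_INTERVAL", "15")
        ),
    )


# Override only the port; every other field keeps its documented default.
settings = load_settings({"CELERY_CONTROL_SERVER_PORT": "9100"})
print(settings)
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try without touching the real environment.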

Performance

Celery Control introduces minimal overhead compared to event-based monitoring solutions. Below are the performance test results comparing different configurations.

All tests conducted on MacBook M1 Pro with:

  • Log level: ERROR
  • Gossip, mingle, heartbeat: disabled
  • Prefetch multiplier: 1,000
  • Concurrency: 1

Test 1: Constant Load Simulation

  • Scenario: 100 batches of 1,000 tasks each (total: 100K tasks)
  • Execution: Publisher sends all tasks to queue, then worker processes them
  • Measurement: Median processing time per batch
  • Purpose: Simulates continuous workload with always-available tasks

Test 2: Temporary Load Simulation

  • Scenario: 10 batches of 1,000 tasks each, repeated 10 times (total: 100K tasks)
  • Execution: Each iteration: publish → start worker → process → stop worker
  • Measurement: Median of maximum processing times
  • Purpose: Simulates burst workload
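The two measurements can be sketched as follows. This is one reading of the descriptions above, not the benchmark's actual code: Test 1 is taken as the median per-batch time converted to tasks per second, and Test 2 as the slowest batch of each iteration followed by the median over iterations. The function names and all timings are hypothetical.

```python
from statistics import median

BATCH_SIZE = 1_000  # tasks per batch, as in both scenarios


def constant_load_tps(batch_seconds):
    """Test 1 style: median per-batch processing time, converted to tasks/second."""
    return BATCH_SIZE / median(batch_seconds)


def temporary_load_median_max(iterations):
    """Test 2 style: slowest batch per iteration, then the median over iterations."""
    return median(max(batch_times) for batch_times in iterations)


# Hypothetical timings in seconds, not measured data.
test1 = [0.50, 0.48, 0.52, 0.49, 0.51]
test2 = [[0.30, 0.35, 0.32], [0.31, 0.40, 0.33], [0.29, 0.34, 0.36]]

print(round(constant_load_tps(test1)))   # median batch time is 0.50 s
print(temporary_load_median_max(test2))  # per-iteration maxima: 0.35, 0.40, 0.36
```

Using the median rather than the mean keeps a single slow batch (for example, one hit by a garbage-collection pause) from dominating the result.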

Prefork Pool Results

Test 1 - Constant Load (Median TPS):
┌────────────────────────────────────────┐
│ Default:             2097 tasks/second │
│ With Events:         1252 tasks/second │
│ With Celery Control: 1968 tasks/second │
└────────────────────────────────────────┘

Test 2 - Temporary Load (Median Max Time):
┌────────────────────────────────────────┐
│ Default:             3191 tasks/second │
│ With Events:         1720 tasks/second │
│ With Celery Control: 2816 tasks/second │
└────────────────────────────────────────┘

Performance Impact:

  • Events: 40% reduction in Test 1, 46% in Test 2
  • Celery Control: 6% reduction in Test 1, 12% in Test 2

Threads Pool Results

Test 1 - Constant Load (Median TPS):
┌────────────────────────────────────────┐
│ Default:             2452 tasks/second │
│ With Events:         2038 tasks/second │
│ With Celery Control: 2285 tasks/second │
└────────────────────────────────────────┘

Test 2 - Temporary Load (Median Max Time):
┌────────────────────────────────────────┐
│ Default:             3672 tasks/second │
│ With Events:         2135 tasks/second │
│ With Celery Control: 3400 tasks/second │
└────────────────────────────────────────┘

Performance Impact:

  • Events: 17% reduction in Test 1, 42% in Test 2
  • Celery Control: 7% reduction in Test 1 and Test 2

Key Findings

  1. Event-based monitoring introduces significant overhead (17-46% reduction)
  2. Celery Control adds minimal overhead (6-12% reduction)
  3. Threads pool generally outperforms prefork pool
  4. Temporary load scenarios (Test 2) show more pronounced performance differences
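The reduction percentages quoted in this section follow directly from the throughput figures in the result tables. The short sketch below recomputes them as `(1 - measured / baseline) * 100`, rounded to the nearest whole percent; the dictionary layout and the function name are only illustrative.

```python
# Throughput in tasks/second, copied from the result tables in this section.
RESULTS = {
    ("prefork", "test1"): {"default": 2097, "events": 1252, "celery_control": 1968},
    ("prefork", "test2"): {"default": 3191, "events": 1720, "celery_control": 2816},
    ("threads", "test1"): {"default": 2452, "events": 2038, "celery_control": 2285},
    ("threads", "test2"): {"default": 3672, "events": 2135, "celery_control": 3400},
}


def reduction_percent(baseline, measured):
    """Throughput lost relative to the unmonitored baseline, in whole percent."""
    return round((1 - measured / baseline) * 100)


for (pool, test), tps in RESULTS.items():
    events = reduction_percent(tps["default"], tps["events"])
    control = reduction_percent(tps["default"], tps["celery_control"])
    print(f"{pool} {test}: events -{events}%, celery control -{control}%")
```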

Multiprocessing Mode

Enable multiprocessing mode by setting:

This allows metrics aggregation across multiple worker processes.

Metrics Reference

Counters

celery_task_accepted_total: Total number of accepted tasks.

Labels:

  • worker (string): Name of the worker that accepted the task
  • task (string): Name of the task that was accepted

celery_task_succeeded_total: Total number of successfully completed tasks.

Labels:

  • worker (string): Name of the worker that processed the task
  • task (string): Name of the task that was executed

celery_task_failed_total: Total number of tasks that failed with unhandled exceptions.

Labels:

  • worker (string): Name of the worker that processed the task
  • task (string): Name of the task that failed
  • exception (string): Type of exception that caused the failure

celery_task_retried_total: Total number of task retry attempts.

Labels:

  • worker (string): Name of the worker that processed the task
  • task (string): Name of the task being retried
  • exception (string): Reason for retry (exception type)

celery_task_revoked_total: Total number of revoked tasks.

Labels:

  • worker (string): Name of the worker that processed the task
  • task (string): Name of the revoked task
  • expired (boolean): Whether the task was revoked due to expiration
  • terminated (boolean): Whether the task was terminated via remote control

celery_task_unknown_total: Total number of unknown tasks received.

Labels:

  • worker (string): Name of the worker that received the task
  • Note: There is no task label, which prevents metric cardinality explosion

celery_task_rejected_total: Total number of tasks rejected by a Reject exception.

Labels:

  • worker (string): Name of the worker that rejected the task
  • task (string): Name of the rejected task
  • requeue (boolean): Whether the task was requeued after rejection

celery_message_rejected_total: Total number of messages rejected due to unknown type.

Labels:

  • worker (string): Name of the worker that rejected the message

celery_task_internal_errors_total: Total number of internal Celery errors.

Labels:

  • worker (string): Name of the worker that processed the task
  • task (string): Name of the task that failed
  • exception (string): Type of exception that caused the failure

celery_task_published_total: Total number of tasks published to the queue.

Labels:

  • task (string): Name of the published task
  • Note: There is no worker label, as publication typically happens outside workers
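To show how the counter labels can be combined, here is a small sketch that parses a scrape in the Prometheus text format and derives a per-task failure ratio from celery_task_succeeded_total and celery_task_failed_total. In practice this kind of aggregation would normally be done with a query in Prometheus itself. The parser is deliberately simplified (no timestamps, no unescaping of label values), and the worker names, task names, and values in the sample are made up.

```python
import re
from collections import defaultdict

# Simplified sample-line pattern: name, optional {labels}, value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line; comments are skipped."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group("labels") or ""))
            yield match.group("name"), labels, float(match.group("value"))


def totals_by_task(text, metric):
    """Sum one counter across workers (and any other labels), grouped by task."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get("task", "")] += value
    return dict(totals)


# Hypothetical scrape output, not real data.
SCRAPE = '''
# TYPE celery_task_succeeded_total counter
celery_task_succeeded_total{worker="w1@host",task="app.send_email"} 90
celery_task_succeeded_total{worker="w2@host",task="app.send_email"} 60
celery_task_failed_total{worker="w1@host",task="app.send_email",exception="TimeoutError"} 6
celery_task_failed_total{worker="w2@host",task="app.send_email",exception="ValueError"} 4
'''

succeeded = totals_by_task(SCRAPE, "celery_task_succeeded_total")
failed = totals_by_task(SCRAPE, "celery_task_failed_total")
for task, ok in succeeded.items():
    bad = failed.get(task, 0.0)
    print(task, f"failure ratio: {bad / (ok + bad):.2%}")
```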

Gauges

celery_worker_online: Timestamp of when the worker was last online.

Labels:

  • worker (string): Name of the worker
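Because celery_worker_online holds a timestamp rather than a 0/1 flag, a consumer has to compare it with the current time to decide whether a worker is still alive. The sketch below shows one way to do that. It assumes the value is a Unix timestamp in seconds and that it is refreshed roughly once per tracker interval; `is_worker_online` and the `missed_updates` tolerance are hypothetical.

```python
import time

# Documented default of CELERY_CONTROL_TRACKER_DAEMON_INTERVAL; seconds is an assumption.
TRACKER_INTERVAL = 15.0


def is_worker_online(last_online_ts, now=None, missed_updates=3):
    """Treat a worker as online if its timestamp is at most a few intervals old."""
    now = time.time() if now is None else now
    return (now - last_online_ts) <= TRACKER_INTERVAL * missed_updates


now = 1_700_000_100.0  # fixed "current time" so the example is reproducible
print(is_worker_online(1_700_000_090.0, now=now))  # 10 s old, within 45 s
print(is_worker_online(1_700_000_000.0, now=now))  # 100 s old, considered offline
```

Tolerating a few missed updates avoids flagging a worker as offline because of a single delayed refresh.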

celery_task_prefetched: Number of tasks currently prefetched (waiting or active).

Labels:

  • worker (string): Name of the worker
  • task (string): Name of the reserved task

celery_task_active: Number of tasks currently being executed.

Labels:

  • worker (string): Name of the worker
  • task (string): Name of the active task

celery_task_waiting: Number of tasks currently waiting to be executed.

Labels:

  • worker (string): Name of the worker
  • task (string): Name of the waiting task

celery_task_scheduled: Number of tasks scheduled for future execution.

Labels:

  • worker (string): Name of the worker
  • task (string): Name of the scheduled task

Histograms

celery_task_runtime_seconds: Histogram of task execution times.

Labels:

  • worker (string): Name of the worker that executed the task
  • task (string): Name of the task

Default Buckets: Histogram.DEFAULT_BUCKETS
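For a feel of what the default buckets mean for task runtimes, the sketch below counts observations into cumulative buckets the way a Prometheus histogram does. It assumes that Histogram.DEFAULT_BUCKETS refers to the `prometheus_client` default and hard-codes those values so that it runs with the standard library only; the runtimes are hypothetical.

```python
import math

# Assumption: these mirror prometheus_client's Histogram.DEFAULT_BUCKETS (seconds).
DEFAULT_BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75,
                   1.0, 2.5, 5.0, 7.5, 10.0, math.inf)


def cumulative_counts(runtimes, buckets=DEFAULT_BUCKETS):
    """Return {upper_bound: number of runtimes <= upper_bound}."""
    return {le: sum(1 for r in runtimes if r <= le) for le in buckets}


# Hypothetical task runtimes in seconds.
runtimes = [0.003, 0.04, 0.2, 0.9, 12.0]
counts = cumulative_counts(runtimes)
print(counts[0.05], counts[1.0], counts[math.inf])
```

Under this assumption, any task that runs longer than 10 seconds lands only in the +Inf bucket, so long-running tasks would be hard to tell apart in quantile estimates.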

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT