Celery Control
A Prometheus monitoring library for Celery that implements a pull-based approach to collect metrics directly from workers, providing better performance, scalability, and flexibility compared to traditional event-based monitoring solutions.
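To make the pull model concrete, here is a minimal stdlib-only sketch (not this library's code): the application exposes its current counters over HTTP in the Prometheus text format, and Prometheus scrapes them on its own schedule. The metric name `demo_tasks_succeeded_total` is a made-up placeholder.

```python
# Toy illustration of pull-based metrics (not celery-control's code):
# the app serves current counter values at /metrics; the scraper
# decides when to fetch them.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

METRICS = {"demo_tasks_succeeded_total": 3}  # hypothetical metric

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = "\n".join(f"{k} {v}" for k, v in METRICS.items()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Simulate one Prometheus scrape.
url = f"http://127.0.0.1:{server.server_port}/metrics"
page = urllib.request.urlopen(url).read().decode()
print(page)  # -> demo_tasks_succeeded_total 3
server.shutdown()
```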
Features
- Pull-based metrics: Exposes a `/metrics` endpoint for Prometheus scraping.
- Comprehensive task metrics:
- Successful tasks counter
- Error counter with exception labels
- Retry counter with reason labels
- Revoked tasks counter (expired/terminated)
- Unknown tasks counter
- Rejected tasks and messages counters
- Internal Celery errors counter
- Task runtime histogram
- Worker state metrics:
- Prefetched requests
- Active requests
- Waiting requests
- Scheduled tasks
- Worker health monitoring: Tracks worker online status.
- Task publication metrics: Tracks tasks published by clients, beat, or other services.
- Multiprocessing support: Compatible with both `prefork` and `threads` pools.
- Django integration: Easy setup for Django + Celery + Beat projects.
Why Celery Control?
Traditional event-based monitoring (e.g., Flower) has limitations:
- Additional load on the worker (17-46% performance impact)
- Single point of failure
- Difficulty scaling horizontally
- Accumulation of stale metrics
Celery Control solves these by:
- Reducing worker load (only 6-12% performance impact)
- Enabling horizontal scaling
- Providing real-time internal state metrics
- Allowing application-level metrics (DB, cache, external services)
Installation
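The original install snippet is not preserved in this extract; assuming the package is published under the repository name, installation would typically be:

```shell
# Package name assumed from the repository name.
pip install celery-control
```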
Quick Start
For Celery Workers
For Task Publisher — Celery Beat
For Task Publisher — WSGI Server
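The original code snippets for these subsections are not preserved in this extract. As a hedged sketch, launching a worker with the documented environment variables might look like the following; the app module name `proj` is a placeholder, and the library's own in-app setup step is omitted here:

```shell
# `proj` is a placeholder app module; the celery-control-specific
# setup inside the application is not shown in this extract.
export CELERY_CONTROL_SERVER_HOST=0.0.0.0   # documented default
export CELERY_CONTROL_SERVER_PORT=5555      # documented default
celery -A proj worker --loglevel=ERROR
```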
Environment Variables
- `CELERY_CONTROL_SERVER_HOST`: Metrics server host (default: `0.0.0.0`)
- `CELERY_CONTROL_SERVER_PORT`: Metrics server port (default: `5555`)
- `CELERY_CONTROL_SERVER_DISABLE_COMPRESSION`: Disable compression (default: `False`)
- `CELERY_CONTROL_TRACKER_DAEMON_INTERVAL`: Daemon update interval (default: `15`)
Performance
Celery Control introduces minimal overhead compared to event-based monitoring solutions. Below are the performance test results comparing different configurations.
All tests conducted on MacBook M1 Pro with:
- Log level: ERROR
- Gossip, mingle, heartbeat: disabled
- Prefetch multiplier: 1,000
- Concurrency: 1
Test 1: Constant Load Simulation
- Scenario: 100 batches of 1,000 tasks each (total: 100K tasks)
- Execution: Publisher sends all tasks to queue, then worker processes them
- Measurement: Median processing time per batch
- Purpose: Simulates continuous workload with always-available tasks
Test 2: Temporary Load Simulation
- Scenario: 10 batches of 1,000 tasks each, repeated 10 times (total: 100K tasks)
- Execution: Each iteration: publish → start worker → process → stop worker
- Measurement: Median of maximum processing times
- Purpose: Simulates burst workload
Prefork Pool Results
Test 1 - Constant Load (Median TPS):
┌────────────────────────────────────────┐
│ Default: 2097 tasks/second │
│ With Events: 1252 tasks/second │
│ With Celery Control: 1968 tasks/second │
└────────────────────────────────────────┘
Test 2 - Temporary Load (Median Max Time):
┌────────────────────────────────────────┐
│ Default: 3191 tasks/second │
│ With Events: 1720 tasks/second │
│ With Celery Control: 2816 tasks/second │
└────────────────────────────────────────┘

Performance Impact:
- Events: 40% reduction in Test 1, 46% in Test 2
- Celery Control: 6% reduction in Test 1, 12% in Test 2
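The stated impact figures follow directly from the prefork table above, as a quick check shows:

```python
# Reproduce the "Performance Impact" percentages from the prefork
# table: percent throughput lost relative to the default configuration.
default_tps = {"test1": 2097, "test2": 3191}
events_tps = {"test1": 1252, "test2": 1720}
control_tps = {"test1": 1968, "test2": 2816}

def reduction(baseline: int, observed: int) -> int:
    return round(100 * (baseline - observed) / baseline)

print(reduction(default_tps["test1"], events_tps["test1"]))   # -> 40
print(reduction(default_tps["test2"], events_tps["test2"]))   # -> 46
print(reduction(default_tps["test1"], control_tps["test1"]))  # -> 6
print(reduction(default_tps["test2"], control_tps["test2"]))  # -> 12
```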
Threads Pool Results
Test 1 - Constant Load (Median TPS):
┌────────────────────────────────────────┐
│ Default: 2452 tasks/second │
│ With Events: 2038 tasks/second │
│ With Celery Control: 2285 tasks/second │
└────────────────────────────────────────┘
Test 2 - Temporary Load (Median Max Time):
┌────────────────────────────────────────┐
│ Default: 3672 tasks/second │
│ With Events: 2135 tasks/second │
│ With Celery Control: 3400 tasks/second │
└────────────────────────────────────────┘

Performance Impact:
- Events: 17% reduction in Test 1, 42% in Test 2
- Celery Control: 7% reduction in Test 1 and Test 2
Key Findings
- Event-based monitoring introduces significant overhead (17-46% reduction)
- Celery Control adds minimal overhead (6-12% reduction)
- Threads pool generally outperforms prefork pool
- Temporary load scenarios (Test 2) show more pronounced performance differences
Multiprocessing Mode
Enable multiprocessing mode via the corresponding setting; this allows metrics aggregation across multiple worker processes.
Metrics Reference
Counters
— Total number of accepted tasks.
Labels:
- `worker` (string): Name of the worker that accepted the task
- `task` (string): Name of the task that was accepted
— Total number of successfully completed tasks.
Labels:
- `worker` (string): Name of the worker that processed the task
- `task` (string): Name of the task that was executed
— Total number of tasks that failed with unhandled exceptions.
Labels:
- `worker` (string): Name of the worker that processed the task
- `task` (string): Name of the task that failed
- `exception` (string): Type of exception that caused the failure
— Total number of task retry attempts.
Labels:
- `worker` (string): Name of the worker that processed the task
- `task` (string): Name of the task being retried
- `exception` (string): Reason for retry (exception type)
— Total number of revoked tasks.
Labels:
- `worker` (string): Name of the worker that processed the task
- `task` (string): Name of the revoked task
- `expired` (boolean): Whether the task was revoked due to expiration
- `terminated` (boolean): Whether the task was terminated via remote control
— Total number of unknown tasks received.
Labels:
- `worker` (string): Name of the worker that received the task
- Note: No `task` label, to prevent metric cardinality explosion
— Total number of tasks rejected by exception.
Labels:
- `worker` (string): Name of the worker that rejected the task
- `task` (string): Name of the rejected task
- `requeue` (boolean): Whether the task was requeued after rejection
— Total number of messages rejected due to unknown type.
Labels:
- `worker` (string): Name of the worker that rejected the message
— Total number of internal Celery errors.
Labels:
- `worker` (string): Name of the worker that processed the task
- `task` (string): Name of the task that failed
- `exception` (string): Type of exception that caused the failure
— Total number of tasks published to queue.
Labels:
- `task` (string): Name of the published task
- Note: No `worker` label, as publication typically happens outside workers
Gauges
— Timestamp of when the worker was last online.
Labels:
- `worker` (string): Name of the worker
— Number of tasks currently prefetched (waiting or active).
Labels:
- `worker` (string): Name of the worker
- `task` (string): Name of the reserved task
— Number of tasks currently being executed.
Labels:
- `worker` (string): Name of the worker
- `task` (string): Name of the active task
— Number of tasks currently waiting to be executed.
Labels:
- `worker` (string): Name of the worker
- `task` (string): Name of the waiting task
— Number of tasks scheduled for future execution.
Labels:
- `worker` (string): Name of the worker
- `task` (string): Name of the scheduled task
Histograms
— Histogram of task execution times.
Labels:
- `worker` (string): Name of the worker that executed the task
- `task` (string): Name of the task
Default Buckets:
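The default bucket boundaries are not preserved in this extract. As a general illustration of how a Prometheus histogram records a runtime observation into cumulative buckets, here is a sketch with hypothetical boundaries (not the library's actual defaults):

```python
# Illustrative only: hypothetical bucket boundaries, not the
# library's actual defaults.
import bisect

buckets = [0.1, 0.5, 1.0, 5.0, float("inf")]  # upper bounds, seconds
counts = [0] * len(buckets)

def observe(runtime: float) -> None:
    # Prometheus histograms are cumulative: an observation increments
    # every bucket whose upper bound is >= the observed value.
    idx = bisect.bisect_left(buckets, runtime)
    for i in range(idx, len(buckets)):
        counts[i] += 1

for runtime in (0.05, 0.3, 2.0):
    observe(runtime)

print(counts)  # -> [1, 2, 2, 3, 3]
```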
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT