bmuseum_news_system
Description
System for automated collection, AI processing, and publication of theatre news to a Telegram channel
Languages
- Python 77.6%
- HTML 12.5%
- Shell 8.7%
- Mako 0.7%
- Dockerfile 0.5%
Bakhrushin Museum News — Telegram automation
Automated system for collecting theatre-related news from the internet, generating Telegram-ready summaries with AI (OpenAI-compatible provider polza.ai), moderating, and publishing to the Telegram channel @bakhrushinmuseum_news.
What it does
- Collects news from RSS and HTML sources
- Stores data in PostgreSQL (with Alembic migrations)
- Generates summaries via polza.ai (OpenAI-compatible)
- Provides a web moderation panel (FastAPI + Jinja2)
- Publishes to Telegram via aiogram
- Runs background jobs via Celery + Redis + Celery Beat
Pipeline:
Requirements
- Docker + Docker Compose (Compose v2)
- Ubuntu 22.04 / 24.04 recommended
Quick start
1) Configure environment
Fill at least:
- TELEGRAM_BOT_TOKEN
- TELEGRAM_CHANNEL (default: @bakhrushinmuseum_news)
- POLZA_API_KEY
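As a rough illustration of how these variables might be read at startup, the sketch below loads them from the environment and applies the documented default for TELEGRAM_CHANNEL. The `Settings` class and `load_settings` helper are hypothetical names used for illustration; they are not taken from the project.

```python
import os
from dataclasses import dataclass

# Documented default for TELEGRAM_CHANNEL.
DEFAULT_CHANNEL = "@bakhrushinmuseum_news"


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container (not the project's actual class)."""
    telegram_bot_token: str
    telegram_channel: str
    polza_api_key: str


def load_settings(env=None) -> Settings:
    """Read the required variables and fail early if one is missing."""
    env = os.environ if env is None else env
    required = ("TELEGRAM_BOT_TOKEN", "POLZA_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    return Settings(
        telegram_bot_token=env["TELEGRAM_BOT_TOKEN"],
        # Fall back to the default channel when the variable is unset or empty.
        telegram_channel=env.get("TELEGRAM_CHANNEL") or DEFAULT_CHANNEL,
        polza_api_key=env["POLZA_API_KEY"],
    )
```

Failing at startup with a list of missing names tends to be easier to diagnose than a later error from the Telegram or AI client.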
2) Run services
3) Initialize database
4) Seed initial sources
5) Open web panel
- Posts moderation: http://localhost:8000/
- Sources management: http://localhost:8000/sources
Telegram setup
- Create a bot via @BotFather and set TELEGRAM_BOT_TOKEN.
- Add the bot as admin to the channel @bakhrushinmuseum_news with posting rights.
- Set TELEGRAM_CHANNEL=@bakhrushinmuseum_news.
Moderation & publishing
Statuses:
- pending: waiting for moderation
- approved: publish ASAP (publisher will pick it up)
- scheduled: publish at scheduled_at
- posted: published successfully
- failed: publishing failed (see editor_notes)
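The statuses above can be read as a small state model. The sketch below is one possible way to express them and the "is this post due?" decision; `PostStatus`, `Post`, and `is_due` are hypothetical names, and the real publisher may select posts differently (for example, with a database query).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class PostStatus(str, Enum):
    PENDING = "pending"      # waiting for moderation
    APPROVED = "approved"    # publish ASAP
    SCHEDULED = "scheduled"  # publish at scheduled_at
    POSTED = "posted"        # published successfully
    FAILED = "failed"        # publishing failed (see editor_notes)


@dataclass
class Post:
    status: PostStatus
    scheduled_at: Optional[datetime] = None  # assumed to be an aware UTC datetime


def is_due(post: Post, now: datetime) -> bool:
    """Return True if the publisher should pick this post up now."""
    if post.status is PostStatus.APPROVED:
        return True
    if post.status is PostStatus.SCHEDULED and post.scheduled_at is not None:
        return post.scheduled_at <= now
    return False


# Usage: an approved post is always due; a scheduled one only once its time has come.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
due_now = is_due(Post(PostStatus.APPROVED), now)
due_later = is_due(Post(PostStatus.SCHEDULED, datetime(2025, 1, 1, 13, 0, tzinfo=timezone.utc)), now)
```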
Approve publishes immediately
- Open a post → optionally edit text → Approve
Schedule uses Moscow time (MSK)
- Open a post → set scheduled_at in MSK format YYYY-MM-DD HH:MM → Approve + Schedule
- DB stores schedule in UTC; UI accepts and displays MSK.
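The MSK/UTC round trip described above can be sketched as follows. This is an illustrative example, not the project's code: the helper names are hypothetical, and MSK is modelled as a fixed UTC+3 offset (Moscow has observed no daylight saving time since 2014).

```python
from datetime import datetime, timedelta, timezone

# Assumption: MSK is treated as a fixed UTC+3 offset.
MSK = timezone(timedelta(hours=3), name="MSK")
FORMAT = "%Y-%m-%d %H:%M"  # YYYY-MM-DD HH:MM


def msk_input_to_utc(text: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' value entered in MSK; return an aware UTC datetime."""
    local = datetime.strptime(text.strip(), FORMAT).replace(tzinfo=MSK)
    return local.astimezone(timezone.utc)


def utc_to_msk_display(value: datetime) -> str:
    """Format an aware UTC datetime for display in MSK."""
    return value.astimezone(MSK).strftime(FORMAT)


# Usage: 10:30 MSK is stored as 07:30 UTC and displayed back as 10:30.
stored = msk_input_to_utc("2025-03-01 10:30")
shown = utc_to_msk_display(stored)
```

Keeping only aware UTC values in storage and converting at the UI boundary avoids ambiguity when the server's local time zone differs from Moscow's.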
Sources management
Open: http://localhost:8000/sources
You can:
- add new sources (type: rss/html)
- edit parser_config (JSON)
- enable/disable sources
- delete sources
- test a source URL (stores last_status_code, last_checked_at, last_error)
When a source returns 401/403
Some sites block bots. Use the Test button and disable such sources, or switch to a different URL/feed.
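A source test of this kind might look roughly like the sketch below. It records the three documented fields and flags 401/403 responses. The `SourceCheck` container, the function names, and the User-Agent header are assumptions for illustration; the project's actual check may use a different HTTP client.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class SourceCheck:
    last_status_code: Optional[int]
    last_checked_at: datetime
    last_error: Optional[str]


def http_status(url: str, timeout: float = 10.0) -> int:
    """Fetch the URL and return its HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # urllib raises on 4xx/5xx; the code is still useful


def check_source(url: str, fetch: Callable[[str], int] = http_status) -> SourceCheck:
    """Run one check and collect the values to store for the source."""
    now = datetime.now(timezone.utc)
    try:
        code = fetch(url)
    except Exception as exc:  # network errors, timeouts, malformed URLs
        return SourceCheck(None, now, str(exc))
    error = f"HTTP {code}" if code >= 400 else None
    return SourceCheck(code, now, error)


def is_blocked(check: SourceCheck) -> bool:
    """True when the site rejected the request with 401 or 403."""
    return check.last_status_code in (401, 403)
```

The `fetch` parameter is injectable so the logic can be exercised without network access, for example `check_source(url, fetch=lambda _: 403)`.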
Cutoff filters (avoid old news)
To prevent collecting or AI-processing old items, use sliding window cutoffs:
Rules:
- NEWS_NOT_BEFORE_DAYS: collector skips items older than N days if the item date is known.
- AI_NOT_BEFORE_DAYS: AI step skips items older than N days if the item date is known.
- Items without a date are not blocked by cutoffs.
Collector and AI steps log skipped items as:
- [cutoff] skipped by NEWS_NOT_BEFORE ...
- [cutoff] skipped by AI_NOT_BEFORE ...
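The cutoff rules can be sketched as a single predicate. This is a minimal illustration under the assumption that item dates are timezone-aware; `passes_cutoff` is a hypothetical name, and the same check would apply to either NEWS_NOT_BEFORE_DAYS or AI_NOT_BEFORE_DAYS.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def passes_cutoff(
    published_at: Optional[datetime],
    not_before_days: int,
    now: Optional[datetime] = None,
) -> bool:
    """Return False only when the item has a known date older than the window."""
    if published_at is None:
        return True  # items without a date are not blocked by cutoffs
    now = now or datetime.now(timezone.utc)
    return published_at >= now - timedelta(days=not_before_days)


# Usage with a 3-day window evaluated on 2025-01-10 (UTC).
now = datetime(2025, 1, 10, tzinfo=timezone.utc)
recent = passes_cutoff(datetime(2025, 1, 8, tzinfo=timezone.utc), 3, now)
old = passes_cutoff(datetime(2025, 1, 1, tzinfo=timezone.utc), 3, now)
undated = passes_cutoff(None, 3, now)
```

Because the window is computed relative to the current time, it slides forward on every run without any configuration change.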
Operations
Check services
Logs
Run jobs manually
Backup
Dump database
Restore
Troubleshooting
"Received unregistered task" in celery_worker
This means the worker did not import tasks. In this project it is fixed by importing in .
Restart:
Port 6379 already in use
Redis port conflict on host. Either stop local Redis, or remove port mapping for Redis in .
Alembic errors about sqlalchemy.url
This project uses to load from and sets at runtime.
Project structure
- app/api/: web routes
- app/templates/: Jinja2 templates
- app/collectors/: RSS/HTML collection
- app/parsers/: text extraction
- app/ai/: AI summarization
- app/workers/: Celery tasks
- app/tg/: Telegram sender
- migrations/: Alembic migrations