bmuseum_news_system

Description

Automated system for collecting theatre news, processing it with AI, and publishing it to a Telegram channel


Bakhrushin Museum News — Telegram automation

Automated system for collecting theatre-related news from the internet, generating Telegram-ready summaries with AI (OpenAI-compatible provider polza.ai), moderating, and publishing to the Telegram channel @bakhrushinmuseum_news.

What it does

  • Collects news from RSS and HTML sources
  • Stores data in PostgreSQL (with Alembic migrations)
  • Generates summaries via polza.ai (OpenAI-compatible)
  • Provides a web moderation panel (FastAPI + Jinja2)
  • Publishes to Telegram via aiogram
  • Runs background jobs via Celery + Redis + Celery Beat

Pipeline:

Sources → Items → AI → Posts (pending) → Approve/Schedule → Telegram

Requirements

  • Docker + Docker Compose (Compose v2)
  • Ubuntu 22.04 / 24.04 recommended

Quick start

1) Configure environment

Fill at least:

  • TELEGRAM_BOT_TOKEN
  • TELEGRAM_CHANNEL (default: @bakhrushinmuseum_news)
  • POLZA_API_KEY

2) Run services

3) Initialize database

4) Seed initial sources

5) Open web panel

Telegram setup

  1. Create a bot via @BotFather and set TELEGRAM_BOT_TOKEN.
  2. Add the bot as an admin of the channel @bakhrushinmuseum_news with posting rights.
  3. Set TELEGRAM_CHANNEL=@bakhrushinmuseum_news.

Moderation & publishing

Statuses:

  • pending: waiting for moderation
  • approved: publish ASAP (the publisher will pick it up)
  • scheduled: publish at scheduled_at
  • posted: published successfully
  • failed: publishing failed (see editor_notes)
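The status list above can be read as a small state machine. The following is a minimal sketch of that reading, not the project's actual model: the `PostStatus` enum, the `ALLOWED_TRANSITIONS` table, and `can_transition` are hypothetical names, and the allowed moves are inferred from the descriptions only.

```python
from enum import Enum


class PostStatus(str, Enum):
    """The five statuses named in the README."""
    PENDING = "pending"
    APPROVED = "approved"
    SCHEDULED = "scheduled"
    POSTED = "posted"
    FAILED = "failed"


# Hypothetical transition table inferred from the status descriptions.
# The real project may allow other moves (for example, retrying failed posts).
ALLOWED_TRANSITIONS = {
    PostStatus.PENDING: {PostStatus.APPROVED, PostStatus.SCHEDULED},
    PostStatus.APPROVED: {PostStatus.POSTED, PostStatus.FAILED},
    PostStatus.SCHEDULED: {PostStatus.POSTED, PostStatus.FAILED},
    PostStatus.POSTED: set(),
    PostStatus.FAILED: set(),
}


def can_transition(current: PostStatus, target: PostStatus) -> bool:
    """Return True if moving from current to target is allowed by the table."""
    return target in ALLOWED_TRANSITIONS[current]
```

Keeping the table in one place makes it easy to reject invalid moves (such as returning a posted item to pending) before touching the database.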

Approve publishes immediately

  • Open a post → optionally edit text → Approve

Schedule uses Moscow time (MSK)

  • Open a post → set scheduled_at in MSK, format YYYY-MM-DD HH:MM → Approve + Schedule
  • The DB stores the schedule in UTC; the UI accepts and displays MSK.
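The MSK-to-UTC round trip can be sketched as below. This is an illustration under stated assumptions rather than the project's code: `msk_to_utc` and `utc_to_msk` are hypothetical helper names, and Moscow time is modelled as a fixed UTC+3 offset (true since October 2014); `zoneinfo.ZoneInfo("Europe/Moscow")` would be the more general choice.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC+3 offset, which matches Moscow time since October 2014.
MSK = timezone(timedelta(hours=3), name="MSK")
FORMAT = "%Y-%m-%d %H:%M"


def msk_to_utc(value: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM' entered in MSK and return an aware UTC datetime."""
    local = datetime.strptime(value, FORMAT).replace(tzinfo=MSK)
    return local.astimezone(timezone.utc)


def utc_to_msk(value: datetime) -> str:
    """Format an aware datetime as 'YYYY-MM-DD HH:MM' in MSK for display."""
    if value.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return value.astimezone(MSK).strftime(FORMAT)
```

For example, `msk_to_utc("2025-01-15 12:30")` yields 09:30 UTC on the same day, and `utc_to_msk` turns that value back into the string the moderator typed.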

Sources management

Open: http://localhost:8000/sources

You can:

  • add new sources (type: rss / html)
  • edit parser_config (JSON)
  • enable/disable sources
  • delete sources
  • test a source URL (stores last_status_code, last_checked_at, last_error)

When a source returns 401/403

Some sites block bots. Use the Test button and disable such sources, or switch to a different URL/feed.
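What a source test might record can be sketched as follows. This is a hedged illustration, not the code behind the Test button: `fetch_status` and `check_source` are hypothetical names, and only the three field names (`last_status_code`, `last_checked_at`, `last_error`) come from this README.

```python
from datetime import datetime, timezone
from typing import Callable
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the code itself is the result we want.
        return exc.code


def check_source(url: str, fetch: Callable[[str], int] = fetch_status) -> dict:
    """Probe a source and return the three fields the panel stores."""
    result = {
        "last_status_code": None,
        "last_checked_at": datetime.now(timezone.utc),
        "last_error": None,
    }
    try:
        code = fetch(url)
    except Exception as exc:  # network errors, timeouts, malformed URLs
        result["last_error"] = f"{type(exc).__name__}: {exc}"
        return result
    result["last_status_code"] = code
    if code in (401, 403):
        result["last_error"] = f"HTTP {code}: the site is probably blocking automated clients"
    elif code >= 400:
        result["last_error"] = f"HTTP {code}"
    return result
```

Passing `fetch` as a parameter keeps the function testable without network access; a source whose result shows 401 or 403 is a candidate for disabling or for a different feed URL.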

Cutoff filters (avoid old news)

To prevent collecting or AI-processing old items, use sliding window cutoffs:

Rules:

  • NEWS_NOT_BEFORE_DAYS: the collector skips items older than N days if the item date is known.
  • AI_NOT_BEFORE_DAYS: the AI step skips items older than N days if the item date is known.
  • Items without a date are not blocked by cutoffs.
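The rules above amount to a simple sliding-window check. The sketch below shows one way to express it; `is_too_old` is a hypothetical helper, and treating naive datetimes as UTC is an assumption, not something this README states.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_too_old(
    published_at: Optional[datetime],
    not_before_days: int,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the item is older than the sliding-window cutoff."""
    if published_at is None:
        # Items without a date are never blocked by cutoffs.
        return False
    now = now or datetime.now(timezone.utc)
    if published_at.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        published_at = published_at.replace(tzinfo=timezone.utc)
    return published_at < now - timedelta(days=not_before_days)
```

The same function would serve both steps: the collector would call it with the NEWS_NOT_BEFORE_DAYS value and the AI step with AI_NOT_BEFORE_DAYS.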

Collector and AI steps log skipped items as:

  • [cutoff] skipped by NEWS_NOT_BEFORE ...
  • [cutoff] skipped by AI_NOT_BEFORE ...

Operations

Check services

Logs

Run jobs manually

Backup

Dump database

Restore

Troubleshooting

"Received unregistered task" in celery_worker

This means the worker did not import the tasks module. In this project it is fixed by importing app.workers.tasks in app/workers/celery_app.py.

Restart:

Port 6379 already in use

The Redis port conflicts with another process on the host. Either stop the local Redis, or remove the Redis port mapping in docker-compose.yml.

Alembic errors about sqlalchemy.url

This project uses migrations/env.py to load DATABASE_URL from .env and sets configuration["sqlalchemy.url"] at runtime.
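The runtime override described above might look roughly like this. It is a sketch under assumptions, not the contents of the project's migrations/env.py: `read_env_file` and `with_database_url` are hypothetical helpers, and the .env parser handles only plain KEY=VALUE lines.

```python
from typing import Mapping, Optional


def read_env_file(path: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are ignored."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def with_database_url(section: Optional[Mapping[str, str]], env: Mapping[str, str]) -> dict:
    """Return a copy of the Alembic config section with sqlalchemy.url set."""
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set; check .env")
    configuration = dict(section or {})
    configuration["sqlalchemy.url"] = url
    return configuration


# In env.py, the section would typically come from
# config.get_section(config.config_ini_section), and the returned dict
# would be passed to sqlalchemy.engine_from_config(configuration, prefix="sqlalchemy.").
```

If Alembic complains about sqlalchemy.url, the first thing to check under this scheme is that DATABASE_URL is present and non-empty in .env.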

Project structure

  • app/api/: web routes
  • app/templates/: Jinja2 templates
  • app/collectors/: RSS/HTML collection
  • app/parsers/: text extraction
  • app/ai/: AI summarization
  • app/workers/: Celery tasks
  • app/tg/: Telegram sender
  • migrations/: Alembic migrations