# Kagi News RSS Aggregator

A Python-based RSS aggregator that posts Kagi News stories to Coves communities using rich text formatting.
## Overview

This aggregator:

- Fetches RSS feeds from Kagi News daily via cron
- Parses HTML descriptions to extract structured content (highlights, perspectives, sources)
- Formats posts using Coves rich text with facets (bold, italic, links)
- Relies on the server's unfurl service to extract thumbnails automatically
- Posts to configured communities via XRPC
## Project Structure

```
aggregators/kagi-news/
├── src/
│   ├── models.py              # Data models (KagiStory, Perspective, etc.)
│   ├── rss_fetcher.py         # RSS feed fetching with retry logic
│   ├── html_parser.py         # Parse Kagi HTML to structured data
│   ├── richtext_formatter.py  # Format content with rich text facets (TODO)
│   ├── atproto_client.py      # ATProto authentication and operations (TODO)
│   ├── state_manager.py       # Deduplication state tracking (TODO)
│   ├── config.py              # Configuration loading (TODO)
│   └── main.py                # Entry point (TODO)
├── tests/
│   ├── test_rss_fetcher.py    # RSS fetcher tests ✓
│   ├── test_html_parser.py    # HTML parser tests ✓
│   └── fixtures/
│       ├── sample_rss_item.xml
│       └── world.xml
├── scripts/
│   └── setup.sh               # Automated Coves registration script
├── Dockerfile                 # Docker image definition
├── docker-compose.yml         # Docker Compose configuration
├── docker-entrypoint.sh       # Container entrypoint script
├── .dockerignore              # Docker build exclusions
├── requirements.txt           # Python dependencies
├── config.example.yaml        # Example configuration
├── .env.example               # Environment variables template
├── crontab                    # Cron schedule
└── README.md
```
## Registration with Coves

Before running the aggregator, you must register it with a Coves instance. Registration creates a DID for your aggregator and records it with the instance.
### Quick Setup (Automated)

The automated setup script handles the entire registration process:

```bash
cd scripts
chmod +x setup.sh
./setup.sh
```
This will:

- Create a PDS account for your aggregator (generates a DID)
- Generate a `.well-known/atproto-did` file for domain verification
- Pause for manual upload: you'll upload the file to your web server
- Register with the Coves instance via XRPC
- Create a service declaration record (indexed by Jetstream)

**Manual step required:** During the process, you'll need to upload the `.well-known/atproto-did` file to your domain so it's accessible at `https://yourdomain.com/.well-known/atproto-did`.
After completion, you'll have a `kagi-aggregator-config.env` file with:

- Aggregator DID and credentials
- Access/refresh JWTs
- Service declaration URI

**Keep this file secure!** It contains your aggregator's credentials.
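The exact variable names are written by the setup script; a file of roughly this shape is what to expect (names and values below are illustrative placeholders, not real credentials):

```shell
# kagi-aggregator-config.env (illustrative shape; variable names are assumptions)
AGGREGATOR_DID=did:plc:xxxxxxxxxxxxxxxxxxxxxxxx
AGGREGATOR_HANDLE=kagi-news.example.com
ACCESS_JWT=eyJ...
REFRESH_JWT=eyJ...
SERVICE_DECLARATION_URI=at://did:plc:xxxxxxxxxxxxxxxxxxxxxxxx/...
```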
### Manual Setup (Step-by-step)

Alternatively, use the generic setup scripts from the main Coves repo for more control:

```bash
# From the Coves project root
cd scripts/aggregator-setup

# Follow the 4-step process
./1-create-pds-account.sh
./2-setup-wellknown.sh
./3-register-with-coves.sh
./4-create-service-declaration.sh
```

See `scripts/aggregator-setup/README.md` for detailed documentation on each step.
### What Happens During Registration?

- **PDS Account Creation**: Your aggregator gets a `did:plc:...` identifier
- **Domain Verification**: Proves you control your aggregator's domain
- **Coves Registration**: Inserts your DID into the Coves instance's `users` table
- **Service Declaration**: Creates a record that gets indexed into the `aggregators` table
- **Ready for Authorization**: Community moderators can now authorize your aggregator

Once registered and authorized by a community, your aggregator can post content.
## Setup

### Prerequisites

- Python 3.11+
- `python3-venv` package (`apt install python3.12-venv`)
- Completed registration (see above)
### Installation

1. Create a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Copy the configuration templates:

   ```bash
   cp config.example.yaml config.yaml
   cp .env.example .env
   ```

4. Edit `config.yaml` to map RSS feeds to communities

5. Set environment variables in `.env` (aggregator DID and private key)
### Running Tests

```bash
# Activate virtual environment
source venv/bin/activate

# Run all tests
pytest -v

# Run specific test file
pytest tests/test_html_parser.py -v

# Run with coverage
pytest --cov=src --cov-report=html
```
## Deployment

### Docker Deployment (Recommended for Production)

The easiest way to deploy the Kagi aggregator is with Docker. The cron job runs inside the container automatically.

#### Prerequisites

- Docker and Docker Compose installed
- Completed registration (you have `.env` with credentials)
- `config.yaml` configured with your feed mappings
#### Quick Start

1. Configure your environment:

   ```bash
   # Copy and edit configuration
   cp config.example.yaml config.yaml
   cp .env.example .env

   # Edit .env with your aggregator credentials
   nano .env
   ```

2. Start the aggregator:

   ```bash
   docker compose up -d
   ```

3. View logs:

   ```bash
   docker compose logs -f
   ```

4. Stop the aggregator:

   ```bash
   docker compose down
   ```
#### Configuration

The docker-compose.yml file supports these environment variables:

- `AGGREGATOR_HANDLE` (required): Your aggregator's handle
- `AGGREGATOR_PASSWORD` (required): Your aggregator's password
- `COVES_API_URL` (optional): Override the Coves API endpoint (defaults to `https://api.coves.social`)
- `RUN_ON_STARTUP` (optional): Set to `true` to run immediately on container start (useful for testing)
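For reference, a minimal `.env` matching these variables might look like this (all values are placeholders):

```shell
# .env — placeholder values, not real credentials
AGGREGATOR_HANDLE=kagi-news.example.com
AGGREGATOR_PASSWORD=change-me
COVES_API_URL=https://api.coves.social
RUN_ON_STARTUP=false
```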
#### Testing the Setup

Run the aggregator immediately without waiting for cron:

```bash
# Run once and exit
docker compose run --rm kagi-aggregator python -m src.main

# Or set RUN_ON_STARTUP=true in .env and restart
docker compose restart
```
#### Production Deployment

For production, consider:

1. Using Docker secrets for credentials:

   ```yaml
   secrets:
     aggregator_credentials:
       file: ./secrets/aggregator.env
   ```

2. Setting up log rotation (already configured in docker-compose.yml):

   - Max size: 10 MB per file
   - Max files: 3

3. Monitoring health checks:

   ```bash
   docker inspect --format='{{.State.Health.Status}}' kagi-news-aggregator
   ```

4. Auto-restart on failure (already enabled with `restart: unless-stopped`)
#### Viewing Cron Logs

```bash
# Follow cron execution logs
docker compose logs -f kagi-aggregator

# View last 100 lines
docker compose logs --tail=100 kagi-aggregator
```
#### Updating the Aggregator

```bash
# Pull latest code
git pull

# Rebuild and restart
docker compose up -d --build
```
### Manual Deployment (Alternative)

If you prefer running without Docker, use the traditional approach:

1. Install dependencies:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```

2. Configure the crontab:

   ```bash
   # Edit the crontab file with your paths, then install it:
   crontab crontab
   ```

3. Verify cron is running:

   ```bash
   crontab -l
   ```
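The shipped `crontab` file uses standard five-field cron syntax; the schedule and paths below are illustrative, not the project's defaults:

```
# Illustrative entry: run the aggregator daily at 06:00
0 6 * * * cd /opt/aggregators/kagi-news && ./venv/bin/python -m src.main >> /var/log/kagi-aggregator.log 2>&1
```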
## Development Status

### ✅ Phase 1-2 Complete (Oct 24, 2025)

- Project structure created
- Data models defined (KagiStory, Perspective, Quote, Source)
- RSS fetcher with retry logic and tests
- HTML parser extracting all sections (summary, highlights, perspectives, sources, quote, image)
- Test fixtures from real Kagi News feed
### 🚧 Next Steps (Phase 3-4)

- Rich text formatter (convert to Coves format with facets)
- State manager for deduplication
- Configuration loader
- ATProto client for post creation
- Main orchestration script
- End-to-end tests
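As a rough idea of the planned state manager, deduplication can be as simple as persisting the set of already-posted item GUIDs to disk; this is a sketch under that assumption, not the project's implementation:

```python
import json
import os


class StateManager:
    """Tracks already-posted story GUIDs so reruns never double-post."""

    def __init__(self, path: str = "state.json"):
        self.path = path
        self.seen: set[str] = set()
        # Load any GUIDs persisted by a previous run
        if os.path.exists(path):
            with open(path) as f:
                self.seen = set(json.load(f))

    def is_new(self, guid: str) -> bool:
        return guid not in self.seen

    def mark_posted(self, guid: str) -> None:
        # Record the GUID and persist immediately, so a crash mid-run
        # cannot cause the same story to be posted twice
        self.seen.add(guid)
        with open(self.path, "w") as f:
            json.dump(sorted(self.seen), f)
```

Persisting after every post (rather than once at shutdown) trades a little I/O for crash safety, which matters for an unattended cron job.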
## Configuration

Edit `config.yaml` to define feed-to-community mappings:

```yaml
coves_api_url: "https://api.coves.social"

feeds:
  - name: "World News"
    url: "https://news.kagi.com/world.xml"
    community_handle: "world-news.coves.social"
    enabled: true

  - name: "Tech News"
    url: "https://news.kagi.com/tech.xml"
    community_handle: "tech.coves.social"
    enabled: true
```
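Once parsed (the real loader would read the YAML with a library such as PyYAML), selecting the active feeds is a one-line filter. The dict below mirrors the example config, with one feed flipped to disabled for illustration:

```python
# Parsed form of the example config; in the real loader this dict
# would come from reading config.yaml.
config = {
    "coves_api_url": "https://api.coves.social",
    "feeds": [
        {"name": "World News", "url": "https://news.kagi.com/world.xml",
         "community_handle": "world-news.coves.social", "enabled": True},
        {"name": "Tech News", "url": "https://news.kagi.com/tech.xml",
         "community_handle": "tech.coves.social", "enabled": False},
    ],
}


def enabled_feeds(cfg: dict) -> list[dict]:
    """Return only the feeds marked enabled; disabled feeds are skipped."""
    return [feed for feed in cfg["feeds"] if feed.get("enabled")]
```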
## Architecture

### Data Flow

```
Kagi RSS Feed
    ↓ (HTTP GET)
RSS Fetcher
    ↓ (feedparser)
Parsed RSS Items
    ↓ (for each item)
HTML Parser
    ↓ (BeautifulSoup)
Structured KagiStory
    ↓
Rich Text Formatter
    ↓ (with facets)
Post Record
    ↓ (XRPC)
Coves Community
```
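The flow above can be sketched as a chain of small stages. The stage bodies here are stand-ins (the real modules use feedparser, BeautifulSoup, and XRPC), so only the shape of the pipeline is meaningful:

```python
# Stand-in stages mirroring the data flow; each real module replaces one.
def fetch_rss(url: str) -> list[dict]:
    # real code: HTTP GET + feedparser, with retry logic
    return [{"title": "Example story", "description": "<p>Summary.</p>"}]


def parse_html(item: dict) -> dict:
    # real code: BeautifulSoup extracts highlights, perspectives, sources
    return {"title": item["title"], "summary": "Summary."}


def format_post(story: dict) -> dict:
    # real code: builds content plus byte-positioned rich text facets
    return {"content": f"{story['title']}\n\n{story['summary']}", "facets": []}


def run_feed(url: str) -> list[dict]:
    """Fetch → parse → format, yielding post records ready for XRPC."""
    return [format_post(parse_html(item)) for item in fetch_rss(url)]
```

Keeping each stage a pure function of its input makes the pipeline easy to unit-test with fixtures, which is how the existing fetcher and parser tests work.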
### Rich Text Format

Posts use Coves rich text with UTF-8 byte-positioned facets:

```json
{
  "content": "Summary text...\n\nHighlights:\n• Point 1\n...",
  "facets": [
    {
      "index": {"byteStart": 20, "byteEnd": 31},
      "features": [{"$type": "social.coves.richtext.facet#bold"}]
    },
    {
      "index": {"byteStart": 50, "byteEnd": 75},
      "features": [{"$type": "social.coves.richtext.facet#link", "uri": "https://..."}]
    }
  ]
}
```
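Because facet indices are byte offsets into the UTF-8 encoding rather than character positions, multi-byte characters (like the `•` bullet above) shift them. A small helper along these lines (hypothetical, not yet part of the codebase) computes a correct range:

```python
def facet_byte_range(content: str, span: str) -> tuple[int, int]:
    """Return (byteStart, byteEnd) of the first occurrence of `span`,
    measured in UTF-8 bytes as byte-positioned facets require."""
    char_start = content.index(span)
    # Encode the prefix to count bytes, not characters
    byte_start = len(content[:char_start].encode("utf-8"))
    byte_end = byte_start + len(span.encode("utf-8"))
    return byte_start, byte_end
```

For ASCII-only content the byte and character offsets coincide, which makes off-by-N bugs easy to miss until a post contains `•`, curly quotes, or non-Latin text.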
## License

See the parent Coves project license.