Kagi News RSS Aggregator#

A Python-based RSS aggregator that posts Kagi News stories to Coves communities using rich text formatting.

Overview#

This aggregator:

Fetches RSS feeds from Kagi News daily via CRON
Parses HTML descriptions to extract structured content (highlights, perspectives, sources)
Formats posts using Coves rich text with facets (bold, italic, links)
Relies on the server's unfurl service to extract thumbnails automatically
Posts to configured communities via XRPC

Project Structure#

```
aggregators/kagi-news/
├── src/
│   ├── models.py              # Data models (KagiStory, Perspective, etc.)
│   ├── rss_fetcher.py         # RSS feed fetching with retry logic
│   ├── html_parser.py         # Parse Kagi HTML to structured data
│   ├── richtext_formatter.py  # Format content with rich text facets (TODO)
│   ├── atproto_client.py      # ATProto authentication and operations (TODO)
│   ├── state_manager.py       # Deduplication state tracking (TODO)
│   ├── config.py              # Configuration loading (TODO)
│   └── main.py                # Entry point (TODO)
├── tests/
│   ├── test_rss_fetcher.py    # RSS fetcher tests ✓
│   ├── test_html_parser.py    # HTML parser tests ✓
│   └── fixtures/
│       ├── sample_rss_item.xml
│       └── world.xml
├── scripts/
│   └── setup.sh               # Automated Coves registration script
├── Dockerfile                 # Docker image definition
├── docker-compose.yml         # Docker Compose configuration
├── docker-entrypoint.sh       # Container entrypoint script
├── .dockerignore              # Docker build exclusions
├── requirements.txt           # Python dependencies
├── config.example.yaml        # Example configuration
├── .env.example               # Environment variables template
├── crontab                    # CRON schedule
└── README.md
```
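
To make the role of models.py more concrete, here is a minimal sketch of what the data models might look like. Only the class names (KagiStory, Perspective, Quote, Source) and the section names (summary, highlights, perspectives, sources, quote, image) come from this README; every field name and type below is an assumption for illustration and may not match the actual module.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Source:
    # Hypothetical fields: a cited article and its URL.
    title: str
    url: str


@dataclass
class Perspective:
    # Hypothetical fields: who holds the viewpoint and what it is.
    actor: str
    description: str
    source_url: Optional[str] = None


@dataclass
class Quote:
    # Hypothetical fields: the quoted text and who said it.
    text: str
    attribution: str


@dataclass
class KagiStory:
    # One parsed story; optional sections default to empty values.
    title: str
    link: str
    summary: str
    highlights: list[str] = field(default_factory=list)
    perspectives: list[Perspective] = field(default_factory=list)
    sources: list[Source] = field(default_factory=list)
    quote: Optional[Quote] = None
    image_url: Optional[str] = None
```

Using default_factory keeps each story's lists independent, so a story without highlights or sources can still be constructed and formatted.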

Registration with Coves#

Before running the aggregator, you must register it with a Coves instance. This creates a DID for your aggregator and registers it with Coves.

Quick Setup (Automated)#

The automated setup script handles the entire registration process:

```
cd scripts
chmod +x setup.sh
./setup.sh
```

This will:

Create a PDS account for your aggregator (generates a DID)
Generate .well-known/atproto-did file for domain verification
Pause for manual upload - you'll upload the file to your web server
Register with Coves instance via XRPC
Create service declaration record (indexed by Jetstream)

Manual step required: During the process, you'll need to upload the .well-known/atproto-did file to your domain so it's accessible at https://yourdomain.com/.well-known/atproto-did.
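
Before continuing past the pause, it can help to confirm that the uploaded file is actually reachable and contains the expected DID. The following is a rough sketch of such a check and is not part of setup.sh; the function names and the injectable `fetch` parameter are assumptions made for illustration.

```python
from typing import Callable
from urllib.request import urlopen


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return its body decoded as UTF-8 text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def wellknown_matches(
    domain: str,
    expected_did: str,
    fetch: Callable[[str], str] = fetch_text,
) -> bool:
    """Return True if the domain serves expected_did at /.well-known/atproto-did."""
    url = f"https://{domain}/.well-known/atproto-did"
    try:
        body = fetch(url)
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses,
        # so network and HTTP failures are treated as "not verified".
        return False
    # The file is expected to hold the DID as plain text; ignore surrounding whitespace.
    return body.strip() == expected_did
```

For example, `wellknown_matches("yourdomain.com", "did:plc:...")` would be called with your real domain and the DID generated in the first step.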

After completion, you'll have a kagi-aggregator-config.env file with:

Aggregator DID and credentials
Access/refresh JWTs
Service declaration URI

Keep this file secure! It contains your aggregator's credentials.

Manual Setup (Step-by-step)#

Alternatively, use the generic setup scripts from the main Coves repo for more control:

```
# From the Coves project root
cd scripts/aggregator-setup

# Follow the 4-step process
./1-create-pds-account.sh
./2-setup-wellknown.sh
./3-register-with-coves.sh
./4-create-service-declaration.sh
```

See scripts/aggregator-setup/README.md for detailed documentation on each step.

What Happens During Registration?#

PDS Account Creation: Your aggregator gets a did:plc:... identifier
Domain Verification: Proves you control your aggregator's domain
Coves Registration: Inserts your DID into the Coves instance's users table
Service Declaration: Creates a record that gets indexed into the aggregators table
Ready for Authorization: Community moderators can now authorize your aggregator

Once registered and authorized by a community, your aggregator can post content.

Setup#

Prerequisites#

Python 3.11+
python3-venv package (on Debian/Ubuntu, for example, apt install python3.12-venv; match the suffix to your Python version)
Completed registration (see above)

Installation#

Create virtual environment:

```
python3 -m venv venv
source venv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Copy configuration templates:

```
cp config.example.yaml config.yaml
cp .env.example .env
```

Edit config.yaml to map RSS feeds to communities
Set environment variables in .env (aggregator DID and private key)

Running Tests#

```
# Activate virtual environment
source venv/bin/activate

# Run all tests
pytest -v

# Run specific test file
pytest tests/test_html_parser.py -v

# Run with coverage
pytest --cov=src --cov-report=html
```

Deployment#

Docker Deployment (Recommended for Production)#

The easiest way to deploy the Kagi aggregator is using Docker. The cron job runs inside the container automatically.

Prerequisites#

Docker and Docker Compose installed
Completed registration (you have .env with credentials)
config.yaml configured with your feed mappings

Quick Start#

Configure your environment:

```
# Copy and edit configuration
cp config.example.yaml config.yaml
cp .env.example .env

# Edit .env with your aggregator credentials
nano .env
```

Start the aggregator:
```
docker compose up -d
```
View logs:
```
docker compose logs -f
```
Stop the aggregator:
```
docker compose down
```

Configuration#

The docker-compose.yml file supports these environment variables:

AGGREGATOR_HANDLE (required): Your aggregator's handle
AGGREGATOR_PASSWORD (required): Your aggregator's password
COVES_API_URL (optional): Override Coves API endpoint (defaults to https://api.coves.social)
RUN_ON_STARTUP (optional): Set to true to run immediately on container start (useful for testing)
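
Inside the container, the application might read these variables along the following lines. This is a sketch only: config.py is still marked TODO, so the `Settings` class and `load_settings` function are hypothetical names, and treating only the literal string "true" (case-insensitive) as enabling RUN_ON_STARTUP is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    handle: str
    password: str
    coves_api_url: str
    run_on_startup: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing required ones."""
    required = ("AGGREGATOR_HANDLE", "AGGREGATOR_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        handle=env["AGGREGATOR_HANDLE"],
        password=env["AGGREGATOR_PASSWORD"],
        # Default taken from the variable list above.
        coves_api_url=env.get("COVES_API_URL", "https://api.coves.social"),
        run_on_startup=env.get("RUN_ON_STARTUP", "").strip().lower() == "true",
    )
```

Failing at startup when a required variable is absent tends to be easier to diagnose than an authentication error surfacing later in a cron run.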

Testing the Setup#

Run the aggregator immediately without waiting for cron:

```
# Run once and exit
docker compose run --rm kagi-aggregator python -m src.main

# Or set RUN_ON_STARTUP=true in .env and restart
docker compose restart
```

Production Deployment#

For production, consider:

Using Docker Secrets for credentials:

```
secrets:
  aggregator_credentials:
    file: ./secrets/aggregator.env
```

Setting up log rotation (already configured in docker-compose.yml):
- Max size: 10MB per file
- Max files: 3

Monitoring health checks:

```
docker inspect --format='{{.State.Health.Status}}' kagi-news-aggregator
```

Auto-restart on failure (already enabled with restart: unless-stopped)

Viewing Cron Logs#

```
# Follow cron execution logs
docker compose logs -f kagi-aggregator

# View last 100 lines
docker compose logs --tail=100 kagi-aggregator
```

Updating the Aggregator#

```
# Pull latest code
git pull

# Rebuild and restart
docker compose up -d --build
```

Manual Deployment (Alternative)#

If you prefer running without Docker, use the traditional approach:

Install dependencies:

```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Configure crontab:

```
# Edit the crontab file with your paths
# Then install it:
crontab crontab
```

Verify cron is running:
```
crontab -l
```

Development Status#

✅ Phase 1-2 Complete (Oct 24, 2025)#

Project structure created
Data models defined (KagiStory, Perspective, Quote, Source)
RSS fetcher with retry logic and tests
HTML parser extracting all sections (summary, highlights, perspectives, sources, quote, image)
Test fixtures from real Kagi News feed
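
The README does not describe how the fetcher's retry logic works. As one plausible shape, the sketch below retries a fetch with exponential backoff; the function name, the attempt count, the delays, and the choice to retry only on `OSError` are all assumptions rather than a description of rss_fetcher.py.

```python
import time
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[], bytes],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() until it succeeds, waiting 1x, 2x, 4x... base_delay between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except OSError:
            # Network-level errors are retried; the last failure is re-raised.
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.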

🚧 Next Steps (Phase 3-4)#

Rich text formatter (convert to Coves format with facets)
State manager for deduplication
Configuration loader
ATProto client for post creation
Main orchestration script
End-to-end tests
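
Since the state manager is not yet implemented, the following is only a sketch of how deduplication could work: remember the identifier of each posted story in a JSON file and skip identifiers already seen. The `SeenStore` class, its method names, and the JSON file format are hypothetical.

```python
import json
from pathlib import Path


class SeenStore:
    """Tracks which story identifiers have already been posted."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._seen: set[str] = set()
        if path.exists():
            # The file holds a JSON array of identifiers.
            self._seen = set(json.loads(path.read_text(encoding="utf-8")))

    def is_new(self, guid: str) -> bool:
        return guid not in self._seen

    def mark_posted(self, guid: str) -> None:
        # Persist after every post so a crash mid-run does not cause reposts.
        self._seen.add(guid)
        self.path.write_text(json.dumps(sorted(self._seen)), encoding="utf-8")
```

A real implementation would probably also prune old entries so the state file does not grow without bound.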

Configuration#

Edit config.yaml to define feed-to-community mappings:

```
coves_api_url: "https://api.coves.social"

feeds:
  - name: "World News"
    url: "https://news.kagi.com/world.xml"
    community_handle: "world-news.coves.social"
    enabled: true

  - name: "Tech News"
    url: "https://news.kagi.com/tech.xml"
    community_handle: "tech.coves.social"
    enabled: true
```
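
Once config.yaml has been parsed into a dictionary (for instance with PyYAML's `yaml.safe_load`), the feed mappings could be validated roughly as follows. The `FeedConfig` class and `enabled_feeds` function are hypothetical, and defaulting `enabled` to true when the key is absent is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FeedConfig:
    name: str
    url: str
    community_handle: str
    enabled: bool = True


def enabled_feeds(config: dict[str, Any]) -> list[FeedConfig]:
    """Validate the 'feeds' entries and return only the enabled ones."""
    feeds: list[FeedConfig] = []
    for i, raw in enumerate(config.get("feeds", [])):
        missing = [key for key in ("name", "url", "community_handle") if not raw.get(key)]
        if missing:
            raise ValueError(f"feeds[{i}] is missing: {', '.join(missing)}")
        feed = FeedConfig(
            name=raw["name"],
            url=raw["url"],
            community_handle=raw["community_handle"],
            enabled=bool(raw.get("enabled", True)),
        )
        if feed.enabled:
            feeds.append(feed)
    return feeds
```

Reporting the index of the offending entry makes a misconfigured feed easier to locate in a long file.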

Architecture#

Data Flow#

```
Kagi RSS Feed
    ↓ (HTTP GET)
RSS Fetcher
    ↓ (feedparser)
Parsed RSS Items
    ↓ (for each item)
HTML Parser
    ↓ (BeautifulSoup)
Structured KagiStory
    ↓
Rich Text Formatter
    ↓ (with facets)
Post Record
    ↓ (XRPC)
Coves Community
```

Rich Text Format#

Posts use Coves rich text with UTF-8 byte-positioned facets:

```
{
  "content": "Summary text...\n\nHighlights:\n• Point 1\n...",
  "facets": [
    {
      "index": {"byteStart": 20, "byteEnd": 31},
      "features": [{"$type": "social.coves.richtext.facet#bold"}]
    },
    {
      "index": {"byteStart": 50, "byteEnd": 75},
      "features": [{"$type": "social.coves.richtext.facet#link", "uri": "https://..."}]
    }
  ]
}
```
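
Because facet positions are UTF-8 byte offsets rather than character indices, non-ASCII characters such as the bullet (•) shift the offsets. The sketch below shows one way to compute them in Python; `bold_facet` is a hypothetical helper, and treating `byteEnd` as exclusive is an assumption carried over from the usual ATProto facet convention.

```python
def bold_facet(content: str, phrase: str) -> dict:
    """Build a bold facet covering the first occurrence of phrase in content."""
    # str.index raises ValueError if the phrase is not present.
    char_start = content.index(phrase)
    # Convert the character position to a byte position by encoding the prefix.
    byte_start = len(content[:char_start].encode("utf-8"))
    byte_end = byte_start + len(phrase.encode("utf-8"))
    return {
        "index": {"byteStart": byte_start, "byteEnd": byte_end},
        "features": [{"$type": "social.coves.richtext.facet#bold"}],
    }


facet = bold_facet("• Highlights", "Highlights")
# The bullet takes 3 bytes in UTF-8, so the phrase starts at byte 4, not character 2.
print(facet["index"])  # {'byteStart': 4, 'byteEnd': 14}
```

Using character indices directly would misplace every facet that follows a multi-byte character, which matters here because the formatted posts contain bullets.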

License#

See parent Coves project license.
