A community based topic aggregation platform built on atproto

Kagi News RSS Aggregator#

A Python-based RSS aggregator that posts Kagi News stories to Coves communities using rich text formatting.

Overview#

This aggregator:

  • Fetches RSS feeds from Kagi News daily via CRON
  • Parses HTML descriptions to extract structured content (highlights, perspectives, sources)
  • Formats posts using Coves rich text with facets (bold, italic, links)
  • Thumbnails are automatically extracted by the server's unfurl service
  • Posts to configured communities via XRPC

Project Structure#

aggregators/kagi-news/
├── src/
│   ├── models.py              # Data models (KagiStory, Perspective, etc.)
│   ├── rss_fetcher.py         # RSS feed fetching with retry logic
│   ├── html_parser.py         # Parse Kagi HTML to structured data
│   ├── richtext_formatter.py  # Format content with rich text facets (TODO)
│   ├── atproto_client.py      # ATProto authentication and operations (TODO)
│   ├── state_manager.py       # Deduplication state tracking (TODO)
│   ├── config.py              # Configuration loading (TODO)
│   └── main.py                # Entry point (TODO)
├── tests/
│   ├── test_rss_fetcher.py    # RSS fetcher tests ✓
│   ├── test_html_parser.py    # HTML parser tests ✓
│   └── fixtures/
│       ├── sample_rss_item.xml
│       └── world.xml
├── scripts/
│   └── setup.sh               # Automated Coves registration script
├── Dockerfile                 # Docker image definition
├── docker-compose.yml         # Docker Compose configuration
├── docker-entrypoint.sh       # Container entrypoint script
├── .dockerignore              # Docker build exclusions
├── requirements.txt           # Python dependencies
├── config.example.yaml        # Example configuration
├── .env.example               # Environment variables template
├── crontab                    # CRON schedule
└── README.md

Registration with Coves#

Before running the aggregator, you must register it with a Coves instance. This creates a DID for your aggregator and registers it with Coves.

Quick Setup (Automated)#

The automated setup script handles the entire registration process:

cd scripts
chmod +x setup.sh
./setup.sh

This will:

  1. Create a PDS account for your aggregator (generates a DID)
  2. Generate .well-known/atproto-did file for domain verification
  3. Pause for manual upload - you'll upload the file to your web server
  4. Register with Coves instance via XRPC
  5. Create service declaration record (indexed by Jetstream)

Manual step required: During the process, you'll need to upload the .well-known/atproto-did file to your domain so it's accessible at https://yourdomain.com/.well-known/atproto-did.

After completion, you'll have a kagi-aggregator-config.env file with:

  • Aggregator DID and credentials
  • Access/refresh JWTs
  • Service declaration URI

Keep this file secure! It contains your aggregator's credentials.

Manual Setup (Step-by-step)#

Alternatively, use the generic setup scripts from the main Coves repo for more control:

# From the Coves project root
cd scripts/aggregator-setup

# Follow the 4-step process
./1-create-pds-account.sh
./2-setup-wellknown.sh
./3-register-with-coves.sh
./4-create-service-declaration.sh

See scripts/aggregator-setup/README.md for detailed documentation on each step.

What Happens During Registration?#

  1. PDS Account Creation: Your aggregator gets a did:plc:... identifier
  2. Domain Verification: Proves you control your aggregator's domain
  3. Coves Registration: Inserts your DID into the Coves instance's users table
  4. Service Declaration: Creates a record that gets indexed into the aggregators table
  5. Ready for Authorization: Community moderators can now authorize your aggregator

Once registered and authorized by a community, your aggregator can post content.

Setup#

Prerequisites#

  • Python 3.11+
  • python3-venv package (apt install python3.12-venv)
  • Completed registration (see above)

Installation#

  1. Create virtual environment:

    python3 -m venv venv
    source venv/bin/activate
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Copy configuration templates:

    cp config.example.yaml config.yaml
    cp .env.example .env
    
  4. Edit config.yaml to map RSS feeds to communities

  5. Set environment variables in .env (aggregator DID and private key)

Running Tests#

# Activate virtual environment
source venv/bin/activate

# Run all tests
pytest -v

# Run specific test file
pytest tests/test_html_parser.py -v

# Run with coverage
pytest --cov=src --cov-report=html

Deployment#

The easiest way to deploy the Kagi aggregator is using Docker. The cron job runs inside the container automatically.

Prerequisites#

  • Docker and Docker Compose installed
  • Completed registration (you have .env with credentials)
  • config.yaml configured with your feed mappings

Quick Start#

  1. Configure your environment:

    # Copy and edit configuration
    cp config.example.yaml config.yaml
    cp .env.example .env
    
    # Edit .env with your aggregator credentials
    nano .env
    
  2. Start the aggregator:

    docker compose up -d
    
  3. View logs:

    docker compose logs -f
    
  4. Stop the aggregator:

    docker compose down
    

Configuration#

The docker-compose.yml file supports these environment variables:

  • AGGREGATOR_HANDLE (required): Your aggregator's handle
  • AGGREGATOR_PASSWORD (required): Your aggregator's password
  • COVES_API_URL (optional): Override Coves API endpoint (defaults to https://api.coves.social)
  • RUN_ON_STARTUP (optional): Set to true to run immediately on container start (useful for testing)

Testing the Setup#

Run the aggregator immediately without waiting for cron:

# Run once and exit
docker compose run --rm kagi-aggregator python -m src.main

# Or set RUN_ON_STARTUP=true in .env and restart
docker compose restart

Production Deployment#

For production, consider:

  1. Using Docker Secrets for credentials:

    secrets:
      aggregator_credentials:
        file: ./secrets/aggregator.env
    
  2. Setting up log rotation (already configured in docker-compose.yml):

    • Max size: 10MB per file
    • Max files: 3
  3. Monitoring health checks:

    docker inspect --format='{{.State.Health.Status}}' kagi-news-aggregator
    
  4. Auto-restart on failure (already enabled with restart: unless-stopped)

Viewing Cron Logs#

# Follow cron execution logs
docker compose logs -f kagi-aggregator

# View last 100 lines
docker compose logs --tail=100 kagi-aggregator

Updating the Aggregator#

# Pull latest code
git pull

# Rebuild and restart
docker compose up -d --build

Manual Deployment (Alternative)#

If you prefer running without Docker, use the traditional approach:

  1. Install dependencies:

    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    
  2. Configure crontab:

    # Edit the crontab file with your paths
    # Then install it:
    crontab crontab
    
  3. Verify cron is running:

    crontab -l
    

Development Status#

✅ Phase 1-2 Complete (Oct 24, 2025)#

  • Project structure created
  • Data models defined (KagiStory, Perspective, Quote, Source)
  • RSS fetcher with retry logic and tests
  • HTML parser extracting all sections (summary, highlights, perspectives, sources, quote, image)
  • Test fixtures from real Kagi News feed

🚧 Next Steps (Phase 3-4)#

  • Rich text formatter (convert to Coves format with facets)
  • State manager for deduplication
  • Configuration loader
  • ATProto client for post creation
  • Main orchestration script
  • End-to-end tests

Configuration#

Edit config.yaml to define feed-to-community mappings:

coves_api_url: "https://api.coves.social"

feeds:
  - name: "World News"
    url: "https://news.kagi.com/world.xml"
    community_handle: "world-news.coves.social"
    enabled: true

  - name: "Tech News"
    url: "https://news.kagi.com/tech.xml"
    community_handle: "tech.coves.social"
    enabled: true

Architecture#

Data Flow#

Kagi RSS Feed
    ↓ (HTTP GET)
RSS Fetcher
    ↓ (feedparser)
Parsed RSS Items
    ↓ (for each item)
HTML Parser
    ↓ (BeautifulSoup)
Structured KagiStory
    ↓
Rich Text Formatter
    ↓ (with facets)
Post Record
    ↓ (XRPC)
Coves Community

Rich Text Format#

Posts use Coves rich text with UTF-8 byte-positioned facets:

{
  "content": "Summary text...\n\nHighlights:\n• Point 1\n...",
  "facets": [
    {
      "index": {"byteStart": 20, "byteEnd": 31},
      "features": [{"$type": "social.coves.richtext.facet#bold"}]
    },
    {
      "index": {"byteStart": 50, "byteEnd": 75},
      "features": [{"$type": "social.coves.richtext.facet#link", "uri": "https://..."}]
    }
  ]
}

License#

See parent Coves project license.