# Kagi News RSS Aggregator

A Python-based RSS aggregator that posts Kagi News stories to Coves communities using rich text formatting. Coves is a community-based topic aggregation platform built on atproto.

## Overview

This aggregator:
- Fetches RSS feeds from Kagi News daily via CRON
- Parses HTML descriptions to extract structured content (highlights, perspectives, sources)
- Formats posts using Coves rich text with facets (bold, italic, links)
- Relies on the server's unfurl service to extract thumbnails automatically
- Posts to configured communities via XRPC

## Project Structure

```
aggregators/kagi-news/
├── src/
│   ├── models.py              # Data models (KagiStory, Perspective, etc.)
│   ├── rss_fetcher.py         # RSS feed fetching with retry logic
│   ├── html_parser.py         # Parse Kagi HTML to structured data
│   ├── richtext_formatter.py  # Format content with rich text facets (TODO)
│   ├── atproto_client.py      # ATProto authentication and operations (TODO)
│   ├── state_manager.py       # Deduplication state tracking (TODO)
│   ├── config.py              # Configuration loading (TODO)
│   └── main.py                # Entry point (TODO)
├── tests/
│   ├── test_rss_fetcher.py    # RSS fetcher tests ✓
│   ├── test_html_parser.py    # HTML parser tests ✓
│   └── fixtures/
│       ├── sample_rss_item.xml
│       └── world.xml
├── scripts/
│   └── setup.sh               # Automated Coves registration script
├── Dockerfile                 # Docker image definition
├── docker-compose.yml         # Docker Compose configuration
├── docker-entrypoint.sh       # Container entrypoint script
├── .dockerignore              # Docker build exclusions
├── requirements.txt           # Python dependencies
├── config.example.yaml        # Example configuration
├── .env.example               # Environment variables template
├── crontab                    # CRON schedule
└── README.md
```

## Registration with Coves

Before running the aggregator, you must register it with a Coves instance. Registration creates a DID for your aggregator and records it with the instance.

### Quick Setup (Automated)

The automated setup script handles the entire registration process:

```bash
cd scripts
chmod +x setup.sh
./setup.sh
```

This will:
1. **Create a PDS account** for your aggregator (generates a DID)
2. **Generate `.well-known/atproto-did`** file for domain verification
3. **Pause for manual upload** - you'll upload the file to your web server
4. **Register with Coves** instance via XRPC
5. **Create service declaration** record (indexed by Jetstream)

**Manual step required:** During the process, you'll need to upload the `.well-known/atproto-did` file to your domain so it's accessible at `https://yourdomain.com/.well-known/atproto-did`.
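If you want to confirm the upload before letting the script continue, you can fetch the file and compare its contents to the DID generated in step 1. The snippet below is a standalone check, not part of the aggregator code; the domain and DID are placeholders to replace with your own values.

```python
# Standalone sanity check (not part of the aggregator): fetch the well-known
# file and compare it to the DID created in step 1. Replace the placeholder
# domain and DID with your own values before running.
import urllib.request


def wellknown_did_matches(domain: str, expected_did: str) -> bool:
    url = f"https://{domain}/.well-known/atproto-did"
    with urllib.request.urlopen(url, timeout=10) as resp:
        served = resp.read().decode("utf-8").strip()
    return served == expected_did


if __name__ == "__main__":
    print(wellknown_did_matches("yourdomain.com", "did:plc:your-did-here"))
```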
After completion, you'll have a `kagi-aggregator-config.env` file with:
- Aggregator DID and credentials
- Access/refresh JWTs
- Service declaration URI

**Keep this file secure!** It contains your aggregator's credentials.

### Manual Setup (Step-by-step)

Alternatively, use the generic setup scripts from the main Coves repo for more control:

```bash
# From the Coves project root
cd scripts/aggregator-setup

# Follow the 4-step process
./1-create-pds-account.sh
./2-setup-wellknown.sh
./3-register-with-coves.sh
./4-create-service-declaration.sh
```

See [scripts/aggregator-setup/README.md](../../scripts/aggregator-setup/README.md) for detailed documentation on each step.

### What Happens During Registration?

1. **PDS Account Creation**: Your aggregator gets a `did:plc:...` identifier
2. **Domain Verification**: Proves you control your aggregator's domain
3. **Coves Registration**: Inserts your DID into the Coves instance's `users` table
4. **Service Declaration**: Creates a record that gets indexed into the `aggregators` table
5. **Ready for Authorization**: Community moderators can now authorize your aggregator

Once registered and authorized by a community, your aggregator can post content.

## Setup

### Prerequisites

- Python 3.11+
- python3-venv package (`apt install python3.12-venv`)
- **Completed registration** (see above)

### Installation

1. Create a virtual environment:
   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. Copy the configuration templates:
   ```bash
   cp config.example.yaml config.yaml
   cp .env.example .env
   ```

4. Edit `config.yaml` to map RSS feeds to communities
5. Set environment variables in `.env` (aggregator DID and private key)

## Running Tests

```bash
# Activate virtual environment
source venv/bin/activate

# Run all tests
pytest -v

# Run a specific test file
pytest tests/test_html_parser.py -v

# Run with coverage
pytest --cov=src --cov-report=html
```

## Deployment

### Docker Deployment (Recommended for Production)

The easiest way to deploy the Kagi aggregator is with Docker. The cron job runs inside the container automatically.

#### Prerequisites

- Docker and Docker Compose installed
- Completed registration (you have a `.env` file with credentials)
- `config.yaml` configured with your feed mappings

#### Quick Start

1. **Configure your environment:**
   ```bash
   # Copy and edit configuration
   cp config.example.yaml config.yaml
   cp .env.example .env

   # Edit .env with your aggregator credentials
   nano .env
   ```

2. **Start the aggregator:**
   ```bash
   docker compose up -d
   ```

3. **View logs:**
   ```bash
   docker compose logs -f
   ```

4. **Stop the aggregator:**
   ```bash
   docker compose down
   ```

#### Configuration

The `docker-compose.yml` file supports these environment variables:

- **`AGGREGATOR_HANDLE`** (required): Your aggregator's handle
- **`AGGREGATOR_PASSWORD`** (required): Your aggregator's password
- **`COVES_API_URL`** (optional): Override the Coves API endpoint (defaults to `https://api.coves.social`)
- **`RUN_ON_STARTUP`** (optional): Set to `true` to run immediately on container start (useful for testing)

#### Testing the Setup

Run the aggregator immediately without waiting for cron:

```bash
# Run once and exit
docker compose run --rm kagi-aggregator python -m src.main

# Or set RUN_ON_STARTUP=true in .env and restart
docker compose restart
```

#### Production Deployment

For production, consider:

1. **Using Docker Secrets** for credentials:
   ```yaml
   secrets:
     aggregator_credentials:
       file: ./secrets/aggregator.env
   ```

2. **Setting up log rotation** (already configured in docker-compose.yml):
   - Max size: 10MB per file
   - Max files: 3

3. **Monitoring health checks:**
   ```bash
   docker inspect --format='{{.State.Health.Status}}' kagi-news-aggregator
   ```

4. **Auto-restart on failure** (already enabled with `restart: unless-stopped`)

#### Viewing Cron Logs

```bash
# Follow cron execution logs
docker compose logs -f kagi-aggregator

# View the last 100 lines
docker compose logs --tail=100 kagi-aggregator
```

#### Updating the Aggregator

```bash
# Pull the latest code
git pull

# Rebuild and restart
docker compose up -d --build
```

### Manual Deployment (Alternative)

If you prefer running without Docker, use the traditional approach:

1. **Install dependencies:**
   ```bash
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```

2. **Configure crontab:**
   ```bash
   # Edit the crontab file with your paths,
   # then install it:
   crontab crontab
   ```

3. **Verify the crontab is installed:**
   ```bash
   crontab -l
   ```

## Development Status

### ✅ Phase 1-2 Complete (Oct 24, 2025)
- [x] Project structure created
- [x] Data models defined (KagiStory, Perspective, Quote, Source)
- [x] RSS fetcher with retry logic and tests
- [x] HTML parser extracting all sections (summary, highlights, perspectives, sources, quote, image)
- [x] Test fixtures from real Kagi News feed

### 🚧 Next Steps (Phase 3-4)
- [ ] Rich text formatter (convert to Coves format with facets)
- [ ] State manager for deduplication
- [ ] Configuration loader
- [ ] ATProto client for post creation
- [ ] Main orchestration script
- [ ] End-to-end tests

## Configuration

Edit `config.yaml` to define feed-to-community mappings:

```yaml
coves_api_url: "https://api.coves.social"

feeds:
  - name: "World News"
    url: "https://news.kagi.com/world.xml"
    community_handle: "world-news.coves.social"
    enabled: true

  - name: "Tech News"
    url: "https://news.kagi.com/tech.xml"
    community_handle: "tech.coves.social"
    enabled: true
```
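The configuration loader (`src/config.py`) is still marked TODO, so its interface isn't fixed yet. As a rough sketch of how this file could be consumed, the snippet below reads it with PyYAML and keeps only the enabled feeds; the `FeedConfig` and `load_config` names are placeholders rather than the project's actual API, and PyYAML itself is an assumption about the eventual dependency.

```python
# Illustrative sketch only: src/config.py is still TODO, so FeedConfig and
# load_config are placeholder names rather than the project's actual API.
# Assumes a YAML parser such as PyYAML is (or will be) a dependency.
from dataclasses import dataclass

import yaml


@dataclass
class FeedConfig:
    name: str
    url: str
    community_handle: str
    enabled: bool = True


def load_config(path: str = "config.yaml") -> tuple[str, list[FeedConfig]]:
    """Return the Coves API URL and the feeds that are enabled."""
    with open(path, "r", encoding="utf-8") as f:
        raw = yaml.safe_load(f)
    feeds = [FeedConfig(**entry) for entry in raw.get("feeds", [])]
    return raw["coves_api_url"], [feed for feed in feeds if feed.enabled]


if __name__ == "__main__":
    api_url, feeds = load_config()
    print(api_url, [feed.community_handle for feed in feeds])
```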
## Architecture

### Data Flow

```
Kagi RSS Feed
    ↓ (HTTP GET)
RSS Fetcher
    ↓ (feedparser)
Parsed RSS Items
    ↓ (for each item)
HTML Parser
    ↓ (BeautifulSoup)
Structured KagiStory
    ↓
Rich Text Formatter
    ↓ (with facets)
Post Record
    ↓ (XRPC)
Coves Community
```

### Rich Text Format

Posts use Coves rich text with UTF-8 byte-positioned facets:

```python
{
    "content": "Summary text...\n\nHighlights:\n• Point 1\n...",
    "facets": [
        {
            "index": {"byteStart": 20, "byteEnd": 31},
            "features": [{"$type": "social.coves.richtext.facet#bold"}]
        },
        {
            "index": {"byteStart": 50, "byteEnd": 75},
            "features": [{"$type": "social.coves.richtext.facet#link", "uri": "https://..."}]
        }
    ]
}
```
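The `byteStart`/`byteEnd` values index into the UTF-8 encoding of `content`, not its character positions, so any multi-byte character (such as the `•` bullets) shifts them. Since the rich text formatter is still on the roadmap, the following is only a sketch of how such offsets can be computed; the `facet_for` helper is a placeholder name, and only the facet shape and `$type` value are taken from the example above.

```python
# Sketch only: the richtext formatter (src/richtext_formatter.py) is still
# TODO, so this helper is illustrative rather than the project's actual code.
# It shows why facet offsets must be computed over UTF-8 bytes, not characters.
def facet_for(content: str, phrase: str, feature: dict) -> dict:
    """Build a facet covering the first occurrence of `phrase` in `content`."""
    char_start = content.index(phrase)
    byte_start = len(content[:char_start].encode("utf-8"))
    byte_end = byte_start + len(phrase.encode("utf-8"))
    return {
        "index": {"byteStart": byte_start, "byteEnd": byte_end},
        "features": [feature],
    }


content = "Summary text...\n\nHighlights:\n• Point 1\n..."
bold = facet_for(content, "Point 1", {"$type": "social.coves.richtext.facet#bold"})
print(bold)
# "Point 1" sits at character index 31, but byteStart is 33 because the
# bullet "•" occupies 3 bytes in UTF-8.
```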
## License

See parent Coves project license.

## Related Documentation

- [PRD: Kagi News Aggregator](../../docs/aggregators/PRD_KAGI_NEWS_RSS.md)
- [PRD: Aggregator System](../../docs/aggregators/PRD_AGGREGATORS.md)
- [Coves Rich Text Lexicon](../../internal/atproto/lexicon/social/coves/richtext/README.md)