# Kagi News RSS Aggregator

A Python-based RSS aggregator that posts Kagi News stories to Coves communities using rich text formatting.

## Overview

This aggregator:
- Fetches RSS feeds from Kagi News daily via cron
- Parses HTML descriptions to extract structured content (highlights, perspectives, sources)
- Formats posts using Coves rich text with facets (bold, italic, links)
- Relies on the server's unfurl service to extract thumbnails automatically
- Posts to configured communities via XRPC
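
As a minimal sketch of the fetch step (assuming the `feedparser` library named in the data-flow diagram below and the example feed URL from the configuration section), each RSS entry carries the HTML description that the parser later turns into a structured story:

```python
# Sketch only: fetch one Kagi feed and pull out the fields the pipeline cares about.
import feedparser

feed = feedparser.parse("https://news.kagi.com/world.xml")
for entry in feed.entries:
    title = entry.get("title", "")
    link = entry.get("link", "")
    description_html = entry.get("description", "")  # structured HTML handled by html_parser.py
    print(title, link, len(description_html))
```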

## Project Structure

```
aggregators/kagi-news/
├── src/
│   ├── models.py             # Data models (KagiStory, Perspective, etc.)
│   ├── rss_fetcher.py        # RSS feed fetching with retry logic
│   ├── html_parser.py        # Parse Kagi HTML to structured data
│   ├── richtext_formatter.py # Format content with rich text facets (TODO)
│   ├── atproto_client.py     # ATProto authentication and operations (TODO)
│   ├── state_manager.py      # Deduplication state tracking (TODO)
│   ├── config.py             # Configuration loading (TODO)
│   └── main.py               # Entry point (TODO)
├── tests/
│   ├── test_rss_fetcher.py   # RSS fetcher tests ✓
│   ├── test_html_parser.py   # HTML parser tests ✓
│   └── fixtures/
│       ├── sample_rss_item.xml
│       └── world.xml
├── scripts/
│   └── setup.sh              # Automated Coves registration script
├── Dockerfile                # Docker image definition
├── docker-compose.yml        # Docker Compose configuration
├── docker-entrypoint.sh      # Container entrypoint script
├── .dockerignore             # Docker build exclusions
├── requirements.txt          # Python dependencies
├── config.example.yaml       # Example configuration
├── .env.example              # Environment variables template
├── crontab                   # CRON schedule
└── README.md
```

## Registration with Coves

Before running the aggregator, you must register it with a Coves instance. This creates a DID for your aggregator and makes it known to the instance.

### Quick Setup (Automated)

The automated setup script handles the entire registration process:

```bash
cd scripts
chmod +x setup.sh
./setup.sh
```

This will:
1. **Create a PDS account** for your aggregator (generates a DID)
2. **Generate `.well-known/atproto-did`** file for domain verification
3. **Pause for manual upload** - you'll upload the file to your web server
4. **Register with Coves** instance via XRPC
5. **Create service declaration** record (indexed by Jetstream)

**Manual step required:** During the process, you'll need to upload the `.well-known/atproto-did` file to your domain so it's accessible at `https://yourdomain.com/.well-known/atproto-did`.
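
Once the file is in place, a quick check such as the following should return your aggregator's DID (`yourdomain.com` is a placeholder for your actual domain):

```bash
# Expect the aggregator's did:plc:... identifier in the response body
curl https://yourdomain.com/.well-known/atproto-did
```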

After completion, you'll have a `kagi-aggregator-config.env` file with:
- Aggregator DID and credentials
- Access/refresh JWTs
- Service declaration URI

**Keep this file secure!** It contains your aggregator's credentials.

### Manual Setup (Step-by-step)

Alternatively, use the generic setup scripts from the main Coves repo for more control:

```bash
# From the Coves project root
cd scripts/aggregator-setup

# Follow the 4-step process
./1-create-pds-account.sh
./2-setup-wellknown.sh
./3-register-with-coves.sh
./4-create-service-declaration.sh
```

See [scripts/aggregator-setup/README.md](../../scripts/aggregator-setup/README.md) for detailed documentation on each step.

### What Happens During Registration?

1. **PDS Account Creation**: Your aggregator gets a `did:plc:...` identifier
2. **Domain Verification**: Proves you control your aggregator's domain
3. **Coves Registration**: Inserts your DID into the Coves instance's `users` table
4. **Service Declaration**: Creates a record that gets indexed into the `aggregators` table
5. **Ready for Authorization**: Community moderators can now authorize your aggregator

Once registered and authorized by a community, your aggregator can post content.

## Setup

### Prerequisites

- Python 3.11+
- python3-venv package (`apt install python3.12-venv`)
- **Completed registration** (see above)

### Installation

1. Create virtual environment:
   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. Copy configuration templates:
   ```bash
   cp config.example.yaml config.yaml
   cp .env.example .env
   ```

4. Edit `config.yaml` to map RSS feeds to communities
5. Set environment variables in `.env` (aggregator DID and private key)

## Running Tests

```bash
# Activate virtual environment
source venv/bin/activate

# Run all tests
pytest -v

# Run specific test file
pytest tests/test_html_parser.py -v

# Run with coverage
pytest --cov=src --cov-report=html
```

## Deployment

### Docker Deployment (Recommended for Production)

The easiest way to deploy the Kagi aggregator is using Docker. The cron job runs inside the container automatically.

#### Prerequisites

- Docker and Docker Compose installed
- Completed registration (you have `.env` with credentials)
- `config.yaml` configured with your feed mappings

#### Quick Start

1. **Configure your environment:**
   ```bash
   # Copy and edit configuration
   cp config.example.yaml config.yaml
   cp .env.example .env

   # Edit .env with your aggregator credentials
   nano .env
   ```

2. **Start the aggregator:**
   ```bash
   docker compose up -d
   ```

3. **View logs:**
   ```bash
   docker compose logs -f
   ```

4. **Stop the aggregator:**
   ```bash
   docker compose down
   ```

#### Configuration

The `docker-compose.yml` file supports these environment variables:

- **`AGGREGATOR_HANDLE`** (required): Your aggregator's handle
- **`AGGREGATOR_PASSWORD`** (required): Your aggregator's password
- **`COVES_API_URL`** (optional): Override Coves API endpoint (defaults to `https://api.coves.social`)
- **`RUN_ON_STARTUP`** (optional): Set to `true` to run immediately on container start (useful for testing)
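
For reference, a `.env` along these lines satisfies the variables above; the values are placeholders, and `.env.example` remains the canonical template:

```bash
# Placeholder values -- copy .env.example and substitute your own credentials
AGGREGATOR_HANDLE=kagi-news.example.com
AGGREGATOR_PASSWORD=change-me
# Optional: override the Coves API endpoint
COVES_API_URL=https://api.coves.social
# Optional: set to true to run immediately on container start
RUN_ON_STARTUP=false
```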

#### Testing the Setup

Run the aggregator immediately without waiting for cron:

```bash
# Run once and exit
docker compose run --rm kagi-aggregator python -m src.main

# Or set RUN_ON_STARTUP=true in .env and restart
docker compose restart
```

#### Production Deployment

For production, consider:

1. **Using Docker Secrets** for credentials:
   ```yaml
   secrets:
     aggregator_credentials:
       file: ./secrets/aggregator.env
   ```

2. **Setting up log rotation** (already configured in docker-compose.yml):
   - Max size: 10MB per file
   - Max files: 3

3. **Monitoring health checks:**
   ```bash
   docker inspect --format='{{.State.Health.Status}}' kagi-news-aggregator
   ```

4. **Auto-restart on failure** (already enabled with `restart: unless-stopped`)

#### Viewing Cron Logs

```bash
# Follow cron execution logs
docker compose logs -f kagi-aggregator

# View last 100 lines
docker compose logs --tail=100 kagi-aggregator
```

#### Updating the Aggregator

```bash
# Pull latest code
git pull

# Rebuild and restart
docker compose up -d --build
```

### Manual Deployment (Alternative)

If you prefer running without Docker, use the traditional approach:

1. **Install dependencies:**
   ```bash
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```

2. **Configure crontab** (see the example entry after this list):
   ```bash
   # Edit the crontab file with your paths
   # Then install it:
   crontab crontab
   ```

3. **Verify cron is running:**
   ```bash
   crontab -l
   ```
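
For illustration, a daily entry in the `crontab` file might look like the following; the schedule, paths, and log location are placeholders to adapt to your installation:

```bash
# Hypothetical schedule: run the aggregator every day at 06:00 and append output to a log
0 6 * * * cd /path/to/aggregators/kagi-news && ./venv/bin/python -m src.main >> cron.log 2>&1
```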

## Development Status

### ✅ Phase 1-2 Complete (Oct 24, 2025)
- [x] Project structure created
- [x] Data models defined (KagiStory, Perspective, Quote, Source)
- [x] RSS fetcher with retry logic and tests
- [x] HTML parser extracting all sections (summary, highlights, perspectives, sources, quote, image)
- [x] Test fixtures from real Kagi News feed

### 🚧 Next Steps (Phase 3-4)
- [ ] Rich text formatter (convert to Coves format with facets)
- [ ] State manager for deduplication
- [ ] Configuration loader
- [ ] ATProto client for post creation
- [ ] Main orchestration script
- [ ] End-to-end tests

## Configuration

Edit `config.yaml` to define feed-to-community mappings:

```yaml
coves_api_url: "https://api.coves.social"

feeds:
  - name: "World News"
    url: "https://news.kagi.com/world.xml"
    community_handle: "world-news.coves.social"
    enabled: true

  - name: "Tech News"
    url: "https://news.kagi.com/tech.xml"
    community_handle: "tech.coves.social"
    enabled: true
```
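
Since `config.py` is still TODO, the loader below is only a sketch of how this file might be read, assuming PyYAML and a `FeedConfig` dataclass mirroring the fields above; the real interface may differ:

```python
# Illustrative config loader; the real config.py may expose a different interface.
from dataclasses import dataclass

import yaml  # PyYAML, assumed to be in requirements.txt


@dataclass
class FeedConfig:
    name: str
    url: str
    community_handle: str
    enabled: bool = True


def load_config(path: str = "config.yaml") -> tuple[str, list[FeedConfig]]:
    """Return the Coves API URL and the enabled feed mappings."""
    with open(path) as f:
        raw = yaml.safe_load(f)
    feeds = [FeedConfig(**item) for item in raw.get("feeds", [])]
    return raw["coves_api_url"], [feed for feed in feeds if feed.enabled]
```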

## Architecture

### Data Flow

```
Kagi RSS Feed
      ↓ (HTTP GET)
RSS Fetcher
      ↓ (feedparser)
Parsed RSS Items
      ↓ (for each item)
HTML Parser
      ↓ (BeautifulSoup)
Structured KagiStory
      ↓
Rich Text Formatter
      ↓ (with facets)
Post Record
      ↓ (XRPC)
Coves Community
```
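
The formatter, client, and `main.py` are still TODO, so the orchestration below is only a sketch of the same flow with the stage interfaces passed in as callables; the actual module APIs may look different:

```python
# Sketch of the pipeline wiring; the real rss_fetcher/html_parser/
# richtext_formatter/atproto_client interfaces may differ.
from typing import Any, Callable, Iterable


def run_pipeline(
    fetch: Callable[[str], Iterable[Any]],   # RSS Fetcher: feed URL -> parsed RSS items
    parse: Callable[[Any], Any],             # HTML Parser: RSS item -> structured KagiStory
    format_post: Callable[[Any], dict],      # Rich Text Formatter: story -> post record with facets
    publish: Callable[[str, dict], None],    # ATProto client: XRPC post to a community
    feed_url: str,
    community_handle: str,
) -> None:
    for item in fetch(feed_url):
        story = parse(item)
        record = format_post(story)
        publish(community_handle, record)
```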

### Rich Text Format

Posts use Coves rich text with UTF-8 byte-positioned facets:

```python
{
    "content": "Summary text...\n\nHighlights:\n• Point 1\n...",
    "facets": [
        {
            "index": {"byteStart": 20, "byteEnd": 31},
            "features": [{"$type": "social.coves.richtext.facet#bold"}]
        },
        {
            "index": {"byteStart": 50, "byteEnd": 75},
            "features": [{"$type": "social.coves.richtext.facet#link", "uri": "https://..."}]
        }
    ]
}
```
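
Because facet indices are byte offsets into the UTF-8 encoding of `content` (not character offsets), a formatter has to encode before measuring. A small sketch, using the `#bold` facet `$type` from the example above; the helper name is illustrative:

```python
# Sketch: build a bold facet whose byteStart/byteEnd are UTF-8 byte offsets.
def bold_facet(content: str, target: str) -> dict:
    char_start = content.index(target)                      # character position of the span
    byte_start = len(content[:char_start].encode("utf-8"))  # bytes before the span
    byte_end = byte_start + len(target.encode("utf-8"))     # bytes through the end of the span
    return {
        "index": {"byteStart": byte_start, "byteEnd": byte_end},
        "features": [{"$type": "social.coves.richtext.facet#bold"}],
    }


content = "Highlights:\n• Point 1"
facet = bold_facet(content, "Highlights:")  # byteStart=0, byteEnd=11
```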

## License

See parent Coves project license.

## Related Documentation

- [PRD: Kagi News Aggregator](../../docs/aggregators/PRD_KAGI_NEWS_RSS.md)
- [PRD: Aggregator System](../../docs/aggregators/PRD_AGGREGATORS.md)
- [Coves Rich Text Lexicon](../../internal/atproto/lexicon/social/coves/richtext/README.md)