Kagi News RSS Aggregator PRD#
Status: Planning Phase
Owner: Platform Team
Last Updated: 2025-10-20
Parent PRD: PRD_AGGREGATORS.md
Overview#
The Kagi News RSS Aggregator is a reference implementation of the Coves aggregator system that automatically posts high-quality, multi-source news summaries to communities. It leverages Kagi News's free RSS feeds to provide pre-aggregated, deduped news content with multiple perspectives and source citations.
Key Value Propositions:
- Multi-source aggregation: Kagi News already aggregates multiple sources per story
- Balanced perspectives: Built-in perspective tracking from different outlets
- Rich metadata: Categories, highlights, source links included
- Legal & free: CC BY-NC licensed for non-commercial use
- Low complexity: No LLM deduplication needed (Kagi does it)
Data Source: Kagi News RSS Feeds#
Licensing & Legal#
License: CC BY-NC (Creative Commons Attribution-NonCommercial)
Terms:
- ✅ Free for non-commercial use (Coves qualifies)
- ✅ Attribution required (must credit Kagi News)
- ❌ Cannot use commercially (must contact support@kagi.com for commercial license)
- ✅ Data can be shared (with same attribution + NC requirements)
Source: https://news.kagi.com/about
Quote from Kagi:
Note that kite.json and files referenced by it are licensed under CC BY-NC license. This means that this data can be used free of charge (with attribution and for non-commercial use). If you would like to license this data for commercial use let us know through support@kagi.com.
Compliance Requirements:
- Visible attribution to Kagi News on every post
- Link back to original Kagi story page
- Non-commercial operation (met: Coves is non-commercial)
RSS Feed Structure#
Base URL Pattern: https://news.kagi.com/{category}.xml
Known Categories:
- world.xml - World news
- tech.xml - Technology (likely)
- business.xml - Business (likely)
- sports.xml - Sports (likely)
- Additional categories TBD (need to scrape homepage)
Feed Format: RSS 2.0 (standard XML)
Update Frequency: One daily update (~noon UTC)
RSS Item Schema#
Each <item> in the feed contains:
<item>
<title>Story headline</title>
<link>https://kite.kagi.com/{uuid}/{category}/{id}</link>
<description>Full HTML content (see below)</description>
<guid isPermaLink="true">https://kite.kagi.com/{uuid}/{category}/{id}</guid>
<category>Primary category (e.g., "World")</category>
<category>Subcategory (e.g., "World/Conflict & Security")</category>
<category>Tag (e.g., "Conflict & Security")</category>
<pubDate>Mon, 20 Oct 2025 01:46:31 +0000</pubDate>
</item>
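As a rough illustration of the item schema above, the sketch below decodes a feed using only the Go standard library. The type and function names (`rssItem`, `rssFeed`, `decodeFeed`, `parsePubDate`) and the sample feed, including its placeholder URL, are assumptions made for this example. The implementation described later in this document uses gofeed instead.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"time"
)

// rssItem mirrors the <item> fields listed above (hypothetical type name).
type rssItem struct {
	Title       string   `xml:"title"`
	Link        string   `xml:"link"`
	Description string   `xml:"description"`
	GUID        string   `xml:"guid"`
	Categories  []string `xml:"category"`
	PubDate     string   `xml:"pubDate"`
}

// rssFeed keeps only the channel's items.
type rssFeed struct {
	Items []rssItem `xml:"channel>item"`
}

// decodeFeed parses an RSS 2.0 document into its items.
func decodeFeed(data []byte) ([]rssItem, error) {
	var feed rssFeed
	if err := xml.Unmarshal(data, &feed); err != nil {
		return nil, fmt.Errorf("decode feed: %w", err)
	}
	return feed.Items, nil
}

// parsePubDate handles the date layout shown in the schema,
// e.g. "Mon, 20 Oct 2025 01:46:31 +0000".
func parsePubDate(s string) (time.Time, error) {
	return time.Parse(time.RFC1123Z, s)
}

// sampleFeed is a made-up feed with a placeholder story URL.
const sampleFeed = `<rss version="2.0"><channel>
<item>
<title>Story headline</title>
<link>https://kite.kagi.com/uuid/world/1</link>
<description>&lt;p&gt;Summary&lt;/p&gt;</description>
<guid isPermaLink="true">https://kite.kagi.com/uuid/world/1</guid>
<category>World</category>
<category>World/Conflict &amp; Security</category>
<category>Conflict &amp; Security</category>
<pubDate>Mon, 20 Oct 2025 01:46:31 +0000</pubDate>
</item>
</channel></rss>`

func main() {
	items, err := decodeFeed([]byte(sampleFeed))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, it := range items {
		published, err := parsePubDate(it.PubDate)
		if err != nil {
			fmt.Println("bad pubDate:", err)
			continue
		}
		fmt.Println(it.Title, it.Categories, published.UTC())
	}
}
```

Note that the three `<category>` elements arrive as a flat list, so any distinction between primary category, subcategory, and tag has to be inferred from position or from the `/` separator.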
Description HTML Structure:
<p>Main summary paragraph with inline source citations [source1.com#1][source2.com#1]</p>
<img src='https://kagiproxy.com/img/...' alt='Image caption' />
<h3>Highlights:</h3>
<ul>
<li>Key point 1 with [source.com#1] citations</li>
<li>Key point 2...</li>
</ul>
<blockquote>Notable quote - Person Name</blockquote>
<h3>Perspectives:</h3>
<ul>
<li>Viewpoint holder: Their perspective. (<a href='...'>Source</a>)</li>
</ul>
<h3>Sources:</h3>
<ul>
<li><a href='https://...'>Article title</a> - domain.com</li>
</ul>
Key Features:
- Multiple source citations inline
- Balanced perspectives from different actors
- Highlights extract key points
- Direct quotes preserved
- All sources linked with attribution
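The inline citation markers in the summary and highlights could be pulled out with a regular expression. This is a minimal sketch that assumes the markers always look like `[domain#n]`, as in the sample above. The real feed may use other variants, and the names `Citation`, `extractCitations`, and `stripCitations` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Citation is one inline marker such as [source1.com#1] (hypothetical type).
type Citation struct {
	Domain string
	Index  int
}

// citationRe matches "[domain#number]"; the exact marker grammar is an assumption.
var citationRe = regexp.MustCompile(`\[([A-Za-z0-9.-]+)#(\d+)\]`)

// extractCitations returns every citation marker found in text, in order.
func extractCitations(text string) []Citation {
	var out []Citation
	for _, m := range citationRe.FindAllStringSubmatch(text, -1) {
		n, err := strconv.Atoi(m[2])
		if err != nil {
			continue // number too large to be a real citation index
		}
		out = append(out, Citation{Domain: m[1], Index: n})
	}
	return out
}

// stripCitations removes the markers, leaving only the readable prose.
func stripCitations(text string) string {
	return strings.TrimSpace(citationRe.ReplaceAllString(text, ""))
}

func main() {
	summary := "Main summary paragraph [source1.com#1][source2.com#1]"
	fmt.Println(extractCitations(summary))
	fmt.Println(stripCitations(summary))
}
```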
Architecture#
High-Level Flow#
┌─────────────────────────────────────────────────────────────┐
│ Kagi News RSS Feeds (External) │
│ - https://news.kagi.com/world.xml │
│ - https://news.kagi.com/tech.xml │
│ - etc. │
└─────────────────────────────────────────────────────────────┘
│
│ HTTP GET one job after update
▼
┌─────────────────────────────────────────────────────────────┐
│ Kagi News Aggregator Service │
│ DID: did:web:kagi-news.coves.social │
│ │
│ Components: │
│ 1. Feed Poller: Fetches RSS feeds on schedule │
│ 2. Item Parser: Extracts structured data from HTML │
│ 3. Deduplication: Tracks posted GUIDs (no LLM needed) │
│ 4. Category Mapper: Maps Kagi categories to communities │
│ 5. Post Formatter: Converts to Coves post format │
│ 6. Post Publisher: Calls social.coves.post.create │
└─────────────────────────────────────────────────────────────┘
│
│ Authenticated XRPC calls
▼
┌─────────────────────────────────────────────────────────────┐
│ Coves AppView (social.coves.post.create) │
│ - Validates aggregator authorization │
│ - Creates post with author = did:web:kagi-news.coves.social│
│ - Indexes to community feeds │
└─────────────────────────────────────────────────────────────┘
Aggregator Service Declaration#
{
"$type": "social.coves.aggregator.service",
"did": "did:web:kagi-news.coves.social",
"displayName": "Kagi News Aggregator",
"description": "Automatically posts breaking news from Kagi News RSS feeds. Kagi News aggregates multiple sources per story with balanced perspectives and comprehensive source citations.",
"aggregatorType": "social.coves.aggregator.types#rss",
"avatar": "<blob reference to Kagi logo>",
"configSchema": {
"type": "object",
"properties": {
"categories": {
"type": "array",
"items": {
"type": "string",
"enum": ["world", "tech", "business", "sports", "science"]
},
"description": "Kagi News categories to monitor",
"minItems": 1
},
"subcategoryFilter": {
"type": "array",
"items": { "type": "string" },
"description": "Optional: only post stories with these subcategories (e.g., 'World/Middle East', 'Tech/AI')"
},
"minSources": {
"type": "integer",
"minimum": 1,
"default": 2,
"description": "Minimum number of sources required for a story to be posted"
},
"includeImages": {
"type": "boolean",
"default": true,
"description": "Include images from Kagi proxy in posts"
},
"postFormat": {
"type": "string",
"enum": ["full", "summary", "minimal"],
"default": "full",
"description": "How much content to include: full (all sections), summary (main paragraph + sources), minimal (title + link only)"
}
},
"required": ["categories"]
},
"sourceUrl": "https://github.com/coves-social/kagi-news-aggregator",
"maintainer": "did:plc:coves-platform",
"createdAt": "2025-10-20T12:00:00Z"
}
Community Configuration Examples#
Example 1: World News Community#
{
"aggregatorDid": "did:web:kagi-news.coves.social",
"enabled": true,
"config": {
"categories": ["world"],
"minSources": 3,
"includeImages": true,
"postFormat": "full"
}
}
Result: Posts all world news stories with 3+ sources, full content including images/highlights/perspectives.
Example 2: AI/Tech Community (Filtered)#
{
"aggregatorDid": "did:web:kagi-news.coves.social",
"enabled": true,
"config": {
"categories": ["tech", "business"],
"subcategoryFilter": ["Tech/AI", "Tech/Machine Learning", "Business/Tech Industry"],
"minSources": 2,
"includeImages": true,
"postFormat": "full"
}
}
Result: Only posts tech stories about AI/ML or tech industry business news with 2+ sources.
Example 3: Breaking News (Minimal)#
{
"aggregatorDid": "did:web:kagi-news.coves.social",
"enabled": true,
"config": {
"categories": ["world", "business", "tech"],
"minSources": 5,
"includeImages": false,
"postFormat": "minimal"
}
}
Result: Only major stories (5+ sources), minimal format (headline + link), no images.
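One way to consume these community configs is to decode them into a typed struct that applies the schema defaults (`minSources` 2, `includeImages` true, `postFormat` "full") and rejects invalid values. The sketch below assumes the config arrives as raw JSON. `CommunityConfig` and `parseConfig` are hypothetical names, not part of any existing SDK.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// CommunityConfig is a typed view of the "config" object (hypothetical type).
type CommunityConfig struct {
	Categories        []string `json:"categories"`
	SubcategoryFilter []string `json:"subcategoryFilter"`
	MinSources        int      `json:"minSources"`
	IncludeImages     bool     `json:"includeImages"`
	PostFormat        string   `json:"postFormat"`
}

// parseConfig decodes raw JSON, applying the defaults from configSchema.
func parseConfig(raw []byte) (CommunityConfig, error) {
	// Pre-fill defaults: json.Unmarshal only overwrites fields present in raw.
	cfg := CommunityConfig{MinSources: 2, IncludeImages: true, PostFormat: "full"}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return CommunityConfig{}, fmt.Errorf("decode config: %w", err)
	}
	if len(cfg.Categories) == 0 {
		return CommunityConfig{}, errors.New("categories: at least one is required")
	}
	if cfg.MinSources < 1 {
		return CommunityConfig{}, errors.New("minSources: must be >= 1")
	}
	switch cfg.PostFormat {
	case "full", "summary", "minimal":
	default:
		return CommunityConfig{}, fmt.Errorf("postFormat: unknown value %q", cfg.PostFormat)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig([]byte(`{"categories": ["world"], "minSources": 3}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Decoding once into a struct also avoids repeated type assertions on a generic map when the config is read later.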
Post Format Specification#
Post Record Structure#
{
"$type": "social.coves.post.record",
"author": "did:web:kagi-news.coves.social",
"community": "did:plc:worldnews123",
"title": "{Kagi story title}",
"content": "{formatted content based on postFormat config}",
"embed": {
"$type": "app.bsky.embed.external",
"external": {
"uri": "https://kite.kagi.com/{uuid}/{category}/{id}",
"title": "{story title}",
"description": "{summary excerpt}",
"thumb": "{image blob if includeImages=true}"
}
},
"federatedFrom": {
"platform": "kagi-news-rss",
"uri": "https://kite.kagi.com/{uuid}/{category}/{id}",
"id": "{guid}",
"originalCreatedAt": "{pubDate from RSS}"
},
"contentLabels": [
"{primary category}",
"{subcategories}"
],
"createdAt": "{current timestamp}"
}
Content Formatting by postFormat#
Format: full (Default)#
{Main summary paragraph with source citations}
**Highlights:**
• {Bullet point 1}
• {Bullet point 2}
• ...
**Perspectives:**
• **{Actor}**: {Their perspective} ([Source]({url}))
• ...
> {Notable quote} — {Attribution}
**Sources:**
• [{Title}]({url}) - {domain}
• ...
---
📰 Story aggregated by [Kagi News]({kagi_story_url})
Rationale: Preserves Kagi's rich multi-source analysis, provides maximum value.
Format: summary#
{Main summary paragraph with source citations}
**Sources:**
• [{Title}]({url}) - {domain}
• ...
---
📰 Story aggregated by [Kagi News]({kagi_story_url})
Rationale: Clean summary with source links, less overwhelming.
Format: minimal#
{Story title}
Read more: {kagi_story_url}
**Sources:** {domain1}, {domain2}, {domain3}...
---
📰 Via [Kagi News]({kagi_story_url})
Rationale: Just headlines with link, for high-volume communities or breaking news alerts.
Implementation Details#
Component 1: Feed Poller#
Responsibility: Fetch RSS feeds on schedule
type FeedPoller struct {
categories []string
pollInterval time.Duration
httpClient *http.Client
}
func (p *FeedPoller) Start(ctx context.Context) error {
ticker := time.NewTicker(p.pollInterval) // 15 minutes
defer ticker.Stop()
for {
select {
case <-ticker.C:
for _, category := range p.categories {
feedURL := fmt.Sprintf("https://news.kagi.com/%s.xml", category)
feed, err := p.fetchFeed(feedURL)
if err != nil {
log.Printf("Failed to fetch %s: %v", feedURL, err)
continue
}
p.handleFeed(ctx, category, feed)
}
case <-ctx.Done():
return nil
}
}
}
func (p *FeedPoller) fetchFeed(url string) (*gofeed.Feed, error) {
parser := gofeed.NewParser()
// Reuse the poller's HTTP client so its timeout and transport settings apply
parser.Client = p.httpClient
return parser.ParseURL(url)
}
Libraries:
- github.com/mmcdole/gofeed - RSS/Atom parser
Component 2: Item Parser#
Responsibility: Extract structured data from RSS item HTML
type KagiStory struct {
Title string
Link string
GUID string
PubDate time.Time
Categories []string
// Parsed from HTML description
Summary string
Highlights []string
Perspectives []Perspective
Quote *Quote
Sources []Source
ImageURL string
ImageAlt string
}
type Perspective struct {
Actor string
Description string
SourceURL string
}
type Quote struct {
Text string
Attribution string
}
type Source struct {
Title string
URL string
Domain string
}
func (p *ItemParser) Parse(item *gofeed.Item) (*KagiStory, error) {
doc, err := goquery.NewDocumentFromReader(strings.NewReader(item.Description))
if err != nil {
return nil, err
}
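story := &KagiStory{
Title: item.Title,
Link: item.Link,
GUID: item.GUID,
Categories: item.Categories,
}
// PublishedParsed is nil when gofeed cannot parse pubDate; guard against a nil dereference
if item.PublishedParsed != nil {
story.PubDate = *item.PublishedParsed
}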
// Extract summary (first <p> tag)
story.Summary = doc.Find("p").First().Text()
// Extract highlights
doc.Find("h3:contains('Highlights')").Next("ul").Find("li").Each(func(i int, s *goquery.Selection) {
story.Highlights = append(story.Highlights, s.Text())
})
// Extract perspectives
doc.Find("h3:contains('Perspectives')").Next("ul").Find("li").Each(func(i int, s *goquery.Selection) {
text := s.Text()
link := s.Find("a").First()
sourceURL, _ := link.Attr("href")
// Parse format: "Actor: Description (Source)"
parts := strings.SplitN(text, ":", 2)
if len(parts) == 2 {
story.Perspectives = append(story.Perspectives, Perspective{
Actor: strings.TrimSpace(parts[0]),
Description: strings.TrimSpace(parts[1]),
SourceURL: sourceURL,
})
}
})
// Extract quote (split on the last " - " so dashes inside the quote text are preserved)
doc.Find("blockquote").Each(func(i int, s *goquery.Selection) {
text := s.Text()
if idx := strings.LastIndex(text, " - "); idx >= 0 {
story.Quote = &Quote{
Text: strings.TrimSpace(text[:idx]),
Attribution: strings.TrimSpace(text[idx+3:]),
}
}
})
// Extract sources
doc.Find("h3:contains('Sources')").Next("ul").Find("li").Each(func(i int, s *goquery.Selection) {
link := s.Find("a").First()
url, _ := link.Attr("href")
title := link.Text()
domain := extractDomain(s.Text())
story.Sources = append(story.Sources, Source{
Title: title,
URL: url,
Domain: domain,
})
})
// Extract image
img := doc.Find("img").First()
if img.Length() > 0 {
story.ImageURL, _ = img.Attr("src")
story.ImageAlt, _ = img.Attr("alt")
}
return story, nil
}
Libraries:
- github.com/PuerkitoBio/goquery - HTML parsing
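The parser above calls an `extractDomain` helper that is not defined in this document. The sketch below shows one possible shape for it, assuming source items always read "Article title - domain.com", along with a quote splitter that tolerates dashes inside the quote. `splitQuote` and `domainFromURL` are hypothetical helpers added for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitQuote separates "Notable quote - Person Name" at the LAST " - ",
// so dashes inside the quote itself do not break the split.
func splitQuote(text string) (quote, attribution string, ok bool) {
	i := strings.LastIndex(text, " - ")
	if i < 0 {
		return "", "", false
	}
	quote = strings.TrimSpace(text[:i])
	attribution = strings.TrimSpace(text[i+len(" - "):])
	if quote == "" || attribution == "" {
		return "", "", false
	}
	return quote, attribution, true
}

// extractDomain reads the trailing domain from "Article title - domain.com".
func extractDomain(itemText string) string {
	i := strings.LastIndex(itemText, " - ")
	if i < 0 {
		return ""
	}
	return strings.TrimSpace(itemText[i+len(" - "):])
}

// domainFromURL is a fallback that derives the domain from the link itself.
func domainFromURL(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	return strings.TrimPrefix(u.Hostname(), "www.")
}

func main() {
	q, who, ok := splitQuote("A long-term plan - not a quick fix - Person Name")
	fmt.Println(q, "|", who, "|", ok)
	fmt.Println(extractDomain("Article title - domain.com"))
	fmt.Println(domainFromURL("https://www.example.com/a/b"))
}
```

If the list item text ever lacks the " - domain" suffix, falling back to `domainFromURL` on the link's `href` keeps the `Domain` field populated.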
Component 3: Deduplication#
Responsibility: Track posted stories to prevent duplicates
type Deduplicator struct {
db *sql.DB
}
func (d *Deduplicator) AlreadyPosted(guid string) (bool, error) {
var exists bool
err := d.db.QueryRow(`
SELECT EXISTS(
SELECT 1 FROM kagi_news_posted_stories
WHERE guid = $1
)
`, guid).Scan(&exists)
return exists, err
}
func (d *Deduplicator) MarkPosted(guid, postURI string) error {
_, err := d.db.Exec(`
INSERT INTO kagi_news_posted_stories (guid, post_uri, posted_at)
VALUES ($1, $2, NOW())
ON CONFLICT (guid) DO NOTHING
`, guid, postURI)
return err
}
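Database Table (PostgreSQL dialect; if the SQLite file described under Deployment is used instead, NOW() would need to become CURRENT_TIMESTAMP):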
CREATE TABLE kagi_news_posted_stories (
guid TEXT PRIMARY KEY,
post_uri TEXT NOT NULL,
posted_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_kagi_posted_at ON kagi_news_posted_stories(posted_at DESC);
Cleanup: A periodic job deletes rows older than 30 days (Kagi is unlikely to re-post old stories).
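The table above is keyed on `guid` alone, so a story recorded for one community counts as posted for all of them. If per-community tracking turns out to be needed (for example, to retry a single community after a failed post), the key could include the community DID. The sketch below is an in-memory stand-in for that idea, not the SQL implementation. The composite key and all of its names (`postKey`, `memoryDedup`) are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// postKey identifies one story in one community (hypothetical composite key).
type postKey struct {
	GUID         string
	CommunityDID string
}

// memoryDedup is an in-memory stand-in for the posted-stories table.
type memoryDedup struct {
	mu     sync.Mutex
	posted map[postKey]string // value is the created post URI
}

func newMemoryDedup() *memoryDedup {
	return &memoryDedup{posted: make(map[postKey]string)}
}

// AlreadyPosted reports whether this story was recorded for this community.
func (d *memoryDedup) AlreadyPosted(guid, communityDID string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	_, ok := d.posted[postKey{GUID: guid, CommunityDID: communityDID}]
	return ok
}

// MarkPosted records the post and reports whether the key was new,
// mirroring INSERT ... ON CONFLICT DO NOTHING.
func (d *memoryDedup) MarkPosted(guid, communityDID, postURI string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	key := postKey{GUID: guid, CommunityDID: communityDID}
	if _, exists := d.posted[key]; exists {
		return false
	}
	d.posted[key] = postURI
	return true
}

func main() {
	d := newMemoryDedup()
	d.MarkPosted("guid-1", "did:plc:worldnews123", "at://example/post/1")
	fmt.Println(d.AlreadyPosted("guid-1", "did:plc:worldnews123"))
	fmt.Println(d.AlreadyPosted("guid-1", "did:plc:othercommunity"))
}
```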
Component 4: Category Mapper#
Responsibility: Map Kagi categories to authorized communities
func (m *CategoryMapper) GetTargetCommunities(story *KagiStory) ([]*CommunityAuth, error) {
// Get all communities that have authorized this aggregator
allAuths, err := m.aggregator.GetAuthorizedCommunities(context.Background())
if err != nil {
return nil, err
}
var targets []*CommunityAuth
for _, auth := range allAuths {
if !auth.Enabled {
continue
}
config := auth.Config
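// Check if story's primary category is in config.categories
// (skip stories without categories rather than panic on Categories[0])
if len(story.Categories) == 0 {
continue
}
primaryCategory := story.Categories[0]
if !contains(config["categories"], primaryCategory) {
continue
}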
// Check subcategory filter (if specified)
if subcatFilter, ok := config["subcategoryFilter"].([]string); ok && len(subcatFilter) > 0 {
if !hasAnySubcategory(story.Categories, subcatFilter) {
continue
}
}
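// Check minimum sources requirement (schema default: 2).
// A bare .(int) assertion panics when the key is missing or when the value
// was decoded from JSON (numbers decode as float64), so handle both types.
minSources := 2
switch v := config["minSources"].(type) {
case int:
minSources = v
case float64:
minSources = int(v)
}
if len(story.Sources) < minSources {
continue
}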
targets = append(targets, auth)
}
return targets, nil
}
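The filtering rules in the mapper can be isolated into a pure function, which makes them easy to unit-test. In the sketch below, matching is case-insensitive because the feed sample uses "World" while the config schema uses "world". That choice, and the names `filterConfig`, `storyMatches`, and `containsFold`, are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// filterConfig holds the config fields that affect routing (hypothetical type).
type filterConfig struct {
	Categories        []string
	SubcategoryFilter []string
	MinSources        int
}

// containsFold reports whether list contains s, ignoring case.
func containsFold(list []string, s string) bool {
	for _, item := range list {
		if strings.EqualFold(item, s) {
			return true
		}
	}
	return false
}

// storyMatches applies the three checks from the mapper: primary category,
// optional subcategory filter, and minimum source count.
func storyMatches(storyCategories []string, sourceCount int, cfg filterConfig) bool {
	if len(storyCategories) == 0 {
		return false // nothing to match on
	}
	if !containsFold(cfg.Categories, storyCategories[0]) {
		return false
	}
	if len(cfg.SubcategoryFilter) > 0 {
		matched := false
		for _, c := range storyCategories {
			if containsFold(cfg.SubcategoryFilter, c) {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return sourceCount >= cfg.MinSources
}

func main() {
	cfg := filterConfig{Categories: []string{"world"}, MinSources: 3}
	story := []string{"World", "World/Conflict & Security", "Conflict & Security"}
	fmt.Println(storyMatches(story, 4, cfg))
	fmt.Println(storyMatches(story, 2, cfg))
}
```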
Component 5: Post Formatter#
Responsibility: Convert Kagi story to Coves post format
func (f *PostFormatter) Format(story *KagiStory, format string) string {
switch format {
case "full":
return f.formatFull(story)
case "summary":
return f.formatSummary(story)
case "minimal":
return f.formatMinimal(story)
default:
return f.formatFull(story)
}
}
func (f *PostFormatter) formatFull(story *KagiStory) string {
var buf strings.Builder
// Summary
buf.WriteString(story.Summary)
buf.WriteString("\n\n")
// Highlights
if len(story.Highlights) > 0 {
buf.WriteString("**Highlights:**\n")
for _, h := range story.Highlights {
buf.WriteString(fmt.Sprintf("• %s\n", h))
}
buf.WriteString("\n")
}
// Perspectives
if len(story.Perspectives) > 0 {
buf.WriteString("**Perspectives:**\n")
for _, p := range story.Perspectives {
buf.WriteString(fmt.Sprintf("• **%s**: %s ([Source](%s))\n", p.Actor, p.Description, p.SourceURL))
}
buf.WriteString("\n")
}
// Quote
if story.Quote != nil {
buf.WriteString(fmt.Sprintf("> %s — %s\n\n", story.Quote.Text, story.Quote.Attribution))
}
// Sources
buf.WriteString("**Sources:**\n")
for _, s := range story.Sources {
buf.WriteString(fmt.Sprintf("• [%s](%s) - %s\n", s.Title, s.URL, s.Domain))
}
buf.WriteString("\n")
// Attribution
buf.WriteString(fmt.Sprintf("---\n📰 Story aggregated by [Kagi News](%s)", story.Link))
return buf.String()
}
func (f *PostFormatter) formatSummary(story *KagiStory) string {
var buf strings.Builder
buf.WriteString(story.Summary)
buf.WriteString("\n\n**Sources:**\n")
for _, s := range story.Sources {
buf.WriteString(fmt.Sprintf("• [%s](%s) - %s\n", s.Title, s.URL, s.Domain))
}
buf.WriteString("\n")
buf.WriteString(fmt.Sprintf("---\n📰 Story aggregated by [Kagi News](%s)", story.Link))
return buf.String()
}
func (f *PostFormatter) formatMinimal(story *KagiStory) string {
sourceDomains := make([]string, len(story.Sources))
for i, s := range story.Sources {
sourceDomains[i] = s.Domain
}
return fmt.Sprintf(
"%s\n\nRead more: %s\n\n**Sources:** %s\n\n---\n📰 Via [Kagi News](%s)",
story.Title,
story.Link,
strings.Join(sourceDomains, ", "),
story.Link,
)
}
Component 6: Post Publisher#
Responsibility: Create posts via Coves API
func (p *PostPublisher) PublishStory(ctx context.Context, story *KagiStory, communities []*CommunityAuth) error {
for _, comm := range communities {
config := comm.Config
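// Format content based on config (fall back to the schema default "full"
// instead of panicking when the key is missing)
postFormat, ok := config["postFormat"].(string)
if !ok {
postFormat = "full"
}
content := p.formatter.Format(story, postFormat)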
// Build embed: the external link is always included, matching the post record
// structure; the thumbnail is added only when images are enabled (default: true)
embed := &aggregator.Embed{
Type: "app.bsky.embed.external",
External: &aggregator.External{
URI: story.Link,
Title: story.Title,
Description: truncate(story.Summary, 300),
},
}
if includeImages, ok := config["includeImages"].(bool); (!ok || includeImages) && story.ImageURL != "" {
// TODO: Handle image upload/blob creation
embed.External.Thumb = story.ImageURL // or blob reference
}
// Create post
post := aggregator.Post{
Title: story.Title,
Content: content,
Embed: embed,
FederatedFrom: &aggregator.FederatedSource{
Platform: "kagi-news-rss",
URI: story.Link,
ID: story.GUID,
OriginalCreatedAt: story.PubDate,
},
ContentLabels: story.Categories,
}
err := p.aggregator.CreatePost(ctx, comm.CommunityDID, post)
if err != nil {
log.Printf("Failed to create post in %s: %v", comm.CommunityDID, err)
continue
}
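// Mark as posted; log failures, since a missing record can lead to a duplicate post later
if err := p.deduplicator.MarkPosted(story.GUID, "post-uri-from-response"); err != nil {
log.Printf("Failed to record %s as posted: %v", story.GUID, err)
}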
}
return nil
}
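The publisher relies on a `truncate` helper that this document does not define. A minimal sketch follows, assuming the limit is meant in characters (runes) rather than bytes and that an ellipsis should mark the cut. Both points are assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// truncate shortens s to at most maxRunes runes, ending with an ellipsis
// when text was cut. It counts runes so multi-byte characters are not split.
func truncate(s string, maxRunes int) string {
	if maxRunes <= 0 {
		return ""
	}
	runes := []rune(s)
	if len(runes) <= maxRunes {
		return s
	}
	// Reserve one rune for the ellipsis and drop trailing whitespace before it.
	cut := strings.TrimRightFunc(string(runes[:maxRunes-1]), unicode.IsSpace)
	return cut + "…"
}

func main() {
	fmt.Println(truncate("hello world", 6))
	fmt.Println(truncate("short", 300))
}
```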
Image Handling Strategy#
Initial Implementation (MVP)#
Approach: Use Kagi proxy URLs directly in embeds
Rationale:
- Simplest implementation
- Kagi proxy likely allows hotlinking for non-commercial use
- No storage costs
- Images are already optimized by Kagi
Risk Mitigation:
- Monitor for broken images
- Add fallback: if image fails to load, skip embed
- Prepare migration plan to self-hosting if needed
Code:
if config["includeImages"].(bool) && story.ImageURL != "" {
// Use Kagi proxy URL directly
embed = &aggregator.Embed{
External: &aggregator.External{
Thumb: story.ImageURL, // https://kagiproxy.com/img/...
},
}
}
Future Enhancement (If Issues Arise)#
Approach: Download and re-host images
Implementation:
- Download image from Kagi proxy
- Upload to Coves blob storage (or S3/CDN)
- Use blob reference in embed
Code:
func (p *PostPublisher) uploadImage(imageURL string) (string, error) {
// Download from Kagi proxy
resp, err := http.Get(imageURL)
if err != nil {
return "", err
}
defer resp.Body.Close()
// Upload to blob storage
blob, err := p.blobStore.Upload(resp.Body, resp.Header.Get("Content-Type"))
if err != nil {
return "", err
}
return blob.Ref, nil
}
Decision Point: Only implement if:
- Kagi blocks hotlinking
- Kagi proxy becomes unreliable
- Legal clarification needed
Rate Limiting & Performance#
Rate Limits#
RSS Fetching:
- Poll each category feed every 15 minutes
- Max 4 categories = 4 requests per 15 min = 16 req/hour
- Well within any reasonable limit
Post Creation:
- Aggregator rate limit: 10 posts/hour per community
- Global limit: 100 posts/hour across all communities
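- Kagi News publishes ~5-10 stories per category per day
- = ~20-40 posts/day total across all 4 categories
- = ~1-2 posts/hour averaged over 24 hours
- Well within limits on average; because the feeds update in one daily batch, a burst of new stories may need to be spread out to stay under the 10 posts/hour per-community limit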
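The per-community limit could also be enforced on the aggregator side before calling the API. The sketch below is a simple sliding-window limiter. It takes the current time as Unix seconds to keep it easy to test, and it is not safe for concurrent use without a mutex. The names `rateLimiter`, `newRateLimiter`, and `Allow` are hypothetical.

```go
package main

import "fmt"

// rateLimiter allows at most limit posts per community within a sliding window.
type rateLimiter struct {
	limit         int
	windowSeconds int64
	posts         map[string][]int64 // community DID -> post timestamps (Unix seconds)
}

func newRateLimiter(limit int, windowSeconds int64) *rateLimiter {
	return &rateLimiter{
		limit:         limit,
		windowSeconds: windowSeconds,
		posts:         make(map[string][]int64),
	}
}

// Allow reports whether a post to communityDID may be made at time now,
// and records the post when it is allowed.
func (r *rateLimiter) Allow(communityDID string, now int64) bool {
	cutoff := now - r.windowSeconds
	// Drop timestamps that have left the window (filtering in place).
	recent := r.posts[communityDID][:0]
	for _, ts := range r.posts[communityDID] {
		if ts > cutoff {
			recent = append(recent, ts)
		}
	}
	if len(recent) >= r.limit {
		r.posts[communityDID] = recent
		return false
	}
	r.posts[communityDID] = append(recent, now)
	return true
}

func main() {
	limiter := newRateLimiter(10, 3600) // 10 posts/hour per community
	allowed := 0
	for i := int64(0); i < 25; i++ {
		if limiter.Allow("did:plc:worldnews123", i) {
			allowed++
		}
	}
	fmt.Println("allowed in first burst:", allowed)
}
```

Stories rejected by `Allow` would need to be queued and retried later rather than dropped. That queue is not shown here.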
Performance Targets:
- Story posted within 15 minutes of appearing in RSS feed
- < 1 second to parse and format a story
- < 500ms to publish a post via API
Monitoring & Observability#
Metrics to Track#
Feed Polling:
- kagi_feed_poll_total (counter) - Total feed polls by category
- kagi_feed_poll_errors (counter) - Failed polls by category/error
- kagi_feed_items_fetched (gauge) - Items per poll by category
- kagi_feed_poll_duration_seconds (histogram) - Poll latency
Story Processing:
- kagi_stories_parsed_total (counter) - Successfully parsed stories
- kagi_stories_parse_errors (counter) - Parse failures by error type
- kagi_stories_filtered (counter) - Stories filtered out by reason (duplicate, min sources, category)
- kagi_stories_posted (counter) - Stories successfully posted by community
Post Publishing:
- kagi_posts_created_total (counter) - Total posts created
- kagi_posts_failed (counter) - Failed posts by error type
- kagi_post_publish_duration_seconds (histogram) - Post creation latency
Health:
- kagi_aggregator_up (gauge) - Service health (1 = healthy, 0 = down)
- kagi_last_successful_poll_timestamp (gauge) - Last successful poll time by category
Logging#
Structured Logging:
log.Info("Story posted",
"guid", story.GUID,
"title", story.Title,
"community", comm.CommunityDID,
"post_uri", postURI,
"sources", len(story.Sources),
"format", postFormat,
)
log.Error("Failed to parse story",
"guid", item.GUID,
"feed", feedURL,
"error", err,
)
Log Levels:
- DEBUG: Feed items, parsing details
- INFO: Stories posted, communities targeted
- WARN: Parse errors, rate limit approaching
- ERROR: Failed posts, feed fetch failures
Alerts#
Critical:
- Feed polling failing for > 1 hour
- Post creation failing for > 10 consecutive attempts
- Aggregator unauthorized (auth record disabled/deleted)
Warning:
- Post creation rate < 50% of expected
- Parse errors > 10% of items
- Approaching rate limits (> 80% of quota)
Deployment#
Infrastructure#
Service Type: Long-running daemon
Hosting: Kubernetes (same cluster as Coves AppView)
Resources:
- CPU: 0.5 cores (low CPU usage, mostly I/O)
- Memory: 512 MB (small in-memory cache for recent GUIDs)
- Storage: 1 GB (SQLite for deduplication tracking)
Configuration#
Environment Variables:
# Aggregator identity
AGGREGATOR_DID=did:web:kagi-news.coves.social
AGGREGATOR_PRIVATE_KEY_PATH=/secrets/private-key.pem
# Coves API
COVES_API_URL=https://api.coves.social
# Feed polling
POLL_INTERVAL=15m
CATEGORIES=world,tech,business,sports
# Database (for deduplication)
DB_PATH=/data/kagi-news.db
# Monitoring
METRICS_PORT=9090
LOG_LEVEL=info
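A sketch of how the service might load these variables at startup. The `Settings` struct, the `loadSettings` function, and the fallback values are assumptions chosen to mirror the values listed above. The lookup function is injected so the loader can be tested without touching the real environment; in production `os.Getenv` would be passed in.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// Settings holds the parsed environment configuration (hypothetical type).
type Settings struct {
	AggregatorDID string
	APIURL        string
	PollInterval  time.Duration
	Categories    []string
	DBPath        string
	MetricsPort   int
	LogLevel      string
}

// loadSettings reads variables through getenv, applying assumed fallbacks.
func loadSettings(getenv func(string) string) (Settings, error) {
	get := func(key, fallback string) string {
		if v := strings.TrimSpace(getenv(key)); v != "" {
			return v
		}
		return fallback
	}
	did := get("AGGREGATOR_DID", "")
	if did == "" {
		return Settings{}, errors.New("AGGREGATOR_DID is required")
	}
	interval, err := time.ParseDuration(get("POLL_INTERVAL", "15m"))
	if err != nil {
		return Settings{}, fmt.Errorf("POLL_INTERVAL: %w", err)
	}
	if interval <= 0 {
		return Settings{}, errors.New("POLL_INTERVAL must be positive")
	}
	port, err := strconv.Atoi(get("METRICS_PORT", "9090"))
	if err != nil {
		return Settings{}, fmt.Errorf("METRICS_PORT: %w", err)
	}
	var categories []string
	for _, c := range strings.Split(get("CATEGORIES", ""), ",") {
		if c = strings.TrimSpace(c); c != "" {
			categories = append(categories, c)
		}
	}
	if len(categories) == 0 {
		return Settings{}, errors.New("CATEGORIES must list at least one category")
	}
	return Settings{
		AggregatorDID: did,
		APIURL:        get("COVES_API_URL", "https://api.coves.social"),
		PollInterval:  interval,
		Categories:    categories,
		DBPath:        get("DB_PATH", "/data/kagi-news.db"),
		MetricsPort:   port,
		LogLevel:      get("LOG_LEVEL", "info"),
	}, nil
}

func main() {
	env := map[string]string{
		"AGGREGATOR_DID": "did:web:kagi-news.coves.social",
		"CATEGORIES":     "world,tech,business,sports",
	}
	s, err := loadSettings(func(key string) string { return env[key] })
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```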
Deployment Manifest#
apiVersion: apps/v1
kind: Deployment
metadata:
name: kagi-news-aggregator
namespace: coves
spec:
replicas: 1
selector:
matchLabels:
app: kagi-news-aggregator
template:
metadata:
labels:
app: kagi-news-aggregator
spec:
containers:
- name: aggregator
image: coves/kagi-news-aggregator:latest
env:
- name: AGGREGATOR_DID
value: did:web:kagi-news.coves.social
- name: COVES_API_URL
value: https://api.coves.social
- name: POLL_INTERVAL
value: 15m
- name: CATEGORIES
value: world,tech,business,sports
- name: DB_PATH
value: /data/kagi-news.db
- name: AGGREGATOR_PRIVATE_KEY_PATH
value: /secrets/private-key.pem
volumeMounts:
- name: data
mountPath: /data
- name: secrets
mountPath: /secrets
readOnly: true
ports:
- name: metrics
containerPort: 9090
resources:
requests:
cpu: 250m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
volumes:
- name: data
persistentVolumeClaim:
claimName: kagi-news-data
- name: secrets
secret:
secretName: kagi-news-private-key
Testing Strategy#
Unit Tests#
Feed Parsing:
func TestParseFeed(t *testing.T) {
feed := loadTestFeed("testdata/world.xml")
stories, err := parser.Parse(feed)
assert.NoError(t, err)
assert.Len(t, stories, 10)
story := stories[0]
assert.NotEmpty(t, story.Title)
assert.NotEmpty(t, story.Summary)
assert.Greater(t, len(story.Sources), 1)
}
func TestParseStoryHTML(t *testing.T) {
html := `<p>Summary [source.com#1]</p>
<h3>Highlights:</h3>
<ul><li>Point 1</li></ul>
<h3>Sources:</h3>
<ul><li><a href="https://example.com">Title</a> - example.com</li></ul>`
story, err := parser.ParseHTML(html)
assert.NoError(t, err)
assert.Equal(t, "Summary [source.com#1]", story.Summary)
assert.Len(t, story.Highlights, 1)
assert.Len(t, story.Sources, 1)
}
Formatting:
func TestFormatFull(t *testing.T) {
story := &KagiStory{
Summary: "Test summary",
Sources: []Source{
{Title: "Article", URL: "https://example.com", Domain: "example.com"},
},
}
content := formatter.Format(story, "full")
assert.Contains(t, content, "Test summary")
assert.Contains(t, content, "**Sources:**")
assert.Contains(t, content, "📰 Story aggregated by")
}
Deduplication:
func TestDeduplication(t *testing.T) {
guid := "test-guid-123"
posted, err := deduplicator.AlreadyPosted(guid)
assert.NoError(t, err)
assert.False(t, posted)
err = deduplicator.MarkPosted(guid, "at://...")
assert.NoError(t, err)
posted, err = deduplicator.AlreadyPosted(guid)
assert.NoError(t, err)
assert.True(t, posted)
}
Integration Tests#
With Mock Coves API:
func TestPublishStory(t *testing.T) {
// Setup mock Coves API
mockAPI := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
assert.Equal(t, "/xrpc/social.coves.post.create", r.URL.Path)
var input CreatePostInput
json.NewDecoder(r.Body).Decode(&input)
assert.Equal(t, "did:plc:test-community", input.Community)
assert.NotEmpty(t, input.Title)
assert.Contains(t, input.Content, "📰 Story aggregated by")
w.WriteHeader(200)
json.NewEncoder(w).Encode(CreatePostOutput{URI: "at://..."})
}))
defer mockAPI.Close()
// Test story publishing
publisher := NewPostPublisher(mockAPI.URL)
err := publisher.PublishStory(ctx, testStory, []*CommunityAuth{testComm})
assert.NoError(t, err)
}
E2E Tests#
With Real RSS Feed:
func TestE2E_FetchAndParse(t *testing.T) {
if testing.Short() {
t.Skip("Skipping E2E test")
}
// Fetch real Kagi News feed
feed, err := poller.fetchFeed("https://news.kagi.com/world.xml")
assert.NoError(t, err)
assert.NotEmpty(t, feed.Items)
// Parse first item
story, err := parser.Parse(feed.Items[0])
assert.NoError(t, err)
assert.NotEmpty(t, story.Title)
assert.NotEmpty(t, story.Summary)
assert.Greater(t, len(story.Sources), 0)
}
With Test Coves Instance:
func TestE2E_CreatePost(t *testing.T) {
if testing.Short() {
t.Skip("Skipping E2E test")
}
// Create post in test community
post := aggregator.Post{
Title: "Test Kagi News Post",
Content: "Test content...",
}
err := aggregator.CreatePost(ctx, testCommunityDID, post)
assert.NoError(t, err)
// Verify post appears in feed
// (requires test community setup)
}
Success Metrics#
Pre-Launch Checklist#
- Aggregator service declaration published
- DID created and configured (did:web:kagi-news.coves.social)
- RSS feed parser handles all Kagi HTML structures
- Deduplication prevents duplicate posts
- Category mapping works for all configs
- All 3 post formats render correctly
- Attribution to Kagi News visible on all posts
- Rate limiting prevents spam
- Monitoring/alerting configured
- E2E tests passing against test instance
Alpha Goals (First Week)#
- 3+ communities using Kagi News aggregator
- 50+ posts created successfully
- Zero duplicate posts
- < 5% parse errors
- < 1% post creation failures
- Stories posted within 15 minutes of RSS publication
Beta Goals (First Month)#
- 10+ communities using aggregator
- 500+ posts created
- Community feedback positive (surveys)
- Attribution compliance verified
- No rate limit violations
- < 1% error rate (parsing + posting)
Future Enhancements#
Phase 2 Features#
Smart Category Detection:
- Use LLM to suggest additional categories for stories
- Map Kagi categories to community tags automatically
Customizable Templates:
- Allow communities to customize post format with templates
- Support Markdown/Handlebars templates in config
Story Scoring:
- Prioritize high-impact stories (many sources, breaking news)
- Delay low-priority stories to avoid flooding feed
Cross-posting Prevention:
- Detect when multiple communities authorize same category
- Intelligently cross-post vs. duplicate
Phase 3 Features#
Interactive Features:
- Bot responds to comments with additional sources
- Updates megathread with new sources as story develops
Analytics Dashboard:
- Show communities which stories get most engagement
- Trending topics from Kagi News
- Source diversity metrics
Federation:
- Support other Coves instances using same aggregator
- Shared deduplication across instances
Open Questions#
Need to Resolve Before Launch#
- Image Licensing:
  - ❓ Are images from Kagi proxy covered by CC BY-NC?
  - ❓ Do we need to attribute original image sources?
  - Action: Email support@kagi.com for clarification
- Hotlinking Policy:
  - ❓ Is embedding Kagi proxy images acceptable?
  - ❓ Should we download and re-host?
  - Action: Test in staging, monitor for issues
- Category Discovery:
  - ❓ How to discover all available category feeds?
  - ❓ Are there categories beyond world/tech/business/sports?
  - Action: Scrape https://news.kagi.com/ for all .xml links
- Attribution Format:
  - ❓ Is "📰 Story aggregated by Kagi News" sufficient?
  - ❓ Do we need more prominent attribution?
  - Action: Review CC BY-NC best practices
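For the category discovery action, a first pass could collect every `.xml` link from the homepage HTML and derive category names from the file names. The sketch below works on an HTML string and uses a regular expression, which is fragile compared to a real HTML parser. It also assumes the homepage exposes feed links as plain `href` attributes, which has not been verified. `discoverCategories` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"sort"
	"strings"
)

// xmlHrefRe matches href attributes whose target ends in ".xml".
var xmlHrefRe = regexp.MustCompile(`href=["']([^"']+\.xml)["']`)

// discoverCategories returns the sorted, de-duplicated category names
// implied by the .xml links found in html.
func discoverCategories(html string) []string {
	seen := make(map[string]bool)
	var categories []string
	for _, m := range xmlHrefRe.FindAllStringSubmatch(html, -1) {
		name := strings.TrimSuffix(path.Base(m[1]), ".xml")
		if name == "" || seen[name] {
			continue
		}
		seen[name] = true
		categories = append(categories, name)
	}
	sort.Strings(categories)
	return categories
}

func main() {
	// Made-up markup for illustration only.
	page := `<a href="/world.xml">World</a> <a href='https://news.kagi.com/tech.xml'>Tech</a>`
	fmt.Println(discoverCategories(page))
}
```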
References#
- Kagi News About Page: https://news.kagi.com/about
- Kagi News RSS Example: https://news.kagi.com/world.xml
- Kagi Kite Public Repo: https://github.com/kagisearch/kite-public
- CC BY-NC License: https://creativecommons.org/licenses/by-nc/4.0/
- Parent PRD: PRD_AGGREGATORS.md
- Aggregator SDK: [TBD]