# Backlog PRD: Platform Improvements & Technical Debt

**Status:** Ongoing
**Owner:** Platform Team
**Last Updated:** 2025-10-17

## Overview

Miscellaneous platform improvements, bug fixes, and technical debt that don't fit into feature-specific PRDs.

---

## 🔴 P0: Critical (Alpha Blockers)

### OAuth DPoP Token Architecture - Voting Write-Forward
**Added:** 2025-11-02 | **Completed:** 2025-11-02 | **Effort:** 2 hours | **Priority:** ALPHA BLOCKER
**Status:** ✅ COMPLETE

**Problem:**
Our backend is attempting to use DPoP-bound OAuth tokens to write votes to users' PDSs, causing "Malformed token" errors. This violates atProto architecture patterns.

**Current (Incorrect) Flow:**
```
Mobile Client (OAuth + DPoP) → Coves Backend → User's PDS ❌
                                                  ↓
                                       "Malformed token" error
```

**Root Cause:**
- Mobile app uses OAuth with DPoP (Demonstrating Proof of Possession)
- DPoP tokens are cryptographically bound to client's private key via `cnf.jkt` claim
- Each PDS request requires **both**:
  - `Authorization: DPoP <token>` (DPoP-bound tokens use the `DPoP` scheme, not `Bearer`)
  - `DPoP: <signed-proof-jwt>` (signature proves client has private key)
- Backend cannot create DPoP proofs (doesn't have client's private key)
- **DPoP tokens are intentionally non-transferable** (security feature to prevent token theft)
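
The `cnf.jkt` binding in the list above can be made concrete with a small sketch. It assumes the client key is an ES256 (P-256) key, which this document does not state, and computes the RFC 7638 SHA-256 thumbprint that a DPoP-bound token carries in `cnf.jkt`. The helper names (`newClientKey`, `jwkThumbprint`, `proofMatchesToken`) are hypothetical and not part of the Coves codebase. The point it illustrates: a proof signed with any key other than the client's yields a different thumbprint, so the backend cannot stand in for the client.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newClientKey stands in for the key pair the mobile client generates and
// never shares with the backend (hypothetical helper).
func newClientKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// jwkThumbprint computes the RFC 7638 SHA-256 thumbprint of a P-256 public
// key. This is the kind of value found in a token's cnf.jkt claim.
func jwkThumbprint(pub *ecdsa.PublicKey) string {
	// P-256 coordinates are fixed-width 32-byte big-endian integers.
	x := pub.X.FillBytes(make([]byte, 32))
	y := pub.Y.FillBytes(make([]byte, 32))
	enc := base64.RawURLEncoding
	// RFC 7638: required members only, in lexicographic order, no whitespace.
	canonical := fmt.Sprintf(`{"crv":"P-256","kty":"EC","x":"%s","y":"%s"}`,
		enc.EncodeToString(x), enc.EncodeToString(y))
	sum := sha256.Sum256([]byte(canonical))
	return enc.EncodeToString(sum[:])
}

// proofMatchesToken reports whether the key that signed a DPoP proof is the
// key the access token was bound to.
func proofMatchesToken(proofKey *ecdsa.PublicKey, cnfJKT string) bool {
	return jwkThumbprint(proofKey) == cnfJKT
}
```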

**Evidence:**
```json
// Token decoded from mobile app session
{
  "sub": "did:plc:txrork7rurdueix27ulzi7ke",
  "cnf": {
    "jkt": "LSWROJhTkPn4yT18xUjiIz2Z7z7l_gozKfjjQTYgW9o" // ← DPoP binding
  },
  "client_id": "https://lingering-darkness-50a6.brettmay0212.workers.dev/client-metadata.json",
  "iss": "http://localhost:3001"
}
```

**atProto Best Practice (from Bluesky social-app analysis):**
- ✅ Clients write **directly to their own PDS** (no backend proxy)
- ✅ AppView **only indexes** from Jetstream (eventual consistency)
- ✅ PDS = User's personal data store (user controls writes)
- ✅ AppView = Read-only aggregator/indexer
- ❌ Backend should NOT proxy user write operations

**Correct Architecture:**
```
Mobile Client → User's PDS (direct write with DPoP proof) ✓
                    ↓
              Jetstream (firehose)
                    ↓
              Coves AppView (indexes votes from firehose)
```

**Affected Endpoints:**
1. **Vote Creation** - [create_vote.go:76](../internal/api/handlers/vote/create_vote.go#L76)
   - Currently: Backend writes to PDS using user's token
   - Should: Return error directing client to write directly

2. **Vote Service** - [service.go:126](../internal/core/votes/service.go#L126)
   - Currently: `createRecordOnPDSAs()` attempts write-forward
   - Should: Remove write-forward, rely on Jetstream indexing only

**Solution Options:**

**Option A: Client Direct Write (RECOMMENDED - Follows Bluesky)**
```typescript
// Mobile client writes directly (like Bluesky social-app)
const agent = new Agent(oauthSession)
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.interaction.vote',
  record: {
    $type: 'social.coves.interaction.vote',
    subject: { uri: postUri, cid: postCid },
    direction: 'up',
    createdAt: new Date().toISOString()
  }
})
```

Backend changes:
- Remove write-forward code from vote service
- Return error from XRPC endpoint: "Votes must be created directly at your PDS"
- Index votes from Jetstream consumer (already implemented)
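
The second backend change could look roughly like the sketch below. It reuses the `{"error": ..., "message": ...}` shape that PDS errors take elsewhere in this document. The error name `DirectWriteRequired`, the 400 status code, and the handler name `handleCreateVote` are assumptions made for illustration, not existing Coves or atProto identifiers.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// xrpcError mirrors the error shape PDSs return, e.g.
// {"error":"InvalidToken","message":"Malformed token"}.
type xrpcError struct {
	Error   string `json:"error"`
	Message string `json:"message"`
}

// directWriteErrorBody builds the response body. "DirectWriteRequired" is a
// placeholder name, not an established atProto error code.
func directWriteErrorBody() ([]byte, error) {
	return json.Marshal(xrpcError{
		Error:   "DirectWriteRequired",
		Message: "Votes must be created directly at your PDS",
	})
}

// handleCreateVote (hypothetical) replaces the write-forward handler: it no
// longer touches the user's PDS and only tells the client what to do instead.
func handleCreateVote(w http.ResponseWriter, r *http.Request) {
	body, err := directWriteErrorBody()
	if err != nil {
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusBadRequest) // status code is an assumption
	_, _ = w.Write(body)
}
```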

**Option B: Backend App Passwords (NOT RECOMMENDED)**
- User creates app-specific password
- Backend uses password auth (gets regular JWTs, not DPoP)
- Security downgrade, poor UX

**Option C: Service Auth Token (Complex)**
- Backend gets its own service credentials
- Requires PDS to trust our AppView as delegated writer
- Non-standard atProto pattern

**Recommendation:** Option A (Client Direct Write)
- Matches atProto architecture
- Follows Bluesky social-app pattern
- Best security (user controls their data)
- Simplest implementation

**Implementation Tasks:**
1. Update Flutter OAuth package to expose `agent.call()` for custom lexicons
2. Update mobile vote UI to write directly to PDS
3. Remove write-forward code from backend vote service
4. Update vote XRPC handler to return helpful error message
5. Verify Jetstream consumer correctly indexes votes
6. Update integration tests to match new flow

**References:**
- Bluesky social-app: Direct PDS writes via agent
- atProto OAuth spec: DPoP binding prevents token reuse
- atProto architecture: AppView = read-only indexer

---

### OAuth DPoP Token Architecture - Community Subscriptions
**Added:** 2025-11-02 | **Effort:** 1-2 hours | **Priority:** ALPHA BLOCKER
**Status:** 📋 TODO (Waiting for frontend implementation)

**Problem:**
Same DPoP token issue as voting - backend cannot use user's DPoP-bound OAuth tokens to write subscription records to user's PDS.

**Affected Operations:**
- `SubscribeToCommunity()` - [service.go:564-624](../internal/core/communities/service.go#L564-L624)
- `UnsubscribeFromCommunity()` - [service.go:626-660](../internal/core/communities/service.go#L626-L660)

**Collection:** `social.coves.community.subscription`

**Solution:**
Client writes directly using `com.atproto.repo.createRecord`:
```typescript
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.community.subscription',
  record: {
    $type: 'social.coves.community.subscription',
    subject: communityDid,
    contentVisibility: 3,
    createdAt: new Date().toISOString()
  }
})
```

**Backend Changes Needed:**
1. Remove write-forward from `SubscribeToCommunity()` and `UnsubscribeFromCommunity()`
2. Update handlers to return errors directing to client-direct pattern
3. Verify Jetstream consumer indexes subscriptions (already working)

**Files to Modify:**
- `internal/core/communities/service.go`
- `internal/api/handlers/community/subscribe.go`

---

### OAuth DPoP Token Architecture - Community Blocking
**Added:** 2025-11-02 | **Effort:** 1-2 hours | **Priority:** ALPHA BLOCKER
**Status:** 📋 TODO (Waiting for frontend implementation)

**Problem:**
Same DPoP token issue - backend cannot use user's DPoP-bound OAuth tokens to write block records to user's PDS.

**Affected Operations:**
- `BlockCommunity()` - [service.go:709-781](../internal/core/communities/service.go#L709-L781)
- `UnblockCommunity()` - [service.go:783-816](../internal/core/communities/service.go#L783-L816)

**Collection:** `social.coves.community.block`

**Solution:**
Client writes directly using `com.atproto.repo.createRecord`:
```typescript
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.community.block',
  record: {
    $type: 'social.coves.community.block',
    subject: communityDid,
    createdAt: new Date().toISOString()
  }
})
```

**Backend Changes Needed:**
1. Remove write-forward from `BlockCommunity()` and `UnblockCommunity()`
2. Update handlers to return errors directing to client-direct pattern
3. Verify Jetstream consumer indexes blocks (already working)

**Files to Modify:**
- `internal/core/communities/service.go`
- `internal/api/handlers/community/block.go`

---

## 🟡 P1: Important (Alpha Blockers)

### at-identifier Handle Resolution in Endpoints
**Added:** 2025-10-18 | **Effort:** 2-3 hours | **Priority:** ALPHA BLOCKER

**Problem:**
Current implementation rejects handles in endpoints that declare `"format": "at-identifier"` in their lexicon schemas, violating atProto best practices and breaking legitimate client usage.

**Impact:**
- ❌ Post creation fails when client sends community handle (e.g., `!gardening.communities.coves.social`)
- ❌ Subscribe/unsubscribe endpoints reject handles despite lexicon declaring `at-identifier`
- ❌ Block endpoints use `"format": "did"` but should use `at-identifier` for consistency
- 🔴 **P0 Issue:** API contract violation - clients following the schema are rejected

**Root Cause:**
Handlers and services validate `strings.HasPrefix(req.Community, "did:")` instead of calling `ResolveCommunityIdentifier()`.

**Affected Endpoints:**
1. **Post Creation** - [create.go:54](../internal/api/handlers/post/create.go#L54), [service.go:51](../internal/core/posts/service.go#L51)
   - Lexicon declares `at-identifier`: [post/create.json:16](../internal/atproto/lexicon/social/coves/post/create.json#L16)

2. **Subscribe** - [subscribe.go:52](../internal/api/handlers/community/subscribe.go#L52)
   - Lexicon declares `at-identifier`: [subscribe.json:16](../internal/atproto/lexicon/social/coves/community/subscribe.json#L16)

3. **Unsubscribe** - [subscribe.go:120](../internal/api/handlers/community/subscribe.go#L120)
   - Lexicon declares `at-identifier`: [unsubscribe.json:16](../internal/atproto/lexicon/social/coves/community/unsubscribe.json#L16)

4. **Block/Unblock** - [block.go:58](../internal/api/handlers/community/block.go#L58), [block.go:132](../internal/api/handlers/community/block.go#L132)
   - Lexicon declares `"format": "did"`: [block.json:15](../internal/atproto/lexicon/social/coves/community/block.json#L15)
   - Should be changed to `at-identifier` for consistency and best practice

**atProto Best Practice (from docs):**
- ✅ API endpoints should accept both DIDs and handles via `at-identifier` format
- ✅ Resolve handles to DIDs immediately at API boundary
- ✅ Use DIDs internally for all business logic and storage
- ✅ Handles are weak refs (changeable), DIDs are strong refs (permanent)
- ⚠️ Bidirectional verification required (already handled by `identity.CachingResolver`)

**Solution:**
Replace direct DID validation with handle resolution using existing `ResolveCommunityIdentifier()`:

```go
// BEFORE (wrong) ❌
if !strings.HasPrefix(req.Community, "did:") {
	return error
}

// AFTER (correct) ✅
communityDID, err := h.communityService.ResolveCommunityIdentifier(ctx, req.Community)
if err != nil {
	if communities.IsNotFound(err) {
		writeError(w, http.StatusNotFound, "CommunityNotFound", "Community not found")
		return
	}
	writeError(w, http.StatusBadRequest, "InvalidRequest", err.Error())
	return
}
// Now use communityDID (guaranteed to be a DID)
```

**Implementation Plan:**
1. ✅ **Phase 1 (Alpha Blocker):** Fix post creation endpoint
   - Update handler validation in `internal/api/handlers/post/create.go`
   - Update service validation in `internal/core/posts/service.go`
   - Add integration tests for handle resolution in post creation

2. 📋 **Phase 2 (Beta):** Fix subscription endpoints
   - Update subscribe/unsubscribe handlers
   - Add tests for handle resolution in subscriptions

3. 📋 **Phase 3 (Beta):** Fix block endpoints
   - Update lexicon from `"format": "did"` → `"format": "at-identifier"`
   - Update block/unblock handlers
   - Add tests for handle resolution in blocking

**Files to Modify (Phase 1 - Post Creation):**
- `internal/api/handlers/post/create.go` - Remove DID validation, add handle resolution
- `internal/core/posts/service.go` - Remove DID validation, add handle resolution
- `internal/core/posts/interfaces.go` - Add `CommunityService` dependency
- `cmd/server/main.go` - Pass community service to post service constructor
- `tests/integration/post_creation_test.go` - Add handle resolution test cases

**Existing Infrastructure:**
- ✅ `ResolveCommunityIdentifier()` already implemented at [service.go:843](../internal/core/communities/service.go#L843)
- ✅ `identity.CachingResolver` handles bidirectional verification and caching
- ✅ Supports both handle (`!name.communities.instance.com`) and DID formats

**Current Status:**
- ⚠️ **BLOCKING POST CREATION PR**: Identified as P0 issue in code review
- 📋 Phase 1 (post creation) - To be implemented immediately
- 📋 Phase 2-3 (other endpoints) - Deferred to Beta

---

### did:web Domain Verification & hostedByDID Auto-Population
**Added:** 2025-10-11 | **Updated:** 2025-10-16 | **Effort:** 2-3 days | **Priority:** ALPHA BLOCKER

**Problem:**
1. **Domain Impersonation**: Self-hosters can set `INSTANCE_DID=did:web:nintendo.com` without owning the domain, enabling attacks where communities appear hosted by trusted domains
2. **hostedByDID Spoofing**: Malicious instance operators can modify source code to claim communities are hosted by domains they don't own, enabling reputation hijacking and phishing

**Attack Scenarios:**
- Malicious instance sets `instanceDID="did:web:coves.social"` → communities show as hosted by official Coves
- Federation partners can't verify instance authenticity
- AppView pollution with fake hosting claims

**Solution:**
1. **Basic Validation (Phase 1)**: Verify `did:web:` domain matches configured `instanceDomain`
2. **Cryptographic Verification (Phase 2)**: Fetch `https://domain/.well-known/did.json` and verify:
   - DID document exists and is valid
   - Domain ownership proven via HTTPS hosting
   - DID document matches claimed `instanceDID`
3. **Auto-populate hostedByDID**: Remove from client API, derive from instance configuration in service layer
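
The Phase 2 checks can be sketched as two pure functions: one maps a host-only `did:web` identifier to its `.well-known/did.json` URL, and one confirms that a fetched document claims the same DID. Both function names are hypothetical. Path-based `did:web` identifiers are deliberately rejected, and the HTTPS fetch itself (which is what proves domain control) is left out of the sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// didWebDocumentURL maps a host-only did:web identifier to the URL of its
// DID document. Path-based identifiers (did:web:host:path) are not handled.
func didWebDocumentURL(did string) (string, error) {
	const prefix = "did:web:"
	if !strings.HasPrefix(did, prefix) || len(did) == len(prefix) {
		return "", fmt.Errorf("not a did:web identifier: %q", did)
	}
	host := strings.TrimPrefix(did, prefix)
	if strings.Contains(host, ":") {
		return "", fmt.Errorf("path-based did:web not supported: %q", did)
	}
	// A port is percent-encoded inside the DID (e.g. localhost%3A3000).
	host, err := url.PathUnescape(host)
	if err != nil {
		return "", fmt.Errorf("invalid did:web host: %w", err)
	}
	return "https://" + host + "/.well-known/did.json", nil
}

// checkDIDDocument verifies that a fetched DID document is valid JSON and
// that its id matches the DID the instance claims to be.
func checkDIDDocument(body []byte, wantDID string) error {
	var doc struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal(body, &doc); err != nil {
		return fmt.Errorf("invalid DID document: %w", err)
	}
	if doc.ID != wantDID {
		return fmt.Errorf("DID document id %q does not match %q", doc.ID, wantDID)
	}
	return nil
}
```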

**Current Status:**
- ✅ Default changed from `coves.local` → `coves.social` (fixes `.local` TLD bug)
- ✅ TODO comment in [cmd/server/main.go:126-131](../cmd/server/main.go#L126-L131)
- ✅ hostedByDID removed from client requests (2025-10-16)
- ✅ Service layer auto-populates `hostedByDID` from `instanceDID` (2025-10-16)
- ✅ Handler rejects client-provided `hostedByDID` (2025-10-16)
- ✅ Basic validation: Logs warning if `did:web:` domain ≠ `instanceDomain` (2025-10-16)
- ⚠️ **REMAINING**: Full DID document verification (cryptographic proof of ownership)

**Implementation Notes:**
- Phase 1 complete: Basic validation catches config errors, logs warnings
- Phase 2 needed: Fetch `https://domain/.well-known/did.json` and verify ownership
- Add `SKIP_DID_WEB_VERIFICATION=true` for dev mode
- Full verification blocks startup if domain ownership cannot be proven

---

### ✅ Token Refresh Logic for Community Credentials - COMPLETE
**Added:** 2025-10-11 | **Completed:** 2025-10-17 | **Effort:** 1.5 days | **Status:** ✅ DONE

**Problem:** Community PDS access tokens expire (~2 hours). Updates fail until manual intervention.

**Solution Implemented:**
- ✅ Automatic token refresh before PDS operations (5-minute buffer before expiration)
- ✅ JWT expiration parsing without signature verification (`parseJWTExpiration`, `needsRefresh`)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (~2 months) via `atproto.ServerCreateSession`
- ✅ Atomic credential updates (`UpdateCredentials` repository method)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for token expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go) - JWT parsing utilities
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go) - Refresh and re-auth logic
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go) - Integration tests

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` + concurrency control
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** See [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md) for full details

**Impact:** ✅ Communities can now be updated 24+ hours after creation without manual intervention

---

### ✅ Subscription Visibility Level (Feed Slider 1-5 Scale) - COMPLETE
**Added:** 2025-10-15 | **Completed:** 2025-10-16 | **Effort:** 1 day | **Status:** ✅ DONE

**Problem:** Users couldn't control how much content they see from each community. Lexicon had `contentVisibility` (1-5 scale) but code didn't use it.

**Solution Implemented:**
- ✅ Updated subscribe handler to accept `contentVisibility` parameter (1-5, default 3)
- ✅ Store in subscription record on PDS (`social.coves.community.subscription`)
- ✅ Migration 008 adds `content_visibility` column to database with CHECK constraint
- ✅ Clamping at all layers (handler, service, consumer) for defense in depth
- ✅ Atomic subscriber count updates (SubscribeWithCount/UnsubscribeWithCount)
- ✅ Idempotent operations (safe for Jetstream event replays)
- ✅ Fixed critical collection name bug (was using wrong namespace)
- ✅ Production Jetstream consumer now running
- ✅ 13 comprehensive integration tests - all passing
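
The clamping rule is small enough to show in full. This sketch assumes that an omitted value arrives as `0` and maps to the default of 3; the actual handler, service, and consumer may treat missing values differently. `clampContentVisibility` is a hypothetical name.

```go
package main

const (
	minVisibility     = 1
	maxVisibility     = 5
	defaultVisibility = 3
)

// clampContentVisibility forces v into the 1-5 range. A zero value is
// treated as "not provided" and mapped to the default (an assumption).
func clampContentVisibility(v int) int {
	switch {
	case v == 0:
		return defaultVisibility
	case v < minVisibility:
		return minVisibility
	case v > maxVisibility:
		return maxVisibility
	default:
		return v
	}
}
```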

**Files Modified:**
- Lexicon: [subscription.json](../internal/atproto/lexicon/social/coves/community/subscription.json) ✅ Updated to atProto conventions
- Handler: [community/subscribe.go](../internal/api/handlers/community/subscribe.go) ✅ Accepts contentVisibility
- Service: [communities/service.go](../internal/core/communities/service.go) ✅ Clamps and passes to PDS
- Consumer: [community_consumer.go](../internal/atproto/jetstream/community_consumer.go) ✅ Extracts and indexes
- Repository: [community_repo_subscriptions.go](../internal/db/postgres/community_repo_subscriptions.go) ✅ All queries updated
- Migration: [008_add_content_visibility_to_subscriptions.sql](../internal/db/migrations/008_add_content_visibility_to_subscriptions.sql) ✅ Schema changes
- Tests: [subscription_indexing_test.go](../tests/integration/subscription_indexing_test.go) ✅ Comprehensive coverage

**Documentation:** See [IMPLEMENTATION_SUBSCRIPTION_INDEXING.md](../docs/IMPLEMENTATION_SUBSCRIPTION_INDEXING.md) for full details

**Impact:** ✅ Users can now adjust feed volume per community (key feature from DOMAIN_KNOWLEDGE.md enabled)

---

### Community Blocking
**Added:** 2025-10-15 | **Effort:** 1 day | **Priority:** ALPHA BLOCKER

**Problem:** Users have no way to block unwanted communities from their feeds.

**Solution:**
1. **Lexicon:** Extend `social.coves.actor.block` to support community DIDs (currently user-only)
2. **Service:** Implement `BlockCommunity(userDID, communityDID)` and `UnblockCommunity()`
3. **Handlers:** Add XRPC endpoints `social.coves.community.block` and `unblock`
4. **Repository:** Add methods to track blocked communities
5. **Feed:** Filter blocked communities from feed queries (beta work)

**Code:**
- Lexicon: [actor/block.json](../internal/atproto/lexicon/social/coves/actor/block.json) - Currently only supports user DIDs
- Service: New methods needed
- Handlers: New files needed

**Impact:** Users can't avoid unwanted content without blocking

---

### Post comment_count Reconciliation Missing
**Added:** 2025-11-04 | **Effort:** 2-3 hours | **Priority:** ALPHA BLOCKER

**Problem:**
When comments arrive before their parent post is indexed (common with cross-repo Jetstream ordering), the post's `comment_count` is never reconciled. Later, when the post consumer indexes the post, there's no logic to count pre-existing comments. This causes posts to have permanently stale `comment_count` values.

**End-User Impact:**
- 🔴 Posts show "0 comments" when they actually have comments
- ❌ Broken engagement signals (users don't know there are discussions)
- ❌ UI inconsistency (thread page shows comments, but counter says "0")
- ⚠️ Users may not click into posts thinking they're empty
- 📉 Reduced engagement due to misleading counters

**Root Cause:**
- Comment consumer updates post counts when processing comment events ([comment_consumer.go:323-343](../internal/atproto/jetstream/comment_consumer.go#L323-L343))
- If comment arrives BEFORE post is indexed, update query returns 0 rows (only logs warning)
- When post consumer later indexes the post, it sets `comment_count = 0` with NO reconciliation
- Comments already exist in DB, but post never "discovers" them

**Solution:**
Post consumer MUST implement the same reconciliation pattern as comment consumer (see [comment_consumer.go:292-305](../internal/atproto/jetstream/comment_consumer.go#L292-L305)):

```go
// After inserting new post, reconcile comment_count for out-of-order comments
reconcileQuery := `
	UPDATE posts
	SET comment_count = (
		SELECT COUNT(*)
		FROM comments c
		WHERE c.parent_uri = $1 AND c.deleted_at IS NULL
	)
	WHERE id = $2
`
_, reconcileErr := tx.ExecContext(ctx, reconcileQuery, postURI, postID)
if reconcileErr != nil {
	// Assumes the enclosing function returns an error and rolls back tx on failure.
	return fmt.Errorf("failed to reconcile comment_count: %w", reconcileErr)
}
```

**Affected Operations:**
- Post indexing from Jetstream ([post_consumer.go](../internal/atproto/jetstream/post_consumer.go))
- Any cross-repo event ordering (community DID ≠ author DID)

**Current Status:**
- 🔴 Issue documented with FIXME(P1) comment at [comment_consumer.go:311-321](../internal/atproto/jetstream/comment_consumer.go#L311-L321)
- ⚠️ Test demonstrating limitation exists: `TestCommentConsumer_PostCountReconciliation_Limitation`
- 📋 Fix required in post consumer (out of scope for comment system PR)

**Files to Modify:**
- `internal/atproto/jetstream/post_consumer.go` - Add reconciliation after post creation
- `tests/integration/post_consumer_test.go` - Add test for out-of-order comment reconciliation

**Similar Issue Fixed:**
- ✅ Comment reply_count reconciliation - Fixed in comment system implementation (2025-11-04)

---

## 🔴 P1.5: Federation Blockers (Beta Launch)

### Cross-PDS Write-Forward Support for Community Service
**Added:** 2025-10-17 | **Updated:** 2025-11-02 | **Effort:** 3-4 hours | **Priority:** FEDERATION BLOCKER (Beta)

**Problem:** Community service write-forward methods assume all users are on the same PDS as the Coves instance. This breaks federation when users from external PDSs try to subscribe to or block communities.

**Current Behavior:**
- User on `pds.bsky.social` subscribes to community on `coves.social`
- Coves calls `s.pdsURL` (instance default: `http://localhost:3001`)
- Write goes to WRONG PDS → fails with `{"error":"InvalidToken","message":"Malformed token"}`

**Impact:**
- ✅ **Alpha**: Works fine (single PDS deployment, no federation)
- ❌ **Beta**: Breaks federation (users on different PDSs can't subscribe/block)

**Root Cause:**
- [service.go:1033](../internal/core/communities/service.go#L1033): `createRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:1050](../internal/core/communities/service.go#L1050): `putRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:1063](../internal/core/communities/service.go#L1063): `deleteRecordOnPDSAs` hardcodes `s.pdsURL`

**Affected Operations:**
- `SubscribeToCommunity` ([service.go:608](../internal/core/communities/service.go#L608))
- `UnsubscribeFromCommunity` (calls `deleteRecordOnPDSAs`)
- `BlockCommunity` ([service.go:739](../internal/core/communities/service.go#L739))
- `UnblockCommunity` (calls `deleteRecordOnPDSAs`)

**Solution:**
1. Add `identityResolver identity.Resolver` to `communityService` struct
2. Before write-forward, resolve user's DID → extract PDS URL
3. Call user's actual PDS instead of hardcoded `s.pdsURL`

**Implementation Pattern (from Vote Service):**
```go
// Add helper method to resolve user's PDS
func (s *communityService) resolveUserPDS(ctx context.Context, userDID string) (string, error) {
	identity, err := s.identityResolver.Resolve(ctx, userDID)
	if err != nil {
		return "", fmt.Errorf("failed to resolve user PDS: %w", err)
	}
	if identity.PDSURL == "" {
		log.Printf("[COMMUNITY-PDS] WARNING: No PDS URL found for %s, using fallback: %s", userDID, s.pdsURL)
		return s.pdsURL, nil
	}
	return identity.PDSURL, nil
}

// Update write-forward methods:
func (s *communityService) createRecordOnPDSAs(ctx context.Context, repoDID, collection, rkey string, record map[string]interface{}, accessToken string) (string, string, error) {
	// Resolve user's actual PDS (critical for federation)
	pdsURL, err := s.resolveUserPDS(ctx, repoDID)
	if err != nil {
		return "", "", fmt.Errorf("failed to resolve user PDS: %w", err)
	}
	endpoint := fmt.Sprintf("%s/xrpc/com.atproto.repo.createRecord", strings.TrimSuffix(pdsURL, "/"))
	// ... rest of method
}
```

**Files to Modify:**
- `internal/core/communities/service.go` - Add resolver field + `resolveUserPDS` helper
- `internal/core/communities/service.go` - Update `createRecordOnPDSAs`, `putRecordOnPDSAs`, `deleteRecordOnPDSAs`
- `cmd/server/main.go` - Pass identity resolver to community service constructor
- Tests - Add cross-PDS subscription/block scenarios

**Testing:**
- User on external PDS subscribes to community → writes to their PDS
- User on external PDS blocks community → writes to their PDS
- Community profile updates still work (writes to community's own PDS)

**Related:**
- ✅ **Vote Service**: Fixed in Alpha (2025-11-02) - users can vote from any PDS
- 🔴 **Community Service**: Deferred to Beta (no federation in Alpha)

---

## 🟢 P2: Nice-to-Have

### Remove Categories from Community Lexicon
**Added:** 2025-10-15 | **Effort:** 30 minutes | **Priority:** Cleanup

**Problem:** Categories field exists in create/update lexicon but not in profile record. Adds complexity without clear value.

**Solution:**
- Remove `categories` from [create.json](../internal/atproto/lexicon/social/coves/community/create.json#L46-L54)
- Remove `categories` from [update.json](../internal/atproto/lexicon/social/coves/community/update.json#L51-L59)
- Remove from [community.go:91](../internal/core/communities/community.go#L91)
- Remove from service layer ([service.go:109-110](../internal/core/communities/service.go#L109-L110))

**Impact:** Simplifies lexicon, removes unused feature

---

### Improve .local TLD Error Messages
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Generic error "TLD .local is not allowed" confuses developers.

**Solution:** Enhance `InvalidHandleError` to explain root cause and suggest fixing `INSTANCE_DID`.

---

### Self-Hosting Security Guide
**Added:** 2025-10-11 | **Effort:** 1 day

**Needed:** Document did:web setup, DNS config, secrets management, rate limiting, PostgreSQL hardening, monitoring.

---

### OAuth Session Cleanup Race Condition
**Added:** 2025-10-11 | **Effort:** 2 hours

**Problem:** Cleanup goroutine doesn't handle graceful shutdown, may orphan DB connections.

**Solution:** Pass cancellable context, handle SIGTERM, add cleanup timeout.

---

### Jetstream Consumer Race Condition
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Multiple goroutines can call `close(done)` concurrently in consumer shutdown, and closing an already-closed channel panics.

**Solution:** Use `sync.Once` for channel close or atomic flag for shutdown state.

**Code:** TODO in [jetstream/user_consumer.go:114](../internal/atproto/jetstream/user_consumer.go#L114)

---

### Unfurl Cache Cleanup Background Job
**Added:** 2025-11-07 | **Effort:** 2-3 hours | **Priority:** Performance/Maintenance

**Problem:** The `unfurl_cache` table will grow indefinitely as expired entries are not deleted. While the cache uses lazy expiration (checking `expires_at` on read), old records remain in the database consuming disk space.

**Impact:**
- 📊 ~1KB per cached URL
- 📈 At 10K cached URLs = ~10MB (negligible for alpha)
- ⚠️ At 1M cached URLs = ~1GB (potential issue at scale)
- 🐌 Table bloat can slow down queries over time

**Current Mitigation:**
- ✅ Lazy expiration: Cache hits check `expires_at` and refetch if expired
- ✅ Indexed on `expires_at` for efficient expiration queries
- ✅ Not critical for alpha (growth is gradual)

**Solution (Beta/Production):**
Implement background cleanup job to delete expired entries:

```go
// Periodic cleanup (run daily or weekly)
func (r *unfurlRepository) CleanupExpired(ctx context.Context) (int64, error) {
	query := `DELETE FROM unfurl_cache WHERE expires_at < NOW()`
	result, err := r.db.ExecContext(ctx, query)
	if err != nil {
		return 0, err
	}
	return result.RowsAffected()
}
```

**Implementation Options:**
1. **Cron job**: Separate process runs cleanup on schedule
2. **Background goroutine**: Service-level background task with configurable interval
3. **PostgreSQL pg_cron extension**: Database-level scheduled cleanup

**Recommended Approach:**
- Phase 1 (Beta): Background goroutine running weekly cleanup
- Phase 2 (Production): Migrate to pg_cron or external cron for reliability

**Configuration:**
```bash
UNFURL_CACHE_CLEANUP_ENABLED=true
UNFURL_CACHE_CLEANUP_INTERVAL=168h  # 7 days
```

**Monitoring:**
- Log cleanup operations: `[UNFURL-CACHE-CLEANUP] Deleted 1234 expired entries`
- Track table size growth over time
- Alert if table exceeds threshold (e.g., 100MB)

**Files to Create:**
- `internal/core/unfurl/cleanup.go` - Background cleanup service

**Related:**
- Implemented in oEmbed unfurling feature (2025-11-07)
- Cache table: [migration XXX_create_unfurl_cache.sql](../internal/db/migrations/)

---

## 🔵 P3: Technical Debt

### Consolidate Environment Variable Validation
**Added:** 2025-10-11 | **Effort:** 2-3 hours

Create `internal/config` package with structured config validation. Fail fast with clear errors.

---

### Add Connection Pooling for PDS HTTP Clients
**Added:** 2025-10-11 | **Effort:** 2 hours

Create shared `http.Client` with connection pooling instead of new client per request.

---

### Architecture Decision Records (ADRs)
**Added:** 2025-10-11 | **Effort:** Ongoing

Document: did:plc choice, pgcrypto encryption, Jetstream vs firehose, write-forward pattern, single handle field.

---

### Replace log Package with Structured Logger
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** Using standard `log` package. Need structured logging (JSON) with levels.

**Solution:** Switch to `slog`, `zap`, or `zerolog`. Add request IDs, context fields.

**Code:** TODO in [community/errors.go:46](../internal/api/handlers/community/errors.go#L46)

---

### PDS URL Resolution from DID
**Added:** 2025-10-11 | **Effort:** 2-3 hours

**Problem:** User consumer doesn't resolve PDS URL from DID document when missing.

**Solution:** Query PLC directory for DID document, extract `serviceEndpoint`.

**Code:** TODO in [jetstream/user_consumer.go:203](../internal/atproto/jetstream/user_consumer.go#L203)
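
The extraction step can be sketched independently of the HTTP fetch. In atProto DID documents the PDS is listed as a service whose `id` ends in `#atproto_pds` and whose `type` is `AtprotoPersonalDataServer`. The function name below is hypothetical, and fetching the document (for `did:plc`, from the PLC directory) is left to the caller.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// didDocument models only the fields needed to locate the PDS.
type didDocument struct {
	ID      string `json:"id"`
	Service []struct {
		ID              string `json:"id"`
		Type            string `json:"type"`
		ServiceEndpoint string `json:"serviceEndpoint"`
	} `json:"service"`
}

// pdsEndpointFromDIDDocument returns the serviceEndpoint of the atProto PDS
// entry in an already-fetched DID document.
func pdsEndpointFromDIDDocument(body []byte) (string, error) {
	var doc didDocument
	if err := json.Unmarshal(body, &doc); err != nil {
		return "", err
	}
	for _, svc := range doc.Service {
		// The id may be relative ("#atproto_pds") or fully qualified
		// ("did:plc:...#atproto_pds").
		if strings.HasSuffix(svc.ID, "#atproto_pds") && svc.Type == "AtprotoPersonalDataServer" {
			return svc.ServiceEndpoint, nil
		}
	}
	return "", errors.New("DID document has no atproto_pds service entry")
}
```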

---

## Recent Completions

### ✅ Token Refresh for Community Credentials (2025-10-17)
**Completed:** Automatic token refresh prevents communities from breaking after 2 hours

**Implementation:**
- ✅ JWT expiration parsing and refresh detection (5-minute buffer)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (`atproto.ServerCreateSession`)
- ✅ Atomic credential updates in database (`UpdateCredentials`)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go)
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go)
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go)

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` method
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md)

**Impact:** Communities now work indefinitely without manual token management

---

### ✅ OAuth Authentication for Community Actions (2025-10-16)
**Completed:** Full OAuth JWT authentication flow for protected endpoints

**Implementation:**
- ✅ JWT parser compatible with atProto PDS tokens (aud/iss handling)
- ✅ Auth middleware protecting create/update/subscribe/unsubscribe endpoints
- ✅ Handler-level DID extraction from JWT tokens via `middleware.GetUserDID(r)`
- ✅ Removed all X-User-DID header placeholders
- ✅ E2E tests validate complete OAuth flow with real PDS tokens
- ✅ Security: Issuer validation supports both HTTPS URLs and DIDs

**Files Modified:**
- [internal/atproto/auth/jwt.go](../internal/atproto/auth/jwt.go) - JWT parsing with atProto compatibility
- [internal/api/middleware/auth.go](../internal/api/middleware/auth.go) - Auth middleware
- [internal/api/handlers/community/](../internal/api/handlers/community/) - All handlers updated
- [tests/integration/community_e2e_test.go](../tests/integration/community_e2e_test.go) - OAuth E2E tests

**Related:** Also implemented `hostedByDID` auto-population for security (see P1 item above)

---

### ✅ Fix .local TLD Bug (2025-10-11)
Changed default `INSTANCE_DID` from `did:web:coves.local` → `did:web:coves.social`. Fixed community creation failure due to disallowed `.local` TLD.

---

## Prioritization

- **P0:** Security vulns, data loss, prod blockers
- **P1:** Major UX/reliability issues
- **P2:** QOL improvements, minor bugs, docs
- **P3:** Refactoring, code quality