# Backlog PRD: Platform Improvements & Technical Debt

**Status:** Ongoing
**Owner:** Platform Team
**Last Updated:** 2025-10-17

## Overview

Miscellaneous platform improvements, bug fixes, and technical debt that don't fit into feature-specific PRDs.

---

## 🔴 P0: Critical (Alpha Blockers)

### OAuth DPoP Token Architecture - Voting Write-Forward
**Added:** 2025-11-02 | **Completed:** 2025-11-02 | **Effort:** 2 hours | **Priority:** ALPHA BLOCKER
**Status:** ✅ COMPLETE

**Problem:**
Our backend was attempting to use DPoP-bound OAuth tokens to write votes to users' PDSs, causing "Malformed token" errors. This violates atProto architecture patterns.

**Current (Incorrect) Flow:**
```
Mobile Client (OAuth + DPoP) → Coves Backend → User's PDS ❌
                                                    ↓
                                          "Malformed token" error
```

**Root Cause:**
- Mobile app uses OAuth with DPoP (Demonstrating Proof of Possession)
- DPoP tokens are cryptographically bound to the client's key pair via the `cnf.jkt` claim (a SHA-256 JWK thumbprint of the client's public key)
- Each PDS request requires **both**:
  - `Authorization: DPoP <token>` (DPoP-bound tokens use the `DPoP` scheme, not `Bearer`)
  - `DPoP: <signed-proof-jwt>` (signature proves the client holds the private key)
- Backend cannot create DPoP proofs (it doesn't have the client's private key)
- **DPoP tokens are intentionally non-transferable** (security feature to prevent token theft)

**Evidence:**
```json
// Token decoded from mobile app session
{
  "sub": "did:plc:txrork7rurdueix27ulzi7ke",
  "cnf": {
    "jkt": "LSWROJhTkPn4yT18xUjiIz2Z7z7l_gozKfjjQTYgW9o"  // ← DPoP binding
  },
  "client_id": "https://lingering-darkness-50a6.brettmay0212.workers.dev/client-metadata.json",
  "iss": "http://localhost:3001"
}
```

**atProto Best Practice (from Bluesky social-app analysis):**
- ✅ Clients write **directly to their own PDS** (no backend proxy)
- ✅ AppView **only indexes** from Jetstream (eventual consistency)
- ✅ PDS = User's personal data store (user controls writes)
- ✅ AppView = Read-only aggregator/indexer
- ❌ Backend should NOT proxy user write operations

**Correct Architecture:**
```
Mobile Client → User's PDS (direct write with DPoP proof) ✓
                     ↓
              Jetstream (firehose)
                     ↓
       Coves AppView (indexes votes from firehose)
```

**Affected Endpoints:**
1. **Vote Creation** - [create_vote.go:76](../internal/api/handlers/vote/create_vote.go#L76)
   - Currently: Backend writes to PDS using user's token
   - Should: Return error directing client to write directly

2. **Vote Service** - [service.go:126](../internal/core/votes/service.go#L126)
   - Currently: `createRecordOnPDSAs()` attempts write-forward
   - Should: Remove write-forward, rely on Jetstream indexing only

**Solution Options:**

**Option A: Client Direct Write (RECOMMENDED - Follows Bluesky)**
```typescript
// Mobile client writes directly (like Bluesky social-app).
// createRecord is a procedure: the record goes in the request body
// (third argument to agent.call), not in the query params (second argument).
const agent = new Agent(oauthSession)
await agent.call('com.atproto.repo.createRecord', {}, {
  repo: userDid,
  collection: 'social.coves.interaction.vote',
  record: {
    $type: 'social.coves.interaction.vote',
    subject: { uri: postUri, cid: postCid },
    direction: 'up',
    createdAt: new Date().toISOString()
  }
})
```

Backend changes:
- Remove write-forward code from vote service
- Return error from XRPC endpoint: "Votes must be created directly at your PDS"
- Index votes from Jetstream consumer (already implemented)

**Option B: Backend App Passwords (NOT RECOMMENDED)**
- User creates app-specific password
- Backend uses password auth (gets regular JWTs, not DPoP)
- Security downgrade, poor UX

**Option C: Service Auth Token (Complex)**
- Backend gets its own service credentials
- Requires PDS to trust our AppView as delegated writer
- Non-standard atProto pattern

**Recommendation:** Option A (Client Direct Write)
- Matches atProto architecture
- Follows Bluesky social-app pattern
- Best security (user controls their data)
- Simplest implementation

**Implementation Tasks:**
1. Update Flutter OAuth package to expose `agent.call()` for custom lexicons
2. Update mobile vote UI to write directly to PDS
3. Remove write-forward code from backend vote service
4. Update vote XRPC handler to return helpful error message
5. Verify Jetstream consumer correctly indexes votes
6. Update integration tests to match new flow

**References:**
- Bluesky social-app: Direct PDS writes via agent
- atProto OAuth spec: DPoP binding prevents token reuse
- atProto architecture: AppView = read-only indexer

---

### OAuth DPoP Token Architecture - Community Subscriptions
**Added:** 2025-11-02 | **Effort:** 1-2 hours | **Priority:** ALPHA BLOCKER
**Status:** 📋 TODO (Waiting for frontend implementation)

**Problem:**
Same DPoP token issue as voting: the backend cannot use the user's DPoP-bound OAuth tokens to write subscription records to the user's PDS.

**Affected Operations:**
- `SubscribeToCommunity()` - [service.go:564-624](../internal/core/communities/service.go#L564-L624)
- `UnsubscribeFromCommunity()` - [service.go:626-660](../internal/core/communities/service.go#L626-L660)

**Collection:** `social.coves.community.subscription`

**Solution:**
Client writes directly using `com.atproto.repo.createRecord` (and removes the record with `com.atproto.repo.deleteRecord` to unsubscribe):
```typescript
// Record goes in the request body (third argument), not the query params.
await agent.call('com.atproto.repo.createRecord', {}, {
  repo: userDid,
  collection: 'social.coves.community.subscription',
  record: {
    $type: 'social.coves.community.subscription',
    subject: communityDid,
    contentVisibility: 3,
    createdAt: new Date().toISOString()
  }
})
```

**Backend Changes Needed:**
1. Remove write-forward from `SubscribeToCommunity()` and `UnsubscribeFromCommunity()`
2. Update handlers to return errors directing to client-direct pattern
3. Verify Jetstream consumer indexes subscriptions (already working)
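
To illustrate step 3, this Go sketch shows how a consumer might pick subscription creates out of a Jetstream commit event. It assumes Jetstream's JSON event shape (`did`, `kind`, and a `commit` object with `operation`, `collection`, `rkey`, `record`) and the record fields from the client snippet in this section. The type and function names are hypothetical, and only `create` operations are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jetstreamEvent covers the subset of a Jetstream event used here.
type jetstreamEvent struct {
	DID    string `json:"did"`
	Kind   string `json:"kind"`
	Commit *struct {
		Operation  string          `json:"operation"`
		Collection string          `json:"collection"`
		RKey       string          `json:"rkey"`
		Record     json.RawMessage `json:"record"`
	} `json:"commit"`
}

// subscriptionRecord mirrors the fields the client writes.
type subscriptionRecord struct {
	Subject           string `json:"subject"`
	ContentVisibility int    `json:"contentVisibility"`
	CreatedAt         string `json:"createdAt"`
}

const subscriptionCollection = "social.coves.community.subscription"

// parseSubscriptionCreate returns the subscriber DID and record for a
// subscription create event, or ok=false for any other event.
func parseSubscriptionCreate(raw []byte) (string, subscriptionRecord, bool, error) {
	var rec subscriptionRecord
	var ev jetstreamEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return "", rec, false, err
	}
	// Identity/account events have no commit; other collections are not ours.
	if ev.Kind != "commit" || ev.Commit == nil ||
		ev.Commit.Collection != subscriptionCollection || ev.Commit.Operation != "create" {
		return "", rec, false, nil
	}
	if err := json.Unmarshal(ev.Commit.Record, &rec); err != nil {
		return "", rec, false, err
	}
	return ev.DID, rec, true, nil
}

func main() {
	raw := []byte(`{"did":"did:plc:user123","kind":"commit",
		"commit":{"operation":"create","collection":"social.coves.community.subscription","rkey":"3labc",
		"record":{"subject":"did:plc:community456","contentVisibility":3,"createdAt":"2025-11-02T00:00:00Z"}}}`)
	sub, rec, ok, err := parseSubscriptionCreate(raw)
	fmt.Println(sub, rec.Subject, rec.ContentVisibility, ok, err)
}
```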

**Files to Modify:**
- `internal/core/communities/service.go`
- `internal/api/handlers/community/subscribe.go`

---

### OAuth DPoP Token Architecture - Community Blocking
**Added:** 2025-11-02 | **Effort:** 1-2 hours | **Priority:** ALPHA BLOCKER
**Status:** 📋 TODO (Waiting for frontend implementation)

**Problem:**
Same DPoP token issue: the backend cannot use the user's DPoP-bound OAuth tokens to write block records to the user's PDS.

**Affected Operations:**
- `BlockCommunity()` - [service.go:709-781](../internal/core/communities/service.go#L709-L781)
- `UnblockCommunity()` - [service.go:783-816](../internal/core/communities/service.go#L783-L816)

**Collection:** `social.coves.community.block`

**Solution:**
Client writes directly using `com.atproto.repo.createRecord` (and removes the record with `com.atproto.repo.deleteRecord` to unblock):
```typescript
// Record goes in the request body (third argument), not the query params.
await agent.call('com.atproto.repo.createRecord', {}, {
  repo: userDid,
  collection: 'social.coves.community.block',
  record: {
    $type: 'social.coves.community.block',
    subject: communityDid,
    createdAt: new Date().toISOString()
  }
})
```

**Backend Changes Needed:**
1. Remove write-forward from `BlockCommunity()` and `UnblockCommunity()`
2. Update handlers to return errors directing to client-direct pattern
3. Verify Jetstream consumer indexes blocks (already working)

**Files to Modify:**
- `internal/core/communities/service.go`
- `internal/api/handlers/community/block.go`

---

## 🟡 P1: Important (Alpha Blockers)

### at-identifier Handle Resolution in Endpoints
**Added:** 2025-10-18 | **Effort:** 2-3 hours | **Priority:** ALPHA BLOCKER

**Problem:**
The current implementation rejects handles in endpoints that declare `"format": "at-identifier"` in their lexicon schemas, violating atProto best practices and breaking legitimate client usage.

**Impact:**
- ❌ Post creation fails when client sends community handle (e.g., `!gardening.communities.coves.social`)
- ❌ Subscribe/unsubscribe endpoints reject handles despite lexicon declaring `at-identifier`
- ❌ Block endpoints use `"format": "did"` but should use `at-identifier` for consistency
- 🔴 **P0 Issue:** API contract violation - clients following the schema are rejected

**Root Cause:**
Handlers and services validate `strings.HasPrefix(req.Community, "did:")` instead of calling `ResolveCommunityIdentifier()`.

**Affected Endpoints:**
1. **Post Creation** - [create.go:54](../internal/api/handlers/post/create.go#L54), [service.go:51](../internal/core/posts/service.go#L51)
   - Lexicon declares `at-identifier`: [post/create.json:16](../internal/atproto/lexicon/social/coves/post/create.json#L16)

2. **Subscribe** - [subscribe.go:52](../internal/api/handlers/community/subscribe.go#L52)
   - Lexicon declares `at-identifier`: [subscribe.json:16](../internal/atproto/lexicon/social/coves/community/subscribe.json#L16)

3. **Unsubscribe** - [subscribe.go:120](../internal/api/handlers/community/subscribe.go#L120)
   - Lexicon declares `at-identifier`: [unsubscribe.json:16](../internal/atproto/lexicon/social/coves/community/unsubscribe.json#L16)

4. **Block/Unblock** - [block.go:58](../internal/api/handlers/community/block.go#L58), [block.go:132](../internal/api/handlers/community/block.go#L132)
   - Lexicon declares `"format": "did"`: [block.json:15](../internal/atproto/lexicon/social/coves/community/block.json#L15)
   - Should be changed to `at-identifier` for consistency and best practice

**atProto Best Practice (from docs):**
- ✅ API endpoints should accept both DIDs and handles via `at-identifier` format
- ✅ Resolve handles to DIDs immediately at API boundary
- ✅ Use DIDs internally for all business logic and storage
- ✅ Handles are weak refs (changeable), DIDs are strong refs (permanent)
- ⚠️ Bidirectional verification required (already handled by `identity.CachingResolver`)

**Solution:**
Replace direct DID validation with handle resolution using existing `ResolveCommunityIdentifier()`:

```go
// BEFORE (wrong) ❌: rejects valid handles
if !strings.HasPrefix(req.Community, "did:") {
	writeError(w, http.StatusBadRequest, "InvalidRequest", "community must be a DID")
	return
}

// AFTER (correct) ✅
communityDID, err := h.communityService.ResolveCommunityIdentifier(ctx, req.Community)
if err != nil {
	if communities.IsNotFound(err) {
		writeError(w, http.StatusNotFound, "CommunityNotFound", "Community not found")
		return
	}
	writeError(w, http.StatusBadRequest, "InvalidRequest", err.Error())
	return
}
// Now use communityDID (guaranteed to be a DID)
```
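
Before resolution, the handler only needs to tell the accepted input shapes apart. This Go sketch is a hypothetical pre-parse step. It does not replace `ResolveCommunityIdentifier()` and performs no network lookups. It assumes the three formats named in this section: DIDs, handles with an optional `!` or `@` prefix, and the scoped `!name@instance` form.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type identifierKind int

const (
	kindDID identifierKind = iota
	kindHandle
	kindScoped
)

// classifyCommunityIdentifier (hypothetical) normalizes input and reports its shape.
func classifyCommunityIdentifier(input string) (identifierKind, string, error) {
	s := strings.TrimSpace(input)
	if s == "" {
		return 0, "", errors.New("empty identifier")
	}
	if strings.HasPrefix(s, "did:") {
		return kindDID, s, nil // pass DIDs through unchanged
	}
	// Strip a single leading display sigil ("!" or "@").
	if s[0] == '!' || s[0] == '@' {
		s = s[1:]
	}
	if strings.Contains(s, "@") {
		// Scoped form, e.g. "!gaming@coves.social".
		name, instance, _ := strings.Cut(s, "@")
		if name == "" || instance == "" || strings.Contains(instance, "@") {
			return 0, "", fmt.Errorf("invalid scoped identifier %q", input)
		}
		return kindScoped, strings.ToLower(s), nil
	}
	if !strings.Contains(s, ".") {
		return 0, "", fmt.Errorf("invalid handle %q", input)
	}
	return kindHandle, strings.ToLower(s), nil // handles are case-insensitive
}

func main() {
	for _, in := range []string{"did:plc:abc123", "!gardening.communities.coves.social", "!gaming@coves.social"} {
		kind, norm, err := classifyCommunityIdentifier(in)
		fmt.Println(in, kind, norm, err)
	}
}
```

Handles are lowercased because atProto handles are case-insensitive. DIDs are left exactly as received.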

**Implementation Plan:**
1. ✅ **Phase 1 (Alpha Blocker):** Fix post creation endpoint - COMPLETE (2025-10-18)
   - Post creation already uses `ResolveCommunityIdentifier()` at [service.go:100](../internal/core/posts/service.go#L100)
   - Supports handles, DIDs, and scoped formats

2. 📋 **Phase 2 (Beta):** Fix subscription endpoints
   - Update subscribe/unsubscribe handlers
   - Add tests for handle resolution in subscriptions

3. ✅ **Phase 3 (Beta):** Fix block endpoints - COMPLETE (2025-11-16)
   - Updated block/unblock handlers to use `ResolveCommunityIdentifier()`
   - Accepts handles (`@gaming.community.coves.social`), DIDs, and scoped format (`!gaming@coves.social`)
   - Added comprehensive tests: [block_handle_resolution_test.go](../tests/integration/block_handle_resolution_test.go)
   - All 7 test cases passing

**Files Modified (Phase 3 - Block Endpoints):**
- `internal/api/handlers/community/block.go` - Added `ResolveCommunityIdentifier()` calls
- `tests/integration/block_handle_resolution_test.go` - Comprehensive test coverage

**Existing Infrastructure:**
- ✅ `ResolveCommunityIdentifier()` already implemented at [service.go:852](../internal/core/communities/service.go#L852)
- ✅ `identity.CachingResolver` handles bidirectional verification and caching
- ✅ Supports both handle (`!name.communities.instance.com`) and DID formats

**Current Status:**
- ✅ Phase 1 (post creation) - Already implemented
- 📋 Phase 2 (subscriptions) - Deferred to Beta (lower priority)
- ✅ Phase 3 (block endpoints) - COMPLETE (2025-11-16)

---

### ✅ did:web Domain Verification & hostedByDID Auto-Population - COMPLETE
**Added:** 2025-10-11 | **Updated:** 2025-11-16 | **Completed:** 2025-11-16 | **Status:** ✅ DONE

**Problem:**
1. **Domain Impersonation**: Self-hosters can set `INSTANCE_DID=did:web:nintendo.com` without owning the domain, enabling attacks where communities appear hosted by trusted domains
2. **hostedByDID Spoofing**: Malicious instance operators can modify source code to claim communities are hosted by domains they don't own, enabling reputation hijacking and phishing

**Attack Scenarios:**
- Malicious instance sets `instanceDID="did:web:coves.social"` → communities show as hosted by official Coves
- Federation partners can't verify instance authenticity
- AppView pollution with fake hosting claims

**Solution Implemented (Bluesky-Compatible):**
1. ✅ **Domain Matching**: Verify `did:web:` domain matches configured `instanceDomain`
2. ✅ **Bidirectional Verification**: Fetch `https://domain/.well-known/did.json` and verify:
   - DID document exists and is valid
   - DID document ID matches claimed `instanceDID`
   - DID document claims handle domain in `alsoKnownAs` field (bidirectional binding)
   - Domain ownership proven via HTTPS hosting (matches Bluesky's trust model)
3. ✅ **Auto-populate hostedByDID**: Removed from client API, derived from instance configuration in service layer
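
This minimal Go sketch covers the bidirectional check in step 2. It is split into a URL mapping and a pure verification function so it can be tested without network access. It assumes a domain-only `did:web`, so path-based DIDs are rejected. It also assumes the handle claim is stored in `alsoKnownAs` as `at://<domain>`. The function names are hypothetical, and the sketch omits the HTTPS fetch, caching, and rate limiting.

```go
package main

import (
	"encoding/json"
	"fmt"
	"slices"
	"strings"
)

// didWebURL maps a domain-only did:web identifier to its DID document URL:
// did:web:example.com -> https://example.com/.well-known/did.json.
func didWebURL(did string) (string, error) {
	domain, ok := strings.CutPrefix(did, "did:web:")
	if !ok || domain == "" || strings.Contains(domain, ":") {
		// Extra ":"-separated segments would be a path-based did:web (not handled here).
		return "", fmt.Errorf("unsupported did:web identifier %q", did)
	}
	// A port is percent-encoded in the DID (did:web:localhost%3A8080).
	host := strings.ReplaceAll(domain, "%3A", ":")
	return "https://" + host + "/.well-known/did.json", nil
}

// didDocument holds only the fields checked below.
type didDocument struct {
	ID          string   `json:"id"`
	AlsoKnownAs []string `json:"alsoKnownAs"`
}

// verifyDIDDocument checks a fetched document against the claimed DID and
// the instance's handle domain (both directions of the binding).
func verifyDIDDocument(body []byte, claimedDID, handleDomain string) error {
	var doc didDocument
	if err := json.Unmarshal(body, &doc); err != nil {
		return fmt.Errorf("invalid DID document: %w", err)
	}
	if doc.ID != claimedDID {
		return fmt.Errorf("DID document id %q does not match %q", doc.ID, claimedDID)
	}
	if !slices.Contains(doc.AlsoKnownAs, "at://"+handleDomain) {
		return fmt.Errorf("DID document does not claim handle %q", handleDomain)
	}
	return nil
}

func main() {
	u, err := didWebURL("did:web:coves.social")
	fmt.Println(u, err)
	doc := []byte(`{"id":"did:web:coves.social","alsoKnownAs":["at://coves.social"]}`)
	fmt.Println(verifyDIDDocument(doc, "did:web:coves.social", "coves.social"))
}
```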

**Current Status:**
- ✅ Default changed from `coves.local` → `coves.social` (fixes `.local` TLD bug)
- ✅ hostedByDID removed from client requests (2025-10-16)
- ✅ Service layer auto-populates `hostedByDID` from `instanceDID` (2025-10-16)
- ✅ Handler rejects client-provided `hostedByDID` (2025-10-16)
- ✅ Basic validation: Logs warning if `did:web:` domain ≠ `instanceDomain` (2025-10-16)
- ✅ **MANDATORY bidirectional DID verification** (2025-11-16)
- ✅ Cache TTL updated to 24h (matches Bluesky recommendations) (2025-11-16)

**Implementation Details:**
- **Security Model**: Matches Bluesky's approach - relies on DNS/HTTPS authority, not cryptographic proof
- **Enforcement**: MANDATORY hard-fail in production (rejects communities with verification failures)
- **Dev Mode**: Set `SKIP_DID_WEB_VERIFICATION=true` to bypass verification for local development
- **Performance**: Bounded LRU cache (1000 entries), rate limiting (10 req/s), 24h cache TTL
- **Bidirectional Check**: Prevents impersonation by requiring DID document to claim the handle
- **Location**: [internal/atproto/jetstream/community_consumer.go](../internal/atproto/jetstream/community_consumer.go)

---

### ✅ Token Refresh Logic for Community Credentials - COMPLETE
**Added:** 2025-10-11 | **Completed:** 2025-10-17 | **Effort:** 1.5 days | **Status:** ✅ DONE

**Problem:** Community PDS access tokens expire (~2hrs). Updates fail until manual intervention.

**Solution Implemented:**
- ✅ Automatic token refresh before PDS operations (5-minute buffer before expiration)
- ✅ JWT expiration parsing without signature verification (`parseJWTExpiration`, `needsRefresh`)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (~2 months) via `atproto.ServerCreateSession`
- ✅ Atomic credential updates (`UpdateCredentials` repository method)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for token expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go) - JWT parsing utilities
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go) - Refresh and re-auth logic
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go) - Integration tests

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` + concurrency control
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** See [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md) for full details

**Impact:** ✅ Communities can now be updated 24+ hours after creation without manual intervention

---

### ✅ Subscription Visibility Level (Feed Slider 1-5 Scale) - COMPLETE
**Added:** 2025-10-15 | **Completed:** 2025-10-16 | **Effort:** 1 day | **Status:** ✅ DONE

**Problem:** Users couldn't control how much content they see from each community. Lexicon had `contentVisibility` (1-5 scale) but code didn't use it.

**Solution Implemented:**
- ✅ Updated subscribe handler to accept `contentVisibility` parameter (1-5, default 3)
- ✅ Store in subscription record on PDS (`social.coves.community.subscription`)
- ✅ Migration 008 adds `content_visibility` column to database with CHECK constraint
- ✅ Clamping at all layers (handler, service, consumer) for defense in depth
- ✅ Atomic subscriber count updates (SubscribeWithCount/UnsubscribeWithCount)
- ✅ Idempotent operations (safe for Jetstream event replays)
- ✅ Fixed critical collection name bug (was using wrong namespace)
- ✅ Production Jetstream consumer now running
- ✅ 13 comprehensive integration tests - all passing
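
This sketch shows the clamping rule that could be shared by all three layers. It assumes a zero value means the field was omitted, so it falls back to the default of 3. Any other out-of-range value is pinned to the nearest bound. The constant and function names are hypothetical:

```go
package main

import "fmt"

const (
	minContentVisibility     = 1
	maxContentVisibility     = 5
	defaultContentVisibility = 3
)

// clampContentVisibility normalizes contentVisibility to the 1-5 range.
// 0 (absent) maps to the default; other out-of-range values are clamped.
func clampContentVisibility(v int) int {
	switch {
	case v == 0:
		return defaultContentVisibility
	case v < minContentVisibility:
		return minContentVisibility
	case v > maxContentVisibility:
		return maxContentVisibility
	default:
		return v
	}
}

func main() {
	for _, v := range []int{0, -2, 1, 3, 5, 9} {
		fmt.Println(v, "->", clampContentVisibility(v))
	}
}
```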

**Files Modified:**
- Lexicon: [subscription.json](../internal/atproto/lexicon/social/coves/community/subscription.json) ✅ Updated to atProto conventions
- Handler: [community/subscribe.go](../internal/api/handlers/community/subscribe.go) ✅ Accepts contentVisibility
- Service: [communities/service.go](../internal/core/communities/service.go) ✅ Clamps and passes to PDS
- Consumer: [community_consumer.go](../internal/atproto/jetstream/community_consumer.go) ✅ Extracts and indexes
- Repository: [community_repo_subscriptions.go](../internal/db/postgres/community_repo_subscriptions.go) ✅ All queries updated
- Migration: [008_add_content_visibility_to_subscriptions.sql](../internal/db/migrations/008_add_content_visibility_to_subscriptions.sql) ✅ Schema changes
- Tests: [subscription_indexing_test.go](../tests/integration/subscription_indexing_test.go) ✅ Comprehensive coverage

**Documentation:** See [IMPLEMENTATION_SUBSCRIPTION_INDEXING.md](../docs/IMPLEMENTATION_SUBSCRIPTION_INDEXING.md) for full details

**Impact:** ✅ Users can now adjust feed volume per community (enables a key feature described in DOMAIN_KNOWLEDGE.md)

---

### Community Blocking
**Added:** 2025-10-15 | **Effort:** 1 day | **Priority:** ALPHA BLOCKER

**Problem:** Users have no way to block unwanted communities from their feeds.

**Solution:**
1. **Lexicon:** Extend `social.coves.actor.block` to support community DIDs (currently user-only)
2. **Service:** Implement `BlockCommunity(userDID, communityDID)` and `UnblockCommunity()`
3. **Handlers:** Add XRPC endpoints `social.coves.community.block` and `unblock`
4. **Repository:** Add methods to track blocked communities
5. **Feed:** Filter blocked communities from feed queries (beta work)

**Code:**
- Lexicon: [actor/block.json](../internal/atproto/lexicon/social/coves/actor/block.json) - Currently only supports user DIDs
- Service: New methods needed
- Handlers: New files needed

**Impact:** Without blocking, users have no way to avoid content from unwanted communities

---

### ✅ Post comment_count Reconciliation - COMPLETE
**Added:** 2025-11-04 | **Completed:** 2025-11-16 | **Effort:** 2 hours | **Status:** ✅ DONE

**Problem:**
When comments arrive before their parent post is indexed (common with cross-repo Jetstream ordering), the post's `comment_count` was never reconciled, causing posts to show permanently stale "0 comments" counters.

**Solution Implemented:**
- ✅ Post consumer reconciliation logic was already implemented at [post_consumer.go:210-226](../internal/atproto/jetstream/post_consumer.go#L210-L226)
- ✅ Reconciliation query counts pre-existing comments when indexing new posts
- ✅ Comprehensive test suite added: [post_consumer_test.go](../tests/integration/post_consumer_test.go)
  - Single comment before post
  - Multiple comments before post
  - Mixed before/after ordering
  - Idempotent indexing preserves counts
- ✅ Updated outdated FIXME comment at [comment_consumer.go:362](../internal/atproto/jetstream/comment_consumer.go#L362)
- ✅ All 4 test cases passing

**Implementation:**
```go
// Post consumer reconciliation (lines 210-226)
reconcileQuery := `
	UPDATE posts
	SET comment_count = (
		SELECT COUNT(*)
		FROM comments c
		WHERE c.parent_uri = $1 AND c.deleted_at IS NULL
	)
	WHERE id = $2
`
_, reconcileErr := tx.ExecContext(ctx, reconcileQuery, post.URI, postID)
```

**Files Modified:**
- `internal/atproto/jetstream/comment_consumer.go` - Updated documentation
- `tests/integration/post_consumer_test.go` - Added comprehensive test coverage

**Impact:** ✅ Post comment counters are now accurate regardless of Jetstream event ordering

---

## 🔴 P1.5: Federation Blockers (Beta Launch)

### Cross-PDS Write-Forward Support for Community Service
**Added:** 2025-10-17 | **Updated:** 2025-11-02 | **Effort:** 3-4 hours | **Priority:** FEDERATION BLOCKER (Beta)

**Problem:** Community service write-forward methods assume all users are on the same PDS as the Coves instance. This breaks federation when users from external PDSs try to subscribe to or block communities.

**Current Behavior:**
- User on `pds.bsky.social` subscribes to community on `coves.social`
- Coves calls `s.pdsURL` (instance default: `http://localhost:3001`)
- Write goes to WRONG PDS → fails with `{"error":"InvalidToken","message":"Malformed token"}`

**Impact:**
- ✅ **Alpha**: Works fine (single PDS deployment, no federation)
- ❌ **Beta**: Breaks federation (users on different PDSs can't subscribe/block)

**Root Cause:**
- [service.go:1033](../internal/core/communities/service.go#L1033): `createRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:1050](../internal/core/communities/service.go#L1050): `putRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:1063](../internal/core/communities/service.go#L1063): `deleteRecordOnPDSAs` hardcodes `s.pdsURL`

**Affected Operations:**
- `SubscribeToCommunity` ([service.go:608](../internal/core/communities/service.go#L608))
- `UnsubscribeFromCommunity` (calls `deleteRecordOnPDSAs`)
- `BlockCommunity` ([service.go:739](../internal/core/communities/service.go#L739))
- `UnblockCommunity` (calls `deleteRecordOnPDSAs`)

**Solution:**
1. Add `identityResolver identity.Resolver` to `communityService` struct
2. Before write-forward, resolve user's DID → extract PDS URL
3. Call user's actual PDS instead of hardcoded `s.pdsURL`

**Implementation Pattern (from Vote Service):**
```go
// Add helper method to resolve user's PDS
func (s *communityService) resolveUserPDS(ctx context.Context, userDID string) (string, error) {
	// Named "ident" so it does not shadow the identity package.
	ident, err := s.identityResolver.Resolve(ctx, userDID)
	if err != nil {
		return "", fmt.Errorf("failed to resolve user PDS: %w", err)
	}
	if ident.PDSURL == "" {
		log.Printf("[COMMUNITY-PDS] WARNING: No PDS URL found for %s, using fallback: %s", userDID, s.pdsURL)
		return s.pdsURL, nil
	}
	return ident.PDSURL, nil
}

// Update write-forward methods:
func (s *communityService) createRecordOnPDSAs(ctx context.Context, repoDID, collection, rkey string, record map[string]interface{}, accessToken string) (string, string, error) {
	// Resolve user's actual PDS (critical for federation)
	pdsURL, err := s.resolveUserPDS(ctx, repoDID)
	if err != nil {
		return "", "", err // already wrapped by resolveUserPDS
	}
	endpoint := fmt.Sprintf("%s/xrpc/com.atproto.repo.createRecord", strings.TrimSuffix(pdsURL, "/"))
	// ... rest of method
}
```

**Files to Modify:**
- `internal/core/communities/service.go` - Add resolver field + `resolveUserPDS` helper
- `internal/core/communities/service.go` - Update `createRecordOnPDSAs`, `putRecordOnPDSAs`, `deleteRecordOnPDSAs`
- `cmd/server/main.go` - Pass identity resolver to community service constructor
- Tests - Add cross-PDS subscription/block scenarios

**Testing:**
- User on external PDS subscribes to community → writes to their PDS
- User on external PDS blocks community → writes to their PDS
- Community profile updates still work (writes to community's own PDS)

**Related:**
- ✅ **Vote Service**: Fixed in Alpha (2025-11-02) - users can vote from any PDS
- 🔴 **Community Service**: Deferred to Beta (no federation in Alpha)

---

## 🟢 P2: Nice-to-Have

### Remove Categories from Community Lexicon
**Added:** 2025-10-15 | **Effort:** 30 minutes | **Priority:** Cleanup

**Problem:** The `categories` field exists in the create/update lexicons but not in the profile record, adding complexity without clear value.

**Solution:**
- Remove `categories` from [create.json](../internal/atproto/lexicon/social/coves/community/create.json#L46-L54)
- Remove `categories` from [update.json](../internal/atproto/lexicon/social/coves/community/update.json#L51-L59)
- Remove from [community.go:91](../internal/core/communities/community.go#L91)
- Remove from service layer ([service.go:109-110](../internal/core/communities/service.go#L109-L110))

**Impact:** Simplifies lexicon, removes unused feature

---

### Improve .local TLD Error Messages
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Generic error "TLD .local is not allowed" confuses developers.

**Solution:** Enhance `InvalidHandleError` to explain the root cause and suggest fixing `INSTANCE_DID`.

---

### Self-Hosting Security Guide
**Added:** 2025-10-11 | **Effort:** 1 day

**Needed:** Document did:web setup, DNS config, secrets management, rate limiting, PostgreSQL hardening, monitoring.

---

### OAuth Session Cleanup Race Condition
**Added:** 2025-10-11 | **Effort:** 2 hours

**Problem:** The cleanup goroutine doesn't handle graceful shutdown and may orphan DB connections.

**Solution:** Pass a cancellable context, handle SIGTERM, and add a cleanup timeout.

---

### Jetstream Consumer Race Condition
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Multiple goroutines can call `close(done)` concurrently during consumer shutdown; closing an already-closed channel panics.

**Solution:** Use `sync.Once` for the channel close, or an atomic flag for shutdown state.

**Code:** TODO in [jetstream/user_consumer.go:114](../internal/atproto/jetstream/user_consumer.go#L114)

---

### Unfurl Cache Cleanup Background Job
**Added:** 2025-11-07 | **Effort:** 2-3 hours | **Priority:** Performance/Maintenance

**Problem:** The `unfurl_cache` table will grow indefinitely as expired entries are not deleted. While the cache uses lazy expiration (checking `expires_at` on read), old records remain in the database consuming disk space.

**Impact:**
- 📊 ~1KB per cached URL
- 📈 At 10K cached URLs = ~10MB (negligible for alpha)
- ⚠️ At 1M cached URLs = ~1GB (potential issue at scale)
- 🐌 Table bloat can slow down queries over time

**Current Mitigation:**
- ✅ Lazy expiration: Cache hits check `expires_at` and refetch if expired
- ✅ Indexed on `expires_at` for efficient expiration queries
- ✅ Not critical for alpha (growth is gradual)

**Solution (Beta/Production):**
Implement a background cleanup job that deletes expired entries:

```go
// Periodic cleanup (run daily or weekly)
func (r *unfurlRepository) CleanupExpired(ctx context.Context) (int64, error) {
	query := `DELETE FROM unfurl_cache WHERE expires_at < NOW()`
	result, err := r.db.ExecContext(ctx, query)
	if err != nil {
		return 0, err
	}
	return result.RowsAffected()
}
```

**Implementation Options:**
1. **Cron job**: Separate process runs cleanup on schedule
2. **Background goroutine**: Service-level background task with configurable interval
3. **PostgreSQL pg_cron extension**: Database-level scheduled cleanup

**Recommended Approach:**
- Phase 1 (Beta): Background goroutine running weekly cleanup
- Phase 2 (Production): Migrate to pg_cron or external cron for reliability

**Configuration:**
```bash
UNFURL_CACHE_CLEANUP_ENABLED=true
UNFURL_CACHE_CLEANUP_INTERVAL=168h  # 7 days
```

**Monitoring:**
- Log cleanup operations: `[UNFURL-CACHE-CLEANUP] Deleted 1234 expired entries`
- Track table size growth over time
- Alert if table exceeds threshold (e.g., 100MB)

**Files to Create:**
- `internal/core/unfurl/cleanup.go` - Background cleanup service

**Related:**
- Cache introduced by the oEmbed unfurling feature (2025-11-07)
- Cache table: [migration XXX_create_unfurl_cache.sql](../internal/db/migrations/)

---

## 🔵 P3: Technical Debt

### Consolidate Environment Variable Validation
**Added:** 2025-10-11 | **Effort:** 2-3 hours

Create an `internal/config` package with structured config validation. Fail fast with clear errors.

---

### Add Connection Pooling for PDS HTTP Clients
**Added:** 2025-10-11 | **Effort:** 2 hours

Create a shared `http.Client` with connection pooling instead of a new client per request.

---

### Architecture Decision Records (ADRs)
**Added:** 2025-10-11 | **Effort:** Ongoing

Document: did:plc choice, pgcrypto encryption, Jetstream vs firehose, write-forward pattern, single handle field.

---

### Replace log Package with Structured Logger
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** The codebase uses the standard `log` package; we need structured (JSON) logging with levels.

**Solution:** Switch to `slog`, `zap`, or `zerolog`. Add request IDs and context fields.

**Code:** TODO in [community/errors.go:46](../internal/api/handlers/community/errors.go#L46)

---

### PDS URL Resolution from DID
**Added:** 2025-10-11 | **Effort:** 2-3 hours

**Problem:** The user consumer doesn't resolve the PDS URL from the DID document when it is missing.

**Solution:** Query the PLC directory for the DID document and extract the PDS `serviceEndpoint`.

**Code:** TODO in [jetstream/user_consumer.go:203](../internal/atproto/jetstream/user_consumer.go#L203)
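
This sketch covers the extraction step. It assumes the DID document has already been fetched; for `did:plc`, it is served at `https://plc.directory/<did>`. It follows the atProto convention of a service entry with id `#atproto_pds` and type `AtprotoPersonalDataServer`. The function name is hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// didDoc is the subset of a DID document needed to locate a PDS.
// atProto uses a plain string serviceEndpoint.
type didDoc struct {
	ID      string `json:"id"`
	Service []struct {
		ID              string `json:"id"`
		Type            string `json:"type"`
		ServiceEndpoint string `json:"serviceEndpoint"`
	} `json:"service"`
}

// pdsEndpoint returns the URL of the service entry whose id is "#atproto_pds"
// (or the fully qualified "<did>#atproto_pds") and whose type is
// "AtprotoPersonalDataServer".
func pdsEndpoint(raw []byte) (string, error) {
	var doc didDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", fmt.Errorf("parse DID document: %w", err)
	}
	for _, svc := range doc.Service {
		if (svc.ID == "#atproto_pds" || svc.ID == doc.ID+"#atproto_pds") &&
			svc.Type == "AtprotoPersonalDataServer" && svc.ServiceEndpoint != "" {
			return svc.ServiceEndpoint, nil
		}
	}
	return "", errors.New("no atproto PDS service in DID document")
}

func main() {
	raw := []byte(`{"id":"did:plc:abc","service":[{"id":"#atproto_pds",
		"type":"AtprotoPersonalDataServer","serviceEndpoint":"https://pds.example.com"}]}`)
	fmt.Println(pdsEndpoint(raw))
}
```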

---

## Recent Completions

### ✅ Token Refresh for Community Credentials (2025-10-17)
**Completed:** Automatic token refresh prevents communities from breaking after 2 hours

**Implementation:**
- ✅ JWT expiration parsing and refresh detection (5-minute buffer)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (`atproto.ServerCreateSession`)
- ✅ Atomic credential updates in database (`UpdateCredentials`)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go)
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go)
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go)

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` method
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md)

**Impact:** Communities now work indefinitely without manual token management

---

### ✅ OAuth Authentication for Community Actions (2025-10-16)
**Completed:** Full OAuth JWT authentication flow for protected endpoints

**Implementation:**
- ✅ JWT parser compatible with atProto PDS tokens (aud/iss handling)
- ✅ Auth middleware protecting create/update/subscribe/unsubscribe endpoints
- ✅ Handler-level DID extraction from JWT tokens via `middleware.GetUserDID(r)`
- ✅ Removed all X-User-DID header placeholders
- ✅ E2E tests validate complete OAuth flow with real PDS tokens
- ✅ Security: Issuer validation supports both HTTPS URLs and DIDs

**Files Modified:**
- [internal/atproto/auth/jwt.go](../internal/atproto/auth/jwt.go) - JWT parsing with atProto compatibility
- [internal/api/middleware/auth.go](../internal/api/middleware/auth.go) - Auth middleware
- [internal/api/handlers/community/](../internal/api/handlers/community/) - All handlers updated
- [tests/integration/community_e2e_test.go](../tests/integration/community_e2e_test.go) - OAuth E2E tests

**Related:** Also implemented `hostedByDID` auto-population for security (see P1 item above)

---

### ✅ Fix .local TLD Bug (2025-10-11)
Changed default `INSTANCE_DID` from `did:web:coves.local` → `did:web:coves.social`. Fixed community creation failure due to disallowed `.local` TLD.

---

## Prioritization

- **P0:** Security vulns, data loss, prod blockers
- **P1:** Major UX/reliability issues
- **P2:** QOL improvements, minor bugs, docs
- **P3:** Refactoring, code quality