# Backlog PRD: Platform Improvements & Technical Debt

**Status:** Ongoing
**Owner:** Platform Team
**Last Updated:** 2025-10-17

## Overview

Miscellaneous platform improvements, bug fixes, and technical debt that don't fit into feature-specific PRDs.

---

## 🔴 P0: Critical (Alpha Blockers)

### OAuth DPoP Token Architecture - Voting Write-Forward
**Added:** 2025-11-02 | **Completed:** 2025-11-02 | **Effort:** 2 hours | **Priority:** ALPHA BLOCKER
**Status:** ✅ COMPLETE

**Problem:**
Our backend is attempting to use DPoP-bound OAuth tokens to write votes to users' PDSs, causing "Malformed token" errors. This violates atProto architecture patterns.

**Current (Incorrect) Flow:**
```
Mobile Client (OAuth + DPoP) → Coves Backend → User's PDS ❌
                                                   ↓
                                        "Malformed token" error
```

**Root Cause:**
- Mobile app uses OAuth with DPoP (Demonstrating Proof of Possession)
- DPoP tokens are cryptographically bound to the client's private key via the `cnf.jkt` claim
- Each PDS request requires **both**:
  - `Authorization: DPoP <token>`
  - `DPoP: <signed-proof-jwt>` (signature proves the client holds the private key)
- Backend cannot create DPoP proofs (it doesn't have the client's private key)
- **DPoP tokens are intentionally non-transferable** (security feature to prevent token theft)

**Evidence:**
Token decoded from a mobile app session; the `cnf.jkt` claim is the DPoP binding:
```json
{
  "sub": "did:plc:txrork7rurdueix27ulzi7ke",
  "cnf": {
    "jkt": "LSWROJhTkPn4yT18xUjiIz2Z7z7l_gozKfjjQTYgW9o"
  },
  "client_id": "https://lingering-darkness-50a6.brettmay0212.workers.dev/client-metadata.json",
  "iss": "http://localhost:3001"
}
```

**atProto Best Practice (from Bluesky social-app analysis):**
- ✅ Clients write **directly to their own PDS** (no backend proxy)
- ✅ AppView **only indexes** from Jetstream (eventual consistency)
- ✅ PDS = User's personal data store (user controls writes)
- ✅ AppView = Read-only aggregator/indexer
- ❌ Backend should NOT proxy user write operations

**Correct Architecture:**
```
Mobile Client → User's PDS (direct write with DPoP proof) ✓
                     ↓
             Jetstream (firehose)
                     ↓
   Coves AppView (indexes votes from firehose)
```
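
The indexing leg of this flow can be sketched in Go as follows. This is a minimal illustration, assuming the Jetstream JSON event shape (`did`, `kind`, and a `commit` object with `operation`, `collection`, `rkey`, `record`); the type `jetstreamEvent` and the function `voteURI` are hypothetical names, not the actual Coves consumer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jetstreamEvent models only the fields this sketch needs (assumed event shape).
type jetstreamEvent struct {
	DID    string `json:"did"`
	Kind   string `json:"kind"`
	Commit *struct {
		Operation  string          `json:"operation"`
		Collection string          `json:"collection"`
		RKey       string          `json:"rkey"`
		Record     json.RawMessage `json:"record"`
	} `json:"commit"`
}

const voteCollection = "social.coves.interaction.vote"

// voteURI returns the at:// URI of a newly created vote record,
// or "" when the event is not a vote creation.
func voteURI(raw []byte) (string, error) {
	var ev jetstreamEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return "", fmt.Errorf("decode event: %w", err)
	}
	if ev.Kind != "commit" || ev.Commit == nil {
		return "", nil
	}
	if ev.Commit.Collection != voteCollection || ev.Commit.Operation != "create" {
		return "", nil
	}
	return fmt.Sprintf("at://%s/%s/%s", ev.DID, ev.Commit.Collection, ev.Commit.RKey), nil
}

func main() {
	raw := []byte(`{"did":"did:plc:example","kind":"commit","commit":{"operation":"create","collection":"social.coves.interaction.vote","rkey":"3kabc","record":{"direction":"up"}}}`)
	uri, err := voteURI(raw)
	fmt.Println(uri, err)
}
```

Because the AppView only reads events the user's PDS has already accepted, it never needs the user's token.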

**Affected Endpoints:**
1. **Vote Creation** - [create_vote.go:76](../internal/api/handlers/vote/create_vote.go#L76)
   - Currently: Backend writes to PDS using user's token
   - Should: Return error directing client to write directly

2. **Vote Service** - [service.go:126](../internal/core/votes/service.go#L126)
   - Currently: `createRecordOnPDSAs()` attempts write-forward
   - Should: Remove write-forward, rely on Jetstream indexing only

**Solution Options:**

**Option A: Client Direct Write (RECOMMENDED - Follows Bluesky)**
```typescript
// Mobile client writes directly (like Bluesky social-app).
// oauthSession, userDid, postUri and postCid come from the app's session and UI state.
import { Agent } from '@atproto/api'

const agent = new Agent(oauthSession)
// createRecord is a procedure: its input is the request body (third argument);
// the second argument of call() carries query parameters and stays undefined here.
await agent.call('com.atproto.repo.createRecord', undefined, {
  repo: userDid,
  collection: 'social.coves.interaction.vote',
  record: {
    $type: 'social.coves.interaction.vote',
    subject: { uri: postUri, cid: postCid },
    direction: 'up',
    createdAt: new Date().toISOString()
  }
})
```

Backend changes:
- Remove write-forward code from vote service
- Return error from XRPC endpoint: "Votes must be created directly at your PDS"
- Index votes from Jetstream consumer (already implemented)

**Option B: Backend App Passwords (NOT RECOMMENDED)**
- User creates app-specific password
- Backend uses password auth (gets regular JWTs, not DPoP)
- Security downgrade, poor UX

**Option C: Service Auth Token (Complex)**
- Backend gets its own service credentials
- Requires PDS to trust our AppView as delegated writer
- Non-standard atProto pattern

**Recommendation:** Option A (Client Direct Write)
- Matches atProto architecture
- Follows Bluesky social-app pattern
- Best security (user controls their data)
- Simplest implementation

**Implementation Tasks:**
1. Update Flutter OAuth package to expose `agent.call()` for custom lexicons
2. Update mobile vote UI to write directly to PDS
3. Remove write-forward code from backend vote service
4. Update vote XRPC handler to return helpful error message
5. Verify Jetstream consumer correctly indexes votes
6. Update integration tests to match new flow

**References:**
- Bluesky social-app: Direct PDS writes via agent
- atProto OAuth spec: DPoP binding prevents token reuse
- atProto architecture: AppView = read-only indexer

---

### OAuth DPoP Token Architecture - Community Subscriptions
**Added:** 2025-11-02 | **Effort:** 1-2 hours | **Priority:** ALPHA BLOCKER
**Status:** 📋 TODO (Waiting for frontend implementation)

**Problem:**
Same DPoP token issue as voting: the backend cannot use a user's DPoP-bound OAuth tokens to write subscription records to the user's PDS.

**Affected Operations:**
- `SubscribeToCommunity()` - [service.go:564-624](../internal/core/communities/service.go#L564-L624)
- `UnsubscribeFromCommunity()` - [service.go:626-660](../internal/core/communities/service.go#L626-L660)

**Collection:** `social.coves.community.subscription`

**Solution:**
Client writes directly using `com.atproto.repo.createRecord`:
```typescript
// Input goes in the third argument (request body); the second is for query parameters.
await agent.call('com.atproto.repo.createRecord', undefined, {
  repo: userDid,
  collection: 'social.coves.community.subscription',
  record: {
    $type: 'social.coves.community.subscription',
    subject: communityDid,
    contentVisibility: 3,
    createdAt: new Date().toISOString()
  }
})
```

**Backend Changes Needed:**
1. Remove write-forward from `SubscribeToCommunity()` and `UnsubscribeFromCommunity()`
2. Update handlers to return errors directing to client-direct pattern
3. Verify Jetstream consumer indexes subscriptions (already working)

**Files to Modify:**
- `internal/core/communities/service.go`
- `internal/api/handlers/community/subscribe.go`

---

### OAuth DPoP Token Architecture - Community Blocking
**Added:** 2025-11-02 | **Effort:** 1-2 hours | **Priority:** ALPHA BLOCKER
**Status:** 📋 TODO (Waiting for frontend implementation)

**Problem:**
Same DPoP token issue: the backend cannot use a user's DPoP-bound OAuth tokens to write block records to the user's PDS.

**Affected Operations:**
- `BlockCommunity()` - [service.go:709-781](../internal/core/communities/service.go#L709-L781)
- `UnblockCommunity()` - [service.go:783-816](../internal/core/communities/service.go#L783-L816)

**Collection:** `social.coves.community.block`

**Solution:**
Client writes directly using `com.atproto.repo.createRecord`:
```typescript
// Input goes in the third argument (request body); the second is for query parameters.
await agent.call('com.atproto.repo.createRecord', undefined, {
  repo: userDid,
  collection: 'social.coves.community.block',
  record: {
    $type: 'social.coves.community.block',
    subject: communityDid,
    createdAt: new Date().toISOString()
  }
})
```

**Backend Changes Needed:**
1. Remove write-forward from `BlockCommunity()` and `UnblockCommunity()`
2. Update handlers to return errors directing to client-direct pattern
3. Verify Jetstream consumer indexes blocks (already working)

**Files to Modify:**
- `internal/core/communities/service.go`
- `internal/api/handlers/community/block.go`

---

## 🟡 P1: Important (Alpha Blockers)

### at-identifier Handle Resolution in Endpoints
**Added:** 2025-10-18 | **Effort:** 2-3 hours | **Priority:** ALPHA BLOCKER

**Problem:**
Current implementation rejects handles in endpoints that declare `"format": "at-identifier"` in their lexicon schemas, violating atProto best practices and breaking legitimate client usage.

**Impact:**
- ❌ Post creation fails when client sends community handle (e.g., `!gardening.communities.coves.social`)
- ❌ Subscribe/unsubscribe endpoints reject handles despite lexicon declaring `at-identifier`
- ❌ Block endpoints use `"format": "did"` but should use `at-identifier` for consistency
- 🔴 **P0 Issue:** API contract violation - clients following the schema are rejected

**Root Cause:**
Handlers and services validate `strings.HasPrefix(req.Community, "did:")` instead of calling `ResolveCommunityIdentifier()`.

**Affected Endpoints:**
1. **Post Creation** - [create.go:54](../internal/api/handlers/post/create.go#L54), [service.go:51](../internal/core/posts/service.go#L51)
   - Lexicon declares `at-identifier`: [post/create.json:16](../internal/atproto/lexicon/social/coves/post/create.json#L16)

2. **Subscribe** - [subscribe.go:52](../internal/api/handlers/community/subscribe.go#L52)
   - Lexicon declares `at-identifier`: [subscribe.json:16](../internal/atproto/lexicon/social/coves/community/subscribe.json#L16)

3. **Unsubscribe** - [subscribe.go:120](../internal/api/handlers/community/subscribe.go#L120)
   - Lexicon declares `at-identifier`: [unsubscribe.json:16](../internal/atproto/lexicon/social/coves/community/unsubscribe.json#L16)

4. **Block/Unblock** - [block.go:58](../internal/api/handlers/community/block.go#L58), [block.go:132](../internal/api/handlers/community/block.go#L132)
   - Lexicon declares `"format": "did"`: [block.json:15](../internal/atproto/lexicon/social/coves/community/block.json#L15)
   - Should be changed to `at-identifier` for consistency and best practice

**atProto Best Practice (from docs):**
- ✅ API endpoints should accept both DIDs and handles via `at-identifier` format
- ✅ Resolve handles to DIDs immediately at API boundary
- ✅ Use DIDs internally for all business logic and storage
- ✅ Handles are weak refs (changeable), DIDs are strong refs (permanent)
- ⚠️ Bidirectional verification required (already handled by `identity.CachingResolver`)

**Solution:**
Replace direct DID validation with handle resolution using existing `ResolveCommunityIdentifier()`:

```go
// BEFORE (wrong) ❌: anything that is not already a DID is rejected
if !strings.HasPrefix(req.Community, "did:") {
	// handles are turned away here with an error response
	return
}

// AFTER (correct) ✅: accept handles or DIDs and resolve at the API boundary
communityDID, err := h.communityService.ResolveCommunityIdentifier(ctx, req.Community)
if err != nil {
	if communities.IsNotFound(err) {
		writeError(w, http.StatusNotFound, "CommunityNotFound", "Community not found")
		return
	}
	writeError(w, http.StatusBadRequest, "InvalidRequest", err.Error())
	return
}
// Now use communityDID (guaranteed to be a DID)
```

**Implementation Plan:**
1. ✅ **Phase 1 (Alpha Blocker):** Fix post creation endpoint - COMPLETE (2025-10-18)
   - Post creation already uses `ResolveCommunityIdentifier()` at [service.go:100](../internal/core/posts/service.go#L100)
   - Supports handles, DIDs, and scoped formats

2. 📋 **Phase 2 (Beta):** Fix subscription endpoints
   - Update subscribe/unsubscribe handlers
   - Add tests for handle resolution in subscriptions

3. ✅ **Phase 3 (Beta):** Fix block endpoints - COMPLETE (2025-11-16)
   - Updated block/unblock handlers to use `ResolveCommunityIdentifier()`
   - Accepts handles (`@gaming.community.coves.social`), DIDs, and scoped format (`!gaming@coves.social`)
   - Added comprehensive tests: [block_handle_resolution_test.go](../tests/integration/block_handle_resolution_test.go)
   - All 7 test cases passing

**Files Modified (Phase 3 - Block Endpoints):**
- `internal/api/handlers/community/block.go` - Added `ResolveCommunityIdentifier()` calls
- `tests/integration/block_handle_resolution_test.go` - Comprehensive test coverage

**Existing Infrastructure:**
- ✅ `ResolveCommunityIdentifier()` already implemented at [service.go:852](../internal/core/communities/service.go#L852)
- ✅ `identity.CachingResolver` handles bidirectional verification and caching
- ✅ Supports both handle (`!name.communities.instance.com`) and DID formats

**Current Status:**
- ✅ Phase 1 (post creation) - Already implemented
- 📋 Phase 2 (subscriptions) - Deferred to Beta (lower priority)
- ✅ Phase 3 (block endpoints) - COMPLETE (2025-11-16)

---

### did:web Domain Verification & hostedByDID Auto-Population
**Added:** 2025-10-11 | **Updated:** 2025-10-16 | **Effort:** 2-3 days | **Priority:** ALPHA BLOCKER

**Problem:**
1. **Domain Impersonation**: Self-hosters can set `INSTANCE_DID=did:web:nintendo.com` without owning the domain, enabling attacks where communities appear hosted by trusted domains
2. **hostedByDID Spoofing**: Malicious instance operators can modify source code to claim communities are hosted by domains they don't own, enabling reputation hijacking and phishing

**Attack Scenarios:**
- Malicious instance sets `instanceDID="did:web:coves.social"` → communities show as hosted by official Coves
- Federation partners can't verify instance authenticity
- AppView pollution with fake hosting claims

**Solution:**
1. **Basic Validation (Phase 1)**: Verify `did:web:` domain matches configured `instanceDomain`
2. **Cryptographic Verification (Phase 2)**: Fetch `https://domain/.well-known/did.json` and verify:
   - DID document exists and is valid
   - Domain ownership proven via HTTPS hosting
   - DID document matches claimed `instanceDID`
3. **Auto-populate hostedByDID**: Remove from client API, derive from instance configuration in service layer
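
The Phase 2 checks can be sketched in Go as two pure functions. This is a minimal illustration under the did:web method's URL mapping rules; the names `didWebURL` and `checkDIDDocument` are hypothetical, and the HTTPS fetch between the two steps is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// didWebURL maps a did:web identifier to the HTTPS URL of its DID document.
func didWebURL(did string) (string, error) {
	if !strings.HasPrefix(did, "did:web:") {
		return "", errors.New("not a did:web identifier")
	}
	parts := strings.Split(strings.TrimPrefix(did, "did:web:"), ":")
	host, err := url.PathUnescape(parts[0]) // a port is percent-encoded as %3A
	if err != nil || host == "" {
		return "", errors.New("invalid did:web host")
	}
	if len(parts) == 1 {
		return "https://" + host + "/.well-known/did.json", nil
	}
	return "https://" + host + "/" + strings.Join(parts[1:], "/") + "/did.json", nil
}

// checkDIDDocument verifies that a fetched document claims the expected DID.
func checkDIDDocument(body []byte, wantDID string) error {
	var doc struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal(body, &doc); err != nil {
		return fmt.Errorf("invalid DID document: %w", err)
	}
	if doc.ID != wantDID {
		return fmt.Errorf("DID document id %q does not match %q", doc.ID, wantDID)
	}
	return nil
}

func main() {
	u, err := didWebURL("did:web:coves.social")
	fmt.Println(u, err)
	// A document served by another domain must not satisfy our claimed DID.
	fmt.Println(checkDIDDocument([]byte(`{"id":"did:web:nintendo.com"}`), "did:web:coves.social"))
}
```

Fetching the document from the derived URL over HTTPS is what ties the claim to control of the domain; comparing `id` alone is not sufficient without that fetch.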

**Current Status:**
- ✅ Default changed from `coves.local` → `coves.social` (fixes `.local` TLD bug)
- ✅ TODO comment in [cmd/server/main.go:126-131](../cmd/server/main.go#L126-L131)
- ✅ hostedByDID removed from client requests (2025-10-16)
- ✅ Service layer auto-populates `hostedByDID` from `instanceDID` (2025-10-16)
- ✅ Handler rejects client-provided `hostedByDID` (2025-10-16)
- ✅ Basic validation: Logs warning if `did:web:` domain ≠ `instanceDomain` (2025-10-16)
- ⚠️ **REMAINING**: Full DID document verification (cryptographic proof of ownership)

**Implementation Notes:**
- Phase 1 complete: Basic validation catches config errors, logs warnings
- Phase 2 needed: Fetch `https://domain/.well-known/did.json` and verify ownership
- Add `SKIP_DID_WEB_VERIFICATION=true` for dev mode
- Full verification blocks startup if domain ownership cannot be proven

---

### ✅ Token Refresh Logic for Community Credentials - COMPLETE
**Added:** 2025-10-11 | **Completed:** 2025-10-17 | **Effort:** 1.5 days | **Status:** ✅ DONE

**Problem:** Community PDS access tokens expire (~2hrs). Updates fail until manual intervention.

**Solution Implemented:**
- ✅ Automatic token refresh before PDS operations (5-minute buffer before expiration)
- ✅ JWT expiration parsing without signature verification (`parseJWTExpiration`, `needsRefresh`)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (~2 months) via `atproto.ServerCreateSession`
- ✅ Atomic credential updates (`UpdateCredentials` repository method)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for token expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go) - JWT parsing utilities
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go) - Refresh and re-auth logic
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go) - Integration tests

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` + concurrency control
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** See [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md) for full details

**Impact:** ✅ Communities can now be updated 24+ hours after creation without manual intervention

---

### ✅ Subscription Visibility Level (Feed Slider 1-5 Scale) - COMPLETE
**Added:** 2025-10-15 | **Completed:** 2025-10-16 | **Effort:** 1 day | **Status:** ✅ DONE

**Problem:** Users couldn't control how much content they see from each community. Lexicon had `contentVisibility` (1-5 scale) but code didn't use it.

**Solution Implemented:**
- ✅ Updated subscribe handler to accept `contentVisibility` parameter (1-5, default 3)
- ✅ Store in subscription record on PDS (`social.coves.community.subscription`)
- ✅ Migration 008 adds `content_visibility` column to database with CHECK constraint
- ✅ Clamping at all layers (handler, service, consumer) for defense in depth
- ✅ Atomic subscriber count updates (SubscribeWithCount/UnsubscribeWithCount)
- ✅ Idempotent operations (safe for Jetstream event replays)
- ✅ Fixed critical collection name bug (was using wrong namespace)
- ✅ Production Jetstream consumer now running
- ✅ 13 comprehensive integration tests - all passing
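
The clamping rule from the list above is small enough to show in full. This sketch assumes that an omitted value falls back to the default of 3 and that out-of-range values snap to the nearest bound; `clampContentVisibility` is a hypothetical name, not necessarily the helper used in the codebase.

```go
package main

import "fmt"

const (
	minVisibility     = 1
	maxVisibility     = 5
	defaultVisibility = 3
)

// clampContentVisibility normalizes a client-supplied value.
// A nil pointer means the client omitted the field.
func clampContentVisibility(v *int) int {
	if v == nil {
		return defaultVisibility
	}
	if *v < minVisibility {
		return minVisibility
	}
	if *v > maxVisibility {
		return maxVisibility
	}
	return *v
}

func main() {
	high, ok := 9, 4
	fmt.Println(clampContentVisibility(nil), clampContentVisibility(&high), clampContentVisibility(&ok))
}
```

Applying the same function in the handler, service, and consumer keeps the database CHECK constraint from ever being the first line of defense.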

**Files Modified:**
- Lexicon: [subscription.json](../internal/atproto/lexicon/social/coves/community/subscription.json) ✅ Updated to atProto conventions
- Handler: [community/subscribe.go](../internal/api/handlers/community/subscribe.go) ✅ Accepts contentVisibility
- Service: [communities/service.go](../internal/core/communities/service.go) ✅ Clamps and passes to PDS
- Consumer: [community_consumer.go](../internal/atproto/jetstream/community_consumer.go) ✅ Extracts and indexes
- Repository: [community_repo_subscriptions.go](../internal/db/postgres/community_repo_subscriptions.go) ✅ All queries updated
- Migration: [008_add_content_visibility_to_subscriptions.sql](../internal/db/migrations/008_add_content_visibility_to_subscriptions.sql) ✅ Schema changes
- Tests: [subscription_indexing_test.go](../tests/integration/subscription_indexing_test.go) ✅ Comprehensive coverage

**Documentation:** See [IMPLEMENTATION_SUBSCRIPTION_INDEXING.md](../docs/IMPLEMENTATION_SUBSCRIPTION_INDEXING.md) for full details

**Impact:** ✅ Users can now adjust feed volume per community (key feature from DOMAIN_KNOWLEDGE.md enabled)

---

### Community Blocking
**Added:** 2025-10-15 | **Effort:** 1 day | **Priority:** ALPHA BLOCKER

**Problem:** Users have no way to block unwanted communities from their feeds.

**Solution:**
1. **Lexicon:** Extend `social.coves.actor.block` to support community DIDs (currently user-only)
2. **Service:** Implement `BlockCommunity(userDID, communityDID)` and `UnblockCommunity()`
3. **Handlers:** Add XRPC endpoints `social.coves.community.block` and `unblock`
4. **Repository:** Add methods to track blocked communities
5. **Feed:** Filter blocked communities from feed queries (beta work)

**Code:**
- Lexicon: [actor/block.json](../internal/atproto/lexicon/social/coves/actor/block.json) - Currently only supports user DIDs
- Service: New methods needed
- Handlers: New files needed

**Impact:** Without blocking, users cannot keep unwanted communities out of their feeds

---

### ✅ Post comment_count Reconciliation - COMPLETE
**Added:** 2025-11-04 | **Completed:** 2025-11-16 | **Effort:** 2 hours | **Status:** ✅ DONE

**Problem:**
When comments arrive before their parent post is indexed (common with cross-repo Jetstream ordering), the post's `comment_count` was never reconciled, causing posts to show permanently stale "0 comments" counters.

**Solution Implemented:**
- ✅ Post consumer reconciliation logic was already implemented at [post_consumer.go:210-226](../internal/atproto/jetstream/post_consumer.go#L210-L226)
- ✅ Reconciliation query counts pre-existing comments when indexing new posts
- ✅ Comprehensive test suite added: [post_consumer_test.go](../tests/integration/post_consumer_test.go)
  - Single comment before post
  - Multiple comments before post
  - Mixed before/after ordering
  - Idempotent indexing preserves counts
- ✅ Updated outdated FIXME comment at [comment_consumer.go:362](../internal/atproto/jetstream/comment_consumer.go#L362)
- ✅ All 4 test cases passing

**Implementation:**
```go
// Post consumer reconciliation (post_consumer.go, lines 210-226).
// Excerpt: when a post is indexed, recount the non-deleted comments that
// arrived before it. Handling of reconcileErr is not shown.
reconcileQuery := `
	UPDATE posts
	SET comment_count = (
		SELECT COUNT(*)
		FROM comments c
		WHERE c.parent_uri = $1 AND c.deleted_at IS NULL
	)
	WHERE id = $2
`
_, reconcileErr := tx.ExecContext(ctx, reconcileQuery, post.URI, postID)
```

**Files Modified:**
- `internal/atproto/jetstream/comment_consumer.go` - Updated documentation
- `tests/integration/post_consumer_test.go` - Added comprehensive test coverage

**Impact:** ✅ Post comment counters are now accurate regardless of Jetstream event ordering

---

## 🔴 P1.5: Federation Blockers (Beta Launch)

### Cross-PDS Write-Forward Support for Community Service
**Added:** 2025-10-17 | **Updated:** 2025-11-02 | **Effort:** 3-4 hours | **Priority:** FEDERATION BLOCKER (Beta)

**Problem:** Community service write-forward methods assume all users are on the same PDS as the Coves instance. This breaks federation when users from external PDSs try to subscribe to or block communities.

**Current Behavior:**
- User on `pds.bsky.social` subscribes to community on `coves.social`
- Coves calls `s.pdsURL` (instance default: `http://localhost:3001`)
- Write goes to WRONG PDS → fails with `{"error":"InvalidToken","message":"Malformed token"}`

**Impact:**
- ✅ **Alpha**: Works fine (single PDS deployment, no federation)
- ❌ **Beta**: Breaks federation (users on different PDSs can't subscribe/block)

**Root Cause:**
- [service.go:1033](../internal/core/communities/service.go#L1033): `createRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:1050](../internal/core/communities/service.go#L1050): `putRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:1063](../internal/core/communities/service.go#L1063): `deleteRecordOnPDSAs` hardcodes `s.pdsURL`

**Affected Operations:**
- `SubscribeToCommunity` ([service.go:608](../internal/core/communities/service.go#L608))
- `UnsubscribeFromCommunity` (calls `deleteRecordOnPDSAs`)
- `BlockCommunity` ([service.go:739](../internal/core/communities/service.go#L739))
- `UnblockCommunity` (calls `deleteRecordOnPDSAs`)

**Solution:**
1. Add `identityResolver identity.Resolver` to `communityService` struct
2. Before write-forward, resolve user's DID → extract PDS URL
3. Call user's actual PDS instead of hardcoded `s.pdsURL`

**Implementation Pattern (from Vote Service):**
```go
// Add helper method to resolve user's PDS
func (s *communityService) resolveUserPDS(ctx context.Context, userDID string) (string, error) {
	// Named ident so the local variable does not shadow the identity package.
	ident, err := s.identityResolver.Resolve(ctx, userDID)
	if err != nil {
		return "", fmt.Errorf("failed to resolve user PDS: %w", err)
	}
	if ident.PDSURL == "" {
		log.Printf("[COMMUNITY-PDS] WARNING: No PDS URL found for %s, using fallback: %s", userDID, s.pdsURL)
		return s.pdsURL, nil
	}
	return ident.PDSURL, nil
}

// Update write-forward methods:
func (s *communityService) createRecordOnPDSAs(ctx context.Context, repoDID, collection, rkey string, record map[string]interface{}, accessToken string) (string, string, error) {
	// Resolve user's actual PDS (critical for federation)
	pdsURL, err := s.resolveUserPDS(ctx, repoDID)
	if err != nil {
		return "", "", fmt.Errorf("failed to resolve user PDS: %w", err)
	}
	endpoint := fmt.Sprintf("%s/xrpc/com.atproto.repo.createRecord", strings.TrimSuffix(pdsURL, "/"))
	// ... rest of method
}
```

**Files to Modify:**
- `internal/core/communities/service.go` - Add resolver field + `resolveUserPDS` helper
- `internal/core/communities/service.go` - Update `createRecordOnPDSAs`, `putRecordOnPDSAs`, `deleteRecordOnPDSAs`
- `cmd/server/main.go` - Pass identity resolver to community service constructor
- Tests - Add cross-PDS subscription/block scenarios

**Testing:**
- User on external PDS subscribes to community → writes to their PDS
- User on external PDS blocks community → writes to their PDS
- Community profile updates still work (writes to community's own PDS)

**Related:**
- ✅ **Vote Service**: Fixed in Alpha (2025-11-02) - users can vote from any PDS
- 🔴 **Community Service**: Deferred to Beta (no federation in Alpha)

---

## 🟢 P2: Nice-to-Have

### Remove Categories from Community Lexicon
**Added:** 2025-10-15 | **Effort:** 30 minutes | **Priority:** Cleanup

**Problem:** Categories field exists in create/update lexicon but not in profile record. Adds complexity without clear value.

**Solution:**
- Remove `categories` from [create.json](../internal/atproto/lexicon/social/coves/community/create.json#L46-L54)
- Remove `categories` from [update.json](../internal/atproto/lexicon/social/coves/community/update.json#L51-L59)
- Remove from [community.go:91](../internal/core/communities/community.go#L91)
- Remove from service layer ([service.go:109-110](../internal/core/communities/service.go#L109-L110))

**Impact:** Simplifies lexicon, removes unused feature

---

### Improve .local TLD Error Messages
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Generic error "TLD .local is not allowed" confuses developers.

**Solution:** Enhance `InvalidHandleError` to explain root cause and suggest fixing `INSTANCE_DID`.

---

### Self-Hosting Security Guide
**Added:** 2025-10-11 | **Effort:** 1 day

**Needed:** Document did:web setup, DNS config, secrets management, rate limiting, PostgreSQL hardening, monitoring.

---

### OAuth Session Cleanup Race Condition
**Added:** 2025-10-11 | **Effort:** 2 hours

**Problem:** Cleanup goroutine doesn't handle graceful shutdown, may orphan DB connections.

**Solution:** Pass cancellable context, handle SIGTERM, add cleanup timeout.

---

### Jetstream Consumer Race Condition
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Multiple goroutines can call `close(done)` concurrently in consumer shutdown.

**Solution:** Use `sync.Once` for channel close or atomic flag for shutdown state.

**Code:** TODO in [jetstream/user_consumer.go:114](../internal/atproto/jetstream/user_consumer.go#L114)

---

### Unfurl Cache Cleanup Background Job
**Added:** 2025-11-07 | **Effort:** 2-3 hours | **Priority:** Performance/Maintenance

**Problem:** The `unfurl_cache` table will grow indefinitely as expired entries are not deleted. While the cache uses lazy expiration (checking `expires_at` on read), old records remain in the database consuming disk space.

**Impact:**
- 📊 ~1KB per cached URL
- 📈 At 10K cached URLs = ~10MB (negligible for alpha)
- ⚠️ At 1M cached URLs = ~1GB (potential issue at scale)
- 🐌 Table bloat can slow down queries over time

**Current Mitigation:**
- ✅ Lazy expiration: Cache hits check `expires_at` and refetch if expired
- ✅ Indexed on `expires_at` for efficient expiration queries
- ✅ Not critical for alpha (growth is gradual)

**Solution (Beta/Production):**
Implement background cleanup job to delete expired entries:

```go
// Periodic cleanup (run daily or weekly)
func (r *unfurlRepository) CleanupExpired(ctx context.Context) (int64, error) {
	query := `DELETE FROM unfurl_cache WHERE expires_at < NOW()`
	result, err := r.db.ExecContext(ctx, query)
	if err != nil {
		return 0, err
	}
	return result.RowsAffected()
}
```

**Implementation Options:**
1. **Cron job**: Separate process runs cleanup on schedule
2. **Background goroutine**: Service-level background task with configurable interval
3. **PostgreSQL pg_cron extension**: Database-level scheduled cleanup

**Recommended Approach:**
- Phase 1 (Beta): Background goroutine running weekly cleanup
- Phase 2 (Production): Migrate to pg_cron or external cron for reliability

**Configuration:**
```bash
UNFURL_CACHE_CLEANUP_ENABLED=true
UNFURL_CACHE_CLEANUP_INTERVAL=168h  # 7 days
```

**Monitoring:**
- Log cleanup operations: `[UNFURL-CACHE-CLEANUP] Deleted 1234 expired entries`
- Track table size growth over time
- Alert if table exceeds threshold (e.g., 100MB)

**Files to Create:**
- `internal/core/unfurl/cleanup.go` - Background cleanup service

**Related:**
- Implemented in oEmbed unfurling feature (2025-11-07)
- Cache table: [migration XXX_create_unfurl_cache.sql](../internal/db/migrations/)

---

## 🔵 P3: Technical Debt

### Consolidate Environment Variable Validation
**Added:** 2025-10-11 | **Effort:** 2-3 hours

Create `internal/config` package with structured config validation. Fail fast with clear errors.

---

### Add Connection Pooling for PDS HTTP Clients
**Added:** 2025-10-11 | **Effort:** 2 hours

Create shared `http.Client` with connection pooling instead of new client per request.

---

### Architecture Decision Records (ADRs)
**Added:** 2025-10-11 | **Effort:** Ongoing

Document: did:plc choice, pgcrypto encryption, Jetstream vs firehose, write-forward pattern, single handle field.

---

### Replace log Package with Structured Logger
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** Using standard `log` package. Need structured logging (JSON) with levels.

**Solution:** Switch to `slog`, `zap`, or `zerolog`. Add request IDs, context fields.

**Code:** TODO in [community/errors.go:46](../internal/api/handlers/community/errors.go#L46)

---

### PDS URL Resolution from DID
**Added:** 2025-10-11 | **Effort:** 2-3 hours

**Problem:** User consumer doesn't resolve PDS URL from DID document when missing.

**Solution:** Query PLC directory for DID document, extract `serviceEndpoint`.

**Code:** TODO in [jetstream/user_consumer.go:203](../internal/atproto/jetstream/user_consumer.go#L203)
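
The extraction step can be sketched as a pure function over an already-fetched DID document. It assumes the atproto convention that the PDS appears as a service entry whose `id` ends in `#atproto_pds` with type `AtprotoPersonalDataServer`; the name `pdsEndpoint` is hypothetical, and fetching the document from the PLC directory is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// didDocument models only the service list of a DID document.
type didDocument struct {
	ID      string `json:"id"`
	Service []struct {
		ID              string `json:"id"`
		Type            string `json:"type"`
		ServiceEndpoint string `json:"serviceEndpoint"`
	} `json:"service"`
}

// pdsEndpoint returns the serviceEndpoint of the atproto PDS entry.
func pdsEndpoint(raw []byte) (string, error) {
	var doc didDocument
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", fmt.Errorf("invalid DID document: %w", err)
	}
	for _, svc := range doc.Service {
		if strings.HasSuffix(svc.ID, "#atproto_pds") &&
			svc.Type == "AtprotoPersonalDataServer" &&
			svc.ServiceEndpoint != "" {
			return svc.ServiceEndpoint, nil
		}
	}
	return "", errors.New("DID document has no atproto PDS service")
}

func main() {
	doc := []byte(`{"id":"did:plc:example","service":[{"id":"#atproto_pds","type":"AtprotoPersonalDataServer","serviceEndpoint":"https://pds.example.com"}]}`)
	fmt.Println(pdsEndpoint(doc))
}
```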

---

## Recent Completions

### ✅ Token Refresh for Community Credentials (2025-10-17)
**Completed:** Automatic token refresh prevents communities from breaking after 2 hours

**Implementation:**
- ✅ JWT expiration parsing and refresh detection (5-minute buffer)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (`atproto.ServerCreateSession`)
- ✅ Atomic credential updates in database (`UpdateCredentials`)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go)
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go)
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go)

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` method
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md)

**Impact:** Communities now work indefinitely without manual token management

---

### ✅ OAuth Authentication for Community Actions (2025-10-16)
**Completed:** Full OAuth JWT authentication flow for protected endpoints

**Implementation:**
- ✅ JWT parser compatible with atProto PDS tokens (aud/iss handling)
- ✅ Auth middleware protecting create/update/subscribe/unsubscribe endpoints
- ✅ Handler-level DID extraction from JWT tokens via `middleware.GetUserDID(r)`
- ✅ Removed all X-User-DID header placeholders
- ✅ E2E tests validate complete OAuth flow with real PDS tokens
- ✅ Security: Issuer validation supports both HTTPS URLs and DIDs

**Files Modified:**
- [internal/atproto/auth/jwt.go](../internal/atproto/auth/jwt.go) - JWT parsing with atProto compatibility
- [internal/api/middleware/auth.go](../internal/api/middleware/auth.go) - Auth middleware
- [internal/api/handlers/community/](../internal/api/handlers/community/) - All handlers updated
- [tests/integration/community_e2e_test.go](../tests/integration/community_e2e_test.go) - OAuth E2E tests

**Related:** Also implemented `hostedByDID` auto-population for security (see P1 item above)

---

### ✅ Fix .local TLD Bug (2025-10-11)
Changed default `INSTANCE_DID` from `did:web:coves.local` → `did:web:coves.social`. Fixed community creation failure due to disallowed `.local` TLD.

---

## Prioritization

- **P0:** Security vulns, data loss, prod blockers
- **P1:** Major UX/reliability issues
- **P2:** QOL improvements, minor bugs, docs
- **P3:** Refactoring, code quality