# Backlog PRD: Platform Improvements & Technical Debt

**Status:** Ongoing
**Owner:** Platform Team
**Last Updated:** 2025-10-17

## Overview

Miscellaneous platform improvements, bug fixes, and technical debt that don't fit into feature-specific PRDs.

---

## 🔴 P0: Critical (Alpha Blockers)

### OAuth DPoP Token Architecture - Voting Write-Forward
**Added:** 2025-11-02 | **Completed:** 2025-11-02 | **Effort:** 2 hours | **Priority:** ALPHA BLOCKER
**Status:** ✅ COMPLETE

**Problem:**
Our backend is attempting to use DPoP-bound OAuth tokens to write votes to users' PDSs, causing "Malformed token" errors. This violates atProto architecture patterns.

**Current (Incorrect) Flow:**
```
Mobile Client (OAuth + DPoP) → Coves Backend → User's PDS ❌
                                      ↓
                          "Malformed token" error
```

**Root Cause:**
- Mobile app uses OAuth with DPoP (Demonstrating Proof of Possession)
- DPoP tokens are cryptographically bound to client's private key via `cnf.jkt` claim
- Each PDS request requires **both**:
  - `Authorization: DPoP <token>`
  - `DPoP: <signed-proof-jwt>` (signature proves client has private key)
- Backend cannot create DPoP proofs (doesn't have client's private key)
- **DPoP tokens are intentionally non-transferable** (security feature to prevent token theft)

**Evidence:**
```json
// Token decoded from mobile app session
{
  "sub": "did:plc:txrork7rurdueix27ulzi7ke",
  "cnf": {
    "jkt": "LSWROJhTkPn4yT18xUjiIz2Z7z7l_gozKfjjQTYgW9o"  // ← DPoP binding
  },
  "client_id": "https://lingering-darkness-50a6.brettmay0212.workers.dev/client-metadata.json",
  "iss": "http://localhost:3001"
}
```

**atProto Best Practice (from Bluesky social-app analysis):**
- ✅ Clients write **directly to their own PDS** (no backend proxy)
- ✅ AppView **only indexes** from Jetstream (eventual consistency)
- ✅ PDS = User's personal data store (user controls writes)
- ✅ AppView = Read-only aggregator/indexer
- ❌ Backend should NOT proxy user write operations

**Correct Architecture:**
```
Mobile Client → User's PDS (direct write with DPoP proof) ✓
                    ↓
            Jetstream (firehose)
                    ↓
    Coves AppView (indexes votes from firehose)
```

**Affected Endpoints:**
1. **Vote Creation** - [create_vote.go:76](../internal/api/handlers/vote/create_vote.go#L76)
   - Currently: Backend writes to PDS using user's token
   - Should: Return error directing client to write directly

2. **Vote Service** - [service.go:126](../internal/core/votes/service.go#L126)
   - Currently: `createRecordOnPDSAs()` attempts write-forward
   - Should: Remove write-forward, rely on Jetstream indexing only

**Solution Options:**

**Option A: Client Direct Write (RECOMMENDED - Follows Bluesky)**
```typescript
// Mobile client writes directly (like Bluesky social-app)
const agent = new Agent(oauthSession)
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.interaction.vote',
  record: {
    $type: 'social.coves.interaction.vote',
    subject: { uri: postUri, cid: postCid },
    direction: 'up',
    createdAt: new Date().toISOString()
  }
})
```

Backend changes:
- Remove write-forward code from vote service
- Return error from XRPC endpoint: "Votes must be created directly at your PDS"
- Index votes from Jetstream consumer (already implemented)

**Option B: Backend App Passwords (NOT RECOMMENDED)**
- User creates app-specific password
- Backend uses password auth (gets regular JWTs, not DPoP)
- Security downgrade, poor UX

**Option C: Service Auth Token (Complex)**
- Backend gets its own service credentials
- Requires PDS to trust our AppView as delegated writer
- Non-standard atProto pattern

**Recommendation:** Option A (Client Direct Write)
- Matches atProto architecture
- Follows Bluesky social-app pattern
- Best security (user controls their data)
- Simplest implementation

**Implementation Tasks:**
1. Update Flutter OAuth package to expose `agent.call()` for custom lexicons
2. Update mobile vote UI to write directly to PDS
3. Remove write-forward code from backend vote service
4. Update vote XRPC handler to return helpful error message
5. Verify Jetstream consumer correctly indexes votes
6. Update integration tests to match new flow

**References:**
- Bluesky social-app: Direct PDS writes via agent
- atProto OAuth spec: DPoP binding prevents token reuse
- atProto architecture: AppView = read-only indexer

---

### OAuth DPoP Token Architecture - Community Subscriptions
**Added:** 2025-11-02 | **Effort:** 1-2 hours | **Priority:** ALPHA BLOCKER
**Status:** 📋 TODO (Waiting for frontend implementation)

**Problem:**
Same DPoP token issue as voting - backend cannot use user's DPoP-bound OAuth tokens to write subscription records to user's PDS.

**Affected Operations:**
- `SubscribeToCommunity()` - [service.go:564-624](../internal/core/communities/service.go#L564-L624)
- `UnsubscribeFromCommunity()` - [service.go:626-660](../internal/core/communities/service.go#L626-L660)

**Collection:** `social.coves.community.subscription`

**Solution:**
Client writes directly using `com.atproto.repo.createRecord`:
```typescript
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.community.subscription',
  record: {
    $type: 'social.coves.community.subscription',
    subject: communityDid,
    contentVisibility: 3,
    createdAt: new Date().toISOString()
  }
})
```

**Backend Changes Needed:**
1. Remove write-forward from `SubscribeToCommunity()` and `UnsubscribeFromCommunity()`
2. Update handlers to return errors directing to client-direct pattern
3. Verify Jetstream consumer indexes subscriptions (already working)

**Files to Modify:**
- `internal/core/communities/service.go`
- `internal/api/handlers/community/subscribe.go`

---

### OAuth DPoP Token Architecture - Community Blocking
**Added:** 2025-11-02 | **Effort:** 1-2 hours | **Priority:** ALPHA BLOCKER
**Status:** 📋 TODO (Waiting for frontend implementation)

**Problem:**
Same DPoP token issue - backend cannot use user's DPoP-bound OAuth tokens to write block records to user's PDS.

**Affected Operations:**
- `BlockCommunity()` - [service.go:709-781](../internal/core/communities/service.go#L709-L781)
- `UnblockCommunity()` - [service.go:783-816](../internal/core/communities/service.go#L783-L816)

**Collection:** `social.coves.community.block`

**Solution:**
Client writes directly using `com.atproto.repo.createRecord`:
```typescript
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.community.block',
  record: {
    $type: 'social.coves.community.block',
    subject: communityDid,
    createdAt: new Date().toISOString()
  }
})
```

**Backend Changes Needed:**
1. Remove write-forward from `BlockCommunity()` and `UnblockCommunity()`
2. Update handlers to return errors directing to client-direct pattern
3. Verify Jetstream consumer indexes blocks (already working)

**Files to Modify:**
- `internal/core/communities/service.go`
- `internal/api/handlers/community/block.go`

---

## 🟡 P1: Important (Alpha Blockers)

### at-identifier Handle Resolution in Endpoints
**Added:** 2025-10-18 | **Effort:** 2-3 hours | **Priority:** ALPHA BLOCKER

**Problem:**
Current implementation rejects handles in endpoints that declare `"format": "at-identifier"` in their lexicon schemas, violating atProto best practices and breaking legitimate client usage.

**Impact:**
- ❌ Post creation fails when client sends community handle (e.g., `!gardening.communities.coves.social`)
- ❌ Subscribe/unsubscribe endpoints reject handles despite lexicon declaring `at-identifier`
- ❌ Block endpoints use `"format": "did"` but should use `at-identifier` for consistency
- 🔴 **P0 Issue:** API contract violation - clients following the schema are rejected

**Root Cause:**
Handlers and services validate `strings.HasPrefix(req.Community, "did:")` instead of calling `ResolveCommunityIdentifier()`.

**Affected Endpoints:**
1. **Post Creation** - [create.go:54](../internal/api/handlers/post/create.go#L54), [service.go:51](../internal/core/posts/service.go#L51)
   - Lexicon declares `at-identifier`: [post/create.json:16](../internal/atproto/lexicon/social/coves/post/create.json#L16)

2. **Subscribe** - [subscribe.go:52](../internal/api/handlers/community/subscribe.go#L52)
   - Lexicon declares `at-identifier`: [subscribe.json:16](../internal/atproto/lexicon/social/coves/community/subscribe.json#L16)

3. **Unsubscribe** - [subscribe.go:120](../internal/api/handlers/community/subscribe.go#L120)
   - Lexicon declares `at-identifier`: [unsubscribe.json:16](../internal/atproto/lexicon/social/coves/community/unsubscribe.json#L16)

4. **Block/Unblock** - [block.go:58](../internal/api/handlers/community/block.go#L58), [block.go:132](../internal/api/handlers/community/block.go#L132)
   - Lexicon declares `"format": "did"`: [block.json:15](../internal/atproto/lexicon/social/coves/community/block.json#L15)
   - Should be changed to `at-identifier` for consistency and best practice

**atProto Best Practice (from docs):**
- ✅ API endpoints should accept both DIDs and handles via `at-identifier` format
- ✅ Resolve handles to DIDs immediately at API boundary
- ✅ Use DIDs internally for all business logic and storage
- ✅ Handles are weak refs (changeable), DIDs are strong refs (permanent)
- ⚠️ Bidirectional verification required (already handled by `identity.CachingResolver`)

**Solution:**
Replace direct DID validation with handle resolution using existing `ResolveCommunityIdentifier()`:

```go
// BEFORE (wrong) ❌
if !strings.HasPrefix(req.Community, "did:") {
	return error
}

// AFTER (correct) ✅
communityDID, err := h.communityService.ResolveCommunityIdentifier(ctx, req.Community)
if err != nil {
	if communities.IsNotFound(err) {
		writeError(w, http.StatusNotFound, "CommunityNotFound", "Community not found")
		return
	}
	writeError(w, http.StatusBadRequest, "InvalidRequest", err.Error())
	return
}
// Now use communityDID (guaranteed to be a DID)
```

**Implementation Plan:**
1. **Phase 1 (Alpha Blocker):** Fix post creation endpoint
   - Update handler validation in `internal/api/handlers/post/create.go`
   - Update service validation in `internal/core/posts/service.go`
   - Add integration tests for handle resolution in post creation

2. 📋 **Phase 2 (Beta):** Fix subscription endpoints
   - Update subscribe/unsubscribe handlers
   - Add tests for handle resolution in subscriptions

3. 📋 **Phase 3 (Beta):** Fix block endpoints
   - Update lexicon from `"format": "did"` → `"format": "at-identifier"`
   - Update block/unblock handlers
   - Add tests for handle resolution in blocking

**Files to Modify (Phase 1 - Post Creation):**
- `internal/api/handlers/post/create.go` - Remove DID validation, add handle resolution
- `internal/core/posts/service.go` - Remove DID validation, add handle resolution
- `internal/core/posts/interfaces.go` - Add `CommunityService` dependency
- `cmd/server/main.go` - Pass community service to post service constructor
- `tests/integration/post_creation_test.go` - Add handle resolution test cases

**Existing Infrastructure:**
- `ResolveCommunityIdentifier()` already implemented at [service.go:843](../internal/core/communities/service.go#L843)
- `identity.CachingResolver` handles bidirectional verification and caching
- ✅ Supports both handle (`!name.communities.instance.com`) and DID formats

**Current Status:**
- ⚠️ **BLOCKING POST CREATION PR**: Identified as P0 issue in code review
- 📋 Phase 1 (post creation) - To be implemented immediately
- 📋 Phase 2-3 (other endpoints) - Deferred to Beta

---

### did:web Domain Verification & hostedByDID Auto-Population
**Added:** 2025-10-11 | **Updated:** 2025-10-16 | **Effort:** 2-3 days | **Priority:** ALPHA BLOCKER

**Problem:**
1. **Domain Impersonation**: Self-hosters can set `INSTANCE_DID=did:web:nintendo.com` without owning the domain, enabling attacks where communities appear hosted by trusted domains
2. **hostedByDID Spoofing**: Malicious instance operators can modify source code to claim communities are hosted by domains they don't own, enabling reputation hijacking and phishing

**Attack Scenarios:**
- Malicious instance sets `instanceDID="did:web:coves.social"` → communities show as hosted by official Coves
- Federation partners can't verify instance authenticity
- AppView pollution with fake hosting claims

**Solution:**
1. **Basic Validation (Phase 1)**: Verify `did:web:` domain matches configured `instanceDomain`
2. **Cryptographic Verification (Phase 2)**: Fetch `https://domain/.well-known/did.json` and verify:
   - DID document exists and is valid
   - Domain ownership proven via HTTPS hosting
   - DID document matches claimed `instanceDID`
3. **Auto-populate hostedByDID**: Remove from client API, derive from instance configuration in service layer

**Current Status:**
- ✅ Default changed from `coves.local` → `coves.social` (fixes `.local` TLD bug)
- ✅ TODO comment in [cmd/server/main.go:126-131](../cmd/server/main.go#L126-L131)
- ✅ hostedByDID removed from client requests (2025-10-16)
- ✅ Service layer auto-populates `hostedByDID` from `instanceDID` (2025-10-16)
- ✅ Handler rejects client-provided `hostedByDID` (2025-10-16)
- ✅ Basic validation: Logs warning if `did:web:` domain ≠ `instanceDomain` (2025-10-16)
- ⚠️ **REMAINING**: Full DID document verification (cryptographic proof of ownership)

**Implementation Notes:**
- Phase 1 complete: Basic validation catches config errors, logs warnings
- Phase 2 needed: Fetch `https://domain/.well-known/did.json` and verify ownership
- Add `SKIP_DID_WEB_VERIFICATION=true` for dev mode
- Full verification blocks startup if domain ownership cannot be proven

---

### ✅ Token Refresh Logic for Community Credentials - COMPLETE
**Added:** 2025-10-11 | **Completed:** 2025-10-17 | **Effort:** 1.5 days | **Status:** ✅ DONE

**Problem:** Community PDS access tokens expire (~2hrs). Updates fail until manual intervention.

**Solution Implemented:**
- ✅ Automatic token refresh before PDS operations (5-minute buffer before expiration)
- ✅ JWT expiration parsing without signature verification (`parseJWTExpiration`, `needsRefresh`)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (~2 months) via `atproto.ServerCreateSession`
- ✅ Atomic credential updates (`UpdateCredentials` repository method)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for token expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go) - JWT parsing utilities
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go) - Refresh and re-auth logic
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go) - Integration tests

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` + concurrency control
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** See [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md) for full details

**Impact:** ✅ Communities can now be updated 24+ hours after creation without manual intervention

---

### ✅ Subscription Visibility Level (Feed Slider 1-5 Scale) - COMPLETE
**Added:** 2025-10-15 | **Completed:** 2025-10-16 | **Effort:** 1 day | **Status:** ✅ DONE

**Problem:** Users couldn't control how much content they see from each community. Lexicon had `contentVisibility` (1-5 scale) but code didn't use it.

**Solution Implemented:**
- ✅ Updated subscribe handler to accept `contentVisibility` parameter (1-5, default 3)
- ✅ Store in subscription record on PDS (`social.coves.community.subscription`)
- ✅ Migration 008 adds `content_visibility` column to database with CHECK constraint
- ✅ Clamping at all layers (handler, service, consumer) for defense in depth
- ✅ Atomic subscriber count updates (SubscribeWithCount/UnsubscribeWithCount)
- ✅ Idempotent operations (safe for Jetstream event replays)
- ✅ Fixed critical collection name bug (was using wrong namespace)
- ✅ Production Jetstream consumer now running
- ✅ 13 comprehensive integration tests - all passing

**Files Modified:**
- Lexicon: [subscription.json](../internal/atproto/lexicon/social/coves/community/subscription.json) ✅ Updated to atProto conventions
- Handler: [community/subscribe.go](../internal/api/handlers/community/subscribe.go) ✅ Accepts contentVisibility
- Service: [communities/service.go](../internal/core/communities/service.go) ✅ Clamps and passes to PDS
- Consumer: [community_consumer.go](../internal/atproto/jetstream/community_consumer.go) ✅ Extracts and indexes
- Repository: [community_repo_subscriptions.go](../internal/db/postgres/community_repo_subscriptions.go) ✅ All queries updated
- Migration: [008_add_content_visibility_to_subscriptions.sql](../internal/db/migrations/008_add_content_visibility_to_subscriptions.sql) ✅ Schema changes
- Tests: [subscription_indexing_test.go](../tests/integration/subscription_indexing_test.go) ✅ Comprehensive coverage

**Documentation:** See [IMPLEMENTATION_SUBSCRIPTION_INDEXING.md](../docs/IMPLEMENTATION_SUBSCRIPTION_INDEXING.md) for full details

**Impact:** ✅ Users can now adjust feed volume per community (key feature from DOMAIN_KNOWLEDGE.md enabled)
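
**Illustrative sketch (clamping):** The `contentVisibility` clamping rule listed above (valid range 1-5, default 3) could look roughly like the helper below. This is a minimal sketch, not the project's actual code: the name `clampContentVisibility` and the choice to represent an omitted value as `nil` are assumptions made for illustration.

```go
package main

import "fmt"

const (
	minVisibility     = 1 // lowest feed volume on the 1-5 scale
	maxVisibility     = 5 // highest feed volume on the 1-5 scale
	defaultVisibility = 3 // used when the client omits the field
)

// clampContentVisibility is a hypothetical helper: it returns the default
// for a missing value and forces out-of-range values back into [1, 5].
func clampContentVisibility(v *int) int {
	if v == nil {
		return defaultVisibility
	}
	if *v < minVisibility {
		return minVisibility
	}
	if *v > maxVisibility {
		return maxVisibility
	}
	return *v
}

func main() {
	for _, in := range []int{-2, 1, 3, 5, 9} {
		v := in
		fmt.Printf("%d -> %d\n", in, clampContentVisibility(&v))
	}
	fmt.Printf("omitted -> %d\n", clampContentVisibility(nil))
}
```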

---

### Community Blocking
**Added:** 2025-10-15 | **Effort:** 1 day | **Priority:** ALPHA BLOCKER

**Problem:** Users have no way to block unwanted communities from their feeds.

**Solution:**
1. **Lexicon:** Extend `social.coves.actor.block` to support community DIDs (currently user-only)
2. **Service:** Implement `BlockCommunity(userDID, communityDID)` and `UnblockCommunity()`
3. **Handlers:** Add XRPC endpoints `social.coves.community.block` and `unblock`
4. **Repository:** Add methods to track blocked communities
5. **Feed:** Filter blocked communities from feed queries (beta work)

**Code:**
- Lexicon: [actor/block.json](../internal/atproto/lexicon/social/coves/actor/block.json) - Currently only supports user DIDs
- Service: New methods needed
- Handlers: New files needed

**Impact:** Users can't avoid unwanted content without blocking

---

### Post comment_count Reconciliation Missing
**Added:** 2025-11-04 | **Effort:** 2-3 hours | **Priority:** ALPHA BLOCKER

**Problem:**
When comments arrive before their parent post is indexed (common with cross-repo Jetstream ordering), the post's `comment_count` is never reconciled. Later, when the post consumer indexes the post, there's no logic to count pre-existing comments. This causes posts to have permanently stale `comment_count` values.

**End-User Impact:**
- 🔴 Posts show "0 comments" when they actually have comments
- ❌ Broken engagement signals (users don't know there are discussions)
- ❌ UI inconsistency (thread page shows comments, but counter says "0")
- ⚠️ Users may not click into posts thinking they're empty
- 📉 Reduced engagement due to misleading counters

**Root Cause:**
- Comment consumer updates post counts when processing comment events ([comment_consumer.go:323-343](../internal/atproto/jetstream/comment_consumer.go#L323-L343))
- If comment arrives BEFORE post is indexed, update query returns 0 rows (only logs warning)
- When post consumer later indexes the post, it sets `comment_count = 0` with NO reconciliation
- Comments already exist in DB, but post never "discovers" them

**Solution:**
Post consumer MUST implement the same reconciliation pattern as comment consumer (see [comment_consumer.go:292-305](../internal/atproto/jetstream/comment_consumer.go#L292-L305)):

```go
// After inserting new post, reconcile comment_count for out-of-order comments
reconcileQuery := `
    UPDATE posts
    SET comment_count = (
        SELECT COUNT(*)
        FROM comments c
        WHERE c.parent_uri = $1 AND c.deleted_at IS NULL
    )
    WHERE id = $2
`
_, reconcileErr := tx.ExecContext(ctx, reconcileQuery, postURI, postID)
```

**Affected Operations:**
- Post indexing from Jetstream ([post_consumer.go](../internal/atproto/jetstream/post_consumer.go))
- Any cross-repo event ordering (community DID ≠ author DID)

**Current Status:**
- 🔴 Issue documented with FIXME(P1) comment at [comment_consumer.go:311-321](../internal/atproto/jetstream/comment_consumer.go#L311-L321)
- ⚠️ Test demonstrating limitation exists: `TestCommentConsumer_PostCountReconciliation_Limitation`
- 📋 Fix required in post consumer (out of scope for comment system PR)

**Files to Modify:**
- `internal/atproto/jetstream/post_consumer.go` - Add reconciliation after post creation
- `tests/integration/post_consumer_test.go` - Add test for out-of-order comment reconciliation

**Similar Issue Fixed:**
- ✅ Comment reply_count reconciliation - Fixed in comment system implementation (2025-11-04)

---

## 🔴 P1.5: Federation Blockers (Beta Launch)

### Cross-PDS Write-Forward Support for Community Service
**Added:** 2025-10-17 | **Updated:** 2025-11-02 | **Effort:** 3-4 hours | **Priority:** FEDERATION BLOCKER (Beta)

**Problem:** Community service write-forward methods assume all users are on the same PDS as the Coves instance. This breaks federation when users from external PDSs try to subscribe/block communities.

**Current Behavior:**
- User on `pds.bsky.social` subscribes to community on `coves.social`
- Coves calls `s.pdsURL` (instance default: `http://localhost:3001`)
- Write goes to WRONG PDS → fails with `{"error":"InvalidToken","message":"Malformed token"}`

**Impact:**
- **Alpha**: Works fine (single PDS deployment, no federation)
- **Beta**: Breaks federation (users on different PDSs can't subscribe/block)

**Root Cause:**
- [service.go:1033](../internal/core/communities/service.go#L1033): `createRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:1050](../internal/core/communities/service.go#L1050): `putRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:1063](../internal/core/communities/service.go#L1063): `deleteRecordOnPDSAs` hardcodes `s.pdsURL`

**Affected Operations:**
- `SubscribeToCommunity` ([service.go:608](../internal/core/communities/service.go#L608))
- `UnsubscribeFromCommunity` (calls `deleteRecordOnPDSAs`)
- `BlockCommunity` ([service.go:739](../internal/core/communities/service.go#L739))
- `UnblockCommunity` (calls `deleteRecordOnPDSAs`)

**Solution:**
1. Add `identityResolver identity.Resolver` to `communityService` struct
2. Before write-forward, resolve user's DID → extract PDS URL
3. Call user's actual PDS instead of hardcoded `s.pdsURL`

**Implementation Pattern (from Vote Service):**
```go
// Add helper method to resolve user's PDS
func (s *communityService) resolveUserPDS(ctx context.Context, userDID string) (string, error) {
	identity, err := s.identityResolver.Resolve(ctx, userDID)
	if err != nil {
		return "", fmt.Errorf("failed to resolve user PDS: %w", err)
	}
	if identity.PDSURL == "" {
		log.Printf("[COMMUNITY-PDS] WARNING: No PDS URL found for %s, using fallback: %s", userDID, s.pdsURL)
		return s.pdsURL, nil
	}
	return identity.PDSURL, nil
}

// Update write-forward methods:
func (s *communityService) createRecordOnPDSAs(ctx context.Context, repoDID, collection, rkey string, record map[string]interface{}, accessToken string) (string, string, error) {
	// Resolve user's actual PDS (critical for federation)
	pdsURL, err := s.resolveUserPDS(ctx, repoDID)
	if err != nil {
		return "", "", fmt.Errorf("failed to resolve user PDS: %w", err)
	}
	endpoint := fmt.Sprintf("%s/xrpc/com.atproto.repo.createRecord", strings.TrimSuffix(pdsURL, "/"))
	// ... rest of method
}
```

**Files to Modify:**
- `internal/core/communities/service.go` - Add resolver field + `resolveUserPDS` helper
- `internal/core/communities/service.go` - Update `createRecordOnPDSAs`, `putRecordOnPDSAs`, `deleteRecordOnPDSAs`
- `cmd/server/main.go` - Pass identity resolver to community service constructor
- Tests - Add cross-PDS subscription/block scenarios

**Testing:**
- User on external PDS subscribes to community → writes to their PDS
- User on external PDS blocks community → writes to their PDS
- Community profile updates still work (writes to community's own PDS)

**Related:**
- **Vote Service**: Fixed in Alpha (2025-11-02) - users can vote from any PDS
- 🔴 **Community Service**: Deferred to Beta (no federation in Alpha)

---

## 🟢 P2: Nice-to-Have

### Remove Categories from Community Lexicon
**Added:** 2025-10-15 | **Effort:** 30 minutes | **Priority:** Cleanup

**Problem:** Categories field exists in create/update lexicon but not in profile record. Adds complexity without clear value.

**Solution:**
- Remove `categories` from [create.json](../internal/atproto/lexicon/social/coves/community/create.json#L46-L54)
- Remove `categories` from [update.json](../internal/atproto/lexicon/social/coves/community/update.json#L51-L59)
- Remove from [community.go:91](../internal/core/communities/community.go#L91)
- Remove from service layer ([service.go:109-110](../internal/core/communities/service.go#L109-L110))

**Impact:** Simplifies lexicon, removes unused feature

---

### Improve .local TLD Error Messages
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Generic error "TLD .local is not allowed" confuses developers.

**Solution:** Enhance `InvalidHandleError` to explain root cause and suggest fixing `INSTANCE_DID`.
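
**Illustrative sketch (error hint):** One way to make this error self-explanatory is to carry the rejected handle and a remediation hint on the error value. The sketch below assumes a struct shape for `InvalidHandleError`; the fields `Handle`, `Reason`, and `Hint` and the helper `validateHandleTLD` are hypothetical and are not taken from the codebase.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// InvalidHandleError is an assumed shape for illustration; the real type may differ.
type InvalidHandleError struct {
	Handle string // the handle that was rejected
	Reason string // short machine-friendly cause
	Hint   string // optional advice for the developer
}

func (e *InvalidHandleError) Error() string {
	if e.Hint == "" {
		return fmt.Sprintf("invalid handle %q: %s", e.Handle, e.Reason)
	}
	return fmt.Sprintf("invalid handle %q: %s (hint: %s)", e.Handle, e.Reason, e.Hint)
}

// validateHandleTLD (hypothetical) rejects .local handles and points at the
// likely misconfiguration instead of returning only the generic message.
func validateHandleTLD(handle string) error {
	if strings.HasSuffix(strings.ToLower(handle), ".local") {
		return &InvalidHandleError{
			Handle: handle,
			Reason: "TLD .local is not allowed",
			Hint:   "check INSTANCE_DID; a did:web domain ending in .local yields handles with a disallowed TLD",
		}
	}
	return nil
}

func main() {
	err := validateHandleTLD("gardening.communities.coves.local")
	var handleErr *InvalidHandleError
	if errors.As(err, &handleErr) {
		fmt.Println(handleErr.Error())
	}
}
```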

---

### Self-Hosting Security Guide
**Added:** 2025-10-11 | **Effort:** 1 day

**Needed:** Document did:web setup, DNS config, secrets management, rate limiting, PostgreSQL hardening, monitoring.

---

### OAuth Session Cleanup Race Condition
**Added:** 2025-10-11 | **Effort:** 2 hours

**Problem:** Cleanup goroutine doesn't handle graceful shutdown, may orphan DB connections.

**Solution:** Pass cancellable context, handle SIGTERM, add cleanup timeout.

---

### Jetstream Consumer Race Condition
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Multiple goroutines can call `close(done)` concurrently in consumer shutdown.

**Solution:** Use `sync.Once` for channel close or atomic flag for shutdown state.

**Code:** TODO in [jetstream/user_consumer.go:114](../internal/atproto/jetstream/user_consumer.go#L114)

---

### Unfurl Cache Cleanup Background Job
**Added:** 2025-11-07 | **Effort:** 2-3 hours | **Priority:** Performance/Maintenance

**Problem:** The `unfurl_cache` table will grow indefinitely as expired entries are not deleted. While the cache uses lazy expiration (checking `expires_at` on read), old records remain in the database consuming disk space.

**Impact:**
- 📊 ~1KB per cached URL
- 📈 At 10K cached URLs = ~10MB (negligible for alpha)
- ⚠️ At 1M cached URLs = ~1GB (potential issue at scale)
- 🐌 Table bloat can slow down queries over time

**Current Mitigation:**
- ✅ Lazy expiration: Cache hits check `expires_at` and refetch if expired
- ✅ Indexed on `expires_at` for efficient expiration queries
- ✅ Not critical for alpha (growth is gradual)

**Solution (Beta/Production):**
Implement background cleanup job to delete expired entries:

```go
// Periodic cleanup (run daily or weekly)
func (r *unfurlRepository) CleanupExpired(ctx context.Context) (int64, error) {
	query := `DELETE FROM unfurl_cache WHERE expires_at < NOW()`
	result, err := r.db.ExecContext(ctx, query)
	if err != nil {
		return 0, err
	}
	return result.RowsAffected()
}
```

**Implementation Options:**
1. **Cron job**: Separate process runs cleanup on schedule
2. **Background goroutine**: Service-level background task with configurable interval
3. **PostgreSQL pg_cron extension**: Database-level scheduled cleanup

**Recommended Approach:**
- Phase 1 (Beta): Background goroutine running weekly cleanup
- Phase 2 (Production): Migrate to pg_cron or external cron for reliability

**Configuration:**
```bash
UNFURL_CACHE_CLEANUP_ENABLED=true
UNFURL_CACHE_CLEANUP_INTERVAL=168h  # 7 days
```

**Monitoring:**
- Log cleanup operations: `[UNFURL-CACHE-CLEANUP] Deleted 1234 expired entries`
- Track table size growth over time
- Alert if table exceeds threshold (e.g., 100MB)

**Files to Create:**
- `internal/core/unfurl/cleanup.go` - Background cleanup service

**Related:**
- Implemented in oEmbed unfurling feature (2025-11-07)
- Cache table: [migration XXX_create_unfurl_cache.sql](../internal/db/migrations/)

---

## 🔵 P3: Technical Debt

### Consolidate Environment Variable Validation
**Added:** 2025-10-11 | **Effort:** 2-3 hours

Create `internal/config` package with structured config validation. Fail fast with clear errors.

---

### Add Connection Pooling for PDS HTTP Clients
**Added:** 2025-10-11 | **Effort:** 2 hours

Create shared `http.Client` with connection pooling instead of new client per request.

---

### Architecture Decision Records (ADRs)
**Added:** 2025-10-11 | **Effort:** Ongoing

Document: did:plc choice, pgcrypto encryption, Jetstream vs firehose, write-forward pattern, single handle field.

---

### Replace log Package with Structured Logger
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** Using standard `log` package. Need structured logging (JSON) with levels.

**Solution:** Switch to `slog`, `zap`, or `zerolog`. Add request IDs, context fields.
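
**Illustrative sketch (`slog`):** As a rough illustration of the `slog` option (standard library since Go 1.21), the sketch below emits leveled JSON logs and attaches a request ID to every entry written through a request-scoped logger. The helpers `newJSONLogger`, `withRequestID`, and `renderEntry`, the `request_id` key, and the sample field values are assumptions for illustration, not existing code.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
)

// newJSONLogger builds a leveled logger that writes one JSON object per entry.
func newJSONLogger(w io.Writer, level slog.Level) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, &slog.HandlerOptions{Level: level}))
}

// withRequestID (hypothetical helper) returns a child logger that adds the
// request ID to every entry, so handlers do not have to repeat it.
func withRequestID(l *slog.Logger, requestID string) *slog.Logger {
	return l.With(slog.String("request_id", requestID))
}

// renderEntry logs a sample message into an in-memory buffer and returns the
// raw JSON output; in a server the writer would typically be os.Stdout.
func renderEntry(requestID, msg string) string {
	var buf bytes.Buffer
	logger := withRequestID(newJSONLogger(&buf, slog.LevelInfo), requestID)
	logger.Info(msg, slog.String("community_did", "did:plc:example"))
	logger.Debug("suppressed below the configured level") // filtered out at Info
	return buf.String()
}

func main() {
	fmt.Print(renderEntry("req-123", "community subscribed"))
}
```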

**Code:** TODO in [community/errors.go:46](../internal/api/handlers/community/errors.go#L46)

---

### PDS URL Resolution from DID
**Added:** 2025-10-11 | **Effort:** 2-3 hours

**Problem:** User consumer doesn't resolve PDS URL from DID document when missing.

**Solution:** Query PLC directory for DID document, extract `serviceEndpoint`.

**Code:** TODO in [jetstream/user_consumer.go:203](../internal/atproto/jetstream/user_consumer.go#L203)

---

## Recent Completions

### ✅ Token Refresh for Community Credentials (2025-10-17)
**Completed:** Automatic token refresh prevents communities from breaking after 2 hours

**Implementation:**
- ✅ JWT expiration parsing and refresh detection (5-minute buffer)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (`atproto.ServerCreateSession`)
- ✅ Atomic credential updates in database (`UpdateCredentials`)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go)
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go)
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go)

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` method
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md)

**Impact:** Communities now work indefinitely without manual token management

---

### ✅ OAuth Authentication for Community Actions (2025-10-16)
**Completed:** Full OAuth JWT authentication flow for protected endpoints

**Implementation:**
- ✅ JWT parser compatible with atProto PDS tokens (aud/iss handling)
- ✅ Auth middleware protecting create/update/subscribe/unsubscribe endpoints
- ✅ Handler-level DID extraction from JWT tokens via `middleware.GetUserDID(r)`
- ✅ Removed all X-User-DID header placeholders
- ✅ E2E tests validate complete OAuth flow with real PDS tokens
- ✅ Security: Issuer validation supports both HTTPS URLs and DIDs

**Files Modified:**
- [internal/atproto/auth/jwt.go](../internal/atproto/auth/jwt.go) - JWT parsing with atProto compatibility
- [internal/api/middleware/auth.go](../internal/api/middleware/auth.go) - Auth middleware
- [internal/api/handlers/community/](../internal/api/handlers/community/) - All handlers updated
- [tests/integration/community_e2e_test.go](../tests/integration/community_e2e_test.go) - OAuth E2E tests

**Related:** Also implemented `hostedByDID` auto-population for security (see P1 item above)

---

### ✅ Fix .local TLD Bug (2025-10-11)
Changed default `INSTANCE_DID` from `did:web:coves.local` → `did:web:coves.social`. Fixed community creation failure due to disallowed `.local` TLD.

---

## Prioritization

- **P0:** Security vulns, data loss, prod blockers
- **P1:** Major UX/reliability issues
- **P2:** QOL improvements, minor bugs, docs
- **P3:** Refactoring, code quality