A community based topic aggregation platform built on atproto
# Backlog PRD: Platform Improvements & Technical Debt

**Status:** Ongoing
**Owner:** Platform Team
**Last Updated:** 2025-10-17

## Overview

Miscellaneous platform improvements, bug fixes, and technical debt that don't fit into feature-specific PRDs.

---

## 🔴 P0: Critical (Alpha Blockers)

### OAuth DPoP Token Architecture - Voting Write-Forward
**Added:** 2025-11-02 | **Completed:** 2025-11-02 | **Effort:** 2 hours | **Priority:** ALPHA BLOCKER
**Status:** ✅ COMPLETE

**Problem:**
Our backend is attempting to use DPoP-bound OAuth tokens to write votes to users' PDSs, causing "Malformed token" errors. This violates atProto architecture patterns.

**Current (Incorrect) Flow:**
```
Mobile Client (OAuth + DPoP) → Coves Backend → User's PDS ❌
                                                    ↓
                                          "Malformed token" error
```

**Root Cause:**
- Mobile app uses OAuth with DPoP (Demonstrating Proof of Possession)
- DPoP tokens are cryptographically bound to client's private key via `cnf.jkt` claim
- Each PDS request requires **both**:
  - `Authorization: DPoP <token>`
  - `DPoP: <signed-proof-jwt>` (signature proves client has private key)
- Backend cannot create DPoP proofs (doesn't have client's private key)
- **DPoP tokens are intentionally non-transferable** (security feature to prevent token theft)

**Evidence:**
```json
// Token decoded from mobile app session
{
  "sub": "did:plc:txrork7rurdueix27ulzi7ke",
  "cnf": {
    "jkt": "LSWROJhTkPn4yT18xUjiIz2Z7z7l_gozKfjjQTYgW9o"  // ← DPoP binding
  },
  "client_id": "https://lingering-darkness-50a6.brettmay0212.workers.dev/client-metadata.json",
  "iss": "http://localhost:3001"
}
```

**atProto Best Practice (from Bluesky social-app analysis):**
- ✅ Clients write **directly to their own PDS** (no backend proxy)
- ✅ AppView **only indexes** from Jetstream (eventual consistency)
- ✅ PDS = User's personal data store (user controls writes)
- ✅ AppView = Read-only aggregator/indexer
- ❌ Backend should NOT proxy user write operations

**Correct Architecture:**
```
Mobile Client → User's PDS (direct write with DPoP proof) ✓
                     ↓
              Jetstream (firehose)
                     ↓
         Coves AppView (indexes votes from firehose)
```

**Affected Endpoints:**
1. **Vote Creation** - [create_vote.go:76](../internal/api/handlers/vote/create_vote.go#L76)
   - Currently: Backend writes to PDS using user's token
   - Should: Return error directing client to write directly

2. **Vote Service** - [service.go:126](../internal/core/votes/service.go#L126)
   - Currently: `createRecordOnPDSAs()` attempts write-forward
   - Should: Remove write-forward, rely on Jetstream indexing only

**Solution Options:**

**Option A: Client Direct Write (RECOMMENDED - Follows Bluesky)**
```typescript
// Mobile client writes directly (like Bluesky social-app)
const agent = new Agent(oauthSession)
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.interaction.vote',
  record: {
    $type: 'social.coves.interaction.vote',
    subject: { uri: postUri, cid: postCid },
    direction: 'up',
    createdAt: new Date().toISOString()
  }
})
```

Backend changes:
- Remove write-forward code from vote service
- Return error from XRPC endpoint: "Votes must be created directly at your PDS"
- Index votes from Jetstream consumer (already implemented)

**Option B: Backend App Passwords (NOT RECOMMENDED)**
- User creates app-specific password
- Backend uses password auth (gets regular JWTs, not DPoP)
- Security downgrade, poor UX

**Option C: Service Auth Token (Complex)**
- Backend gets its own service credentials
- Requires PDS to trust our AppView as delegated writer
- Non-standard atProto pattern

**Recommendation:** Option A (Client Direct Write)
- Matches atProto architecture
- Follows Bluesky social-app pattern
- Best security (user controls their data)
- Simplest implementation

**Implementation Tasks:**
1. Update Flutter OAuth package to expose `agent.call()` for custom lexicons
2. Update mobile vote UI to write directly to PDS
3. Remove write-forward code from backend vote service
4. Update vote XRPC handler to return helpful error message
5. Verify Jetstream consumer correctly indexes votes
6. Update integration tests to match new flow

**References:**
- Bluesky social-app: Direct PDS writes via agent
- atProto OAuth spec: DPoP binding prevents token reuse
- atProto architecture: AppView = read-only indexer

---

### OAuth DPoP Token Architecture - Community Subscriptions
**Added:** 2025-11-02 | **Effort:** 1-2 hours | **Priority:** ALPHA BLOCKER
**Status:** 📋 TODO (Waiting for frontend implementation)

**Problem:**
Same DPoP token issue as voting - backend cannot use user's DPoP-bound OAuth tokens to write subscription records to user's PDS.

**Affected Operations:**
- `SubscribeToCommunity()` - [service.go:564-624](../internal/core/communities/service.go#L564-L624)
- `UnsubscribeFromCommunity()` - [service.go:626-660](../internal/core/communities/service.go#L626-L660)

**Collection:** `social.coves.community.subscription`

**Solution:**
Client writes directly using `com.atproto.repo.createRecord`:
```typescript
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.community.subscription',
  record: {
    $type: 'social.coves.community.subscription',
    subject: communityDid,
    contentVisibility: 3,
    createdAt: new Date().toISOString()
  }
})
```

**Backend Changes Needed:**
1. Remove write-forward from `SubscribeToCommunity()` and `UnsubscribeFromCommunity()`
2. Update handlers to return errors directing to client-direct pattern
3. Verify Jetstream consumer indexes subscriptions (already working)

**Files to Modify:**
- `internal/core/communities/service.go`
- `internal/api/handlers/community/subscribe.go`

---

### OAuth DPoP Token Architecture - Community Blocking
**Added:** 2025-11-02 | **Effort:** 1-2 hours | **Priority:** ALPHA BLOCKER
**Status:** 📋 TODO (Waiting for frontend implementation)

**Problem:**
Same DPoP token issue - backend cannot use user's DPoP-bound OAuth tokens to write block records to user's PDS.

**Affected Operations:**
- `BlockCommunity()` - [service.go:709-781](../internal/core/communities/service.go#L709-L781)
- `UnblockCommunity()` - [service.go:783-816](../internal/core/communities/service.go#L783-L816)

**Collection:** `social.coves.community.block`

**Solution:**
Client writes directly using `com.atproto.repo.createRecord`:
```typescript
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.community.block',
  record: {
    $type: 'social.coves.community.block',
    subject: communityDid,
    createdAt: new Date().toISOString()
  }
})
```

**Backend Changes Needed:**
1. Remove write-forward from `BlockCommunity()` and `UnblockCommunity()`
2. Update handlers to return errors directing to client-direct pattern
3. Verify Jetstream consumer indexes blocks (already working)

**Files to Modify:**
- `internal/core/communities/service.go`
- `internal/api/handlers/community/block.go`

---

## 🟡 P1: Important (Alpha Blockers)

### at-identifier Handle Resolution in Endpoints
**Added:** 2025-10-18 | **Effort:** 2-3 hours | **Priority:** ALPHA BLOCKER

**Problem:**
Current implementation rejects handles in endpoints that declare `"format": "at-identifier"` in their lexicon schemas, violating atProto best practices and breaking legitimate client usage.

**Impact:**
- ❌ Post creation fails when client sends community handle (e.g., `!gardening.communities.coves.social`)
- ❌ Subscribe/unsubscribe endpoints reject handles despite lexicon declaring `at-identifier`
- ❌ Block endpoints use `"format": "did"` but should use `at-identifier` for consistency
- 🔴 **P0 Issue:** API contract violation - clients following the schema are rejected

**Root Cause:**
Handlers and services validate `strings.HasPrefix(req.Community, "did:")` instead of calling `ResolveCommunityIdentifier()`.

**Affected Endpoints:**
1. **Post Creation** - [create.go:54](../internal/api/handlers/post/create.go#L54), [service.go:51](../internal/core/posts/service.go#L51)
   - Lexicon declares `at-identifier`: [post/create.json:16](../internal/atproto/lexicon/social/coves/post/create.json#L16)

2. **Subscribe** - [subscribe.go:52](../internal/api/handlers/community/subscribe.go#L52)
   - Lexicon declares `at-identifier`: [subscribe.json:16](../internal/atproto/lexicon/social/coves/community/subscribe.json#L16)

3. **Unsubscribe** - [subscribe.go:120](../internal/api/handlers/community/subscribe.go#L120)
   - Lexicon declares `at-identifier`: [unsubscribe.json:16](../internal/atproto/lexicon/social/coves/community/unsubscribe.json#L16)

4. **Block/Unblock** - [block.go:58](../internal/api/handlers/community/block.go#L58), [block.go:132](../internal/api/handlers/community/block.go#L132)
   - Lexicon declares `"format": "did"`: [block.json:15](../internal/atproto/lexicon/social/coves/community/block.json#L15)
   - Should be changed to `at-identifier` for consistency and best practice

**atProto Best Practice (from docs):**
- ✅ API endpoints should accept both DIDs and handles via `at-identifier` format
- ✅ Resolve handles to DIDs immediately at API boundary
- ✅ Use DIDs internally for all business logic and storage
- ✅ Handles are weak refs (changeable), DIDs are strong refs (permanent)
- ⚠️ Bidirectional verification required (already handled by `identity.CachingResolver`)

**Solution:**
Replace direct DID validation with handle resolution using existing `ResolveCommunityIdentifier()`:

```go
// BEFORE (wrong) ❌
if !strings.HasPrefix(req.Community, "did:") {
    return error
}

// AFTER (correct) ✅
communityDID, err := h.communityService.ResolveCommunityIdentifier(ctx, req.Community)
if err != nil {
    if communities.IsNotFound(err) {
        writeError(w, http.StatusNotFound, "CommunityNotFound", "Community not found")
        return
    }
    writeError(w, http.StatusBadRequest, "InvalidRequest", err.Error())
    return
}
// Now use communityDID (guaranteed to be a DID)
```

**Implementation Plan:**
1. ✅ **Phase 1 (Alpha Blocker):** Fix post creation endpoint - COMPLETE (2025-10-18)
   - Post creation already uses `ResolveCommunityIdentifier()` at [service.go:100](../internal/core/posts/service.go#L100)
   - Supports handles, DIDs, and scoped formats

2. 📋 **Phase 2 (Beta):** Fix subscription endpoints
   - Update subscribe/unsubscribe handlers
   - Add tests for handle resolution in subscriptions

3. ✅ **Phase 3 (Beta):** Fix block endpoints - COMPLETE (2025-11-16)
   - Updated block/unblock handlers to use `ResolveCommunityIdentifier()`
   - Accepts handles (`@gaming.community.coves.social`), DIDs, and scoped format (`!gaming@coves.social`)
   - Added comprehensive tests: [block_handle_resolution_test.go](../tests/integration/block_handle_resolution_test.go)
   - All 7 test cases passing

**Files Modified (Phase 3 - Block Endpoints):**
- `internal/api/handlers/community/block.go` - Added `ResolveCommunityIdentifier()` calls
- `tests/integration/block_handle_resolution_test.go` - Comprehensive test coverage

**Existing Infrastructure:**
- ✅ `ResolveCommunityIdentifier()` already implemented at [service.go:852](../internal/core/communities/service.go#L852)
- ✅ `identity.CachingResolver` handles bidirectional verification and caching
- ✅ Supports both handle (`!name.communities.instance.com`) and DID formats

**Current Status:**
- ✅ Phase 1 (post creation) - Already implemented
- 📋 Phase 2 (subscriptions) - Deferred to Beta (lower priority)
- ✅ Phase 3 (block endpoints) - COMPLETE (2025-11-16)

---

### ✅ did:web Domain Verification & hostedByDID Auto-Population - COMPLETE
**Added:** 2025-10-11 | **Updated:** 2025-11-16 | **Completed:** 2025-11-16 | **Status:** ✅ DONE

**Problem:**
1. **Domain Impersonation**: Self-hosters can set `INSTANCE_DID=did:web:nintendo.com` without owning the domain, enabling attacks where communities appear hosted by trusted domains
2. **hostedByDID Spoofing**: Malicious instance operators can modify source code to claim communities are hosted by domains they don't own, enabling reputation hijacking and phishing

**Attack Scenarios:**
- Malicious instance sets `instanceDID="did:web:coves.social"` → communities show as hosted by official Coves
- Federation partners can't verify instance authenticity
- AppView pollution with fake hosting claims

**Solution Implemented (Bluesky-Compatible):**
1. ✅ **Domain Matching**: Verify `did:web:` domain matches configured `instanceDomain`
2. ✅ **Bidirectional Verification**: Fetch `https://domain/.well-known/did.json` and verify:
   - DID document exists and is valid
   - DID document ID matches claimed `instanceDID`
   - DID document claims handle domain in `alsoKnownAs` field (bidirectional binding)
   - Domain ownership proven via HTTPS hosting (matches Bluesky's trust model)
3. ✅ **Auto-populate hostedByDID**: Removed from client API, derived from instance configuration in service layer

**Current Status:**
- ✅ Default changed from `coves.local` → `coves.social` (fixes `.local` TLD bug)
- ✅ hostedByDID removed from client requests (2025-10-16)
- ✅ Service layer auto-populates `hostedByDID` from `instanceDID` (2025-10-16)
- ✅ Handler rejects client-provided `hostedByDID` (2025-10-16)
- ✅ Basic validation: Logs warning if `did:web:` domain ≠ `instanceDomain` (2025-10-16)
- ✅ **MANDATORY bidirectional DID verification** (2025-11-16)
- ✅ Cache TTL updated to 24h (matches Bluesky recommendations) (2025-11-16)

**Implementation Details:**
- **Security Model**: Matches Bluesky's approach - relies on DNS/HTTPS authority, not cryptographic proof
- **Enforcement**: MANDATORY hard-fail in production (rejects communities with verification failures)
- **Dev Mode**: Set `SKIP_DID_WEB_VERIFICATION=true` to bypass verification for local development
- **Performance**: Bounded LRU cache (1000 entries), rate limiting (10 req/s), 24h cache TTL
- **Bidirectional Check**: Prevents impersonation by requiring DID document to claim the handle
- **Location**: [internal/atproto/jetstream/community_consumer.go](../internal/atproto/jetstream/community_consumer.go)

---

### ✅ Token Refresh Logic for Community Credentials - COMPLETE
**Added:** 2025-10-11 | **Completed:** 2025-10-17 | **Effort:** 1.5 days | **Status:** ✅ DONE

**Problem:** Community PDS access tokens expire (~2hrs). Updates fail until manual intervention.

**Solution Implemented:**
- ✅ Automatic token refresh before PDS operations (5-minute buffer before expiration)
- ✅ JWT expiration parsing without signature verification (`parseJWTExpiration`, `needsRefresh`)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (~2 months) via `atproto.ServerCreateSession`
- ✅ Atomic credential updates (`UpdateCredentials` repository method)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for token expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go) - JWT parsing utilities
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go) - Refresh and re-auth logic
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go) - Integration tests

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` + concurrency control
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** See [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md) for full details

**Impact:** ✅ Communities can now be updated 24+ hours after creation without manual intervention

---

### ✅ Subscription Visibility Level (Feed Slider 1-5 Scale) - COMPLETE
**Added:** 2025-10-15 | **Completed:** 2025-10-16 | **Effort:** 1 day | **Status:** ✅ DONE

**Problem:** Users couldn't control how much content they see from each community. Lexicon had `contentVisibility` (1-5 scale) but code didn't use it.

**Solution Implemented:**
- ✅ Updated subscribe handler to accept `contentVisibility` parameter (1-5, default 3)
- ✅ Store in subscription record on PDS (`social.coves.community.subscription`)
- ✅ Migration 008 adds `content_visibility` column to database with CHECK constraint
- ✅ Clamping at all layers (handler, service, consumer) for defense in depth
- ✅ Atomic subscriber count updates (SubscribeWithCount/UnsubscribeWithCount)
- ✅ Idempotent operations (safe for Jetstream event replays)
- ✅ Fixed critical collection name bug (was using wrong namespace)
- ✅ Production Jetstream consumer now running
- ✅ 13 comprehensive integration tests - all passing

**Files Modified:**
- Lexicon: [subscription.json](../internal/atproto/lexicon/social/coves/community/subscription.json) ✅ Updated to atProto conventions
- Handler: [community/subscribe.go](../internal/api/handlers/community/subscribe.go) ✅ Accepts contentVisibility
- Service: [communities/service.go](../internal/core/communities/service.go) ✅ Clamps and passes to PDS
- Consumer: [community_consumer.go](../internal/atproto/jetstream/community_consumer.go) ✅ Extracts and indexes
- Repository: [community_repo_subscriptions.go](../internal/db/postgres/community_repo_subscriptions.go) ✅ All queries updated
- Migration: [008_add_content_visibility_to_subscriptions.sql](../internal/db/migrations/008_add_content_visibility_to_subscriptions.sql) ✅ Schema changes
- Tests: [subscription_indexing_test.go](../tests/integration/subscription_indexing_test.go) ✅ Comprehensive coverage

**Documentation:** See [IMPLEMENTATION_SUBSCRIPTION_INDEXING.md](../docs/IMPLEMENTATION_SUBSCRIPTION_INDEXING.md) for full details

**Impact:** ✅ Users can now adjust feed volume per community (key feature from DOMAIN_KNOWLEDGE.md enabled)

---

### Community Blocking
**Added:** 2025-10-15 | **Effort:** 1 day | **Priority:** ALPHA BLOCKER

**Problem:** Users have no way to block unwanted communities from their feeds.

**Solution:**
1. **Lexicon:** Extend `social.coves.actor.block` to support community DIDs (currently user-only)
2. **Service:** Implement `BlockCommunity(userDID, communityDID)` and `UnblockCommunity()`
3. **Handlers:** Add XRPC endpoints `social.coves.community.block` and `unblock`
4. **Repository:** Add methods to track blocked communities
5. **Feed:** Filter blocked communities from feed queries (beta work)

**Code:**
- Lexicon: [actor/block.json](../internal/atproto/lexicon/social/coves/actor/block.json) - Currently only supports user DIDs
- Service: New methods needed
- Handlers: New files needed

**Impact:** Users can't avoid unwanted content without blocking

---

### ✅ Post comment_count Reconciliation - COMPLETE
**Added:** 2025-11-04 | **Completed:** 2025-11-16 | **Effort:** 2 hours | **Status:** ✅ DONE

**Problem:**
When comments arrive before their parent post is indexed (common with cross-repo Jetstream ordering), the post's `comment_count` was never reconciled, causing posts to show permanently stale "0 comments" counters.

**Solution Implemented:**
- ✅ Post consumer reconciliation logic WAS already implemented at [post_consumer.go:210-226](../internal/atproto/jetstream/post_consumer.go#L210-L226)
- ✅ Reconciliation query counts pre-existing comments when indexing new posts
- ✅ Comprehensive test suite added: [post_consumer_test.go](../tests/integration/post_consumer_test.go)
  - Single comment before post
  - Multiple comments before post
  - Mixed before/after ordering
  - Idempotent indexing preserves counts
- ✅ Updated outdated FIXME comment at [comment_consumer.go:362](../internal/atproto/jetstream/comment_consumer.go#L362)
- ✅ All 4 test cases passing

**Implementation:**
```go
// Post consumer reconciliation (lines 210-226)
reconcileQuery := `
    UPDATE posts
    SET comment_count = (
        SELECT COUNT(*)
        FROM comments c
        WHERE c.parent_uri = $1 AND c.deleted_at IS NULL
    )
    WHERE id = $2
`
_, reconcileErr := tx.ExecContext(ctx, reconcileQuery, post.URI, postID)
```

**Files Modified:**
- `internal/atproto/jetstream/comment_consumer.go` - Updated documentation
- `tests/integration/post_consumer_test.go` - Added comprehensive test coverage

**Impact:** ✅ Post comment counters are now accurate regardless of Jetstream event ordering

---

## 🔴 P1.5: Federation Blockers (Beta Launch)

### Cross-PDS Write-Forward Support for Community Service
**Added:** 2025-10-17 | **Updated:** 2025-11-02 | **Effort:** 3-4 hours | **Priority:** FEDERATION BLOCKER (Beta)

**Problem:** Community service write-forward methods assume all users are on the same PDS as the Coves instance. This breaks federation when users from external PDSs try to subscribe/block communities.

**Current Behavior:**
- User on `pds.bsky.social` subscribes to community on `coves.social`
- Coves calls `s.pdsURL` (instance default: `http://localhost:3001`)
- Write goes to WRONG PDS → fails with `{"error":"InvalidToken","message":"Malformed token"}`

**Impact:**
- **Alpha**: Works fine (single PDS deployment, no federation)
- **Beta**: Breaks federation (users on different PDSs can't subscribe/block)

**Root Cause:**
- [service.go:1033](../internal/core/communities/service.go#L1033): `createRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:1050](../internal/core/communities/service.go#L1050): `putRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:1063](../internal/core/communities/service.go#L1063): `deleteRecordOnPDSAs` hardcodes `s.pdsURL`

**Affected Operations:**
- `SubscribeToCommunity` ([service.go:608](../internal/core/communities/service.go#L608))
- `UnsubscribeFromCommunity` (calls `deleteRecordOnPDSAs`)
- `BlockCommunity` ([service.go:739](../internal/core/communities/service.go#L739))
- `UnblockCommunity` (calls `deleteRecordOnPDSAs`)

**Solution:**
1. Add `identityResolver identity.Resolver` to `communityService` struct
2. Before write-forward, resolve user's DID → extract PDS URL
3. Call user's actual PDS instead of hardcoded `s.pdsURL`

**Implementation Pattern (from Vote Service):**
```go
// Add helper method to resolve user's PDS
func (s *communityService) resolveUserPDS(ctx context.Context, userDID string) (string, error) {
    identity, err := s.identityResolver.Resolve(ctx, userDID)
    if err != nil {
        return "", fmt.Errorf("failed to resolve user PDS: %w", err)
    }
    if identity.PDSURL == "" {
        log.Printf("[COMMUNITY-PDS] WARNING: No PDS URL found for %s, using fallback: %s", userDID, s.pdsURL)
        return s.pdsURL, nil
    }
    return identity.PDSURL, nil
}

// Update write-forward methods:
func (s *communityService) createRecordOnPDSAs(ctx context.Context, repoDID, collection, rkey string, record map[string]interface{}, accessToken string) (string, string, error) {
    // Resolve user's actual PDS (critical for federation)
    pdsURL, err := s.resolveUserPDS(ctx, repoDID)
    if err != nil {
        return "", "", fmt.Errorf("failed to resolve user PDS: %w", err)
    }
    endpoint := fmt.Sprintf("%s/xrpc/com.atproto.repo.createRecord", strings.TrimSuffix(pdsURL, "/"))
    // ... rest of method
}
```

**Files to Modify:**
- `internal/core/communities/service.go` - Add resolver field + `resolveUserPDS` helper
- `internal/core/communities/service.go` - Update `createRecordOnPDSAs`, `putRecordOnPDSAs`, `deleteRecordOnPDSAs`
- `cmd/server/main.go` - Pass identity resolver to community service constructor
- Tests - Add cross-PDS subscription/block scenarios

**Testing:**
- User on external PDS subscribes to community → writes to their PDS
- User on external PDS blocks community → writes to their PDS
- Community profile updates still work (writes to community's own PDS)

**Related:**
- ✅ **Vote Service**: Fixed in Alpha (2025-11-02) - users can vote from any PDS
- 🔴 **Community Service**: Deferred to Beta (no federation in Alpha)

---

## 🟢 P2: Nice-to-Have

### Remove Categories from Community Lexicon
**Added:** 2025-10-15 | **Effort:** 30 minutes | **Priority:** Cleanup

**Problem:** Categories field exists in create/update lexicon but not in profile record. Adds complexity without clear value.

**Solution:**
- Remove `categories` from [create.json](../internal/atproto/lexicon/social/coves/community/create.json#L46-L54)
- Remove `categories` from [update.json](../internal/atproto/lexicon/social/coves/community/update.json#L51-L59)
- Remove from [community.go:91](../internal/core/communities/community.go#L91)
- Remove from service layer ([service.go:109-110](../internal/core/communities/service.go#L109-L110))

**Impact:** Simplifies lexicon, removes unused feature

---

### Improve .local TLD Error Messages
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Generic error "TLD .local is not allowed" confuses developers.

**Solution:** Enhance `InvalidHandleError` to explain root cause and suggest fixing `INSTANCE_DID`.

---

### Self-Hosting Security Guide
**Added:** 2025-10-11 | **Effort:** 1 day

**Needed:** Document did:web setup, DNS config, secrets management, rate limiting, PostgreSQL hardening, monitoring.

---

### OAuth Session Cleanup Race Condition
**Added:** 2025-10-11 | **Effort:** 2 hours

**Problem:** Cleanup goroutine doesn't handle graceful shutdown, may orphan DB connections.

**Solution:** Pass cancellable context, handle SIGTERM, add cleanup timeout.

---

### Jetstream Consumer Race Condition
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Multiple goroutines can call `close(done)` concurrently in consumer shutdown.

**Solution:** Use `sync.Once` for channel close or atomic flag for shutdown state.

**Code:** TODO in [jetstream/user_consumer.go:114](../internal/atproto/jetstream/user_consumer.go#L114)

---

### Unfurl Cache Cleanup Background Job
**Added:** 2025-11-07 | **Effort:** 2-3 hours | **Priority:** Performance/Maintenance

**Problem:** The `unfurl_cache` table will grow indefinitely as expired entries are not deleted. While the cache uses lazy expiration (checking `expires_at` on read), old records remain in the database consuming disk space.
594 595**Impact:** 596- 📊 ~1KB per cached URL 597- 📈 At 10K cached URLs = ~10MB (negligible for alpha) 598- ⚠️ At 1M cached URLs = ~1GB (potential issue at scale) 599- 🐌 Table bloat can slow down queries over time 600 601**Current Mitigation:** 602- ✅ Lazy expiration: Cache hits check `expires_at` and refetch if expired 603- ✅ Indexed on `expires_at` for efficient expiration queries 604- ✅ Not critical for alpha (growth is gradual) 605 606**Solution (Beta/Production):** 607Implement background cleanup job to delete expired entries: 608 609```go 610// Periodic cleanup (run daily or weekly) 611func (r *unfurlRepository) CleanupExpired(ctx context.Context) (int64, error) { 612 query := `DELETE FROM unfurl_cache WHERE expires_at < NOW()` 613 result, err := r.db.ExecContext(ctx, query) 614 if err != nil { 615 return 0, err 616 } 617 return result.RowsAffected() 618} 619``` 620 621**Implementation Options:** 6221. **Cron job**: Separate process runs cleanup on schedule 6232. **Background goroutine**: Service-level background task with configurable interval 6243. 
**PostgreSQL pg_cron extension**: Database-level scheduled cleanup 625 626**Recommended Approach:** 627- Phase 1 (Beta): Background goroutine running weekly cleanup 628- Phase 2 (Production): Migrate to pg_cron or external cron for reliability 629 630**Configuration:** 631```bash 632UNFURL_CACHE_CLEANUP_ENABLED=true 633UNFURL_CACHE_CLEANUP_INTERVAL=168h # 7 days 634``` 635 636**Monitoring:** 637- Log cleanup operations: `[UNFURL-CACHE-CLEANUP] Deleted 1234 expired entries` 638- Track table size growth over time 639- Alert if table exceeds threshold (e.g., 100MB) 640 641**Files to Create:** 642- `internal/core/unfurl/cleanup.go` - Background cleanup service 643 644**Related:** 645- Implemented in oEmbed unfurling feature (2025-11-07) 646- Cache table: [migration XXX_create_unfurl_cache.sql](../internal/db/migrations/) 647 648--- 649 650## 🔵 P3: Technical Debt 651 652### Implement PutRecord in PDS Client 653**Added:** 2025-12-04 | **Effort:** 2-3 hours | **Priority:** Technical Debt 654**Status:** 📋 TODO 655 656**Problem:** 657The PDS client (`internal/atproto/pds/client.go`) only has `CreateRecord` but lacks `PutRecord`. This means updates use `CreateRecord` with an existing rkey, which: 6581. Loses optimistic locking (no CID swap check) 6592. Is semantically incorrect (creates vs updates) 6603. Could cause race conditions on concurrent updates 661 662**atProto Best Practice:** 663- `com.atproto.repo.putRecord` should be used for updates 664- Accepts `swapRecord` (expected CID) for optimistic locking 665- Returns conflict error if CID doesn't match (concurrent modification detected) 666 667**Solution:** 668Add `PutRecord` method to the PDS client interface: 669 670```go 671// Client interface addition 672type Client interface { 673 // ... existing methods ... 674 675 // PutRecord creates or updates a record with optional optimistic locking. 676 // If swapRecord is provided, the operation fails if the current CID doesn't match. 
677 PutRecord(ctx context.Context, collection string, rkey string, record any, swapRecord string) (uri string, cid string, err error) 678} 679 680// Implementation 681func (c *client) PutRecord(ctx context.Context, collection string, rkey string, record any, swapRecord string) (string, string, error) { 682 payload := map[string]any{ 683 "repo": c.did, 684 "collection": collection, 685 "rkey": rkey, 686 "record": record, 687 } 688 689 // Optional: optimistic locking via CID swap check 690 if swapRecord != "" { 691 payload["swapRecord"] = swapRecord 692 } 693 694 var result struct { 695 URI string `json:"uri"` 696 CID string `json:"cid"` 697 } 698 699 err := c.apiClient.Post(ctx, syntax.NSID("com.atproto.repo.putRecord"), payload, &result) 700 if err != nil { 701 return "", "", wrapAPIError(err, "putRecord") 702 } 703 704 return result.URI, result.CID, nil 705} 706``` 707 708**Error Handling:** 709Add new error type for conflict detection: 710```go 711var ErrConflict = errors.New("record was modified by another operation") 712``` 713 714Map HTTP 409 in `wrapAPIError`: 715```go 716case 409: 717 return fmt.Errorf("%s: %w: %s", operation, ErrConflict, apiErr.Message) 718``` 719 720**Files to Modify:** 721- `internal/atproto/pds/client.go` - Add `PutRecord` method and interface 722- `internal/atproto/pds/errors.go` - Add `ErrConflict` error type 723 724**Testing:** 725- Unit test: Verify payload includes `swapRecord` when provided 726- Integration test: Concurrent updates detect conflict 727- Integration test: Update without `swapRecord` still works (backwards compatible) 728 729**Blocked By:** Nothing 730**Blocks:** "Migrate UpdateComment to use PutRecord" 731 732--- 733 734### Migrate UpdateComment to Use PutRecord 735**Added:** 2025-12-04 | **Effort:** 1 hour | **Priority:** Technical Debt 736**Status:** 📋 TODO (Blocked) 737**Blocked By:** "Implement PutRecord in PDS Client" 738 739**Problem:** 740`UpdateComment` in `internal/core/comments/comment_service.go` uses 
`CreateRecord` for updates instead of `PutRecord`. This lacks optimistic locking and is semantically incorrect.

**Current Code (lines 687-690):**
```go
// TODO: Use PutRecord instead of CreateRecord for proper update semantics with optimistic locking.
// PutRecord should accept the existing CID (existingRecord.CID) to ensure concurrent updates are detected.
// However, PutRecord is not yet implemented in internal/atproto/pds/client.go.
uri, cid, err := pdsClient.CreateRecord(ctx, commentCollection, rkey, updatedRecord)
```

**Solution:**
Once `PutRecord` is implemented in the PDS client, update to:
```go
// Use PutRecord with optimistic locking via existing CID
uri, cid, err := pdsClient.PutRecord(ctx, commentCollection, rkey, updatedRecord, existingRecord.CID)
if err != nil {
	if errors.Is(err, pds.ErrConflict) {
		// Record was modified by another operation - return appropriate error
		return nil, fmt.Errorf("comment was modified, please refresh and try again: %w", err)
	}
	// ... existing error handling
}
```

**Files to Modify:**
- `internal/core/comments/comment_service.go` - UpdateComment method
- `internal/core/comments/errors.go` - Add `ErrConcurrentModification` if needed

**Testing:**
- Unit test: Verify `PutRecord` is called with correct CID
- Integration test: Simulate concurrent update, verify conflict handling

**Impact:** Proper optimistic locking prevents lost updates from race conditions

---

### Consolidate Environment Variable Validation
**Added:** 2025-10-11 | **Effort:** 2-3 hours

Create `internal/config` package with structured config validation. Fail fast with clear errors.

---

### Add Connection Pooling for PDS HTTP Clients
**Added:** 2025-10-11 | **Effort:** 2 hours

Create shared `http.Client` with connection pooling instead of new client per request.

---

### Architecture Decision Records (ADRs)
**Added:** 2025-10-11 | **Effort:** Ongoing

Document: did:plc choice, pgcrypto encryption, Jetstream vs firehose, write-forward pattern, single handle field.

---

### Replace log Package with Structured Logger
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** Using standard `log` package. Need structured logging (JSON) with levels.

**Solution:** Switch to `slog`, `zap`, or `zerolog`. Add request IDs, context fields.

**Code:** TODO in [community/errors.go:46](../internal/api/handlers/community/errors.go#L46)

---

### PDS URL Resolution from DID
**Added:** 2025-10-11 | **Effort:** 2-3 hours

**Problem:** User consumer doesn't resolve PDS URL from DID document when missing.

**Solution:** Query PLC directory for DID document, extract `serviceEndpoint`.

**Code:** TODO in [jetstream/user_consumer.go:203](../internal/atproto/jetstream/user_consumer.go#L203)

---

## Recent Completions

### ✅ Token Refresh for Community Credentials (2025-10-17)
**Completed:** Automatic token refresh prevents communities from breaking after 2 hours

**Implementation:**
- ✅ JWT expiration parsing and refresh detection (5-minute buffer)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (`atproto.ServerCreateSession`)
- ✅ Atomic credential updates in database (`UpdateCredentials`)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go)
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go)
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go)

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` method
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md)

**Impact:** Communities now work indefinitely without manual token management

---

### ✅ OAuth Authentication for Community Actions (2025-10-16)
**Completed:** Full OAuth JWT authentication flow for protected endpoints

**Implementation:**
- ✅ JWT parser compatible with atProto PDS tokens (aud/iss handling)
- ✅ Auth middleware protecting create/update/subscribe/unsubscribe endpoints
- ✅ Handler-level DID extraction from JWT tokens via `middleware.GetUserDID(r)`
- ✅ Removed all X-User-DID header placeholders
- ✅ E2E tests validate complete OAuth flow with real PDS tokens
- ✅ Security: Issuer validation supports both HTTPS URLs and DIDs

**Files Modified:**
- [internal/atproto/auth/jwt.go](../internal/atproto/auth/jwt.go) - JWT parsing with atProto compatibility
- [internal/api/middleware/auth.go](../internal/api/middleware/auth.go) - Auth middleware
- [internal/api/handlers/community/](../internal/api/handlers/community/) - All handlers updated
- [tests/integration/community_e2e_test.go](../tests/integration/community_e2e_test.go) - OAuth E2E tests

**Related:** Also implemented `hostedByDID` auto-population for security (see P1 item above)

---

### ✅ Fix .local TLD Bug (2025-10-11)
Changed default `INSTANCE_DID` from `did:web:coves.local` to `did:web:coves.social`.
Fixed community creation failure due to disallowed `.local` TLD.

---

## Prioritization

- **P0:** Security vulns, data loss, prod blockers
- **P1:** Major UX/reliability issues
- **P2:** QOL improvements, minor bugs, docs
- **P3:** Refactoring, code quality