A community based topic aggregation platform built on atproto
# Backlog PRD: Platform Improvements & Technical Debt

**Status:** Ongoing
**Owner:** Platform Team
**Last Updated:** 2025-10-17

## Overview

Miscellaneous platform improvements, bug fixes, and technical debt that don't fit into feature-specific PRDs.

---

## 🔴 P0: Critical (Alpha Blockers)

### OAuth DPoP Token Architecture - Voting Write-Forward
**Added:** 2025-11-02 | **Completed:** 2025-11-02 | **Effort:** 2 hours | **Priority:** ALPHA BLOCKER
**Status:** ✅ COMPLETE

**Problem:**
Our backend is attempting to use DPoP-bound OAuth tokens to write votes to users' PDSs, causing "Malformed token" errors. This violates atProto architecture patterns.

**Current (Incorrect) Flow:**
```
Mobile Client (OAuth + DPoP) → Coves Backend → User's PDS ❌
                                                   ↓
                                      "Malformed token" error
```

**Root Cause:**
- Mobile app uses OAuth with DPoP (Demonstrating Proof of Possession)
- DPoP tokens are cryptographically bound to the client's key pair via the `cnf.jkt` claim (a thumbprint of the client's public key)
- Each PDS request requires **both**:
  - `Authorization: DPoP <token>`
  - `DPoP: <signed-proof-jwt>` (signature proves client has private key)
- Backend cannot create DPoP proofs (doesn't have client's private key)
- **DPoP tokens are intentionally non-transferable** (security feature to prevent token theft)

**Evidence:**
```json
// Token decoded from mobile app session
{
  "sub": "did:plc:txrork7rurdueix27ulzi7ke",
  "cnf": {
    "jkt": "LSWROJhTkPn4yT18xUjiIz2Z7z7l_gozKfjjQTYgW9o" // ← DPoP binding
  },
  "client_id": "https://lingering-darkness-50a6.brettmay0212.workers.dev/client-metadata.json",
  "iss": "http://localhost:3001"
}
```

**atProto Best Practice (from Bluesky social-app analysis):**
- ✅ Clients write **directly to their own PDS** (no backend proxy)
- ✅ AppView **only indexes** from Jetstream (eventual consistency)
- ✅ PDS = User's personal data store (user controls writes)
- ✅ AppView = Read-only aggregator/indexer
- ❌ Backend should NOT proxy user write operations

**Correct Architecture:**
```
Mobile Client → User's PDS (direct write with DPoP proof) ✓
                    ↓
            Jetstream (firehose)
                    ↓
      Coves AppView (indexes votes from firehose)
```

**Affected Endpoints:**
1. **Vote Creation** - [create_vote.go:76](../internal/api/handlers/vote/create_vote.go#L76)
   - Currently: Backend writes to PDS using user's token
   - Should: Return error directing client to write directly

2. **Vote Service** - [service.go:126](../internal/core/votes/service.go#L126)
   - Currently: `createRecordOnPDSAs()` attempts write-forward
   - Should: Remove write-forward, rely on Jetstream indexing only

**Solution Options:**

**Option A: Client Direct Write (RECOMMENDED - Follows Bluesky)**
```typescript
// Mobile client writes directly (like Bluesky social-app)
const agent = new Agent(oauthSession)
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.interaction.vote',
  record: {
    $type: 'social.coves.interaction.vote',
    subject: { uri: postUri, cid: postCid },
    direction: 'up',
    createdAt: new Date().toISOString()
  }
})
```

Backend changes:
- Remove write-forward code from vote service
- Return error from XRPC endpoint: "Votes must be created directly at your PDS"
- Index votes from Jetstream consumer (already implemented)

**Option B: Backend App Passwords (NOT RECOMMENDED)**
- User creates app-specific password
- Backend uses password auth (gets regular JWTs, not DPoP)
- Security downgrade, poor UX

**Option C: Service Auth Token (Complex)**
- Backend gets its own service credentials
- Requires PDS to trust our AppView as delegated writer
- Non-standard atProto pattern

**Recommendation:** Option A (Client Direct Write)
- Matches atProto architecture
- Follows Bluesky social-app pattern
- Best security (user controls their data)
- Simplest implementation

**Implementation Tasks:**
1. Update Flutter OAuth package to expose `agent.call()` for custom lexicons
2. Update mobile vote UI to write directly to PDS
3. Remove write-forward code from backend vote service
4. Update vote XRPC handler to return helpful error message
5. Verify Jetstream consumer correctly indexes votes
6. Update integration tests to match new flow

**References:**
- Bluesky social-app: Direct PDS writes via agent
- atProto OAuth spec: DPoP binding prevents token reuse
- atProto architecture: AppView = read-only indexer

---

### OAuth DPoP Token Architecture - Community Subscriptions
**Added:** 2025-11-02 | **Effort:** 1-2 hours | **Priority:** ALPHA BLOCKER
**Status:** 📋 TODO (Waiting for frontend implementation)

**Problem:**
Same DPoP token issue as voting - backend cannot use user's DPoP-bound OAuth tokens to write subscription records to user's PDS.

**Affected Operations:**
- `SubscribeToCommunity()` - [service.go:564-624](../internal/core/communities/service.go#L564-L624)
- `UnsubscribeFromCommunity()` - [service.go:626-660](../internal/core/communities/service.go#L626-L660)

**Collection:** `social.coves.community.subscription`

**Solution:**
Client writes directly using `com.atproto.repo.createRecord`:
```typescript
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.community.subscription',
  record: {
    $type: 'social.coves.community.subscription',
    subject: communityDid,
    contentVisibility: 3,
    createdAt: new Date().toISOString()
  }
})
```

**Backend Changes Needed:**
1. Remove write-forward from `SubscribeToCommunity()` and `UnsubscribeFromCommunity()`
2. Update handlers to return errors directing to client-direct pattern
3. Verify Jetstream consumer indexes subscriptions (already working)

**Files to Modify:**
- `internal/core/communities/service.go`
- `internal/api/handlers/community/subscribe.go`

---

### OAuth DPoP Token Architecture - Community Blocking
**Added:** 2025-11-02 | **Effort:** 1-2 hours | **Priority:** ALPHA BLOCKER
**Status:** 📋 TODO (Waiting for frontend implementation)

**Problem:**
Same DPoP token issue - backend cannot use user's DPoP-bound OAuth tokens to write block records to user's PDS.

**Affected Operations:**
- `BlockCommunity()` - [service.go:709-781](../internal/core/communities/service.go#L709-L781)
- `UnblockCommunity()` - [service.go:783-816](../internal/core/communities/service.go#L783-L816)

**Collection:** `social.coves.community.block`

**Solution:**
Client writes directly using `com.atproto.repo.createRecord`:
```typescript
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.community.block',
  record: {
    $type: 'social.coves.community.block',
    subject: communityDid,
    createdAt: new Date().toISOString()
  }
})
```

**Backend Changes Needed:**
1. Remove write-forward from `BlockCommunity()` and `UnblockCommunity()`
2. Update handlers to return errors directing to client-direct pattern
3. Verify Jetstream consumer indexes blocks (already working)

**Files to Modify:**
- `internal/core/communities/service.go`
- `internal/api/handlers/community/block.go`

---

## 🟡 P1: Important (Alpha Blockers)

### at-identifier Handle Resolution in Endpoints
**Added:** 2025-10-18 | **Effort:** 2-3 hours | **Priority:** ALPHA BLOCKER

**Problem:**
Current implementation rejects handles in endpoints that declare `"format": "at-identifier"` in their lexicon schemas, violating atProto best practices and breaking legitimate client usage.

**Impact:**
- ❌ Post creation fails when client sends community handle (e.g., `!gardening.communities.coves.social`)
- ❌ Subscribe/unsubscribe endpoints reject handles despite lexicon declaring `at-identifier`
- ❌ Block endpoints use `"format": "did"` but should use `at-identifier` for consistency
- 🔴 **P0 Issue:** API contract violation - clients following the schema are rejected

**Root Cause:**
Handlers and services validate `strings.HasPrefix(req.Community, "did:")` instead of calling `ResolveCommunityIdentifier()`.

**Affected Endpoints:**
1. **Post Creation** - [create.go:54](../internal/api/handlers/post/create.go#L54), [service.go:51](../internal/core/posts/service.go#L51)
   - Lexicon declares `at-identifier`: [post/create.json:16](../internal/atproto/lexicon/social/coves/post/create.json#L16)

2. **Subscribe** - [subscribe.go:52](../internal/api/handlers/community/subscribe.go#L52)
   - Lexicon declares `at-identifier`: [subscribe.json:16](../internal/atproto/lexicon/social/coves/community/subscribe.json#L16)

3. **Unsubscribe** - [subscribe.go:120](../internal/api/handlers/community/subscribe.go#L120)
   - Lexicon declares `at-identifier`: [unsubscribe.json:16](../internal/atproto/lexicon/social/coves/community/unsubscribe.json#L16)

4. **Block/Unblock** - [block.go:58](../internal/api/handlers/community/block.go#L58), [block.go:132](../internal/api/handlers/community/block.go#L132)
   - Lexicon declares `"format": "did"`: [block.json:15](../internal/atproto/lexicon/social/coves/community/block.json#L15)
   - Should be changed to `at-identifier` for consistency and best practice

**atProto Best Practice (from docs):**
- ✅ API endpoints should accept both DIDs and handles via `at-identifier` format
- ✅ Resolve handles to DIDs immediately at API boundary
- ✅ Use DIDs internally for all business logic and storage
- ✅ Handles are weak refs (changeable), DIDs are strong refs (permanent)
- ⚠️ Bidirectional verification required (already handled by `identity.CachingResolver`)

**Solution:**
Replace direct DID validation with handle resolution using existing `ResolveCommunityIdentifier()`:

```go
// BEFORE (wrong) ❌
if !strings.HasPrefix(req.Community, "did:") {
    return error
}

// AFTER (correct) ✅
communityDID, err := h.communityService.ResolveCommunityIdentifier(ctx, req.Community)
if err != nil {
    if communities.IsNotFound(err) {
        writeError(w, http.StatusNotFound, "CommunityNotFound", "Community not found")
        return
    }
    writeError(w, http.StatusBadRequest, "InvalidRequest", err.Error())
    return
}
// Now use communityDID (guaranteed to be a DID)
```

**Implementation Plan:**
1. ✅ **Phase 1 (Alpha Blocker):** Fix post creation endpoint - COMPLETE (2025-10-18)
   - Post creation already uses `ResolveCommunityIdentifier()` at [service.go:100](../internal/core/posts/service.go#L100)
   - Supports handles, DIDs, and scoped formats

2. 📋 **Phase 2 (Beta):** Fix subscription endpoints
   - Update subscribe/unsubscribe handlers
   - Add tests for handle resolution in subscriptions

3. ✅ **Phase 3 (Beta):** Fix block endpoints - COMPLETE (2025-11-16)
   - Updated block/unblock handlers to use `ResolveCommunityIdentifier()`
   - Accepts handles (`@gaming.community.coves.social`), DIDs, and scoped format (`!gaming@coves.social`)
   - Added comprehensive tests: [block_handle_resolution_test.go](../tests/integration/block_handle_resolution_test.go)
   - All 7 test cases passing

**Files Modified (Phase 3 - Block Endpoints):**
- `internal/api/handlers/community/block.go` - Added `ResolveCommunityIdentifier()` calls
- `tests/integration/block_handle_resolution_test.go` - Comprehensive test coverage

**Existing Infrastructure:**
- ✅ `ResolveCommunityIdentifier()` already implemented at [service.go:852](../internal/core/communities/service.go#L852)
- ✅ `identity.CachingResolver` handles bidirectional verification and caching
- ✅ Supports both handle (`!name.communities.instance.com`) and DID formats

**Current Status:**
- ✅ Phase 1 (post creation) - Already implemented
- 📋 Phase 2 (subscriptions) - Deferred to Beta (lower priority)
- ✅ Phase 3 (block endpoints) - COMPLETE (2025-11-16)

---

### did:web Domain Verification & hostedByDID Auto-Population
**Added:** 2025-10-11 | **Updated:** 2025-10-16 | **Effort:** 2-3 days | **Priority:** ALPHA BLOCKER

**Problem:**
1. **Domain Impersonation**: Self-hosters can set `INSTANCE_DID=did:web:nintendo.com` without owning the domain, enabling attacks where communities appear hosted by trusted domains
2. **hostedByDID Spoofing**: Malicious instance operators can modify source code to claim communities are hosted by domains they don't own, enabling reputation hijacking and phishing

**Attack Scenarios:**
- Malicious instance sets `instanceDID="did:web:coves.social"` → communities show as hosted by official Coves
- Federation partners can't verify instance authenticity
- AppView pollution with fake hosting claims

**Solution:**
1. **Basic Validation (Phase 1)**: Verify `did:web:` domain matches configured `instanceDomain`
2. **Cryptographic Verification (Phase 2)**: Fetch `https://domain/.well-known/did.json` and verify:
   - DID document exists and is valid
   - Domain ownership proven via HTTPS hosting
   - DID document matches claimed `instanceDID`
3. **Auto-populate hostedByDID**: Remove from client API, derive from instance configuration in service layer

**Current Status:**
- ✅ Default changed from `coves.local` → `coves.social` (fixes `.local` TLD bug)
- ✅ TODO comment in [cmd/server/main.go:126-131](../cmd/server/main.go#L126-L131)
- ✅ hostedByDID removed from client requests (2025-10-16)
- ✅ Service layer auto-populates `hostedByDID` from `instanceDID` (2025-10-16)
- ✅ Handler rejects client-provided `hostedByDID` (2025-10-16)
- ✅ Basic validation: Logs warning if `did:web:` domain ≠ `instanceDomain` (2025-10-16)
- ⚠️ **REMAINING**: Full DID document verification (cryptographic proof of ownership)

**Implementation Notes:**
- Phase 1 complete: Basic validation catches config errors, logs warnings
- Phase 2 needed: Fetch `https://domain/.well-known/did.json` and verify ownership
- Add `SKIP_DID_WEB_VERIFICATION=true` for dev mode
- Full verification blocks startup if domain ownership cannot be proven

---

### ✅ Token Refresh Logic for Community Credentials - COMPLETE
**Added:** 2025-10-11 | **Completed:** 2025-10-17 | **Effort:** 1.5 days | **Status:** ✅ DONE

**Problem:** Community PDS access tokens expire (~2hrs). Updates fail until manual intervention.

**Solution Implemented:**
- ✅ Automatic token refresh before PDS operations (5-minute buffer before expiration)
- ✅ JWT expiration parsing without signature verification (`parseJWTExpiration`, `needsRefresh`)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (~2 months) via `atproto.ServerCreateSession`
- ✅ Atomic credential updates (`UpdateCredentials` repository method)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for token expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go) - JWT parsing utilities
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go) - Refresh and re-auth logic
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go) - Integration tests

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` + concurrency control
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** See [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md) for full details

**Impact:** ✅ Communities can now be updated 24+ hours after creation without manual intervention

---

### ✅ Subscription Visibility Level (Feed Slider 1-5 Scale) - COMPLETE
**Added:** 2025-10-15 | **Completed:** 2025-10-16 | **Effort:** 1 day | **Status:** ✅ DONE

**Problem:** Users couldn't control how much content they see from each community. Lexicon had `contentVisibility` (1-5 scale) but code didn't use it.

**Solution Implemented:**
- ✅ Updated subscribe handler to accept `contentVisibility` parameter (1-5, default 3)
- ✅ Store in subscription record on PDS (`social.coves.community.subscription`)
- ✅ Migration 008 adds `content_visibility` column to database with CHECK constraint
- ✅ Clamping at all layers (handler, service, consumer) for defense in depth
- ✅ Atomic subscriber count updates (SubscribeWithCount/UnsubscribeWithCount)
- ✅ Idempotent operations (safe for Jetstream event replays)
- ✅ Fixed critical collection name bug (was using wrong namespace)
- ✅ Production Jetstream consumer now running
- ✅ 13 comprehensive integration tests - all passing

**Files Modified:**
- Lexicon: [subscription.json](../internal/atproto/lexicon/social/coves/community/subscription.json) ✅ Updated to atProto conventions
- Handler: [community/subscribe.go](../internal/api/handlers/community/subscribe.go) ✅ Accepts contentVisibility
- Service: [communities/service.go](../internal/core/communities/service.go) ✅ Clamps and passes to PDS
- Consumer: [community_consumer.go](../internal/atproto/jetstream/community_consumer.go) ✅ Extracts and indexes
- Repository: [community_repo_subscriptions.go](../internal/db/postgres/community_repo_subscriptions.go) ✅ All queries updated
- Migration: [008_add_content_visibility_to_subscriptions.sql](../internal/db/migrations/008_add_content_visibility_to_subscriptions.sql) ✅ Schema changes
- Tests: [subscription_indexing_test.go](../tests/integration/subscription_indexing_test.go) ✅ Comprehensive coverage

**Documentation:** See [IMPLEMENTATION_SUBSCRIPTION_INDEXING.md](../docs/IMPLEMENTATION_SUBSCRIPTION_INDEXING.md) for full details

**Impact:** ✅ Users can now adjust feed volume per community (key feature from DOMAIN_KNOWLEDGE.md enabled)

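
**Illustrative sketch (not the shipped code):** The clamping rule listed under Solution Implemented (`contentVisibility` in 1-5, default 3) could look roughly like the helper below. The name `clampContentVisibility` and the constants are hypothetical, and a nil pointer stands in for a parameter the client omitted. Applying the same helper in the handler, service, and consumer is one way to get the defense-in-depth behavior described above.

```go
package main

import "fmt"

// Bounds taken from the 1-5 scale described in this section.
// The constant and function names are illustrative assumptions.
const (
	minContentVisibility     = 1
	maxContentVisibility     = 5
	defaultContentVisibility = 3
)

// clampContentVisibility returns the default when the value is absent (nil)
// and otherwise forces it into the inclusive range [1, 5].
func clampContentVisibility(v *int) int {
	if v == nil {
		return defaultContentVisibility
	}
	switch {
	case *v < minContentVisibility:
		return minContentVisibility
	case *v > maxContentVisibility:
		return maxContentVisibility
	default:
		return *v
	}
}

func main() {
	tooHigh := 9
	// Omitted value falls back to 3; out-of-range value is clamped to 5.
	fmt.Println(clampContentVisibility(nil), clampContentVisibility(&tooHigh)) // prints "3 5"
}
```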
---

### Community Blocking
**Added:** 2025-10-15 | **Effort:** 1 day | **Priority:** ALPHA BLOCKER

**Problem:** Users have no way to block unwanted communities from their feeds.

**Solution:**
1. **Lexicon:** Extend `social.coves.actor.block` to support community DIDs (currently user-only)
2. **Service:** Implement `BlockCommunity(userDID, communityDID)` and `UnblockCommunity()`
3. **Handlers:** Add XRPC endpoints `social.coves.community.block` and `unblock`
4. **Repository:** Add methods to track blocked communities
5. **Feed:** Filter blocked communities from feed queries (beta work)

**Code:**
- Lexicon: [actor/block.json](../internal/atproto/lexicon/social/coves/actor/block.json) - Currently only supports user DIDs
- Service: New methods needed
- Handlers: New files needed

**Impact:** Users can't avoid unwanted content without blocking

---

### ✅ Post comment_count Reconciliation - COMPLETE
**Added:** 2025-11-04 | **Completed:** 2025-11-16 | **Effort:** 2 hours | **Status:** ✅ DONE

**Problem:**
When comments arrive before their parent post is indexed (common with cross-repo Jetstream ordering), the post's `comment_count` was never reconciled, causing posts to show permanently stale "0 comments" counters.

**Solution Implemented:**
- ✅ Post consumer reconciliation logic WAS already implemented at [post_consumer.go:210-226](../internal/atproto/jetstream/post_consumer.go#L210-L226)
- ✅ Reconciliation query counts pre-existing comments when indexing new posts
- ✅ Comprehensive test suite added: [post_consumer_test.go](../tests/integration/post_consumer_test.go)
  - Single comment before post
  - Multiple comments before post
  - Mixed before/after ordering
  - Idempotent indexing preserves counts
- ✅ Updated outdated FIXME comment at [comment_consumer.go:362](../internal/atproto/jetstream/comment_consumer.go#L362)
- ✅ All 4 test cases passing

**Implementation:**
```go
// Post consumer reconciliation (lines 210-226)
reconcileQuery := `
    UPDATE posts
    SET comment_count = (
        SELECT COUNT(*)
        FROM comments c
        WHERE c.parent_uri = $1 AND c.deleted_at IS NULL
    )
    WHERE id = $2
`
_, reconcileErr := tx.ExecContext(ctx, reconcileQuery, post.URI, postID)
```

**Files Modified:**
- `internal/atproto/jetstream/comment_consumer.go` - Updated documentation
- `tests/integration/post_consumer_test.go` - Added comprehensive test coverage

**Impact:** ✅ Post comment counters are now accurate regardless of Jetstream event ordering

---

## 🔴 P1.5: Federation Blockers (Beta Launch)

### Cross-PDS Write-Forward Support for Community Service
**Added:** 2025-10-17 | **Updated:** 2025-11-02 | **Effort:** 3-4 hours | **Priority:** FEDERATION BLOCKER (Beta)

**Problem:** Community service write-forward methods assume all users are on the same PDS as the Coves instance. This breaks federation when users from external PDSs try to subscribe/block communities.

**Current Behavior:**
- User on `pds.bsky.social` subscribes to community on `coves.social`
- Coves calls `s.pdsURL` (instance default: `http://localhost:3001`)
- Write goes to WRONG PDS → fails with `{"error":"InvalidToken","message":"Malformed token"}`

**Impact:**
- ✅ **Alpha**: Works fine (single PDS deployment, no federation)
- ❌ **Beta**: Breaks federation (users on different PDSs can't subscribe/block)

**Root Cause:**
- [service.go:1033](../internal/core/communities/service.go#L1033): `createRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:1050](../internal/core/communities/service.go#L1050): `putRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:1063](../internal/core/communities/service.go#L1063): `deleteRecordOnPDSAs` hardcodes `s.pdsURL`

**Affected Operations:**
- `SubscribeToCommunity` ([service.go:608](../internal/core/communities/service.go#L608))
- `UnsubscribeFromCommunity` (calls `deleteRecordOnPDSAs`)
- `BlockCommunity` ([service.go:739](../internal/core/communities/service.go#L739))
- `UnblockCommunity` (calls `deleteRecordOnPDSAs`)

**Solution:**
1. Add `identityResolver identity.Resolver` to `communityService` struct
2. Before write-forward, resolve user's DID → extract PDS URL
3. Call user's actual PDS instead of hardcoded `s.pdsURL`

**Implementation Pattern (from Vote Service):**
```go
// Add helper method to resolve user's PDS
func (s *communityService) resolveUserPDS(ctx context.Context, userDID string) (string, error) {
    // Named ident so the local variable does not shadow the identity package
    ident, err := s.identityResolver.Resolve(ctx, userDID)
    if err != nil {
        return "", fmt.Errorf("failed to resolve user PDS: %w", err)
    }
    if ident.PDSURL == "" {
        log.Printf("[COMMUNITY-PDS] WARNING: No PDS URL found for %s, using fallback: %s", userDID, s.pdsURL)
        return s.pdsURL, nil
    }
    return ident.PDSURL, nil
}

// Update write-forward methods:
func (s *communityService) createRecordOnPDSAs(ctx context.Context, repoDID, collection, rkey string, record map[string]interface{}, accessToken string) (string, string, error) {
    // Resolve user's actual PDS (critical for federation)
    pdsURL, err := s.resolveUserPDS(ctx, repoDID)
    if err != nil {
        return "", "", fmt.Errorf("failed to resolve user PDS: %w", err)
    }
    endpoint := fmt.Sprintf("%s/xrpc/com.atproto.repo.createRecord", strings.TrimSuffix(pdsURL, "/"))
    // ... rest of method
}
```

**Files to Modify:**
- `internal/core/communities/service.go` - Add resolver field + `resolveUserPDS` helper
- `internal/core/communities/service.go` - Update `createRecordOnPDSAs`, `putRecordOnPDSAs`, `deleteRecordOnPDSAs`
- `cmd/server/main.go` - Pass identity resolver to community service constructor
- Tests - Add cross-PDS subscription/block scenarios

**Testing:**
- User on external PDS subscribes to community → writes to their PDS
- User on external PDS blocks community → writes to their PDS
- Community profile updates still work (writes to community's own PDS)

**Related:**
- ✅ **Vote Service**: Fixed in Alpha (2025-11-02) - users can vote from any PDS
- 🔴 **Community Service**: Deferred to Beta (no federation in Alpha)

---

## 🟢 P2: Nice-to-Have

### Remove Categories from Community Lexicon
**Added:** 2025-10-15 | **Effort:** 30 minutes | **Priority:** Cleanup

**Problem:** Categories field exists in create/update lexicon but not in profile record. Adds complexity without clear value.

**Solution:**
- Remove `categories` from [create.json](../internal/atproto/lexicon/social/coves/community/create.json#L46-L54)
- Remove `categories` from [update.json](../internal/atproto/lexicon/social/coves/community/update.json#L51-L59)
- Remove from [community.go:91](../internal/core/communities/community.go#L91)
- Remove from service layer ([service.go:109-110](../internal/core/communities/service.go#L109-L110))

**Impact:** Simplifies lexicon, removes unused feature

---

### Improve .local TLD Error Messages
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Generic error "TLD .local is not allowed" confuses developers.

**Solution:** Enhance `InvalidHandleError` to explain root cause and suggest fixing `INSTANCE_DID`.

---

### Self-Hosting Security Guide
**Added:** 2025-10-11 | **Effort:** 1 day

**Needed:** Document did:web setup, DNS config, secrets management, rate limiting, PostgreSQL hardening, monitoring.

---

### OAuth Session Cleanup Race Condition
**Added:** 2025-10-11 | **Effort:** 2 hours

**Problem:** Cleanup goroutine doesn't handle graceful shutdown, may orphan DB connections.

**Solution:** Pass cancellable context, handle SIGTERM, add cleanup timeout.

---

### Jetstream Consumer Race Condition
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Multiple goroutines can call `close(done)` concurrently in consumer shutdown.

**Solution:** Use `sync.Once` for channel close or atomic flag for shutdown state.

**Code:** TODO in [jetstream/user_consumer.go:114](../internal/atproto/jetstream/user_consumer.go#L114)

---

### Unfurl Cache Cleanup Background Job
**Added:** 2025-11-07 | **Effort:** 2-3 hours | **Priority:** Performance/Maintenance

**Problem:** The `unfurl_cache` table will grow indefinitely as expired entries are not deleted. While the cache uses lazy expiration (checking `expires_at` on read), old records remain in the database consuming disk space.

**Impact:**
- 📊 ~1KB per cached URL
- 📈 At 10K cached URLs = ~10MB (negligible for alpha)
- ⚠️ At 1M cached URLs = ~1GB (potential issue at scale)
- 🐌 Table bloat can slow down queries over time

**Current Mitigation:**
- ✅ Lazy expiration: Cache hits check `expires_at` and refetch if expired
- ✅ Indexed on `expires_at` for efficient expiration queries
- ✅ Not critical for alpha (growth is gradual)

**Solution (Beta/Production):**
Implement background cleanup job to delete expired entries:

```go
// Periodic cleanup (run daily or weekly)
func (r *unfurlRepository) CleanupExpired(ctx context.Context) (int64, error) {
    query := `DELETE FROM unfurl_cache WHERE expires_at < NOW()`
    result, err := r.db.ExecContext(ctx, query)
    if err != nil {
        return 0, err
    }
    return result.RowsAffected()
}
```

**Implementation Options:**
1. **Cron job**: Separate process runs cleanup on schedule
2. **Background goroutine**: Service-level background task with configurable interval
3. **PostgreSQL pg_cron extension**: Database-level scheduled cleanup

**Recommended Approach:**
- Phase 1 (Beta): Background goroutine running weekly cleanup
- Phase 2 (Production): Migrate to pg_cron or external cron for reliability

**Configuration:**
```bash
UNFURL_CACHE_CLEANUP_ENABLED=true
UNFURL_CACHE_CLEANUP_INTERVAL=168h  # 7 days
```

**Monitoring:**
- Log cleanup operations: `[UNFURL-CACHE-CLEANUP] Deleted 1234 expired entries`
- Track table size growth over time
- Alert if table exceeds threshold (e.g., 100MB)

**Files to Create:**
- `internal/core/unfurl/cleanup.go` - Background cleanup service

**Related:**
- Implemented in oEmbed unfurling feature (2025-11-07)
- Cache table: [migration XXX_create_unfurl_cache.sql](../internal/db/migrations/)

---

## 🔵 P3: Technical Debt

### Consolidate Environment Variable Validation
**Added:** 2025-10-11 | **Effort:** 2-3 hours

Create `internal/config` package with structured config validation. Fail fast with clear errors.

---

### Add Connection Pooling for PDS HTTP Clients
**Added:** 2025-10-11 | **Effort:** 2 hours

Create shared `http.Client` with connection pooling instead of new client per request.

---

### Architecture Decision Records (ADRs)
**Added:** 2025-10-11 | **Effort:** Ongoing

Document: did:plc choice, pgcrypto encryption, Jetstream vs firehose, write-forward pattern, single handle field.

---

### Replace log Package with Structured Logger
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** Using standard `log` package. Need structured logging (JSON) with levels.

**Solution:** Switch to `slog`, `zap`, or `zerolog`. Add request IDs, context fields.

**Code:** TODO in [community/errors.go:46](../internal/api/handlers/community/errors.go#L46)

---

### PDS URL Resolution from DID
**Added:** 2025-10-11 | **Effort:** 2-3 hours

**Problem:** User consumer doesn't resolve PDS URL from DID document when missing.

**Solution:** Query PLC directory for DID document, extract `serviceEndpoint`.

**Code:** TODO in [jetstream/user_consumer.go:203](../internal/atproto/jetstream/user_consumer.go#L203)

---

## Recent Completions

### ✅ Token Refresh for Community Credentials (2025-10-17)
**Completed:** Automatic token refresh prevents communities from breaking after 2 hours

**Implementation:**
- ✅ JWT expiration parsing and refresh detection (5-minute buffer)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (`atproto.ServerCreateSession`)
- ✅ Atomic credential updates in database (`UpdateCredentials`)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go)
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go)
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go)

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` method
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md)

**Impact:** Communities now work indefinitely without manual token management

---

### ✅ OAuth Authentication for Community Actions (2025-10-16)
**Completed:** Full OAuth JWT authentication flow for protected endpoints

**Implementation:**
- ✅ JWT parser compatible with atProto PDS tokens (aud/iss handling)
- ✅ Auth middleware protecting create/update/subscribe/unsubscribe endpoints
- ✅ Handler-level DID extraction from JWT tokens via `middleware.GetUserDID(r)`
- ✅ Removed all X-User-DID header placeholders
- ✅ E2E tests validate complete OAuth flow with real PDS tokens
- ✅ Security: Issuer validation supports both HTTPS URLs and DIDs

**Files Modified:**
- [internal/atproto/auth/jwt.go](../internal/atproto/auth/jwt.go) - JWT parsing with atProto compatibility
- [internal/api/middleware/auth.go](../internal/api/middleware/auth.go) - Auth middleware
- [internal/api/handlers/community/](../internal/api/handlers/community/) - All handlers updated
- [tests/integration/community_e2e_test.go](../tests/integration/community_e2e_test.go) - OAuth E2E tests

**Related:** Also implemented `hostedByDID` auto-population for security (see P1 item above)

---

### ✅ Fix .local TLD Bug (2025-10-11)
Changed default `INSTANCE_DID` from `did:web:coves.local` → `did:web:coves.social`. Fixed community creation failure due to disallowed `.local` TLD.

---

## Prioritization

- **P0:** Security vulns, data loss, prod blockers
- **P1:** Major UX/reliability issues
- **P2:** QOL improvements, minor bugs, docs
- **P3:** Refactoring, code quality