# Backlog PRD: Platform Improvements & Technical Debt

**Status:** Ongoing
**Owner:** Platform Team
**Last Updated:** 2025-10-16

## Overview

Miscellaneous platform improvements, bug fixes, and technical debt that don't fit into feature-specific PRDs.

---

## 🟡 P1: Important (Alpha Blockers)

### did:web Domain Verification & hostedByDID Auto-Population
**Added:** 2025-10-11 | **Updated:** 2025-10-16 | **Effort:** 2-3 days | **Priority:** ALPHA BLOCKER

**Problem:**
1. **Domain Impersonation**: Self-hosters can set `INSTANCE_DID=did:web:nintendo.com` without owning the domain, enabling attacks where communities appear hosted by trusted domains
2. **hostedByDID Spoofing**: Malicious instance operators can modify source code to claim communities are hosted by domains they don't own, enabling reputation hijacking and phishing

**Attack Scenarios:**
- Malicious instance sets `instanceDID="did:web:coves.social"` → communities show as hosted by official Coves
- Federation partners can't verify instance authenticity
- AppView pollution with fake hosting claims

**Solution:**
1. **Basic Validation (Phase 1)**: Verify `did:web:` domain matches configured `instanceDomain`
2. **Cryptographic Verification (Phase 2)**: Fetch `https://domain/.well-known/did.json` and verify:
   - DID document exists and is valid
   - Domain ownership proven via HTTPS hosting
   - DID document matches claimed `instanceDID`
3. **Auto-populate hostedByDID**: Remove from client API, derive from instance configuration in service layer

**Current Status:**
- ✅ Default changed from `coves.local` → `coves.social` (fixes `.local` TLD bug)
- ✅ TODO comment in [cmd/server/main.go:126-131](../cmd/server/main.go#L126-L131)
- ✅ hostedByDID removed from client requests (2025-10-16)
- ✅ Service layer auto-populates `hostedByDID` from `instanceDID` (2025-10-16)
- ✅ Handler rejects client-provided `hostedByDID` (2025-10-16)
- ✅ Basic validation: Logs warning if `did:web:` domain ≠ `instanceDomain` (2025-10-16)
- ⚠️ **REMAINING**: Full DID document verification (cryptographic proof of ownership)

**Implementation Notes:**
- Phase 1 complete: Basic validation catches config errors, logs warnings
- Phase 2 needed: Fetch `https://domain/.well-known/did.json` and verify ownership
- Add `SKIP_DID_WEB_VERIFICATION=true` for dev mode
- Full verification blocks startup if domain ownership cannot be proven

---

### Token Refresh Logic for Community Credentials
**Added:** 2025-10-11 | **Effort:** 1-2 days | **Priority:** ALPHA BLOCKER

**Problem:** Community PDS access tokens expire (~2hrs). Updates fail until manual intervention.

**Solution:** Auto-refresh tokens before PDS operations. Parse JWT exp claim, use refresh token when expired, update DB.

**Code:** TODO in [communities/service.go:123](../internal/core/communities/service.go#L123)

---

### Subscription Visibility Level (Feed Slider 1-5 Scale)
**Added:** 2025-10-15 | **Effort:** 4-6 hours | **Priority:** ALPHA BLOCKER

**Problem:** Users can't control how much content they see from each community. Lexicon has `contentVisibility` (1-5 scale) but code doesn't use it.

**Solution:**
- Update subscribe handler to accept `contentVisibility` parameter (1-5, default 3)
- Store in subscription record on PDS
- Update feed generation to respect visibility level (beta work, but data structure needed now)

**Code:**
- Lexicon: [subscription.json:28-34](../internal/atproto/lexicon/social/coves/actor/subscription.json#L28-L34) ✅ Ready
- Handler: [community/subscribe.go](../internal/api/handlers/community/subscribe.go) - Add parameter
- Service: [communities/service.go:373-376](../internal/core/communities/service.go#L373-L376) - Add to record

**Impact:** Without this, users have no way to adjust feed volume per community (key feature from DOMAIN_KNOWLEDGE.md)

---

### Community Blocking
**Added:** 2025-10-15 | **Effort:** 1 day | **Priority:** ALPHA BLOCKER

**Problem:** Users have no way to block unwanted communities from their feeds.

**Solution:**
1. **Lexicon:** Extend `social.coves.actor.block` to support community DIDs (currently user-only)
2. **Service:** Implement `BlockCommunity(userDID, communityDID)` and `UnblockCommunity()`
3. **Handlers:** Add XRPC endpoints `social.coves.community.block` and `unblock`
4. **Repository:** Add methods to track blocked communities
5. **Feed:** Filter blocked communities from feed queries (beta work)

**Code:**
- Lexicon: [actor/block.json](../internal/atproto/lexicon/social/coves/actor/block.json) - Currently only supports user DIDs
- Service: New methods needed
- Handlers: New files needed

**Impact:** Users can't avoid unwanted content without blocking

---

## 🟢 P2: Nice-to-Have

### Remove Categories from Community Lexicon
**Added:** 2025-10-15 | **Effort:** 30 minutes | **Priority:** Cleanup

**Problem:** Categories field exists in create/update lexicon but not in profile record. Adds complexity without clear value.

**Solution:**
- Remove `categories` from [create.json](../internal/atproto/lexicon/social/coves/community/create.json#L46-L54)
- Remove `categories` from [update.json](../internal/atproto/lexicon/social/coves/community/update.json#L51-L59)
- Remove from [community.go:91](../internal/core/communities/community.go#L91)
- Remove from service layer ([service.go:109-110](../internal/core/communities/service.go#L109-L110))

**Impact:** Simplifies lexicon, removes unused feature

---

### Improve .local TLD Error Messages
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Generic error "TLD .local is not allowed" confuses developers.

**Solution:** Enhance `InvalidHandleError` to explain root cause and suggest fixing `INSTANCE_DID`.

---

### Self-Hosting Security Guide
**Added:** 2025-10-11 | **Effort:** 1 day

**Needed:** Document did:web setup, DNS config, secrets management, rate limiting, PostgreSQL hardening, monitoring.

---

### OAuth Session Cleanup Race Condition
**Added:** 2025-10-11 | **Effort:** 2 hours

**Problem:** Cleanup goroutine doesn't handle graceful shutdown, may orphan DB connections.

**Solution:** Pass cancellable context, handle SIGTERM, add cleanup timeout.

---

### Jetstream Consumer Race Condition
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Multiple goroutines can call `close(done)` concurrently in consumer shutdown.

**Solution:** Use `sync.Once` for channel close or atomic flag for shutdown state.

**Code:** TODO in [jetstream/user_consumer.go:114](../internal/atproto/jetstream/user_consumer.go#L114)

---

## 🔵 P3: Technical Debt

### Consolidate Environment Variable Validation
**Added:** 2025-10-11 | **Effort:** 2-3 hours

Create `internal/config` package with structured config validation. Fail fast with clear errors.
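
A minimal sketch of what such a package could look like, shown as a single file for brevity. Only `INSTANCE_DID` comes from this backlog; the `Config` struct, the `Load` function, and the `INSTANCE_DOMAIN` variable name are hypothetical and chosen for illustration. It collects every problem before failing, so an operator sees all misconfigurations in one startup attempt:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// Config holds instance settings. The struct and the INSTANCE_DOMAIN
// variable name are hypothetical; only INSTANCE_DID appears in the backlog.
type Config struct {
	InstanceDID    string
	InstanceDomain string
}

// Load reads settings through getenv (os.Getenv in production) and reports
// all validation failures together instead of stopping at the first one.
func Load(getenv func(string) string) (*Config, error) {
	cfg := &Config{
		InstanceDID:    getenv("INSTANCE_DID"),
		InstanceDomain: getenv("INSTANCE_DOMAIN"),
	}

	var errs []error
	if cfg.InstanceDID == "" {
		errs = append(errs, errors.New("INSTANCE_DID is required"))
	} else if !strings.HasPrefix(cfg.InstanceDID, "did:") {
		errs = append(errs, fmt.Errorf("INSTANCE_DID %q must start with \"did:\"", cfg.InstanceDID))
	}
	if cfg.InstanceDomain == "" {
		errs = append(errs, errors.New("INSTANCE_DOMAIN is required"))
	}

	// errors.Join returns nil when errs is empty (Go 1.20+).
	if err := errors.Join(errs...); err != nil {
		return nil, fmt.Errorf("invalid configuration: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := Load(os.Getenv)
	if err != nil {
		// Fail fast: refuse to start with a broken configuration.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("instance %s on %s\n", cfg.InstanceDID, cfg.InstanceDomain)
}
```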

---

### Add Connection Pooling for PDS HTTP Clients
**Added:** 2025-10-11 | **Effort:** 2 hours

Create shared `http.Client` with connection pooling instead of new client per request.

---

### Architecture Decision Records (ADRs)
**Added:** 2025-10-11 | **Effort:** Ongoing

Document: did:plc choice, pgcrypto encryption, Jetstream vs firehose, write-forward pattern, single handle field.

---

### Replace log Package with Structured Logger
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** Using standard `log` package. Need structured logging (JSON) with levels.

**Solution:** Switch to `slog`, `zap`, or `zerolog`. Add request IDs, context fields.

**Code:** TODO in [community/errors.go:46](../internal/api/handlers/community/errors.go#L46)

---

### PDS URL Resolution from DID
**Added:** 2025-10-11 | **Effort:** 2-3 hours

**Problem:** User consumer doesn't resolve PDS URL from DID document when missing.

**Solution:** Query PLC directory for DID document, extract `serviceEndpoint`.

**Code:** TODO in [jetstream/user_consumer.go:203](../internal/atproto/jetstream/user_consumer.go#L203)

---

### PLC Directory Registration (Production)
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** DID generator creates did:plc but doesn't register in prod mode.

**Solution:** Implement PLC registration API call when `isDevEnv=false`.

**Code:** TODO in [did/generator.go:46](../internal/atproto/did/generator.go#L46)

---

## Recent Completions

### ✅ OAuth Authentication for Community Actions (2025-10-16)
**Completed:** Full OAuth JWT authentication flow for protected endpoints

**Implementation:**
- ✅ JWT parser compatible with atProto PDS tokens (aud/iss handling)
- ✅ Auth middleware protecting create/update/subscribe/unsubscribe endpoints
- ✅ Handler-level DID extraction from JWT tokens via `middleware.GetUserDID(r)`
- ✅ Removed all X-User-DID header placeholders
- ✅ E2E tests validate complete OAuth flow with real PDS tokens
- ✅ Security: Issuer validation supports both HTTPS URLs and DIDs

**Files Modified:**
- [internal/atproto/auth/jwt.go](../internal/atproto/auth/jwt.go) - JWT parsing with atProto compatibility
- [internal/api/middleware/auth.go](../internal/api/middleware/auth.go) - Auth middleware
- [internal/api/handlers/community/](../internal/api/handlers/community/) - All handlers updated
- [tests/integration/community_e2e_test.go](../tests/integration/community_e2e_test.go) - OAuth E2E tests

**Related:** Also implemented `hostedByDID` auto-population for security (see P1 item above)

---

### ✅ Fix .local TLD Bug (2025-10-11)
Changed default `INSTANCE_DID` from `did:web:coves.local` → `did:web:coves.social`. Fixed community creation failure due to disallowed `.local` TLD.

---

## Prioritization

- **P0:** Security vulns, data loss, prod blockers
- **P1:** Major UX/reliability issues
- **P2:** QOL improvements, minor bugs, docs
- **P3:** Refactoring, code quality