# Backlog PRD: Platform Improvements & Technical Debt

**Status:** Ongoing
**Owner:** Platform Team
**Last Updated:** 2025-10-16

## Overview

Miscellaneous platform improvements, bug fixes, and technical debt that don't fit into feature-specific PRDs.

---

## 🟡 P1: Important (Alpha Blockers)

### did:web Domain Verification & hostedByDID Auto-Population
**Added:** 2025-10-11 | **Updated:** 2025-10-16 | **Effort:** 2-3 days | **Priority:** ALPHA BLOCKER

**Problem:**
1. **Domain Impersonation**: Self-hosters can set `INSTANCE_DID=did:web:nintendo.com` without owning the domain, enabling attacks where communities appear hosted by trusted domains
2. **hostedByDID Spoofing**: Malicious instance operators can modify source code to claim communities are hosted by domains they don't own, enabling reputation hijacking and phishing

**Attack Scenarios:**
- Malicious instance sets `instanceDID="did:web:coves.social"` → communities show as hosted by official Coves
- Federation partners can't verify instance authenticity
- AppView pollution with fake hosting claims

**Solution:**
1. **Basic Validation (Phase 1)**: Verify `did:web:` domain matches configured `instanceDomain`
2. **Cryptographic Verification (Phase 2)**: Fetch `https://domain/.well-known/did.json` and verify:
   - DID document exists and is valid
   - Domain ownership proven via HTTPS hosting
   - DID document matches claimed `instanceDID`
3. **Auto-populate hostedByDID**: Remove from client API, derive from instance configuration in service layer

**Current Status:**
- ✅ Default changed from `coves.local` → `coves.social` (fixes `.local` TLD bug)
- ✅ TODO comment in [cmd/server/main.go:126-131](../cmd/server/main.go#L126-L131)
- ✅ hostedByDID removed from client requests (2025-10-16)
- ✅ Service layer auto-populates `hostedByDID` from `instanceDID` (2025-10-16)
- ✅ Handler rejects client-provided `hostedByDID` (2025-10-16)
- ✅ Basic validation: Logs warning if `did:web:` domain ≠ `instanceDomain` (2025-10-16)
- ⚠️ **REMAINING**: Full DID document verification (cryptographic proof of ownership)

**Implementation Notes:**
- Phase 1 complete: Basic validation catches config errors, logs warnings
- Phase 2 needed: Fetch `https://domain/.well-known/did.json` and verify ownership
- Add `SKIP_DID_WEB_VERIFICATION=true` for dev mode
- Full verification blocks startup if domain ownership cannot be proven
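
A minimal sketch of what the Phase 2 check could look like, assuming bare-domain `did:web` identifiers only. The function names (`didWebDocumentURL`, `checkDIDDocument`, `verifyInstanceDID`) are hypothetical and do not come from the codebase, and the only property checked on the fetched document is that its `id` equals the configured `instanceDID`.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// didDocument holds only the field this sketch inspects.
type didDocument struct {
	ID string `json:"id"`
}

// didWebDocumentURL maps a bare-domain did:web identifier to the URL of its
// DID document. Path-based identifiers (extra ":" segments) are rejected to
// keep the sketch small.
func didWebDocumentURL(did string) (string, error) {
	const prefix = "did:web:"
	if !strings.HasPrefix(did, prefix) {
		return "", fmt.Errorf("not a did:web identifier: %q", did)
	}
	host := strings.TrimPrefix(did, prefix)
	if host == "" || strings.Contains(host, ":") {
		return "", fmt.Errorf("unsupported did:web identifier: %q", did)
	}
	// did:web percent-encodes a port separator as %3A.
	decoded, err := url.PathUnescape(host)
	if err != nil {
		return "", fmt.Errorf("invalid did:web host %q: %w", host, err)
	}
	return "https://" + decoded + "/.well-known/did.json", nil
}

// checkDIDDocument verifies that the document claims the expected DID.
func checkDIDDocument(body []byte, instanceDID string) error {
	var doc didDocument
	if err := json.Unmarshal(body, &doc); err != nil {
		return fmt.Errorf("invalid DID document: %w", err)
	}
	if doc.ID == "" {
		return errors.New("DID document has no id")
	}
	if doc.ID != instanceDID {
		return fmt.Errorf("DID document id %q does not match %q", doc.ID, instanceDID)
	}
	return nil
}

// verifyInstanceDID fetches the DID document over HTTPS and checks it.
// Serving the document from the domain is treated as proof of ownership.
func verifyInstanceDID(ctx context.Context, client *http.Client, instanceDID string) error {
	docURL, err := didWebDocumentURL(instanceDID)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, docURL, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("fetch %s: %w", docURL, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("fetch %s: unexpected status %d", docURL, resp.StatusCode)
	}
	// Cap the read so a hostile server cannot exhaust memory.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return fmt.Errorf("read %s: %w", docURL, err)
	}
	return checkDIDDocument(body, instanceDID)
}
```

At startup, `verifyInstanceDID` might be called with a short-lived context and bypassed when `SKIP_DID_WEB_VERIFICATION=true`; a non-nil error would then abort startup, as the notes above describe.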

---

### Token Refresh Logic for Community Credentials
**Added:** 2025-10-11 | **Effort:** 1-2 days | **Priority:** ALPHA BLOCKER

**Problem:** Community PDS access tokens expire after roughly 2 hours. Once a token has expired, community updates fail until someone intervenes manually.

**Solution:** Auto-refresh tokens before PDS operations. Parse the JWT `exp` claim, use the refresh token when the access token has expired, and update the DB with the new tokens.

**Code:** TODO in [communities/service.go:123](../internal/core/communities/service.go#L123)
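
The expiry check could be sketched as follows. This is an illustration under assumptions, not the planned implementation: `tokenExpiry` and `needsRefresh` are hypothetical names, the signature is deliberately not verified (the PDS does that), and the refresh call plus the DB update are left out.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// tokenExpiry reads the exp claim from a JWT without verifying its signature.
func tokenExpiry(token string) (time.Time, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return time.Time{}, errors.New("token is not a three-part JWT")
	}
	// JWT segments are base64url-encoded without padding.
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return time.Time{}, fmt.Errorf("decode JWT payload: %w", err)
	}
	var claims struct {
		// exp is a JSON number of seconds since the Unix epoch.
		Exp *float64 `json:"exp"`
	}
	if err := json.Unmarshal(payload, &claims); err != nil {
		return time.Time{}, fmt.Errorf("parse JWT claims: %w", err)
	}
	if claims.Exp == nil {
		return time.Time{}, errors.New("JWT has no exp claim")
	}
	return time.Unix(int64(*claims.Exp), 0), nil
}

// needsRefresh reports whether the token has expired or will expire within
// buffer of now. The buffer avoids racing a token that dies mid-request.
func needsRefresh(token string, now time.Time, buffer time.Duration) (bool, error) {
	exp, err := tokenExpiry(token)
	if err != nil {
		return false, err
	}
	return !now.Add(buffer).Before(exp), nil
}
```

A caller would run `needsRefresh(accessToken, time.Now(), 5*time.Minute)` before each PDS write; the five-minute buffer is an arbitrary example value.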

---

### ✅ Subscription Visibility Level (Feed Slider 1-5 Scale) - COMPLETE
**Added:** 2025-10-15 | **Completed:** 2025-10-16 | **Effort:** 1 day | **Status:** ✅ DONE

**Problem:** Users couldn't control how much content they see from each community. Lexicon had `contentVisibility` (1-5 scale) but code didn't use it.

**Solution Implemented:**
- ✅ Updated subscribe handler to accept `contentVisibility` parameter (1-5, default 3)
- ✅ Store in subscription record on PDS (`social.coves.community.subscription`)
- ✅ Migration 008 adds `content_visibility` column to database with CHECK constraint
- ✅ Clamping at all layers (handler, service, consumer) for defense in depth
- ✅ Atomic subscriber count updates (SubscribeWithCount/UnsubscribeWithCount)
- ✅ Idempotent operations (safe for Jetstream event replays)
- ✅ Fixed critical collection name bug (was using wrong namespace)
- ✅ Production Jetstream consumer now running
- ✅ 13 comprehensive integration tests - all passing
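
For illustration, the clamping rule (1-5, default 3) might look like the sketch below. The function name and the use of a nil pointer to mean "parameter omitted" are assumptions; the shipped code may differ.

```go
package main

const (
	minContentVisibility     = 1
	maxContentVisibility     = 5
	defaultContentVisibility = 3
)

// clampContentVisibility maps a client-supplied value onto the 1-5 scale.
// A nil pointer stands for an omitted parameter and yields the default.
func clampContentVisibility(v *int) int {
	if v == nil {
		return defaultContentVisibility
	}
	if *v < minContentVisibility {
		return minContentVisibility
	}
	if *v > maxContentVisibility {
		return maxContentVisibility
	}
	return *v
}
```

Applying the same function in the handler, service, and consumer keeps the three layers consistent even if one of them is bypassed.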

**Files Modified:**
- Lexicon: [subscription.json](../internal/atproto/lexicon/social/coves/community/subscription.json) ✅ Updated to atProto conventions
- Handler: [community/subscribe.go](../internal/api/handlers/community/subscribe.go) ✅ Accepts contentVisibility
- Service: [communities/service.go](../internal/core/communities/service.go) ✅ Clamps and passes to PDS
- Consumer: [community_consumer.go](../internal/atproto/jetstream/community_consumer.go) ✅ Extracts and indexes
- Repository: [community_repo_subscriptions.go](../internal/db/postgres/community_repo_subscriptions.go) ✅ All queries updated
- Migration: [008_add_content_visibility_to_subscriptions.sql](../internal/db/migrations/008_add_content_visibility_to_subscriptions.sql) ✅ Schema changes
- Tests: [subscription_indexing_test.go](../tests/integration/subscription_indexing_test.go) ✅ Comprehensive coverage

**Documentation:** See [IMPLEMENTATION_SUBSCRIPTION_INDEXING.md](../docs/IMPLEMENTATION_SUBSCRIPTION_INDEXING.md) for full details

**Impact:** ✅ Users can now adjust feed volume per community (key feature from DOMAIN_KNOWLEDGE.md enabled)

---

### Community Blocking
**Added:** 2025-10-15 | **Effort:** 1 day | **Priority:** ALPHA BLOCKER

**Problem:** Users have no way to block unwanted communities from their feeds.

**Solution:**
1. **Lexicon:** Extend `social.coves.actor.block` to support community DIDs (currently user-only)
2. **Service:** Implement `BlockCommunity(userDID, communityDID)` and `UnblockCommunity()`
3. **Handlers:** Add XRPC endpoints `social.coves.community.block` and `unblock`
4. **Repository:** Add methods to track blocked communities
5. **Feed:** Filter blocked communities from feed queries (beta work)
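
The service and repository steps above could be sketched as follows. This uses a hypothetical in-memory `blockStore` in place of the PostgreSQL repository; only `BlockCommunity` and `UnblockCommunity` are names from this PRD, and everything else (including `FilterFeed`) is an assumption for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// blockStore is a hypothetical in-memory stand-in for the repository layer.
type blockStore struct {
	mu      sync.RWMutex
	blocked map[string]map[string]struct{} // userDID -> set of community DIDs
}

func newBlockStore() *blockStore {
	return &blockStore{blocked: make(map[string]map[string]struct{})}
}

// requireDID performs a shallow syntax check only.
func requireDID(name, value string) error {
	if !strings.HasPrefix(value, "did:") {
		return fmt.Errorf("%s must be a DID, got %q", name, value)
	}
	return nil
}

// BlockCommunity records the block. Repeating the call is a no-op, which
// keeps it safe under event replays.
func (s *blockStore) BlockCommunity(userDID, communityDID string) error {
	if err := requireDID("userDID", userDID); err != nil {
		return err
	}
	if err := requireDID("communityDID", communityDID); err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.blocked[userDID] == nil {
		s.blocked[userDID] = make(map[string]struct{})
	}
	s.blocked[userDID][communityDID] = struct{}{}
	return nil
}

// UnblockCommunity removes the block. Unblocking a community that was never
// blocked is a no-op.
func (s *blockStore) UnblockCommunity(userDID, communityDID string) error {
	if err := requireDID("userDID", userDID); err != nil {
		return err
	}
	if err := requireDID("communityDID", communityDID); err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.blocked[userDID], communityDID)
	return nil
}

// FilterFeed returns the community DIDs the user has not blocked, in order.
func (s *blockStore) FilterFeed(userDID string, communityDIDs []string) []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	visible := make([]string, 0, len(communityDIDs))
	for _, did := range communityDIDs {
		if _, isBlocked := s.blocked[userDID][did]; !isBlocked {
			visible = append(visible, did)
		}
	}
	return visible
}
```

In the real system the filtering would more likely happen in the feed SQL query than in application code; the in-memory filter only shows the intended behaviour.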

**Code:**
- Lexicon: [actor/block.json](../internal/atproto/lexicon/social/coves/actor/block.json) - Currently only supports user DIDs
- Service: New methods needed
- Handlers: New files needed

**Impact:** Users can't avoid unwanted content without blocking

---

## 🟢 P2: Nice-to-Have

### Remove Categories from Community Lexicon
**Added:** 2025-10-15 | **Effort:** 30 minutes | **Priority:** Cleanup

**Problem:** Categories field exists in create/update lexicon but not in profile record. Adds complexity without clear value.

**Solution:**
- Remove `categories` from [create.json](../internal/atproto/lexicon/social/coves/community/create.json#L46-L54)
- Remove `categories` from [update.json](../internal/atproto/lexicon/social/coves/community/update.json#L51-L59)
- Remove from [community.go:91](../internal/core/communities/community.go#L91)
- Remove from service layer ([service.go:109-110](../internal/core/communities/service.go#L109-L110))

**Impact:** Simplifies lexicon, removes unused feature

---

### Improve .local TLD Error Messages
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Generic error "TLD .local is not allowed" confuses developers.

**Solution:** Enhance `InvalidHandleError` to explain root cause and suggest fixing `INSTANCE_DID`.

---

### Self-Hosting Security Guide
**Added:** 2025-10-11 | **Effort:** 1 day

**Needed:** Document did:web setup, DNS config, secrets management, rate limiting, PostgreSQL hardening, monitoring.

---

### OAuth Session Cleanup Race Condition
**Added:** 2025-10-11 | **Effort:** 2 hours

**Problem:** Cleanup goroutine doesn't handle graceful shutdown, may orphan DB connections.

**Solution:** Pass cancellable context, handle SIGTERM, add cleanup timeout.

---

### Jetstream Consumer Race Condition
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Multiple goroutines can call `close(done)` concurrently in consumer shutdown.

**Solution:** Use `sync.Once` for channel close or atomic flag for shutdown state.

**Code:** TODO in [jetstream/user_consumer.go:114](../internal/atproto/jetstream/user_consumer.go#L114)
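
A minimal sketch of the `sync.Once` variant. The `consumer` type here is a hypothetical stand-in for the real Jetstream consumer and keeps only the shutdown fields.

```go
package main

import "sync"

// consumer is a stripped-down stand-in for the Jetstream user consumer.
type consumer struct {
	done     chan struct{}
	stopOnce sync.Once
}

func newConsumer() *consumer {
	return &consumer{done: make(chan struct{})}
}

// Stop signals shutdown. Closing an already-closed channel panics, so
// sync.Once ensures close(done) runs exactly once no matter how many
// goroutines call Stop.
func (c *consumer) Stop() {
	c.stopOnce.Do(func() { close(c.done) })
}

// Done returns a channel that is closed after Stop has been called.
func (c *consumer) Done() <-chan struct{} {
	return c.done
}
```

Compared with an atomic flag, `sync.Once` also makes later callers wait until the first `close` has finished, so every caller of `Stop` can rely on `Done()` being closed when `Stop` returns.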

---

## 🔵 P3: Technical Debt

### Consolidate Environment Variable Validation
**Added:** 2025-10-11 | **Effort:** 2-3 hours

Create `internal/config` package with structured config validation. Fail fast with clear errors.
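
A sketch of what such a loader might look like. Only `INSTANCE_DID` appears elsewhere in this document; the `INSTANCE_DOMAIN` and `DATABASE_URL` variable names, the `Config` fields, and the validation rules are assumptions. The block uses `package main` so it runs standalone, whereas the real code would live in `internal/config`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Config holds the validated settings. The field set is illustrative.
type Config struct {
	InstanceDID    string
	InstanceDomain string
	DatabaseURL    string
}

// Load reads settings through getenv (os.Getenv in production) and reports
// every problem at once instead of stopping at the first one.
func Load(getenv func(string) string) (Config, error) {
	cfg := Config{
		InstanceDID:    getenv("INSTANCE_DID"),
		InstanceDomain: getenv("INSTANCE_DOMAIN"),
		DatabaseURL:    getenv("DATABASE_URL"),
	}

	var errs []error
	if !strings.HasPrefix(cfg.InstanceDID, "did:") {
		errs = append(errs, fmt.Errorf("INSTANCE_DID must be a DID, got %q", cfg.InstanceDID))
	}
	if cfg.InstanceDomain == "" {
		errs = append(errs, errors.New("INSTANCE_DOMAIN is required"))
	}
	if cfg.DatabaseURL == "" {
		errs = append(errs, errors.New("DATABASE_URL is required"))
	}
	// errors.Join returns nil when errs is empty.
	return cfg, errors.Join(errs...)
}
```

Taking `getenv` as a parameter keeps the loader testable without touching the process environment; `main` would call `Load(os.Getenv)` and exit on a non-nil error.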

---

### Add Connection Pooling for PDS HTTP Clients
**Added:** 2025-10-11 | **Effort:** 2 hours

Create shared `http.Client` with connection pooling instead of new client per request.
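
As an illustration, a shared client could be built once at startup and injected wherever PDS calls are made. The function name and all numeric limits below are placeholder assumptions, not tuned values.

```go
package main

import (
	"net"
	"net/http"
	"time"
)

// newPDSClient builds one http.Client meant to be created once and shared.
// Its Transport keeps idle connections open so repeated calls to the same
// PDS host reuse them instead of opening a new connection each time.
func newPDSClient() *http.Client {
	transport := &http.Transport{
		DialContext: (&net.Dialer{
			Timeout:   5 * time.Second,
			KeepAlive: 30 * time.Second,
		}).DialContext,
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 10,
		IdleConnTimeout:     90 * time.Second,
		TLSHandshakeTimeout: 5 * time.Second,
	}
	return &http.Client{
		Transport: transport,
		Timeout:   15 * time.Second, // upper bound for a whole request
	}
}
```

Both `http.Client` and `http.Transport` are safe for concurrent use, so a single instance can serve all handlers. Connections are only returned to the pool once the response body has been read to the end and closed.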

---

### Architecture Decision Records (ADRs)
**Added:** 2025-10-11 | **Effort:** Ongoing

Document: did:plc choice, pgcrypto encryption, Jetstream vs firehose, write-forward pattern, single handle field.

---

### Replace log Package with Structured Logger
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** Using standard `log` package. Need structured logging (JSON) with levels.

**Solution:** Switch to `slog`, `zap`, or `zerolog`. Add request IDs, context fields.

**Code:** TODO in [community/errors.go:46](../internal/api/handlers/community/errors.go#L46)

---

### PDS URL Resolution from DID
**Added:** 2025-10-11 | **Effort:** 2-3 hours

**Problem:** User consumer doesn't resolve PDS URL from DID document when missing.

**Solution:** Query PLC directory for DID document, extract `serviceEndpoint`.

**Code:** TODO in [jetstream/user_consumer.go:203](../internal/atproto/jetstream/user_consumer.go#L203)
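
The extraction step could be sketched as below, assuming the DID document has already been fetched (for a `did:plc` identifier, typically from a PLC directory URL such as `https://plc.directory/<did>`, which is an assumption about the deployment). The type and function names are hypothetical; the service `id` suffix `#atproto_pds` and type `AtprotoPersonalDataServer` follow the atproto DID conventions.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// didService mirrors one entry of the "service" array in a DID document.
type didService struct {
	ID              string `json:"id"`
	Type            string `json:"type"`
	ServiceEndpoint string `json:"serviceEndpoint"`
}

// didDoc holds only the fields needed to locate the PDS.
type didDoc struct {
	ID      string       `json:"id"`
	Service []didService `json:"service"`
}

// pdsEndpoint extracts the PDS URL from a raw DID document.
func pdsEndpoint(body []byte) (string, error) {
	var doc didDoc
	if err := json.Unmarshal(body, &doc); err != nil {
		return "", fmt.Errorf("invalid DID document: %w", err)
	}
	for _, svc := range doc.Service {
		// The id may be relative ("#atproto_pds") or prefixed with the DID.
		if strings.HasSuffix(svc.ID, "#atproto_pds") && svc.Type == "AtprotoPersonalDataServer" {
			if svc.ServiceEndpoint == "" {
				return "", errors.New("atproto_pds service has an empty endpoint")
			}
			return svc.ServiceEndpoint, nil
		}
	}
	return "", errors.New("DID document has no atproto_pds service")
}
```

The consumer would call this only when the stored PDS URL is missing, and would likely cache the result to avoid a directory lookup on every event.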

---

### PLC Directory Registration (Production)
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** DID generator creates did:plc but doesn't register in prod mode.

**Solution:** Implement PLC registration API call when `isDevEnv=false`.

**Code:** TODO in [did/generator.go:46](../internal/atproto/did/generator.go#L46)

---

## Recent Completions

### ✅ OAuth Authentication for Community Actions (2025-10-16)
**Completed:** Full OAuth JWT authentication flow for protected endpoints

**Implementation:**
- ✅ JWT parser compatible with atProto PDS tokens (aud/iss handling)
- ✅ Auth middleware protecting create/update/subscribe/unsubscribe endpoints
- ✅ Handler-level DID extraction from JWT tokens via `middleware.GetUserDID(r)`
- ✅ Removed all X-User-DID header placeholders
- ✅ E2E tests validate complete OAuth flow with real PDS tokens
- ✅ Security: Issuer validation supports both HTTPS URLs and DIDs

**Files Modified:**
- [internal/atproto/auth/jwt.go](../internal/atproto/auth/jwt.go) - JWT parsing with atProto compatibility
- [internal/api/middleware/auth.go](../internal/api/middleware/auth.go) - Auth middleware
- [internal/api/handlers/community/](../internal/api/handlers/community/) - All handlers updated
- [tests/integration/community_e2e_test.go](../tests/integration/community_e2e_test.go) - OAuth E2E tests

**Related:** Also implemented `hostedByDID` auto-population for security (see P1 item above)

---

### ✅ Fix .local TLD Bug (2025-10-11)
Changed default `INSTANCE_DID` from `did:web:coves.local` → `did:web:coves.social`. Fixed community creation failure due to disallowed `.local` TLD.

---

## Prioritization

- **P0:** Security vulns, data loss, prod blockers
- **P1:** Major UX/reliability issues
- **P2:** QOL improvements, minor bugs, docs
- **P3:** Refactoring, code quality