# Backlog PRD: Platform Improvements & Technical Debt

**Status:** Ongoing
**Owner:** Platform Team
**Last Updated:** 2025-10-17

## Overview

Miscellaneous platform improvements, bug fixes, and technical debt that don't fit into feature-specific PRDs.

---

## 🟡 P1: Important (Alpha Blockers)

### did:web Domain Verification & hostedByDID Auto-Population
**Added:** 2025-10-11 | **Updated:** 2025-10-16 | **Effort:** 2-3 days | **Priority:** ALPHA BLOCKER

**Problem:**
1. **Domain Impersonation**: Self-hosters can set `INSTANCE_DID=did:web:nintendo.com` without owning the domain, so their communities appear to be hosted by a trusted domain
2. **hostedByDID Spoofing**: Malicious instance operators can modify source code to claim communities are hosted by domains they don't own, enabling reputation hijacking and phishing

**Attack Scenarios:**
- Malicious instance sets `instanceDID="did:web:coves.social"` → communities show as hosted by official Coves
- Federation partners can't verify instance authenticity
- AppView pollution with fake hosting claims

**Solution:**
1. **Basic Validation (Phase 1)**: Verify `did:web:` domain matches configured `instanceDomain`
2. **Cryptographic Verification (Phase 2)**: Fetch `https://domain/.well-known/did.json` and verify:
   - DID document exists and is valid
   - Domain ownership proven via HTTPS hosting
   - DID document matches claimed `instanceDID`
3. **Auto-populate hostedByDID**: Remove from client API, derive from instance configuration in service layer
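
The Phase 1 check can be sketched as a pure comparison between the DID and the configured domain. This is a minimal illustration, not the code in `cmd/server/main.go`: the helper name `checkDIDWebDomain` is hypothetical, and it returns an error so the caller can decide whether to log a warning or refuse to start.

```go
package main

import (
	"fmt"
	"strings"
)

// checkDIDWebDomain is a hypothetical helper: it reports an error when a
// did:web identifier does not name the configured instance domain.
// Identifiers using other DID methods (for example did:plc) pass through.
func checkDIDWebDomain(instanceDID, instanceDomain string) error {
	const prefix = "did:web:"
	if !strings.HasPrefix(instanceDID, prefix) {
		return nil // nothing to compare for other DID methods
	}
	host := strings.TrimPrefix(instanceDID, prefix)
	// Host names are case-insensitive, so compare them that way.
	if !strings.EqualFold(host, instanceDomain) {
		return fmt.Errorf("did:web domain %q does not match instance domain %q", host, instanceDomain)
	}
	return nil
}
```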

**Current Status:**
- ✅ Default changed from `coves.local` → `coves.social` (fixes `.local` TLD bug)
- ✅ TODO comment in [cmd/server/main.go:126-131](../cmd/server/main.go#L126-L131)
- ✅ hostedByDID removed from client requests (2025-10-16)
- ✅ Service layer auto-populates `hostedByDID` from `instanceDID` (2025-10-16)
- ✅ Handler rejects client-provided `hostedByDID` (2025-10-16)
- ✅ Basic validation: Logs warning if `did:web:` domain ≠ `instanceDomain` (2025-10-16)
- ⚠️ **REMAINING**: Full DID document verification (cryptographic proof of ownership)

**Implementation Notes:**
- Phase 1 complete: Basic validation catches config errors, logs warnings
- Phase 2 needed: Fetch `https://domain/.well-known/did.json` and verify ownership
- Add `SKIP_DID_WEB_VERIFICATION=true` for dev mode
- Once implemented, full verification should block startup if domain ownership cannot be proven
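
As a rough sketch of Phase 2, the two pure steps are mapping the DID to its document URL and checking that the fetched document claims the same DID. The function names are hypothetical. The sketch assumes host-only identifiers (no port or path segments) and leaves out the HTTPS fetch itself and the `SKIP_DID_WEB_VERIFICATION` switch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// didDocumentURL maps a host-only did:web identifier to the URL where the
// did:web method expects the DID document to be served.
func didDocumentURL(did string) (string, error) {
	const prefix = "did:web:"
	if !strings.HasPrefix(did, prefix) || len(did) == len(prefix) {
		return "", fmt.Errorf("not a did:web identifier: %q", did)
	}
	host := strings.TrimPrefix(did, prefix)
	// Ports (percent-encoded) and path segments are out of scope for this sketch.
	if strings.ContainsAny(host, ":%/") {
		return "", fmt.Errorf("only host-only did:web identifiers are handled here: %q", did)
	}
	return "https://" + host + "/.well-known/did.json", nil
}

// checkDIDDocument verifies that a fetched document parses as JSON and that
// its id field matches the DID the instance claims.
func checkDIDDocument(body []byte, expectedDID string) error {
	var doc struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal(body, &doc); err != nil {
		return fmt.Errorf("invalid DID document: %w", err)
	}
	if doc.ID != expectedDID {
		return fmt.Errorf("DID document id %q does not match %q", doc.ID, expectedDID)
	}
	return nil
}
```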

---

### ✅ Token Refresh Logic for Community Credentials - COMPLETE
**Added:** 2025-10-11 | **Completed:** 2025-10-17 | **Effort:** 1.5 days | **Status:** ✅ DONE

**Problem:** Community PDS access tokens expire after roughly 2 hours. Once a token expires, community updates fail until someone intervenes manually.

**Solution Implemented:**
- ✅ Automatic token refresh before PDS operations (5-minute buffer before expiration)
- ✅ JWT expiration parsing without signature verification (`parseJWTExpiration`, `needsRefresh`)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (~2 months) via `atproto.ServerCreateSession`
- ✅ Atomic credential updates (`UpdateCredentials` repository method)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for token expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go) - JWT parsing utilities
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go) - Refresh and re-auth logic
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go) - Integration tests

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` + concurrency control
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** See [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md) for full details

**Impact:** ✅ Communities can now be updated 24+ hours after creation without manual intervention

---

### ✅ Subscription Visibility Level (Feed Slider 1-5 Scale) - COMPLETE
**Added:** 2025-10-15 | **Completed:** 2025-10-16 | **Effort:** 1 day | **Status:** ✅ DONE

**Problem:** Users couldn't control how much content they see from each community. The lexicon defined `contentVisibility` (1-5 scale), but the code ignored it.

**Solution Implemented:**
- ✅ Updated subscribe handler to accept `contentVisibility` parameter (1-5, default 3)
- ✅ Store in subscription record on PDS (`social.coves.community.subscription`)
- ✅ Migration 008 adds `content_visibility` column to database with CHECK constraint
- ✅ Clamping at all layers (handler, service, consumer) for defense in depth
- ✅ Atomic subscriber count updates (SubscribeWithCount/UnsubscribeWithCount)
- ✅ Idempotent operations (safe for Jetstream event replays)
- ✅ Fixed critical collection name bug (was using wrong namespace)
- ✅ Production Jetstream consumer now running
- ✅ 13 comprehensive integration tests - all passing
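
The clamping rule is small enough to show in full. The sketch below is illustrative only: the helper name and the use of a nil pointer to represent an omitted parameter are assumptions, while the 1-5 range and the default of 3 come from the list above. Applying the same function at each layer is what gives the defense in depth.

```go
package main

const (
	minContentVisibility     = 1
	maxContentVisibility     = 5
	defaultContentVisibility = 3
)

// clampContentVisibility maps any requested value onto the 1-5 scale.
// A nil pointer stands for "parameter omitted" and yields the default.
func clampContentVisibility(requested *int) int {
	if requested == nil {
		return defaultContentVisibility
	}
	v := *requested
	if v < minContentVisibility {
		return minContentVisibility
	}
	if v > maxContentVisibility {
		return maxContentVisibility
	}
	return v
}
```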

**Files Modified:**
- Lexicon: [subscription.json](../internal/atproto/lexicon/social/coves/community/subscription.json) ✅ Updated to atProto conventions
- Handler: [community/subscribe.go](../internal/api/handlers/community/subscribe.go) ✅ Accepts contentVisibility
- Service: [communities/service.go](../internal/core/communities/service.go) ✅ Clamps and passes to PDS
- Consumer: [community_consumer.go](../internal/atproto/jetstream/community_consumer.go) ✅ Extracts and indexes
- Repository: [community_repo_subscriptions.go](../internal/db/postgres/community_repo_subscriptions.go) ✅ All queries updated
- Migration: [008_add_content_visibility_to_subscriptions.sql](../internal/db/migrations/008_add_content_visibility_to_subscriptions.sql) ✅ Schema changes
- Tests: [subscription_indexing_test.go](../tests/integration/subscription_indexing_test.go) ✅ Comprehensive coverage

**Documentation:** See [IMPLEMENTATION_SUBSCRIPTION_INDEXING.md](../docs/IMPLEMENTATION_SUBSCRIPTION_INDEXING.md) for full details

**Impact:** ✅ Users can now adjust feed volume per community (key feature from DOMAIN_KNOWLEDGE.md enabled)

---

### Community Blocking
**Added:** 2025-10-15 | **Effort:** 1 day | **Priority:** ALPHA BLOCKER

**Problem:** Users have no way to block unwanted communities from their feeds.

**Solution:**
1. **Lexicon:** Extend `social.coves.actor.block` to support community DIDs (currently user-only)
2. **Service:** Implement `BlockCommunity(userDID, communityDID)` and `UnblockCommunity()`
3. **Handlers:** Add XRPC endpoints `social.coves.community.block` and `unblock`
4. **Repository:** Add methods to track blocked communities
5. **Feed:** Filter blocked communities from feed queries (beta work)

**Code:**
- Lexicon: [actor/block.json](../internal/atproto/lexicon/social/coves/actor/block.json) - Currently only supports user DIDs
- Service: New methods needed
- Handlers: New files needed

**Impact:** Without blocking, users cannot keep unwanted communities out of their feeds

---

## 🔴 P1.5: Federation Blockers (Beta Launch)

### Cross-PDS Write-Forward Support
**Added:** 2025-10-17 | **Effort:** 3-4 hours | **Priority:** FEDERATION BLOCKER (Beta)

**Problem:** Current write-forward implementation assumes all users are on the same PDS as the Coves instance. This breaks federation when users from external PDSs try to interact with communities.

**Current Behavior:**
- User on `pds.bsky.social` subscribes to community on `coves.social`
- Coves calls `s.pdsURL` (instance default: `http://localhost:3001`)
- Write goes to WRONG PDS → fails with 401/403

**Impact:**
- ✅ **Alpha**: Works fine (single PDS deployment)
- ❌ **Beta**: Breaks federation (users on different PDSs can't subscribe/interact)

**Root Cause:**
- [service.go:736](../internal/core/communities/service.go#L736): `createRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:753](../internal/core/communities/service.go#L753): `putRecordOnPDSAs` hardcodes `s.pdsURL`
- [service.go:767](../internal/core/communities/service.go#L767): `deleteRecordOnPDSAs` hardcodes `s.pdsURL`

**Solution:**
1. Add identity resolver dependency to `CommunityService`
2. Before write-forward, resolve user's DID → extract PDS URL
3. Call user's actual PDS instead of `s.pdsURL`

**Implementation:**
```go
// Before write-forward to the user's repo, resolve the user's DID.
// identityResolver is the new CommunityService dependency proposed above.
userIdentity, err := s.identityResolver.ResolveDID(ctx, userDID)
if err != nil {
	return fmt.Errorf("failed to resolve user PDS: %w", err)
}

// Use the user's actual PDS URL instead of s.pdsURL.
endpoint := fmt.Sprintf("%s/xrpc/com.atproto.repo.createRecord", userIdentity.PDSURL)
```

**Files to Modify:**
- `internal/core/communities/service.go` - Add resolver, modify write-forward methods
- `cmd/server/main.go` - Pass identity resolver to community service constructor
- Tests - Add cross-PDS scenarios

**Testing:**
- User on external PDS subscribes to community
- User on external PDS blocks community
- Community updates still work (communities ARE on instance PDS)

---

## 🟢 P2: Nice-to-Have

### Remove Categories from Community Lexicon
**Added:** 2025-10-15 | **Effort:** 30 minutes | **Priority:** Cleanup

**Problem:** Categories field exists in create/update lexicon but not in profile record. Adds complexity without clear value.

**Solution:**
- Remove `categories` from [create.json](../internal/atproto/lexicon/social/coves/community/create.json#L46-L54)
- Remove `categories` from [update.json](../internal/atproto/lexicon/social/coves/community/update.json#L51-L59)
- Remove from [community.go:91](../internal/core/communities/community.go#L91)
- Remove from service layer ([service.go:109-110](../internal/core/communities/service.go#L109-L110))

**Impact:** Simplifies lexicon, removes unused feature

---

### Improve .local TLD Error Messages
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Generic error "TLD .local is not allowed" confuses developers.

**Solution:** Enhance `InvalidHandleError` to explain root cause and suggest fixing `INSTANCE_DID`.

---

### Self-Hosting Security Guide
**Added:** 2025-10-11 | **Effort:** 1 day

**Needed:** Document did:web setup, DNS config, secrets management, rate limiting, PostgreSQL hardening, monitoring.

---

### OAuth Session Cleanup Race Condition
**Added:** 2025-10-11 | **Effort:** 2 hours

**Problem:** The cleanup goroutine doesn't handle graceful shutdown and may orphan DB connections.

**Solution:** Pass cancellable context, handle SIGTERM, add cleanup timeout.

---

### Jetstream Consumer Race Condition
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Multiple goroutines can call `close(done)` concurrently in consumer shutdown.

**Solution:** Use `sync.Once` for channel close or atomic flag for shutdown state.

**Code:** TODO in [jetstream/user_consumer.go:114](../internal/atproto/jetstream/user_consumer.go#L114)

---

## 🔵 P3: Technical Debt

### Consolidate Environment Variable Validation
**Added:** 2025-10-11 | **Effort:** 2-3 hours

Create `internal/config` package with structured config validation. Fail fast with clear errors.

---

### Add Connection Pooling for PDS HTTP Clients
**Added:** 2025-10-11 | **Effort:** 2 hours

Create a shared `http.Client` with connection pooling instead of a new client per request.

---

### Architecture Decision Records (ADRs)
**Added:** 2025-10-11 | **Effort:** Ongoing

Document: did:plc choice, pgcrypto encryption, Jetstream vs firehose, write-forward pattern, single handle field.

---

### Replace log Package with Structured Logger
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** Using standard `log` package. Need structured logging (JSON) with levels.

**Solution:** Switch to `slog`, `zap`, or `zerolog`. Add request IDs, context fields.

**Code:** TODO in [community/errors.go:46](../internal/api/handlers/community/errors.go#L46)

---

### PDS URL Resolution from DID
**Added:** 2025-10-11 | **Effort:** 2-3 hours

**Problem:** When a user's PDS URL is missing, the user consumer does not resolve it from the DID document.

**Solution:** Query PLC directory for DID document, extract `serviceEndpoint`.

**Code:** TODO in [jetstream/user_consumer.go:203](../internal/atproto/jetstream/user_consumer.go#L203)
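
The extraction step can be sketched independently of the network call. In atproto DID documents the PDS is listed as a service entry whose id ends in `#atproto_pds` and whose type is `AtprotoPersonalDataServer`. The struct and function names below are hypothetical, and fetching the document from the PLC directory is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// didService mirrors one entry of a DID document's service array.
type didService struct {
	ID              string `json:"id"`
	Type            string `json:"type"`
	ServiceEndpoint string `json:"serviceEndpoint"`
}

// didDocument holds only the fields this sketch needs.
type didDocument struct {
	ID      string       `json:"id"`
	Service []didService `json:"service"`
}

// pdsEndpoint returns the PDS URL advertised in a DID document.
func pdsEndpoint(raw []byte) (string, error) {
	var doc didDocument
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", fmt.Errorf("invalid DID document: %w", err)
	}
	for _, svc := range doc.Service {
		// The id may be relative ("#atproto_pds") or prefixed with the DID.
		if strings.HasSuffix(svc.ID, "#atproto_pds") && svc.Type == "AtprotoPersonalDataServer" {
			return svc.ServiceEndpoint, nil
		}
	}
	return "", errors.New("DID document has no atproto PDS service entry")
}
```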

---

### PLC Directory Registration (Production)
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** The DID generator creates a did:plc but doesn't register it with the PLC directory in production mode.

**Solution:** Implement PLC registration API call when `isDevEnv=false`.

**Code:** TODO in [did/generator.go:46](../internal/atproto/did/generator.go#L46)

---

## Recent Completions

### ✅ Token Refresh for Community Credentials (2025-10-17)
**Completed:** Automatic token refresh prevents communities from breaking after 2 hours

**Implementation:**
- ✅ JWT expiration parsing and refresh detection (5-minute buffer)
- ✅ Token refresh using Indigo SDK (`atproto.ServerRefreshSession`)
- ✅ Password fallback when refresh tokens expire (`atproto.ServerCreateSession`)
- ✅ Atomic credential updates in database (`UpdateCredentials`)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring (`[TOKEN-REFRESH]` events)
- ✅ Integration tests for expiration detection and credential updates

**Files Created:**
- [internal/core/communities/token_utils.go](../internal/core/communities/token_utils.go)
- [internal/core/communities/token_refresh.go](../internal/core/communities/token_refresh.go)
- [tests/integration/token_refresh_test.go](../tests/integration/token_refresh_test.go)

**Files Modified:**
- [internal/core/communities/service.go](../internal/core/communities/service.go) - Added `ensureFreshToken` method
- [internal/core/communities/interfaces.go](../internal/core/communities/interfaces.go) - Added `UpdateCredentials` interface
- [internal/db/postgres/community_repo.go](../internal/db/postgres/community_repo.go) - Implemented `UpdateCredentials`

**Documentation:** [IMPLEMENTATION_TOKEN_REFRESH.md](../docs/IMPLEMENTATION_TOKEN_REFRESH.md)

**Impact:** Communities now work indefinitely without manual token management

---

### ✅ OAuth Authentication for Community Actions (2025-10-16)
**Completed:** Full OAuth JWT authentication flow for protected endpoints

**Implementation:**
- ✅ JWT parser compatible with atProto PDS tokens (aud/iss handling)
- ✅ Auth middleware protecting create/update/subscribe/unsubscribe endpoints
- ✅ Handler-level DID extraction from JWT tokens via `middleware.GetUserDID(r)`
- ✅ Removed all X-User-DID header placeholders
- ✅ E2E tests validate complete OAuth flow with real PDS tokens
- ✅ Security: Issuer validation supports both HTTPS URLs and DIDs

**Files Modified:**
- [internal/atproto/auth/jwt.go](../internal/atproto/auth/jwt.go) - JWT parsing with atProto compatibility
- [internal/api/middleware/auth.go](../internal/api/middleware/auth.go) - Auth middleware
- [internal/api/handlers/community/](../internal/api/handlers/community/) - All handlers updated
- [tests/integration/community_e2e_test.go](../tests/integration/community_e2e_test.go) - OAuth E2E tests

**Related:** Also implemented `hostedByDID` auto-population for security (see P1 item above)

---

### ✅ Fix .local TLD Bug (2025-10-11)
Changed default `INSTANCE_DID` from `did:web:coves.local` → `did:web:coves.social`. Fixed community creation failure due to disallowed `.local` TLD.

---

## Prioritization

- **P0:** Security vulns, data loss, prod blockers
- **P1:** Major UX/reliability issues
- **P2:** QOL improvements, minor bugs, docs
- **P3:** Refactoring, code quality