# Backlog PRD: Platform Improvements & Technical Debt

**Status:** Ongoing
**Owner:** Platform Team
**Last Updated:** 2025-10-11

## Overview

Miscellaneous platform improvements, bug fixes, and technical debt that don't fit into feature-specific PRDs.

---

## 🔴 P0: Critical Security

### did:web Domain Verification
**Added:** 2025-10-11 | **Effort:** 2-3 days | **Severity:** Medium

**Problem:** Self-hosters can set `INSTANCE_DID=did:web:nintendo.com` without owning the domain, which enables domain impersonation (e.g., `mario.communities.nintendo.com` served from a malicious instance).

**Solution:** Implement did:web verification per the [atproto spec](https://atproto.com/specs/did-web): on startup, fetch `https://domain/.well-known/did.json` and verify that it matches the claimed DID. Add `SKIP_DID_WEB_VERIFICATION=true` for dev mode.
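
The check described above could look roughly like the following. This is a minimal sketch, not the planned implementation: the function names (`didWebURL`, `checkDIDDocument`, `verifyInstanceDID`) are hypothetical, only hostname-level did:web identifiers are handled, and "matches" is assumed to mean that the document's `id` field equals the claimed DID.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// didWebURL maps a hostname-only did:web identifier to its well-known URL.
// Path-based or port-qualified identifiers are rejected in this sketch.
func didWebURL(did string) (string, error) {
	host, ok := strings.CutPrefix(did, "did:web:")
	if !ok || host == "" || strings.ContainsAny(host, ":/%") {
		return "", fmt.Errorf("unsupported did:web identifier %q", did)
	}
	return "https://" + host + "/.well-known/did.json", nil
}

// checkDIDDocument verifies that the fetched document claims the same DID.
func checkDIDDocument(body []byte, claimed string) error {
	var doc struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal(body, &doc); err != nil {
		return fmt.Errorf("parse did.json: %w", err)
	}
	if doc.ID != claimed {
		return fmt.Errorf("did.json id %q does not match %q", doc.ID, claimed)
	}
	return nil
}

// verifyInstanceDID fetches the DID document and compares it to the claim.
func verifyInstanceDID(ctx context.Context, client *http.Client, claimed string) error {
	if os.Getenv("SKIP_DID_WEB_VERIFICATION") == "true" {
		return nil // dev-mode escape hatch named in this PRD
	}
	url, err := didWebURL(claimed)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("fetch %s: %w", url, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("fetch %s: unexpected status %s", url, resp.Status)
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20)) // cap at 1 MiB
	if err != nil {
		return err
	}
	return checkDIDDocument(body, claimed)
}

func main() {
	// Offline demo of the two pure helpers; the server would instead call
	// verifyInstanceDID with a client such as the one below at startup.
	_ = &http.Client{Timeout: 10 * time.Second}
	fmt.Println(didWebURL("did:web:coves.social"))
	fmt.Println(checkDIDDocument([]byte(`{"id":"did:web:evil.example"}`), "did:web:coves.social"))
}
```

Failing startup when `verifyInstanceDID` returns an error keeps a misconfigured instance from ever serving requests under a domain it does not control.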

**Current Status:**
- ✅ Default changed from `coves.local` → `coves.social` (fixes `.local` TLD bug)
- ✅ TODO comment in [cmd/server/main.go:126-131](../cmd/server/main.go#L126-L131)
- ⚠️ Verification not implemented

---

## 🟡 P1: Important (Alpha Blockers)

### Token Refresh Logic for Community Credentials
**Added:** 2025-10-11 | **Effort:** 1-2 days | **Priority:** ALPHA BLOCKER

**Problem:** Community PDS access tokens expire after roughly 2 hours. Once a token expires, community updates fail until someone intervenes manually.

**Solution:** Refresh tokens automatically before PDS operations: parse the JWT `exp` claim, use the refresh token when the access token has expired, and store the new tokens in the DB.

**Code:** TODO in [communities/service.go:123](../internal/core/communities/service.go#L123)
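
The expiry check at the heart of this could be sketched as follows. The names `tokenExpiry`, `needsRefresh`, and `fakeToken` are hypothetical, and the leeway window is an assumption added so that a token is refreshed slightly before it expires rather than after a request has already failed. The payload is decoded without verifying the signature, which is enough to schedule a refresh but is not an authenticity check.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// tokenExpiry reads the exp claim from a JWT without verifying its signature.
func tokenExpiry(token string) (time.Time, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return time.Time{}, errors.New("token is not a three-part JWT")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return time.Time{}, fmt.Errorf("decode JWT payload: %w", err)
	}
	var claims struct {
		Exp float64 `json:"exp"` // NumericDate: seconds since the Unix epoch
	}
	if err := json.Unmarshal(payload, &claims); err != nil {
		return time.Time{}, fmt.Errorf("parse JWT claims: %w", err)
	}
	if claims.Exp == 0 {
		return time.Time{}, errors.New("JWT has no exp claim")
	}
	return time.Unix(int64(claims.Exp), 0), nil
}

// needsRefresh reports whether the token expires within the leeway window.
// Unreadable tokens are treated as expired.
func needsRefresh(token string, now time.Time, leeway time.Duration) bool {
	exp, err := tokenExpiry(token)
	if err != nil {
		return true
	}
	return !now.Add(leeway).Before(exp)
}

// fakeToken builds an unsigned token for this demo only.
func fakeToken(exp int64) string {
	enc := base64.RawURLEncoding.EncodeToString
	header := enc([]byte(`{"alg":"none"}`))
	payload := enc([]byte(fmt.Sprintf(`{"exp":%d}`, exp)))
	return header + "." + payload + ".sig"
}

func main() {
	exp := time.Now().Add(2 * time.Hour)
	token := fakeToken(exp.Unix())
	fmt.Println(needsRefresh(token, time.Now(), 5*time.Minute))            // false
	fmt.Println(needsRefresh(token, exp.Add(-time.Minute), 5*time.Minute)) // true
}
```

When `needsRefresh` returns true, the service would exchange the refresh token for new credentials and persist them before continuing with the PDS operation.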

---

### OAuth Authentication for Community Actions
**Added:** 2025-10-11 | **Effort:** 2-3 days | **Priority:** ALPHA BLOCKER

**Problem:** Subscribe/unsubscribe and community creation need the authenticated user's DID. The handlers currently use a placeholder.

**Solution:** Extract the authenticated DID from the OAuth session context. Requires OAuth middleware integration.

**Code:** Multiple TODOs in [community/subscribe.go](../internal/api/handlers/community/subscribe.go#L46), [community/create.go](../internal/api/handlers/community/create.go#L38), [community/update.go](../internal/api/handlers/community/update.go#L47)
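
One common way to pass the DID from middleware to handlers is a typed context key. The sketch below assumes that approach; `WithUserDID`, `UserDIDFromContext`, `subscribeHandler`, and the `/subscribe` path are hypothetical names, and the real OAuth middleware would call `WithUserDID` only after validating the session.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// userDIDKey is unexported so other packages cannot overwrite the value.
type userDIDKey struct{}

// WithUserDID stores the authenticated DID in the request context.
func WithUserDID(ctx context.Context, did string) context.Context {
	return context.WithValue(ctx, userDIDKey{}, did)
}

// UserDIDFromContext returns the authenticated DID or an error if none is set.
func UserDIDFromContext(ctx context.Context) (string, error) {
	did, ok := ctx.Value(userDIDKey{}).(string)
	if !ok || did == "" {
		return "", errors.New("no authenticated DID in context")
	}
	return did, nil
}

// subscribeHandler rejects requests that did not pass through the middleware.
func subscribeHandler(w http.ResponseWriter, r *http.Request) {
	did, err := UserDIDFromContext(r.Context())
	if err != nil {
		http.Error(w, "authentication required", http.StatusUnauthorized)
		return
	}
	fmt.Fprintf(w, "subscribing as %s", did)
}

// call runs the handler once; an empty did simulates an unauthenticated request.
func call(did string) (int, string) {
	req := httptest.NewRequest(http.MethodPost, "/subscribe", nil)
	if did != "" {
		req = req.WithContext(WithUserDID(req.Context(), did))
	}
	rec := httptest.NewRecorder()
	subscribeHandler(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(call("did:plc:example123"))
	fmt.Println(call(""))
}
```

Returning an error from `UserDIDFromContext` instead of an empty string makes it harder for a handler to fall back silently to a placeholder DID.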

---

### Subscription Visibility Level (Feed Slider 1-5 Scale)
**Added:** 2025-10-15 | **Effort:** 4-6 hours | **Priority:** ALPHA BLOCKER

**Problem:** Users can't control how much content they see from each community. The lexicon defines `contentVisibility` (1-5 scale), but the code doesn't use it.

**Solution:**
- Update the subscribe handler to accept a `contentVisibility` parameter (1-5, default 3)
- Store it in the subscription record on the PDS
- Update feed generation to respect the visibility level (beta work, but the data structure is needed now)
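
The handler-side part of the steps above, accepting the parameter, defaulting to 3, and rejecting values outside 1-5, might be sketched like this. The request shape (`subscribeRequest`, the `community` field) is an assumption for illustration; only `contentVisibility` and its range come from the lexicon described here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

const defaultContentVisibility = 3

// subscribeRequest is a hypothetical request body. The pointer distinguishes
// "field omitted" (use the default) from an explicit value.
type subscribeRequest struct {
	Community         string `json:"community"`
	ContentVisibility *int   `json:"contentVisibility,omitempty"`
}

// parseSubscribeRequest validates the body and applies the default visibility.
func parseSubscribeRequest(body []byte) (community string, visibility int, err error) {
	var req subscribeRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", 0, fmt.Errorf("invalid request body: %w", err)
	}
	if req.Community == "" {
		return "", 0, errors.New("community is required")
	}
	if req.ContentVisibility == nil {
		return req.Community, defaultContentVisibility, nil
	}
	v := *req.ContentVisibility
	if v < 1 || v > 5 {
		return "", 0, fmt.Errorf("contentVisibility must be between 1 and 5, got %d", v)
	}
	return req.Community, v, nil
}

func main() {
	for _, body := range []string{
		`{"community":"did:plc:example"}`,
		`{"community":"did:plc:example","contentVisibility":5}`,
		`{"community":"did:plc:example","contentVisibility":9}`,
	} {
		community, visibility, err := parseSubscribeRequest([]byte(body))
		fmt.Println(community, visibility, err)
	}
}
```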

**Code:**
- Lexicon: [subscription.json:28-34](../internal/atproto/lexicon/social/coves/actor/subscription.json#L28-L34) ✅ Ready
- Handler: [community/subscribe.go](../internal/api/handlers/community/subscribe.go) - Add parameter
- Service: [communities/service.go:373-376](../internal/core/communities/service.go#L373-L376) - Add to record

**Impact:** Without this, users have no way to adjust feed volume per community (key feature from DOMAIN_KNOWLEDGE.md)

---

### Community Blocking
**Added:** 2025-10-15 | **Effort:** 1 day | **Priority:** ALPHA BLOCKER

**Problem:** Users have no way to block unwanted communities from their feeds.

**Solution:**
1. **Lexicon:** Extend `social.coves.actor.block` to support community DIDs (currently user-only)
2. **Service:** Implement `BlockCommunity(userDID, communityDID)` and `UnblockCommunity()`
3. **Handlers:** Add XRPC endpoints `social.coves.community.block` and `unblock`
4. **Repository:** Add methods to track blocked communities
5. **Feed:** Filter blocked communities from feed queries (beta work)
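
To make the service, repository, and feed steps above concrete, here is a minimal in-memory sketch. `BlockStore` and `FilterFeed` are hypothetical names, and the map stands in for whatever persistent repository the real implementation uses; only `BlockCommunity` and `UnblockCommunity` are named in this PRD.

```go
package main

import (
	"fmt"
	"sync"
)

// BlockStore tracks blocked communities per user (in-memory stand-in).
type BlockStore struct {
	mu      sync.RWMutex
	blocked map[string]map[string]struct{} // userDID -> set of community DIDs
}

func NewBlockStore() *BlockStore {
	return &BlockStore{blocked: make(map[string]map[string]struct{})}
}

// BlockCommunity records a block; repeating it is a no-op.
func (s *BlockStore) BlockCommunity(userDID, communityDID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.blocked[userDID] == nil {
		s.blocked[userDID] = make(map[string]struct{})
	}
	s.blocked[userDID][communityDID] = struct{}{}
}

// UnblockCommunity removes a block; unblocking an unknown pair is a no-op.
func (s *BlockStore) UnblockCommunity(userDID, communityDID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.blocked[userDID], communityDID)
}

// FilterFeed returns the community DIDs the user has not blocked, in order.
func (s *BlockStore) FilterFeed(userDID string, communityDIDs []string) []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	visible := make([]string, 0, len(communityDIDs))
	for _, did := range communityDIDs {
		if _, isBlocked := s.blocked[userDID][did]; !isBlocked {
			visible = append(visible, did)
		}
	}
	return visible
}

func main() {
	store := NewBlockStore()
	feed := []string{"did:plc:gaming", "did:plc:news", "did:plc:cooking"}

	store.BlockCommunity("did:plc:alice", "did:plc:news")
	fmt.Println(store.FilterFeed("did:plc:alice", feed))

	store.UnblockCommunity("did:plc:alice", "did:plc:news")
	fmt.Println(store.FilterFeed("did:plc:alice", feed))
}
```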

**Code:**
- Lexicon: [actor/block.json](../internal/atproto/lexicon/social/coves/actor/block.json) - Currently only supports user DIDs
- Service: New methods needed
- Handlers: New files needed

**Impact:** Without blocking, users cannot keep unwanted communities out of their feeds.

---

## 🟢 P2: Nice-to-Have

### Remove Categories from Community Lexicon
**Added:** 2025-10-15 | **Effort:** 30 minutes | **Priority:** Cleanup

**Problem:** The `categories` field exists in the create/update lexicons but not in the profile record, so it adds complexity without clear value.

**Solution:**
- Remove `categories` from [create.json](../internal/atproto/lexicon/social/coves/community/create.json#L46-L54)
- Remove `categories` from [update.json](../internal/atproto/lexicon/social/coves/community/update.json#L51-L59)
- Remove from [community.go:91](../internal/core/communities/community.go#L91)
- Remove from service layer ([service.go:109-110](../internal/core/communities/service.go#L109-L110))

**Impact:** Simplifies lexicon, removes unused feature

---

### Improve .local TLD Error Messages
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** The generic error "TLD .local is not allowed" confuses developers.

**Solution:** Enhance `InvalidHandleError` to explain the root cause and suggest fixing `INSTANCE_DID`.

---

### Self-Hosting Security Guide
**Added:** 2025-10-11 | **Effort:** 1 day

**Needed:** Document did:web setup, DNS config, secrets management, rate limiting, PostgreSQL hardening, monitoring.

---

### OAuth Session Cleanup Race Condition
**Added:** 2025-10-11 | **Effort:** 2 hours

**Problem:** The cleanup goroutine does not handle graceful shutdown and may orphan DB connections.

**Solution:** Pass a cancellable context, handle SIGTERM, and add a cleanup timeout.
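
The three parts of this fix could fit together roughly as shown below. All function names are hypothetical, and the `demoLifetime` timeout exists only so the example exits on its own; in the server the context would be cancelled by the signal alone.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// startCleanup runs cleanup on every tick until ctx is cancelled.
// The returned channel is closed once the goroutine has exited.
func startCleanup(ctx context.Context, interval time.Duration, cleanup func(context.Context)) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				cleanup(ctx)
			}
		}
	}()
	return done
}

// waitStopped waits up to timeout for the cleanup goroutine to exit.
func waitStopped(done <-chan struct{}, timeout time.Duration) bool {
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// run wires the pieces together and reports whether cleanup stopped in time.
func run(demoLifetime time.Duration) bool {
	// Cancel on SIGINT or SIGTERM.
	sigCtx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()
	// Demo only: also cancel after demoLifetime so the example terminates.
	ctx, cancel := context.WithTimeout(sigCtx, demoLifetime)
	defer cancel()

	done := startCleanup(ctx, 10*time.Millisecond, func(context.Context) {
		// Delete expired OAuth sessions here, passing the context to DB calls.
	})

	<-ctx.Done()
	return waitStopped(done, 5*time.Second)
}

func main() {
	fmt.Println("cleanup stopped cleanly:", run(200*time.Millisecond))
}
```

Waiting on `done` before closing the database pool is what prevents the goroutine from holding a connection after shutdown has begun.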

---

### Jetstream Consumer Race Condition
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Multiple goroutines can call `close(done)` concurrently in consumer shutdown.

**Solution:** Use `sync.Once` for channel close or atomic flag for shutdown state.

**Code:** TODO in [jetstream/user_consumer.go:114](../internal/atproto/jetstream/user_consumer.go#L114)
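
The `sync.Once` variant is small enough to show in full. The `consumer` type below is a hypothetical stand-in for the real user consumer and keeps only the `done` channel relevant to this bug.

```go
package main

import (
	"fmt"
	"sync"
)

// consumer is a stand-in for the Jetstream user consumer.
type consumer struct {
	done      chan struct{}
	closeOnce sync.Once
}

func newConsumer() *consumer {
	return &consumer{done: make(chan struct{})}
}

// Stop closes done exactly once, no matter how many goroutines call it.
// A second close(done) without the Once would panic.
func (c *consumer) Stop() {
	c.closeOnce.Do(func() { close(c.done) })
}

func main() {
	c := newConsumer()

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Stop()
		}()
	}
	wg.Wait()

	<-c.done // returns immediately because the channel is closed
	fmt.Println("closed once")
}
```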

---

## 🔵 P3: Technical Debt

### Consolidate Environment Variable Validation
**Added:** 2025-10-11 | **Effort:** 2-3 hours

Create an `internal/config` package with structured config validation. Fail fast with clear errors.
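
A possible shape for such a package is sketched below. `Config`, `Load`, and the `DATABASE_URL` variable are assumptions for illustration; `INSTANCE_DID` is the only variable taken from this document. Collecting every problem with `errors.Join` (Go 1.20+) lets the operator fix all of them in one pass.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Config holds validated settings. DatabaseURL is a hypothetical field.
type Config struct {
	InstanceDID string
	DatabaseURL string
}

// Load reads settings through getenv and reports every problem at once.
// Passing the lookup function in keeps the validation testable.
func Load(getenv func(string) string) (Config, error) {
	cfg := Config{
		InstanceDID: getenv("INSTANCE_DID"),
		DatabaseURL: getenv("DATABASE_URL"),
	}

	var errs []error
	if !strings.HasPrefix(cfg.InstanceDID, "did:") {
		errs = append(errs, fmt.Errorf("INSTANCE_DID must be a DID, got %q", cfg.InstanceDID))
	}
	if cfg.DatabaseURL == "" {
		errs = append(errs, errors.New("DATABASE_URL is required"))
	}
	// errors.Join returns nil when errs is empty.
	return cfg, errors.Join(errs...)
}

func main() {
	// In the server this would be Load(os.Getenv); a map keeps the demo
	// self-contained.
	env := map[string]string{"INSTANCE_DID": "did:web:coves.social"}
	_, err := Load(func(key string) string { return env[key] })
	fmt.Println(err)
}
```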

---

### Add Connection Pooling for PDS HTTP Clients
**Added:** 2025-10-11 | **Effort:** 2 hours

Create a shared `http.Client` with connection pooling instead of a new client per request.
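
A shared client might be set up as in the sketch below. The variable name `pdsClient` and every numeric limit are placeholder assumptions, not tuned values. Cloning `http.DefaultTransport` keeps its proxy and dialer defaults while allowing the idle-connection limits to be adjusted.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// pdsClient is created once and shared by all PDS calls. http.Client and
// http.Transport are safe for concurrent use and reuse idle connections.
var pdsClient = newPDSClient()

func newPDSClient() *http.Client {
	transport := http.DefaultTransport.(*http.Transport).Clone()
	transport.MaxIdleConns = 100
	transport.MaxIdleConnsPerHost = 10 // placeholder limit
	transport.IdleConnTimeout = 90 * time.Second

	return &http.Client{
		Timeout:   15 * time.Second, // placeholder overall request timeout
		Transport: transport,
	}
}

func main() {
	fmt.Println("shared client timeout:", pdsClient.Timeout)
}
```

For connections to actually be reused, callers need to read response bodies to completion and close them.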

---

### Architecture Decision Records (ADRs)
**Added:** 2025-10-11 | **Effort:** Ongoing

Document: did:plc choice, pgcrypto encryption, Jetstream vs firehose, write-forward pattern, single handle field.

---

### Replace log Package with Structured Logger
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** The code uses the standard `log` package, which has no levels and no structured (JSON) output.

**Solution:** Switch to `slog`, `zap`, or `zerolog`. Add request IDs and context fields.

**Code:** TODO in [community/errors.go:46](../internal/api/handlers/community/errors.go#L46)
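
If the standard-library option is chosen, a request-scoped JSON logger could look like the sketch below (`log/slog` requires Go 1.21 or later). The helper names and the field names `request_id` and `community_did` are assumptions for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
)

// requestLogger returns a JSON logger that attaches the request ID to every
// record it emits.
func requestLogger(w io.Writer, requestID string) *slog.Logger {
	base := slog.New(slog.NewJSONHandler(w, &slog.HandlerOptions{Level: slog.LevelInfo}))
	return base.With("request_id", requestID)
}

// exampleLine logs one record into a buffer and returns the JSON line, so
// the output format can be inspected.
func exampleLine() string {
	var buf bytes.Buffer
	logger := requestLogger(&buf, "req-123")
	logger.Debug("dropped: below the configured level")
	logger.Info("community created", "community_did", "did:plc:example")
	return buf.String()
}

func main() {
	fmt.Print(exampleLine())
}
```

In the server the writer would typically be `os.Stdout`, with the request ID generated by middleware.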

---

### PDS URL Resolution from DID
**Added:** 2025-10-11 | **Effort:** 2-3 hours

**Problem:** When a user's PDS URL is missing, the user consumer does not resolve it from the DID document.

**Solution:** Query the PLC directory for the DID document and extract `serviceEndpoint`.

**Code:** TODO in [jetstream/user_consumer.go:203](../internal/atproto/jetstream/user_consumer.go#L203)
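
The extraction step can be sketched independently of the HTTP call. The code below assumes the DID document has already been fetched and selects the service entry that atproto uses for the PDS (an `id` ending in `#atproto_pds` with type `AtprotoPersonalDataServer`). The function name `pdsEndpoint` and the sample document are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// didService mirrors one entry of the "service" array in a DID document.
type didService struct {
	ID              string `json:"id"`
	Type            string `json:"type"`
	ServiceEndpoint string `json:"serviceEndpoint"`
}

// pdsEndpoint returns the PDS URL declared in a DID document.
func pdsEndpoint(rawDoc []byte) (string, error) {
	var doc struct {
		ID      string       `json:"id"`
		Service []didService `json:"service"`
	}
	if err := json.Unmarshal(rawDoc, &doc); err != nil {
		return "", fmt.Errorf("parse DID document: %w", err)
	}
	for _, svc := range doc.Service {
		if strings.HasSuffix(svc.ID, "#atproto_pds") && svc.Type == "AtprotoPersonalDataServer" {
			if svc.ServiceEndpoint == "" {
				return "", errors.New("PDS service entry has an empty serviceEndpoint")
			}
			return svc.ServiceEndpoint, nil
		}
	}
	return "", errors.New("DID document has no atproto PDS service entry")
}

func main() {
	// Sample document with made-up identifiers.
	doc := []byte(`{
		"id": "did:plc:example123",
		"service": [{
			"id": "#atproto_pds",
			"type": "AtprotoPersonalDataServer",
			"serviceEndpoint": "https://pds.example.com"
		}]
	}`)
	fmt.Println(pdsEndpoint(doc))
}
```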

---

### PLC Directory Registration (Production)
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** The DID generator creates a did:plc identifier but does not register it with the PLC directory in production mode.

**Solution:** Implement the PLC registration API call when `isDevEnv=false`.

**Code:** TODO in [did/generator.go:46](../internal/atproto/did/generator.go#L46)

---

## Recent Completions

### ✅ Fix .local TLD Bug (2025-10-11)
Changed default `INSTANCE_DID` from `did:web:coves.local` → `did:web:coves.social`. Fixed community creation failure due to disallowed `.local` TLD.

---

## Prioritization

- **P0:** Security vulns, data loss, prod blockers
- **P1:** Major UX/reliability issues
- **P2:** QOL improvements, minor bugs, docs
- **P3:** Refactoring, code quality