# Backlog PRD: Platform Improvements & Technical Debt

**Status:** Ongoing
**Owner:** Platform Team
**Last Updated:** 2025-10-11

## Overview

Miscellaneous platform improvements, bug fixes, and technical debt that don't fit into feature-specific PRDs.

---

## 🔴 P0: Critical Security

### did:web Domain Verification
**Added:** 2025-10-11 | **Effort:** 2-3 days | **Severity:** Medium

**Problem:** Self-hosters can set `INSTANCE_DID=did:web:nintendo.com` without owning the domain, which enables domain impersonation attacks (e.g., `mario.communities.nintendo.com` served from a malicious instance).

**Solution:** Implement did:web verification per the [atproto spec](https://atproto.com/specs/did-web): on startup, fetch `https://domain/.well-known/did.json` and verify that its `id` matches the claimed DID. Add `SKIP_DID_WEB_VERIFICATION=true` for dev mode.

**Current Status:**
- ✅ Default changed from `coves.local` → `coves.social` (fixes `.local` TLD bug)
- ✅ TODO comment in [cmd/server/main.go:126-131](../cmd/server/main.go#L126-L131)
- ⚠️ Verification not implemented
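
The verification described above could be sketched roughly as follows. This is a minimal sketch, not the planned implementation: the function names (`didWebDocumentURL`, `checkDIDDocument`, `verifyInstanceDID`) are hypothetical, and it assumes hostname-only did:web identifiers, rejecting ports and paths.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// didWebDocumentURL maps a hostname-only did:web identifier to the URL of
// its DID document. Identifiers with ports (percent-encoded) or paths are
// rejected to keep this sketch small.
func didWebDocumentURL(did string) (string, error) {
	host := strings.TrimPrefix(did, "did:web:")
	if host == did || host == "" || strings.ContainsAny(host, ":/%") {
		return "", fmt.Errorf("unsupported did:web identifier: %q", did)
	}
	return "https://" + host + "/.well-known/did.json", nil
}

// checkDIDDocument verifies that the document's id equals the claimed DID.
func checkDIDDocument(body []byte, claimed string) error {
	var doc struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal(body, &doc); err != nil {
		return fmt.Errorf("parse DID document: %w", err)
	}
	if doc.ID != claimed {
		return fmt.Errorf("DID document id %q does not match claimed DID %q", doc.ID, claimed)
	}
	return nil
}

// verifyInstanceDID fetches the DID document for did and checks its id.
func verifyInstanceDID(ctx context.Context, client *http.Client, did string) error {
	docURL, err := didWebDocumentURL(did)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, docURL, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("fetch %s: %w", docURL, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("fetch %s: unexpected status %d", docURL, resp.StatusCode)
	}
	// Cap the read so a hostile server cannot exhaust memory.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return err
	}
	return checkDIDDocument(body, did)
}
```

At startup, the server would call `verifyInstanceDID` with a short timeout on the context and refuse to start on error, unless `SKIP_DID_WEB_VERIFICATION=true` is set.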

---

## 🟡 P1: Important

### Token Refresh Logic for Community Credentials
**Added:** 2025-10-11 | **Effort:** 1-2 days

**Problem:** Community PDS access tokens expire after about 2 hours. Once a token expires, community updates fail until someone intervenes manually.

**Solution:** Auto-refresh tokens before PDS operations: parse the JWT `exp` claim, use the refresh token when the access token has expired, and update the DB.

**Code:** TODO in [communities/service.go:123](../internal/core/communities/service.go#L123)

---

### OAuth Authentication for Community Actions
**Added:** 2025-10-11 | **Effort:** 2-3 days

**Problem:** Subscribe/unsubscribe and community creation need the authenticated user's DID. The handlers currently use a placeholder.

**Solution:** Extract the authenticated DID from the OAuth session context. Requires OAuth middleware integration.

**Code:** Multiple TODOs in [community/subscribe.go](../internal/api/handlers/community/subscribe.go#L46), [community/create.go](../internal/api/handlers/community/create.go#L38)

---

## 🟢 P2: Nice-to-Have

### Improve .local TLD Error Messages
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** The generic error "TLD .local is not allowed" confuses developers.

**Solution:** Enhance `InvalidHandleError` to explain the root cause and suggest fixing `INSTANCE_DID`.

---

### Self-Hosting Security Guide
**Added:** 2025-10-11 | **Effort:** 1 day

**Needed:** Document did:web setup, DNS config, secrets management, rate limiting, PostgreSQL hardening, monitoring.

---

### OAuth Session Cleanup Race Condition
**Added:** 2025-10-11 | **Effort:** 2 hours

**Problem:** The cleanup goroutine doesn't handle graceful shutdown and may orphan DB connections.

**Solution:** Pass a cancellable context, handle SIGTERM, and add a cleanup timeout.

---

### Jetstream Consumer Race Condition
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Multiple goroutines can call `close(done)` concurrently during consumer shutdown, and closing an already-closed channel panics.

**Solution:** Use `sync.Once` for the channel close, or an atomic flag for shutdown state.

**Code:** TODO in [jetstream/user_consumer.go:114](../internal/atproto/jetstream/user_consumer.go#L114)

---

## 🔵 P3: Technical Debt

### Consolidate Environment Variable Validation
**Added:** 2025-10-11 | **Effort:** 2-3 hours

Create an `internal/config` package with structured config validation. Fail fast with clear errors.
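
As a rough illustration of the idea, the sketch below loads and validates everything in one place and reports all problems at once. The `Config` fields are assumptions: only `INSTANCE_DID` appears in this backlog, and `DATABASE_URL` is a placeholder for whatever the server actually reads. It uses `errors.Join`, which requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Config holds validated settings. The field set is illustrative only.
type Config struct {
	InstanceDID string
	DatabaseURL string
}

// Load reads settings through getenv (os.Getenv in production) and
// returns every validation problem in a single joined error.
func Load(getenv func(string) string) (Config, error) {
	cfg := Config{
		InstanceDID: getenv("INSTANCE_DID"),
		DatabaseURL: getenv("DATABASE_URL"),
	}
	var errs []error
	if !strings.HasPrefix(cfg.InstanceDID, "did:") {
		errs = append(errs, fmt.Errorf("INSTANCE_DID must be a DID, got %q", cfg.InstanceDID))
	}
	if cfg.DatabaseURL == "" {
		errs = append(errs, errors.New("DATABASE_URL is required"))
	}
	// errors.Join returns nil when errs is empty.
	return cfg, errors.Join(errs...)
}
```

Taking `getenv` as a parameter keeps the loader testable without mutating the process environment.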

---

### Add Connection Pooling for PDS HTTP Clients
**Added:** 2025-10-11 | **Effort:** 2 hours

Create a shared `http.Client` with connection pooling instead of a new client per request.

---

### Architecture Decision Records (ADRs)
**Added:** 2025-10-11 | **Effort:** Ongoing

Document: did:plc choice, pgcrypto encryption, Jetstream vs firehose, write-forward pattern, single handle field.

---

### Replace log Package with Structured Logger
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** The code uses the standard `log` package. It needs structured (JSON) logging with levels.

**Solution:** Switch to `slog`, `zap`, or `zerolog`. Add request IDs and context fields.

**Code:** TODO in [community/errors.go:46](../internal/api/handlers/community/errors.go#L46)

---

### PDS URL Resolution from DID
**Added:** 2025-10-11 | **Effort:** 2-3 hours

**Problem:** The user consumer doesn't resolve the PDS URL from the DID document when the URL is missing.

**Solution:** Query the PLC directory for the DID document and extract `serviceEndpoint`.

**Code:** TODO in [jetstream/user_consumer.go:203](../internal/atproto/jetstream/user_consumer.go#L203)
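
The extraction step could be sketched as below, assuming the DID document has already been fetched (the public PLC directory serves it at `https://plc.directory/<did>`, though the instance may be configured to use a different directory). In atproto DID documents the PDS is the service entry whose id ends in `#atproto_pds` with type `AtprotoPersonalDataServer`. The function name `pdsEndpoint` is hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// didDoc models only the parts of a DID document needed to find the PDS.
type didDoc struct {
	ID      string `json:"id"`
	Service []struct {
		ID              string `json:"id"`
		Type            string `json:"type"`
		ServiceEndpoint string `json:"serviceEndpoint"`
	} `json:"service"`
}

// pdsEndpoint returns the atproto PDS URL from a raw DID document.
func pdsEndpoint(raw []byte) (string, error) {
	var doc didDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", err
	}
	for _, svc := range doc.Service {
		// The id may be a bare fragment or prefixed with the full DID.
		if strings.HasSuffix(svc.ID, "#atproto_pds") && svc.Type == "AtprotoPersonalDataServer" {
			if svc.ServiceEndpoint == "" {
				return "", errors.New("PDS service entry has an empty serviceEndpoint")
			}
			return svc.ServiceEndpoint, nil
		}
	}
	return "", errors.New("DID document has no atproto PDS service entry")
}
```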

---

### PLC Directory Registration (Production)
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** The DID generator creates a did:plc but doesn't register it in prod mode.

**Solution:** Implement the PLC registration API call when `isDevEnv=false`.

**Code:** TODO in [did/generator.go:46](../internal/atproto/did/generator.go#L46)

---

## Recent Completions

### ✅ Fix .local TLD Bug (2025-10-11)
Changed the default `INSTANCE_DID` from `did:web:coves.local` → `did:web:coves.social`. This fixed the community creation failure caused by the disallowed `.local` TLD.

---

## Prioritization

- **P0:** Security vulns, data loss, prod blockers
- **P1:** Major UX/reliability issues
- **P2:** QOL improvements, minor bugs, docs
- **P3:** Refactoring, code quality