A community-based topic aggregation platform built on atproto
# Backlog PRD: Platform Improvements & Technical Debt

**Status:** Ongoing
**Owner:** Platform Team
**Last Updated:** 2025-10-11

## Overview

Miscellaneous platform improvements, bug fixes, and technical debt that don't fit into feature-specific PRDs.

---

## 🔴 P0: Critical Security

### did:web Domain Verification
**Added:** 2025-10-11 | **Effort:** 2-3 days | **Severity:** Medium

**Problem:** Self-hosters can set `INSTANCE_DID=did:web:nintendo.com` without owning the domain, enabling domain impersonation attacks (e.g., `mario.communities.nintendo.com` on malicious instance).

**Solution:** Implement did:web verification per [atProto spec](https://atproto.com/specs/did-web) - fetch `https://domain/.well-known/did.json` on startup and verify it matches claimed DID. Add `SKIP_DID_WEB_VERIFICATION=true` for dev mode.

**Current Status:**
- ✅ Default changed from `coves.local` → `coves.social` (fixes `.local` TLD bug)
- ✅ TODO comment in [cmd/server/main.go:126-131](../cmd/server/main.go#L126-L131)
- ⚠️ Verification not implemented

---

## 🟡 P1: Important

### Token Refresh Logic for Community Credentials
**Added:** 2025-10-11 | **Effort:** 1-2 days

**Problem:** Community PDS access tokens expire (~2hrs). Updates fail until manual intervention.

**Solution:** Auto-refresh tokens before PDS operations. Parse JWT exp claim, use refresh token when expired, update DB.

**Code:** TODO in [communities/service.go:123](../internal/core/communities/service.go#L123)

---

### OAuth Authentication for Community Actions
**Added:** 2025-10-11 | **Effort:** 2-3 days

**Problem:** Subscribe/unsubscribe and community creation need authenticated user DID. Currently using placeholder.

**Solution:** Extract authenticated DID from OAuth session context. Requires OAuth middleware integration.
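
One way the OAuth solution above could look is sketched below: a middleware stores the authenticated DID in the request context, and handlers read it back instead of using a placeholder. Every name here (`RequireAuth`, `SessionLookup`, `UserDIDFromContext`, `fakeLookup`, `subscribeHandler`) is hypothetical and does not come from the codebase. The session lookup is a stand-in for whatever the real OAuth session store provides.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ctxKey is an unexported type so other packages cannot collide with this key.
type ctxKey struct{}

// ErrNoAuthenticatedDID is returned when no DID was attached to the context.
var ErrNoAuthenticatedDID = errors.New("no authenticated DID in request context")

// SessionLookup is a hypothetical hook that resolves a request's OAuth
// session to the user's DID. The real session store may look different.
type SessionLookup func(r *http.Request) (string, error)

// WithUserDID returns a copy of ctx that carries the authenticated DID.
func WithUserDID(ctx context.Context, did string) context.Context {
	return context.WithValue(ctx, ctxKey{}, did)
}

// UserDIDFromContext returns the DID stored by RequireAuth, or an error.
func UserDIDFromContext(ctx context.Context) (string, error) {
	did, ok := ctx.Value(ctxKey{}).(string)
	if !ok || did == "" {
		return "", ErrNoAuthenticatedDID
	}
	return did, nil
}

// RequireAuth rejects requests without a session and otherwise passes the
// DID down to the wrapped handler through the request context.
func RequireAuth(lookup SessionLookup, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		did, err := lookup(r)
		if err != nil || did == "" {
			http.Error(w, "authentication required", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r.WithContext(WithUserDID(r.Context(), did)))
	})
}

// subscribeHandler stands in for a community handler that needs the user DID.
func subscribeHandler(w http.ResponseWriter, r *http.Request) {
	did, err := UserDIDFromContext(r.Context())
	if err != nil {
		http.Error(w, err.Error(), http.StatusUnauthorized)
		return
	}
	fmt.Fprintf(w, "subscribing as %s", did)
}

// fakeLookup returns a SessionLookup that always yields the given DID.
func fakeLookup(did string) SessionLookup {
	return func(*http.Request) (string, error) { return did, nil }
}

// run sends one request through the middleware and returns status and body.
func run(lookup SessionLookup) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/subscribe", nil)
	RequireAuth(lookup, http.HandlerFunc(subscribeHandler)).ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(run(fakeLookup("did:plc:alice"))) // authenticated request
	fmt.Println(run(fakeLookup("")))              // no session
}
```

Keeping the context key unexported means handlers can only obtain the DID through `UserDIDFromContext`, which makes a missing or empty DID an explicit error rather than a silent placeholder.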

**Code:** Multiple TODOs in [community/subscribe.go](../internal/api/handlers/community/subscribe.go#L46), [community/create.go](../internal/api/handlers/community/create.go#L38)

---

## 🟢 P2: Nice-to-Have

### Improve .local TLD Error Messages
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Generic error "TLD .local is not allowed" confuses developers.

**Solution:** Enhance `InvalidHandleError` to explain root cause and suggest fixing `INSTANCE_DID`.

---

### Self-Hosting Security Guide
**Added:** 2025-10-11 | **Effort:** 1 day

**Needed:** Document did:web setup, DNS config, secrets management, rate limiting, PostgreSQL hardening, monitoring.

---

### OAuth Session Cleanup Race Condition
**Added:** 2025-10-11 | **Effort:** 2 hours

**Problem:** Cleanup goroutine doesn't handle graceful shutdown, may orphan DB connections.

**Solution:** Pass cancellable context, handle SIGTERM, add cleanup timeout.

---

### Jetstream Consumer Race Condition
**Added:** 2025-10-11 | **Effort:** 1 hour

**Problem:** Multiple goroutines can call `close(done)` concurrently in consumer shutdown.

**Solution:** Use `sync.Once` for channel close or atomic flag for shutdown state.

**Code:** TODO in [jetstream/user_consumer.go:114](../internal/atproto/jetstream/user_consumer.go#L114)

---

## 🔵 P3: Technical Debt

### Consolidate Environment Variable Validation
**Added:** 2025-10-11 | **Effort:** 2-3 hours

Create `internal/config` package with structured config validation. Fail fast with clear errors.

---

### Add Connection Pooling for PDS HTTP Clients
**Added:** 2025-10-11 | **Effort:** 2 hours

Create shared `http.Client` with connection pooling instead of new client per request.
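
A minimal sketch of the shared client, assuming one process-wide instance is acceptable for all PDS calls. The names `newPDSClient`, `pdsClient`, and `poolSettings` are hypothetical, and the pool sizes and timeouts are illustrative values, not measured recommendations.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newPDSClient builds an http.Client whose Transport keeps idle connections
// open for reuse. All numeric limits below are illustrative assumptions.
func newPDSClient() *http.Client {
	transport := &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout:   5 * time.Second,
			KeepAlive: 30 * time.Second,
		}).DialContext,
		MaxIdleConns:        100, // idle connections kept across all hosts
		MaxIdleConnsPerHost: 10,  // idle connections kept per PDS host
		IdleConnTimeout:     90 * time.Second,
		TLSHandshakeTimeout: 5 * time.Second,
	}
	return &http.Client{
		Transport: transport,
		Timeout:   15 * time.Second, // upper bound for a whole request
	}
}

// pdsClient is created once and reused; http.Client is safe for concurrent use.
var pdsClient = newPDSClient()

// poolSettings reports the idle-connection limits configured on a client
// built by newPDSClient.
func poolSettings(c *http.Client) (total, perHost int) {
	t := c.Transport.(*http.Transport)
	return t.MaxIdleConns, t.MaxIdleConnsPerHost
}

func main() {
	total, perHost := poolSettings(pdsClient)
	fmt.Println("max idle conns:", total, "per host:", perHost)
}
```

Connection reuse happens in the `http.Transport`, so the benefit comes from sharing one client (and therefore one transport) rather than from any single setting. Raising `MaxIdleConnsPerHost` above Go's default of 2 is likely to matter most when many requests go to the same PDS host.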

---

### Architecture Decision Records (ADRs)
**Added:** 2025-10-11 | **Effort:** Ongoing

Document: did:plc choice, pgcrypto encryption, Jetstream vs firehose, write-forward pattern, single handle field.

---

### Replace log Package with Structured Logger
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** Using standard `log` package. Need structured logging (JSON) with levels.

**Solution:** Switch to `slog`, `zap`, or `zerolog`. Add request IDs, context fields.

**Code:** TODO in [community/errors.go:46](../internal/api/handlers/community/errors.go#L46)

---

### PDS URL Resolution from DID
**Added:** 2025-10-11 | **Effort:** 2-3 hours

**Problem:** User consumer doesn't resolve PDS URL from DID document when missing.

**Solution:** Query PLC directory for DID document, extract `serviceEndpoint`.

**Code:** TODO in [jetstream/user_consumer.go:203](../internal/atproto/jetstream/user_consumer.go#L203)

---

### PLC Directory Registration (Production)
**Added:** 2025-10-11 | **Effort:** 1 day

**Problem:** DID generator creates did:plc but doesn't register in prod mode.

**Solution:** Implement PLC registration API call when `isDevEnv=false`.

**Code:** TODO in [did/generator.go:46](../internal/atproto/did/generator.go#L46)

---

## Recent Completions

### ✅ Fix .local TLD Bug (2025-10-11)
Changed default `INSTANCE_DID` from `did:web:coves.local` → `did:web:coves.social`. Fixed community creation failure due to disallowed `.local` TLD.

---

## Prioritization

- **P0:** Security vulns, data loss, prod blockers
- **P1:** Major UX/reliability issues
- **P2:** QOL improvements, minor bugs, docs
- **P3:** Refactoring, code quality