Backlog PRD: Platform Improvements & Technical Debt
Status: Ongoing | Owner: Platform Team | Last Updated: 2025-10-17
Overview
Miscellaneous platform improvements, bug fixes, and technical debt that don't fit into feature-specific PRDs.
🔴 P0: Critical (Alpha Blockers)
OAuth DPoP Token Architecture - Voting Write-Forward
Added: 2025-11-02 | Completed: 2025-11-02 | Effort: 2 hours | Priority: ALPHA BLOCKER | Status: ✅ COMPLETE
Problem: Our backend is attempting to use DPoP-bound OAuth tokens to write votes to users' PDSs, causing "Malformed token" errors. This violates atProto architecture patterns.
Current (Incorrect) Flow:
Mobile Client (OAuth + DPoP) → Coves Backend → User's PDS ❌
↓
"Malformed token" error
Root Cause:
- Mobile app uses OAuth with DPoP (Demonstrating Proof of Possession)
- DPoP tokens are cryptographically bound to the client's private key via the cnf.jkt claim
- Each PDS request requires both:
  - Authorization: DPoP <token>
  - DPoP: <signed-proof-jwt> (signature proves the client holds the private key)
- Backend cannot create DPoP proofs (it doesn't have the client's private key)
- DPoP tokens are intentionally non-transferable (security feature to prevent token theft)
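To make the binding concrete, here is a minimal Go sketch of how a cnf.jkt value is derived: it is the RFC 7638 SHA-256 thumbprint of the client's public key. The sketch assumes a P-256 (ES256) key, and the function name jwkThumbprintP256 is hypothetical, not part of the Coves codebase.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// jwkThumbprintP256 computes the RFC 7638 SHA-256 thumbprint of a P-256
// public key from its raw 32-byte X and Y coordinates. A DPoP-bound access
// token carries this value in its cnf.jkt claim.
func jwkThumbprintP256(x, y []byte) (string, error) {
	if len(x) != 32 || len(y) != 32 {
		return "", fmt.Errorf("P-256 coordinates must be 32 bytes, got %d and %d", len(x), len(y))
	}
	enc := base64.RawURLEncoding
	// RFC 7638: required members only, lexicographic order, no whitespace.
	canonical := fmt.Sprintf(`{"crv":"P-256","kty":"EC","x":"%s","y":"%s"}`,
		enc.EncodeToString(x), enc.EncodeToString(y))
	sum := sha256.Sum256([]byte(canonical))
	return enc.EncodeToString(sum[:]), nil
}
```

Because the thumbprint depends only on the client's key, a server holding the token but not the private key cannot produce a proof that matches it.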
Evidence:
// Token decoded from mobile app session
{
  "sub": "did:plc:txrork7rurdueix27ulzi7ke",
  "cnf": {
    "jkt": "LSWROJhTkPn4yT18xUjiIz2Z7z7l_gozKfjjQTYgW9o" // ← DPoP binding
  },
  "client_id": "https://lingering-darkness-50a6.brettmay0212.workers.dev/client-metadata.json",
  "iss": "http://localhost:3001"
}
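A backend could detect this situation before attempting a write. The following is a sketch only: it decodes the JWT payload without verifying the signature and reports the cnf.jkt value if present. The names dpopThumbprint and unsignedToken are hypothetical; unsignedToken exists purely to build demo input.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"strings"
)

// tokenClaims holds only the claim needed to detect DPoP binding.
type tokenClaims struct {
	Cnf struct {
		JKT string `json:"jkt"`
	} `json:"cnf"`
}

// dpopThumbprint decodes the JWT payload WITHOUT verifying the signature and
// returns the cnf.jkt value, or "" when the token is not DPoP-bound.
func dpopThumbprint(token string) (string, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return "", errors.New("token is not a three-part JWT")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return "", err
	}
	var claims tokenClaims
	if err := json.Unmarshal(payload, &claims); err != nil {
		return "", err
	}
	return claims.Cnf.JKT, nil
}

// unsignedToken wraps a JSON payload in a fake JWT. For demos only.
func unsignedToken(payloadJSON string) string {
	enc := base64.RawURLEncoding
	return enc.EncodeToString([]byte(`{"alg":"none"}`)) + "." +
		enc.EncodeToString([]byte(payloadJSON)) + ".sig"
}
```

A non-empty result means the token cannot be replayed by the backend, so the request should be rejected early rather than forwarded.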
atProto Best Practice (from Bluesky social-app analysis):
- ✅ Clients write directly to their own PDS (no backend proxy)
- ✅ AppView only indexes from Jetstream (eventual consistency)
- ✅ PDS = User's personal data store (user controls writes)
- ✅ AppView = Read-only aggregator/indexer
- ❌ Backend should NOT proxy user write operations
Correct Architecture:
Mobile Client → User's PDS (direct write with DPoP proof) ✓
↓
Jetstream (firehose)
↓
Coves AppView (indexes votes from firehose)
Affected Endpoints:
- Vote Creation - create_vote.go:76
  - Currently: Backend writes to PDS using user's token
  - Should: Return error directing client to write directly
- Vote Service - service.go:126
  - Currently: createRecordOnPDSAs() attempts write-forward
  - Should: Remove write-forward, rely on Jetstream indexing only
Solution Options:
Option A: Client Direct Write (RECOMMENDED - Follows Bluesky)
// Mobile client writes directly (like Bluesky social-app)
const agent = new Agent(oauthSession)
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.interaction.vote',
  record: {
    $type: 'social.coves.interaction.vote',
    subject: { uri: postUri, cid: postCid },
    direction: 'up',
    createdAt: new Date().toISOString()
  }
})
Backend changes:
- Remove write-forward code from vote service
- Return error from XRPC endpoint: "Votes must be created directly at your PDS"
- Index votes from Jetstream consumer (already implemented)
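The backend side of Option A could look roughly like the sketch below. It assumes the standard XRPC error body shape ({"error", "message"}); the error name InvalidRequest, the 400 status, and the handler name handleCreateVote are assumptions rather than the actual Coves implementation.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// xrpcError mirrors the JSON error body used by XRPC endpoints.
type xrpcError struct {
	Error   string `json:"error"`
	Message string `json:"message"`
}

// directWriteError is the body returned by the retired write-forward endpoint.
// The error name "InvalidRequest" is an assumption.
func directWriteError() xrpcError {
	return xrpcError{
		Error:   "InvalidRequest",
		Message: "Votes must be created directly at your PDS",
	}
}

// handleCreateVote rejects the request and points the client at its own PDS.
func handleCreateVote(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusBadRequest)
	_ = json.NewEncoder(w).Encode(directWriteError())
}
```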
Option B: Backend App Passwords (NOT RECOMMENDED)
- User creates app-specific password
- Backend uses password auth (gets regular JWTs, not DPoP)
- Security downgrade, poor UX
Option C: Service Auth Token (Complex)
- Backend gets its own service credentials
- Requires PDS to trust our AppView as delegated writer
- Non-standard atProto pattern
Recommendation: Option A (Client Direct Write)
- Matches atProto architecture
- Follows Bluesky social-app pattern
- Best security (user controls their data)
- Simplest implementation
Implementation Tasks:
- Update Flutter OAuth package to expose agent.call() for custom lexicons
- Update mobile vote UI to write directly to PDS
- Remove write-forward code from backend vote service
- Update vote XRPC handler to return helpful error message
- Verify Jetstream consumer correctly indexes votes
- Update integration tests to match new flow
References:
- Bluesky social-app: Direct PDS writes via agent
- atProto OAuth spec: DPoP binding prevents token reuse
- atProto architecture: AppView = read-only indexer
OAuth DPoP Token Architecture - Community Subscriptions
Added: 2025-11-02 | Effort: 1-2 hours | Priority: ALPHA BLOCKER | Status: 📋 TODO (Waiting for frontend implementation)
Problem: Same DPoP token issue as voting - backend cannot use user's DPoP-bound OAuth tokens to write subscription records to user's PDS.
Affected Operations:
- SubscribeToCommunity() - service.go:564-624
- UnsubscribeFromCommunity() - service.go:626-660
Collection: social.coves.community.subscription
Solution:
Client writes directly using com.atproto.repo.createRecord:
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.community.subscription',
  record: {
    $type: 'social.coves.community.subscription',
    subject: communityDid,
    contentVisibility: 3,
    createdAt: new Date().toISOString()
  }
})
Backend Changes Needed:
- Remove write-forward from SubscribeToCommunity() and UnsubscribeFromCommunity()
- Update handlers to return errors directing to client-direct pattern
- Verify Jetstream consumer indexes subscriptions (already working)
Files to Modify:
- internal/core/communities/service.go
- internal/api/handlers/community/subscribe.go
OAuth DPoP Token Architecture - Community Blocking
Added: 2025-11-02 | Effort: 1-2 hours | Priority: ALPHA BLOCKER | Status: 📋 TODO (Waiting for frontend implementation)
Problem: Same DPoP token issue - backend cannot use user's DPoP-bound OAuth tokens to write block records to user's PDS.
Affected Operations:
- BlockCommunity() - service.go:709-781
- UnblockCommunity() - service.go:783-816
Collection: social.coves.community.block
Solution:
Client writes directly using com.atproto.repo.createRecord:
await agent.call('com.atproto.repo.createRecord', {
  repo: userDid,
  collection: 'social.coves.community.block',
  record: {
    $type: 'social.coves.community.block',
    subject: communityDid,
    createdAt: new Date().toISOString()
  }
})
Backend Changes Needed:
- Remove write-forward from BlockCommunity() and UnblockCommunity()
- Update handlers to return errors directing to client-direct pattern
- Verify Jetstream consumer indexes blocks (already working)
Files to Modify:
- internal/core/communities/service.go
- internal/api/handlers/community/block.go
🟡 P1: Important (Alpha Blockers)
at-identifier Handle Resolution in Endpoints
Added: 2025-10-18 | Effort: 2-3 hours | Priority: ALPHA BLOCKER
Problem:
Current implementation rejects handles in endpoints that declare "format": "at-identifier" in their lexicon schemas, violating atProto best practices and breaking legitimate client usage.
Impact:
- ❌ Post creation fails when client sends community handle (e.g., !gardening.communities.coves.social)
- ❌ Subscribe/unsubscribe endpoints reject handles despite lexicon declaring at-identifier
- ❌ Block endpoints use "format": "did" but should use at-identifier for consistency
- 🔴 P0 Issue: API contract violation - clients following the schema are rejected
Root Cause:
Handlers and services validate strings.HasPrefix(req.Community, "did:") instead of calling ResolveCommunityIdentifier().
Affected Endpoints:
- Post Creation - create.go:54, service.go:51
  - Lexicon declares at-identifier: post/create.json:16
- Subscribe - subscribe.go:52
  - Lexicon declares at-identifier: subscribe.json:16
- Unsubscribe - subscribe.go:120
  - Lexicon declares at-identifier: unsubscribe.json:16
- Block/Unblock - block.go:58, block.go:132
  - Lexicon declares "format": "did": block.json:15
  - Should be changed to at-identifier for consistency and best practice
atProto Best Practice (from docs):
- ✅ API endpoints should accept both DIDs and handles via at-identifier format
- ✅ Resolve handles to DIDs immediately at API boundary
- ✅ Use DIDs internally for all business logic and storage
- ✅ Handles are weak refs (changeable), DIDs are strong refs (permanent)
- ⚠️ Bidirectional verification required (already handled by identity.CachingResolver)
Solution:
Replace direct DID validation with handle resolution using existing ResolveCommunityIdentifier():
// BEFORE (wrong) ❌
if !strings.HasPrefix(req.Community, "did:") {
	return // handles are rejected with an error before any resolution is attempted
}

// AFTER (correct) ✅
communityDID, err := h.communityService.ResolveCommunityIdentifier(ctx, req.Community)
if err != nil {
	if communities.IsNotFound(err) {
		writeError(w, http.StatusNotFound, "CommunityNotFound", "Community not found")
		return
	}
	writeError(w, http.StatusBadRequest, "InvalidRequest", err.Error())
	return
}
// Now use communityDID (guaranteed to be a DID)
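For illustration, the first step of such a resolver is deciding which identifier form the client sent. The sketch below is not the real ResolveCommunityIdentifier(); classifyCommunityIdentifier and its kinds are hypothetical, and the accepted shapes are inferred from the examples in this document (DID, !name@instance scoped form, and handles with an optional leading "!" or "@").

```go
package main

import "strings"

type identifierKind string

const (
	kindDID     identifierKind = "did"
	kindScoped  identifierKind = "scoped"
	kindHandle  identifierKind = "handle"
	kindInvalid identifierKind = "invalid"
)

// classifyCommunityIdentifier reports which at-identifier form was supplied.
// It performs no network resolution; that would follow for handles.
func classifyCommunityIdentifier(id string) identifierKind {
	id = strings.TrimSpace(id)
	switch {
	case id == "":
		return kindInvalid
	case strings.HasPrefix(id, "did:"):
		return kindDID
	case strings.HasPrefix(id, "!") && strings.Contains(id, "@"):
		return kindScoped
	default:
		// Strip an optional leading "!" or "@" before checking the handle.
		handle := strings.TrimLeft(id, "!@")
		if strings.Contains(handle, ".") && !strings.ContainsAny(handle, "@ ") {
			return kindHandle
		}
		return kindInvalid
	}
}
```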
Implementation Plan:
- ✅ Phase 1 (Alpha Blocker): Fix post creation endpoint - COMPLETE (2025-10-18)
  - Post creation already uses ResolveCommunityIdentifier() at service.go:100
  - Supports handles, DIDs, and scoped formats
- 📋 Phase 2 (Beta): Fix subscription endpoints
  - Update subscribe/unsubscribe handlers
  - Add tests for handle resolution in subscriptions
- ✅ Phase 3 (Beta): Fix block endpoints - COMPLETE (2025-11-16)
  - Updated block/unblock handlers to use ResolveCommunityIdentifier()
  - Accepts handles (@gaming.community.coves.social), DIDs, and scoped format (!gaming@coves.social)
  - Added comprehensive tests: block_handle_resolution_test.go
  - All 7 test cases passing
Files Modified (Phase 3 - Block Endpoints):
- internal/api/handlers/community/block.go - Added ResolveCommunityIdentifier() calls
- tests/integration/block_handle_resolution_test.go - Comprehensive test coverage
Existing Infrastructure:
✅ ResolveCommunityIdentifier() already implemented at service.go:852
✅ identity.CachingResolver handles bidirectional verification and caching
✅ Supports both handle (!name.communities.instance.com) and DID formats
Current Status:
- ✅ Phase 1 (post creation) - Already implemented
- 📋 Phase 2 (subscriptions) - Deferred to Beta (lower priority)
- ✅ Phase 3 (block endpoints) - COMPLETE (2025-11-16)
✅ did:web Domain Verification & hostedByDID Auto-Population - COMPLETE
Added: 2025-10-11 | Updated: 2025-11-16 | Completed: 2025-11-16 | Status: ✅ DONE
Problem:
- Domain Impersonation: Self-hosters can set INSTANCE_DID=did:web:nintendo.com without owning the domain, enabling attacks where communities appear hosted by trusted domains
- hostedByDID Spoofing: Malicious instance operators can modify source code to claim communities are hosted by domains they don't own, enabling reputation hijacking and phishing
Attack Scenarios:
- Malicious instance sets instanceDID="did:web:coves.social" → communities show as hosted by official Coves
- Federation partners can't verify instance authenticity
- AppView pollution with fake hosting claims
Solution Implemented (Bluesky-Compatible):
- ✅ Domain Matching: Verify did:web: domain matches configured instanceDomain
- ✅ Bidirectional Verification: Fetch https://domain/.well-known/did.json and verify:
  - DID document exists and is valid
  - DID document ID matches claimed instanceDID
  - DID document claims handle domain in alsoKnownAs field (bidirectional binding)
  - Domain ownership proven via HTTPS hosting (matches Bluesky's trust model)
- ✅ Auto-populate hostedByDID: Removed from client API, derived from instance configuration in service layer
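The checks above can be sketched in Go as follows. This is an illustration under stated assumptions, not the code in community_consumer.go: the function names are hypothetical, only domain-only did:web identifiers are handled (no ports or paths), and the alsoKnownAs entry is assumed to take the form at://<domain>.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// didDocument holds the two DID document fields the verification reads.
type didDocument struct {
	ID          string   `json:"id"`
	AlsoKnownAs []string `json:"alsoKnownAs"`
}

// didWebDocumentURL maps a domain-only did:web identifier to its document URL.
func didWebDocumentURL(did string) (string, error) {
	if !strings.HasPrefix(did, "did:web:") {
		return "", fmt.Errorf("not a did:web identifier: %q", did)
	}
	domain := strings.TrimPrefix(did, "did:web:")
	if domain == "" || strings.ContainsAny(domain, ":/%") {
		return "", fmt.Errorf("unsupported did:web identifier: %q", did)
	}
	return "https://" + domain + "/.well-known/did.json", nil
}

// verifyInstanceDID applies domain matching and the bidirectional check to an
// already-fetched DID document.
func verifyInstanceDID(doc didDocument, instanceDID, instanceDomain string) error {
	if instanceDID != "did:web:"+instanceDomain {
		return errors.New("did:web domain does not match instance domain")
	}
	if doc.ID != instanceDID {
		return errors.New("DID document id does not match instance DID")
	}
	for _, aka := range doc.AlsoKnownAs {
		if aka == "at://"+instanceDomain {
			return nil
		}
	}
	return errors.New("DID document does not claim the instance domain in alsoKnownAs")
}
```

Fetching the document over HTTPS is left out here; that fetch is what ties the claim to control of the domain.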
Current Status:
- ✅ Default changed from coves.local → coves.social (fixes .local TLD bug)
- ✅ hostedByDID removed from client requests (2025-10-16)
- ✅ Service layer auto-populates hostedByDID from instanceDID (2025-10-16)
- ✅ Handler rejects client-provided hostedByDID (2025-10-16)
- ✅ Basic validation: Logs warning if did:web: domain ≠ instanceDomain (2025-10-16)
- ✅ MANDATORY bidirectional DID verification (2025-11-16)
- ✅ Cache TTL updated to 24h (matches Bluesky recommendations) (2025-11-16)
Implementation Details:
- Security Model: Matches Bluesky's approach - relies on DNS/HTTPS authority, not cryptographic proof
- Enforcement: MANDATORY hard-fail in production (rejects communities with verification failures)
- Dev Mode: Set SKIP_DID_WEB_VERIFICATION=true to bypass verification for local development
- Performance: Bounded LRU cache (1000 entries), rate limiting (10 req/s), 24h cache TTL
- Bidirectional Check: Prevents impersonation by requiring DID document to claim the handle
- Location: internal/atproto/jetstream/community_consumer.go
✅ Token Refresh Logic for Community Credentials - COMPLETE
Added: 2025-10-11 | Completed: 2025-10-17 | Effort: 1.5 days | Status: ✅ DONE
Problem: Community PDS access tokens expire (~2hrs). Updates fail until manual intervention.
Solution Implemented:
- ✅ Automatic token refresh before PDS operations (5-minute buffer before expiration)
- ✅ JWT expiration parsing without signature verification (parseJWTExpiration, needsRefresh)
- ✅ Token refresh using Indigo SDK (atproto.ServerRefreshSession)
- ✅ Password fallback when refresh tokens expire (~2 months) via atproto.ServerCreateSession
- ✅ Atomic credential updates (UpdateCredentials repository method)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring ([TOKEN-REFRESH] events)
- ✅ Integration tests for token expiration detection and credential updates
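As a rough sketch of two pieces of this design, the code below shows the 5-minute buffer check and a per-community lock lookup. The names tokenNeedsRefresh and lockFor are hypothetical stand-ins, not the actual needsRefresh or locking code, and timestamps are passed as Unix seconds to keep the example small.

```go
package main

import (
	"sync"
	"time"
)

// refreshBuffer is how long before expiry a token is treated as stale.
const refreshBuffer = 5 * time.Minute

// tokenNeedsRefresh reports whether a token expiring at expUnix should be
// refreshed at nowUnix (both in Unix seconds).
func tokenNeedsRefresh(expUnix, nowUnix int64) bool {
	exp := time.Unix(expUnix, 0)
	now := time.Unix(nowUnix, 0)
	return !now.Before(exp.Add(-refreshBuffer))
}

// communityLocks maps a community DID to its *sync.Mutex.
var communityLocks sync.Map

// lockFor returns the mutex guarding refreshes for one community, creating
// it on first use so concurrent callers share the same lock.
func lockFor(communityDID string) *sync.Mutex {
	lock, _ := communityLocks.LoadOrStore(communityDID, &sync.Mutex{})
	return lock.(*sync.Mutex)
}
```

Holding lockFor(did) around the check-then-refresh sequence keeps two concurrent updates from both spending the same refresh token.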
Files Created:
- internal/core/communities/token_utils.go - JWT parsing utilities
- internal/core/communities/token_refresh.go - Refresh and re-auth logic
- tests/integration/token_refresh_test.go - Integration tests
Files Modified:
- internal/core/communities/service.go - Added ensureFreshToken + concurrency control
- internal/core/communities/interfaces.go - Added UpdateCredentials interface
- internal/db/postgres/community_repo.go - Implemented UpdateCredentials
Documentation: See IMPLEMENTATION_TOKEN_REFRESH.md for full details
Impact: ✅ Communities can now be updated 24+ hours after creation without manual intervention
✅ Subscription Visibility Level (Feed Slider 1-5 Scale) - COMPLETE
Added: 2025-10-15 | Completed: 2025-10-16 | Effort: 1 day | Status: ✅ DONE
Problem: Users couldn't control how much content they see from each community. Lexicon had contentVisibility (1-5 scale) but code didn't use it.
Solution Implemented:
- ✅ Updated subscribe handler to accept contentVisibility parameter (1-5, default 3)
- ✅ Store in subscription record on PDS (social.coves.community.subscription)
- ✅ Migration 008 adds content_visibility column to database with CHECK constraint
- ✅ Clamping at all layers (handler, service, consumer) for defense in depth
- ✅ Atomic subscriber count updates (SubscribeWithCount/UnsubscribeWithCount)
- ✅ Idempotent operations (safe for Jetstream event replays)
- ✅ Fixed critical collection name bug (was using wrong namespace)
- ✅ Production Jetstream consumer now running
- ✅ 13 comprehensive integration tests - all passing
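The clamping rule is small enough to show in full. This is a sketch, assuming an unset value is represented as a nil pointer; clampContentVisibility is a hypothetical name and may differ from the helper used in the handler, service, and consumer.

```go
package main

const (
	minVisibility     = 1
	maxVisibility     = 5
	defaultVisibility = 3
)

// clampContentVisibility returns the default when no value was supplied and
// otherwise forces the value into the 1-5 range.
func clampContentVisibility(v *int) int {
	if v == nil {
		return defaultVisibility
	}
	switch {
	case *v < minVisibility:
		return minVisibility
	case *v > maxVisibility:
		return maxVisibility
	default:
		return *v
	}
}
```

Applying the same function at every layer means a record replayed from Jetstream with an out-of-range value still satisfies the database CHECK constraint.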
Files Modified:
- Lexicon: subscription.json ✅ Updated to atProto conventions
- Handler: community/subscribe.go ✅ Accepts contentVisibility
- Service: communities/service.go ✅ Clamps and passes to PDS
- Consumer: community_consumer.go ✅ Extracts and indexes
- Repository: community_repo_subscriptions.go ✅ All queries updated
- Migration: 008_add_content_visibility_to_subscriptions.sql ✅ Schema changes
- Tests: subscription_indexing_test.go ✅ Comprehensive coverage
Documentation: See IMPLEMENTATION_SUBSCRIPTION_INDEXING.md for full details
Impact: ✅ Users can now adjust feed volume per community (key feature from DOMAIN_KNOWLEDGE.md enabled)
Community Blocking
Added: 2025-10-15 | Effort: 1 day | Priority: ALPHA BLOCKER
Problem: Users have no way to block unwanted communities from their feeds.
Solution:
- Lexicon: Extend social.coves.actor.block to support community DIDs (currently user-only)
- Service: Implement BlockCommunity(userDID, communityDID) and UnblockCommunity()
- Handlers: Add XRPC endpoints social.coves.community.block and unblock
- Repository: Add methods to track blocked communities
- Feed: Filter blocked communities from feed queries (beta work)
Code:
- Lexicon: actor/block.json - Currently only supports user DIDs
- Service: New methods needed
- Handlers: New files needed
Impact: Users can't avoid unwanted content without blocking
✅ Post comment_count Reconciliation - COMPLETE
Added: 2025-11-04 | Completed: 2025-11-16 | Effort: 2 hours | Status: ✅ DONE
Problem:
When comments arrive before their parent post is indexed (common with cross-repo Jetstream ordering), the post's comment_count was never reconciled, causing posts to show permanently stale "0 comments" counters.
Solution Implemented:
- ✅ Post consumer reconciliation logic WAS already implemented at post_consumer.go:210-226
- ✅ Reconciliation query counts pre-existing comments when indexing new posts
- ✅ Comprehensive test suite added: post_consumer_test.go
- Single comment before post
- Multiple comments before post
- Mixed before/after ordering
- Idempotent indexing preserves counts
- ✅ Updated outdated FIXME comment at comment_consumer.go:362
- ✅ All 4 test cases passing
Implementation:
// Post consumer reconciliation (lines 210-226)
reconcileQuery := `
	UPDATE posts
	SET comment_count = (
		SELECT COUNT(*)
		FROM comments c
		WHERE c.parent_uri = $1 AND c.deleted_at IS NULL
	)
	WHERE id = $2
`
_, reconcileErr := tx.ExecContext(ctx, reconcileQuery, post.URI, postID)
Files Modified:
- internal/atproto/jetstream/comment_consumer.go - Updated documentation
- tests/integration/post_consumer_test.go - Added comprehensive test coverage
Impact: ✅ Post comment counters are now accurate regardless of Jetstream event ordering
🔴 P1.5: Federation Blockers (Beta Launch)
Cross-PDS Write-Forward Support for Community Service
Added: 2025-10-17 | Updated: 2025-11-02 | Effort: 3-4 hours | Priority: FEDERATION BLOCKER (Beta)
Problem: Community service write-forward methods assume all users are on the same PDS as the Coves instance. This breaks federation when users from external PDSs try to subscribe/block communities.
Current Behavior:
- User on pds.bsky.social subscribes to community on coves.social
- Coves calls s.pdsURL (instance default: http://localhost:3001)
- Write goes to WRONG PDS → fails with {"error":"InvalidToken","message":"Malformed token"}
Impact:
- ✅ Alpha: Works fine (single PDS deployment, no federation)
- ❌ Beta: Breaks federation (users on different PDSs can't subscribe/block)
Root Cause:
- service.go:1033: createRecordOnPDSAs hardcodes s.pdsURL
- service.go:1050: putRecordOnPDSAs hardcodes s.pdsURL
- service.go:1063: deleteRecordOnPDSAs hardcodes s.pdsURL
Affected Operations:
- SubscribeToCommunity (service.go:608)
- UnsubscribeFromCommunity (calls deleteRecordOnPDSAs)
- BlockCommunity (service.go:739)
- UnblockCommunity (calls deleteRecordOnPDSAs)
Solution:
- Add identityResolver identity.Resolver to communityService struct
- Before write-forward, resolve user's DID → extract PDS URL
- Call user's actual PDS instead of hardcoded s.pdsURL
Implementation Pattern (from Vote Service):
// Add helper method to resolve user's PDS
func (s *communityService) resolveUserPDS(ctx context.Context, userDID string) (string, error) {
	identity, err := s.identityResolver.Resolve(ctx, userDID)
	if err != nil {
		return "", fmt.Errorf("failed to resolve user PDS: %w", err)
	}
	if identity.PDSURL == "" {
		log.Printf("[COMMUNITY-PDS] WARNING: No PDS URL found for %s, using fallback: %s", userDID, s.pdsURL)
		return s.pdsURL, nil
	}
	return identity.PDSURL, nil
}

// Update write-forward methods:
func (s *communityService) createRecordOnPDSAs(ctx context.Context, repoDID, collection, rkey string, record map[string]interface{}, accessToken string) (string, string, error) {
	// Resolve user's actual PDS (critical for federation)
	pdsURL, err := s.resolveUserPDS(ctx, repoDID)
	if err != nil {
		return "", "", fmt.Errorf("failed to resolve user PDS: %w", err)
	}
	endpoint := fmt.Sprintf("%s/xrpc/com.atproto.repo.createRecord", strings.TrimSuffix(pdsURL, "/"))
	// ... rest of method
}
Files to Modify:
- internal/core/communities/service.go - Add resolver field + resolveUserPDS helper
- internal/core/communities/service.go - Update createRecordOnPDSAs, putRecordOnPDSAs, deleteRecordOnPDSAs
- cmd/server/main.go - Pass identity resolver to community service constructor
- Tests - Add cross-PDS subscription/block scenarios
Testing:
- User on external PDS subscribes to community → writes to their PDS
- User on external PDS blocks community → writes to their PDS
- Community profile updates still work (writes to community's own PDS)
Related:
- ✅ Vote Service: Fixed in Alpha (2025-11-02) - users can vote from any PDS
- 🔴 Community Service: Deferred to Beta (no federation in Alpha)
🟢 P2: Nice-to-Have
Remove Categories from Community Lexicon
Added: 2025-10-15 | Effort: 30 minutes | Priority: Cleanup
Problem: Categories field exists in create/update lexicon but not in profile record. Adds complexity without clear value.
Solution:
- Remove categories from create.json
- Remove categories from update.json
- Remove from community.go:91
- Remove from service layer (service.go:109-110)
Impact: Simplifies lexicon, removes unused feature
Improve .local TLD Error Messages
Added: 2025-10-11 | Effort: 1 hour
Problem: Generic error "TLD .local is not allowed" confuses developers.
Solution: Enhance InvalidHandleError to explain root cause and suggest fixing INSTANCE_DID.
Self-Hosting Security Guide
Added: 2025-10-11 | Effort: 1 day
Needed: Document did:web setup, DNS config, secrets management, rate limiting, PostgreSQL hardening, monitoring.
OAuth Session Cleanup Race Condition
Added: 2025-10-11 | Effort: 2 hours
Problem: Cleanup goroutine doesn't handle graceful shutdown, may orphan DB connections.
Solution: Pass cancellable context, handle SIGTERM, add cleanup timeout.
Jetstream Consumer Race Condition
Added: 2025-10-11 | Effort: 1 hour
Problem: Multiple goroutines can call close(done) concurrently in consumer shutdown.
Solution: Use sync.Once for channel close or atomic flag for shutdown state.
Code: TODO in jetstream/user_consumer.go:114
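A minimal sketch of the sync.Once approach is shown below. The shutdownSignal type is hypothetical and stands in for whatever holds the done channel in the consumer.

```go
package main

import "sync"

// shutdownSignal wraps a done channel that is safe to close from any number
// of goroutines.
type shutdownSignal struct {
	once sync.Once
	done chan struct{}
}

func newShutdownSignal() *shutdownSignal {
	return &shutdownSignal{done: make(chan struct{})}
}

// Close closes the done channel exactly once; later calls are no-ops
// instead of panicking with "close of closed channel".
func (s *shutdownSignal) Close() {
	s.once.Do(func() { close(s.done) })
}

// Done returns a channel that is closed once shutdown has been requested.
func (s *shutdownSignal) Done() <-chan struct{} {
	return s.done
}
```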
Unfurl Cache Cleanup Background Job
Added: 2025-11-07 | Effort: 2-3 hours | Priority: Performance/Maintenance
Problem: The unfurl_cache table will grow indefinitely as expired entries are not deleted. While the cache uses lazy expiration (checking expires_at on read), old records remain in the database consuming disk space.
Impact:
- 📊 ~1KB per cached URL
- 📈 At 10K cached URLs = ~10MB (negligible for alpha)
- ⚠️ At 1M cached URLs = ~1GB (potential issue at scale)
- 🐌 Table bloat can slow down queries over time
Current Mitigation:
- ✅ Lazy expiration: Cache hits check expires_at and refetch if expired
- ✅ Indexed on expires_at for efficient expiration queries
- ✅ Not critical for alpha (growth is gradual)
Solution (Beta/Production): Implement background cleanup job to delete expired entries:
// Periodic cleanup (run daily or weekly)
func (r *unfurlRepository) CleanupExpired(ctx context.Context) (int64, error) {
	query := `DELETE FROM unfurl_cache WHERE expires_at < NOW()`
	result, err := r.db.ExecContext(ctx, query)
	if err != nil {
		return 0, err
	}
	return result.RowsAffected()
}
Implementation Options:
- Cron job: Separate process runs cleanup on schedule
- Background goroutine: Service-level background task with configurable interval
- PostgreSQL pg_cron extension: Database-level scheduled cleanup
Recommended Approach:
- Phase 1 (Beta): Background goroutine running weekly cleanup
- Phase 2 (Production): Migrate to pg_cron or external cron for reliability
Configuration:
UNFURL_CACHE_CLEANUP_ENABLED=true
UNFURL_CACHE_CLEANUP_INTERVAL=168h # 7 days
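One way these two settings might be parsed is sketched below. The cleanupConfig type, the parseCleanupConfig name, and the defaults (disabled, 168h) are assumptions for illustration. Note that Go's time.ParseDuration has no day unit, so the interval has to be written in hours.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// cleanupConfig holds the parsed unfurl cache cleanup settings.
type cleanupConfig struct {
	Enabled  bool
	Interval time.Duration
}

// parseCleanupConfig parses the raw values of UNFURL_CACHE_CLEANUP_ENABLED
// and UNFURL_CACHE_CLEANUP_INTERVAL. Empty strings keep the assumed defaults.
func parseCleanupConfig(enabled, interval string) (cleanupConfig, error) {
	cfg := cleanupConfig{Enabled: false, Interval: 168 * time.Hour}
	if enabled != "" {
		on, err := strconv.ParseBool(enabled)
		if err != nil {
			return cleanupConfig{}, fmt.Errorf("UNFURL_CACHE_CLEANUP_ENABLED: %w", err)
		}
		cfg.Enabled = on
	}
	if interval != "" {
		d, err := time.ParseDuration(interval)
		if err != nil {
			return cleanupConfig{}, fmt.Errorf("UNFURL_CACHE_CLEANUP_INTERVAL: %w", err)
		}
		if d <= 0 {
			return cleanupConfig{}, fmt.Errorf("UNFURL_CACHE_CLEANUP_INTERVAL must be positive, got %s", d)
		}
		cfg.Interval = d
	}
	return cfg, nil
}
```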
Monitoring:
- Log cleanup operations: [UNFURL-CACHE-CLEANUP] Deleted 1234 expired entries
- Track table size growth over time
- Alert if table exceeds threshold (e.g., 100MB)
Files to Create:
- internal/core/unfurl/cleanup.go - Background cleanup service
Related:
- Implemented in oEmbed unfurling feature (2025-11-07)
- Cache table: migration XXX_create_unfurl_cache.sql
🔵 P3: Technical Debt
Consolidate Environment Variable Validation
Added: 2025-10-11 | Effort: 2-3 hours
Create internal/config package with structured config validation. Fail fast with clear errors.
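A fail-fast loader might look like the sketch below. The config fields are placeholders: INSTANCE_DID appears elsewhere in this document, while PDS_URL is a hypothetical variable name. Taking the lookup function as a parameter keeps the loader testable without touching the process environment.

```go
package main

import (
	"fmt"
	"strings"
)

// config is a placeholder for the structured settings the server needs.
type config struct {
	InstanceDID string
	PDSURL      string
}

// loadConfig reads required variables through getenv and reports every
// missing one in a single error, so operators can fix them in one pass.
func loadConfig(getenv func(string) string) (config, error) {
	var missing []string
	require := func(key string) string {
		value := getenv(key)
		if value == "" {
			missing = append(missing, key)
		}
		return value
	}
	cfg := config{
		InstanceDID: require("INSTANCE_DID"),
		PDSURL:      require("PDS_URL"),
	}
	if len(missing) > 0 {
		return config{}, fmt.Errorf("missing required environment variables: %s",
			strings.Join(missing, ", "))
	}
	return cfg, nil
}
```

At startup this would be called as loadConfig(os.Getenv), exiting immediately on error.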
Add Connection Pooling for PDS HTTP Clients
Added: 2025-10-11 | Effort: 2 hours
Create shared http.Client with connection pooling instead of new client per request.
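As a sketch of what the shared client could look like: the pool sizes and timeouts below are illustrative guesses, not tuned values, and the names newPDSClient and sharedPDSClient are hypothetical.

```go
package main

import (
	"net/http"
	"time"
)

// pdsClient is created once and reused so TCP and TLS connections to each
// PDS are pooled instead of re-established per request.
var pdsClient = newPDSClient()

// newPDSClient clones the default transport (keeping proxy and dial
// defaults) and adjusts its idle connection pool.
func newPDSClient() *http.Client {
	transport := http.DefaultTransport.(*http.Transport).Clone()
	transport.MaxIdleConns = 100
	transport.MaxIdleConnsPerHost = 10
	transport.IdleConnTimeout = 90 * time.Second
	return &http.Client{
		Timeout:   10 * time.Second,
		Transport: transport,
	}
}

// sharedPDSClient returns the process-wide client for PDS requests.
func sharedPDSClient() *http.Client {
	return pdsClient
}
```

http.Client is safe for concurrent use, so a single instance can serve all write-forward and refresh calls.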
Architecture Decision Records (ADRs)
Added: 2025-10-11 | Effort: Ongoing
Document: did:plc choice, pgcrypto encryption, Jetstream vs firehose, write-forward pattern, single handle field.
Replace log Package with Structured Logger
Added: 2025-10-11 | Effort: 1 day
Problem: Using standard log package. Need structured logging (JSON) with levels.
Solution: Switch to slog, zap, or zerolog. Add request IDs, context fields.
Code: TODO in community/errors.go:46
PDS URL Resolution from DID
Added: 2025-10-11 | Effort: 2-3 hours
Problem: User consumer doesn't resolve PDS URL from DID document when missing.
Solution: Query PLC directory for DID document, extract serviceEndpoint.
Code: TODO in jetstream/user_consumer.go:203
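The extraction step could be sketched as follows, assuming the DID document has already been fetched (for did:plc, typically from the PLC directory). It relies on the atproto convention that the PDS appears as a service entry with id #atproto_pds and type AtprotoPersonalDataServer. The function name is hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// didService is one entry in a DID document's service array.
type didService struct {
	ID              string `json:"id"`
	Type            string `json:"type"`
	ServiceEndpoint string `json:"serviceEndpoint"`
}

// didDoc holds the parts of a DID document needed to locate the PDS.
type didDoc struct {
	ID      string       `json:"id"`
	Service []didService `json:"service"`
}

// pdsEndpointFromDIDDocument returns the serviceEndpoint of the atproto PDS
// entry in a raw DID document.
func pdsEndpointFromDIDDocument(raw []byte) (string, error) {
	var doc didDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", err
	}
	for _, svc := range doc.Service {
		// The id may be relative ("#atproto_pds") or prefixed with the DID.
		if strings.HasSuffix(svc.ID, "#atproto_pds") &&
			svc.Type == "AtprotoPersonalDataServer" &&
			svc.ServiceEndpoint != "" {
			return svc.ServiceEndpoint, nil
		}
	}
	return "", errors.New("DID document has no atproto PDS service entry")
}
```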
Recent Completions
✅ Token Refresh for Community Credentials (2025-10-17)
Completed: Automatic token refresh prevents communities from breaking after 2 hours
Implementation:
- ✅ JWT expiration parsing and refresh detection (5-minute buffer)
- ✅ Token refresh using Indigo SDK (atproto.ServerRefreshSession)
- ✅ Password fallback when refresh tokens expire (atproto.ServerCreateSession)
- ✅ Atomic credential updates in database (UpdateCredentials)
- ✅ Concurrency-safe with per-community mutex locking
- ✅ Structured logging for monitoring ([TOKEN-REFRESH] events)
- ✅ Integration tests for expiration detection and credential updates
Files Created:
- internal/core/communities/token_utils.go
- internal/core/communities/token_refresh.go
- tests/integration/token_refresh_test.go
Files Modified:
- internal/core/communities/service.go - Added ensureFreshToken method
- internal/core/communities/interfaces.go - Added UpdateCredentials interface
- internal/db/postgres/community_repo.go - Implemented UpdateCredentials
Documentation: IMPLEMENTATION_TOKEN_REFRESH.md
Impact: Communities now work indefinitely without manual token management
✅ OAuth Authentication for Community Actions (2025-10-16)
Completed: Full OAuth JWT authentication flow for protected endpoints
Implementation:
- ✅ JWT parser compatible with atProto PDS tokens (aud/iss handling)
- ✅ Auth middleware protecting create/update/subscribe/unsubscribe endpoints
- ✅ Handler-level DID extraction from JWT tokens via middleware.GetUserDID(r)
- ✅ Removed all X-User-DID header placeholders
- ✅ E2E tests validate complete OAuth flow with real PDS tokens
- ✅ Security: Issuer validation supports both HTTPS URLs and DIDs
Files Modified:
- internal/atproto/auth/jwt.go - JWT parsing with atProto compatibility
- internal/api/middleware/auth.go - Auth middleware
- internal/api/handlers/community/ - All handlers updated
- tests/integration/community_e2e_test.go - OAuth E2E tests
Related: Also implemented hostedByDID auto-population for security (see P1 item above)
✅ Fix .local TLD Bug (2025-10-11)
Changed default INSTANCE_DID from did:web:coves.local → did:web:coves.social. Fixed community creation failure due to disallowed .local TLD.
Prioritization
- P0: Security vulns, data loss, prod blockers
- P1: Major UX/reliability issues
- P2: QOL improvements, minor bugs, docs
- P3: Refactoring, code quality