Integrates hostedBy verification into the server with environment-based
configuration for development and production use.
Changes:
- Added SKIP_DID_WEB_VERIFICATION env var for dev mode bypass
- Updated consumer initialization to pass the instance DID and skip flag
- Added warning logs when verification is disabled
- Configured .env.dev with skip flag enabled for local development
Server logs will now show:
- "⚠️ WARNING: did:web verification DISABLED (dev mode)" when skipped
- "🚨 SECURITY: Rejecting community" when a domain mismatch is detected
Production Deployment:
- Set SKIP_DID_WEB_VERIFICATION=false or leave it unset
- Ensure .well-known/did.json is properly configured
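A minimal sketch of how the flag could be parsed so that an unset or malformed value leaves verification on. The function names are hypothetical and not taken from the Coves codebase; only the variable name and the warning text come from this commit.

```go
package main

import (
	"log"
	"os"
	"strconv"
)

// skipDIDWebVerification (hypothetical) decides whether did:web verification
// is bypassed. Unset or unparsable values return false, so verification stays
// enabled unless the flag is explicitly set to a true value such as "true".
func skipDIDWebVerification(getenv func(string) string) bool {
	raw := getenv("SKIP_DID_WEB_VERIFICATION")
	if raw == "" {
		return false
	}
	skip, err := strconv.ParseBool(raw)
	if err != nil {
		log.Printf("ignoring invalid SKIP_DID_WEB_VERIFICATION=%q; verification stays enabled", raw)
		return false
	}
	if skip {
		log.Println("⚠️ WARNING: did:web verification DISABLED (dev mode)")
	}
	return skip
}

// skipDIDWebVerificationFromEnv reads the flag from the process environment.
func skipDIDWebVerificationFromEnv() bool {
	return skipDIDWebVerification(os.Getenv)
}
```

Passing `os.Getenv` in the server and a map lookup in tests keeps the parsing testable without touching the real environment.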
Co-Authored-By: Claude <noreply@anthropic.com>
Implements hostedBy verification to prevent domain impersonation attacks
where malicious instances claim to host communities for domains they don't
own (e.g., gaming@nintendo.com on non-Nintendo servers).
Core Implementation:
- Added verifyHostedByClaim() to check that the hostedBy domain matches the community handle's domain
- Integrated golang.org/x/net/publicsuffix for proper eTLD+1 extraction
- Supports multi-part TLDs (.co.uk, .com.au, .org.uk, etc.)
- Added verifyDIDDocument() for .well-known/did.json verification
- Bounded LRU cache (max 1000 entries) prevents memory leaks
- Thread-safe operations (no deadlock risk)
- HTTP client connection pooling for performance
- Rate limiting (10 req/sec) mitigates DoS attacks
- 15-second timeout keeps a slow host from blocking the consumer
- Cache TTL cleanup removes expired entries
Security Features:
- Hard-fail on domain mismatch (blocks indexing)
- Soft-fail on .well-known errors (network resilience)
- Skip verification flag for development mode
- Optimized struct field alignment for performance
Breaking Changes: None in behavior; the constructor signature changed
- All tests were migrated to the new signature
Co-Authored-By: Claude <noreply@anthropic.com>
Implements automatic refresh of community PDS access tokens to prevent
401 errors once access tokens expire after 2 hours. Includes security
hardening from three review rounds.
## Core Features
- Proactive token refresh (5-minute buffer before expiration)
- Automatic fallback to password re-auth when refresh tokens expire
- Concurrent-safe per-community mutex protection
- Atomic credential updates with retry logic
- Comprehensive structured logging for observability
## Security Hardening (3 Review Rounds)
### Round 1: Initial PR Review Fixes
- Added DB update retry logic (3 attempts, exponential backoff)
- Improved error detection with typed xrpc.Error checking
- Added comprehensive unit tests (8 test cases for NeedsRefresh)
- Enhanced logging for JWT parsing failures
- Memory-bounded mutex cache with a warning threshold (eviction later removed in Round 2)
### Round 2: Critical Race Condition Fixes
- **CRITICAL:** Eliminated race condition in mutex eviction
  - Removed eviction entirely to prevent mutex map corruption
  - Added read-lock fast path for performance
  - Implemented double-check locking pattern
- **CRITICAL:** Fixed test-production code path mismatch
  - Removed the wrapper function; tests and production share a single exported NeedsRefresh()
  - Tests now validate actual production code
### Round 3: Code Quality & Linting
- Fixed struct field alignment (8-byte memory optimization)
- Removed unused functions (splitToken)
- Added proper error handling for deferred Close() calls
- All golangci-lint checks passing
## Implementation Details
**Token Refresh Flow:**
1. Check if access token expires within 5 minutes
2. Acquire per-community mutex (prevent concurrent refresh)
3. Re-fetch from DB (double-check pattern)
4. Attempt refresh using refresh token
5. Fallback to password re-auth if refresh token expired
6. Update DB atomically with retry logic (3 attempts)
7. Return updated community with fresh credentials
**Concurrency Safety:**
- Per-community mutexes (non-blocking for different communities)
- Double-check pattern prevents duplicate refreshes
- Atomic DB updates (access + refresh token together)
- Refresh tokens are single-use (atproto spec compliance), so concurrent refreshes must be serialized
**Files Changed:**
- internal/core/communities/service.go - Main orchestration
- internal/core/communities/token_refresh.go - Indigo SDK integration
- internal/core/communities/token_utils.go - JWT parsing utilities
- internal/core/communities/interfaces.go - Repository interface
- internal/db/postgres/community_repo.go - UpdateCredentials method
- tests/integration/token_refresh_test.go - Comprehensive tests
- docs/PRD_BACKLOG.md - Documented Alpha blocker resolution
- docs/PRD_COMMUNITIES.md - Updated with token refresh feature
## Testing
- 8 unit tests for token expiration detection (all passing)
- Integration tests for UpdateCredentials (all passing)
- E2E test framework ready for PDS integration
- All linters passing (golangci-lint)
- Build verification successful
## Observability
Structured logging with events:
- token_refresh_started, token_refreshed
- refresh_token_expired, password_fallback_success
- db_update_retry, token_parse_failed
- CRITICAL alerts for lockout conditions
## Risk Mitigation
Before: 🔴 HIGH RISK - Communities lockout after 2 hours
After: 🟢 LOW RISK - Automatic refresh with multiple safety layers
- Race conditions: ELIMINATED (no mutex eviction)
- DB failures: MITIGATED (3-retry with exponential backoff)
- Refresh token expiry: HANDLED (password fallback)
- Test coverage: COMPREHENSIVE (unit + integration)
- Memory growth: MONITORED (mutexes are never evicted; warning at 10k communities, acceptable at 1M)
## Production Ready
✅ All critical issues resolved
✅ All tests passing
✅ All linters passing
✅ Comprehensive error handling
✅ Security hardened through 3 review rounds
Resolves Alpha blocker: Communities can now be updated indefinitely
without manual token management.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix P1 issue: bubble up database errors instead of masking them as conflicts
* Only return ErrBlockAlreadyExists when getErr is ErrBlockNotFound (race condition)
* Genuine DB errors (outages, connection failures) now propagate to operators
- Remove unused V1 functions flagged by linter:
* createRecordOnPDS, deleteRecordOnPDS, callPDS (replaced by *As versions)
- Apply automatic code formatting via golangci-lint run --fix:
* Align struct field tags in CommunityBlock
* Fix comment alignment across test files
* Remove trailing whitespace
- All tests passing, linter clean
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>