🪻 distributed transcription service thistle.dunkirk.sh

feat: cleanup after murmur jobs

dunkirk.sh b406ff86 0c268ddd

verified
Changed files (+66 -5)
CRUSH.md (+37 -1)
···
 VALUES ('test-sub', <user_id>, 'test-customer', 'active');
 ```
+## Transcription Service Integration (Murmur)
+
+The application uses [Murmur](https://github.com/taciturnaxolotl/murmur) as the transcription backend.
+
+**Murmur API endpoints:**
+- `POST /transcribe` - Upload audio file and create transcription job
+- `GET /transcribe/:job_id` - Get job status and transcript (supports `?format=json|vtt`)
+- `GET /transcribe/:job_id/stream` - Stream real-time progress via Server-Sent Events
+- `GET /jobs` - List all jobs (newest first)
+- `DELETE /transcribe/:job_id` - Delete a job from Murmur's database
+
+**Job synchronization:**
+The `TranscriptionService` runs periodic syncs to reconcile state between our database and Murmur:
+- Reconnects to active jobs on server restart
+- Syncs status updates for processing/transcribing jobs
+- Handles completed jobs (fetches VTT, cleans transcript, saves to storage)
+- **Cleans up finished jobs** - After successful completion or failure, jobs are deleted from Murmur
+- **Cleans up orphaned jobs** - Jobs found in Murmur but not in our database are automatically deleted
+
+**Job cleanup:**
+- **Completed jobs**: After fetching transcript and saving to storage, the job is deleted from Murmur
+- **Failed jobs**: After recording the error in our database, the job is deleted from Murmur
+- **Orphaned jobs**: Jobs in Murmur but not in our database are deleted on discovery
+- All deletions use `DELETE /transcribe/:job_id`
+- This prevents Murmur's database from accumulating stale jobs (Murmur doesn't have automatic cleanup)
+- Logs success/failure of deletion attempts for monitoring
+
+**Job lifecycle:**
+1. User uploads audio → creates transcription in our DB with `status='uploading'`
+2. Audio uploaded to Murmur → get `whisper_job_id`, update to `status='processing'`
+3. Murmur transcribes → stream progress updates, update to `status='transcribing'`
+4. Job completes → fetch VTT, clean with LLM, save transcript, update to `status='completed'`, **delete from Murmur**
+5. If job fails in Murmur → update to `status='failed'` with error message, **delete from Murmur**
+
+**Configuration:**
+Set `WHISPER_SERVICE_URL` in `.env` (default: `http://localhost:8000`)
+
 ## Future Additions
 As the codebase grows, document:
 - Database schema and migrations
 - API endpoint patterns
 - Authentication/authorization approach
-- Transcription service integration details
 - Deployment process
 - Environment variables needed
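Taken together, the cleanup rules above sort every job Murmur reports into one of three groups. Unknown jobs are orphans and are deleted right away. Known jobs in a terminal state are deleted only after local bookkeeping is done. Everything else keeps syncing. The following is a minimal TypeScript sketch of that classification. The `MurmurJob` shape, the `planCleanup` helper, and the assumption that every status other than `"completed"` or `"failed"` means "still running" are all hypothetical and do not come from this commit. Following the comment in `handleOrphanedWhisperJob`, `knownIds` should hold both our `id` values and our `whisper_job_id` values.

```typescript
// Hypothetical sketch of the cleanup rules from CRUSH.md; not the actual
// TranscriptionService code. MurmurJob and planCleanup are illustrative names.

interface MurmurJob {
  id: string;
  // "completed" and "failed" appear in the commit; any other value is
  // assumed here to mean the job is still in progress.
  status: string;
}

interface CleanupPlan {
  orphaned: string[]; // not in our DB: delete on discovery
  finished: string[]; // ours and terminal: save transcript / record error, then delete
  active: string[]; // ours and still running: keep syncing
}

// knownIds is assumed to contain both our transcription ids and whisper_job_ids,
// since the orphan check matches a Murmur job id against either column.
function planCleanup(murmurJobs: MurmurJob[], knownIds: Set<string>): CleanupPlan {
  const plan: CleanupPlan = { orphaned: [], finished: [], active: [] };
  for (const job of murmurJobs) {
    if (!knownIds.has(job.id)) {
      // Orphan check comes first: an unknown job is deleted whatever its status.
      plan.orphaned.push(job.id);
    } else if (job.status === "completed" || job.status === "failed") {
      plan.finished.push(job.id);
    } else {
      plan.active.push(job.id);
    }
  }
  return plan;
}

// Usage: "x" is unknown to our DB, so it is an orphan even though it completed.
const plan = planCleanup(
  [
    { id: "a", status: "completed" },
    { id: "b", status: "processing" },
    { id: "c", status: "failed" },
    { id: "x", status: "completed" },
  ],
  new Set(["a", "b", "c"]),
);
// orphaned: ["x"], finished: ["a", "c"], active: ["b"]
console.log(plan.orphaned, plan.finished, plan.active);
```

Keeping the classification pure lets the orphan rule be tested without a running Murmur instance. The deletions themselves would still go through `DELETE /transcribe/:job_id`. For finished jobs, they would happen only after the transcript or error has been persisted locally, which is the order the service code in this commit uses.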
src/lib/transcription.ts (+29 -4)
···
 		}
 	}
+	private async deleteWhisperJob(jobId: string) {
+		try {
+			const response = await fetch(
+				`${this.serviceUrl}/transcribe/${jobId}`,
+				{
+					method: "DELETE",
+				},
+			);
+			if (response.ok) {
+				console.log(`[Cleanup] Deleted job ${jobId} from Murmur`);
+			} else {
+				console.warn(
+					`[Cleanup] Failed to delete job ${jobId}: ${response.status}`,
+				);
+			}
+		} catch (error) {
+			console.error(`[Cleanup] Error deleting job ${jobId}:`, error);
+		}
+	}
+
 	private async handleOrphanedWhisperJob(jobId: string) {
 		// Check if this Murmur job_id exists in our DB (either as id or whisper_job_id)
 		const jobExists = this.db
···
 			.get(jobId, jobId);
 		if (!jobExists) {
-			// Not our job - Murmur will keep it until explicitly deleted
+			// Not our job - delete it from Murmur
 			console.warn(
-				`[Sync] Found orphaned job ${jobId} in Murmur (not in our DB)`,
+				`[Sync] Found orphaned job ${jobId} in Murmur (not in our DB) - deleting...`,
 			);
+			await this.deleteWhisperJob(jobId);
 		}
 	}
···
 						status: "completed",
 						progress: 100,
 					});
+
+					// Clean up job from Murmur after successful completion
+					await this.deleteWhisperJob(whisperJob.id);
 				} else if (details.status === "failed") {
 					const errorMessage = (
 						details.error_message ?? "Transcription failed"
···
 						progress: 0,
 						error_message: errorMessage,
 					});
+
+					// Clean up failed job from Murmur
+					await this.deleteWhisperJob(whisperJob.id);
 				}
-
-				// Job persists in Murmur until explicitly deleted - we just sync state
 			} catch {
 				console.warn(
 					`[Sync] Failed to retrieve details for job ${whisperJob.id}`,