src/lib/transcription.ts (-1)
src/lib/vtt-cleaner.test.ts (+53, -52)
···So with this course packet, what quiz is and exams, and if I can study through here, what you talk about?···
src/lib/vtt-cleaner.ts (+97, -105)
-// Attempt LLM-driven cleaning and paragraphing in one request, fallback to deterministic rules
-// Use paragraph-based ID: "Paragraph N-M" where N is paragraph number, M is segment within paragraph
-`[VTTCleaner] Completed for ${transcriptionId}: ${cleanedSegments.length} segments in ${paragraphBoundaries.length} paragraphs`,
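The removed comment describes labelling each cue as "Paragraph N-M". A minimal sketch of that numbering follows, assuming paragraph boundaries arrive as sorted, 0-based indices of the segment that opens each paragraph (the real shape of `paragraphBoundaries` is not visible in this diff). `paragraphCueIds` is a hypothetical name, not a function from the repository.

```typescript
// Hypothetical sketch: `paragraphStarts` holds sorted, 0-based indices of the
// segment that begins each paragraph. This is an assumed shape, not the
// actual type of `paragraphBoundaries` in vtt-cleaner.ts.
function paragraphCueIds(segmentCount: number, paragraphStarts: number[]): string[] {
  const ids: string[] = [];
  let paragraph = 0; // 0-based index into paragraphStarts
  let withinParagraph = 0; // segments seen so far in the current paragraph
  for (let i = 0; i < segmentCount; i++) {
    // Move to the next paragraph once we reach the segment that starts it.
    while (paragraph + 1 < paragraphStarts.length && i >= paragraphStarts[paragraph + 1]) {
      paragraph++;
      withinParagraph = 0;
    }
    withinParagraph++;
    // Both numbers are 1-based: "Paragraph 1-1", "Paragraph 1-2", "Paragraph 2-1", ...
    ids.push(`Paragraph ${paragraph + 1}-${withinParagraph}`);
  }
  return ids;
}

// Five segments, with a second paragraph starting at segment index 2.
console.log(paragraphCueIds(5, [0, 2]));
```

With an empty `paragraphStarts` array, every segment falls into paragraph 1.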
+console.warn("[VTTCleaner] LLM configuration incomplete (need LLM_API_KEY, LLM_API_BASE_URL, LLM_MODEL), returning uncleaned VTT");
+Use the format "Paragraph X-Y" where X is the paragraph number and Y is the segment number within that paragraph:
+I want you to preserve sentences across paragraph breaks moving whatever is the smallest amount out to its own segment block.
+Also go through and rewrite the words to extract the meaning and not necessarily the exact phrasing if it sounds unnatural when written. I want the text to remain lined up with the original though so don't rewrite entire paragraphs but you can remove ums, alrights, and similar. Also remove all contextual tags like [background noise]. Add punctuation if it's missing to make the text readable. If there is no more context to fit a segment then just skip it and move to the next one.
+Return ONLY the VTT content starting with "WEBVTT" and nothing else. No explanations or additional text.`;
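The added warning names three required settings and says the cleaner returns the uncleaned VTT when any of them is missing. Below is a hedged sketch of such a guard. It assumes the settings arrive as an environment-style record and that the LLM request is injected as a function. `cleanVttWithFallback` and `LlmCall` are hypothetical names. Falling back when the reply does not start with "WEBVTT" is also an assumption, drawn from the prompt's last instruction; this diff does not show it.

```typescript
// Hypothetical sketch. The env var names and the warning text come from the
// diff; `LlmCall` and `cleanVttWithFallback` are illustrative, not the repo's API.
type LlmCall = (
  config: { apiKey: string; baseUrl: string; model: string },
  prompt: string,
  vtt: string,
) => Promise<string>;

async function cleanVttWithFallback(
  rawVtt: string,
  env: Record<string, string | undefined>,
  prompt: string,
  callLlm: LlmCall,
): Promise<string> {
  const apiKey = env.LLM_API_KEY;
  const baseUrl = env.LLM_API_BASE_URL;
  const model = env.LLM_MODEL;
  // If any of the three settings is missing, skip cleaning and return the input unchanged.
  if (!apiKey || !baseUrl || !model) {
    console.warn(
      "[VTTCleaner] LLM configuration incomplete (need LLM_API_KEY, LLM_API_BASE_URL, LLM_MODEL), returning uncleaned VTT",
    );
    return rawVtt;
  }
  const output = (await callLlm({ apiKey, baseUrl, model }, prompt, rawVtt)).trim();
  // Assumption: a reply that does not begin with the WebVTT signature counts
  // as a failed response, so the original captions are kept.
  if (!output.startsWith("WEBVTT")) {
    return rawVtt;
  }
  return output;
}
```

Because the request is injected, the fallback paths can be exercised in tests with a stub instead of a live endpoint.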