{
    "id": "urn:uuid:062a1210-a952-48be-9d8d-f02c5c276682",
    "title": "Week 3",
    "link": "https://dakpro.github.io/project_feeds/low_power_speech_recognition/week3",
    "updated": "2025-07-30T10:59:00",
    "published": "2025-08-10T19:12:43.303312",
    "summary": "<h2>Week 3</h2>\n<p>(Note: this blog will be updated throughout the week)</p>\n<p><code>nvidia/parakeet-tdt_ctc-110m</code> cannot be run on the rPi: the program simply exits after a while at the point of importing <code>nemo.collections.asr</code>.</p>\n<p>As discovered later on, all NVIDIA models require an NVIDIA GPU to run. Thus we are left with <code>moonshine</code>.</p>\n<p>Also came across <code>vosk</code> and <code>faster-whisper</code>, which are interesting to try.</p>\n<h3>Results and Comparison</h3>\n<h4>Moonshine/tiny</h4>\n<p>And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.</p>\n<h4>Moonshine/base</h4>\n<p>And so my fellow Americans ask not what your country can do for you ask what you can do for your country</p>\n<table>\n<thead>\n<tr><th>Model</th><th>Transcription time (11 s audio)</th><th>Word Error Rate</th></tr>\n</thead>\n<tbody>\n<tr><td>whisper.cpp/base</td><td>21 s</td><td>10.32</td></tr>\n<tr><td>whisper.cpp/base-Q4_K</td><td>12.6 s</td><td>--</td></tr>\n<tr><td>Moonshine/base</td><td>2.76 s</td><td>9.99</td></tr>\n<tr><td>whisper.cpp/tiny</td><td>8.3 s</td><td>12.81</td></tr>\n<tr><td>Moonshine/tiny</td><td>1.48 s</td><td>12.65</td></tr>\n</tbody>\n</table>\n<h3>Connecting a microphone to the rPi</h3>\n<p>Just connect it via USB. Run <code>arecord -l</code> to see information about the connected audio devices, say card X and device Y.</p>\n<p>To make it the default audio input device (strongly recommended), add this to <code>~/.asoundrc</code>:</p>\n<pre><code>\npcm.!default{\n type hw\n card X\n}\n\nctl.!default{\n type hw\n card X\n}\n</code></pre>\n<p>You can test it with:</p>\n<pre><code>\n# record\narecord -D plughw:X,Y -f cd -t wav -d 5 test.wav\n# play\naplay test.wav\n</code></pre>\n<h3>Moonshine in streaming mode</h3>\n<p>Simple demo:</p>\n<pre><code>\ngit clone https://github.com/moonshine-ai/moonshine\nuv pip install numba\nuv pip install -r moonshine/demo/moonshine-onnx/requirements.txt\nsudo apt update\nsudo apt upgrade -y\nsudo apt install -y portaudio19-dev\n# run:\npython3 moonshine/demo/moonshine-onnx/live_captions.py\n</code></pre>\n<h3>Testing on realistically long audio</h3>\n<p>Datasets used for the <a href=\"https://huggingface.co/spaces/hf-audio/open_asr_leaderboard\">model leaderboard</a></p>\n<p>From those listed above, I chose SPGISpeech, Earnings-22, and AMI for evaluating the model, as it will mostly be used during meetings.</p>\n<p>The raw datasets can be included</p>",
    "content": "<h2>Week 3</h2>\n<p>(Note: this blog will be updated throughout the week)</p>\n<p><code>nvidia/parakeet-tdt_ctc-110m</code> cannot be run on the rPi: the program simply exits after a while at the point of importing <code>nemo.collections.asr</code>.</p>\n<p>As discovered later on, all NVIDIA models require an NVIDIA GPU to run. Thus we are left with <code>moonshine</code>.</p>\n<p>Also came across <code>vosk</code> and <code>faster-whisper</code>, which are interesting to try.</p>\n<h3>Results and Comparison</h3>\n<h4>Moonshine/tiny</h4>\n<p>And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.</p>\n<h4>Moonshine/base</h4>\n<p>And so my fellow Americans ask not what your country can do for you ask what you can do for your country</p>\n<table>\n<thead>\n<tr><th>Model</th><th>Transcription time (11 s audio)</th><th>Word Error Rate</th></tr>\n</thead>\n<tbody>\n<tr><td>whisper.cpp/base</td><td>21 s</td><td>10.32</td></tr>\n<tr><td>whisper.cpp/base-Q4_K</td><td>12.6 s</td><td>--</td></tr>\n<tr><td>Moonshine/base</td><td>2.76 s</td><td>9.99</td></tr>\n<tr><td>whisper.cpp/tiny</td><td>8.3 s</td><td>12.81</td></tr>\n<tr><td>Moonshine/tiny</td><td>1.48 s</td><td>12.65</td></tr>\n</tbody>\n</table>\n<h3>Connecting a microphone to the rPi</h3>\n<p>Just connect it via USB. Run <code>arecord -l</code> to see information about the connected audio devices, say card X and device Y.</p>\n<p>To make it the default audio input device (strongly recommended), add this to <code>~/.asoundrc</code>:</p>\n<pre><code>\npcm.!default{\n type hw\n card X\n}\n\nctl.!default{\n type hw\n card X\n}\n</code></pre>\n<p>You can test it with:</p>\n<pre><code>\n# record\narecord -D plughw:X,Y -f cd -t wav -d 5 test.wav\n# play\naplay test.wav\n</code></pre>\n<h3>Moonshine in streaming mode</h3>\n<p>Simple demo:</p>\n<pre><code>\ngit clone https://github.com/moonshine-ai/moonshine\nuv pip install numba\nuv pip install -r moonshine/demo/moonshine-onnx/requirements.txt\nsudo apt update\nsudo apt upgrade -y\nsudo apt install -y portaudio19-dev\n# run:\npython3 moonshine/demo/moonshine-onnx/live_captions.py\n</code></pre>\n<h3>Testing on realistically long audio</h3>\n<p>Datasets used for the <a href=\"https://huggingface.co/spaces/hf-audio/open_asr_leaderboard\">model leaderboard</a></p>\n<p>From those listed above, I chose SPGISpeech, Earnings-22, and AMI for evaluating the model, as it will mostly be used during meetings.</p>\n<p>The raw datasets can be included</p>",
    "content_type": "html",
    "categories": [],
    "source": "https://dakpro.github.io/project_feeds/low_power_speech_recognition/feed.xml"
}