{
    "id": "urn:uuid:062a1210-a952-48be-9d8d-f02c5c276682",
    "title": "Week 3",
    "link": "https://dakpro.github.io/project_feeds/low_power_speech_recognition/week3",
    "updated": "2025-07-30T10:59:00",
    "published": "2025-08-10T19:12:43.303312",
    "summary": "<h2>Week 3</h2>\n<p>(Note: this blog will be updated throughout the week)</p>\n<p><code>nvidia/parakeet-tdt_ctc-110m</code> cannot be run on the rPi: the program simply exits after a while at the point of importing <code>nemo.collections.asr</code>.</p>\n<p>As discovered later on, all NVIDIA models require an NVIDIA GPU to run. Thus we are left with <code>moonshine</code>.</p>\n<p>Also came across <code>vosk</code> and <code>faster-whisper</code>, which are interesting to try.</p>\n<h3>Results and Comparison</h3>\n<h4>Moonshine/tiny</h4>\n<p>And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.</p>\n<h4>Moonshine/base</h4>\n<p>And so my fellow Americans ask not what your country can do for you ask what you can do for your country</p>\n<table>\n<thead>\n<tr><th>Model</th><th>Transcription time (11 s audio)</th><th>Word Error Rate</th></tr>\n</thead>\n<tbody>\n<tr><td>whisper.cpp/base</td><td>21 s</td><td>10.32</td></tr>\n<tr><td>whisper.cpp/base-Q4_K</td><td>12.6 s</td><td>--</td></tr>\n<tr><td>Moonshine/base</td><td>2.76 s</td><td>9.99</td></tr>\n<tr><td>whisper.cpp/tiny</td><td>8.3 s</td><td>12.81</td></tr>\n<tr><td>Moonshine/tiny</td><td>1.48 s</td><td>12.65</td></tr>\n</tbody>\n</table>\n<h3>Connecting a microphone to the rPi</h3>\n<p>Just connect it via USB. Run <code>arecord -l</code> to see information about the connected audio devices, say card X and device Y.</p>\n<p>To make it the default audio input device (strongly recommended), add this to <code>~/.asoundrc</code>:</p>\n<pre><code>\npcm.!default{\n type hw\n card X\n}\n\nctl.!default{\n type hw\n card X\n}\n</code></pre>\n<p>You can test it with:</p>\n<pre><code>\n# record\narecord -D plughw:X,Y -f cd -t wav -d 5 test.wav\n# play\naplay test.wav\n</code></pre>\n<h3>Moonshine in streaming mode</h3>\n<p>Simple demo:</p>\n<pre><code>\ngit clone https://github.com/moonshine-ai/moonshine\nuv pip install numba\nuv pip install -r moonshine/demo/moonshine-onnx/requirements.txt\nsudo apt update\nsudo apt upgrade -y\nsudo apt install -y portaudio19-dev\n# run:\npython3 moonshine/demo/moonshine-onnx/live_captions.py\n</code></pre>\n<h3>Testing on realistically long audio</h3>\n<p>Datasets used for the <a href=\"https://huggingface.co/spaces/hf-audio/open_asr_leaderboard\">model leaderboard</a></p>\n<p>From those listed above, I chose SPGISpeech, Earnings-22, and AMI for evaluating the model, as it will mostly be used during meetings.</p>\n<p>The raw datasets can be included</p>",
    "content": "<h2>Week 3</h2>\n<p>(Note: this blog will be updated throughout the week)</p>\n<p><code>nvidia/parakeet-tdt_ctc-110m</code> cannot be run on the rPi: the program simply exits after a while at the point of importing <code>nemo.collections.asr</code>.</p>\n<p>As discovered later on, all NVIDIA models require an NVIDIA GPU to run. Thus we are left with <code>moonshine</code>.</p>\n<p>Also came across <code>vosk</code> and <code>faster-whisper</code>, which are interesting to try.</p>\n<h3>Results and Comparison</h3>\n<h4>Moonshine/tiny</h4>\n<p>And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.</p>\n<h4>Moonshine/base</h4>\n<p>And so my fellow Americans ask not what your country can do for you ask what you can do for your country</p>\n<table>\n<thead>\n<tr><th>Model</th><th>Transcription time (11 s audio)</th><th>Word Error Rate</th></tr>\n</thead>\n<tbody>\n<tr><td>whisper.cpp/base</td><td>21 s</td><td>10.32</td></tr>\n<tr><td>whisper.cpp/base-Q4_K</td><td>12.6 s</td><td>--</td></tr>\n<tr><td>Moonshine/base</td><td>2.76 s</td><td>9.99</td></tr>\n<tr><td>whisper.cpp/tiny</td><td>8.3 s</td><td>12.81</td></tr>\n<tr><td>Moonshine/tiny</td><td>1.48 s</td><td>12.65</td></tr>\n</tbody>\n</table>\n<h3>Connecting a microphone to the rPi</h3>\n<p>Just connect it via USB. Run <code>arecord -l</code> to see information about the connected audio devices, say card X and device Y.</p>\n<p>To make it the default audio input device (strongly recommended), add this to <code>~/.asoundrc</code>:</p>\n<pre><code>\npcm.!default{\n type hw\n card X\n}\n\nctl.!default{\n type hw\n card X\n}\n</code></pre>\n<p>You can test it with:</p>\n<pre><code>\n# record\narecord -D plughw:X,Y -f cd -t wav -d 5 test.wav\n# play\naplay test.wav\n</code></pre>\n<h3>Moonshine in streaming mode</h3>\n<p>Simple demo:</p>\n<pre><code>\ngit clone https://github.com/moonshine-ai/moonshine\nuv pip install numba\nuv pip install -r moonshine/demo/moonshine-onnx/requirements.txt\nsudo apt update\nsudo apt upgrade -y\nsudo apt install -y portaudio19-dev\n# run:\npython3 moonshine/demo/moonshine-onnx/live_captions.py\n</code></pre>\n<h3>Testing on realistically long audio</h3>\n<p>Datasets used for the <a href=\"https://huggingface.co/spaces/hf-audio/open_asr_leaderboard\">model leaderboard</a></p>\n<p>From those listed above, I chose SPGISpeech, Earnings-22, and AMI for evaluating the model, as it will mostly be used during meetings.</p>\n<p>The raw datasets can be included</p>",
    "content_type": "html",
    "categories": [],
    "source": "https://dakpro.github.io/project_feeds/low_power_speech_recognition/feed.xml"
}