Production server interacting with a specific WhatsApp service
* Audio is now recorded through a VAD (voice activity detection) library injected into the page. Recording ends automatically when the user stops speaking, instead of requiring a button press to stop.
* The audio is sent to the server, which forwards it to OpenAI Whisper and receives a transcription back. The server then handles the transcription as before and sends back the response.

----

> VAD library used: https://github.com/ricky0123/vad (which makes https://github.com/snakers4/silero-vad accessible in the browser)
>
> OpenAI reference: https://platform.openai.com/docs/api-reference/audio/createTranscription?lang=node

## cons:

* potentially worse voice detection on Chrome (fixable by not using this method on Chrome)
* big wasm files need caching via webworker (merge #12 first)
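
For illustration, the step between "speech ended" and "send the audio to the server" could look roughly like the sketch below. It is not taken from this change: it assumes the VAD hands back the finished utterance as a `Float32Array` of mono samples at 16 kHz, and the function name `encodeWav` is hypothetical. It packs those samples into a 16-bit PCM WAV file, a format the transcription endpoint accepts.

```typescript
// Sketch only: encode mono float PCM samples (range -1..1) as a 16-bit PCM WAV file.
// Assumption: the VAD callback yields a Float32Array sampled at 16 kHz.
function encodeWav(samples: Float32Array, sampleRate: number = 16000): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + sample data
  const view = new DataView(buffer);

  // Write an ASCII tag such as "RIFF" at the given byte offset.
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // remaining file size after this field
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample and scale it to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }
  return new Uint8Array(buffer);
}
```

In the browser, the returned bytes could be wrapped in a `Blob` with type `audio/wav` and posted to the server, which would then forward the file to the transcription endpoint linked above. The upload route and the server-side handler are left out here because the issue does not describe them.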
The issue was opened by Araxeus and has received 2 comments.