GitHub Trending | Source: Hacker News Front
Show HN: I built a sub-500ms latency voice agent from scratch
By nicktikhonov · March 3, 2026
I built a voice agent from scratch that averages ~400ms end-to-end latency (phone stop → first syllable). That's with full STT → LLM → TTS in the loop, clean barge-ins, and no precomputed responses.

What moved the needle:

- Voice is a turn-taking problem, not a transcription problem. VAD alone fails; you need semantic end-of-turn detection.
- The system reduces to one loop: speaking vs. listening. The two transitions (cancel instantly on barge-in, respond instantly on end-of-turn) define the experience.
- STT → LLM → TTS must stream. Sequential pipelines are dead on arrival for natural conversation.
- TTFT dominates everything...
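The points above describe a two-state loop (listening vs. speaking) driven by an end-of-turn signal, with a reply that streams and can be cancelled on barge-in. The sketch below is a minimal illustration of that shape using Python's `asyncio`, not the author's implementation. Everything in it is an assumption: `fake_llm_tokens` and `fake_tts` are hypothetical stand-ins for real streaming LLM and TTS services, and `looks_like_end_of_turn` with its pause thresholds is a toy heuristic standing in for a real semantic end-of-turn model.

```python
import asyncio
from enum import Enum, auto
from typing import AsyncIterator, List, Optional

# Hypothetical thresholds; a real system would tune these or use a trained model.
SHORT_PAUSE_MS = 300
LONG_PAUSE_MS = 1200
TRAILING_WORDS = {"and", "but", "so", "because", "um", "uh"}


def looks_like_end_of_turn(transcript: str, silence_ms: int) -> bool:
    """Toy stand-in for semantic end-of-turn detection: silence alone is not enough."""
    words = transcript.lower().rstrip(" .,?!").split()
    if not words:
        return False
    if silence_ms >= LONG_PAUSE_MS:  # very long pause: stop waiting
        return True
    if words[-1] in TRAILING_WORDS:  # "book a table and..." -> user is mid-thought
        return False
    return silence_ms >= SHORT_PAUSE_MS


class State(Enum):
    LISTENING = auto()
    SPEAKING = auto()


async def fake_llm_tokens(prompt: str) -> AsyncIterator[str]:
    """Hypothetical streaming LLM: yields one token at a time."""
    for token in f"You said: {prompt}. Anything else?".split():
        await asyncio.sleep(0.01)  # simulated per-token delay
        yield token + " "


async def fake_tts(text: str, played: List[str]) -> None:
    """Hypothetical streaming TTS: 'plays' one text chunk."""
    await asyncio.sleep(0.01)  # simulated synthesis delay
    played.append(text.strip())


class TurnLoop:
    """Two states, two transitions: respond on end-of-turn, cancel on barge-in."""

    def __init__(self) -> None:
        self.state = State.LISTENING
        self.played: List[str] = []
        self.reply: Optional[asyncio.Task] = None

    async def _respond(self, transcript: str) -> None:
        buffer = ""
        async for token in fake_llm_tokens(transcript):
            buffer += token
            # Hand each clause to TTS as soon as it is complete, so the first audio
            # does not wait for the LLM to finish the whole reply.
            if buffer.rstrip().endswith((".", ",", "?", "!")):
                await fake_tts(buffer, self.played)
                buffer = ""
        if buffer.strip():
            await fake_tts(buffer, self.played)
        self.state = State.LISTENING

    def on_end_of_turn(self, transcript: str) -> None:
        # LISTENING -> SPEAKING: start the streaming reply immediately.
        self.on_barge_in()  # drop any stale reply first
        self.state = State.SPEAKING
        self.reply = asyncio.create_task(self._respond(transcript))

    def on_barge_in(self) -> None:
        # SPEAKING -> LISTENING: cancel in-flight LLM and TTS work at once.
        if self.reply is not None and not self.reply.done():
            self.reply.cancel()
        self.state = State.LISTENING


async def demo() -> None:
    loop = TurnLoop()
    if looks_like_end_of_turn("book a table for two", silence_ms=400):
        loop.on_end_of_turn("book a table for two")
    await asyncio.sleep(0.1)  # let roughly the first clause play
    loop.on_barge_in()  # user starts talking over the agent
    await asyncio.sleep(0.05)
    print(loop.state, loop.played)


if __name__ == "__main__":
    asyncio.run(demo())
```

Cancellation is what keeps barge-in cheap in this sketch: `Task.cancel()` raises `CancelledError` inside `_respond` at its next `await`, so no further tokens are requested and no further TTS chunks are played. Flushing to TTS at clause boundaries is one simple way to let audio start while the LLM is still generating; the real system may chunk differently.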
---
**[devsupporter commentary]**
This article covers recent development trends from Hacker News Front. To learn more about the tools and techniques involved, see the original link.
