Source: Hacker News Front

Show HN: I built a sub-500ms latency voice agent from scratch

By nicktikhonov
March 3, 2026

I built a voice agent from scratch that averages ~400ms end-to-end latency (phone stop → first syllable). That's with full STT → LLM → TTS in the loop, clean barge-ins, and no precomputed responses.

What moved the needle:

- Voice is a turn-taking problem, not a transcription problem. VAD alone fails; you need semantic end-of-turn detection.
- The system reduces to one loop: speaking vs listening. The two transitions (cancel instantly on barge-in, respond instantly on end-of-turn) define the experience.
- STT → LLM → TTS must stream. Sequential pipelines are dead on arrival for natural conversation.
- TTFT dominates everything...
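The speaking-vs-listening loop described above can be sketched as a two-state machine. This is a minimal illustration, not the author's implementation: `TurnLoop`, `on_end_of_turn`, `on_barge_in`, and the `respond` callable are hypothetical names, and `respond` merely stands in for a streamed STT → LLM → TTS pipeline. The sketch assumes the response runs as a cancellable `asyncio` task.

```python
import asyncio
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    SPEAKING = auto()


class TurnLoop:
    """Hypothetical two-state turn-taking loop (all names are illustrative)."""

    def __init__(self, respond):
        # `respond` is an assumed async callable standing in for the
        # streamed STT -> LLM -> TTS pipeline; it is not a real library API.
        self.respond = respond
        self.state = State.LISTENING
        self.task = None

    async def on_end_of_turn(self, transcript):
        # Transition 1: the user finished their turn, so start responding
        # right away by launching the response as a background task.
        if self.state is State.LISTENING:
            self.state = State.SPEAKING
            self.task = asyncio.create_task(self._run(transcript))

    async def on_barge_in(self):
        # Transition 2: the user interrupted, so cancel the in-flight
        # response and go back to listening.
        if self.state is State.SPEAKING and self.task is not None:
            self.task.cancel()
            try:
                await self.task
            except asyncio.CancelledError:
                pass
            self.task = None
            self.state = State.LISTENING

    async def _run(self, transcript):
        try:
            await self.respond(transcript)
        finally:
            # Whether the response finished or was cancelled, listen again.
            self.state = State.LISTENING
```

In a real agent, `on_end_of_turn` would be driven by the end-of-turn detector and `on_barge_in` by speech detected while the agent is talking; both triggers are left out of this sketch.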

---

**[devsupporter commentary]**

This article is a recent development trend provided by Hacker News Front. To learn more about the related tools or technologies, refer to the original link.