Source: Hacker News Front

Show HN: I built a sub-500ms latency voice agent from scratch

By nicktikhonov

March 3, 2026

I built a voice agent from scratch that averages ~400ms end-to-end latency (phone stop → first syllable). That’s with full STT → LLM → TTS in the loop, clean barge-ins, and no precomputed responses.

What moved the needle:

- Voice is a turn-taking problem, not a transcription problem. VAD alone fails; you need semantic end-of-turn detection.
- The system reduces to one loop: speaking vs listening. The two transitions - cancel instantly on barge-in, respond instantly on end-of-turn - define the experience.
- STT → LLM → TTS must stream. Sequential pipelines are dead on arrival for natural conversation.
- TTFT dominates everything...
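The speaking-vs-listening loop and its two transitions can be sketched as a small state machine. This is a minimal sketch under stated assumptions, not the author's implementation: `TurnLoop`, `on_end_of_turn`, `on_barge_in`, and `on_response_done` are hypothetical names, and the real STT, LLM, and TTS calls are replaced by entries in a log list so the behavior can be inspected.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    SPEAKING = auto()


class TurnLoop:
    """Hypothetical two-state turn-taking loop (illustrative names only)."""

    def __init__(self) -> None:
        self.state = State.LISTENING
        # Stand-in for real side effects (starting or cancelling LLM/TTS streams).
        self.log: list[str] = []

    def on_end_of_turn(self, transcript: str) -> None:
        # Transition 1: the user finished their turn, so start responding at once.
        if self.state is State.LISTENING:
            self.state = State.SPEAKING
            self.log.append(f"respond:{transcript}")

    def on_barge_in(self) -> None:
        # Transition 2: the user interrupts, so cancel in-flight output at once.
        if self.state is State.SPEAKING:
            self.state = State.LISTENING
            self.log.append("cancel")

    def on_response_done(self) -> None:
        # The agent finished speaking without interruption; go back to listening.
        if self.state is State.SPEAKING:
            self.state = State.LISTENING
            self.log.append("done")


if __name__ == "__main__":
    loop = TurnLoop()
    loop.on_end_of_turn("what's the weather?")
    loop.on_barge_in()
    print(loop.state, loop.log)
```

In a real system the end-of-turn event would presumably come from the semantic end-of-turn detector rather than from VAD alone, and the cancel branch would tear down the streaming LLM and TTS requests; both are left out here.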
