Source: Show HN

Show HN: oMLX – Native Mac inference server that persists KV cache to SSD

By jundot

February 20, 2026


I built an open-source LLM inference server optimized for Apple Silicon. The main motivation was coding agents: tools like Claude Code send requests where the context prefix keeps shifting, which invalidates the KV cache. A few turns later the agent circles back, and your Mac has to re-prefill the entire context from scratch.

oMLX solves this with paged SSD caching. Every KV cache block is persisted to disk. When a previous prefix returns, it's restored instantly instead of being recomputed...
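To make the idea concrete, here is a minimal Python sketch of block-level prefix caching on disk. It is not oMLX's actual implementation: the names `block_keys`, `DiskBlockCache`, `fake_prefill`, and `prefill_with_cache`, the block size of 4 tokens, the chained SHA-256 keys, and the pickle-per-file layout are all assumptions chosen for illustration. The expensive model prefill is replaced by a trivial stand-in.

```python
import hashlib
import pickle
from pathlib import Path

BLOCK_SIZE = 4  # tokens per block (hypothetical; real servers use larger blocks)


def block_keys(tokens, block_size=BLOCK_SIZE):
    """Return one chained hash per full block.

    Each key depends on every token before it, so a key identifies the
    whole prefix up to and including that block. A trailing partial block
    is ignored in this sketch.
    """
    keys, parent = [], b""
    full = len(tokens) - len(tokens) % block_size
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        digest = hashlib.sha256(parent + repr(block).encode()).digest()
        keys.append(digest.hex())
        parent = digest
    return keys


class DiskBlockCache:
    """Toy on-disk store: one file per KV block, named by its prefix hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key, kv_block):
        (self.root / key).write_bytes(pickle.dumps(kv_block))

    def get(self, key):
        path = self.root / key
        return pickle.loads(path.read_bytes()) if path.exists() else None


def fake_prefill(block):
    """Stand-in for the model forward pass that would produce KV tensors."""
    return [t * 2 for t in block]


def prefill_with_cache(tokens, cache, block_size=BLOCK_SIZE):
    """Restore the longest cached prefix from disk; compute and persist the rest.

    Returns (kv_blocks, restored_count, computed_count).
    """
    kv, restored, computed = [], 0, 0
    still_matching = True
    for i, key in enumerate(block_keys(tokens, block_size)):
        block = tokens[i * block_size:(i + 1) * block_size]
        cached = cache.get(key) if still_matching else None
        if cached is not None:
            kv.append(cached)
            restored += 1
        else:
            # This sketch only restores a contiguous prefix, so lookups
            # stop after the first miss.
            still_matching = False
            fresh = fake_prefill(block)
            cache.put(key, fresh)
            kv.append(fresh)
            computed += 1
    return kv, restored, computed
```

In this sketch, a request that shares its first two blocks with an earlier one restores those two from disk and recomputes only the block that differs. Because the blocks live in files rather than in memory, a new `DiskBlockCache` pointed at the same directory still finds them, which mirrors the "agent circles back a few turns later" case described above.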

---

**[devsupporter commentary]**

This article is part of the latest development news from Show HN. To learn more about the tools or technologies mentioned, see the original post.
