GitHub Trending์ถœ์ฒ˜: GitHub Trending Daily All์กฐํšŒ์ˆ˜ 6

p-e-w/heretic

By GitHub Trending Daily All
2026๋…„ 2์›” 19์ผ
**p-e-w/heretic**

Fully automatic censorship removal for language models Heretic: Fully automatic censorship removal for language models Heretic is a tool that removes censorship (aka "safety alignment") from transformer-based language models without expensive post-training. It combines an advanced implementation of directional ablation, also known as "abliteration" (Arditi et al. 2024, Lai 2025 (1, 2)), with a TPE-based parameter optimizer powered by Optuna. This approach enables Heretic to work completely automatically. Heretic finds high-quality abliteration parameters by co-minimizing the number of refusals and the KL divergence from the original model...

---

**[devsupporter ํ•ด์„ค]**

์ด ๊ธฐ์‚ฌ๋Š” GitHub Trending Daily All์—์„œ ์ œ๊ณตํ•˜๋Š” ์ตœ์‹  ๊ฐœ๋ฐœ ๋™ํ–ฅ์ž…๋‹ˆ๋‹ค. ๊ด€๋ จ ๋„๊ตฌ๋‚˜ ๊ธฐ์ˆ ์— ๋Œ€ํ•ด ๋” ์•Œ์•„๋ณด์‹œ๋ ค๋ฉด ์›๋ณธ ๋งํฌ๋ฅผ ์ฐธ๊ณ ํ•˜์„ธ์š”.