GitHub Trending์ถ์ฒ: GitHub Trending Daily All์กฐํ์ 6
p-e-w/heretic
By GitHub Trending Daily All2026๋
2์ 19์ผ
**p-e-w/heretic**
Fully automatic censorship removal for language models Heretic: Fully automatic censorship removal for language models Heretic is a tool that removes censorship (aka "safety alignment") from transformer-based language models without expensive post-training. It combines an advanced implementation of directional ablation, also known as "abliteration" (Arditi et al. 2024, Lai 2025 (1, 2)), with a TPE-based parameter optimizer powered by Optuna. This approach enables Heretic to work completely automatically. Heretic finds high-quality abliteration parameters by co-minimizing the number of refusals and the KL divergence from the original model...
---
**[devsupporter ํด์ค]**
์ด ๊ธฐ์ฌ๋ GitHub Trending Daily All์์ ์ ๊ณตํ๋ ์ต์ ๊ฐ๋ฐ ๋ํฅ์ ๋๋ค. ๊ด๋ จ ๋๊ตฌ๋ ๊ธฐ์ ์ ๋ํด ๋ ์์๋ณด์๋ ค๋ฉด ์๋ณธ ๋งํฌ๋ฅผ ์ฐธ๊ณ ํ์ธ์.
Fully automatic censorship removal for language models Heretic: Fully automatic censorship removal for language models Heretic is a tool that removes censorship (aka "safety alignment") from transformer-based language models without expensive post-training. It combines an advanced implementation of directional ablation, also known as "abliteration" (Arditi et al. 2024, Lai 2025 (1, 2)), with a TPE-based parameter optimizer powered by Optuna. This approach enables Heretic to work completely automatically. Heretic finds high-quality abliteration parameters by co-minimizing the number of refusals and the KL divergence from the original model...
---
**[devsupporter ํด์ค]**
์ด ๊ธฐ์ฌ๋ GitHub Trending Daily All์์ ์ ๊ณตํ๋ ์ต์ ๊ฐ๋ฐ ๋ํฅ์ ๋๋ค. ๊ด๋ จ ๋๊ตฌ๋ ๊ธฐ์ ์ ๋ํด ๋ ์์๋ณด์๋ ค๋ฉด ์๋ณธ ๋งํฌ๋ฅผ ์ฐธ๊ณ ํ์ธ์.