GitHub Trending · Source: GitHub Trending Daily All

p-e-w/heretic

By GitHub Trending Daily All

February 19, 2026

**p-e-w/heretic**

Fully automatic censorship removal for language models

Heretic is a tool that removes censorship (aka "safety alignment") from transformer-based language models without expensive post-training. It combines an advanced implementation of directional ablation, also known as "abliteration" (Arditi et al. 2024, Lai 2025 (1, 2)), with a TPE-based parameter optimizer powered by Optuna. This approach enables Heretic to work completely automatically. Heretic finds high-quality abliteration parameters by co-minimizing the number of refusals and the KL divergence from the original model...
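For readers unfamiliar with the second metric, the sketch below shows what a KL divergence between two next-token distributions measures: how far one model's output distribution has drifted from another's. It is only an illustration of the metric. The function name `kl_divergence`, the three-token distributions, and the direction KL(original || modified) are assumptions made for this example; the excerpt does not say how Heretic computes it.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) in nats for two discrete distributions.

    p and q are probability lists over the same vocabulary, in the same order.
    eps guards against division by zero when q assigns zero probability.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing to the sum, so they are skipped.
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical next-token probabilities over a three-token vocabulary.
original = [0.70, 0.20, 0.10]   # assumed output of the unmodified model
modified = [0.60, 0.25, 0.15]   # assumed output after some parameter change

print(kl_divergence(original, original))  # 0.0: identical distributions
print(kl_divergence(original, modified))  # small positive value: mild drift
```

A value near zero means the two distributions are nearly identical on that prompt; larger values indicate a larger change in behavior.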

---

**[devsupporter commentary]**

This article is a current development trend item provided by GitHub Trending Daily All. To learn more about the related tools and technologies, see the original link.
