Here's a Tool That Uncensors Language Models

We've all been there—you're working with a language model, asking it a perfectly reasonable technical or creative question, and you get hit with the "I cannot answer that" response. Whether it's about security testing, controversial historical topics, or just pushing the boundaries of a creative story, model censorship can be a real blocker for developers trying to build and experiment.

Enter Heretic. It's a minimalist Python tool with a straightforward, almost cheeky, premise: remove the built-in censorship from language models. It's not about bypassing security for malicious purposes; it's about giving developers and researchers the raw, unfiltered output of a model to understand its true capabilities and limitations.

What It Does

In simple terms, Heretic acts as a middleware layer between you and a language model's API (like OpenAI's). It intercepts the prompts you send and the responses you receive, stripping out the system instructions that typically enforce content policies. The result is a model that answers your questions directly, without the pre-programmed moral or ethical guardrails.

Think of it as having a conversation with the model's underlying intelligence, not its corporate-mandated persona.

Why It's Cool

The clever part is in its simplicity. Heretic doesn't require fine-tuning, model surgery, or complex jailbreak prompts. It works by manipulating the conversation history that's sent to the API. When you use a chat model, your entire conversation—including hidden system messages—is sent with each new query. Heretic finds and neutralizes those hidden "you are a helpful and harmless assistant" directives.

This approach makes it:

Model-agnostic: It should work with any chat-based LLM API that uses a similar system-prompt structure.
Lightweight: It's just a Python script. No heavy dependencies or infrastructure needed.
Transparent: It exposes the often opaque layer of policy enforcement, which is valuable for AI safety research and understanding model behavior.

For developers, this is a powerful tool for testing. You can stress-test a model's knowledge on sensitive topics, see how it handles edge-case creative writing, or simply explore what the base model "really thinks" before its output gets sanitized for public consumption.

How to Try It

Getting started is straightforward. You'll need Python and an OpenAI API key (or another supported provider's key).

Clone the repository:

git clone https://github.com/p-e-w/heretic
cd heretic

Install the single dependency:
```
pip install openai
```

A minimalist Python tool to remove censorship from any language model

README

Here's a Tool That Uncensors Language Models

What It Does

Why It's Cool

How to Try It

Join our weekly newsletter

Love discovering amazing projects?