The open-source web scraping engine built for production AI agents

API, UI
546 stars

Reader: The Web Scraping Engine Built for AI Agents

If you've ever tried to feed web data to an AI agent, you know the pain. Raw HTML is messy, full of navigation junk, ads, and scripts. Cleaning it up for an LLM is a chore. What if you could get just the actual content—the article text, the product description, the core data—in a clean, structured format, automatically?

That's exactly what Reader does. It's a new open-source web scraping engine designed from the ground up for production AI agents. It doesn't just fetch HTML; it intelligently extracts the primary readable content and strips away everything else, delivering exactly what your agent needs to process.

What It Does

Reader is a specialized web scraping tool with one primary job: to turn a URL into clean, usable text content. You give it a URL, and it returns a simplified JSON object containing the page's title and its main content, all boiled down to plain text. It handles the parsing, cleaning, and noise removal so you don't have to.

Think of it as a focused, single-purpose API that sits between your agent and the chaotic web, ensuring the agent only gets the signal, not the noise.
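To make the output shape concrete, here is a minimal sketch of how an agent might load Reader's JSON response into a typed object. The field names `title` and `content` are assumptions based on the description above, not confirmed against the project's actual schema, and `ParsedPage` and `parse_reader_response` are hypothetical helper names.

```python
import json
from dataclasses import dataclass


@dataclass
class ParsedPage:
    """Hypothetical container for what Reader is described as returning."""
    title: str
    content: str  # plain-text main content, per the description above


def parse_reader_response(body: str) -> ParsedPage:
    """Decode a JSON response body into a ParsedPage.

    Assumes the keys are named "title" and "content"; missing keys
    fall back to empty strings rather than raising.
    """
    data = json.loads(body)
    return ParsedPage(
        title=data.get("title", ""),
        content=data.get("content", ""),
    )
```

Keeping the result in a small typed object like this makes it easy to pass only the `content` field into a prompt and keep the title for logging or citations.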

Why It's Cool

The magic of Reader is in its simplicity and its specific design choice. It's not trying to be a general-purpose scraper for every use case. It's built for one user: an AI agent.

  • Content-Dedicated Parsing: It uses a combination of heuristics and parsing strategies (like Mozilla's Readability) to identify the core article or content block on a page. This means your AI isn't wasting tokens analyzing "Related Articles" sidebars or cookie consent banners.
  • Clean Text Output: It returns plain text. This is perfect for stuffing into an LLM context window or for further processing. No HTML tags, minimal formatting cruft—just the words that matter.
  • Production-Ready Mindset: The project is built with deployment in mind. It's a self-contained service (with a Dockerfile provided) that you can run, scale, and integrate into your own agent pipelines. It's a reliable component, not just a script.
  • Developer Experience: It's straightforward. A single POST request to /parse with a url gives you back exactly what you need. This reduces cognitive overhead when you're building more complex systems.
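The single-request workflow described in the last bullet could look roughly like the sketch below, which uses only the Python standard library. The base URL `http://localhost:3000`, the JSON request body of the form `{"url": ...}`, and the helper names are all assumptions for illustration. Only the `POST /parse` endpoint and the `url` parameter come from the text above, so check the project's README for the real host, port, and payload format.

```python
import json
import urllib.request

# Assumption: where a locally running Reader instance listens.
READER_BASE_URL = "http://localhost:3000"


def build_parse_request(page_url: str,
                        base_url: str = READER_BASE_URL) -> urllib.request.Request:
    """Build a POST request to /parse carrying the target URL.

    Assumes the service accepts a JSON body with a "url" key.
    """
    payload = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/parse",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_clean_text(page_url: str,
                     base_url: str = READER_BASE_URL,
                     timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response as a dict."""
    request = build_parse_request(page_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a Reader instance running, `fetch_clean_text("https://example.com/article")` would return the parsed JSON. Splitting request construction from the network call keeps the first half testable without a live server.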

How to Try It

You can have Reader running locally in a couple of minutes.

First, clone the repository:

git clone https://github.com/vakra-dev/reader
cd reader

The easiest way to run it is using Docker Compose:

docker-comp

Last updated: Feb 5, 2026