Ferret v2: Scrape the Web Declaratively with Go
If you've ever tried to scrape a modern website using Go, you know it can be a pain. Dynamic content, JavaScript-rendered pages, and awkward APIs make simple tasks surprisingly complex. That's where Ferret v2 comes in.
Ferret is a declarative web data system built specifically for Go developers. You describe what data you want from a webpage, and Ferret handles the how — including waiting for JavaScript to load, navigating DOM elements, and extracting structured results.
What It Does
Ferret lets you write short, declarative queries (called "FQL" — Ferret Query Language) that extract data from web pages. Your query describes the data structure you want, and Ferret uses a headless Chromium browser under the hood to render the page and scrape it.
Think of it like SQL for the web, but with awareness of dynamic pages. You can scrape everything from simple static HTML to complex single-page apps that load content via XHR or WebSockets.
Why It's Cool
Declarative interface - You don't write JavaScript or Go boilerplate. You describe the data shape and the CSS selectors to pull it from:
LET doc = DOCUMENT('https://example.com')
FOR el IN ELEMENTS(doc, 'h1, p') RETURN INNER_TEXT(el)
Built for dynamic content - Many scraping tools fail on JS-heavy sites. Ferret drives a real browser engine, so a query can wait for network requests and asynchronous rendering to settle before it extracts anything, including content that arrives over XHR or WebSockets.
Go-native — It's not a wrapper around a Python library. Ferret is written in Go with a Go API, making it easy to integrate into existing Go pipelines, web scrapers, or data processing services.
Composable queries - You can chain queries, follow pagination, and even execute JavaScript inside the page context for advanced interactions.
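Minimal browser setup - Just
go get github.com/MontFerret/ferret
and the Chromium (CDP) driver comes with the library. You still need a Chrome or Chromium instance for it to connect to when rendering dynamic pages, but there is no separate driver to install.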
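To make the Go-integration point a little more concrete, here is a minimal sketch of consuming query results inside a Go pipeline. It assumes the query result reaches your code as a JSON array of objects with title and url fields; the Story type, the decodeStories helper, and the sample payload are all hypothetical and are not part of Ferret's API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Story mirrors the shape a query might return.
// The field names are hypothetical and chosen for illustration.
type Story struct {
	Title string `json:"title"`
	URL   string `json:"url"`
}

// decodeStories parses a JSON array of story objects.
// Assumption: the query output is JSON shaped like the sample in main.
func decodeStories(raw []byte) ([]Story, error) {
	var stories []Story
	if err := json.Unmarshal(raw, &stories); err != nil {
		return nil, fmt.Errorf("decode stories: %w", err)
	}
	return stories, nil
}

func main() {
	// Sample payload standing in for real query output.
	raw := []byte(`[{"title":"Example Domain","url":"https://example.com"}]`)

	stories, err := decodeStories(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range stories {
		fmt.Printf("%s (%s)\n", s.Title, s.URL)
	}
}
```

Decoding into a typed struct rather than a generic map means a change in the query's output shape shows up at one well-defined place in the pipeline.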
How to Try It
Install the Ferret CLI. With Go 1.17 or later, binaries are installed with go install rather than go get:
go install github.com/MontFerret/ferret/cmd/ferret@latest
Run a quick example:
ferret exec "LET doc = DOCUMENT('https://example.com') RETURN doc.title"
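The same command can also be driven from a Go program. The sketch below shells out with os/exec. It assumes a ferret binary is on your PATH and that its exec subcommand accepts an inline query argument, as the command above does; buildQuery and runQuery are hypothetical helper names, not Ferret functions.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// buildQuery returns an FQL query that loads url and returns the page title.
// URLs containing a single quote or backslash are rejected so they cannot
// break out of the quoted string literal in the query.
func buildQuery(url string) (string, error) {
	if strings.ContainsAny(url, `'\`) {
		return "", fmt.Errorf("url not safe to embed in a query: %q", url)
	}
	return fmt.Sprintf("LET doc = DOCUMENT('%s') RETURN doc.title", url), nil
}

// runQuery runs the query through the CLI and returns its trimmed stdout.
// Assumption: `ferret exec <query>` prints the result to standard output.
func runQuery(ctx context.Context, query string) (string, error) {
	out, err := exec.CommandContext(ctx, "ferret", "exec", query).Output()
	if err != nil {
		return "", fmt.Errorf("ferret exec: %w", err)
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	query, err := buildQuery("https://example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	// Bound the run so a hung page load cannot block the caller forever.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	title, err := runQuery(ctx, query)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(title)
}
```

Shelling out keeps the example free of library dependencies. For tighter integration you would embed Ferret as a Go library instead, using whatever API your installed version documents.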
Or write a .fql file and execute it:
LET doc = DOCUMENT('https://news.ycombinator.com')
LET stories = ELEMENTS(doc, '.storylink')
FOR story IN stories
    RETURN { title: INNER_