A CLI-native way to semantically grep everything: code, images, PDFs, and more
4,283 stars

mgrep: The CLI Tool That Greps Everything, Semantically

Ever found yourself grepping through a codebase for a specific concept, only to realize you need to search through PDFs, images, or documents too? Or maybe you've tried to find "that one function" but can only remember what it does, not its exact name. Traditional grep hits a wall when the search needs to be about meaning, not just raw text.

Enter mgrep from mixedbread-ai. It's a CLI-native tool that brings semantic search to your terminal, letting you grep not just code but virtually any file type, using the meaning behind your words.

What It Does

mgrep is a command-line tool that performs semantic search across your files. You give it a natural language query (like "function that validates user login"), and it finds relevant content in your code, text files, PDFs, images, and more. It works by generating embeddings (vector representations) of both your query and your file contents, then finding the closest matches. It's grep, but for ideas and concepts.
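The embed-then-rank idea can be sketched in a few lines of Python. This is only an illustration, not mgrep's implementation: `embed` here is a hypothetical stand-in that counts words, whereas a real embedding model maps text to dense vectors so that paraphrases land close together even when they share no words. The file names and contents are made up for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector.
    (Assumption for illustration; a real model captures meaning, not just words.)"""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Embed the query and every chunk, then return the closest matches first."""
    q = embed(query)
    scored = [(name, cosine(q, embed(text))) for name, text in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical file contents, keyed by file name.
chunks = {
    "auth.py": "function that validates user login credentials",
    "schema.sql": "create table users with id and email columns",
    "README.md": "install the tool and run your first search",
}
results = search("validates user login", chunks)
for name, score in results:
    print(f"{score:.3f}  {name}")
```

Swapping `embed` for a real model changes the quality of the matches, but the ranking step (score every chunk against the query, sort, take the top few) stays the same.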

Why It's Cool

The magic of mgrep isn't just that it searches semantically; it's that it does so across mixed modalities from your terminal. You can point it at a directory and it will intelligently process different file types using the appropriate models.

  • Truly Multi-Format: It uses dedicated models for different content. Code, text, PDF text, and images are all encoded into the same vector space, so you can search across all of them with one query.
  • CLI-Native: It feels like a classic Unix tool. Pipe it, redirect it, use it in your scripts. It slots right into a developer's existing workflow.
  • Offline-First (mostly): While it can use cloud embedding APIs for high performance, it also supports local, offline models, keeping your data private.
  • Smart Chunking: It breaks down large documents and images into meaningful chunks before creating embeddings, so your results are precise and relevant, not just a whole-file match.
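To make the chunking idea concrete, here is a minimal sketch of one common approach: splitting a document into overlapping windows of lines, so each chunk gets its own embedding and a match can point at a specific region of a file. The source does not describe mgrep's actual chunking strategy, so the function name, window size, and overlap below are all assumptions chosen for illustration.

```python
def chunk_lines(text: str, size: int = 4, overlap: int = 1) -> list[tuple[int, str]]:
    """Split text into overlapping windows of `size` lines.

    Returns (1-based start line, chunk text) pairs. Hypothetical helper,
    not part of mgrep.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    lines = text.splitlines()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(lines), step):
        window = lines[start:start + size]
        chunks.append((start + 1, "\n".join(window)))
        if start + size >= len(lines):
            break  # this window already reached the end of the text
    return chunks


doc = "\n".join(f"line {i}" for i in range(1, 11))
pieces = chunk_lines(doc, size=4, overlap=1)
for start, chunk in pieces:
    print(f"chunk starting at line {start}: {chunk.splitlines()}")
```

The overlap means a passage that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some lines twice.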

Imagine searching your project for "database schema diagram" and having it return both the schema.sql file and the whiteboard screenshot you saved in your docs/ folder. That's the power mgrep unlocks.

How to Try It

Getting started is straightforward. mgrep is distributed as an npm package, so you'll need Node.js and npm.

  1. Install it:

    npm install -g @mixedbread/mgrep
    
  2. Run your first semantic search: The simplest way is to use the default, free API (requires an internet con

Last updated: Mar 17, 2026