An open-source tool to crawl and scrape top social media platforms.

54,625 stars

MediaCrawler: Your Open-Source Toolkit for Social Media Data

Ever needed to gather data from social media platforms for a project, but found yourself stuck between restrictive official APIs and the murky waters of unreliable scrapers? It's a common developer headache. You want clean, structured data without jumping through endless hoops or worrying about your setup breaking with every platform update.

Enter MediaCrawler, an open-source tool that aims to cut through that frustration. It’s a Python-based crawler and scraper specifically built for top social media platforms, giving developers a transparent and customizable way to collect public data.

What It Does

MediaCrawler is a toolkit for programmatically extracting public data from several major social media platforms. Think of it as a unified, scriptable interface for data collection. You point it at a target (a specific user, hashtag, or trend), and it handles navigating the platform, following pagination, and parsing the HTML. The result is structured data, such as posts, timestamps, engagement metrics, and media links, in a usable format, typically JSON.
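To make the pagination-and-JSON idea concrete, here is a minimal sketch of a cursor-based collection loop. It is not MediaCrawler's actual code or API: `fetch_page`, `iter_posts`, `FAKE_PAGES`, and the field names (`id`, `text`, `likes`, `next_cursor`) are all hypothetical, and the "platform" is an in-memory stub so the example runs without any network access.

```python
import json
from typing import Iterator, Optional

# Hypothetical in-memory "platform" standing in for real responses.
# Keys are page cursors; None means "first page".
FAKE_PAGES = {
    None: {"posts": [{"id": "p1", "text": "hello", "likes": 3}], "next_cursor": "c1"},
    "c1": {"posts": [{"id": "p2", "text": "world", "likes": 5}], "next_cursor": None},
}


def fetch_page(cursor: Optional[str]) -> dict:
    """Stub for a network call; a real crawler would request and parse a page here."""
    return FAKE_PAGES[cursor]


def iter_posts() -> Iterator[dict]:
    """Yield posts page by page, following next_cursor until it is None."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["posts"]
        cursor = page["next_cursor"]
        if cursor is None:
            break


posts = list(iter_posts())
print(json.dumps(posts, indent=2))
```

Writing the loop as a generator keeps the pagination logic separate from whatever consumes the records, so the same loop could feed a file writer, a database insert, or an analysis step.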

Why It's Cool

The real appeal here is the open-source, developer-centric approach. Instead of a black-box service, you get a Python codebase you can inspect, modify, and extend. This is huge for a few reasons:

  • Transparency & Control: You see exactly how the data is being fetched and parsed. No hidden costs or surprise changes to terms.
  • Customizability: Need to extract a specific field or adapt to a slight change in a website's layout? You can modify the scraper logic directly.
  • Local-First: It runs on your machine or server. Your data pipeline isn't dependent on a third-party service's uptime or rate limits (though you must still respect the target platforms' robots.txt and terms of service).
  • Multi-Platform: A single tool that handles multiple platforms with a broadly consistent methodology simplifies projects that need data from more than one source.

It's a practical tool for developers building anything that needs social data as input: research projects, trend analysis dashboards, content aggregators, or archival tools.
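Since running locally still means respecting each platform's robots.txt, a pre-flight check can be sketched with Python's standard `urllib.robotparser` module. The robots.txt content, the `my-research-bot` user agent, the `example.com` URLs, and the `is_allowed` helper below are illustrative assumptions, not part of MediaCrawler; a real check would load the target platform's own file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


public_ok = is_allowed(SAMPLE_ROBOTS_TXT, "my-research-bot", "https://example.com/user/123")
private_ok = is_allowed(SAMPLE_ROBOTS_TXT, "my-research-bot", "https://example.com/private/feed")
print(public_ok, private_ok)
```

A check like this only covers robots.txt. A platform's terms of service may impose further limits that no parser can verify, so they still need to be read separately.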

How to Try It

Getting started is straightforward if you're comfortable with Python and Git.

  1. Clone the repo:
    git clone https://github.com/NanmiCoder/MediaCrawler.git
    cd MediaCrawler
    
  2. Set up a virtual environment (recommended) and install the dependencies:
    python3 -m venv venv && source venv/bin/activate  # on Windows: venv\Scripts\activate
    pip install -r requirements.txt
    

Last updated: Dec 26, 2025