You get OpenAI compatibility with streaming, tool calling, and automatic failove...
Y

You get OpenAI compatibility with streaming, tool calling, and automatic failove...

You get OpenAI compatibility with streaming, tool calling, and automatic failove...

14,332 stars
N/A forks
N/A contributors

README

Project documentation from GitHub

FreeLLMAPI: OpenAI-Compatible Streaming + Failover Across Google, Groq, and More

If you've ever built an app that depends on a single LLM provider, you know the pain: an outage, rate limit, or sudden pricing change can break everything. You either hard-code a fallback or write custom logic for each provider’s API shape. Enter FreeLLMAPI — a tiny proxy that gives you OpenAI-compatible endpoints with streaming, tool calling, and automatic failover across multiple providers.

What It Does

FreeLLMAPI is a lightweight reverse proxy that sits between your app and LLM backends. You send requests in the standard OpenAI Chat Completions format, and it routes them to providers like Google (Gemini), Groq, Cerebras, or others. If one provider fails or returns an error, it transparently retries the next one. Crucially, it preserves streaming — so your users still see tokens arrive in real time.

The repo is a single-file Python implementation (FastAPI-based) with minimal dependencies. You point it at a config file listing your API keys and provider preferences, and it handles the rest.

Why It’s Cool

  • Zero code changes for your app. Your existing OpenAI SDK code works — just change the base URL to point at FreeLLMAPI. Tool calls (function calling) also pass through without modification.
  • Automatic failover with configurable provider priority. Want to use Groq first, then fall back to Google Gemini, then Cerebras? Just define that order in your config file. If a provider is down or returns an error, the request moves to the next one.
  • Streaming works end-to-end. This is where most proxies break — they accumulate the full response and then send it. FreeLLMAPI streams tokens from the active provider directly to your client, so you keep the real-time UX.
  • Provider-agnostic tool calling. If your app uses function calling, it works across providers that support it (e.g., Groq, Google). The proxy maps the response format back to OpenAI's schema, so your code never knows there's a different engine underneath.

How to Try It

  1. Clone the repo:

    git clone https://github.com/tashfeenahmed/freellmapi
    cd freellmapi
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Create a config.yaml file (see the example in the repo) with your API keys and provider order:

    providers: - name: groq api_key: your_groq_key model: llama3-70b-8192 - name: google api_key: your_google_key model: gemini-1.5-flash
    

Did you like this issue?

Join our weekly newsletter

Love discovering amazing projects?

Help us continue bringing you the best open-source discoveries every week.

Back to Projects
Last updated: May 24, 2026