FreeLLMAPI: OpenAI-Compatible Streaming + Failover Across Google, Groq, and More

If you've ever built an app that depends on a single LLM provider, you know the pain: an outage, rate limit, or sudden pricing change can break everything. You either hard-code a fallback or write custom logic for each provider’s API shape. Enter FreeLLMAPI — a tiny proxy that gives you OpenAI-compatible endpoints with streaming, tool calling, and automatic failover across multiple providers.

What It Does

FreeLLMAPI is a lightweight reverse proxy that sits between your app and LLM backends. You send requests in the standard OpenAI Chat Completions format, and it routes them to providers like Google (Gemini), Groq, Cerebras, or others. If one provider fails or returns an error, it transparently retries the next one. Crucially, it preserves streaming — so your users still see tokens arrive in real time.

The repo is a single-file Python implementation (FastAPI-based) with minimal dependencies. You point it at a config file listing your API keys and provider preferences, and it handles the rest.

Why It’s Cool

Zero code changes for your app. Your existing OpenAI SDK code works — just change the base URL to point at FreeLLMAPI. Tool calls (function calling) also pass through without modification.
Automatic failover with configurable provider priority. Want to use Groq first, then fall back to Google Gemini, then Cerebras? Just define that order in your config file. If a provider is down or returns an error, the request moves to the next one.
Streaming works end-to-end. This is where most proxies break — they accumulate the full response and then send it. FreeLLMAPI streams tokens from the active provider directly to your client, so you keep the real-time UX.
Provider-agnostic tool calling. If your app uses function calling, it works across providers that support it (e.g., Groq, Google). The proxy maps the response format back to OpenAI's schema, so your code never knows there's a different engine underneath.

How to Try It

Clone the repo:

git clone https://github.com/tashfeenahmed/freellmapi
cd freellmapi

Install dependencies:
```
pip install -r requirements.txt
```

Create a config.yaml file (see the example in the repo) with your API keys and provider order:

providers: - name: groq api_key: your_groq_key model: llama3-70b-8192 - name: google api_key: your_google_key model: gemini-1.5-flash

You get OpenAI compatibility with streaming, tool calling, and automatic failove...

README

FreeLLMAPI: OpenAI-Compatible Streaming + Failover Across Google, Groq, and More

What It Does

Why It’s Cool

How to Try It

Join our weekly newsletter

Love discovering amazing projects?