FairBite is a dataset-level representation bias auditing tool for dataset owners, data scientists, and ML teams. It identifies sensitive attributes, evaluates subgroup coverage, and helps find representation gaps before model training.
This repository is a monorepo containing:
fairbite-frontend/— React web application for dataset review and audit workflows.fairbite-backend/— FastAPI backend for dataset ingestion, sensitive characteristic analysis, and representation audit orchestration.evaluation/— expert annotations, sensitive attribute evaluation artifacts, and evaluation results.
FairBite uses a configurable LLM to identify and score sensitive dataset columns during the backend data-processing pipeline. The provider, model, and API key are set via environment variables — no code changes needed to switch between providers.
- Node.js and npm for frontend development.
- Python 3.11+ for the backend.
- An API key for your chosen LLM provider (not required for Ollama).
- Clone the repository:
git clone https://github.com/your-org/fairbite.git
cd fairbite- Install backend dependencies:
cd fairbite-backend
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt- Install frontend dependencies:
cd ..\fairbite-frontend
npm installCopy fairbite-backend/.env.example to fairbite-backend/.env and fill in the values for your chosen provider:
cp fairbite-backend/.env.example fairbite-backend/.envThe required variables are:
| Variable | Description |
|---|---|
LLM_PROVIDER |
Provider name: gemini, openai, anthropic, or ollama |
LLM_MODEL |
Model name as accepted by the provider (e.g. gemini-2.5-flash, gpt-4o, mistral) |
LLM_API_KEY |
API key for the provider — not required for Ollama |
LLM_TEMPERATURE |
Sampling temperature (default: 0) |
OLLAMA_BASE_URL |
Ollama server URL (default: http://localhost:11434) — only for Ollama |
See fairbite-backend/.env.example for ready-to-use examples for each provider.
The Docker setup uses only fairbite-backend/ and fairbite-frontend/ as build
contexts. The evaluation/ directory is not copied into either image.
From the repository root:
docker compose up --buildThen open:
- Frontend:
http://localhost:3000 - Backend health check:
http://localhost:8000/health
The frontend image is built with REACT_APP_API_BASE=http://localhost:8000 by
default. To point it at a different backend URL, rebuild with:
$env:REACT_APP_API_BASE = "http://your-backend-host:8000"
docker compose up --buildTo stop the app:
docker compose downFrom fairbite-backend/:
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000The backend exposes endpoints for dataset ingestion, status checks, reports, and representation audits.
From fairbite-frontend/:
npm startThe React app will launch on http://localhost:3000 and communicate with the backend.
Build the frontend production bundle from fairbite-frontend/:
npm run buildThere is no centralized root build command; the frontend and backend are managed separately inside the monorepo.
fairbite-frontend/— React SPA, component files, icons, and UI assets.fairbite-backend/— FastAPI server, dataset processing service, and Google Gemini integration.evaluation/— results, expert annotations, and scripts used for dataset evaluation.
Key backend endpoints:
GET /health— health check.POST /datasets— submit a Croissant dataset URL for processing.GET /datasets/{dataset_id}— check dataset processing status.GET /datasets/{dataset_id}/report— retrieve the completed dataset report.POST /datasets/{dataset_id}/representation-audit— run a representation audit on a completed dataset.GET /representation-audits/{audit_id}— check audit status.GET /representation-audits/{audit_id}/results— retrieve audit results.
FairBite supports multiple LLM providers out of the box: Google Gemini, OpenAI, Anthropic, and Ollama (local). The provider is selected at runtime via the LLM_PROVIDER environment variable — no code changes needed.
Edit fairbite-backend/.env and set LLM_PROVIDER, LLM_MODEL, and LLM_API_KEY. Restart the backend.
If the provider you want is not in the list above, you can add it in two steps inside fairbite-backend/llm_provider.py:
1. Create a class that extends LLMProvider:
class MyProvider(LLMProvider):
def __init__(self):
# Initialize your client here using os.environ values
self._client = MySDK(api_key=os.environ["LLM_API_KEY"])
self._model = os.environ["LLM_MODEL"]
def _generate(self, prompt: str, system_msg: str) -> str:
# Call your provider’s API and return the raw response as a string
response = self._client.complete(model=self._model, prompt=prompt)
return response.textNote: implement _generate, not generate_content. The base class wraps _generate to automatically strip <think>...</think> blocks produced by reasoning models.
2. Register it in the PROVIDERS dict:
PROVIDERS = {
"gemini": GeminiProvider,
"openai": OpenAIProvider,
"anthropic": AnthropicProvider,
"ollama": OllamaProvider,
"myprovider": MyProvider, # add your entry here
}Then set LLM_PROVIDER=myprovider in .env and restart.
- This repository is intentionally structured as a monorepo.
- The frontend and backend are developed and run separately.
- When using Ollama with Docker, set
OLLAMA_BASE_URL=http://host.docker.internal:11434so the container can reach the host.