Repository Summarizer API

This is a FastAPI-based service that takes a GitHub repository URL and returns a human-readable summary of the project: what it does, what technologies are used, and how it's structured.

It leverages the Nebius Token Factory API (or OpenAI API) to generate the summary.

Prerequisites

Python 3.10 to 3.14
An API Key from Nebius Token Factory (or OpenAI).

Step-by-Step Setup

Clone the project / Extract the archive

```
# Navigate to the project directory
cd repo-summarizer
```

Create a virtual environment (optional but recommended)

```
python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate
```

Install dependencies
```
pip install -r requirements.txt
```
Set Environment Variables

Set your LLM provider API key. The application requires NEBIUS_API_KEY.
```
export NEBIUS_API_KEY="your-nebius-api-key"
```
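As an illustration of how a service like this might read its credentials at startup, here is a minimal sketch. The `Settings` class and `load_settings` helper are hypothetical names, not taken from the project; the only assumptions drawn from this README are that NEBIUS_API_KEY is required and that GITHUB_TOKEN is optional.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the credentials the service needs."""
    nebius_api_key: str
    github_token: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read credentials from the environment, failing fast if the key is missing."""
    api_key = env.get("NEBIUS_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("NEBIUS_API_KEY is not set")
    # GITHUB_TOKEN is optional; it only helps with GitHub API rate limits.
    token = env.get("GITHUB_TOKEN", "").strip() or None
    return Settings(nebius_api_key=api_key, github_token=token)
```

Failing at startup rather than on the first request makes a missing key obvious before any client calls the endpoint.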
(Optional) If you plan to heavily test the API locally and encounter GitHub API rate limits, you can export a GITHUB_TOKEN.

Run the Application

Start the FastAPI server using Uvicorn:
```
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

Test the Endpoint

The server exposes a POST /summarize endpoint. You can test it using curl:

```
curl -X POST http://localhost:8000/summarize \
  -H "Content-Type: application/json" \
  -d '{"github_url": "https://github.com/psf/requests"}'
```
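The same request can be sent from Python. The sketch below uses only the standard library; the `build_request` and `summarize` helper names are hypothetical, and the response is assumed to be a JSON object whose exact fields are not specified in this README.

```python
import json
import urllib.request


def build_request(github_url: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /summarize request without sending it."""
    body = json.dumps({"github_url": github_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/summarize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(github_url: str, base_url: str = "http://localhost:8000") -> dict:
    """Send the request to a running server and decode the JSON response."""
    request = build_request(github_url, base_url)
    # A generous timeout is assumed here because LLM calls can be slow.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, `summarize("https://github.com/psf/requests")` should return the parsed summary.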

Design Decisions

LLM Model Choice

The application defaults to MiniMax-M2.1 because its agentic logic and interleaved-thinking capabilities are highly effective for cross-referencing multiple configuration files before outputting the final summary.

Handling Repository Contents (Context Management)

Sending an entire codebase to an LLM is often impossible (due to context window limits) and always inefficient. The approach centers on information density:

GitHub API Ingestion: Instead of doing an expensive git clone, the app uses the GitHub API to fetch the repository tree.
Noise Filtering: We immediately exclude build artifacts (dist, build), virtual environments (venv), and binary or machine-generated files (.png, .pdf, .lock).
High-Priority Selection: We concurrently fetch the raw content of up to 10 key files. We prioritize "Documentation" (README.md), "Dependency Configurations" (package.json, requirements.txt), and "Core Entry Points" (main.py, app.py). These files contain the highest density of information regarding the project's purpose and technology stack.
Context Budgeting: Instead of complex token counting, the app enforces strict character limits on the fetched files (e.g., 10,000 chars for a README, 5,000 for dependencies).
ASCII Tree: The remaining repository structure is converted into a simplified ASCII directory tree (max depth of 3) to provide the LLM with architectural context without token bloat.
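The filtering, selection, budgeting, and tree steps above can be sketched as follows. This is an illustrative approximation, not the project's actual code: the constant names, the function names, the priority ordering within a tier, and the default limit for entry-point files are all assumptions. Only the excluded names, the example file names, the 10-file cap, the 10,000/5,000 character limits, and the depth of 3 come from the description above.

```python
from pathlib import PurePosixPath

EXCLUDED_DIRS = {"dist", "build", "venv"}
EXCLUDED_SUFFIXES = {".png", ".pdf", ".lock"}
# Lower number = higher priority: documentation, dependencies, entry points.
PRIORITY = {"README.md": 0, "package.json": 1, "requirements.txt": 1, "main.py": 2, "app.py": 2}
CHAR_LIMITS = {"README.md": 10_000, "package.json": 5_000, "requirements.txt": 5_000}
DEFAULT_LIMIT = 5_000  # assumed limit for files without an explicit budget
MAX_FILES = 10


def is_noise(path: str) -> bool:
    """Return True for paths inside excluded directories or with excluded suffixes."""
    p = PurePosixPath(path)
    in_excluded_dir = any(part in EXCLUDED_DIRS for part in p.parts[:-1])
    return in_excluded_dir or p.suffix.lower() in EXCLUDED_SUFFIXES


def select_key_files(paths: list[str]) -> list[str]:
    """Pick up to MAX_FILES high-priority files, preferring those nearest the root."""
    candidates = [p for p in paths if not is_noise(p) and PurePosixPath(p).name in PRIORITY]
    candidates.sort(key=lambda p: (PRIORITY[PurePosixPath(p).name], p.count("/"), p))
    return candidates[:MAX_FILES]


def truncate(path: str, content: str) -> str:
    """Apply the per-file character budget instead of counting tokens."""
    return content[: CHAR_LIMITS.get(PurePosixPath(path).name, DEFAULT_LIMIT)]


def render_tree(paths: list[str], max_depth: int = 3) -> str:
    """Render paths as an ASCII tree, dropping anything deeper than max_depth."""
    tree: dict = {}
    for path in paths:
        node = tree
        for part in PurePosixPath(path).parts[:max_depth]:
            node = node.setdefault(part, {})

    lines: list[str] = []

    def walk(node: dict, prefix: str) -> None:
        names = sorted(node)
        for index, name in enumerate(names):
            last = index == len(names) - 1
            connector = "`-- " if last else "|-- "
            lines.append(prefix + connector + name)
            walk(node[name], prefix + ("    " if last else "|   "))

    walk(tree, "")
    return "\n".join(lines)
```

Character limits are a coarse proxy for tokens, but they are cheap to enforce and need no tokenizer dependency, which matches the trade-off described in the Context Budgeting step.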

This strategy is designed to keep the prompt within standard 8k-16k token context limits, stays fast thanks to asynchronous I/O, and gives the LLM the highest-signal information it needs to generate an accurate summary.
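The concurrent fetching mentioned above could look roughly like the following sketch. It assumes a hypothetical `fetch_file` coroutine, stubbed here with a short sleep instead of a real HTTP call, and uses `asyncio.gather` to run the downloads at the same time. The real service may use a different HTTP client and a different error policy.

```python
import asyncio


async def fetch_file(path: str) -> str:
    """Stand-in for downloading one raw file (hypothetical helper, no network)."""
    await asyncio.sleep(0.1)  # simulate network latency
    return f"contents of {path}"


async def fetch_all(paths: list[str], fetch=fetch_file) -> dict[str, str]:
    """Fetch all files concurrently and skip any whose download failed."""
    results = await asyncio.gather(*(fetch(p) for p in paths), return_exceptions=True)
    return {p: r for p, r in zip(paths, results) if not isinstance(r, BaseException)}
```

Because the downloads overlap, ten files take roughly as long as the slowest single request rather than the sum of all ten. Skipping failed files is one possible policy; it lets a summary still be produced when an individual file cannot be read.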
