A solution for benchmarking many LLMs under many different configurations in parallel on Modal.
Install the package:

```bash
pip install -e .
```

To run multiple benchmarks at once, first deploy the Datasette UI, which will let you easily view the results later:
```bash
(cd src && modal deploy -m big_benchmark)
```
Then, start a benchmark suite from a configuration file:
```bash
bb configs/llama3.yaml
```

Once the suite has finished, you will be given a URL to a UI where you can view your results, along with a command to download a JSONL file of your results.
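The configuration schema isn't documented in this section, so the following is only a hypothetical sketch of what a suite file like `configs/llama3.yaml` might contain; every key name here is an assumption for illustration, not the actual schema:

```yaml
# Hypothetical sketch only -- the real schema may differ.
# All field names below are assumptions, not documented keys.
model: meta-llama/Meta-Llama-3-8B-Instruct
configs:
  - max_tokens: 256
    temperature: 0.0
  - max_tokens: 1024
    temperature: 0.7
```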
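JSONL stores one JSON object per line, so the downloaded results can be inspected with nothing but the Python standard library. A minimal sketch, assuming the file was saved as `results.jsonl` (the filename is illustrative; use whatever path the download command gives you):

```python
import json

# Read one JSON object per line from the downloaded results file.
# "results.jsonl" is an assumed filename for illustration.
with open("results.jsonl") as f:
    records = [json.loads(line) for line in f if line.strip()]

print(f"Loaded {len(records)} benchmark records")
```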
We welcome contributions, including those that add tuned benchmarks to our collection. See the CONTRIBUTING file and the Getting Started document for more details on contributing to Big Benchmark.
Big Benchmark is available under the MIT license. See the LICENSE file for more details.