Toloka

Data labeling platform for ML

Pinned

  1. tendem-evaluation Public

    Tendem hybrid AI+Human system benchmarking

    Python, 2 stars

  2. beemo Public

    Benchmark for fine-grained machine-generated text detection. 6.5k texts written by humans, generated by ten open-source instruction-finetuned LLMs and edited by expert annotators.

    8 stars, 1 fork

  3. u-math Public

    Official evaluation code for the U-MATH and μ-MATH benchmarks. These datasets are designed to test the mathematical reasoning and meta-evaluation capabilities of LLMs on university-level problems.

    Python, 10 stars, 3 forks

  4. crowd-kit Public

    Control the quality of your labeled data with the Python tools you already know.

    Python, 236 stars, 19 forks

Repositories

Showing 10 of 30 repositories
  • crowd-kit Public

    Control the quality of your labeled data with the Python tools you already know.

    Python, 236 stars, 19 forks, 2 issues, 2 pull requests. Updated Dec 1, 2025
  • tendem-evaluation Public

    Tendem hybrid AI+Human system benchmarking

    Python, 2 stars, 0 forks, 0 issues, 0 pull requests. Updated Nov 19, 2025
  • .github Public

    Niceties for GitHub

    0 stars, 0 forks, 0 issues, 0 pull requests. Updated Nov 18, 2025
  • dbt-af Public

    Distributed run of dbt models using Airflow

    Python, 167 stars, 15 forks, 1 issue, 1 pull request. Updated Nov 15, 2025
  • pg-queue-playground Public

    Playground for transactional queues in PostgreSQL

    Java, 4 stars, 0 forks, 0 issues, 0 pull requests. Updated Sep 18, 2025
  • primeape Public

    Multilingual human preference prediction and explanation

    0 stars, 0 forks, 0 issues, 0 pull requests. Updated Sep 8, 2025
  • surveying-prof-writers-on-ai Public

    Questionnaire and results of surveying professional authors

    4 stars, MIT license, 0 forks, 0 issues, 0 pull requests. Updated Jul 30, 2025
  • u-math Public

    Official evaluation code for the U-MATH and μ-MATH benchmarks. These datasets are designed to test the mathematical reasoning and meta-evaluation capabilities of LLMs on university-level problems.

    Python, 10 stars, MIT license, 3 forks, 0 issues, 0 pull requests. Updated Jun 10, 2025
  • dbxio Public

    High-level Databricks client

    Python, 13 stars, 0 forks, 1 issue, 0 pull requests. Updated Mar 5, 2025
  • beemo Public

    Benchmark for fine-grained machine-generated text detection. 6.5k texts written by humans, generated by ten open-source instruction-finetuned LLMs and edited by expert annotators.

    8 stars, MIT license, 1 fork, 0 issues, 1 pull request. Updated Feb 4, 2025