Tendem Evaluation

This repository contains the evaluation dataset and results for Tendem, a hybrid AI+Human system in which AI agents handle structured work and Human Experts ensure quality.

Product: tendem.ai
Full Paper: Tendem: A Hybrid AI+Human Agentic Platform

Overview

Tendem combines AI automation with human expertise:

AI Agents execute routine tasks (web browsing, data processing, file operations)
Human Experts verify results, handle ambiguous cases, and ensure quality
Multi-layer QA validates every deliverable before client delivery

We evaluated Tendem on 94 real-world tasks, comparing it against ChatGPT Agent (AI-only) and Upwork freelancers (human-only).

Main Results

System          Quality (% Good)   Median Time (hours)   Median Price (USD)
Tendem          74.5%              16.4                  $32
Upwork          53.2%              35.0                  $50
ChatGPT Agent   40.4%              0.13                  subscription

Key findings (Tendem vs Upwork):

+21.3 percentage points more tasks rated Good (74.5% vs 53.2%)
53% shorter median delivery time (16.4 vs 35.0 hours)
36% lower median price ($32 vs $50)
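The key findings follow from the table by simple arithmetic: the quality gap is an absolute difference in percentage points, while the time and price figures are relative reductions of Upwork's medians. A minimal Python sketch that reproduces them from the table values (the dictionary layout and function names are illustrative, not part of this repository):

```python
# Values copied from the Main Results table (Tendem and Upwork only;
# ChatGPT Agent is listed as "subscription", so it has no per-task price).
RESULTS = {
    "Tendem": {"pct_good": 74.5, "median_hours": 16.4, "median_usd": 32.0},
    "Upwork": {"pct_good": 53.2, "median_hours": 35.0, "median_usd": 50.0},
}


def relative_reduction(value: float, baseline: float) -> float:
    """Percent by which `value` falls below `baseline`."""
    return (1.0 - value / baseline) * 100.0


def key_findings(system: str = "Tendem", baseline: str = "Upwork") -> dict:
    s, b = RESULTS[system], RESULTS[baseline]
    return {
        # Quality: absolute gap between two shares, in percentage points.
        "quality_gap_pp": s["pct_good"] - b["pct_good"],
        # Speed and cost: relative reduction of the baseline's median.
        "time_reduction_pct": relative_reduction(s["median_hours"], b["median_hours"]),
        "price_reduction_pct": relative_reduction(s["median_usd"], b["median_usd"]),
    }


if __name__ == "__main__":
    for name, value in key_findings().items():
        print(f"{name}: {value:.1f}")
```

Running it prints 21.3, 53.1 and 36.0; the time reduction rounds to the 53% quoted in the findings.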

For a detailed quality breakdown (Accuracy, Completeness, Style & Formatting), external benchmark results, and methodology, see the full paper.

Repository Structure

tendem-benchmark/
├── input_tasks.jsonl          # 94 task descriptions
├── output_results.jsonl       # Results with quality ratings & timing
├── input_files/               # Input files by task_id
│   └── {task_id}/
└── output_files/              # System outputs
    ├── chatgpt_agent/         # ChatGPT Agent
    ├── tendem/                # Tendem
    └── upwork/                # Upwork freelancers
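Both task and result files are JSON Lines (one JSON object per line), and per-task inputs live under input_files/{task_id}/. The sketch below loads a JSONL file and resolves a task's input directory. It is a hedged example: the checkout path and the "task_id" key name are assumptions, since this README does not document the record fields.

```python
import json
from collections.abc import Iterable
from pathlib import Path

# Assumptions (not documented in this README): the repository is checked out
# as ./tendem-benchmark, and each JSONL record has a "task_id" key whose value
# matches a directory name under input_files/.
REPO = Path("tendem-benchmark")


def parse_jsonl(lines: Iterable[str]) -> list[dict]:
    """Decode one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def read_jsonl(path: Path) -> list[dict]:
    with path.open(encoding="utf-8") as fh:
        return parse_jsonl(fh)


def index_by_task(records: list[dict], key: str = "task_id") -> dict[str, dict]:
    """Map task_id (as a string) to its record."""
    return {str(rec[key]): rec for rec in records}


def input_dir(repo: Path, task_id: str) -> Path:
    """input_files/{task_id}/ for a single task."""
    return repo / "input_files" / task_id


def system_names(repo: Path) -> list[str]:
    """Subdirectories of output_files/, e.g. chatgpt_agent, tendem, upwork."""
    return sorted(p.name for p in (repo / "output_files").iterdir() if p.is_dir())
```

For example, `tasks = index_by_task(read_jsonl(REPO / "input_tasks.jsonl"))` gives a lookup by task, and `input_dir(REPO, tid)` points at that task's inputs. The layout inside each output_files/ system folder is not specified here, so the sketch only lists the system names.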

Quality Scale: Good (client-ready) | Mediocre (needs edits) | Bad (needs rework) | Decline (refused)
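The "Quality (% Good)" column is the share of tasks whose deliverable received the Good label. A minimal sketch of that computation, assuming one label per task and counting Decline in the denominator (a refused task is not a client-ready result); the enum and function names are illustrative, and the field that holds the label in output_results.jsonl is not documented here, so the function takes plain label strings.

```python
from collections import Counter
from enum import Enum


class Quality(str, Enum):
    """The four labels of the quality scale."""
    GOOD = "Good"          # client-ready
    MEDIOCRE = "Mediocre"  # needs edits
    BAD = "Bad"            # needs rework
    DECLINE = "Decline"    # the system refused the task


def pct_good(labels: list[str]) -> float:
    """Percentage of labels that are Good.

    Assumption: one label per task, with Decline kept in the denominator.
    """
    if not labels:
        raise ValueError("no ratings given")
    # Quality(label) raises ValueError on anything outside the scale.
    counts = Counter(Quality(label) for label in labels)
    return 100.0 * counts[Quality.GOOD] / len(labels)
```

For instance, 70 Good labels among 94 tasks give 74.5% after rounding to one decimal.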

Task Distribution

94 tasks across 4 areas:

Operations (28): Data collection, format conversion, automation
Marketing (24): Content creation, competitive research
Analyst (22): Data analysis, dashboards, research
Sales (20): Contact data, enrichment
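The four area counts add up to the 94 tasks. The sketch below checks that total, converts the counts to shares, and shows how one might tally areas directly from input_tasks.jsonl. The "area" field name is hypothetical; adjust it to whatever key the file actually uses.

```python
from collections import Counter

# Task counts per area, as listed above.
TASK_AREAS = {"Operations": 28, "Marketing": 24, "Analyst": 22, "Sales": 20}


def area_shares(counts: dict[str, int]) -> dict[str, float]:
    """Share of all tasks in each area, in percent."""
    total = sum(counts.values())
    return {area: 100.0 * n / total for area, n in counts.items()}


def tally_areas(records: list[dict], key: str = "area") -> Counter:
    """Count task records per area.

    Hypothetical: assumes each record in input_tasks.jsonl has an "area" field.
    """
    return Counter(rec[key] for rec in records)


if __name__ == "__main__":
    assert sum(TASK_AREAS.values()) == 94
    for area, share in area_shares(TASK_AREAS).items():
        print(f"{area}: {share:.1f}%")
```

The shares print as 29.8%, 25.5%, 23.4% and 21.3%, so no single area dominates the set.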

Contact

Questions? Visit tendem.ai or see the full paper.

Repository: Toloka/tendem-evaluation
