NodeBench benchmarks decentralized AI nodes to verify their actual hardware (e.g., GPU vs CPU) using only public API response times and throughput.
Embeddings endpoints are preferred for benchmarking because they are more stable and deterministic than completions, making them ideal for performance fingerprinting.
- Sends large, diverse prompts to each node’s embeddings endpoint.
- Measures throughput (tokens/sec) and latency for each request.
- Benchmarks run with configurable concurrency, prompt size, and burst patterns.
- All nodes use the same set of prompts, generated once (typically by the second node in your list), for fair comparison.
- Results show per-node metrics and percent differences.
- Hardware class is inferred from response time and throughput under load.
- High-end GPUs should show much higher throughput and lower latency than CPUs.
- Low GPU utilization or similar CPU/GPU results suggest CPU-bound or misconfigured nodes.
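The measurement loop described above can be sketched as follows. This is a minimal sketch, not NodeBench's actual implementation: `send_request` is a hypothetical callable you supply that performs one embeddings call and returns the token count it processed, which keeps the sketch independent of any particular HTTP client, auth, or retry logic.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def benchmark_node(send_request, prompts, concurrency=4):
    """Run timed embeddings requests against one node and derive metrics.

    `send_request(prompt)` is assumed to perform a single embeddings call
    and return the number of tokens processed for that prompt.
    """
    def timed_call(prompt):
        start = time.perf_counter()
        tokens = send_request(prompt)
        return tokens, time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, prompts))
    wall_time = time.perf_counter() - wall_start

    latencies = sorted(elapsed for _, elapsed in results)
    total_tokens = sum(tokens for tokens, _ in results)
    return {
        "mean_latency_s": statistics.mean(latencies),
        # p95 by nearest-rank index into the sorted latencies
        "p95_latency_s": latencies[int(0.95 * (len(latencies) - 1))],
        # throughput over wall-clock time, so concurrency is reflected
        "tokens_per_sec": total_tokens / wall_time,
    }
```

Using the same prompt list and the same `concurrency` for every node keeps the per-node numbers comparable.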
Note: Results may be affected by server throttling, caching, or network bottlenecks.
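The comparison and inference steps can be illustrated with a simple heuristic. The throughput and latency cutoffs below are illustrative placeholders, not NodeBench's actual thresholds; in practice they would need calibrating against nodes with known hardware, and (per the note above) cross-checking against throttling, caching, and network effects.

```python
def percent_diff(value, baseline):
    """Percent difference of `value` relative to `baseline`."""
    return 100.0 * (value - baseline) / baseline

def infer_hardware_class(tokens_per_sec, p95_latency_s,
                         gpu_tps=2000.0, cpu_tps=300.0):
    """Classify a node from its metrics under load.

    The default thresholds are made-up examples for illustration only.
    """
    if tokens_per_sec >= gpu_tps and p95_latency_s < 1.0:
        return "likely GPU"
    if tokens_per_sec <= cpu_tps:
        return "likely CPU"
    return "inconclusive (possible CPU-bound or misconfigured node)"
```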
Usage:

```
python nodebench.py --urls <urlA>,<urlB>,...<urlN>
```

For recent updates, see docs/UPDATES.md.