DINOS is a framework for evaluating instruction-following capabilities in Large Language Models (LLMs). It provides a robust, verifiable benchmark focused on compositional constraints and precise validation.
DINOS generates and validates instruction tasks across multiple categories:
- Mathematical expressions and calculations
- Text format constraints (e.g., response length equal to a Fibonacci number, isograms with no repeated letters)
- Logical operations and boolean expressions
- Multi-constraint compositions
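To make the text-format category concrete, here is a minimal sketch of two verifiable checks of the kind listed above. The function names and exact rules (word-count-based Fibonacci length, case-insensitive isogram) are illustrative assumptions, not DINOS's actual implementation.

```python
def is_isogram(text: str) -> bool:
    """True if no letter repeats (case-insensitive); non-letters are ignored."""
    letters = [c.lower() for c in text if c.isalpha()]
    return len(letters) == len(set(letters))

def is_fibonacci_length(text: str) -> bool:
    """True if the word count of `text` is a Fibonacci number."""
    n = len(text.split())
    a, b = 0, 1
    while a < n:           # advance the sequence until it reaches or passes n
        a, b = b, a + b
    return a == n

print(is_isogram("dialogue"))               # all letters distinct
print(is_fibonacci_length("one two three")) # 3 words; 3 is Fibonacci
```

Checks like these are what make the benchmark verifiable: a response either satisfies the constraint or it does not, with no judging model required.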
The framework is designed to:
- Generate verifiable instructions deterministically from a seed
- Evaluate LLM responses automatically
- Provide detailed performance analytics
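The first design goal, deterministic generation from a seed, can be sketched as follows. The constraint names and task fields here are hypothetical placeholders; the point is only that a local, seeded RNG makes every task reproducible.

```python
import random

# Illustrative constraint pool; DINOS's real categories may differ.
CONSTRAINTS = ["isogram", "fibonacci_length", "boolean_expression"]

def generate_task(seed: int) -> dict:
    """Deterministically derive an instruction task from a seed."""
    rng = random.Random(seed)            # local RNG: same seed, same task
    constraint = rng.choice(CONSTRAINTS)
    difficulty = rng.randint(1, 5)
    return {"seed": seed, "constraint": constraint, "difficulty": difficulty}

# Identical seeds reproduce identical tasks, so any evaluation run
# can be replayed and its verdicts re-checked.
assert generate_task(42) == generate_task(42)
```

Using `random.Random(seed)` rather than the global `random` state keeps generation self-contained, so concurrent or repeated runs cannot perturb each other.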