Concrete motivation: if we are injecting things into the system prompt that severely reduce performance, we should have tests that catch that regression.
These should be small but domain-specific semantic eval tests that compare the behaviour of the effectful LLM API against the raw litellm/openai API.
These should test specific semantic features/functionalities that our downstream projects rely on, for example:
- generating instances of custom classes
- generating terms in effectful sublanguages
- making use of lexical variables or tools, and so on.
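One way to structure such a test is a small harness that runs the same prompt through both backends and applies a domain-specific grader. A minimal sketch of the first example (generating an instance of a custom class) is below; the names (`compare_backends`, `Point`, the grader) are illustrative assumptions, not the project's real API. In a real test the `baseline` callable would wrap `litellm.completion(...)` and `effectful` would wrap our API.

```python
# Hypothetical eval harness: backend names and the Point class are
# assumptions for illustration, not the project's actual interfaces.
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Point:
    x: int
    y: int


def parses_as_point(reply: str) -> bool:
    """Grader for one semantic feature: did the model emit a valid Point?"""
    try:
        Point(**json.loads(reply))
        return True
    except (ValueError, TypeError):
        return False


def compare_backends(
    effectful: Callable[[str], str],
    baseline: Callable[[str], str],
    prompt: str,
    grader: Callable[[str], bool],
) -> dict:
    """Run the same prompt through both backends and grade each reply.

    A regression only counts when the baseline passes but the
    effectful wrapper fails, so flaky-on-both prompts don't alert.
    """
    base_ok = grader(baseline(prompt))
    eff_ok = grader(effectful(prompt))
    return {"baseline": base_ok, "effectful": eff_ok,
            "regression": base_ok and not eff_ok}
```

Each downstream semantic feature (effectful sublanguage terms, lexical variables, tool use) would get its own grader, keeping individual tests small and failures attributable to one feature.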