Include semantic eval tests for testing the performance impact of API design decisions on LLM calls #581

@kiranandcode

Description

Concrete motivation: if we are injecting things into the system prompt that severely reduce performance, then we should have tests that capture this behaviour.

These should be small but domain-specific semantic eval tests that check the performance of the effectful LLM API against the actual litellm/openai API.

These should test specific semantic features/functionalities that our downstream projects rely on, for example:

  • generating instances of custom classes
  • generating terms in effectful sublanguages
  • making use of lexical variables or tools

etc.
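To make the shape concrete, here is a minimal sketch of what such an eval harness could look like. Everything here is an assumption for illustration: the backends are stubs standing in for the effectful API and a raw litellm/openai call, and the `Point` check is a placeholder for a real "generate an instance of a custom class" task. The idea is to score each backend on the same semantic checks and assert the effectful API does not regress past a tolerance.

```python
# Hypothetical eval harness (sketch, not the project's actual API).
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # semantic check on the raw completion

def pass_rate(backend: Callable[[str], str], cases: list[EvalCase]) -> float:
    # Fraction of cases whose completion passes its semantic check.
    passed = sum(1 for c in cases if c.check(backend(c.prompt)))
    return passed / len(cases)

# Example semantic check: the model must emit a JSON instance of a
# (hypothetical) custom class Point(x: int, y: int).
def is_point_instance(completion: str) -> bool:
    try:
        obj = json.loads(completion)
    except json.JSONDecodeError:
        return False
    return isinstance(obj.get("x"), int) and isinstance(obj.get("y"), int)

CASES = [
    EvalCase(
        prompt="Emit a JSON object for Point(x: int, y: int).",
        check=is_point_instance,
    ),
]

def test_effectful_api_matches_baseline():
    # In the real test these would be the effectful LLM API and a direct
    # litellm/openai call; here they are stubs so the sketch is runnable.
    baseline = lambda prompt: '{"x": 1, "y": 2}'
    effectful = lambda prompt: '{"x": 3, "y": 4}'
    tolerance = 0.05  # allow small statistical regressions
    assert pass_rate(effectful, CASES) >= pass_rate(baseline, CASES) - tolerance
```

Because LLM outputs are stochastic, a real version would run each case several times per backend and compare aggregate pass rates with a tolerance, rather than asserting exact equality on a single call.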
