Implement capability to compare responses against a different LLM #12

Currently, `promptimize` only evaluates results against criteria predefined by developers. We could leverage the existing tool to compare responses across different LLMs. For example, we could use GPT-4 as a benchmark and evaluate whether a custom model produces similar responses.

- [ ] Add an optional parameter for defining a "target" LLM.
- [ ] Add a function to compare the "similarity" of the results.
- [ ] Allow users to still use manual test cases to compare the two LLMs being evaluated.
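
To make the first two items more concrete, here is a rough sketch of what an optional target LLM plus a similarity score might look like. Nothing in it is existing `promptimize` API: `LLM`, `similarity`, `ComparisonResult`, and `compare_llms` are hypothetical names, and the score is a simple lexical ratio from the standard library's `difflib`, used as a stand-in for whatever metric we settle on (embedding-based similarity would likely be a better fit for paraphrased answers).

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional

# Hypothetical type: any callable mapping a prompt string to a response string.
LLM = Callable[[str], str]


def similarity(a: str, b: str) -> float:
    """Lexical similarity in [0, 1]; 1.0 means identical after normalization."""
    # Lowercase and collapse whitespace so formatting differences are ignored.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


@dataclass
class ComparisonResult:
    prompt: str
    response: str
    target_response: Optional[str]  # None when no target LLM is given
    score: Optional[float]          # None when no target LLM is given


def compare_llms(prompt: str, model: LLM, target: Optional[LLM] = None) -> ComparisonResult:
    """Run the prompt through `model` and, if provided, through `target`."""
    response = model(prompt)
    if target is None:
        # The target is optional, so existing single-model usage is unaffected.
        return ComparisonResult(prompt, response, None, None)
    target_response = target(prompt)
    return ComparisonResult(
        prompt, response, target_response, similarity(response, target_response)
    )


if __name__ == "__main__":
    # Stub models standing in for a custom model and a benchmark such as GPT-4.
    custom = lambda p: "The capital of France is Paris."
    benchmark = lambda p: "Paris is the capital of France."
    result = compare_llms("What is the capital of France?", custom, benchmark)
    print(result.score)
```

Because the result keeps both raw responses, manual test cases could still be run against each of them separately, which would cover the third item.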


