Run the typos tool against the codebase #8560

Merged
merged 1 commit into from
Aug 10, 2025
2 changes: 1 addition & 1 deletion docs/docs/api/optimizers/MIPROv2.md
@@ -64,6 +64,6 @@ These steps are broken down in more detail below:

2) **Propose Instruction Candidates**. The instruction proposer includes (1) a generated summary of properties of the training dataset, (2) a generated summary of your LM program's code and the specific predictor that an instruction is being generated for, (3) the previously bootstrapped few-shot examples to show reference inputs / outputs for a given predictor and (4) a randomly sampled tip for generation (i.e. "be creative", "be concise", etc.) to help explore the feature space of potential instructions. This context is provided to a `prompt_model` which writes high quality instruction candidates.

-3) **Find an Optimized Combination of Few-Shot Examples & Instructions**. Finally, we use Bayesian Optimization to choose which combinations of instructions and demonstrations work best for each predictor in our program. This works by running a series of `num_trials` trials, where a new set of prompts are evaluated over our validation set at each trial. The new set of prompts are only evaluated on a minibatch of size `minibatch_size` at each trial (when `minibatch`=`True`). The best averaging set of prompts is then evalauted on the full validation set every `minibatch_full_eval_steps`. At the end of the optimization process, the LM program with the set of prompts that performed best on the full validation set is returned.
+3) **Find an Optimized Combination of Few-Shot Examples & Instructions**. Finally, we use Bayesian Optimization to choose which combinations of instructions and demonstrations work best for each predictor in our program. This works by running a series of `num_trials` trials, where a new set of prompts are evaluated over our validation set at each trial. The new set of prompts are only evaluated on a minibatch of size `minibatch_size` at each trial (when `minibatch`=`True`). The best averaging set of prompts is then evaluated on the full validation set every `minibatch_full_eval_steps`. At the end of the optimization process, the LM program with the set of prompts that performed best on the full validation set is returned.

For those interested in more details, more information on `MIPROv2` along with a study on `MIPROv2` compared with other DSPy optimizers can be found in [this paper](https://arxiv.org/abs/2406.11695).
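
For reviewers unfamiliar with the loop that step 3 of this hunk describes, here is a minimal sketch in plain Python. It is not the DSPy implementation: uniform random choice stands in for the Bayesian Optimization proposal step, and `search_prompts`, `score_fn`, `candidates`, and `valset` are hypothetical names introduced for illustration. Only `num_trials`, `minibatch_size`, and `minibatch_full_eval_steps` come from the documentation text.

```python
import random


def search_prompts(candidates, score_fn, valset, num_trials=20,
                   minibatch_size=4, minibatch_full_eval_steps=5, seed=0):
    """Toy search loop: minibatch scoring with periodic full evaluation."""
    rng = random.Random(seed)
    minibatch_avgs = {}  # candidate index -> list of minibatch averages
    full_scores = {}     # candidate index -> score on the full validation set

    def mean(values):
        return sum(values) / len(values)

    def full_eval_best_average():
        # Promote the candidate with the best average minibatch score
        # to an evaluation on the full validation set.
        idx = max(minibatch_avgs, key=lambda i: mean(minibatch_avgs[i]))
        full_scores[idx] = mean([score_fn(candidates[idx], ex) for ex in valset])

    for trial in range(1, num_trials + 1):
        # ASSUMPTION: uniform random choice stands in for the Bayesian
        # Optimization proposal step.
        idx = rng.randrange(len(candidates))
        batch = rng.sample(valset, min(minibatch_size, len(valset)))
        minibatch_avgs.setdefault(idx, []).append(
            mean([score_fn(candidates[idx], ex) for ex in batch]))
        if trial % minibatch_full_eval_steps == 0:
            full_eval_best_average()

    if not full_scores:  # fewer trials than minibatch_full_eval_steps
        full_eval_best_average()

    # Return the candidate that did best on the full validation set.
    best = max(full_scores, key=full_scores.get)
    return candidates[best], full_scores[best]


# Hypothetical candidates and scorer, for illustration only.
candidates = ["be concise", "be creative", "think step by step"]
valset = list(range(10))
score_fn = lambda prompt, ex: 1.0 if len(prompt) > ex else 0.0
best_prompt, best_score = search_prompts(candidates, score_fn, valset)
```

The point of the sketch is the split between cheap minibatch scores, which steer the search, and full validation scores, which alone decide what is returned.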
2 changes: 1 addition & 1 deletion docs/docs/learn/evaluation/overview.md
@@ -4,7 +4,7 @@ sidebar_position: 1

# Evaluation in DSPy

-Once you have an initial system, it's time to **collect an initial development set** so you can refine it more systematically. Even 20 input examples of your task can be useful, though 200 goes a long way. Depending on your _metric_, you either just need inputs and no labels at all, or you need inputs and the _final_ outputs of your system. (You almost never need labels for the intermediate steps in your program in DSPy.) You can probably find datasets that are adjacent to your task on, say, HuggingFace datasets or in a naturally occuring source like StackExchange. If there's data whose licenses are permissive enough, we suggest you use them. Otherwise, you can label a few examples by hand or start deploying a demo of your system and collect initial data that way.
+Once you have an initial system, it's time to **collect an initial development set** so you can refine it more systematically. Even 20 input examples of your task can be useful, though 200 goes a long way. Depending on your _metric_, you either just need inputs and no labels at all, or you need inputs and the _final_ outputs of your system. (You almost never need labels for the intermediate steps in your program in DSPy.) You can probably find datasets that are adjacent to your task on, say, HuggingFace datasets or in a naturally occurring source like StackExchange. If there's data whose licenses are permissive enough, we suggest you use them. Otherwise, you can label a few examples by hand or start deploying a demo of your system and collect initial data that way.

Next, you should **define your DSPy metric**. What makes outputs from your system good or bad? Invest in defining metrics and improving them incrementally over time; it's hard to consistently improve what you aren't able to define. A metric is a function that takes examples from your data and takes the output of your system, and returns a score. For simple tasks, this could be just "accuracy", e.g. for simple classification or short-form QA tasks. For most applications, your system will produce long-form outputs, so your metric will be a smaller DSPy program that checks multiple properties of the output. Getting this right on the first try is unlikely: start with something simple and iterate.
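
To make the context paragraph above concrete, here is a minimal sketch of a metric as a function that takes an example and the system's output and returns a score. It uses plain Python objects rather than DSPy types. The `answer` field, the `toy_program` stand-in, and the `average_score` helper are hypothetical, and the optional `trace` argument is an assumption about how DSPy metrics are usually written.

```python
from types import SimpleNamespace


def exact_match(example, pred, trace=None):
    """Return 1.0 when the predicted answer matches the label, else 0.0."""
    # Normalize case and whitespace so trivial differences do not count.
    normalize = lambda text: " ".join(text.lower().split())
    return float(normalize(example.answer) == normalize(pred.answer))


def average_score(metric, devset, program):
    """Average a metric over a development set."""
    scores = [metric(example, program(example)) for example in devset]
    return sum(scores) / len(scores)


# Hypothetical data and a stand-in "program", for illustration only.
devset = [
    SimpleNamespace(question="Capital of France?", answer="Paris"),
    SimpleNamespace(question="2 + 2?", answer="4"),
]
toy_program = lambda example: SimpleNamespace(
    answer=" paris " if "France" in example.question else "5")
accuracy = average_score(exact_match, devset, toy_program)
```

A simple metric like this fits the "start with something simple and iterate" advice; long-form outputs would need a richer check.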

2 changes: 1 addition & 1 deletion docs/docs/tutorials/custom_module/index.ipynb
@@ -70,7 +70,7 @@
"id": "DziTWwT8_TrY"
},
"source": [
-"Let's illustrate this with a practical code example. We will build a simple Retrieval-Augmented Generation (RAG) application with mulitple stages:\n",
+"Let's illustrate this with a practical code example. We will build a simple Retrieval-Augmented Generation (RAG) application with multiple stages:\n",
"\n",
"1. **Query Generation:** Generate a suitable query based on the user's question to retrieve relevant context.\n",
"2. **Context Retrieval:** Fetch context using the generated query.\n",
20 changes: 10 additions & 10 deletions docs/docs/tutorials/customer_service_agent/index.ipynb
@@ -115,7 +115,7 @@
"source": [
"### Create Dummy Data\n",
"\n",
-"Let's also create some dummy data so that the airline agent can do the work. We need to create a few flights and a few users, and initilize empty dictionaries for the itineraries and custom support tickets."
+"Let's also create some dummy data so that the airline agent can do the work. We need to create a few flights and a few users, and initialize empty dictionaries for the itineraries and custom support tickets."
]
},
{
@@ -295,11 +295,11 @@
"source": [
"import dspy\n",
"\n",
-"class DSPyAirlineCustomerSerice(dspy.Signature):\n",
+"class DSPyAirlineCustomerService(dspy.Signature):\n",
" \"\"\"You are an airline customer service agent that helps user book and manage flights.\n",
"\n",
" You are given a list of tools to handle user request, and you should decide the right tool to use in order to\n",
-" fullfil users' request.\"\"\"\n",
+" fulfill users' request.\"\"\"\n",
"\n",
" user_request: str = dspy.InputField()\n",
" process_result: str = dspy.OutputField(\n",
@@ -319,7 +319,7 @@
"outputs": [],
"source": [
"agent = dspy.ReAct(\n",
-" DSPyAirlineCustomerSerice,\n",
+" DSPyAirlineCustomerService,\n",
" tools = [\n",
" fetch_flight_info,\n",
" fetch_itinerary,\n",
@@ -497,7 +497,7 @@
" You are an airline customer service agent that helps user book and manage flights. \n",
" \n",
" You are given a list of tools to handle user request, and you should decide the right tool to use in order to\n",
-" fullfil users' request.\n",
+" fulfill users' request.\n",
" \n",
" You are an Agent. In each episode, you will be given the fields `user_request` as input. And you can see your past trajectory so far.\n",
" Your goal is to use one or more of the supplied tools to collect any necessary information for producing `process_result`.\n",
@@ -579,7 +579,7 @@
" You are an airline customer service agent that helps user book and manage flights. \n",
" \n",
" You are given a list of tools to handle user request, and you should decide the right tool to use in order to\n",
-" fullfil users' request.\n",
+" fulfill users' request.\n",
" \n",
" You are an Agent. In each episode, you will be given the fields `user_request` as input. And you can see your past trajectory so far.\n",
" Your goal is to use one or more of the supplied tools to collect any necessary information for producing `process_result`.\n",
@@ -672,7 +672,7 @@
" You are an airline customer service agent that helps user book and manage flights. \n",
" \n",
" You are given a list of tools to handle user request, and you should decide the right tool to use in order to\n",
-" fullfil users' request.\n",
+" fulfill users' request.\n",
" \n",
" You are an Agent. In each episode, you will be given the fields `user_request` as input. And you can see your past trajectory so far.\n",
" Your goal is to use one or more of the supplied tools to collect any necessary information for producing `process_result`.\n",
@@ -777,7 +777,7 @@
" You are an airline customer service agent that helps user book and manage flights. \n",
" \n",
" You are given a list of tools to handle user request, and you should decide the right tool to use in order to\n",
-" fullfil users' request.\n",
+" fulfill users' request.\n",
" \n",
" You are an Agent. In each episode, you will be given the fields `user_request` as input. And you can see your past trajectory so far.\n",
" Your goal is to use one or more of the supplied tools to collect any necessary information for producing `process_result`.\n",
@@ -894,7 +894,7 @@
" You are an airline customer service agent that helps user book and manage flights. \n",
" \n",
" You are given a list of tools to handle user request, and you should decide the right tool to use in order to\n",
-" fullfil users' request.\n",
+" fulfill users' request.\n",
" \n",
" You are an Agent. In each episode, you will be given the fields `user_request` as input. And you can see your past trajectory so far.\n",
" Your goal is to use one or more of the supplied tools to collect any necessary information for producing `process_result`.\n",
@@ -1019,7 +1019,7 @@
" You are an airline customer service agent that helps user book and manage flights. \n",
" \n",
" You are given a list of tools to handle user request, and you should decide the right tool to use in order to\n",
-" fullfil users' request.\n",
+" fulfill users' request.\n",
"\n",
"\n",
"\u001b[31mUser message:\u001b[0m\n",