8 changes: 4 additions & 4 deletions notebooks/how_to/tests/run_tests/2_run_comparison_tests.ipynb
@@ -829,9 +829,9 @@
"\n",
"#### Run classifier performance test with multiple models\n",
"\n",
"Now, we'll use the `input_grid` to run the [`ClassifierPerformance` test](https://docs.validmind.ai/tests/model_validation/sklearn/ClassifierPerformance.html) on all four models using the testing dataset (`vm_test_ds`).\n",
"Now, we'll use the `input_grid` to run the `ClassifierPerformance` test on all four models using the testing dataset (`vm_test_ds`).\n",
"\n",
"When running individual tests, you can use a custom `result_id` to tag the individual result with a unique identifier by appending this `result_id` to the `test_id` with a `:` separator. We'll append an identifier to signify that this test was run on `all_models` to differentiate this test run from other runs:"
"When running individual tests, you can use a custom `result_id` to tag the individual result with a unique identifier by appending this `result_id` to the `test_id` with a `:` separator. We'll append an identifier to signify that this test was run on `all_models` to differentiate this test run from other runs:\n"
]
},
{
@@ -898,9 +898,9 @@
"\n",
"#### Run comparison test with multiple datasets\n",
"\n",
"Let's also run the [ROCCurve test](https://docs.validmind.ai/tests/model_validation/sklearn/ROCCurve.html) using `input_grid` to iterate through multiple datasets, which plots the ROC curves for the training (`vm_train_ds`) and test (`vm_test_ds`) datasets side by side — a common scenario when you want to compare the performance of a model on the training and test datasets and visually assess how much performance is lost in the test dataset.\n",
"Let's also run the ROCCurve test using `input_grid` to iterate through multiple datasets, which plots the ROC curves for the training (`vm_train_ds`) and test (`vm_test_ds`) datasets side by side — a common scenario when you want to compare the performance of a model on the training and test datasets and visually assess how much performance is lost in the test dataset.\n",
"\n",
"We'll also need to assign predictions to the training dataset for the random forest classifier model, since we didn't do that in our earlier setup:"
"We'll also need to assign predictions to the training dataset for the random forest classifier model, since we didn't do that in our earlier setup:\n"
]
},
{
Original file line number Diff line number Diff line change
@@ -226,9 +226,9 @@
"\n",
"### Using `RawData` from the ROC Curve Test\n",
"\n",
"In this introductory example, we run the [ROC Curve](https://docs.validmind.ai/tests/model_validation/sklearn/ROCCurve.html) test, inspect its `RawData` output, and then create a custom ROC curve using the raw data values.\n",
"In this introductory example, we run the ROC Curve test, inspect its `RawData` output, and then create a custom ROC curve using the raw data values.\n",
"\n",
"First, let's run the default ROC Curve test for comparison with later iterations:"
"First, let's run the default ROC Curve test for comparison with later iterations:\n"
]
},
{
@@ -411,7 +411,7 @@
"\n",
"### Precision-Recall Curve\n",
"\n",
"Then, let's try the same thing with the [Precision-Recall Curve](https://docs.validmind.ai/tests/model_validation/sklearn/PrecisionRecallCurve.html) test:"
"Then, let's try the same thing with the Precision-Recall Curve test:\n"
]
},
{
2 changes: 1 addition & 1 deletion notebooks/quickstart/quickstart_model_validation.ipynb
@@ -651,7 +651,7 @@
"- You run validation tests by calling [the `run_test` function](https://docs.validmind.ai/validmind/validmind/tests.html#run_test) provided by the `validmind.tests` module.\n",
"- Every test result returned by the `run_test()` function has a [`.log()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#TestResult.log) that can be used to send the test results to the ValidMind Platform.\n",
"\n",
"Here, we'll use the [`ClassImbalance` test](https://docs.validmind.ai/tests/data_validation/ClassImbalance.html) as an example:"
"Here, we'll use the `ClassImbalance` test as an example:\n"
]
},
{
Original file line number Diff line number Diff line change
@@ -10,11 +10,11 @@
"\n",
"You'll become familiar with the individual tests available in ValidMind, as well as how to run them and change parameters as necessary. Using ValidMind's repository of individual tests as building blocks helps you ensure that a model is being built appropriately. \n",
"\n",
"**For a full list of out-of-the-box tests,** refer to our [Test descriptions](https://docs.validmind.ai/developer/model-testing/test-descriptions.html) or try the interactive [Test sandbox](https://docs.validmind.ai/developer/model-testing/test-sandbox.html).\n",
"**For a full list of out-of-the-box tests and descriptions,** use the interactive [ValidMind test sandbox](https://docs.validmind.ai/developer/how-to/test-sandbox.html).\n",
"\n",
"<div class=\"alert alert-block alert-info\" style=\"background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;\"><span style=\"color: #083E44;\"><b>Learn by doing</b></span>\n",
"<br></br>\n",
"Our course tailor-made for developers new to ValidMind combines this series of notebooks with a more in-depth introduction to the ValidMind Platform — <a href=\"https://docs.validmind.ai/training/developer-fundamentals/developer-fundamentals-register.html\" style=\"color: #DE257E;\"><b>Developer Fundamentals</b></a></div>"
"Our course tailor-made for developers new to ValidMind combines this series of notebooks with a more in-depth introduction to the ValidMind Platform — <a href=\"https://docs.validmind.ai/training/developer-fundamentals/developer-fundamentals-register.html\" style=\"color: #DE257E;\"><b>Developer Fundamentals</b></a></div>\n"
]
},
{
@@ -276,9 +276,9 @@
"\n",
"### Run tabular data tests\n",
"\n",
"The inputs expected by a test can also be found in the test definition — let's take [`validmind.data_validation.DescriptiveStatistics`](https://docs.validmind.ai/tests/data_validation/DescriptiveStatistics.html) as an example.\n",
"The inputs expected by a test can also be found in the test definition — let's take `validmind.data_validation.DescriptiveStatistics` as an example.\n",
"\n",
"Note that the output of the [`describe_test()` function](https://docs.validmind.ai/validmind/validmind/tests.html#describe_test) below shows that this test expects a `dataset` as input:"
"Note that the output of the [`describe_test()` function](https://docs.validmind.ai/validmind/validmind/tests.html#describe_test) below shows that this test expects a `dataset` as input:\n"
]
},
{
@@ -326,9 +326,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The output above shows that [the class imbalance test](https://docs.validmind.ai/tests/data_validation/ClassImbalance.html) did not pass according to the value we set for `min_percent_threshold`.\n",
"The output above shows that the class imbalance test did not pass according to the value we set for `min_percent_threshold`.\n",
"\n",
"To address this issue, we'll re-run the test on some processed data. In this case let's apply a very simple rebalancing technique to the dataset:"
"To address this issue, we'll re-run the test on some processed data. In this case let's apply a very simple rebalancing technique to the dataset:\n"
]
},
{
@@ -398,7 +398,7 @@
"\n",
"Below we demonstrate how to retrieve the list of features with the highest correlation coefficients and use them to reduce the final list of features for modeling.\n",
"\n",
"First, we'll run [`validmind.data_validation.HighPearsonCorrelation`](https://docs.validmind.ai/tests/data_validation/HighPearsonCorrelation.html) with the `balanced_raw_dataset` we initialized previously as input, unmodified, for comparison with later runs:"
"First, we'll run `validmind.data_validation.HighPearsonCorrelation` with the `balanced_raw_dataset` we initialized previously as input, unmodified, for comparison with later runs:\n"
]
},
{
@@ -911,7 +911,7 @@
"In this next example, we'll focus on running the tests within the Model Development section of the model documentation. Only tests associated with this section will be executed, and the corresponding results will be updated in the model documentation.\n",
"\n",
"- Note the additional config that is passed to `run_documentation_tests()` — this allows you to override `inputs` or `params` in certain tests.\n",
"- In our case, we want to explicitly use the `vm_train_ds` for the [`validmind.model_validation.sklearn.ClassifierPerformance:in_sample` test](https://docs.validmind.ai/tests/model_validation/sklearn/ClassifierPerformance.html), since it's supposed to run on the training dataset and not the test dataset."
"- In our case, we want to explicitly use the `vm_train_ds` for the `validmind.model_validation.sklearn.ClassifierPerformance:in_sample` test, since it's supposed to run on the training dataset and not the test dataset.\n"
]
},
{
Original file line number Diff line number Diff line change
@@ -15,11 +15,11 @@
"- Ensuring that data used for training and testing the model is of appropriate data quality\n",
"- Ensuring that the raw data has been preprocessed appropriately and that the resulting final datasets reflect this\n",
"\n",
"**For a full list of out-of-the-box tests,** refer to our [Test descriptions](https://docs.validmind.ai/developer/model-testing/test-descriptions.html) or try the interactive [Test sandbox](https://docs.validmind.ai/developer/model-testing/test-sandbox.html).\n",
"**For a full list of out-of-the-box tests and descriptions,** use the interactive [ValidMind test sandbox](https://docs.validmind.ai/developer/how-to/test-sandbox.html).\n",
"\n",
"<div class=\"alert alert-block alert-info\" style=\"background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;\"><span style=\"color: #083E44;\"><b>Learn by doing</b></span>\n",
"<br></br>\n",
"Our course tailor-made for validators new to ValidMind combines this series of notebooks with a more in-depth introduction to the ValidMind Platform — <a href=\"https://docs.validmind.ai/training/validator-fundamentals/validator-fundamentals-register.html\" style=\"color: #DE257E;\"><b>Validator Fundamentals</b></a></div>"
"Our course tailor-made for validators new to ValidMind combines this series of notebooks with a more in-depth introduction to the ValidMind Platform — <a href=\"https://docs.validmind.ai/training/validator-fundamentals/validator-fundamentals-register.html\" style=\"color: #DE257E;\"><b>Validator Fundamentals</b></a></div>\n"
]
},
{
@@ -292,9 +292,9 @@
"\n",
"#### Run tabular data tests\n",
"\n",
"The inputs expected by a test can also be found in the test definition — let's take [`validmind.data_validation.DescriptiveStatistics`](https://docs.validmind.ai/tests/data_validation/DescriptiveStatistics.html) as an example.\n",
"The inputs expected by a test can also be found in the test definition — let's take `validmind.data_validation.DescriptiveStatistics` as an example.\n",
"\n",
"Note that the output of the [`describe_test()` function](https://docs.validmind.ai/validmind/validmind/tests.html#describe_test) below shows that this test expects a `dataset` as input:"
"Note that the output of the [`describe_test()` function](https://docs.validmind.ai/validmind/validmind/tests.html#describe_test) below shows that this test expects a `dataset` as input:\n"
]
},
{
@@ -330,9 +330,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The output above shows that [the class imbalance test](https://docs.validmind.ai/tests/data_validation/ClassImbalance.html) did not pass according to the value we set for `min_percent_threshold` — great, this matches what was reported by the model development team.\n",
"The output above shows that the class imbalance test did not pass according to the value we set for `min_percent_threshold` — great, this matches what was reported by the model development team.\n",
"\n",
"To address this issue, we'll re-run the test on some processed data. In this case let's apply a very simple rebalancing technique to the dataset:"
"To address this issue, we'll re-run the test on some processed data. In this case let's apply a very simple rebalancing technique to the dataset:\n"
]
},
{
@@ -402,7 +402,7 @@
"\n",
"You can use the output from a ValidMind test for further analysis — in the example below, we retrieve the list of features with the highest correlation coefficients and use them to reduce the final list of features for modeling.\n",
"\n",
"First, we'll run [`validmind.data_validation.HighPearsonCorrelation`](https://docs.validmind.ai/tests/data_validation/HighPearsonCorrelation.html) with the `balanced_raw_dataset` we initialized previously as input, unmodified, for comparison with later runs:"
"First, we'll run `validmind.data_validation.HighPearsonCorrelation` with the `balanced_raw_dataset` we initialized previously as input, unmodified, for comparison with later runs:\n"
]
},
{
Original file line number Diff line number Diff line change
@@ -549,13 +549,13 @@
"source": [
"We'll isolate the specific tests we want to run in `mpt`:\n",
"\n",
"- [`ClassifierPerformance`](https://docs.validmind.ai/tests/model_validation/sklearn/ClassifierPerformance.html)\n",
"- [`ConfusionMatrix`](https://docs.validmind.ai/tests/model_validation/sklearn/ConfusionMatrix.html)\n",
"- [`MinimumAccuracy`](https://docs.validmind.ai/tests/model_validation/sklearn/MinimumAccuracy.html)\n",
"- [`MinimumF1Score`](https://docs.validmind.ai/tests/model_validation/sklearn/MinimumF1Score.html)\n",
"- [`ROCCurve`](https://docs.validmind.ai/tests/model_validation/sklearn/ROCCurve.html)\n",
"- `ClassifierPerformance`\n",
"- `ConfusionMatrix`\n",
"- `MinimumAccuracy`\n",
"- `MinimumF1Score`\n",
"- `ROCCurve`\n",
"\n",
"As we learned in the previous notebook [2 — Start the model validation process](2-start_validation_process.ipynb), you can use a custom `result_id` to tag the individual result with a unique identifier by appending this `result_id` to the `test_id` with a `:` separator. We'll append an identifier for our champion model here:"
"As we learned in the previous notebook [2 — Start the model validation process](2-start_validation_process.ipynb), you can use a custom `result_id` to tag the individual result with a unique identifier by appending this `result_id` to the `test_id` with a `:` separator. We'll append an identifier for our champion model here:\n"
]
},
{
@@ -735,12 +735,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let’s now assess the models for potential signs of *overfitting* and identify any sub-segments where performance may be inconsistent with the [`OverfitDiagnosis` test](https://docs.validmind.ai/tests/model_validation/sklearn/OverfitDiagnosis.html).\n",
"Let’s now assess the models for potential signs of *overfitting* and identify any sub-segments where performance may be inconsistent with the `OverfitDiagnosis` test.\n",
"\n",
"Overfitting occurs when a model learns the training data too well, capturing not only the true pattern but also noise and random fluctuations, resulting in excellent performance on the training dataset but poor generalization to new, unseen data:\n",
"\n",
"- Since the training dataset (`vm_train_ds`) was used to fit the model, we use this set to establish a baseline performance for how well the model performs on data it has already seen.\n",
"- The testing dataset (`vm_test_ds`) was never seen during training, and here simulates real-world generalization, or how well the model performs on new, unseen data. "
"- The testing dataset (`vm_test_ds`) was never seen during training, and here simulates real-world generalization, or how well the model performs on new, unseen data. \n"
]
},
{
@@ -762,9 +762,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's also conduct *robustness* and *stability* testing of the two models with the [`RobustnessDiagnosis` test](https://docs.validmind.ai/tests/model_validation/sklearn/RobustnessDiagnosis.html). Robustness refers to a model's ability to maintain consistent performance, and stability refers to a model's ability to produce consistent outputs over time across different data subsets.\n",
"Let's also conduct *robustness* and *stability* testing of the two models with the `RobustnessDiagnosis` test. Robustness refers to a model's ability to maintain consistent performance, and stability refers to a model's ability to produce consistent outputs over time across different data subsets.\n",
"\n",
"Again, we'll use both the training and testing datasets to establish baseline performance and to simulate real-world generalization:"
"Again, we'll use both the training and testing datasets to establish baseline performance and to simulate real-world generalization:\n"
]
},
{
10 changes: 5 additions & 5 deletions notebooks/use_cases/agents/document_agentic_ai.ipynb
@@ -1031,11 +1031,11 @@
"\n",
"You run individual tests by calling [the `run_test` function](https://docs.validmind.ai/validmind/validmind/tests.html#run_test) provided by the `validmind.tests` module. Passing in our agentic model as an input, the tests below rate the prompt on a scale of 1-10 against the following criteria:\n",
"\n",
"- **[Clarity](https://docs.validmind.ai/tests/prompt_validation/Clarity.html)** — How clearly the prompt states the task.\n",
"- **[Conciseness](https://docs.validmind.ai/tests/prompt_validation/Conciseness.html)** — How succinctly the prompt states the task.\n",
"- **[Delimitation](https://docs.validmind.ai/tests/prompt_validation/Delimitation.html)** — When using complex prompts containing examples, contextual information, or other elements, is the prompt formatted in such a way that each element is clearly separated?\n",
"- **[NegativeInstruction](https://docs.validmind.ai/tests/prompt_validation/NegativeInstruction.html)** — Whether the prompt contains negative instructions.\n",
"- **[Specificity](https://docs.validmind.ai/tests/prompt_validation/NegativeInstruction.html)** — How specifically the prompt defines the task."
"- **Clarity** — How clearly the prompt states the task.\n",
"- **Conciseness** — How succinctly the prompt states the task.\n",
"- **Delimitation** — When using complex prompts containing examples, contextual information, or other elements, is the prompt formatted in such a way that each element is clearly separated?\n",
"- **NegativeInstruction** — Whether the prompt contains negative instructions.\n",
"- **Specificity** — How specifically the prompt defines the task.\n"
]
},
{