Spark 3.4 upgrade #3602
Could you consider using a global variable or constant to manage version upgrades? I have noticed that the version was recently updated from 3.2 to 3.3, and now it is being upgraded to 3.4. Implementing a global variable or constant for the version number could simplify future modifications by reducing the number of files that need to be updated.
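One way to read this suggestion for the Python samples is a single shared constant that every job definition pulls from. The sketch below is only illustrative: the module layout, the constant name `SPARK_RUNTIME_VERSION`, and the helper `spark_resources` are assumptions, not part of this repository, and standalone YAML files would still need a separate templating or search-and-replace step.

```python
# Hypothetical shared module (for example spark_config.py); the file name,
# constant name, and helper are assumptions made for illustration only.
SPARK_RUNTIME_VERSION = "3.4"


def spark_resources(instance_type: str) -> dict:
    """Build a Spark job `resources` mapping from the shared version constant."""
    return {
        "instance_type": instance_type,
        "runtime_version": SPARK_RUNTIME_VERSION,
    }


if __name__ == "__main__":
    # The instance type here is just an example value.
    print(spark_resources("standard_e4s_v3"))
```

With something like this in place, a future upgrade would touch one line in the Python samples instead of one line per notebook.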
@Azure/ai-platform-docs can you please approve?
Pull Request Overview
This PR upgrades the Spark runtime version from 3.3.x to 3.4.x across SDK examples, pipelines, feature store samples, and CLI templates.
- Bumps `runtime_version` in Jupyter notebooks and YAML configurations from 3.3 to 3.4
- Removes the deprecated 3.3 version from documentation tables
- Clears notebook `execution_count` to null for fresh runs
Reviewed Changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated no comments.
File | Description |
---|---|
sdk/python/jobs/spark/submit_spark_standalone_jobs_managed_vnet.ipynb | Bumped Spark runtime to 3.4.0 |
sdk/python/jobs/spark/submit_spark_standalone_jobs.ipynb | Bumped Spark runtime to 3.4.0 |
sdk/python/jobs/spark/submit_spark_pipeline_jobs.ipynb | Bumped Spark runtime to 3.4.0 |
sdk/python/jobs/spark/automation/run_interactive_session_notebook.ipynb | Bumped Spark runtime to 3.4.0 |
sdk/python/jobs/pipelines/1k_demand_forecast_pipeline/aml-demand-forecast-mm-pipeline.ipynb | Updated supported Spark versions to only include 3.4.0 |
sdk/python/jobs/pipelines/1i_pipeline_with_spark_nodes/pipeline_with_spark_nodes.ipynb | Bumped Spark runtime to 3.4.0 |
sdk/python/featurestore_sample/project/fraud_model/pipelines/training_pipeline.yaml | Bumped Spark runtime to 3.4 |
sdk/python/featurestore_sample/project/fraud_model/pipelines/batch_inference_pipeline.yaml | Bumped Spark runtime to 3.4 |
sdk/python/featurestore_sample/automation-test/test_featurestore_vnet_samples.ipynb | Bumped Spark runtime to 3.4.0 |
sdk/python/featurestore_sample/automation-test/test_featurestore_sdk_samples.ipynb | Bumped Spark runtime to 3.4.0 |
sdk/python/featurestore_sample/automation-test/test_featurestore_cli_samples.ipynb | Bumped Spark runtime to 3.4.0 |
cli/monitoring/out-of-box-monitoring.yaml | Bumped Spark runtime to 3.4 |
cli/monitoring/model-monitoring-with-collected-data.yaml | Bumped Spark runtime to 3.4 |
cli/monitoring/generation-safety-quality-monitoring.yaml | Bumped Spark runtime to 3.4 |
cli/monitoring/custom-monitoring.yaml | Bumped Spark runtime to 3.4 |
cli/monitoring/azureml-e2e-model-monitoring/notebooks/model-monitoring-e2e.ipynb | Cleared execution counts and bumped runtime to 3.4 |
cli/monitoring/advanced-model-monitoring.yaml | Bumped Spark runtime to 3.4 |
cli/jobs/spark/serverless-spark-standalone-user-identity.yml | Bumped Spark runtime to 3.4 |
cli/jobs/spark/serverless-spark-standalone-managed-identity.yml | Bumped Spark runtime to 3.4 |
cli/jobs/spark/serverless-spark-standalone-default-identity.yml | Bumped Spark runtime to 3.4 |
cli/jobs/spark/serverless-spark-pipeline-user-identity.yml | Bumped Spark runtime to 3.4 |
cli/jobs/spark/serverless-spark-pipeline-managed-identity.yml | Bumped Spark runtime to 3.4 |
cli/jobs/spark/serverless-spark-pipeline-default-identity.yml | Bumped Spark runtime to 3.4 |
cli/jobs/pipelines/add-column-and-word-count-using-spark/pipeline.yml | Bumped Spark runtime to 3.4.0 |
cli/jobs/pipelines-with-components/shakespear_sample_and_word_count_using_spark/pipeline.yml | Bumped Spark runtime to 3.4.0 |
Comments suppressed due to low confidence (3)
cli/jobs/pipelines/add-column-and-word-count-using-spark/pipeline.yml:39
- [nitpick] Consider quoting the `runtime_version` value (e.g., `"3.4.0"`) to match the string format used in other YAML files and avoid potential parsing inconsistencies.
runtime_version: 3.4.0
sdk/python/featurestore_sample/project/fraud_model/pipelines/training_pipeline.yaml:31
- [nitpick] The version is specified as `"3.4"` here but `3.4.0` is used elsewhere; standardizing the version string across all configurations improves consistency.
runtime_version: "3.4"
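The two nitpicks above (quoting and a single version string) could be checked mechanically. The following is a rough sketch using only the standard library; the function name `check_runtime_versions` and the choice of expected value are assumptions, and it only inspects simple one-line `runtime_version:` entries rather than parsing YAML properly. Quoting matters because a YAML loader reads an unquoted `3.4` as a float, while an unquoted `3.4.0` is read as a string.

```python
import re

# Matches a YAML line such as `runtime_version: "3.4"` or `runtime_version: 3.4.0`,
# optionally followed by a trailing comment.
_RUNTIME_LINE = re.compile(
    r'''^\s*runtime_version:\s*(?P<quote>["']?)(?P<value>\d+(?:\.\d+)*)(?P=quote)\s*(?:#.*)?$'''
)


def check_runtime_versions(yaml_text: str, expected: str) -> list:
    """Return (line_number, problem) pairs for unquoted or unexpected versions."""
    problems = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        match = _RUNTIME_LINE.match(line)
        if match is None:
            continue  # not a runtime_version line
        value = match.group("value")
        if not match.group("quote"):
            problems.append((number, "unquoted value " + value))
        if value != expected:
            problems.append((number, "expected " + expected + ", found " + value))
    return problems


if __name__ == "__main__":
    sample = 'resources:\n  instance_type: standard_e4s_v3\n  runtime_version: 3.4.0\n'
    for line_number, problem in check_runtime_versions(sample, "3.4"):
        print(line_number, problem)
```

A check like this could run over the changed `.yml` and `.yaml` files in CI, although whether `"3.4"` or `"3.4.0"` is the preferred form is a decision for the maintainers.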
cli/monitoring/azureml-e2e-model-monitoring/notebooks/model-monitoring-e2e.ipynb:762
- [nitpick] Instead of only nulling `execution_count`, clear all outputs in the notebook before committing to keep the file clean and minimize merge conflicts.
"execution_count": null,
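For the notebook nitpick about clearing outputs, a minimal sketch of what that could look like is below. It assumes the notebook follows the nbformat v4 layout (code cells carry `outputs` and `execution_count` keys); the function name `clear_notebook_outputs` is hypothetical. In practice `jupyter nbconvert --clear-output --inplace <notebook>` does the same job without custom code.

```python
import json


def clear_notebook_outputs(notebook_json: str) -> str:
    """Return notebook JSON with code-cell outputs emptied and execution counts nulled."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None  # serialized as JSON null
    # One-space indentation and a trailing newline, to keep diffs small.
    return json.dumps(notebook, indent=1) + "\n"


if __name__ == "__main__":
    import sys

    # Usage: python clear_outputs.py path/to/notebook.ipynb (script name is illustrative)
    path = sys.argv[1]
    with open(path, encoding="utf-8") as handle:
        cleaned = clear_notebook_outputs(handle.read())
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(cleaned)
```

Markdown cells are left untouched, so only execution artifacts are stripped.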
These changes are approved for docs - they won't break our documentation. However, why are all the jobs failing?