-
https://github.com/dagster-io/hooli-data-eng-pipelines/blob/3b43ceb2ae4bb20acc014ab2d305e238294b11a4/hooli_data_eng/assets/forecasting/__init__.py#L183 uses submitted jobs (it looks like Databricks jobs are used), while https://github.com/dagster-io/hooli-data-eng-pipelines/blob/3b43ceb2ae4bb20acc014ab2d305e238294b11a4/hooli_data_eng/resources/databricks.py uses a step launcher (it looks like an interactive cluster is created). How can I ensure that the cheaper DBU pricing (jobs instead of interactive) is being used when Dagster interacts with Databricks?
-
Both of these examples use job clusters, the lower DBU-priced resource. In the first example, the submit task call defines a new cluster for the run rather than attaching to an existing interactive (all-purpose) cluster. In the latter example, where a step launcher is being used, the cluster config likewise requests a new cluster. So in the case of https://github.com/dagster-io/hooli-data-eng-pipelines/blob/3b43ceb2ae4bb20acc014ab2d305e238294b11a4/hooli_data_eng/resources/databricks.py, where the step launcher is being used, the config on line 33 specifies that a new job cluster should be created to carry out the job.
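As a rough illustration of where that distinction shows up in config (a minimal sketch, not the exact contents of the linked files; the cluster sizes, node types, and script path below are placeholders, and the dagster-databricks config schema can differ slightly between versions):

```python
from dagster_databricks import databricks_pyspark_step_launcher

# Sketch of a step launcher resource that creates a *new* job cluster per step
# (job-compute DBU pricing). Using {"existing": "<cluster-id>"} under "cluster"
# would instead attach to an all-purpose (interactive) cluster at the higher
# rate. Other fields the step launcher needs (e.g. the local package path and
# storage settings) appear in the linked databricks.py but are omitted here.
db_step_launcher = databricks_pyspark_step_launcher.configured(
    {
        "run_config": {
            "run_name": "launch_step",
            "cluster": {
                "new": {
                    "size": {"num_workers": 1},
                    "spark_version": "11.2.x-scala2.12",
                    "nodes": {"node_types": {"node_type_id": "i3.xlarge"}},
                }
            },
        },
        "databricks_host": {"env": "DATABRICKS_HOST"},
        "databricks_token": {"env": "DATABRICKS_TOKEN"},
    }
)

# The same choice exists at the Databricks Jobs API level for the first
# example: a runs/submit payload with "new_cluster" gets a job cluster, while
# "existing_cluster_id" would attach to an interactive cluster instead.
submit_run_payload = {
    "run_name": "forecast_run",
    "tasks": [
        {
            "task_key": "forecast",
            "new_cluster": {
                "spark_version": "11.2.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 1,
            },
            "spark_python_task": {"python_file": "dbfs:/scripts/forecast.py"},
        }
    ],
}
```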
-
When will the cluster be started and stopped? If there are multiple assets and each needs its own Databricks cluster, can I use a different cluster for each asset? And if the cluster should be reused for (smaller) assets, is it possible to keep it running?
-
For me, the step launcher runs show up inside the Workflows tab (which would be a non-interactive DBU cluster). Is this the same case for you?

Unfortunately that is not currently possible. Databricks only allows one to reuse a job cluster for tasks that are in the same Databricks workflow. A Dagster step which is configured with the databricks_pyspark_step_launcher maps to a single Databricks workflow with a single task. Put differently, Databricks does not allow one to keep a job cluster alive between workflows, and every Dagster step configured with the databricks_pyspark_step_launcher is submitted to Databricks in its own workflow.

I too would like to be able to reuse job clusters across steps in a Dagster job, but it seems that the step launcher may not be the right abstraction. It's a bit difficult to imagine how one could …
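To make the "one step, one workflow, one job cluster" behavior concrete, here is a minimal sketch (the asset names and resource key are illustrative, not taken from the linked repo): each asset below is materialized via the step launcher, so each materialization is submitted as its own Databricks workflow and gets its own short-lived job cluster; the cluster created for the first asset has already terminated by the time the second asset runs.

```python
from dagster import Definitions, asset
from dagster_databricks import databricks_pyspark_step_launcher

# In practice this would be the configured step launcher from databricks.py;
# the unconfigured resource is used here only to keep the sketch short.
step_launcher = databricks_pyspark_step_launcher


@asset(required_resource_keys={"step_launcher"})
def model_features(context):
    # Submitted to Databricks as its own workflow with a single task,
    # running on a job cluster created just for this step.
    ...


@asset(required_resource_keys={"step_launcher"})
def model_forecast(context, model_features):
    # Submitted as a *separate* workflow with a *separate* job cluster;
    # the cluster used by model_features cannot be reused here.
    ...


defs = Definitions(
    assets=[model_features, model_forecast],
    resources={"step_launcher": step_launcher},
)
```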