Describe the bug
Using Cube Cloud, I think there might be something wrong with the pre-aggregation warm up instances.
- I have a very simple scheduled_refresh_contexts in my cube.py, which depends on the databricks SDK.
- This runs fine on my worker and API instances, but not on my pre-aggregation warm up instances
- It's a little bit hard to debug, because the pre-aggregation warm up instance only seems to exist for a fraction of a second, maybe because it fails immediately. I did manage to get a screenshot
- I can definitely see that at least in my build job, the databricks.sdk is installed
To Reproduce
Steps to reproduce the behavior:
- Define your requirements.txt to install databricks-sdk:
    databricks-sdk
- Define a scheduled_refresh_contexts which depends on databricks in cube.py:
import os

from cube import config
from databricks.sdk import WorkspaceClient
# ...
@config('scheduled_refresh_contexts')
def scheduled_refresh_contexts() -> list[object]:
    databricks_workspace_client = WorkspaceClient(
        host=os.environ.get('DATABRICKS_HOST'),
        token=os.environ.get('CUBEJS_DB_DATABRICKS_TOKEN'),
    )
    # Fetch the list of schemas within the environment's catalog
    catalog_name = os.environ.get('CUBEJS_DB_DATABRICKS_CATALOG')
    schemas = databricks_workspace_client.schemas.list(catalog_name=catalog_name)
    # ...
    return security_contexts_array
- Enable pre-aggregation warm up in Cube Cloud
Expected behavior
- Dependencies from requirements.txt get installed before any instance runs
- After the env vars are updated in Cube Cloud, all contexts defined by scheduled_refresh_contexts should compile and pre-aggregate, and any query hitting a pre-aggregation should pass
Actual behavior
- This runs fine on my worker and API instances, but not on my pre-aggregation warm up instances
- It's a little bit hard to debug, because the pre-aggregation warm up instance only seems to exist for a fraction of a second, but when I do catch it, it says that databricks-sdk is not installed
- I can definitely see that at least in my build job, the databricks.sdk is installed
- The result is that NO pre-aggregations get built, unless the refresh_key triggers it, which can take time and leave the instance broken for extended periods of time
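Since the warm up instance disappears so quickly, one way to capture more detail might be a small diagnostic at the top of cube.py, placed before the databricks import. This is only a sketch: describe_module is a hypothetical helper, and it assumes that stderr output from cube.py ends up in the instance logs in Cube Cloud, which I have not confirmed.

```python
import importlib.util
import sys


def describe_module(module_name: str) -> dict:
    """Report whether a module can be resolved by the current interpreter."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "databricks") is itself missing.
        spec = None
    return {
        "module": module_name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "python": sys.executable,   # which interpreter is running cube.py
        "sys_path": list(sys.path), # where it looks for packages
    }


# Emitted when cube.py is loaded, so each instance type reports its own view.
print(describe_module("databricks.sdk"), file=sys.stderr)
```

Comparing this output between an API instance and a warm up instance should show whether they use a different interpreter or a different package search path.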
Minimally reproducible Cube Schema
Adding a cut out from my schema, but I don't think this is schema dependent. The important part is the requirements.txt and cube.py posted above
cubes:
  - name: gold_journal_lines
    sql_table: "{{ COMPILE_CONTEXT.securityContext.company_id | safe }}.gold__journal_lines"
    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true
      - name: net_amount
        sql: net_amount
        type: number
      - name: posted_on
        sql: posted_on
        type: time
    measures:
      - name: sum_net_amount
        type: sum
        sql: net_amount
    pre_aggregations:
      # Rollup Pre-aggregation with accounts and counterparties
      - name: journal_line_acc_cpt_rollup
        measures:
          - gold_journal_lines.sum_net_amount
        time_dimension: CUBE.posted_on
        granularity: month
        partition_granularity: year
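For context on how cube.py and this schema fit together: the sql_table above reads company_id from the security context, so each entry returned by scheduled_refresh_contexts needs to carry that key. The snippet below is an illustrative sketch only, not the elided code from cube.py. contexts_from_schema_names is a hypothetical helper, and it assumes one schema per company with the schema name used directly as company_id.

```python
def contexts_from_schema_names(schema_names):
    """Build one scheduled-refresh context per distinct schema name.

    Assumption: the schema name doubles as the company_id that the
    data model reads from COMPILE_CONTEXT.securityContext.
    """
    return [
        {"securityContext": {"company_id": name}}
        for name in sorted(set(schema_names))  # de-duplicate, stable order
    ]


# Example with made-up schema names
example_contexts = contexts_from_schema_names(["acme", "globex", "acme"])
```

Each returned context is compiled and pre-aggregated separately, which is why a failing import in cube.py blocks the builds for every company at once.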
Version:
Tried with 0.35.55, 1.0.1, 1.1.0
Happy to provide any additional details