Cube Cloud failing pre-aggregation warm up - requirements.txt not running? #8923

Open
@johache

Describe the bug
Using Cube Cloud, I think there might be something wrong with the pre-aggregation warm up instances.

  • I have a very simple scheduled_refresh_contexts in my cube.py, which depends on the Databricks SDK.
  • This runs fine on my worker and API instances, but not on my pre-aggregation warm up instances.
  • It's a little bit hard to debug, because the pre-aggregation warm up instance only seems to exist for a fraction of a second, maybe because it fails immediately. I did manage to get a screenshot.
  • I can definitely see that, at least in my build job, databricks.sdk is installed.

To Reproduce
Steps to reproduce the behavior:

  1. Define your requirements.txt to install databricks-sdk
databricks-sdk
  2. Define a scheduled_refresh_contexts which depends on databricks in cube.py
import os

from cube import config
from databricks.sdk import WorkspaceClient

# ...

@config('scheduled_refresh_contexts')
def scheduled_refresh_contexts() -> list[object]:
    databricks_workspace_client = WorkspaceClient(
        host  = os.environ.get('DATABRICKS_HOST'),
        token = os.environ.get('CUBEJS_DB_DATABRICKS_TOKEN')
    )

    # Fetch the list of schemas within the environment's catalog
    catalog_name = os.environ.get('CUBEJS_DB_DATABRICKS_CATALOG')
    schemas = databricks_workspace_client.schemas.list(catalog_name=catalog_name)

    # ...
    return security_contexts_array
  3. Enable pre-aggregation warm up in Cube Cloud

Expected behavior

  • Dependencies from requirements.txt get installed before any instance runs, including the pre-aggregation warm up instance.
  • After the env vars are updated on Cube Cloud, all contexts defined by scheduled_refresh_contexts should compile and pre-aggregate, and any query hitting a pre-aggregation should pass.

Actual behavior

  • This runs fine on my worker and API instances, but not on my pre-aggregation warm up instances.
  • It's a little bit hard to debug, because the pre-aggregation warm up instance only seems to exist for a fraction of a second, but when I do catch it, it says that databricks-sdk is not installed.
  • I can definitely see that, at least in my build job, databricks.sdk is installed.
  • The result is that NO pre-aggregations get built unless the refresh_key triggers a build, which can take time and leaves the instance broken for extended periods.
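
Because the warm up instance disappears so quickly, it may help to have cube.py report what its own interpreter can see before the databricks import runs. The following is only a minimal sketch: describe_module is a hypothetical helper (not part of Cube or the Databricks SDK), and it assumes that anything written to stderr while cube.py loads ends up in the instance logs, which is not confirmed here.

```python
import importlib.util
import sys


def describe_module(module_name: str) -> dict:
    """Report whether this interpreter can locate the given module."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # find_spec imports parent packages first and raises
        # ModuleNotFoundError when a parent (e.g. "databricks") is missing.
        spec = None
    return {
        "module": module_name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "python": sys.executable,
    }


# Hypothetical placement: at the top of cube.py, before
# "from databricks.sdk import WorkspaceClient".
# Written to stderr and flushed so the line is emitted even if the
# process exits right afterwards.
print(describe_module("databricks.sdk"), file=sys.stderr, flush=True)
```

Comparing the "python" and "origin" values between an API instance and a warm up instance would show whether the two run against different environments.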

Screenshots
Screenshot 2024-10-18 at 1 12 42 PM
Screenshot 2024-10-18 at 1 14 27 PM
Screenshot 2024-10-18 at 1 15 44 PM

Minimally reproducible Cube Schema
Adding an excerpt from my schema, but I don't think this is schema-dependent. The important parts are the requirements.txt and cube.py posted above.

cubes:
  - name: gold_journal_lines
    sql_table: "{{ COMPILE_CONTEXT.securityContext.company_id | safe }}.gold__journal_lines"

    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true
      - name: net_amount
        sql: net_amount
        type: number
      - name: posted_on
        sql: posted_on
        type: time
    measures:
      - name: sum_net_amount
        type: sum
        sql: net_amount

    pre_aggregations:
      # Rollup Pre-aggregation with accounts and counterparties
      - name: journal_line_acc_cpt_rollup
        measures:
          - gold_journal_lines.sum_net_amount
        time_dimension: CUBE.posted_on
        granularity: month
        partition_granularity: year
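
For context on why this cube depends on scheduled_refresh_contexts: sql_table reads company_id from COMPILE_CONTEXT.securityContext, so every refresh context has to carry that key. Below is a minimal sketch of shaping such a list, assuming each Databricks schema name is used directly as company_id. The elided part of cube.py may do this differently, and build_contexts is a hypothetical helper, not a Cube API.

```python
def build_contexts(schema_names: list[str]) -> list[dict]:
    """Wrap each schema name in a dict with a securityContext key.

    Assumption: the schema name is the company_id that the cube's
    sql_table expression expects.
    """
    return [{"securityContext": {"company_id": name}} for name in schema_names]


# Illustrative names only; real values would come from the schema listing.
contexts = build_contexts(["company_a", "company_b"])
```

If the warm up instance cannot import databricks.sdk, no such list is produced there, which would be consistent with no pre-aggregations being built for any company.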

Version:
Tried with 0.35.55, 1.0.1, 1.1.0

Happy to provide any additional details
