Improve Create Profiler Dashboard CLI Usage#2319

Open
goodwillpunning wants to merge 26 commits into main from fix/profiler_cli_usage
Conversation

@goodwillpunning
Contributor

Changes

What does this PR do?

This PR updates the deployment of the profiler summary dashboard to be more consistent with other Lakebridge components, such as the recon job and dashboards.

Relevant implementation details

  • Improves the CLI prompts so that the extract file location, UC catalog, schema, and volume name are clearer
  • Adds a helper function to properly parse the extract file location and the UC volume upload location
  • Hooks the installation of the profiler dashboard into the Lakebridge installer/uninstaller components
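
The PR does not show the parsing helper itself, but its job of turning an extract file location plus UC catalog/schema/volume settings into a volume upload path could look roughly like this sketch (the function name and validation are hypothetical, not the actual Lakebridge implementation):

```python
from pathlib import Path


def build_volume_upload_path(extract_file_path: str, catalog: str, schema: str, volume: str) -> str:
    """Hypothetical sketch of the parsing helper described above; not the actual Lakebridge code."""
    extract = Path(extract_file_path).expanduser()
    if extract.suffix != ".duckdb":
        raise ValueError(f"Expected a .duckdb profiler extract, got: {extract.name}")
    # Unity Catalog volume files live under /Volumes/<catalog>/<schema>/<volume>/
    return f"/Volumes/{catalog}/{schema}/{volume}/{extract.name}"
```

For example, an extract at out/profile.duckdb with catalog "lakebridge", schema "profiler", and volume "extracts" would map to /Volumes/lakebridge/profiler/extracts/profile.duckdb.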

Caveats/things to watch out for when reviewing:

Linked issues

N/A

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge ...

Tests

  • manually tested
  • added unit tests
  • added integration tests

@goodwillpunning goodwillpunning self-assigned this Mar 1, 2026
@goodwillpunning goodwillpunning requested a review from a team as a code owner March 1, 2026 23:58
@goodwillpunning goodwillpunning added the feat/profiler Issues related to profilers label Mar 1, 2026
@github-actions

github-actions bot commented Mar 2, 2026

✅ 145/145 passed, 7 flaky, 4 skipped, 37m39s total

Flaky tests:

  • 🤪 test_installs_and_runs_local_bladebridge (19.248s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (14.378s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (15.033s)
  • 🤪 test_transpiles_informatica_to_sparksql (15.097s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (4.487s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (4.436s)
  • 🤪 test_transpile_teradata_sql (5.014s)

Running from acceptance #4011

Collaborator

@sundarshankar89 sundarshankar89 left a comment


Let us take a step back, I think the approach needs to be composable

I would approach this as two commands

databricks labs lakebridge profiler-sync

databricks labs lakebridge deploy-profiler-dashboard

profiler-sync is a prerequisite for deploy-profiler-dashboard.

profiler-sync creates the ingestion job and syncs the data into the profiler tables within Databricks. It is intentionally called sync because it creates the infrastructure if it does not already exist and triggers an incremental data load when run.

deploy-profiler-dashboard just creates the AI/BI dashboard so that it points against the tables or objects; it fails if the objects don't exist and asks the user to run profiler-sync before running deploy-profiler-dashboard.

Thoughts?
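
The fail-fast half of this proposal could be sketched as follows (table and function names here are hypothetical illustrations of the design, not Lakebridge code):

```python
# Hypothetical table names; the real profiler schema is defined by Lakebridge.
PROFILER_TABLES = {"profiler_summary", "profiler_details"}


def deploy_profiler_dashboard(existing_tables: set) -> str:
    """Fail fast when profiler-sync has not yet created the profiler tables."""
    missing = PROFILER_TABLES - set(existing_tables)
    if missing:
        raise RuntimeError(
            f"Missing profiler tables {sorted(missing)}; "
            "run 'databricks labs lakebridge profiler-sync' first."
        )
    return "dashboard-created"
```

The point of the split is that each command is idempotent on its own: sync converges the infrastructure and data, and deploy only ever reads what sync has produced.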

logging.info(f"Loading dashboard template from folder: {folder}")
dash_reference = f"{folder.stem}".lower()
dashboard_loader = ProfilerDashboardTemplateLoader(folder)
dashboard_json = dashboard_loader.load(source_system="synapse")
Collaborator


The source system needs to be an argument.

Contributor Author


Great catch!

logger.warning("Profiler Dashboard Config is empty.")
return
logger.info("Installing the profiler dashboard components.")
self._upload_profiler_extract(profiler_dashboard_config)
Collaborator


We need some error handling here to catch upload errors and check permissions.

Contributor Author


Agreed. Added better error handling to catch those exceptions.
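
A rough sketch of what that error handling could look like (the wrapper and the exception types caught here are illustrative placeholders, not the exact code added in this PR):

```python
import logging

logger = logging.getLogger(__name__)


def upload_profiler_extract(upload, local_path: str, volume_path: str) -> bool:
    """Run the given upload callable, logging and returning False on known failures.

    'upload' stands in for the real transfer call (e.g. a wrapper around the
    workspace files API); the exception types caught here are illustrative.
    """
    try:
        upload(local_path, volume_path)
    except PermissionError:
        logger.error("No permission to write to %s; check volume grants.", volume_path)
        return False
    except FileNotFoundError:
        logger.error("Extract file not found: %s", local_path)
        return False
    return True
```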

logger.info("Uninstalling profiler dashboard components.")
self._remove_dashboards()
self._remove_jobs()
logging.info(
Collaborator


Suggested change
logging.info(
logger.info(

self._installation = installation
self._install_state = install_state
self._product_info = product_info
self._table_deployer = table_deployer
Collaborator


I don't see it used anywhere; maybe we can remove it.

Suggested change
self._table_deployer = table_deployer

Contributor Author


Good catch. Removed.

@codecov

codecov bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 57.80731% with 127 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.68%. Comparing base (41c3f9a) to head (9ed2790).

Files with missing lines Patch % Lines
...s/labs/lakebridge/deployment/profiler_dashboard.py 34.88% 56 Missing ⚠️
...databricks/labs/lakebridge/deployment/dashboard.py 50.00% 50 Missing and 3 partials ⚠️
src/databricks/labs/lakebridge/install.py 75.67% 8 Missing and 1 partial ⚠️
src/databricks/labs/lakebridge/cli.py 11.11% 8 Missing ⚠️
.../labs/lakebridge/assessments/dashboards/execute.py 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2319      +/-   ##
==========================================
+ Coverage   66.41%   67.68%   +1.26%     
==========================================
  Files          99       99              
  Lines        9094     9271     +177     
  Branches      974      986      +12     
==========================================
+ Hits         6040     6275     +235     
+ Misses       2878     2816      -62     
- Partials      176      180       +4     


Collaborator

@sundarshankar89 sundarshankar89 left a comment


@goodwillpunning everything looks good. There is a small improvement I would like to make: make the job deployer and dashboard deployer generic so they can be used by both the profiler and the reconcile job. We can iterate on this; I don't want to block this PR.

Comment on lines +36 to +39
| Amazon Redshift | ❌ |
| Oracle | ❌ |
| Microsoft SQL Server | ❌ |
| Snowflake | ❌ |
Collaborator


Should we leave this here? It may be better to update it later or change it to "coming soon".



def _ingest_table(extract_location: str, source_table_name: str, target_table_name: str) -> None:
def ingest_table(extract_location: str, source_table_name: str, target_table_name: str) -> None:
Collaborator


@goodwillpunning is there a reason we changed all the method signatures from private to public?

Contributor Author


To make them easy to unit test without disabling pylint (# pylint: disable=import-private-name is forbidden by the build check).

Comment on lines +161 to +164
if not os.path.exists(filepath):
    raise FileNotFoundError(f"Could not find dashboard template matching {source_system}.")
with open(filepath, "r", encoding="utf-8") as f:
    return json.load(f)
Collaborator


Suggested change
if not os.path.exists(filepath):
    raise FileNotFoundError(f"Could not find dashboard template matching {source_system}.")
with open(filepath, "r", encoding="utf-8") as f:
    return json.load(f)
try:
    with open(filepath, "r", encoding="utf-8") as f:
        return json.load(f)
except FileNotFoundError:
    raise FileNotFoundError(f"Could not find dashboard template matching {source_system}.")
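
The suggestion above, assembled into a self-contained form with explicit exception chaining via "from" (the function name is illustrative, not the one in the PR):

```python
import json


def load_template(filepath: str, source_system: str) -> dict:
    # Same EAFP idea as the suggestion: attempt the open and translate the
    # failure, chaining the original error so the traceback keeps the cause.
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError as err:
        raise FileNotFoundError(
            f"Could not find dashboard template matching {source_system}."
        ) from err
```

This avoids the separate os.path.exists check (and its race window between check and open) by letting the open itself report the missing file.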

"""

# Load the dashboard template
logging.info(f"Loading dashboard template from folder: {folder}")
Collaborator


Suggested change
logging.info(f"Loading dashboard template from folder: {folder}")
logger.info(f"Loading dashboard template from folder: {folder}")

try:
    dashboard = self._ws.lakeview.create(dashboard=dashboard)
except ResourceAlreadyExists:
    logging.info("Dashboard already exists! Removing dashboard from workspace location.")
Collaborator


Suggested change
logging.info("Dashboard already exists! Removing dashboard from workspace location.")
logger.info("Dashboard already exists! Removing dashboard from workspace location.")

Comment on lines +441 to +445
def _configure_new_profiler_dashboard_installation(self) -> ProfilerDashboardConfig:
    default_config = self._prompt_for_new_profiler_dashboard_installation()
    self._save_config(default_config)
    return default_config

Collaborator


No need for this.

Suggested change
def _configure_new_profiler_dashboard_installation(self) -> ProfilerDashboardConfig:
    default_config = self._prompt_for_new_profiler_dashboard_installation()
    self._save_config(default_config)
    return default_config

Comment on lines +111 to +115
return [
    (job_name, int(job_id))
    for job_name, job_id in self._install_state.jobs.items()
    if job_name.startswith(_PROFILER_DASHBOARD_PREFIX) and job_name != PROFILER_INGESTION_JOB_NAME
]
Collaborator


Suggested change
return [
    (job_name, int(job_id))
    for job_name, job_id in self._install_state.jobs.items()
    if job_name.startswith(_PROFILER_DASHBOARD_PREFIX) and job_name != PROFILER_INGESTION_JOB_NAME
]
return [(name, job_id) for name, job_id in self._get_jobs() if name != PROFILER_INGESTION_JOB_NAME]

Comment on lines 70 to 83
def test_upload_duckdb_to_uc_volume_invalid_volume_path(
dashboard_manager: DashboardManager,
dashboard_manager: ProfilerDashboardManager,
mocked_workspace_client: WorkspaceClient,
):
ws = mocked_workspace_client
result = dashboard_manager.upload_duckdb_to_uc_volume(
local_file_path="file.duckdb", volume_path="invalid_path/myfile.duckdb"
config = ProfilerDashboardConfig(
source_tech="synapse",
extract_file_path="file.duckdb",
metadata_config=ProfilerDashboardMetadataConfig(catalog="lakebridge", schema="profiler", volume="invalid_path"),
)
result = dashboard_manager.upload_duckdb_to_uc_volume(config)
assert result is False
ws.files.upload.assert_not_called()

Collaborator


We no longer need this test since we don't branch on and test this particular path in dashboard_manager.py.

Co-authored-by: SundarShankar89 <72757199+sundarshankar89@users.noreply.github.com>
goodwillpunning and others added 2 commits March 6, 2026 14:25
Co-authored-by: SundarShankar89 <72757199+sundarshankar89@users.noreply.github.com>
Collaborator

@gueniai gueniai left a comment


LGTM

@sundarshankar89
Collaborator

@goodwillpunning can you resolve the conflicts? Then this will be ready to merge.


Labels

feat/profiler Issues related to profilers


3 participants