Releases: datahub-project/datahub
v0.8.18
DataHub Release 0.8.18 is here!
Release Highlights
- Metadata Service Authentication: Make authenticated requests to the Metadata Service APIs (GraphQL + Rest.li)
- Redshift Lineage: Out-of-the-box support for ingesting Dataset->Dataset lineage from Redshift system tables. Includes Tables, Views, and COPY from S3
- Apache Nifi Connector (Beta): Integration with Apache Nifi to extract DataJobs and DataFlows! Read the source docs here. This source is currently incubating in beta.
- Mode Connector (Beta): Integration with Mode Analytics to extract reports, charts, and more! Read the source docs here. This source is currently incubating in beta.
- Add Aspects without a fork: This is a major milestone towards No-Code UI
- Watch the No Code UI Sneak Peek
- Glossary Term Transformer: Allows users to add tags or glossary terms to entities based on a regex match filter (Shoutout to Community Member ecooklin!)
- Bug Fixes:
- [metadata service] Empty search query fails to resolve
- [metadata service] Log4j vulnerability addressed! We highly recommend upgrading to the latest release.
- [metadata ingestion] [bigquery] Fix handling of partitioned & snapshotted tables for lineage usage, and basic table indexing.
- [metadata-service] [recommendations] Fix issue where recently viewed and most popular recommendations were not showing up when user urn contains special chars.
- [metadata ingestion] Add config to specify ca certificate path for datahub-rest sink
- [metadata ingestion][snowflake] Handling for special characters in snowflake databases and schemas.
- [ui] Fix Groups page not showing asset ownership correctly
- [ui] Fix issue where markdown links were not clickable.
- [metadata service] Improve search & recommendations performance by ~50%, homepage load by ~50%.
- [cli] Fix issue where deletes by search could not accept an auth token
- [metadata service][policies] Fix invalid Tag creation policy
- [metadata service][upgrade] Fix Spring injection of Entity Client inside datahub-upgrade
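As a rough illustration of the Metadata Service Authentication highlight above, the sketch below builds (but does not send) an authenticated GraphQL request using only the Python standard library. The host and port, the placeholder token, and the `Authorization: Bearer` header scheme are assumptions made for this example; check the authentication docs for the exact format your deployment expects.

```python
import json
import urllib.request


def build_graphql_request(base_url: str, token: str, query: str) -> urllib.request.Request:
    """Build a POST request for a GraphQL endpoint with a bearer token attached."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/graphql",
        data=body,
        headers={
            # Assumed header scheme; not confirmed by these release notes.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: replace with your own Metadata Service URL and token.
request = build_graphql_request("http://localhost:8080", "<access-token>", "{ __typename }")

# Passing `request` to urllib.request.urlopen would actually send it.
```

Keeping request construction separate from sending makes it easy to inspect the headers before pointing the call at a real instance.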
Backwards Incompatible Changes
- The standalone Spring GraphQL Service has been removed. (Replaced in full by Metadata Service GraphQL API)
New Contributors
- @robscriva made their first contribution in #3600
- @adriangb made their first contribution in #3582
- @bartlomiejolma made their first contribution in #3650
- @anshbansal made their first contribution in #3653
- @ecooklin made their first contribution in #3657
What's Changed
- style(react-app): add default monospace font to font-family by @robscriva in #3600
- feat(boot): Ingest datahub root user info on boot by @jjoyce0510 in #3603
- [refactor] - Remove GMS GraphQL Service by @arunvasudevan in #3605
- feat(auth): Metadata Service Authentication! by @jjoyce0510 in #3598
- docs:remove hubspot form and instead link to acryldata.io by @jeffmerrick in #3488
- fix(docs): Move transformers to be under metadata ingestion by @aseembansal-gogo in #3591
- fix(bigquery-usage): Fix filters and event joining logic. by @varunbharill in #3610
- feat(cli): adding a put command and docs by @swaroopjagadish in #3614
- feat(elastic): adding es logo by @gabe-lyons in #3611
- feat(profiler): dynamically combine queries by @hsheth2 in #3572
- doc(components): Adding DataHub components overview by @jjoyce0510 in #3606
- fix(java client): Fix Profiling NPE + misc improvements by @jjoyce0510 in #3621
- fix(docs-website): fix incorrect managed url by @jeffmerrick in #3618
- fix(ingest): rectify platform urn in kafka connect source by @mayurinehate in #3624
- docs(okta): Added Okta Logout Settings by @serefacet in #3627
- fix(search): Fix issue when query is empty by @dexter-mh-lee in #3620
- fix(redshift-usage): Add docs for redshift usage ingestion. by @varunbharill in #3617
- fix(ci): pin great expectations version by @swaroopjagadish in #3629
- fix(delete): Remove logic that adds an invalid filter for platform field by @dexter-mh-lee in #3619
- feat(metadata-service): support for custom model extensions without forks by @shirshanka in #3630
- fix(kafka-producer): fix debug logging by @claudio-benfatto in #3626
- fix(tests): fix typo in test name by @adriangb in #3582
- feat(cfg): Add configurable GCP log page size by @jjoyce0510 in #3556
- fix(recommendations): Fix issue with recently viewed and most popular recs not showing up by @dexter-mh-lee in #3631
- fix(ingestion): Add config to specify ca certificate path for datahub-rest sink by @dexter-mh-lee in #3632
- fix(ingest): workaround great-expectations compatibility issue by @hsheth2 in #3634
- fix(ingestion): Handling for special characters in snowflake databases and schemas. by @rslanka in #3635
- fix(group ownership): Fixing Groups Profile ownership by @jjoyce0510 in #3638
- feat(autorender): Auto render aspects that don't have frontend components in the UI by @gabe-lyons in #3597
- docs(business glossary): document the business glossary file format by @gabe-lyons in #3639
- fix(ingestion): Enhance supported and unsupported base_objects_accessed for Snowflake Usage by @rslanka in #3608
- feat(quickstart): Simplify docker generate and compare script by @EnricoMi in #3434
- fix(docs): small fixes to docs and docker images for custom metadata … by @swaroopjagadish in #3640
- fix(mongodb): enable version check for document size filter. by @varunbharill in #3644
- docs: Update to DataHub Adopter logos & Townhall details by @maggiehays in #3648
- feat(build): adds support for incremental build in ingestion by @swaroopjagadish in #3647
- fix(description): fix issue where markdown links are unclickable by @gabe-lyons in #3646
- fix(schema): fix bug where key/value toggle would appear on schema tabs with no fields by @gabe-lyons in #3643
- feat(build): Preflight script for metadata ingestion setup on m1 by @treff7es in #3652
- docs(graphql) Adding additional GraphQL docs by @jjoyce0510 in #3649
- docs: correct title of postgres gms by @bartlomiejolma in #3650
- fix(cli): fix for deletion cli by @anshbansal in #3653
- fix(metadata-io) Adds docker engine configuration checks before running docker-based tests by @pedro93 in #3654
- fix(model): Remove unused PDL from pre-nocode days by @dexter-mh-lee in #3659
- fix(docs): fix docs build on m1 by @anshbansal in #3662
- feat(ingest): add --strict-warnings option by @hsheth2 in #3665
- fix(search): Improve search and recs performance by @dexter-mh-lee in #3660
- feat(metadata-model): adding metadata model doc generation and upload… by @swaroopjagadish in #3667
- fix(ingestion): black formatting by @hsheth2 in #3676
- fix(metadata-ingestion): fix requirements for m1 preflight checks by @gabe-lyons in #3677
- fix(kafka): Add back changes to centralize kafka config by @dexter-mh-lee in #3675
- feat(ingestion): anonymous usage stats by @kevinhu in #3668
- docs(scheduling): re-arrange docs related to scheduling, lineage, CLI by @anshbansal in #3669
- feat(delete): support deleting by searc...
v0.8.17
Notable Changes
- Added Recommendations and redesigned the home page!
- Modular way to add recommendations throughout the application
- Recommendation modules for top platforms, recently viewed, popular entities, top tags/terms were added to home page
- Search page also has top tags/terms module on the bottom
- Ingestion Sources
- DBT enhancements
- Creating dbt platform entities to capture dbt node types such as models, tests, source, seed, etc. linking dbt entities with other dbt or underlying platform entities.
- OpenAPI specs
- Kafka Connect (Regex based transformers, BigQuery sink)
- Trino Usage (Starburst)
- Improved lineage viz performance and lineage viz UX
- Improved layout logic
- Nodes can be dragged and dropped
- Fixes for delete API not always deleting all of an entity's data
- Improved documentation for adding a custom Metadata Ingestion Source
- Fixes description rendering for Charts, Dashboards, Flows, Jobs
- Add YAML configuration file for Metadata Service
- Filter search results by Sub-Type (Looker Explore, View, etc)
- Support proxying DataHub Frontend requests to Metadata Service at /api/gms
- Multi-platform (x86, arm64) support for Docker images (Apple M1 support)
- Graph Service: DGraph support (phase 1)
What's Changed
- fix(docs): fix image paths and company logo link by @jeffmerrick in #3435
- feat(docs-site): two small tweaks by @gabe-lyons in #3437
- feat(ingestion): support custom properties to be ingested via business glossary yaml by @gabe-lyons in #3438
- fix(restli entity client): fix case where sortCriterion is null by @gabe-lyons in #3436
- feat(lineage): improved lineage performance + simplified layout logic + some easter eggs by @gabe-lyons in #3357
- docs(metamodel): added DataHub's metadata model diagram by @swaroopjagadish in #3449
- fix(tag+terms): improved error messaging & rules on tag + term mutations by @gabe-lyons in #3448
- fix(browse): disable breadcrumb links on non-browsable entities by @gabe-lyons in #3447
- fix(ingest): fix lookml derived tables parsing by @remisalmon in #3443
- docs(docs-site): small nits for docs site homepage by @gabe-lyons in #3444
- perf(ingest): lazy load ingestion plugins by @hsheth2 in #3430
- Fix docs website by @jeffmerrick in #3446
- fix(restore): Fix restore backup jobs by @dexter-mh-lee in #3445
- fix(ingest): lineage for Airflow subdags by @kevinhu in #3351
- docs: Update to Q3 2021 accomplishments by @maggiehays in #3420
- fix(bigquery): Add gcp logging dependency for bigquery source. by @varunbharill in #3451
- build(frontend): unzip depend on yarnBuild by @gabe-lyons in #3452
- feat(react): add handy webpack analyze command by @gabe-lyons in #3454
- test(CI): show test results on GitHub by @EnricoMi in #3362
- docs(transformers): add exemple of custom tag function by @WaStCo in #3354
- docs: add guide for using custom sources by @DSchmidtDev in #3324
- feat(dbt-ingestion): added possibility to skip specific models by @AndreasTA-AW in #3340
- fix(mongodb): Support filtering mongodb documents as per size. by @varunbharill in #3456
- fix(mysql): Update default mysql collation to utf8mb4_bin by @jjoyce0510 in #3459
- fix(ingestion): Workaround for Python 3.8/3.9 mypy invalid syntax issue with airflow 2.2.0 by @rslanka in #3460
- fix(ui): Fixing UI User + Group display name by @jjoyce0510 in #3461
- fix(react): fix up yarn test error reporting by @gabe-lyons in #3462
- docs(frontend): remove confusing suggestion to manually create users by @gabe-lyons in #3465
- docs: Overhaul of DataHub Features page by @maggiehays in #3439
- docs: Update TownHall Agenda and TownHall History by @maggiehays in #3463
- fix(tags): fix links to tags when there are special chars in the urls by @gabe-lyons in #3464
- fix(CI): Stabalize gradle build by @EnricoMi in #3413
- docs: update next Townhall date in README.md by @maggiehays in #3466
- perf(react bundle): decrease bundle size by 15% by @gabe-lyons in #3468
- fix(graphql): fixing Graphql engine factory when analytics are disabled by @gabe-lyons in #3467
- feat(recommendations): Recommendations infra P1 by @jjoyce0510 in #3455
- refactor(styling): Improving recommendation Tag / Search query list styling by @jjoyce0510 in #3472
- fix(docs): fix transformer doc example by @aseembansal-gogo in #3469
- fix(ingest): redshift source gets external table types properly by @treff7es in #3371
- fix(recs): Remove removed entities from aggregation by @dexter-mh-lee in #3473
- fix(ui): fix double formatting of entity count on home page by @jjoyce0510 in #3474
- fix(subtypes): fix case where subtypes are not being fetched for leaf datasets by @gabe-lyons in #3476
- feat(ingestion): User configurable dataset profiling. by @rslanka in #3453
- styling(ui): improve tag list, glossary term list recommendation styling by @jjoyce0510 in #3475
- feat(ui): Provide filtering capability for Sub Types inside the UI by @jjoyce0510 in #3479
- fix(ingest): correctly support multiple snowflake databases by @hsheth2 in #3482
- fix(datajobs): fetch dataflow properties from a relationship by @gabe-lyons in #3487
- fix(fk): fix schemaField urn construction in foreign keys by @gabe-lyons in #3486
- fix(fk): trim whitespace from fk constraints in the case the fieldspec has leading or trailing whitespace characters by @gabe-lyons in #3485
- feat(dbt): add dbt logo and platform. by @varunbharill in #3483
- feat(lineage): some ux improvements to lineage interactions by @gabe-lyons in #3478
- refactor(nocode): Final part of No-Code cleanup by @jjoyce0510 in #3477
- fix(browse paths): Adjust Default browse path logic for datasets by @jjoyce0510 in #3495
- fix(lineage backend): fix ownership timestamps by @gabe-lyons in #3498
- tests(smoke): introducing first isolated smoke test: updating tags & terms by @gabe-lyons in #3496
- feat(graphql): extend entity client to support aspect methods directly via java by @gabe-lyons in #3489
- fix(aspects): fix null aspects case by @gabe-lyons in #3501
- Docs: Update to Slack & Townhall details by @maggiehays in #3502
- refactor(profiler): add PerfTimer class and fix typos by @hsheth2 in #3497
- fix tiny typo by @andrewm4894 in #3484
- fix(ingestion): Glue job names by @kevinhu in #3503
- fix(fk): fix foreign key styling with modals by @gabe-lyons in #3500
- docs: add path fix for 'command not found' by @dannylee8 in #3490
- docs: nit, grammar by @dannylee8 in #3491
- docs: nit by @dannylee8 in #3492
- Docs: nits by @dannylee8 in #3493
- add tooltip for owner category in dataset profile page by @saxo-lalrishav in #3470
- feat(ingest) : kafka connect source improvements by @mayurinehate in #3481
- feat(ingest): adding support for read-modify-write capabilities durin… by @swaroopjagadish in #3506
- feat(dbt): Dbt enhancements - dbt nodes, lineage, subtype, etc. by @varunbharill in #3519
- docs (Metadata Model): nits by @dannylee8 in #3525
- fix(ingestion): Enhance logging and error-handling in bigquery usage connector. by @rslanka in https://github.com/linkedin/datahub/pul...
DataHub v0.8.16
Release Highlights
- Important bug-fixes: properties for DataJob and DataFlow, descriptions for Datasets should now correctly show in the UI
- Search redesign! Single search experience across all entity types with left filter bar
- Added searchAcrossEntities endpoint on both GraphQL and Rest.li that pulls search results for all entity types and mixes them together
- Dataset level lineages - Added support for ingesting dataset level lineages for bigquery. Added support for linking external tables in redshift to the corresponding table in the external data catalog.
- Performance optimization: graphql will now directly call the entity service instead of calling the entity resource over http to hydrate graphql models.
- The “filter” input model used for the “search” API now supports disjunctive normal form (OR of ANDs). The previous filter model (criteria array) should continue to work as expected.
- Adding foundations (models) for search insights, or highlights shown in the search result previews.
- Add owner experience improvements: using full text search to find users and groups.
- User & Group Management Screens!
- View all users (and those who have logged in)
- View all groups
- Create new groups
- Add and remove group members
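To make the disjunctive normal form mentioned above concrete, here is a minimal in-memory sketch of how an "OR of ANDs" filter can be evaluated. The type names, the `field`/`value` keys, and the sample documents are hypothetical and do not reflect the actual filter model used by the search API.

```python
from typing import Dict, List

Criterion = Dict[str, str]      # hypothetical shape: {"field": ..., "value": ...}
Conjunction = List[Criterion]   # every criterion must match (AND)
DnfFilter = List[Conjunction]   # at least one conjunction must match (OR)


def matches(doc: Dict[str, str], dnf: DnfFilter) -> bool:
    """Return True if the document satisfies any one of the AND-groups."""
    return any(
        all(doc.get(criterion["field"]) == criterion["value"] for criterion in conjunction)
        for conjunction in dnf
    )


# (platform == bigquery AND origin == PROD) OR (platform == snowflake)
example_filter: DnfFilter = [
    [{"field": "platform", "value": "bigquery"}, {"field": "origin", "value": "PROD"}],
    [{"field": "platform", "value": "snowflake"}],
]

prod_bigquery = {"platform": "bigquery", "origin": "PROD"}
dev_bigquery = {"platform": "bigquery", "origin": "DEV"}
dev_snowflake = {"platform": "snowflake", "origin": "DEV"}
```

In this sketch a plain criteria array corresponds to a filter with a single AND-group, and an empty filter matches nothing.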
Breaking Changes
None
What's Changed
- feat(ui): Improve add owner search experience by @jjoyce0510 in #3306
- (fix) Set ebean transaction level to be repeatable read by @xdl in #3285
- fix(fonts): fix manrope styling by @gabe-lyons in #3311
- docs(datahub-frontend): add build instructions for the datahub-frontend docker image by @thebouv in #3314
- feat(ingest): support for primary and foreign key extraction from sql sources by @swaroopjagadish in #3316
- feat(transform): adds replace_existing config to set_dataset_browse_path by @sgomezvillamor in #3313
- feat(redshift): added ability to extract external schema from Redshift spectrum by @varunbharill in #3321
- fix(docs): patch link to Airflow Docker compose file by @kevinhu in #3322
- docs: Fix topic_pattern typo in kafka ingestion docs by @serefacet in #3317
- fix(graphql): add ElasticSearch path prefix configuration by @zhoxie-cisco in #3297
- fix(ingest): more robust error handling in lookml sql parsing by @swaroopjagadish in #3325
- fix(ingest): Fix sasl exception for hive ingestion by @serefacet in #3326
- fix(ingest): no error when there are no partition keys by @aseembansal-gogo in #3328
- fix(docs): fix graphql deprecated comment by @gabe-lyons in #3327
- feat(dbt-ingestion): added tags and owner from dbt by @AndreasTA-AW in #3270
- fix(oidc): Tolerate null emails by @jjoyce0510 in #3330
- feat(Snowflake Lineage Ingestion) by @rslanka in #3331
- feat(ingest): support user group filtering for Azure AD by @vlavorini in #3312
- feat(ingest): Redash add parse_table_names_from_sql feature and multiple refactor by @taufiqibrahim in #3267
- feat(ingest): add support for github and looker links in looker views… by @swaroopjagadish in #3332
- fix(git-ignore): Git ignore generated python and avro artifacts by @dexter-mh-lee in #3320
- fix(ingestion): make dbt tag prefix configurable by @remisalmon in #3334
- feat(ingest): add trino source in metadata-ingestion by @mayurinehate in #3307
- feat(ingestion): support Airflow cluster config by @hsheth2 in #3336
- feat: add support for specialization of models through subtypes with … by @swaroopjagadish in #3338
- feat(search): Redesign search page - left filter pane by @dexter-mh-lee in #3337
- feat(users & groups): User & Groups Management GraphQL APIs + UI by @jjoyce0510 in #3318
- fix(pk + autocomplete): some ui fixes by @gabe-lyons in #3347
- fix(urns): prevent corrupted urns from being created by @gabe-lyons in #3348
- fix(ingestion-docker): Codegen and build again by @dexter-mh-lee in #3342
- docs(ingest): fix trino doc by @mayurinehate in #3339
- fix(docker-quickstart): Fix volume mount paths when using quickstart by @dexter-mh-lee in #3341
- fix(autocomplete): Fix empty autocomplete server error by @jjoyce0510 in #3346
- fix(Add custom elastic field mappings for all timeseries fields) by @rslanka in #3350
- fix(gitignore): Fix gitignore to ignore whole directory by @dexter-mh-lee in #3361
- fix(mce_builder): deleted alias by @vlavorini in #3356
- feat(data-platform): Add science and airflow data platform by @dexter-mh-lee in #3363
- fix(ui): fix url encoding issues by @gabe-lyons in #3359
- fix(gitignore): Update gitignore again - remove metadata-ingestion objects by @dexter-mh-lee in #3365
- fix(ci): add run_id to the task instance constructor for airflow by @swaroopjagadish in #3366
- fix(aws-deploy-docs): Fix documentation for elasticsearch by @dexter-mh-lee in #3360
- fix(bigquery_usage): Gracefully failing while parsing GCP log events. by @varunbharill in #3367
- feat(ingest): allow disabling sample values in profiling by @aseembansal-gogo in #3355
- fix(docs): fix docs for developing on metadata ingestion by @aseembansal-gogo in #3353
- test(CI): Timeout build job by @EnricoMi in #3364
- docs(OIDC): add note that root user is still accessible by @aseembansal-gogo in #3372
- test(metadata-io): Run metadata-io tests in parallel by @EnricoMi in #3358
- test(ElasticSearch): Retry ES requests by @EnricoMi in #3377
- fix(ingest): redshift usage properly count queries by @treff7es in #3370
- feat(subtypes): Support Viz for "view" subtypes by @jjoyce0510 in #3376
- fix(graphql): Correctly return tags and legacy global tags field by @jjoyce0510 in #3378
- fix(ingest): fixing support for kafka key schemas when only key schemas are present by @swaroopjagadish in #3379
- fix(search): Small bug fixes for search redesign by @dexter-mh-lee in #3381
- test(airflow): remove unneeded execution_date parameter from test by @hsheth2 in #3368
- feat(ingest): add mariadb as possible source by @aseembansal-gogo in #3245
- fix(search): fixing user and group links in search results by @gabe-lyons in #3383
- fix(subtypes): Fix subtypes tab visibility by @jjoyce0510 in #3386
- Revert "test(ElasticSearch): Retry ES requests" by @gabe-lyons in #3385
- Revert "Revert "test(ElasticSearch): Retry ES requests"" by @gabe-lyons in #3392
- Adding kafka connect data platform by @jjoyce0510 in #3388
- Replace big query logo with the latest by @jjoyce0510 in #3387
- oidc: Add "name" claim extraction if present by @jjoyce0510 in #3384
- feat(ingest): teaching lookml source that athena has 2 parts in its dataset names by @swaroopjagadish in #3393
- fix(ingest): fix issues with lookml view file resolution on non-view … by @swaroopjagadish in #3397
- feat(search): Search insights foundations by @jjoyce0510 in #3391
- fix(graphQL): Populating deprecated Dataset description field by @jjoyce0510 in #3403
- feat(search): Support Boolean OR Filters in Rest.li APIs by @jjoyce0510 in #3344
- fix(lookml): Fixing lookml integration test. by @varunbharill in #3405
- fix(browse): Add more special character handling by @dexter-mh-lee in #3404
- fix(search): Reduce default batch size by @dexter-mh-lee in #3407
- fix(ui): Extract customProperties map from "properties" OR ...
DataHub v0.8.15
Notable Changes
- Support the “NONE” Client Authentication Method for OIDC login.
- Migrated to the new UI for Charts, Dashboards, Data Flows (Pipelines), Data Jobs (Tasks) profile pages
- Primary and Foreign Keys rendered in the UI
- Ingestion
- Support for redshift-usage source
- Fixes for looker ingestion
- datahub cli supports -f/--force option to skip confirmations
Changelog
- #3310 @jjoyce0510 Updating logo
- #3309 @jjoyce0510 Fixing lineage
- #3308 @jjoyce0510 Attach Client ID to token request in Authentication Mode none
- #3256 @aseembansal-gogo feat(ingest): add -f option to skip confirmations for automation en…
- #3298 @gabe-lyons feat(react): show primary keys & foreign keys in the schema
- #3172 @gabe-lyons marking data process aspects as deprecated
- #3301 @jjoyce0510 fix(upgrade): Improving NoCodeUpgrade logic to account for Bootstrap logic
- #3305 @jjoyce0510 feat(oidc): Support NONE client auth method in OIDC (stopgap)
- #3304 @gabe-lyons fix(docs): fix entity doc link
- #3303 @jjoyce0510 feat(UI): UI Migration for Charts, Dashboards, Pipelines, Tasks + Glossary Terms and Links for all.
- #3276 @bboylen feat(react): add groups tab to user profile
- #3299 @swaroopjagadish feat(build): adding support for python codegen for all aspects, not just the snapshot ones
- #3294 @swaroopjagadish fix(ingest): looker explores with joins, parsing failures on lateral flatten
- #3277 @chinmay-bhat feat(ingest): add redshift usage source
- #3290 @adriaanslechten feat(ingest): optional custom headers REST emitter
- #3293 @chinmay-bhat fix(build): update tox.ini to allow new dependencies to be installed
- #3292 @gabe-lyons fix(ingest): update generated files
- #3278 @jjoyce0510 refactor(graphql): GraphQL Public API Refactor + Documentation
- #3287 @swaroopjagadish fix(ingest): fix typo in looker tag generation
- #3275 @gabe-lyons feat(foreign keys): add foreign key models
- #3283 @aseembansal-gogo feat(ingest): add athena logo
- #3280 @gabe-lyons fix(react): fix updates from the UI
- #3279 @swaroopjagadish feat(ingest): add nice semantic run-ids that use source type and time of ingestion
- #3274 @gabe-lyons fix(chartinfo): only map chartinfo inputs if exists
- #3272 @gabe-lyons docs(adoption): updating adoption logos
- #3271 @jjoyce0510 fix(policies): Always ingest non-editable policies on boot
- #3259 @gabe-lyons feat(graphql): Adding write side validation and tests for add+remove API
- #3264 @swaroopjagadish fix(ingest): making lookml recursive and nested includes work
- #3262 @swaroopjagadish fix(ingest): looker cascading derived tables should express lineage to view not underlying table
- #3254 @abdvl fix(web): upgrade remove-markdown package to fix a ReDoS security issue
- #3011 @EnricoMi test(GraphService): Thorough graph service tests
- #3258 @jensenity chore: add banksalad to datahub adoption readme
DataHub v0.8.14
Release Highlights
- Small bug fixes over 0.8.13
Notable Changes
- Fix bug in OIDC config for setting response type
- Add WAU chart in the analytics page
- Starting with acryl_datahub==0.8.13.1 (pypi), Looker and LookML ingestion will now name views differently from before. You will need to delete old LookML metadata to start with a clean slate or specify view_naming_pattern = “{name}” in both your Looker and LookML ingestion recipes to get the old behavior.
- Populate the user email field in usage statistics to correctly show top users on the entity page
- Full changelog below
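The view naming change above is driven by a placeholder pattern. The sketch below only illustrates how such a pattern could produce different view names; it is not DataHub's implementation. The view attributes and the second pattern are hypothetical, and only the “{name}” pattern comes from the note above.

```python
# Hypothetical view attributes, used only for illustration.
view = {"project": "sales", "name": "orders"}


def render_view_name(pattern: str, attributes: dict) -> str:
    """Fill a naming pattern's placeholders from a view's attributes."""
    return pattern.format(**attributes)


# Pattern quoted in the release note: yields the bare view name.
legacy_name = render_view_name("{name}", view)

# Illustrative pattern only; not necessarily the default used by the ingestion sources.
qualified_name = render_view_name("{project}.view.{name}", view)
```

Because a different pattern yields a different name for the same view, metadata ingested under the old naming does not line up with newly ingested views, which is why the note asks for either a cleanup or an explicit pattern.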
Changelog
- #3215 @aseembansal-gogo feat(ingest): support for env variable in cli
- #3253 @remisalmon fix(ingest): allow ingestion of glossary terms without nodes
- #3255 @swaroopjagadish feat(ingest): looker and lookml improvements - connection, explores, folders
- #3010 @EnricoMi refactor(dao/utils): Move general createRelationshipFilter from Neo4jUtil to QueryUtils
- #2736 @jjoyce0510 rfc(RBAC): Fine-Grained Access Controls in GMS
- #3251 @jjoyce0510 Fixing response type bug
- #3249 @dexter-mh-lee Fix OIDC doc
- #3252 @dexter-mh-lee feat(analytics): Add WAU over the last 2 months chart
- #3250 @gabe-lyons feat(glossary): splitting apart tags & terms into their own visual sections
- #3244 @rslanka fix(usage statistics): populate the email field
- #3238 @aseembansal-gogo fix(ingest): add missing partition keys in schema for glue sources
- #3243 @swaroopjagadish fix(ingest): fixing snowflake and bigquery usage connectors to use real user urns
- #3241 @claudio-benfatto fix(docker): use wait-http-header to avoid printing cleartext credentials
- #3220 @dexter-mh-lee fix(frontend): Add additional sasl config for kafka producer in datahub-frontend
DataHub v0.8.13
Release Highlights
- Support for aggregated statistics with respect to the timeseries aspect. Moved usage stats functionality to use the new framework.
- Auto-ingest common data platforms on GMS boot! No more generic logos.
- Fixes re-ingestion of modified policies at startup
- Full changelog below
Breaking Changes
- Usage stats endpoint now uses the time-series aspect index in Elastic, meaning that statistics ingested previously will be lost. Please re-run usage ingestion (e.g. bigquery-usage, snowflake-usage) to backfill your usage statistics history.
Changelog
- #3242 @rslanka fix(dataset profiles): compatibility with older indices.
- #3235 @gabe-lyons fix(glossary terms): some cosmetic fixes for glossary terms
- #3234 @jjoyce0510 fix(policies): Only ingest bootstrap policies on clean start
- #3207 @rslanka feat(Analytics): Support for Timeseries Aggregated Statistics
- #3160 @EnricoMi test(metadata-io): Improve speed of ElasticSearch tests
- #3232 @gabe-lyons fix(glossary terms): add glossary terms privilege to COMMON_ENTITY_PRIVILEGES
- #3230 @gabe-lyons fix(react): fix jitter on schema when adding description
- #3219 @chinmay-bhat feat(ingest): auto-ingest common data platforms on GMS boot
- #3229 @jjoyce0510 fix(upgrade): Check whether tables exist using findList
- #3213 @gabe-lyons feat(business glossary): Add support to add & remove glossary terms from the UI
- #3221 @dexter-mh-lee fix(oidc): add more oidc config
- #3223 @gabe-lyons fix(graphql): fix tag mapper
- #3224 @gabe-lyons fix(react): fix lineage button highlighting
- #3222 @gabe-lyons fix(react): add owner modal title
- #3216 @gabe-lyons fix(react): fix proxy for login route
DataHub v0.8.12
Release Highlights
- RBAC Phase 1: Added abilities to control access through policies in the UI and backend
- Dataset page refresh!!! + improved home page, search and browse screens
- Added the ability to monitor DataHub through Prometheus and provided example Grafana dashboards
- GraphQL API browser hosted on /api/graphql endpoint.
- Support for Business Glossary ingestion through yml file
- Support for Azure AD ingestion source
Notable Changes
- Fixed unicode rendering bug introduced in v0.8.11
- Added the ability to search by properties in the customProperties bag: supports case-insensitive matches of the form ‘key=value’
- For instance, query “encoding=utf-8” will return entities with “encoding”: “utf-8” in the property bag
- Full changelog below
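The ‘key=value’ property search described above can be pictured with a small in-memory sketch. This is only an illustration of case-insensitive matching against a property bag; the function name is hypothetical, and the real feature is served by the search index rather than by code like this.

```python
from typing import Dict


def matches_property_query(query: str, custom_properties: Dict[str, str]) -> bool:
    """Check a 'key=value' query against a property bag, ignoring case."""
    key, separator, value = query.partition("=")
    if not separator:
        # Not a key=value query at all.
        return False
    wanted = (key.strip().casefold(), value.strip().casefold())
    return any(
        (prop_key.casefold(), prop_value.casefold()) == wanted
        for prop_key, prop_value in custom_properties.items()
    )


# Property bag taken from the example in the note above.
properties = {"encoding": "utf-8"}
```

With this sketch, both “encoding=utf-8” and “Encoding=UTF-8” match the bag above, while “encoding=latin-1” does not.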
Changelog
- #3214 @dexter-mh-lee fix(docker): pin setuptools version in docker ingestion build
- #3212 @gabe-lyons fix(metadata-ingestion): fixing lint issues
- #3196 @abdvl fix(react): safely access caught Error properties
- #3195 @dexter-mh-lee feat(perf): Add perf testing and monitoring framework
- #3136 @dexter-mh-lee feat(search): Add searchable annotation to maps
- #3158 @karoliskascenas feat(ingest): optionally ingest deleted looker dashboards
- #3210 @gabe-lyons fix(admin): moving admin links to header
- #3211 @dexter-mh-lee fix(build): specify setuptools version for dev install
- #3208 @dexter-mh-lee fix(search): Move filters to query instead of post query
- #3209 @gabe-lyons fix(react): fix tag schema search on tag profile
- #3190 @jjoyce0510 fix(graphql): fix ml model properties resolver
- #3200 @jjoyce0510 fix(bootstrap): making bootstrap manager run once
- #3197 @jjoyce0510 feat(access control): Adding "authorizedActors" method to AuthorizationManager
- #3201 @EnricoMi ci: upload test reports
- #3199 @jjoyce0510 Fix GraphQL Variables
- #3193 @abdvl refactor(test): remove the datahub-frontend.graphql
- #3198 @dexter-mh-lee fix(platform): fix kafka env name for MCL_timeseries
- #3194 @jjoyce0510 fix(react): fix add links
- #3192 @gabe-lyons fix(react): fixing format of search snippets
- #3191 @jjoyce0510 fix(react): pin the control center menu icon
- #3189 @jjoyce0510 fix(404): Fix 404 Exit Error.
- #3182 @jjoyce0510 feat(access control): Fine-Grained Access Control M1
- #3187 @gabe-lyons fix(react): Fix the fieldPath grouping logic in the front-end
- #3188 @nickwu241 docs: fix "data platforms" link in dbt.md
- #3184 @dexter-mh-lee fix(kafka): Change env variable name for MCL_versioned to be consistent
- #3185 @gabe-lyons fix(react): removing preview artifact from platform logo
- #3183 @chinmay-bhat fix(business_glossary): added init.py
- #3181 @chinmay-bhat refactor(ingest): rename azure source to azure_ad
- #3159 @sgomezvillamor feat(ingest): add optional config for ownership type in ownership transformers
- #3179 @remisalmon fix(dbt): use_identifiers option and avoid duplicate descriptions
- #3164 @shirshanka feat(ingest): Add a business glossary source
- #3178 @gabe-lyons fix(react): show schema-attached description
- #3177 @dexter-mh-lee Revert "fix(search): move filters to query instead of postFilter (#3112)"
- #3173 @dexter-mh-lee fix(docs): Add documentation for AWS MSK
- #3176 @dexter-mh-lee feat(airflow): add example docker setup for airflow
- #3175 @gabe-lyons fix(dataflow): optimize topological sort logic
- #3170 @chinmay-bhat docs(ingestion): updated hive ingestion docs with Databricks recipe
- #3171 @chinmay-bhat fix(doc): add use_odbc to mssql doc example
- #3169 @gabe-lyons feat(react): Dataset page refresh + improved homepage, search and browse screens
- #3168 @gabe-lyons fix(frontend): fix utf8 encoding bug
- #3167 @shirshanka docs: update Aug townhall details and announce Sep townhall
- #3112 @dexter-mh-lee fix(search): move filters to query instead of postFilter
- #3148 @frsann feat(ingest): Minor Kafka Connect source improvements
- #3161 @chinmay-bhat feat(ingest): Adding Azure Source integration to ingest users, groups and group memberships
- #3165 @jjoyce0510 feat(graphql): add GraphQL Explorer (GraphiQL)
DataHub v0.8.11
Release Highlights
- Business Glossary: Phase 1 is feature complete. Full support for UI viewing and API-based edits, no support for UI edits.
- Users and Groups: Just-in-time User and Group provisioning on login (SSO/OIDC), basic Group pages with membership information
- New Integrations: Redash
Notable Changes
- GraphQL and REST APIs are now both served by datahub-metadata-service (new name for gms). Frontend is now a proxy. Container names are not changed.
- Kafka source will no longer tokenize on . in the topic name. This will result in a flat browse experience in UI.
- Airflow lineage emission will only populate specific properties of Tasks and DAGs to limit bloat and avoid leaking environment variables.
- Schema history feature turned off in UI based on feedback from the community. Will re-emerge in a future release!
- Mongodb collections with extremely wide schemas will have schema fields sampled to keep UI responsive.
- Full changelog below.
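To illustrate the Kafka browse change noted above, here is a minimal sketch of tokenized versus flat browse paths. The `browse_path` helper, the `/env/kafka/...` path layout, and the topic name are hypothetical and chosen only for illustration; they are not taken from the DataHub codebase.

```python
def browse_path(topic: str, env: str = "prod", tokenize: bool = False) -> str:
    """Build an illustrative browse path for a Kafka topic.

    Hypothetical helper: the real path layout used by DataHub may differ.
    """
    # Old behavior (tokenize=True): each "."-separated token becomes a folder.
    # New behavior (tokenize=False): the topic name is kept as a single node.
    parts = topic.split(".") if tokenize else [topic]
    return "/" + "/".join([env, "kafka", *parts])


old_style = browse_path("orders.eu.v1", tokenize=True)
new_style = browse_path("orders.eu.v1")
print(old_style)
print(new_style)
```

Under these assumptions, the first call yields a nested hierarchy and the second a single flat entry, which is the "flat browse experience" described in the note.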
ChangeLog
- #3156 @swaroopjagadish fix(frontend): replacing broken link for default avatar
- #3154 @swaroopjagadish fix(frontend): fixing broken link to default avatar
- #3153 @swaroopjagadish feat(ingest): adding maxSchemaSize to mongodb source
- #3150 @saxo-lalrishav fix(business-glossary): business glossary visual changes
- #3142 @greysond fix(metadata-service): actually load keys from keystore for elastic connections
- #3110 @frsann feat(ingestion): bring your own SQL parser
- #3146 @jjoyce0510 fix(react): refactoring hasKeySchema computation
- #3145 @swaroopjagadish deps(ingest): upgrade to pick up acryl-pyhive changes
- #3144 @sgomezvillamor fix(profiles): prevent NoneType exception when profiling empty datasets
- #3140 @swaroopjagadish fix(glossary): Make terms searchable and browseable
- #3139 @swaroopjagadish fix(deps): Adding min version to python-dateutil to guard against isoparse failures
- #3135 @dexter-mh-lee fix(kafka): Change consumer id of mae/mce processor
- #3137 @swaroopjagadish fix(airflow): only emit specific keys for airflow lineage properties
- #3131 @jjoyce0510 feat(graphql): migrating GraphQL API to metadata-service (nee GMS)
- #3082 @jjoyce0510 feat(sso): Just-In-Time User & Group Provisioning on SSO Login (oidc)
- #3129 @saxo-lalrishav feat(business-glossary): Business glossary relationship UI
- #3113 @dexter-mh-lee feat(ingest): Add custom browse paths for kafka sources and remove browse lowercase filter
- #2918 @taufiqibrahim feat(ingest): adding redash source
- #3103 @saxo-lalrishav feat(business-glossary): glossary term relationship graphql changes
- #3015 @jjoyce0510 refactor: remove unused gms code, frontend endpoints part 2/4
- #3094 @jjoyce0510 feat(group ui): Basic group search membership in UI
- #3012 @Shikha-Trivedi-Saxo feat(business-glossary): Glossary term relationship backend
- #3049 @neojunjie feat(frontend): logout with oidc
- #3099 @gabe-lyons fix(schema-version): temporarily hide schema version tab
- #3048 @saxo-lalrishav feat(business-glossary): added field level glossary terms
- #3095 @shirshanka fix(ingest): increasing default ingestion REST timeout to 30 seconds
- #3096 @dexter-mh-lee fix(upgrade): Fix MAE consumer and upgrade's dependency issue
- #3092 @jensenity fix(postgres): fix postgres setup to handle existing database
DataHub v0.8.10
Release Highlights
Bugfix release for 0.8.9
- [#3096] Fix dependency injection issue introduced by this PR
- Increase REST emitter timeout to 30 seconds by default
ChangeLog
- #3095 @shirshanka fix(ingest): increasing default ingestion REST timeout to 30 seconds
- #3096 @dexter-mh-lee fix(upgrade): Fix MAE consumer and upgrade's dependency issue
- #3092 @jensenity fix(postgres): fix postgres setup to handle existing database
DataHub v0.8.9
Release Highlights
- Support for nested structs, union types and key-value schemas in Kafka
- Support for JDBC Connector based sources in Kafka Connect
- Support for Okta as a source for User and Group metadata
- Support for using AWS Glue schema registry
Breaking Changes
- [#3079]: Introduces a change to fieldPath encoding in schema metadata. Note: This is a backwards-compatible change for the storage layer. Old fieldPaths will still be rendered correctly. At read time, fieldPaths in the new encoding will be translated to the old encoding to discover tags written before this change. Tags and descriptions applied to fields earlier (which were stored in the old format) will be migrated when new tags are applied or descriptions are edited.
Important Bug Fixes
- [#3070] Charts and Dataset lineage were broken in release 0.8.8. This has been fixed via [gma-125]
ChangeLog
- #3093 @gabe-lyons fix: fixing key-value after adding version
- #3088 @dexter-mh-lee fix(mysql-setup): Change default charset to utf8mb4
- #3091 @gabe-lyons feat: Adding clarity around qualified unions and removing extra lines for structs
- #3090 @dexter-mh-lee feat(workflow): Add mysql/postgres setup workflow
- #3083 @dexter-mh-lee feat: add support for AWS glue schema registry
- #3043 @jjoyce0510 feat(ingest): Adding an Okta Integration to extract Users, Groups, Group Membership
- #3080 @gabe-lyons fix(react): bolding field name if single token
- #3081 @shirshanka docs: Update Aug town-hall dates and previous town-hall details
- #3079 @rslanka feat: Adding support for nested schemas in ingestion and visualization
- #3078 @kevinhu fix(ingest): sqlalchemy-snowflake add constraints to make sure we don't pull in 1.2.5
- #2987 @jjoyce0510 docs(deploy): Adding confluent cloud doc
- #3076 @chinmay-bhat feat(ingest): add support for jdbc connector to kafka-connect source
- #3071 @kevinhu feat(docs): link to SQL profiling docs from each SQL source
- #3074 @jjoyce0510 refactor(build): Remove unnecessary ext modules.
- #3073 @dexter-mh-lee fix(docker): remove unnecessary components from docker-compose
- #3072 @dexter-mh-lee fix(docker): upgrade base image version
- #3068 @kevinhu feat(docs): add overrides for sidebar labels and S3 guide in sources dropdown
- #3069 @kevinhu fix(ingest): remove tags from bootstrap_mce since that is deprecated
- #3070 @gabe-lyons chore: upgrading gma to 0.2.80
- #2990 @rahulbsw feat(ingest): Added support for "add dataset ownership by regex match"
- #3067 @kevinhu fix(ingest): apply case insensitive regex matching by default
- #3066 @aseembansal-gogo docs(ingest): fix typos
- #3041 @kevinhu feat(docs): reorder and restyle navbar
- #3062 @gabe-lyons feat(cli): datahub init & docs for it
- #3064 @dexter-mh-lee fix(quickstart): remove mem_limit for datahub containers
- #3059 @kevinhu feat(docs): link to ingestion quickstart under ingestion section
- #3058 @kevinhu docs(ingest): link to docs from recipes
- #3014 @jjoyce0510 chore(frontend): Remove unused files 1/4
- #2986 @aseembansal-gogo feat(ingest): add transformers to clear dataset ownership, mark status, add browse paths
- #3056 @dexter-mh-lee fix(ingest): stop looker source from unnecessarily filling out owners
- #3055 @dexter-mh-lee fix(ingest): add default configurable timeout for rest emitter
- #3039 @kevinhu fix(docs): update metadata ingestion dev guide
- #3031 @kevinhu feat(docs): refactor source and sink ingestion docs
- #3054 @gabe-lyons fix(frontend): fixing homepage jitter
- #3051 @gabe-lyons feat(quickstart): linking to slack from quickstart
- #3053 @jjoyce0510 docs: Add Exact match search CURL example.
- #3033 @kevinhu feat(ingest): replace and warn against relative imports
- #3035 @aseembansal-gogo feat(ingest): add underlying platform for glue
- #3052 @jjoyce0510 docs: Add tags GMS API documentation
- #2973 @saxo-lalrishav fix(Business Glossary): updated glossary term search strategy
- #2996 @jensenity feat(postgres): add postgres setup docker image
- #3044 @gabe-lyons fix(frontend): fixing external url logic in charts and dashboard mapper
- #3045 @gabe-lyons fix(frontend): hide dashboard date when null
- #3022 @jsotelo fix(frontend): add support for SASL_KERBEROS_SERVICE_NAME & SASL_PLAINTEXT
- #3036 @jjoyce0510 fix(frontend): Fix exception casting in EntityClient
- #3037 @kevinhu fix(ingest): detect malformed Glue S3 script paths
- #3038 @kevinhu fix(ingest): replace backticks for lookml
- #3040 @kevinhu fix(ingest): add bigquery type mappings
- #3042 @gabe-lyons fix(frontend): fixing lineage tokenization
- #3034 @jjoyce0510 fix(gms): Validate unrecognized model fields.