Skip to content

feat: Add integration tests for tpchgen-cli #156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 21, 2025

Conversation

alamb
Copy link
Collaborator

@alamb alamb commented Jul 15, 2025

Rationale

As we make changes to tpchgen-cli, we need some way to test that our changes are correct.
Let's add some tests

Changes

Add CLI based tests using assert_cmd

THis was partly written by GPT 4.0 / Github copilot

@alamb alamb changed the title Implement integration tests Integration tests for tpchgen-cli Jul 15, 2025
@alamb alamb marked this pull request as ready for review July 15, 2025 13:04
@alamb alamb changed the title Integration tests for tpchgen-cli feat: Add integration tests for tpchgen-cli Jul 15, 2025
@alamb alamb requested a review from clflushopt July 15, 2025 13:04
@alamb
Copy link
Collaborator Author

alamb commented Jul 15, 2025

FYI @@mrendi29 as well

@alamb
Copy link
Collaborator Author

alamb commented Jul 15, 2025

The new tests are run as part of CI. Here is an example:

@alamb alamb mentioned this pull request Jul 15, 2025
5 tasks
@mrendi29
Copy link

Thanks @alamb for taking a stab at this. @kevinjqliu i recall you had some test that you had not pushed yet, so question to you both, do we want to build off the integration tests suite within this branch/PR or should we split this into multiple PRs?

Copy link
Owner

@clflushopt clflushopt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb I am approving this PR to unblock and will let you resolve the comments from @mrendi29. As a follow up, do we want to have /data/ use two sources of truth duckdb + parquet and dbgen outputs ?

@alamb
Copy link
Collaborator Author

alamb commented Jul 17, 2025

Thanks @alamb for taking a stab at this. @kevinjqliu i recall you had some test that you had not pushed yet, so question to you both, do we want to build off the integration tests suite within this branch/PR or should we split this into multiple PRs?

I personally think we can build from this integration test and then add the "check the output parquet file format"

@alamb
Copy link
Collaborator Author

alamb commented Jul 17, 2025

Thanks @alamb I am approving this PR to unblock and will let you resolve the comments from @mrendi29. As a follow up, do we want to have /data/ use two sources of truth duckdb + parquet and dbgen outputs ?

Thank you @clflushopt 🙏

When you say "two sources of truth" I wonder what you mean? Do you mean some way to verify that the actual values written into the .parquet are are correct?

The tests in this PR do not do that (as you note)

@clflushopt
Copy link
Owner

Yes one for the tbl format via dbgen and one for the parquet format via duckdb we can generate samples at various scale factors and have them in the repo as fixtures.

@alamb
Copy link
Collaborator Author

alamb commented Jul 21, 2025

Yes one for the tbl format via dbgen and one for the parquet format via duckdb we can generate samples at various scale factors and have them in the repo as fixtures.

Filed #161 to track

@alamb alamb merged commit 914612f into clflushopt:main Jul 21, 2025
7 checks passed
@alamb
Copy link
Collaborator Author

alamb commented Jul 21, 2025

Thanks again @clflushopt

@alamb alamb deleted the alamb/integration_test branch July 21, 2025 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants