Skip to content

Conversation

YuweiXiao
Copy link
Contributor

@YuweiXiao YuweiXiao commented Mar 21, 2025

Partially fixes #30

@YuweiXiao
Copy link
Contributor Author

hi~ @JelteF The test cases aren't fully covered yet. would appreciate some feedback before finalizing the details. Thanks!

Comment on lines 186 to 191
if (query->onConflict || query->returningList) {
elog(elevel, "DuckDB does not support INSERT ON CONFLICT or RETURNING on Postgres tables");
return false;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not allow these? Do they not work? I would expect that if we insert using postgres its functionality that thees things would automatically work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, not saying that this should be implemented in the first version of this feature. But I'd at least like a code comment explaining why we do not support it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, block them since I haven't tested them yet. I've done some initial testing, and RETURNING works out of box while onConflict doesn't. Will dig deeper into the WHY and add comments to clarify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ON CONFLICT is also working

Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is failing in many places, because duckdb.force_execution is true in our tests. Apparently something breaks when selecting from generate_series() in duckdb for a PG INSERT.

@JelteF
Copy link
Collaborator

JelteF commented Mar 21, 2025

My example code from #30:

create table t(a int, x text);
-- 1. easiest to make work (still not super easy)
insert into t select r['a']::int, r['x']::text from duckdb.query($$ SELECT 1 a, 'abc' x $$) r;

Fails with this error:

ERROR:  42804: table row type and query-specified row type do not match
DETAIL:  Table has type text at ordinal position 2, but query expects character varying.
LOCATION:  ExecCheckPlanOutput, nodeModifyTable.c:212

I think that would be worked around by #583, but then the opposite would hold (i.e. inserting into a varchar column woudl fail). So I think we need some logic to make sure that the types match against eachother, and maybe add some code to convert the type that we get back from DuckDB to the type that PG wants for the INSERT. I think that would also fix the generate_series() issue that all of the tests are failing on currently.

@YuweiXiao YuweiXiao marked this pull request as draft March 22, 2025 09:29
@YuweiXiao
Copy link
Contributor Author

I think that would be worked around by #583, but then the opposite would hold (i.e. inserting into a varchar column woudl fail). So I think we need some logic to make sure that the types match against eachother, and maybe add some code to convert the type that we get back from DuckDB to the type that PG wants for the INSERT. I think that would also fix the generate_series() issue that all of the tests are failing on currently.

YES, the type check and conversion are substantial. I manually added an Expr to handle the coerce and it seems working so far (e.g., varchar and generate_series issue). I'll add more test cases to ensure if I haven't missed (or broken) anything, and code re-org (& cleanup) are also in progress. Thanks for the feedback, @JelteF !

@YuweiXiao YuweiXiao changed the title [DRAFT] Support insert_into_select for Postgres(heap) table [DRAFT] Support insert_into_select for Postgres table Mar 22, 2025
@YuweiXiao YuweiXiao force-pushed the heap_insert_select branch from dabffc2 to e7f7aaf Compare March 24, 2025 09:19
@YuweiXiao
Copy link
Contributor Author

YuweiXiao commented Mar 24, 2025

The MR is ready for review~ @JelteF Please take a look when you have a chance.

Currently, the regression is failing for hugeint_numeric. The issue seems related to the following behavior, which is reproducible in the main branch. Do you have any insights on this?

select * from duckdb.query($$ SELECT a FROM (VALUES (32942348563242.111222333444555666777888::numeric(38,24)), (NULL::numeric), (123456789.000000000000000000000001::numeric(38,24))) t(a) $$);

                   a
----------------------------------------
 32942348563242.11122233344455566677789  (correct one should be `32942348563242.111222333444555666777888`

      123456789.00000000000000000000000 (correct one should be 123456789.000000000000000000000001)
(3 rows)

update: DuckDB cli has the exact same output.

@YuweiXiao YuweiXiao marked this pull request as ready for review March 24, 2025 12:38
@YuweiXiao YuweiXiao changed the title [DRAFT] Support insert_into_select for Postgres table Support insert_into_select for Postgres table Mar 24, 2025
@YuweiXiao YuweiXiao requested a review from JelteF April 2, 2025 09:56
@JelteF
Copy link
Collaborator

JelteF commented Apr 16, 2025

Currently, the regression is failing for hugeint_numeric. The issue seems related to the following behavior, which is reproducible in the main branch. Do you have any insights on this?

I think you can probably fix the test by changing the plain NULL to NULL::NUMERIC(38, 24). It seems at least that that solves the issue when running your reproducing example in the duckdb CLI.

continue;
}

if (attr->atttypid == BYTEAOID || attr->atttypid == XMLOID || pgduckdb::pg::IsDomainType(attr->atttypid) ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hardcoding some unsupported types, I think we should instead determine the allowed types. Also, why are these types unsupported? Because we do support BYTEA as a type for regular duckdb execution, so that seems surprising that it doesn't work here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the behavior to whitelist. As long as the Postgres type can be converted to a DuckDB type using ConvertPostgresToDuckColumnType, it will be allowed.

}

if (!select_rte) {
elog(elevel, "DuckDB does not support INSERT without a subquery");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no tests for this error case afaict

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added


DROP TABLE tbl;
-- case: ON CONFLICT DO NOTHING
CREATE TABLE tbl (a int PRIMARY KEY, b text);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with a SERIAL column or a column with GENERATED ALWAYS AS IDENTITY.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a regression for them, and it works perfectly.

@JelteF JelteF added this to the 1.1.0 milestone Apr 17, 2025
@JelteF
Copy link
Collaborator

JelteF commented Apr 17, 2025

To set expectations: I think this is a very cool feature and your implementation looks like the right direction. But it is relatively complex and impactful, so I would like to give it an appropriate round of testing and "baking" (aka testing by merging it and having it on the main branch for a while). Since we want to release 1.0 somewhat soon, I don't think it's realistic to still do that before the 1.0 release. So I added this to the new 1.1 milestone instead. I also won't spend time reviewing this PR in depth until then.

@YuweiXiao
Copy link
Contributor Author

To set expectations: I think this is a very cool feature and your implementation looks like the right direction. But it is relatively complex and impactful, so I would like to give it an appropriate round of testing and "baking" (aka testing by merging it and having it on the main branch for a while). Since we want to release 1.0 somewhat soon, I don't think it's realistic to still do that before the 1.0 release. So I added this to the new 1.1 milestone instead. I also won't spend time reviewing this PR in depth until then.

Totally agree, 1.1 makes sense. btw, any target date for 1.0 release?

@JelteF
Copy link
Collaborator

JelteF commented Apr 17, 2025

any target date for 1.0 release?

The current plan is to do it quickly after the DuckDB 1.3.0 release, so early May. But dates may obviously slip.

Yuwei Xiao added 2 commits May 8, 2025 06:11
handle type conversion

handle column mapping & cleanup

fix schedule

code cleanup & add restriction

revert uncessary changes
@YuweiXiao YuweiXiao force-pushed the heap_insert_select branch from 3be8e43 to 1b84967 Compare May 8, 2025 06:12
@YuweiXiao YuweiXiao force-pushed the heap_insert_select branch from 749203f to 93e8b8a Compare May 8, 2025 08:05
Yuwei Xiao added 2 commits May 8, 2025 08:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Writing into Postgres tables
2 participants