@riverszhang89
Contributor
Our customers have an issue using Spark to create a dataframe on a datetime column: Spark interpolates the provided datetime string but drops the "T" separator when filling the range.

It seems that omitting the "T" character is fairly acceptable, and ISO 8601 does allow it to be omitted in some cases. This is a very simple patch to allow it.

@roborivers left a comment

Coding style check: Error ⚠.
Smoke testing: Success ✓.
Cbuild submission: Error ⚠.
Regression testing: 3/625 tests failed ⚠.

The first 10 failing tests are:
udf
consumer_non_atomic_default_consumer_generated
consumer

@dorinhogea
Contributor

If we allow strings without T to be converted to datetimes, we have the following conundrum: is "20250101" a datetime or an epoch value?

@riverszhang89 force-pushed the datetime_literal branch 2 times, most recently from de0a305 to c8c985d on August 16, 2025 at 11:58
@riverszhang89
Contributor Author

> If we allow strings without T to be converted to datetimes, we have the following conundrum: is "20250101" a datetime or an epoch value?

If the string contains only digits, it's treated as a unix timestamp (code). So "20250101" will still be an epoch value! I've also added a test for it!

@roborivers left a comment

Coding style check: Error ⚠.
Smoke testing: Success ✓.
Cbuild submission: Success ✓.
Regression testing: 5/625 tests failed ⚠.

The first 10 failing tests are:
sc_transactional_rowlocks_generated
udf
consumer_non_atomic_default_consumer_generated
consumer
sc_downgrade

@akshatsikarwar
Contributor

This breaks UDF and consumer tests?

Our customers have issue using spark to create a dataframe on a
datetime column, because spark interpolates the provided datetime
string, but drops the "T" separator when filling the range.

It seems that omitting the "T" character is fairly acceptable, and
ISO 8601 does allow it to be omitted in a few cases. This is a
very simple patch to allow it.

Signed-off-by: Rivers Zhang <[email protected]>
@riverszhang89
Contributor Author

> This breaks UDF and consumer tests?

There's one case that I overlooked, e.g. "2025-01-01 UTC". Apparently the type code expects the timezone to have a leading space. But because we now additionally allow spaces between the date portion and the rest of the datetime, we may not have a leading space when we read the timezone. For instance:

2025-01-01 UTC
          ^ old code would stop here, timezone would be "[space]UTC"
   vs
   
2025-01-01 UTC
           ^ new code stops here, timezone is "UTC"

@roborivers left a comment

Coding style check: Error ⚠.
Smoke testing: Success ✓.
Cbuild submission: Success ✓.
Regression testing: 2/625 tests failed ⚠.

The first 10 failing tests are:
cdb2_close
sc_downgrade
