Describe the bug
datafusion-spark's SparkNextDay trims the dayOfWeek argument before matching it:
https://github.com/apache/datafusion/blob/main/datafusion/spark/src/function/datetime/next_day.rs (in spark_next_day):
let day_of_week = day_of_week.trim().to_uppercase();
Spark's DateTimeUtils.getDayOfWeekFromString does not trim — it only uppercases and then matches character-for-character:
def getDayOfWeekFromString(string: UTF8String): Int = {
val dowString = string.toString.toUpperCase(Locale.ROOT)
dowString match {
case "SU" | "SUN" | "SUNDAY" => SUNDAY
case "MO" | "MON" | "MONDAY" => MONDAY
...
case _ => throw new SparkIllegalArgumentException(errorClass = "ILLEGAL_DAY_OF_WEEK", ...)
}
}
So a padded value such as ' MO ' is not a valid day name in Spark, but the trim makes datafusion-spark accept it. This is a correctness divergence that is independent of spark.sql.ansi.enabled.
To Reproduce
SELECT next_day(DATE '2024-01-01', ' MO ');
- Spark:
NULL (non-ANSI) / throws ILLEGAL_DAY_OF_WEEK (ANSI), because ' MO ' is not a recognised day name.
datafusion-spark: 2024-01-08 (the trim makes it match MONDAY).
Expected behavior
Match Spark: do not trim the dayOfWeek argument. Only an exact (case-insensitive) match of SU/SUN/SUNDAY … SA/SAT/SATURDAY should be accepted; anything else is invalid.
Additional context
Related: the same function carries several // TODO: if spark.sql.ansi.enabled is false, returns NULL instead of an error for a malformed dayOfWeek comments. datafusion-spark has no access to Spark's spark.sql.ansi.enabled session config, so the throw-vs-NULL choice has to be made by the embedding engine; that part is a known gap rather than a bug. The whitespace trim above, however, is wrong for every configuration.
Found while fixing the equivalent divergence downstream in Apache DataFusion Comet (apache/datafusion-comet#4450), where we replaced the call into datafusion-spark::SparkNextDay with a Comet-native implementation that does not trim.
Version: observed in datafusion-spark 53.1.0; the trim is still present on main.
Describe the bug
datafusion-spark'sSparkNextDaytrims thedayOfWeekargument before matching it:https://github.com/apache/datafusion/blob/main/datafusion/spark/src/function/datetime/next_day.rs (in
spark_next_day):Spark's
DateTimeUtils.getDayOfWeekFromStringdoes not trim — it only uppercases and then matches character-for-character:So a padded value such as
' MO 'is not a valid day name in Spark, but the trim makesdatafusion-sparkaccept it. This is a correctness divergence that is independent ofspark.sql.ansi.enabled.To Reproduce
NULL(non-ANSI) / throwsILLEGAL_DAY_OF_WEEK(ANSI), because' MO 'is not a recognised day name.datafusion-spark:2024-01-08(the trim makes it matchMONDAY).Expected behavior
Match Spark: do not trim the
dayOfWeekargument. Only an exact (case-insensitive) match ofSU/SUN/SUNDAY…SA/SAT/SATURDAYshould be accepted; anything else is invalid.Additional context
Related: the same function carries several
// TODO: if spark.sql.ansi.enabled is false, returns NULL instead of an error for a malformed dayOfWeekcomments.datafusion-sparkhas no access to Spark'sspark.sql.ansi.enabledsession config, so the throw-vs-NULL choice has to be made by the embedding engine; that part is a known gap rather than a bug. The whitespace trim above, however, is wrong for every configuration.Found while fixing the equivalent divergence downstream in Apache DataFusion Comet (apache/datafusion-comet#4450), where we replaced the call into
datafusion-spark::SparkNextDaywith a Comet-native implementation that does not trim.Version: observed in
datafusion-spark53.1.0; the trim is still present onmain.