Skip to content

[datafusion-spark] next_day trims whitespace from dayOfWeek; Spark does not #22717

@andygrove

Description

@andygrove

Describe the bug

datafusion-spark's SparkNextDay trims the dayOfWeek argument before matching it:

https://github.com/apache/datafusion/blob/main/datafusion/spark/src/function/datetime/next_day.rs (in spark_next_day):

let day_of_week = day_of_week.trim().to_uppercase();

Spark's DateTimeUtils.getDayOfWeekFromString does not trim — it only uppercases and then matches character-for-character:

def getDayOfWeekFromString(string: UTF8String): Int = {
  val dowString = string.toString.toUpperCase(Locale.ROOT)
  dowString match {
    case "SU" | "SUN" | "SUNDAY" => SUNDAY
    case "MO" | "MON" | "MONDAY" => MONDAY
    ...
    case _ => throw new SparkIllegalArgumentException(errorClass = "ILLEGAL_DAY_OF_WEEK", ...)
  }
}

So a padded value such as ' MO ' is not a valid day name in Spark, but the trim makes datafusion-spark accept it. This is a correctness divergence that is independent of spark.sql.ansi.enabled.

To Reproduce

SELECT next_day(DATE '2024-01-01', ' MO ');
  • Spark: NULL (non-ANSI) / throws ILLEGAL_DAY_OF_WEEK (ANSI), because ' MO ' is not a recognised day name.
  • datafusion-spark: 2024-01-08 (the trim makes it match MONDAY).

Expected behavior

Match Spark: do not trim the dayOfWeek argument. Only an exact (case-insensitive) match of SU/SUN/SUNDAYSA/SAT/SATURDAY should be accepted; anything else is invalid.

Additional context

Related: the same function carries several // TODO: if spark.sql.ansi.enabled is false, returns NULL instead of an error for a malformed dayOfWeek comments. datafusion-spark has no access to Spark's spark.sql.ansi.enabled session config, so the throw-vs-NULL choice has to be made by the embedding engine; that part is a known gap rather than a bug. The whitespace trim above, however, is wrong for every configuration.

Found while fixing the equivalent divergence downstream in Apache DataFusion Comet (apache/datafusion-comet#4450), where we replaced the call into datafusion-spark::SparkNextDay with a Comet-native implementation that does not trim.

Version: observed in datafusion-spark 53.1.0; the trim is still present on main.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions