fix(e6): preserve DBR DATE_ADD arity under E6_EXECUTOR_TYPE=native #255

Merged: ArnavBorkarE6x merged 1 commit into e6data:main from tkaunlaky-e6:fix/e6-date-add-2arg-from-dbr on May 8, 2026

@tkaunlaky-e6

Summary

  • Databricks 2-arg DATE_ADD(date, n) returns DATE; 3-arg DATE_ADD(unit, n, ts) returns TIMESTAMP. The transpiler previously converted both to the e6 3-arg form DATE_ADD('DAY', n, ts), which changed the 2-arg form's return type from DATE to TIMESTAMP. This broke CONCAT(DATE_ADD(ts, 0), SUBSTR(ts, 11)) patterns by producing a doubled time portion (the Samsung zero-row issue).
  • The new behavior is gated behind E6_EXECUTOR_TYPE=native, the same flag used by PR #253 (fix(e6): gate TO_UNIX_TIMESTAMP /1000 on executor type). The Java executor default is unchanged.
    • E6_EXECUTOR_TYPE=native: 2-arg DBR -> e6 2-arg DATE_ADD(ts, n); 3-arg DBR -> e6 3-arg DATE_ADD(unit, n, ts).
    • default (java): 2-arg and 3-arg DBR both emit e6 3-arg DATE_ADD; the 2-arg form gets DAY filled in (prior behavior preserved).
  • Non-DAY units (e.g. DATE_ADD('YEAR', 5, d)) always stay 3-arg in both modes.
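As a rough illustration of the mode matrix above, the hypothetical helper below maps a parsed DBR DATE_ADD (unit is None for the 2-arg form) plus an executor mode to an e6 SQL string. The function name and string-based rendering are assumptions for illustration only; the actual logic lives in e6.py's dateadd_sql and operates on sqlglot expression nodes.

```python
from typing import Optional


def e6_date_add_sql(this: str, n: str, unit: Optional[str], executor: str = "java") -> str:
    """Hypothetical helper: render e6 DATE_ADD from a parsed DBR DATE_ADD.

    unit is None for the 2-arg DBR form DATE_ADD(date, n) and holds the
    explicit unit for the 3-arg form DATE_ADD(unit, n, ts).
    """
    if executor == "native" and unit is None:
        # Native executor, 2-arg input: keep 2-arg so the result stays a DATE.
        return f"DATE_ADD({this}, {n})"
    # Java default, or any explicit unit: 3-arg form, DAY filled in if missing.
    return f"DATE_ADD('{unit or 'DAY'}', {n}, {this})"
```

For example, e6_date_add_sql("ts", "2", None) yields the prior Java-mode output DATE_ADD('DAY', 2, ts), while passing executor="native" yields DATE_ADD(ts, 2). An explicit unit such as 'YEAR' stays 3-arg in both modes.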

Mechanism

  • sqlglot/dialects/databricks.py: pass default_unit=None to build_date_delta for DATE_ADD/DATEADD, so 2-arg DBR calls parse with unit=None while 3-arg calls keep their explicit unit. This matches the pattern at clickhouse.py:310. Databricks output is unchanged because date_delta_sql falls back to DAY via unit_to_var when unit is None.
  • sqlglot/dialects/e6.py: a new dateadd_sql method (auto-discovered by the generator) reads E6_EXECUTOR_TYPE and emits the 2-arg form only when the executor is native AND unit is None. It replaces the previous exp.DateAdd: lambda entry in TRANSFORMS. The e6 read parser also accepts 2-arg DATE_ADD(date, n), so round-trips work.
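A minimal, self-contained sketch of the two-step mechanism, using plain strings instead of sqlglot expressions: the builder mimics build_date_delta with default_unit=None (no unit is filled in for 2-arg calls), and the generator reads E6_EXECUTOR_TYPE when it runs. DateAdd, build_dbr_date_add, and e6_dateadd_sql are hypothetical stand-ins, not sqlglot's API, and the real code may differ in detail.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DateAdd:
    """Hypothetical stand-in for a parsed DATE_ADD node (not sqlglot's exp.DateAdd)."""
    this: str
    expression: str
    unit: Optional[str]


def build_dbr_date_add(args: List[str]) -> DateAdd:
    """Hypothetical builder mimicking build_date_delta(..., default_unit=None)."""
    if len(args) == 3:
        # DATE_ADD(unit, n, ts): keep the explicit unit.
        unit, n, this = args
        return DateAdd(this=this, expression=n, unit=unit)
    # DATE_ADD(date, n): no default unit is filled in, so unit stays None.
    this, n = args
    return DateAdd(this=this, expression=n, unit=None)


def e6_dateadd_sql(node: DateAdd) -> str:
    """Hypothetical e6 generator step; reads the flag on every call."""
    native = os.environ.get("E6_EXECUTOR_TYPE") == "native"
    if native and node.unit is None:
        # Only the native executor with a 2-arg source keeps 2-arg output.
        return f"DATE_ADD({node.this}, {node.expression})"
    # Otherwise emit 3-arg, filling in DAY when the source had no unit.
    return f"DATE_ADD('{node.unit or 'DAY'}', {node.expression}, {node.this})"
```

In this sketch, reading the variable inside the generator rather than at import time is what lets one process switch modes, for example between test cases.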

Context

The Samsung repro pattern:

CONCAT(
  DATE_ADD(CAST(... AS TIMESTAMP), CAST(7 * 0 AS INT)),
  SUBSTR(<ts>, 11)
)

With the prior 3-arg conversion, DATE_ADD returned a TIMESTAMP, so the expression evaluated to CONCAT(<full ts>, ' 00:00:00'). The result has a doubled time portion and is not a valid timestamp, so the outer CAST returned NULL, the predicate evaluated to unknown, and the rows were dropped.
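The failure can be reproduced outside SQL with Python's datetime module. This sketch assumes a DATE renders as 'YYYY-MM-DD', a TIMESTAMP as 'YYYY-MM-DD HH:MM:SS', and that the outer CAST yields NULL (None here) on a malformed string; try_cast_timestamp and substr are hypothetical helpers, and the input value is made up.

```python
from datetime import date, datetime, timedelta
from typing import Optional


def try_cast_timestamp(s: str) -> Optional[datetime]:
    """Hypothetical: emulate a CAST that yields NULL (None) on malformed input."""
    try:
        return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None


def substr(s: str, pos: int) -> str:
    """SQL-style SUBSTR(s, pos): 1-based start, runs to the end of the string."""
    return s[pos - 1:]


ts = "2024-01-15 00:00:00"            # hypothetical input timestamp as a string
base = datetime(2024, 1, 15)
shift = timedelta(days=7 * 0)          # DATE_ADD(..., CAST(7 * 0 AS INT))

as_date = str((base + shift).date())   # 2-arg DBR: DATE, '2024-01-15'
as_timestamp = str(base + shift)       # 3-arg: TIMESTAMP, '2024-01-15 00:00:00'

good = as_date + substr(ts, 11)        # '2024-01-15 00:00:00'
bad = as_timestamp + substr(ts, 11)    # '2024-01-15 00:00:00 00:00:00'

print(try_cast_timestamp(good))        # -> 2024-01-15 00:00:00
print(try_cast_timestamp(bad))         # -> None
```

The DATE result plus the time suffix reassembles a valid timestamp, while the TIMESTAMP result plus the same suffix does not, which is exactly where rows were lost.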

Test plan

  • New test_date_add_native_executor covering both modes (2-arg/3-arg input, 2-arg/3-arg expected output).
  • Existing test_dialect_specific_functions extended with the java-default case (DATE_ADD(ts, 2) -> DATE_ADD('DAY', 2, ts)).
  • python -m unittest discover tests/dialects: 545/545 pass (e6, databricks, spark, hive, etc.), no regressions.
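One way to exercise both modes in a single test run is to patch the environment per test with unittest.mock.patch.dict, which restores os.environ when the block exits. This is a sketch, not the PR's actual test: e6_date_add is a hypothetical stand-in for the e6 generator that reads E6_EXECUTOR_TYPE at call time, and the class name only mirrors test_date_add_native_executor.

```python
import os
import unittest
from typing import Optional
from unittest import mock


def e6_date_add(this: str, n: str, unit: Optional[str] = None) -> str:
    """Hypothetical stand-in for the e6 generator; reads the flag at call time."""
    if os.environ.get("E6_EXECUTOR_TYPE") == "native" and unit is None:
        return f"DATE_ADD({this}, {n})"
    return f"DATE_ADD('{unit or 'DAY'}', {n}, {this})"


class TestDateAddNativeExecutor(unittest.TestCase):
    def test_java_default(self):
        # patch.dict restores os.environ afterwards, so popping here is safe.
        with mock.patch.dict(os.environ):
            os.environ.pop("E6_EXECUTOR_TYPE", None)
            self.assertEqual(e6_date_add("ts", "2"), "DATE_ADD('DAY', 2, ts)")

    def test_native(self):
        with mock.patch.dict(os.environ, {"E6_EXECUTOR_TYPE": "native"}):
            # 2-arg input stays 2-arg; an explicit unit stays 3-arg.
            self.assertEqual(e6_date_add("ts", "2"), "DATE_ADD(ts, 2)")
            self.assertEqual(e6_date_add("d", "5", "YEAR"), "DATE_ADD('YEAR', 5, d)")
```

Saved as a module, this runs under python -m unittest like the rest of the dialect tests.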

DBR 2-arg DATE_ADD(date, n) returns DATE; 3-arg DATE_ADD(unit, n, ts)
returns TIMESTAMP. The transpiler previously collapsed both to e6 3-arg,
which broke patterns like CONCAT(DATE_ADD(ts, 0), SUBSTR(ts, 11)) by
producing a doubled time portion (Samsung zero-row issue).

Changes:
- databricks parser: default_unit=None so 2-arg parses with unit=None
  while 3-arg keeps the explicit unit. Mirrors clickhouse.py pattern.
  No DBR output change since date_delta_sql falls back to DAY.
- e6 generator: new dateadd_sql checks E6_EXECUTOR_TYPE.
  - native: emit 2-arg when unit is None, 3-arg otherwise.
  - java (default): always 3-arg with DAY filled in (unchanged).
- e6 read parser handles both arities for round-trips.
ArnavBorkarE6x merged commit 0c6cfcb into e6data:main on May 8, 2026
6 checks passed
