Description
sqlglot.optimizer.annotate_types.annotate_types(expression, schema=..., dialect=...)
appears to accept a call-site dialect for type-annotation dispatch, but the
kwarg is silently dropped when schema is a Schema instance whose own
.dialect is set. The EXPRESSION_METADATA actually used comes from
schema.dialect, not the call-site dialect.
This makes it possible to write seemingly correct cross-dialect annotation
code that silently dispatches through one dialect's typing module for all
calls.
Reproduction (sqlglot main, commit 9f169ab)
from sqlglot import parse_one
from sqlglot.optimizer.annotate_types import annotate_types
from sqlglot.optimizer.qualify import qualify
from sqlglot.schema import MappingSchema
# Schema built once, with hive
schema = MappingSchema({"t": {"e": "TIMESTAMP"}}, dialect="hive")
# Same SQL, same schema, but call-site dialect varies
sql = "SELECT date_add(e, 24) AS r FROM t"
for d in ["hive", "spark", "databricks"]:
    ast = qualify(parse_one(sql, read=d), schema=schema, dialect=d)
    annotated = annotate_types(ast, schema=schema, dialect=d)
    print(f"{d:10s} -> {annotated.selects[0].this.type}")
# Output (sqlglot main):
# hive -> UNKNOWN
# spark -> UNKNOWN <-- Spark.EXPRESSION_METADATA is NOT consulted
# databricks -> UNKNOWN <-- Databricks.EXPRESSION_METADATA is NOT consulted
#
# Expected (or at least: what the signature suggests):
# hive -> UNKNOWN
# spark -> <whatever Spark typing says for TsOrDsAdd>
# databricks -> <whatever Databricks typing says for TsOrDsAdd>
If the schema is rebuilt per-iteration with the matching dialect, the
expected per-dialect dispatch occurs. So the workaround is "build the
schema with the dialect you intend to annotate against."
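A minimal sketch of that workaround, written so it runs without sqlglot installed. The helper per_dialect_schemas and the stand-in builder fake_build are hypothetical names introduced here for illustration; they are not part of sqlglot. The idea is only to build (and cache) one schema per dialect instead of sharing a single hive-flavored one.

```python
from functools import lru_cache
from typing import Any, Callable, List


def per_dialect_schemas(build: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap a schema builder so each dialect name gets its own cached schema."""

    @lru_cache(maxsize=None)
    def schema_for(dialect: str) -> Any:
        return build(dialect)

    return schema_for


# Stand-in builder (assumption): records which dialects were built and
# returns a plain dict in place of a real schema object.
built: List[str] = []


def fake_build(dialect: str) -> dict:
    built.append(dialect)
    return {"dialect": dialect, "mapping": {"t": {"e": "TIMESTAMP"}}}


schema_for = per_dialect_schemas(fake_build)

for d in ["hive", "spark", "databricks", "spark"]:
    schema = schema_for(d)
    # Each iteration sees a schema whose dialect matches the call-site dialect.
    print(f"{d:10s} -> schema dialect {schema['dialect']}")
```

With sqlglot, the builder would presumably be something like lambda d: MappingSchema({"t": {"e": "TIMESTAMP"}}, dialect=d), and schema_for(d) would be passed to both qualify and annotate_types inside the loop.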
Root cause
TypeAnnotator.__init__ (sqlglot/optimizer/annotate_types.py:202-205):
self.schema = schema
dialect = schema.dialect or Dialect()
self.dialect = dialect
self.expression_metadata = expression_metadata or dialect.EXPRESSION_METADATA
The schema's dialect wins; the dialect kwarg passed into annotate_types
is forwarded to ensure_schema(schema, dialect=...) and used only when
constructing a schema from raw input. Once a Schema instance exists, its
.dialect is the only source consulted for typing dispatch.
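To make the precedence concrete, here is a small stdlib-only model of the flow described above. FakeSchema, fake_ensure_schema, and dispatch_dialect are hypothetical stand-ins, not sqlglot code; they only mirror the behavior this issue describes (the kwarg is used when a schema is built from raw input, and ignored once a schema instance already exists).

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class FakeSchema:
    """Stand-in for a Schema instance; only .dialect matters for this model."""

    mapping: dict
    dialect: Optional[str] = None


def fake_ensure_schema(
    schema: Union[FakeSchema, dict, None], dialect: Optional[str] = None
) -> FakeSchema:
    # An existing schema instance is returned untouched, so the kwarg is dropped.
    if isinstance(schema, FakeSchema):
        return schema
    # Only raw input gets wrapped using the call-site dialect.
    return FakeSchema(mapping=schema or {}, dialect=dialect)


def dispatch_dialect(
    schema: Union[FakeSchema, dict, None], dialect: Optional[str] = None
) -> str:
    """Return the dialect name whose typing metadata would be consulted."""
    resolved = fake_ensure_schema(schema, dialect=dialect)
    # Mirrors `schema.dialect or Dialect()` from the snippet above.
    return resolved.dialect or "default"


raw = {"t": {"e": "TIMESTAMP"}}
prebuilt = FakeSchema(raw, dialect="hive")

print(dispatch_dialect(prebuilt, dialect="spark"))  # schema instance: its dialect wins
print(dispatch_dialect(raw, dialect="spark"))       # raw mapping: the kwarg is used
```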
Suggested resolutions (in order of conservatism)
1. Docstring note on annotate_types stating that schema.dialect
   takes precedence over the dialect kwarg for typing dispatch when a
   Schema instance is passed. Cheapest, no behavior change.
2. Prefer the call-site dialect in TypeAnnotator.__init__ when one
   is forwarded, falling back to schema.dialect. Behavior change, but
   matches what the public signature implies.
3. Plumb dialect through to TypeAnnotator so the precedence is
   explicit at the constructor level, not implicit via schema.dialect.
I lean toward (1) as the smallest correct change: schema.dialect
winning is internally consistent (column types in a schema are
dialect-flavored), so the surprise is mostly a documentation gap.
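For comparison, the precedence that the second resolution would introduce can be sketched in a few lines. resolve_dispatch_dialect is a hypothetical helper, not an existing sqlglot function, and plain strings stand in for dialect objects; it is only meant to pin down the proposed ordering.

```python
from typing import Optional


def resolve_dispatch_dialect(
    call_site: Optional[str],
    schema_dialect: Optional[str],
    default: str = "default",
) -> str:
    """Call-site dialect wins when given, then the schema's, then a default."""
    return call_site or schema_dialect or default


# Call-site dialect is forwarded: it overrides the schema's dialect.
print(resolve_dispatch_dialect("spark", "hive"))
# No call-site dialect: fall back to the schema's dialect (today's behavior).
print(resolve_dispatch_dialect(None, "hive"))
```

Callers who pass no dialect would see no change under this ordering; only calls where the two dialects disagree would dispatch differently, which is where the behavior change lies.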
Encountered while writing tests for #7588 (the Databricks
date_add/dateadd disambiguation patch).