fix(e6): escape lone apostrophes in literals under FIX_QUOTE_ESCAPES #256

Merged
tkaunlaky-e6 merged 3 commits into e6data:main from tkaunlaky-e6:fix/e6-quote-escaping-kantar on May 11, 2026

@tkaunlaky-e6

Summary

Kantar's INSERT queries containing Databricks double-quoted strings with an apostrophe inside (e.g. "SCARLET'S WALK (2023 REMASTER)") were emitting invalid E6 SQL with the apostrophe unescaped: the bare ' terminated the output string literal early.
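To illustrate why an unescaped apostrophe breaks the output, here is a minimal scanner for a single single-quoted literal. It is a hypothetical sketch, not E6's actual lexer. It assumes a backslash escapes the next character (consistent with E6's STRING_ESCAPES = ['\\']) and that a doubled '' stays inside the literal (consistent with the unchanged '' outputs in the Behavior table); any other ' closes the literal. The name scan_e6_literal is made up for illustration.

```python
def scan_e6_literal(sql: str) -> tuple[str, str]:
    """Scan one single-quoted literal at the start of `sql` (simplified model).

    Returns (literal text including its quotes, remaining input).
    """
    if not sql.startswith("'"):
        raise ValueError("input must start with a single quote")
    i = 1
    while i < len(sql):
        ch = sql[i]
        if ch == "\\":
            i += 2  # backslash escape: skip it and the escaped character
            continue
        if ch == "'":
            if sql[i + 1 : i + 2] == "'":
                i += 2  # doubled '' stays inside the literal
                continue
            return sql[: i + 1], sql[i + 1 :]  # a lone ' closes the literal
        i += 1
    raise ValueError("unterminated string literal")


if __name__ == "__main__":
    # Pre-fix output: the raw apostrophe closes the literal after SCARLET.
    print(scan_e6_literal("'SCARLET'S WALK (2023 REMASTER)'"))
    # Post-fix output: the escaped apostrophe keeps the literal intact.
    print(scan_e6_literal("'SCARLET\\'S WALK (2023 REMASTER)'"))
```

With the pre-fix output, the scanner stops after 'SCARLET' and leaves S WALK (2023 REMASTER)' behind as stray SQL; the escaped form 'SCARLET\'S WALK (2023 REMASTER)' is consumed in full.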

The E6 generator's escape_str override in sqlglot/dialects/e6.py unconditionally skipped the single-quote escape pass when FIX_QUOTE_ESCAPES=true. That is correct only for Variant C literals (parser-merged adjacent string tokens with '' embedded in Literal.this). For literals whose text contains a lone ' (Databricks "..." strings, or \' escapes already consumed by the tokenizer), the override emitted raw apostrophes in the output.

Fix

Gate the override on whether '' is actually present in the literal text. When '' is not in the text, defer to the base Generator.escape_str, which correctly replaces ' with \' since E6's STRING_ESCAPES = ['\\']. When '' is present, retain the existing behavior — '' passes through unchanged, no \' introduced.
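The gating rule can be modelled in isolation. This is a simplified, self-contained sketch, not the sqlglot code: base_escape_str and e6_escape_str are hypothetical names, and the base escaping is assumed to amount to doubling backslashes (when escape_backslash is true) and then rewriting ' as \'. The real base Generator.escape_str also handles other escape sequences, and the real override body is not reproduced; Variant C text is simply passed through here, matching the "unchanged" rows of the Behavior table. Falling back to base escaping when the flag is off is also an assumption.

```python
import os


def base_escape_str(text: str, escape_backslash: bool = True) -> str:
    """Rough stand-in (assumption) for the base escaping with STRING_ESCAPES = ['\\']."""
    if escape_backslash:
        # Escape existing backslashes first so the ones added below are not doubled.
        text = text.replace("\\", "\\\\")
    # Rewrite every single quote as \'
    return text.replace("'", "\\'")


def e6_escape_str(text: str, escape_backslash: bool = True) -> str:
    """Hypothetical model of the E6 override after this fix."""
    if os.getenv("FIX_QUOTE_ESCAPES", "False").lower() == "true":
        if "''" not in text:
            # Lone apostrophes (e.g. a Databricks "..." string): defer to base escaping.
            return base_escape_str(text, escape_backslash)
        # Variant C: '' already embedded in Literal.this; modelled as a pass-through.
        return text
    # Flag off: assumed here to use the base escaping as well.
    return base_escape_str(text, escape_backslash)


if __name__ == "__main__":
    os.environ["FIX_QUOTE_ESCAPES"] = "true"
    for literal_this in ["SCARLET'S WALK", "ROCKIN'' AROUND", "Côte d''''Azur"]:
        print("'" + e6_escape_str(literal_this) + "'")
```

Under the flag, wrapping the three Literal.this values from the Behavior table yields 'SCARLET\'S WALK', 'ROCKIN'' AROUND' and 'Côte d''''Azur'. The single substring check is what keeps Variant C literals on the old path while routing every other literal through the base escape.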

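```python
# Inside the E6 generator's escape_str override (sqlglot/dialects/e6.py)
if os.getenv("FIX_QUOTE_ESCAPES", "False").lower() == "true":
    if "''" not in text:
        # Lone ' (e.g. from a Databricks "..." string): let the base
        # Generator.escape_str rewrite ' as \'
        return super().escape_str(text, escape_backslash)
    # ... existing override body
```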

Behavior

| Input | Literal.this | Path | Output |
|---|---|---|---|
| `"SCARLET'S WALK"` (Kantar) | `SCARLET'S WALK` | base | `'SCARLET\'S WALK'` |
| `'ROCKIN'' AROUND'` | `ROCKIN'' AROUND` | override (unchanged) | `'ROCKIN'' AROUND'` |
| `'Côte d''''Azur'` | `Côte d''''Azur` | override (unchanged) | `'Côte d''''Azur'` |

Test plan

  • Existing test_fix_quote_escapes continues to pass (4 cases; all literals contain '' and take the unchanged path).
  • Two new cases added: Databricks double-quoted string with apostrophe inside, and the full Kantar INSERT shape.
  • Full E6 dialect test suite: 50/50 pass.
  • Verified end-to-end against the running converter API with FIX_QUOTE_ESCAPES=true on Kantar's exact query.

Under FIX_QUOTE_ESCAPES, the E6 generator's escape_str override was
unconditionally skipping the single-quote escape pass. That is correct
for Variant C literals (parser-merged adjacent strings with '' embedded
in Literal.this), but it breaks literals whose text contains a lone
apostrophe — e.g. Databricks double-quoted strings like
"SCARLET'S WALK (2023 REMASTER)", which the tokenizer extracts as a
single-quoted literal value with one ' in the text. With the override
skipping the escape, the apostrophe goes out raw and terminates the
output string early, producing invalid E6.

Gate the override on the presence of '' in the literal: when '' is not
in the text, defer to the base Generator.escape_str (which correctly
replaces ' with \') and only apply the existing skip-escape behavior
when '' is actually present (the case the override was designed for).
tkaunlaky-e6 merged commit 7e3719c into e6data:main on May 11, 2026
6 checks passed