Antalya 26.3: Fix empty partition_key and sorting_key in system.tables for Iceberg tables without data snapshots#1819
Open
il9ue wants to merge 1 commit into
system.tables for Iceberg tables without data snapshots
Changelog category: Bug Fix
Changelog entry: Fixed `system.tables.partition_key` and
`system.tables.sorting_key` returning empty strings for
Iceberg tables that have no data snapshot, including all
empty tables and (more frequently) tables accessed via the
Glue catalog. The snapshot-existence gate in
IcebergMetadata::partitionKey() / sortingKey() was
semantically wrong: partition spec and sort order are
table-level properties recorded at the top level of the
Iceberg metadata file (`default-spec-id`,
`default-sort-order-id`) and exist independently of
whether any data snapshot has been written. Also adds a
defensive guard in getSortingKeyDescriptionFromMetadata
against Iceberg V1 metadata files missing `sort-orders`,
which becomes reachable for empty tables after this fix.
Closes ClickHouse#1235.
Closes #1235.
Summary
`SELECT partition_key, sorting_key FROM system.tables` returned empty strings for Iceberg tables that had no data snapshot. This was reliably observable for tables accessed via the Glue catalog (since Glue's `metadata_location` more frequently points at a snapshot-free metadata file), but it also reproduced for any empty Iceberg table regardless of catalog (REST, Glue, or direct `IcebergS3`).
Root cause
`IcebergMetadata::partitionKey()` and `IcebergMetadata::sortingKey()` (introduced in #959, refined in #1026, ported to 25.8 in #1095) gated their work on the existence of a data snapshot: both returned early when `actual_data_snapshot` was not set.
This is semantically wrong. Partition spec and sort order are table-level properties recorded at the top level of the Iceberg metadata file (`default-spec-id`, `default-sort-order-id`, `partition-specs`, `sort-orders`) and exist independently of whether any data snapshot has been written. Code inspection of `getState()` confirms that `actual_table_state_snapshot` is fully populated (`schema_id`, `metadata_file_path`, `metadata_version`) regardless of whether a snapshot exists; only `actual_table_state_snapshot.snapshot_id` is `std::nullopt`, and that field is never read by `getPartitionKey()` or `getSortingKey()`.
The gate was therefore suppressing valid data. The fix removes it.
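To illustrate why the snapshot gate was unnecessary, here is a minimal Python sketch (not the ClickHouse C++ code) that resolves the partition spec and sort order purely from the top-level metadata fields named above. The helper names `find_by_id`, `partition_field_names`, and `sort_fields` and both sample documents are hypothetical; the sort lookup also returns an empty result when `sort-orders` is absent, which is the V1 case mentioned in the changelog entry.

```python
from typing import Any, Optional


def find_by_id(entries: list[dict[str, Any]], id_field: str, wanted: int) -> Optional[dict[str, Any]]:
    """Return the first entry whose id_field equals wanted, or None."""
    for entry in entries:
        if entry.get(id_field) == wanted:
            return entry
    return None


def partition_field_names(metadata: dict[str, Any]) -> list[str]:
    """Resolve the default partition spec without consulting any snapshot."""
    if "partition-specs" not in metadata or "default-spec-id" not in metadata:
        return []
    spec = find_by_id(metadata["partition-specs"], "spec-id", metadata["default-spec-id"])
    return [field["name"] for field in spec["fields"]] if spec else []


def sort_fields(metadata: dict[str, Any]) -> list[tuple[int, str, str]]:
    """Resolve the default sort order; tolerate metadata without sort-orders."""
    if "sort-orders" not in metadata or "default-sort-order-id" not in metadata:
        return []  # guard: nothing to dereference
    order = find_by_id(metadata["sort-orders"], "order-id", metadata["default-sort-order-id"])
    if order is None:
        return []
    return [(f["source-id"], f["direction"], f["null-order"]) for f in order["fields"]]


# Hypothetical metadata for an empty table: no snapshots, but a spec and an order.
EMPTY_TABLE = {
    "format-version": 2,
    "snapshots": [],
    "default-spec-id": 0,
    "partition-specs": [
        {"spec-id": 0, "fields": [
            {"source-id": 1, "field-id": 1000, "name": "event_day", "transform": "day"},
        ]},
    ],
    "default-sort-order-id": 1,
    "sort-orders": [
        {"order-id": 1, "fields": [
            {"transform": "identity", "source-id": 2,
             "direction": "asc", "null-order": "nulls-first"},
        ]},
    ],
}

# Hypothetical V1-style metadata that carries neither partition-specs nor sort-orders.
V1_TABLE = {"format-version": 1, "snapshots": []}

print(partition_field_names(EMPTY_TABLE))
print(sort_fields(EMPTY_TABLE))
print(sort_fields(V1_TABLE))
```

Neither lookup reads `snapshots`, so an empty table still yields its partition and sort fields, while the V1-style document yields empty results instead of failing.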
Change list
`src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp`
- `partitionKey()`: removed the `if (!actual_data_snapshot)` early return.
- `sortingKey()`: removed the `if (!actual_data_snapshot)` early return.

`src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp`
- `getSortingKeyDescriptionFromMetadata()`: added a defensive `has()` guard for `sort-orders` and `default-sort-order-id`. This is a pre-existing null dereference that was previously unreachable in practice (always behind the snapshot gate); after removing the gate, empty Iceberg V1 tables without `sort-orders` would have hit it. The guard mirrors the shape already present in `getSortingKeyDisplayStringFromMetadata`.

No header changes. No `StorageSystemTables.cpp` changes: the existing null/exception guards added for #1210 (Glue segfault) remain untouched.
Behavior preservation
- ... `NULLS FIRST`/`NULLS LAST`).
- ... `getPartitionKeyStringFromMetadata` already guards on missing `partition-specs`).
- ... `IDataLakeMetadata` are unchanged.
- ... `StorageSystemTables.cpp` remain in place.
Out of scope
Glue's `metadata_location` pointer can lag schema-evolution events, which could cause `partition_key`/`sorting_key` to reflect a stale spec. This is orthogonal to the snapshot gate and is not addressed by this PR.
Test plan
- New regression test reproduces Root Cause A without needing any catalog mock: it creates an Iceberg table with a non-trivial partition spec and sort order, then asserts that `system.tables.partition_key` and `system.tables.sorting_key` are non-empty before any data is inserted.
- Existing `test_system_tables_partition_sorting_keys` in `tests/integration/test_storage_iceberg_with_spark/test_system_iceberg_metadata.py` continues to pass with byte-identical output.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fixed `system.tables.partition_key` and `system.tables.sorting_key` returning empty strings for Iceberg tables that have no data snapshot, including all empty tables and (more frequently) tables accessed via the Glue catalog. Also added a defensive guard against Iceberg V1 metadata files missing `sort-orders`.
Documentation entry for user-facing changes
Not required: this is a bug fix to existing `system.tables` columns; no new user-facing surface.
CI/CD Options
Exclude tests:
Regression jobs to run: