Fixed iceberg alter table modify rename by subkanthi · Pull Request #1841 · Altinity/ClickHouse

subkanthi · 2026-05-27T02:32:31Z

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Backport of #1594, Fixes Alter table ADD/DROP/RENAME/MODIFY column for Iceberg tables.

Documentation entry for user-facing changes

Fixes Alter table ADD/DROP/RENAME/MODIFY column for Iceberg tables.

CI/CD Options

Exclude tests:

Regression jobs to run:

github-actions · 2026-05-27T02:59:26Z

Workflow [PR], commit [067b3f7]

subkanthi · 2026-05-27T03:01:19Z

continuation of #1794

subkanthi · 2026-05-27T03:05:28Z

Audit Review: PR #1841 — Fixed iceberg alter table modify rename

AI audit note: This review comment was generated by AI (claude-4.6-opus).

PR: #1841 — Iceberg ALTER TABLE schema-change via REST catalog

Call Graph

StorageObjectStorage::alter
  → configuration->update(object_storage, context)
  → params.apply(new_metadata, context)
  → configuration->alter(params, context, storageID, catalog)
     → DataLakeConfiguration::alter
        → assertInitializedDL()
        → IcebergMetadata::alter(params, shared_from_this(), context, storage_id, catalog)
           → Iceberg::alter(params, context, object_storage, data_lake_settings, ...)
              → MetadataGenerator::generate{Add,Drop,Modify,Rename}ColumnMetadata
              → writeMetadataFileAndVersionHint
              → catalog->updateMetadata(ns, table, path, metadata_json)
                 → buildUpdateMetadataRequestBody(ns, table, new_snapshot)
                 → sendRequest(endpoint, request_body)
  → database->alterTable (skipped for datalake catalogs)
  → setInMemoryMetadata(new_metadata)

Key Invariants

Schema ID in the REST request must match the old schema ID that the catalog currently holds (assert-current-schema-id requirement).
On catalog commit failure, orphan metadata files in object storage must be cleaned up.
data_lake_settings must not be mutated unsafely across threads.
The retry loop must converge and must not adopt stale/wrong metadata.
checkAlterIsPossible must be called before alter to validate commands.

Confirmed Defects

Medium: `old_schema_id = new_schema_id - 1` assumption is fragile and can send wrong REST requirement

Impact: If an Iceberg table has non-contiguous schema IDs (e.g. schema-id jumps from 0 to 5 due to external schema evolutions), the assert-current-schema-id requirement sent to the REST catalog will be wrong. The REST catalog will reject the commit with a requirement-violation error, making ALTER TABLE permanently fail for such tables even though the operation is valid.
Anchor: src/Databases/DataLake/RestCatalog.cpp / buildUpdateMetadataRequestBody — const Int32 old_schema_id = new_schema_id - 1;
Trigger: Any Iceberg table whose current-schema-id in the metadata JSON was not produced by this code's MetadataGenerator (e.g. table created/evolved by Spark with ID gaps).
Why defect: The MetadataGenerator always increments current-schema-id by 1, so new_schema_id - 1 happens to equal the current value in the metadata. However, buildUpdateMetadataRequestBody receives the entire metadata JSON blob and could read the actual old schema ID directly from the schemas array or from the metadata's current-schema-id before mutation. Deriving it arithmetically is brittle and relies on an undocumented invariant of MetadataGenerator. If MetadataGenerator ever changes its numbering (or metadata is loaded from a table with gaps), this breaks silently.
Fix direction: Pass the original current-schema-id value (before increment) as an explicit parameter, or read it from the metadata blob's pre-increment state rather than computing new - 1.
Regression test direction: Unit test with current-schema-id=5 and new_schema_id=6; verify requirement says current-schema-id=5, not computed from 6 - 1 (currently happens to pass but test should lock the contract).

Medium: `const_cast` on `data_lake_settings` creates mutable alias to const reference — unsynchronized cross-thread mutation risk

Impact: IcebergMetadata stores DataLakeStorageSettings & obtained via const_cast from configuration_->getDataLakeSettings() (which returns const &). The alter function then mutates data_lake_settings[iceberg_metadata_file_path] through this reference. Since DataLakeStorageSettings is also read by the background metadata prefetch task and by concurrent SELECT queries (both reading iceberg_metadata_file_path), this is an unsynchronized write to shared state. In ASan/TSan builds this is a data race (UB); in release builds it could produce torn reads.
Anchor: src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp line 267: const_cast<DataLakeStorageSettings &>(configuration_->getDataLakeSettings()) and src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp lines 830-831: data_lake_settings[...] = metadata_name;
Trigger: ALTER TABLE ... DROP/ADD/MODIFY COLUMN concurrent with any read query or background prefetch that accesses iceberg_metadata_file_path.
Why defect: const_cast to strip const from a reference returned by an API that intends it to be const is UB if the original object was declared const. Even if it is not declared const, the mutation is unsynchronized.
Fix direction: Either pass the new metadata path back via a return value / output parameter, or use a proper synchronized setter on the configuration object.
Regression test direction: TSan integration test running ALTER TABLE concurrently with SELECT queries.

Medium: `checkAlterIsPossible` silently skips validation when `current_metadata` is null

Impact: Before this PR, checkAlterIsPossible called assertInitializedDL() which would throw LOGICAL_ERROR if metadata was not initialized. Now it silently returns without validating the alter commands. This means an ALTER on an uninitialized table (e.g. due to a metadata load failure) will pass validation and proceed to alter which will throw a different, less clear error from assertInitializedDL() inside alter.
Anchor: src/Storages/ObjectStorage/DataLakes/DataLakeConfiguration.h lines 177-181.
Trigger: ALTER TABLE on a datalake table where metadata failed to load (e.g. transient S3 error during table open).
Why defect: The old code failed fast with a clear error. The new code allows the operation to proceed further before failing, potentially with a confusing error or partial side effects (e.g. params.apply(new_metadata, context) in StorageObjectStorage::alter runs before configuration->alter is called).
Fix direction: Keep the assertInitializedDL() check, or at minimum throw explicitly when current_metadata is null rather than silently skipping.
Regression test direction: Integration test that drops metadata cache, then tries ALTER TABLE and expects a clear error message.

Low: Catalog commit failure with `!wrote_ok && file_exists` silently adopts a file that might be from a different/stale operation

Impact: When writeMetadataFileAndVersionHint fails but the file already exists in object storage (written by a concurrent operation or a previous failed attempt), the code proceeds to call catalog->updateMetadata. If the catalog update also fails, it calls continue (retry). If it succeeds, it adopts the file. The adopted file may have been written by a different schema evolution, or contain stale metadata from a previous failed attempt that was not cleaned up.
Anchor: src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp lines 790-828.
Trigger: Two concurrent ALTER TABLE operations on the same Iceberg table, both targeting the same metadata version. One writes the file, the other's write fails, but it finds the file and adopts it.
Why defect: The content of the existing file is not verified. The code assumes that if a file with the expected name exists, it has the right content. This is plausible for version-numbered metadata files but is not guaranteed, especially if object storage has eventual consistency or if the file was written with different schema changes.
Fix direction: After finding an existing file, read it back and verify the current-schema-id and schema content match what this operation expects before adopting it.
Regression test direction: Test with two concurrent schema changes targeting the same version; verify only one succeeds.

Low: `removeObjectIfExists` in orphan cleanup path swallows all exceptions silently

Impact: If the metadata file cleanup fails (e.g. permission error, network timeout), the only evidence is a warning log. The orphan metadata file remains in object storage. While the code then throws DATALAKE_DATABASE_ERROR, the orphan file is never cleaned up and accumulates over time.
Anchor: src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp lines 811-818.
Trigger: Object storage permission error or transient failure during cleanup.
Why defect: This is an acceptable trade-off (the alternative is losing the catalog-failure error), but the swallowed exception means orphan files silently accumulate. The test test_alter_orphan_metadata_cleanup_on_catalog_failure only tests the happy path of successful cleanup.
Fix direction: Consider adding a background cleanup mechanism for orphan metadata files, or document this as a known limitation.
Regression test direction: Test where both catalog commit and file cleanup fail; verify the error message still mentions catalog failure.

Coverage Summary

Scope reviewed: All 16 changed files (776 insertions, 87 deletions). Full call graph from StorageObjectStorage::alter through Iceberg::alter to RestCatalog::updateMetadata and buildUpdateMetadataRequestBody.
Categories passed: Input validation/error paths, REST request body construction (both schema and snapshot paths), retry loop convergence (bounded by MAX_TRANSACTION_RETRIES=100), memory lifetime (Poco JSON ref-counted Ptr), exception safety/partial rollback (catalog failure rolls back metadata file).
Categories failed: Schema ID derivation correctness (Medium), thread safety via const_cast (Medium), checkAlterIsPossible validation bypass (Medium), unverified file adoption (Low), orphan cleanup resilience (Low).
Not applicable: Iterator invalidation, integer overflow (Int32 schema IDs incremented by 1), RAII violations.
Assumptions/limits: Static analysis only. The old_schema_id = new - 1 computation is currently correct in practice because MetadataGenerator always increments by 1; the defect is about brittleness and missing contract enforcement.

subkanthi added 2 commits May 27, 2026 04:28

Fixed iceberg alter table modify rename

e045fa2

Resolved merge conflicts

ae62718

subkanthi marked this pull request as ready for review May 27, 2026 03:00

subkanthi mentioned this pull request May 27, 2026

Antalya 26.3: Fixed alter table operations(add, modify, drop, rename) for iceberg #1794

Closed

27 tasks

subkanthi assigned arthurpassos May 27, 2026

Fixed iceberg test_alter_orphan_metadata_cleanup_on_catalog_failure test

067b3f7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed iceberg alter table modify rename#1841

Fixed iceberg alter table modify rename#1841
subkanthi wants to merge 3 commits into
antalya-26.3from
antalya_26_3_fix_alter_table_iceberg_new

subkanthi commented May 27, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 27, 2026 •

edited

Loading

Uh oh!

subkanthi commented May 27, 2026

Uh oh!

subkanthi commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

subkanthi commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Documentation entry for user-facing changes

CI/CD Options

Exclude tests:

Regression jobs to run:

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

subkanthi commented May 27, 2026

Uh oh!

subkanthi commented May 27, 2026

Audit Review: PR #1841 — Fixed iceberg alter table modify rename

Call Graph

Key Invariants

Confirmed Defects

Medium: old_schema_id = new_schema_id - 1 assumption is fragile and can send wrong REST requirement

Medium: const_cast on data_lake_settings creates mutable alias to const reference — unsynchronized cross-thread mutation risk

Medium: checkAlterIsPossible silently skips validation when current_metadata is null

Low: Catalog commit failure with !wrote_ok && file_exists silently adopts a file that might be from a different/stale operation

Low: removeObjectIfExists in orphan cleanup path swallows all exceptions silently

Coverage Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

subkanthi commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading

Medium: `old_schema_id = new_schema_id - 1` assumption is fragile and can send wrong REST requirement

Medium: `const_cast` on `data_lake_settings` creates mutable alias to const reference — unsynchronized cross-thread mutation risk

Medium: `checkAlterIsPossible` silently skips validation when `current_metadata` is null

Low: Catalog commit failure with `!wrote_ok && file_exists` silently adopts a file that might be from a different/stale operation

Low: `removeObjectIfExists` in orphan cleanup path swallows all exceptions silently