Skip to content

Fixed iceberg alter table modify rename#1841

Open
subkanthi wants to merge 3 commits into
antalya-26.3from
antalya_26_3_fix_alter_table_iceberg_new
Open

Fixed iceberg alter table modify rename#1841
subkanthi wants to merge 3 commits into
antalya-26.3from
antalya_26_3_fix_alter_table_iceberg_new

Conversation

@subkanthi
Copy link
Copy Markdown
Collaborator

@subkanthi subkanthi commented May 27, 2026

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Backport of #1594, Fixes Alter table ADD/DROP/RENAME/MODIFY column for Iceberg tables.

Documentation entry for user-facing changes

Fixes Alter table ADD/DROP/RENAME/MODIFY column for Iceberg tables.

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 27, 2026

Workflow [PR], commit [067b3f7]

@subkanthi
Copy link
Copy Markdown
Collaborator Author

continuation of #1794

@subkanthi
Copy link
Copy Markdown
Collaborator Author

Audit Review: PR #1841 — Fixed iceberg alter table modify rename

AI audit note: This review comment was generated by AI (claude-4.6-opus).

PR: #1841 — Iceberg ALTER TABLE schema-change via REST catalog


Call Graph

StorageObjectStorage::alter
  → configuration->update(object_storage, context)
  → params.apply(new_metadata, context)
  → configuration->alter(params, context, storageID, catalog)
     → DataLakeConfiguration::alter
        → assertInitializedDL()
        → IcebergMetadata::alter(params, shared_from_this(), context, storage_id, catalog)
           → Iceberg::alter(params, context, object_storage, data_lake_settings, ...)
              → MetadataGenerator::generate{Add,Drop,Modify,Rename}ColumnMetadata
              → writeMetadataFileAndVersionHint
              → catalog->updateMetadata(ns, table, path, metadata_json)
                 → buildUpdateMetadataRequestBody(ns, table, new_snapshot)
                 → sendRequest(endpoint, request_body)
  → database->alterTable (skipped for datalake catalogs)
  → setInMemoryMetadata(new_metadata)

Key Invariants

  1. Schema ID in the REST request must match the old schema ID that the catalog currently holds (assert-current-schema-id requirement).
  2. On catalog commit failure, orphan metadata files in object storage must be cleaned up.
  3. data_lake_settings must not be mutated unsafely across threads.
  4. The retry loop must converge and must not adopt stale/wrong metadata.
  5. checkAlterIsPossible must be called before alter to validate commands.

Confirmed Defects

Medium: old_schema_id = new_schema_id - 1 assumption is fragile and can send wrong REST requirement

  • Impact: If an Iceberg table has non-contiguous schema IDs (e.g. schema-id jumps from 0 to 5 due to external schema evolutions), the assert-current-schema-id requirement sent to the REST catalog will be wrong. The REST catalog will reject the commit with a requirement-violation error, making ALTER TABLE permanently fail for such tables even though the operation is valid.
  • Anchor: src/Databases/DataLake/RestCatalog.cpp / buildUpdateMetadataRequestBodyconst Int32 old_schema_id = new_schema_id - 1;
  • Trigger: Any Iceberg table whose current-schema-id in the metadata JSON was not produced by this code's MetadataGenerator (e.g. table created/evolved by Spark with ID gaps).
  • Why defect: The MetadataGenerator always increments current-schema-id by 1, so new_schema_id - 1 happens to equal the current value in the metadata. However, buildUpdateMetadataRequestBody receives the entire metadata JSON blob and could read the actual old schema ID directly from the schemas array or from the metadata's current-schema-id before mutation. Deriving it arithmetically is brittle and relies on an undocumented invariant of MetadataGenerator. If MetadataGenerator ever changes its numbering (or metadata is loaded from a table with gaps), this breaks silently.
  • Fix direction: Pass the original current-schema-id value (before increment) as an explicit parameter, or read it from the metadata blob's pre-increment state rather than computing new - 1.
  • Regression test direction: Unit test with current-schema-id=5 and new_schema_id=6; verify requirement says current-schema-id=5, not computed from 6 - 1 (currently happens to pass but test should lock the contract).

Medium: const_cast on data_lake_settings creates mutable alias to const reference — unsynchronized cross-thread mutation risk

  • Impact: IcebergMetadata stores DataLakeStorageSettings & obtained via const_cast from configuration_->getDataLakeSettings() (which returns const &). The alter function then mutates data_lake_settings[iceberg_metadata_file_path] through this reference. Since DataLakeStorageSettings is also read by the background metadata prefetch task and by concurrent SELECT queries (both reading iceberg_metadata_file_path), this is an unsynchronized write to shared state. In ASan/TSan builds this is a data race (UB); in release builds it could produce torn reads.
  • Anchor: src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp line 267: const_cast<DataLakeStorageSettings &>(configuration_->getDataLakeSettings()) and src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp lines 830-831: data_lake_settings[...] = metadata_name;
  • Trigger: ALTER TABLE ... DROP/ADD/MODIFY COLUMN concurrent with any read query or background prefetch that accesses iceberg_metadata_file_path.
  • Why defect: const_cast to strip const from a reference returned by an API that intends it to be const is UB if the original object was declared const. Even if it is not declared const, the mutation is unsynchronized.
  • Fix direction: Either pass the new metadata path back via a return value / output parameter, or use a proper synchronized setter on the configuration object.
  • Regression test direction: TSan integration test running ALTER TABLE concurrently with SELECT queries.

Medium: checkAlterIsPossible silently skips validation when current_metadata is null

  • Impact: Before this PR, checkAlterIsPossible called assertInitializedDL() which would throw LOGICAL_ERROR if metadata was not initialized. Now it silently returns without validating the alter commands. This means an ALTER on an uninitialized table (e.g. due to a metadata load failure) will pass validation and proceed to alter which will throw a different, less clear error from assertInitializedDL() inside alter.
  • Anchor: src/Storages/ObjectStorage/DataLakes/DataLakeConfiguration.h lines 177-181.
  • Trigger: ALTER TABLE on a datalake table where metadata failed to load (e.g. transient S3 error during table open).
  • Why defect: The old code failed fast with a clear error. The new code allows the operation to proceed further before failing, potentially with a confusing error or partial side effects (e.g. params.apply(new_metadata, context) in StorageObjectStorage::alter runs before configuration->alter is called).
  • Fix direction: Keep the assertInitializedDL() check, or at minimum throw explicitly when current_metadata is null rather than silently skipping.
  • Regression test direction: Integration test that drops metadata cache, then tries ALTER TABLE and expects a clear error message.

Low: Catalog commit failure with !wrote_ok && file_exists silently adopts a file that might be from a different/stale operation

  • Impact: When writeMetadataFileAndVersionHint fails but the file already exists in object storage (written by a concurrent operation or a previous failed attempt), the code proceeds to call catalog->updateMetadata. If the catalog update also fails, it calls continue (retry). If it succeeds, it adopts the file. The adopted file may have been written by a different schema evolution, or contain stale metadata from a previous failed attempt that was not cleaned up.
  • Anchor: src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp lines 790-828.
  • Trigger: Two concurrent ALTER TABLE operations on the same Iceberg table, both targeting the same metadata version. One writes the file, the other's write fails, but it finds the file and adopts it.
  • Why defect: The content of the existing file is not verified. The code assumes that if a file with the expected name exists, it has the right content. This is plausible for version-numbered metadata files but is not guaranteed, especially if object storage has eventual consistency or if the file was written with different schema changes.
  • Fix direction: After finding an existing file, read it back and verify the current-schema-id and schema content match what this operation expects before adopting it.
  • Regression test direction: Test with two concurrent schema changes targeting the same version; verify only one succeeds.

Low: removeObjectIfExists in orphan cleanup path swallows all exceptions silently

  • Impact: If the metadata file cleanup fails (e.g. permission error, network timeout), the only evidence is a warning log. The orphan metadata file remains in object storage. While the code then throws DATALAKE_DATABASE_ERROR, the orphan file is never cleaned up and accumulates over time.
  • Anchor: src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp lines 811-818.
  • Trigger: Object storage permission error or transient failure during cleanup.
  • Why defect: This is an acceptable trade-off (the alternative is losing the catalog-failure error), but the swallowed exception means orphan files silently accumulate. The test test_alter_orphan_metadata_cleanup_on_catalog_failure only tests the happy path of successful cleanup.
  • Fix direction: Consider adding a background cleanup mechanism for orphan metadata files, or document this as a known limitation.
  • Regression test direction: Test where both catalog commit and file cleanup fail; verify the error message still mentions catalog failure.

Coverage Summary

  • Scope reviewed: All 16 changed files (776 insertions, 87 deletions). Full call graph from StorageObjectStorage::alter through Iceberg::alter to RestCatalog::updateMetadata and buildUpdateMetadataRequestBody.
  • Categories passed: Input validation/error paths, REST request body construction (both schema and snapshot paths), retry loop convergence (bounded by MAX_TRANSACTION_RETRIES=100), memory lifetime (Poco JSON ref-counted Ptr), exception safety/partial rollback (catalog failure rolls back metadata file).
  • Categories failed: Schema ID derivation correctness (Medium), thread safety via const_cast (Medium), checkAlterIsPossible validation bypass (Medium), unverified file adoption (Low), orphan cleanup resilience (Low).
  • Not applicable: Iterator invalidation, integer overflow (Int32 schema IDs incremented by 1), RAII violations.
  • Assumptions/limits: Static analysis only. The old_schema_id = new - 1 computation is currently correct in practice because MetadataGenerator always increments by 1; the defect is about brittleness and missing contract enforcement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants