fix: double type is not handled correctly in tsz compress on windows#34847
fix: double type is not handled correctly in tsz compress on windows#34847guanshengliang merged 2 commits intomainfrom
Conversation
test_compress_basic.py test_compress_alter_table.py
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses and resolves compression failures observed on Windows by standardizing integer types and size representations within the TSZ compression library. The changes primarily involve migrating from potentially ambiguous Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request primarily updates integer types from long and unsigned long to int64_t, uint64_t, and size_t across several C header and source files to improve type consistency and portability. This includes changes in ByteToolkit.h/c, CompressElement.h, Huffman.h/c, dataCompression.h/c, sz.h, utility.h/c, sz_double.c, szd_double.c, and szd_float.c. Additionally, a specific bug in TightDataPointStorageD.c was addressed by correcting a size calculation from 4 to 8, which a reviewer noted could be further improved by replacing magic numbers with sizeof or named constants for better readability, maintainability, and to prevent similar bugs.
There was a problem hiding this comment.
Pull request overview
This PR adjusts the bundled SZ/TSZ compression code to behave consistently on Windows by replacing platform-dependent integer types (long, unsigned long) with fixed-width / pointer-width types (int64_t, uint64_t, size_t) in several serialization and compression/decompression paths.
Changes:
- Switched lossless compress/decompress APIs to
size_tfor sizes and return values. - Replaced
long/unsigned longwithint64_t/uint64_tin Huffman coding and byte-tooling utilities to avoid Windowslongsize differences. - Fixed a size accounting constant in
TightDataPointStorageDresidual size calculation.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| contrib/TSZ/sz/src/utility.c | Updates lossless compress/decompress function signatures to size_t. |
| contrib/TSZ/sz/inc/utility.h | Header updates matching utility.c signature changes. |
| contrib/TSZ/sz/src/szd_float.c | Updates sz_lossless_decompress call sites to new size_t signature. |
| contrib/TSZ/sz/src/szd_double.c | Updates sz_lossless_decompress call sites to new size_t signature. |
| contrib/TSZ/sz/src/sz_double.c | Uses size_t when computing radiusIndex. |
| contrib/TSZ/sz/src/dataCompression.c | Moves from long to int64_t for double bit-pattern serialization and updates getRealPrecision_int signature. |
| contrib/TSZ/sz/inc/dataCompression.h | Header updates matching dataCompression.c signature changes. |
| contrib/TSZ/sz/src/TightDataPointStorageD.c | Adjusts residual payload size computation to match written layout. |
| contrib/TSZ/sz/src/Huffman.c | Migrates Huffman codeword storage to uint64_t-based buffers. |
| contrib/TSZ/sz/inc/Huffman.h | Updates Huffman public types to uint64_t and adds <stdint.h>. |
| contrib/TSZ/sz/src/ByteToolkit.c | Changes long<->bytes helpers to use int64_t and updates related bitmask constants. |
| contrib/TSZ/sz/inc/ByteToolkit.h | Header updates matching ByteToolkit.c signature changes and adds <stdint.h>. |
| contrib/TSZ/sz/inc/sz.h | Uses fixed-width 64-bit types in SZ unions to avoid Windows long differences. |
| contrib/TSZ/sz/inc/CompressElement.h | Changes DoubleValueCompressElement.curValue to int64_t. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Description
for tsz compression algorithm, the double data type is not handled correctly on windows and result in two failed cases:
this PR fixes the issue.
Issue(s)
Checklist
Please check the items in the checklist if applicable.