Parquet: omit min/max for interval columns when writing stats#5147
Merged
Conversation
Contributor
What ColumnOrder are we currently writing for these columns?
Contributor
Author
I'm not sure, actually. I tried running this test in arrow_writer/mod.rs on the master branch:
#[test]
fn test_123() {
    let a = Int32Array::from(vec![1, 2, 3, 4, 5]);
    let b = IntervalDayTimeArray::from(vec![0; 5]);
    let batch = RecordBatch::try_from_iter(vec![
        ("a", Arc::new(a) as ArrayRef),
        ("b", Arc::new(b) as ArrayRef),
    ])
    .unwrap();
    let mut buf = Vec::with_capacity(1024);
    let mut writer = ArrowWriter::try_new(&mut buf, batch.schema(), None).unwrap();
    writer.write(&batch).unwrap();
    writer.close().unwrap();
    let bytes = Bytes::from(buf);
    let options = ReadOptionsBuilder::new().with_page_index().build();
    let reader = SerializedFileReader::new_with_options(bytes, options).unwrap();
    dbg!(reader.metadata().file_metadata().column_orders());
}
Running:
arrow-rs$ cargo test -p parquet --lib arrow::arrow_writer::tests::test_123 -- --nocapture --exact
Blocking waiting for file lock on build directory
Compiling parquet v49.0.0 (/home/jeffrey/Code/arrow-rs/parquet)
Finished test [unoptimized + debuginfo] target(s) in 11.49s
Running unittests src/lib.rs (/media/jeffrey/1tb_860evo_ssd/.cargo_target_cache/debug/deps/parquet-a4f7a499e85a325c)
running 1 test
[parquet/src/arrow/arrow_writer/mod.rs:2760] reader.metadata().file_metadata().column_orders() = None
test arrow::arrow_writer::tests::test_123 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 667 filtered out; finished in 0.00s
Even when I change it to write only the Int32Array, it is still None. Not sure if I'm doing something wrong here?
Contributor
Author
I noticed this: arrow-rs/parquet/src/file/writer.rs Lines 326 to 336 in 6d4b8bb
Looks like implementing the writing of ColumnOrder might be a separate issue.
Contributor
Author
Raised #5152 for the column order issue
Closed
This was referenced Jan 5, 2024
Which issue does this PR close?
Closes #5145
Rationale for this change
What changes are included in this PR?
Add extra checks before calculating min/max for chunks/pages, to ignore Interval columns
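The kind of check described above can be sketched roughly as follows. This is a minimal, standalone illustration, not the code from this PR or the parquet crate: `ColumnKind`, `MinMax`, `has_defined_sort_order`, and `update_min_max` are hypothetical names invented for the example, and values are modeled as plain `i64` for simplicity. The underlying idea is that the Parquet format leaves the sort order of INTERVAL undefined, so a min/max pair for such a column would not be meaningful to readers.

```rust
/// Hypothetical stand-in for a column's type (not a parquet crate type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnKind {
    Int32,
    Interval,
}

/// Running min/max statistics for one chunk or page (hypothetical).
#[derive(Debug, Default, PartialEq)]
struct MinMax {
    min: Option<i64>,
    max: Option<i64>,
}

/// Returns true when min/max statistics are meaningful for this column kind.
/// Interval columns have no defined sort order, so they are excluded.
fn has_defined_sort_order(kind: ColumnKind) -> bool {
    !matches!(kind, ColumnKind::Interval)
}

/// Folds `values` into `stats`, leaving `stats` untouched for interval columns.
fn update_min_max(kind: ColumnKind, values: &[i64], stats: &mut MinMax) {
    // The extra check: bail out before any min/max computation happens.
    if !has_defined_sort_order(kind) {
        return;
    }
    for &v in values {
        stats.min = Some(stats.min.map_or(v, |m| m.min(v)));
        stats.max = Some(stats.max.map_or(v, |m| m.max(v)));
    }
}
```

With this sketch, calling `update_min_max(ColumnKind::Int32, &[1, 2, 3, 4, 5], &mut stats)` fills in both bounds, while the same call with `ColumnKind::Interval` leaves `stats` at its default, so no min/max would be written for that column.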
Are there any user-facing changes?