Skip to content

fix: avoid ListView take_reduce rebuild for dense selections#7339

Open
dimitarvdimitrov wants to merge 8 commits intodevelopfrom
mitko/avoid-list-view-take-reduce-rebuild
Open

fix: avoid ListView take_reduce rebuild for dense selections#7339
dimitarvdimitrov wants to merge 8 commits intodevelopfrom
mitko/avoid-list-view-take-reduce-rebuild

Conversation

@dimitarvdimitrov
Copy link
Copy Markdown
Contributor

The take_reduce path rebuilds the ListView, which is expensive.

For dense selections (density above the rebuild threshold) there's little garbage to reclaim, so skip the rebuild and let the regular take path handle it. The rebuild is still worthwhile for sparse takes where we'd otherwise drag around a lot of unused elements when exporting (e.g. to DuckDB).

This PR does a very crude estimation of the number of elements that stay in the ListViewArray.

Potentially we should do this estimation at export time in the ListViewExporter, but that's less trivial.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Apr 8, 2026

Merging this PR will not alter performance

✅ 1122 untouched benchmarks
⏩ 1455 skipped benchmarks1


Comparing mitko/avoid-list-view-take-reduce-rebuild (df8bb7d) with develop (d8c156f)

Open in CodSpeed

Footnotes

  1. 1455 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@gatesn
Copy link
Copy Markdown
Contributor

gatesn commented Apr 8, 2026

Am I missing something? Why would we ever rebuild? The whole point of ListView is to have cheap take/filter.

It should be explicit from the user if they want to garbage collect

@dimitarvdimitrov
Copy link
Copy Markdown
Contributor Author

Am I missing something? Why would we ever rebuild? The whole point of ListView is to have cheap take/filter.

It should be explicit from the user if they want to garbage collect

The rebuild is still worthwhile for sparse takes where we'd otherwise drag around a lot of unused elements when exporting e.g. duckdb's ListViewExporter

let elements_exporter =
new_array_exporter_with_flatten(elements.clone(), cache, ctx, true)?;
if !elements.is_empty() {
elements_exporter.export(0, elements.len(), &mut duckdb_elements, ctx)?;
}

ideally we do this as heuristics in the exporter as opposed to in each operation like take. I was thinking of doing that next, but it might take a bit longer

@gatesn
Copy link
Copy Markdown
Contributor

gatesn commented Apr 8, 2026

ideally we do this as heuristics in the exporter as opposed to in each operation like take

Yeah, 100%. Agree priority might not be to do it now, but eager GC impacts all Vortex users

When the selection density is above the threshold, skip take_reduce so
we don't rebuild the ListView for dense takes.

Signed-off-by: Dimitar Dimitrov <mitko@spiraldb.com>
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
@dimitarvdimitrov dimitarvdimitrov force-pushed the mitko/avoid-list-view-take-reduce-rebuild branch from 1074601 to 0cd4f36 Compare April 9, 2026 11:23
@dimitarvdimitrov dimitarvdimitrov added the changelog/skip Do not list PR in the changelog label Apr 9, 2026
@dimitarvdimitrov
Copy link
Copy Markdown
Contributor Author

looking into why the benchmarks regressed...

@dimitarvdimitrov dimitarvdimitrov marked this pull request as ready for review April 9, 2026 15:54
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
// TODO(connor)[ListView]: Ideally, we would only rebuild after all `take`s and `filter`
// compute functions have run, at the "top" of the operator tree. However, we cannot do this
// right now, so we will just rebuild every time (similar to [`ListArray`]).
pub const REBUILD_DENSITY_THRESHOLD: f32 = 0.1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this should move? and also be pub crate?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any suggestion for where to move it? it's now used in arrays/listview/compute and arrays/filter/execute

@joseph-isaacs joseph-isaacs added the action/benchmark Trigger full benchmarks to run on this PR label Apr 10, 2026
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

action/benchmark Trigger full benchmarks to run on this PR changelog/skip Do not list PR in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants