feat: basic BoolBuffer / BoolBufferMut#2456
Conversation
|
Can we call this BitBuffer? |
gatesn
left a comment
There was a problem hiding this comment.
Think we'll also want iterators and vectorized constructors (i.e. BooleanBuffer::collect_bool)
25fc4bd to
67729e2
Compare
|
Sadly |
|
Where are we at with this? Need someone to takeover while you poke at FFI bindings? |
|
Maybe? I'm no longer sure there's a strong path forward that gives us something better than Arrow's BooleanBufferBuilder. The big issue is that mapping a bitset on top of uninit memory causes UB when you try and set bits in-place. While it generally seems harmless it is technically a violation of Rust's memory model so not guaranteed to work across platforms. |
|
Why does the memory have to be uninitialized? Even the ability to into_mut and update bits is valuable from an API perspective. |
|
I will take over this pr |
454875c to
e26e316
Compare
e26e316 to
ceff11e
Compare
39b9c8a to
7f68ba7
Compare
|
This now compiles but there's still a lot of tests to fix |
aff4773 to
68d6261
Compare
|
I need to figure out all the slow downs - seems like arrows append_packed_range is faster than bitvec implementaion and forcing iterators to be owned adds a bunch of drops that could be avoided |
4d268af to
42eb48a
Compare
87ea4e1 to
7e1e2df
Compare
Codecov Report❌ Patch coverage is
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
5a48fba to
a16e4ca
Compare
Pulled from #2456 This separates out the logic of adding the BitBuffer from using it. There are some outstanding performance issues before we can pick up #2456 again. I added some benchmarks that highlight where we're slower than Arrow atm --------- Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Robert Kruszewski <github@robertk.io>
a16e4ca to
d93320a
Compare
|
Will make a new pr |
Replace usages of BooleanBuffer with our own BitBuffer. This lets us have our own backing Buffer for BitBuffer and implement additional optimisations for use in vortex. previously #2456 This pr requires resolving outstanding performance issues before merging 1. Make iterators cheaper to construct, likely can't be owned 2. Improve append_packed_range performance. Arrow has a function that is faster than bitvec crate --------- Signed-off-by: Robert Kruszewski <github@robertk.io> Signed-off-by: Nicholas Gates <nick@nickgates.com> Co-authored-by: Andrew Duffy <andrew@a10y.dev> Co-authored-by: Nicholas Gates <nick@nickgates.com>
It's very annoying to do random in-place setting of bits in Arrow BooleanBufferBuilder. It's very append focused. For something like SparseArray in particular, you want to create a BoolArray against uninitialized memory and only set the bits corresponding to positions of non-null values.