Rewrite HashSet<T>'s implementation based on Dictionary<T>'s#37180
Rewrite HashSet<T>'s implementation based on Dictionary<T>'s#37180stephentoub merged 9 commits intodotnet:masterfrom
Conversation
|
Tagging subscribers to this area: @eiriktsarpalis |
I was adjusting it based on a the asm output from the JIT; however that has improved considerably in 5.0 so trade off may have changed. Also the Dictionary in the core find is a ref return and its |
There was a problem hiding this comment.
Do we have dictionary tests that go down the "secure string comparer" paths? I thought we did but can't immediately find them.
There was a problem hiding this comment.
I can't find any either. It's probably worth adding some, but it'd also take a little work to find collisions. Maybe @GrabYourPitchforks has some in his back pocket :)
There was a problem hiding this comment.
I think it should read "comparer that uses..." instead of "comparer which is using..." (I know this is a copy-paste).
There was a problem hiding this comment.
We (@maryamariyan?) tested it and I recall it didn't take looping terribly long to find 100 collisions but I guess we didn't check them in...
danmoseley
left a comment
There was a problem hiding this comment.
Read it all and seems good to me.
|
I just noticed |
Thanks for reviewing. |
|
A few legitimate failures in the immutable collection tests; taking a look... |
And factor out InsertionBehavior into its own file
This effectively deletes HashSet's data structure and replaces it with the one used by Dictionary, then updated for the differences (e.g. just a value rather than a key and a value). HashSet used to have the same implementation, but Dictionary has evolved significantly and HashSet hasn't; this brings them to basic parity on implementation. Based on perf tests, I veered away from Dictionary's implementation in a few places (e.g. a goto-based implementation in the core find method led to a significant regression for Int32-based Contains operations), and we should follow-up to understand whether Dictionary should be changed as well, or why there's a difference between the two. Functionally, bringing over Dictionary's implementation yields a few notable changes, namely that Remove and Clear no longer invalidate enumerations. The tests have been updated accordingly.
Fixes #37111
Contributes to #1989
This moves HashSet into corelib, and then effectively deletes HashSet's data structure and replaces it with the one used by Dictionary, then updated for the differences (e.g. just a value rather than a key and a value). HashSet used to have basically the same implementation, but Dictionary has evolved significantly and HashSet hasn't; this brings them to basic parity on implementation.
Based on perf tests, I veered away from Dictionary's implementation in a few places (e.g. a goto-based implementation in the core find method led to a significant regression for Int32-based Contains operations), and we should follow-up to understand whether Dictionary should be changed as well, or why there's a difference between the two. @benaadams, if you have some time, it'd probably worth looking at this again; maybe you'll get different numbers than I did.
Functionally, bringing over Dictionary's implementation yields a few notable changes, namely that Remove and Clear no longer invalidate enumerations. The tests have been updated accordingly.
With HashSet now in corelib, I also updated two Dictionary uses I found in corelib that were using Dictionary as a set and switched them to use HashSet.
Running the dotnet/performance perf tests:
cc: @benaadams, @danmosemsft, @GrabYourPitchforks, @eiriktsarpalis, @layomia