[Enhancement] Speed up setting and deleting mutable attributes on non-dataclass subclasses of frozen dataclasses #102578
Description
Feature or enhancement
The `dataclasses` module provides an easy way to create classes: it automatically generates the relevant special methods for the user.
Creating a dataclass with `frozen=True` also generates `__setattr__` and `__delattr__` methods (in `_frozen_get_del_attr`).
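For reference, a minimal sketch of that generated behavior using only the public `dataclasses` API. The classes `Point` and `TaggedPoint` and the helper `try_set` are hypothetical names chosen for illustration. Assignments on a frozen instance are always rejected, while a plain, non-dataclass subclass can still set and delete attributes whose names are not fields:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


class TaggedPoint(Point):
    """Hypothetical plain subclass: not decorated with @dataclass."""

    def __init__(self, x, y, tag):
        super().__init__(x, y)
        self.tag = tag  # non-field name: passes the generated check


def try_set(obj, name, value):
    """Hypothetical helper: True if the assignment succeeded, False if rejected."""
    try:
        setattr(obj, name, value)
    except FrozenInstanceError:
        return False
    return True


p = Point(1, 2)
tp = TaggedPoint(1, 2, "origin")

print(try_set(p, "x", 10))        # False: field on the frozen class itself
print(try_set(p, "z", 10))        # False: any name on the frozen class itself
print(try_set(tp, "x", 10))       # False: inherited field on the subclass
print(try_set(tp, "tag", "new"))  # True: non-field attribute on the subclass
del tp.tag                        # deleting a non-field attribute also works
```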
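This issue proposes replacing the tuple-based field-name lookup in those methods with a set-based lookup, reducing the time complexity of the check from O(n) to (average-case) O(1), where n is the number of fields.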
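A minimal sketch of the idea, assuming a hand-written class rather than the code `dataclasses` actually generates (`Frozen3`, `Child`, and `_FIELDS` are hypothetical names). The field names live in a `frozenset`, so `name in _FIELDS` is an average-case O(1) hash lookup instead of an element-by-element scan of a tuple; the rest of the check mirrors the shape of the generated methods:

```python
from dataclasses import FrozenInstanceError


class Frozen3:
    # Hypothetical field names; the tuple-based version would use ("a", "b", "c").
    _FIELDS = frozenset({"a", "b", "c"})

    def __init__(self, a, b, c):
        # Bypass our own __setattr__ while initializing.
        object.__setattr__(self, "a", a)
        object.__setattr__(self, "b", b)
        object.__setattr__(self, "c", c)

    def __setattr__(self, name, value):
        # Instances of Frozen3 itself are fully frozen; subclass instances
        # are only blocked from touching field names.
        if type(self) is Frozen3 or name in Frozen3._FIELDS:
            raise FrozenInstanceError(f"cannot assign to field {name!r}")
        super(Frozen3, self).__setattr__(name, value)

    def __delattr__(self, name):
        if type(self) is Frozen3 or name in Frozen3._FIELDS:
            raise FrozenInstanceError(f"cannot delete field {name!r}")
        super(Frozen3, self).__delattr__(name)


class Child(Frozen3):
    pass


c = Child(1, 2, 3)
c.extra = 4  # not a field: one hash lookup, then object.__setattr__
del c.extra  # same path through __delattr__
try:
    c.a = 10  # a field: rejected even on the subclass
except FrozenInstanceError as exc:
    print(exc)  # cannot assign to field 'a'
```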
```python
In [1]: # tuple-based
In [2]: %timeit 'a' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
9.91 ns ± 0.0982 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
In [3]: %timeit 'd' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
33.2 ns ± 0.701 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [4]: %timeit 'g' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
56.4 ns ± 0.818 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [5]: # set-based
In [6]: %timeit 'a' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11.3 ns ± 0.0723 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
In [7]: %timeit 'd' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11 ns ± 0.106 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
In [8]: %timeit 'g' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11.1 ns ± 0.126 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
```

A tiny benchmark script:
```python
from contextlib import suppress
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Foo2:
    a: int
    b: int


foo2 = Foo2(1, 2)


def bench2(inst):
    # Every assignment goes through the generated __setattr__ and is rejected.
    with suppress(FrozenInstanceError):
        inst.a = 0
    with suppress(FrozenInstanceError):
        inst.b = 0


@dataclass(frozen=True)
class Foo7:
    a: int
    b: int
    c: int
    d: int
    e: int
    f: int
    g: int


foo7 = Foo7(1, 2, 3, 4, 5, 6, 7)


def bench7(inst):
    with suppress(FrozenInstanceError):
        inst.a = 0
    with suppress(FrozenInstanceError):
        inst.b = 0
    with suppress(FrozenInstanceError):
        inst.c = 0
    with suppress(FrozenInstanceError):
        inst.d = 0
    with suppress(FrozenInstanceError):
        inst.e = 0
    with suppress(FrozenInstanceError):
        inst.f = 0
    with suppress(FrozenInstanceError):
        inst.g = 0


class Bar(Foo7):
    # Plain (non-dataclass) subclass, so non-field attributes can be assigned.
    def __init__(self, a, b, c, d, e, f, g):
        super().__init__(a, b, c, d, e, f, g)
        self.baz = 0


bar = Bar(1, 2, 3, 4, 5, 6, 7)  # instance used by `bench(bar)` below


def bench(inst):
    # 'baz' is not a field: the generated check performs a failed membership
    # test on the field names, then delegates to object.__setattr__.
    inst.baz = 1
```

Result:
`set`-based lookup:
```python
In [2]: %timeit bench2(foo2)
1.08 µs ± 28.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [3]: %timeit bench7(foo7)
3.81 µs ± 20.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [4]: %timeit bench(bar)
249 ns ± 6.31 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

`tuple`-based lookup (original):
```python
In [2]: %timeit bench2(foo2)
1.15 µs ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [3]: %timeit bench7(foo7)
3.97 µs ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [4]: %timeit bench(bar)
269 ns ± 4.09 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```
The set-based lookup is consistently faster than the original approach, and its theoretical time complexity is also lower (average-case O(1) instead of O(n) in the number of fields).
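A rough sketch for checking how the gap scales with the number of fields, using the standard-library `timeit` module. The field names, the probe string, and the helper `membership_cost` are made up for illustration, and absolute numbers depend on the machine and Python version. It times a failed lookup, which is what every non-field assignment on a subclass instance performs:

```python
import timeit


def membership_cost(n_fields, repeat=5, number=200_000):
    """Best-of-`repeat` seconds per lookup of a name that is NOT a field.

    With a tuple, every element is compared before the lookup fails;
    with a frozenset, the failed lookup is a single hash probe on average.
    """
    names = tuple(f"field{i}" for i in range(n_fields))
    probe = "not_a_field"
    results = {}
    for label, container in (("tuple", names), ("frozenset", frozenset(names))):
        timer = timeit.Timer(
            "probe in container",
            globals={"probe": probe, "container": container},
        )
        best = min(timer.repeat(repeat=repeat, number=number))
        results[label] = best / number
    return results


if __name__ == "__main__":
    for n in (2, 8, 32, 128):
        r = membership_cost(n)
        print(f"{n:4d} fields: tuple {r['tuple'] * 1e9:6.1f} ns, "
              f"frozenset {r['frozenset'] * 1e9:6.1f} ns")
```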
Ref: #102573
Pitch
The autogenerated `__setattr__` and `__delattr__` both start with a sanity check. For example, `__setattr__` looks like this:
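```python
def __setattr__(self, name, value):
    # {{UserType}} is the decorated class, bound to `cls` in the generated code.
    if type(self) is {{UserType}} or name in ({{a tuple of field names}}):
        raise FrozenInstanceError(f"cannot assign to field {name!r}")
    super(cls, self).__setattr__(name, value)
```

When a class inherits from the frozen dataclass, every attribute assignment on its instances still runs this check. Because `type(self)` is no longer the dataclass, the check falls through to `tuple.__contains__(...)` on the field names before finally calling `super().__setattr__(...)`. For example: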
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenBase:
    x: int
    y: int
    ...  # more fields, N_FIELDS in total


class Foo(FrozenBase):
    def __init__(self, x, y, somevalue, someothervalue):
        super().__init__(x, y)
        self.somevalue = somevalue  # takes O(N_FIELDS) time
        self.someothervalue = someothervalue  # takes O(N_FIELDS) time again


foo = Foo(1, 2, 3, 4)
foo.extravalue = 5  # takes O(N_FIELDS) time again
```

Previous discussion
N/A.