gh-131798: Narrow the return type of _FORMAT_SIMPLE and _FORMAT_WITH_SPEC to str#146639
gh-131798: Narrow the return type of _FORMAT_SIMPLE and _FORMAT_WITH_SPEC to str#146639NekoAsakura wants to merge 3 commits intopython:mainfrom
_FORMAT_SIMPLE and _FORMAT_WITH_SPEC to str#146639Conversation
…MAT_WITH_SPEC` to `str`
| res = sym_new_type(ctx, &PyUnicode_Type); | ||
| } | ||
|
|
||
| op(_FORMAT_WITH_SPEC, (value, fmt_spec -- res)) { |
There was a problem hiding this comment.
This can return a subclass, e.g.
class my(str):
def __format__(self, spec):
return self
m=my('aap')
type(f'{m}')
Does the sym_new_type(ctx, &PyUnicode_Type); not guarantee we have an exact type?
There was a problem hiding this comment.
Good catch! I oversimplified this. Narrowing type for format isn't as straightforward as I thought.
That said, I'm curious why FORMAT_SIMPLE needs to preserve str subclasses. Had a look through the git history it's been this way since 2015, and seems not a deliberate type-preservation contract.
Current behaviour is already inconsistent:
class my(str):
def __format__(self, spec):
return self
m = my('aap')
type('{}'.format(m)) # <class 'my'>
type('x{}'.format(m)) # <class 'str'>
type(f'{m}') # <class 'my'>
type(f'x{m}') # <class 'str'>
Multi-piece concatenation already strips it (both _PyUnicodeWriter in str.format and BUILD_STRING in f-strings create exact str).
If FORMAT_SIMPLE normalised to exact str, this inconsistency goes away and type narrowing becomes unconditionally correct. Would that be a reasonable change, or is there a reason to preserve str subclass identity through formatting that I'm missing?
There was a problem hiding this comment.
Huh that's kinda scary. The semantics look a little strange, so maybe we should leave this alone for now?
There was a problem hiding this comment.
Fair enough. I just noticed the inconsistency and thought it was worth raising.
For the narrowing itself: we could do conditional narrowing for input types where we can prove __format__ returns exact str (e.g. int, float). But honestly it feels not pythonic to hardcode a list of "safe" types for a uop that's less then 0.1%. I'd prefer just closing this PR unless you think it's worth keeping with that approach?
There was a problem hiding this comment.
Actually we already have something like that called sym_is_safe_type, you can use that
There was a problem hiding this comment.
I couldn't find sym_is_safe_type anywhere. The closest thing I can see is sym_is_safe_const, but that checks for known constant values rather than types. Have I overlooked something?
There was a problem hiding this comment.
Oh yeah sorry that's the one! Just factor out the type checks in _Py_uop_sym_is_safe_const into another function and use that in this case, I think that's fine!
There was a problem hiding this comment.
Done. How does it look now?
…MAT_WITH_SPEC` to str for built-in types
Python/optimizer_symbols.c
Outdated
| return (typ == &PyLong_Type) || | ||
| (typ == &PyUnicode_Type) || | ||
| (typ == &PyFloat_Type) || | ||
| (typ == &_PyNone_Type) || | ||
| (typ == &PyBool_Type) || | ||
| (typ == &PyFrozenDict_Type); |
There was a problem hiding this comment.
Oops, I meant to factor this out into a common function, sorry!
There was a problem hiding this comment.
You mean factoring safe types into a static helper? No problem.
…MAT_WITH_SPEC` to str for built-in types
Uh oh!
There was an error while loading. Please reload this page.