Feature or enhancement
Proposal:
Here are a few optimizing macros, some of which clang under Linux does not "see", because by default clang reports itself as GCC 4.2.1 (`__GNUC__ == 4`, `__GNUC_MINOR__ == 2`) and therefore fails any check for GCC 4.3 or later.
None of these are seen by clang-cl on Windows, because:
- clang-cl does not set `__GNUC__` (most probably because too much code out there would then assume "ah - I am on Linux")
- but clang-cl does set `__clang__`

IMHO, "syncing" them between GCC/clang on Linux and clang-cl on Windows is preferable.
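To check which identification macros a given toolchain actually predefines, a small probe like the following can be compiled with each compiler. It is a sketch: the `probe_*` helper names are hypothetical, and the version test simply mirrors the "GCC 4.3 or later" guards used by some of the macros in this issue. With clang's default GCC compatibility version, `__GNUC__` is defined but the 4.3 test fails; under clang-cl only `__clang__` is reported.

```c
#include <stdio.h>

/* Hypothetical helper: 1 if the compiler predefines __GNUC__. */
static int probe_has_gnuc(void)
{
#if defined(__GNUC__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if the compiler predefines __clang__. */
static int probe_has_clang(void)
{
#if defined(__clang__)
    return 1;
#else
    return 0;
#endif
}

/* Same shape as the "GCC 4.3 or later" guards. Clang reports GCC 4.2
 * by default, so this yields 0 under clang even though __GNUC__ is set. */
static int probe_passes_gnuc_4_3(void)
{
#if defined(__GNUC__) \
    && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line summary for the compiler that built this file. */
static void probe_report(FILE *out)
{
    fprintf(out, "__GNUC__=%d __clang__=%d gnuc>=4.3=%d\n",
            probe_has_gnuc(), probe_has_clang(), probe_passes_gnuc_4_3());
}
```

Calling `probe_report(stdout)` from a `main` built once with clang on Linux and once with clang-cl would show the two different outcomes side by side.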
Neither seen on Linux nor on Windows: #130891 would fix:
cpython/Include/pyport.h, lines 323 to 325 in 98fa4a4:
```c
#if defined(__GNUC__) \
    && ((__GNUC__ >= 5) || (__GNUC__ == 4) && (__GNUC_MINOR__ >= 3))
#define _Py_HOT_FUNCTION __attribute__((hot))
```
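One way to make a guard like this independent of the reported GCC version is to ask the compiler directly via `__has_attribute`, which GCC 5+ and clang (under either driver) provide. The sketch below uses a hypothetical `MY_HOT_FUNCTION` macro and `sum_longs` function; it is not necessarily what #130891 does. On compilers without `__has_attribute` (such as MSVC) the macro expands to nothing.

```c
#include <stddef.h>

/* Test for __has_attribute in a separate #if: compilers that lack it
 * could not parse __has_attribute(hot) inside a single combined #if. */
#if defined(__has_attribute)
#  if __has_attribute(hot)
#    define MY_HOT_FUNCTION __attribute__((hot))
#  endif
#endif
#ifndef MY_HOT_FUNCTION
#  define MY_HOT_FUNCTION /* attribute unavailable: expands to nothing */
#endif

/* Hypothetical hot-path function marked with the macro. */
static long MY_HOT_FUNCTION sum_longs(const long *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```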
Seen on Linux, not seen on Windows: #131019 would fix:
cpython/Objects/obmalloc.c, lines 1460 to 1462 in 98fa4a4:
```c
#if defined(__GNUC__) && (__GNUC__ > 2) && defined(__OPTIMIZE__)
#  define UNLIKELY(value) __builtin_expect((value), 0)
#  define LIKELY(value) __builtin_expect((value), 1)
```
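A sketch of the same branch hints written so that clang-cl also picks them up: `__builtin_expect` is available in GCC and in clang under either driver, so the guard below tests `__clang__` in addition to `__GNUC__`. The `MY_LIKELY`/`MY_UNLIKELY` names and the `checked_div` helper are hypothetical; unlike the obmalloc.c guard, this sketch does not require `__OPTIMIZE__`, and it is not necessarily what #131019 does.

```c
#if defined(__GNUC__) || defined(__clang__)
#  define MY_UNLIKELY(value) __builtin_expect((value), 0)
#  define MY_LIKELY(value) __builtin_expect((value), 1)
#else
   /* No branch-prediction hint available: evaluate the condition as-is. */
#  define MY_UNLIKELY(value) (value)
#  define MY_LIKELY(value) (value)
#endif

/* Hypothetical helper: the division-by-zero path is marked as rare.
 * Returns 0 and stores a / b in *out, or returns -1 if b is zero. */
static int checked_div(int a, int b, int *out)
{
    if (MY_UNLIKELY(b == 0)) {
        return -1;
    }
    *out = a / b;
    return 0;
}
```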
Seen on Linux, not seen on Windows:
cpython/Modules/expat/expat_external.h, lines 115 to 117 in 98fa4a4:
```c
#if defined(__GNUC__) \
    && (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 96))
#  define XML_ATTR_MALLOC __attribute__((__malloc__))
```
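The same `__has_attribute` pattern could cover the malloc attribute, which tells the compiler that the returned pointer does not alias any other pointer valid when the function returns. In the sketch below, `MY_ATTR_MALLOC` and `dup_string` are hypothetical; the attribute is attached to a separate prototype, after the declarator.

```c
#include <stdlib.h>
#include <string.h>

#if defined(__has_attribute)
#  if __has_attribute(malloc)
#    define MY_ATTR_MALLOC __attribute__((__malloc__))
#  endif
#endif
#ifndef MY_ATTR_MALLOC
#  define MY_ATTR_MALLOC
#endif

/* The prototype carries the attribute: the result aliases nothing else. */
static char *dup_string(const char *s) MY_ATTR_MALLOC;

/* Returns a heap copy of s (caller frees), or NULL if allocation fails. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}
```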
Neither seen on Linux nor on Windows:
cpython/Modules/expat/expat_external.h, lines 122 to 124 in 98fa4a4:
```c
#if defined(__GNUC__) \
    && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
#  define XML_ATTR_ALLOC_SIZE(x) __attribute__((__alloc_size__(x)))
```
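Likewise for `alloc_size`, which GCC and clang can use to improve `__builtin_object_size` results and the buffer-overflow checks built on it. The `MY_ATTR_ALLOC_SIZE` macro and `zeroed_block` allocator below are hypothetical; the attribute argument is the 1-based position of the parameter that holds the size in bytes.

```c
#include <stdlib.h>

#if defined(__has_attribute)
#  if __has_attribute(alloc_size)
#    define MY_ATTR_ALLOC_SIZE(x) __attribute__((__alloc_size__(x)))
#  endif
#endif
#ifndef MY_ATTR_ALLOC_SIZE
#  define MY_ATTR_ALLOC_SIZE(x)
#endif

/* Argument 1 (size) gives the size in bytes of the returned block. */
static void *zeroed_block(size_t size) MY_ATTR_ALLOC_SIZE(1);

/* Hypothetical allocator: returns size zero-filled bytes, or NULL on
 * failure. A zero size still requests 1 byte to get a unique pointer. */
static void *zeroed_block(size_t size)
{
    return calloc(size == 0 ? 1 : size, 1);
}
```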
The last two are in vendored code, but I temporarily modified it (01183d7) and then reverted the change (1c4a55d).
Enabling all of them for clang-cl on Windows is performance-neutral with respect to the pyperformance benchmark suite.

| Benchmark | clang.release.19.1.1.92e5f826ac | clang.release.19.1.1.16a7f4607e.pyHot |
|---|---|---|
| Geometric mean | (ref) | 1.01x faster |

| Benchmark | clang.pgo.19.1.1.92e5f826ac | clang.pgo.19.1.1.16a7f4607e.pyHot |
|---|---|---|
| Geometric mean | (ref) | 1.01x slower |

| Benchmark | clang.release.20.1.0-rc2.92e5f826ac | clang.release.20.1.0-rc2.16a7f4607e.pyHot |
|---|---|---|
| Geometric mean | (ref) | 1.01x faster |

| Benchmark | clang.pgo.20.1.0-rc2.92e5f826ac | clang.pgo.20.1.0-rc2.16a7f4607e.pyHot |
|---|---|---|
| Geometric mean | (ref) | 1.00x slower |

Details

| Benchmark | clang.release.19.1.1.92e5f826ac | clang.release.19.1.1.16a7f4607e.pyHot |
|---|---|---|
| telco | 11.9 ms | 10.9 ms: 1.10x faster |
| xml_etree_parse | 236 ms | 217 ms: 1.09x faster |
| logging_format | 15.8 us | 15.0 us: 1.05x faster |
| async_tree_eager | 145 ms | 138 ms: 1.05x faster |
| async_tree_none_tg | 375 ms | 358 ms: 1.05x faster |
| unpickle_list | 5.83 us | 5.57 us: 1.05x faster |
| xml_etree_iterparse | 157 ms | 150 ms: 1.05x faster |
| unpickle | 23.0 us | 22.0 us: 1.05x faster |
| async_tree_memoization_tg | 460 ms | 442 ms: 1.04x faster |
| xml_etree_generate | 142 ms | 137 ms: 1.04x faster |
| nqueens | 125 ms | 120 ms: 1.04x faster |
| async_tree_memoization | 490 ms | 472 ms: 1.04x faster |
| async_tree_io | 876 ms | 844 ms: 1.04x faster |
| logging_simple | 14.3 us | 13.8 us: 1.04x faster |
| deepcopy_reduce | 3.92 us | 3.78 us: 1.04x faster |
| crypto_pyaes | 104 ms | 101 ms: 1.04x faster |
| pprint_pformat | 2.18 sec | 2.10 sec: 1.04x faster |
| async_tree_none | 391 ms | 378 ms: 1.03x faster |
| pprint_safe_repr | 1.06 sec | 1.02 sec: 1.03x faster |
| async_tree_eager_memoization | 290 ms | 281 ms: 1.03x faster |
| json_dumps | 16.5 ms | 16.0 ms: 1.03x faster |
| fannkuch | 580 ms | 562 ms: 1.03x faster |
| scimark_sparse_mat_mult | 5.76 ms | 5.59 ms: 1.03x faster |
| async_tree_eager_io | 822 ms | 798 ms: 1.03x faster |
| xml_etree_process | 96.4 ms | 93.7 ms: 1.03x faster |
| async_tree_eager_tg | 307 ms | 299 ms: 1.03x faster |
| scimark_fft | 481 ms | 470 ms: 1.03x faster |
| coroutines | 31.9 ms | 31.2 ms: 1.02x faster |
| async_tree_io_tg | 853 ms | 834 ms: 1.02x faster |
| pathlib | 255 ms | 250 ms: 1.02x faster |
| typing_runtime_protocols | 224 us | 220 us: 1.02x faster |
| django_template | 53.7 ms | 52.8 ms: 1.02x faster |
| sympy_expand | 650 ms | 640 ms: 1.02x faster |
| unpickle_pure_python | 305 us | 300 us: 1.02x faster |
| async_tree_eager_memoization_tg | 411 ms | 405 ms: 1.02x faster |
| async_tree_cpu_io_mixed_tg | 752 ms | 741 ms: 1.02x faster |
| chaos | 88.2 ms | 86.9 ms: 1.02x faster |
| sqlite_synth | 3.57 us | 3.52 us: 1.01x faster |
| tomli_loads | 2.70 sec | 2.66 sec: 1.01x faster |
| pickle_pure_python | 444 us | 438 us: 1.01x faster |
| sqlglot_normalize | 150 ms | 148 ms: 1.01x faster |
| regex_compile | 171 ms | 169 ms: 1.01x faster |
| mako | 17.3 ms | 17.1 ms: 1.01x faster |
| sqlglot_parse | 1.67 ms | 1.65 ms: 1.01x faster |
| sympy_sum | 211 ms | 208 ms: 1.01x faster |
| hexiom | 8.26 ms | 8.19 ms: 1.01x faster |
| sqlglot_transpile | 2.06 ms | 2.04 ms: 1.01x faster |
| sqlglot_optimize | 74.0 ms | 73.4 ms: 1.01x faster |
| python_startup | 43.2 ms | 42.9 ms: 1.01x faster |
| async_generators | 540 ms | 536 ms: 1.01x faster |
| gc_traversal | 4.82 ms | 4.79 ms: 1.01x faster |
| comprehensions | 23.2 us | 23.1 us: 1.01x faster |
| generators | 38.1 ms | 37.8 ms: 1.01x faster |
| richards_super | 73.9 ms | 73.4 ms: 1.01x faster |
| deepcopy | 376 us | 373 us: 1.01x faster |
| genshi_text | 29.8 ms | 29.6 ms: 1.01x faster |
| pickle_dict | 32.3 us | 32.2 us: 1.00x faster |
| scimark_sor | 168 ms | 169 ms: 1.01x slower |
| go | 145 ms | 146 ms: 1.01x slower |
| pyflate | 596 ms | 602 ms: 1.01x slower |
| logging_silent | 133 ns | 135 ns: 1.01x slower |
| dulwich_log | 130 ms | 132 ms: 1.01x slower |
| regex_v8 | 35.2 ms | 35.6 ms: 1.01x slower |
| spectral_norm | 128 ms | 130 ms: 1.02x slower |
| docutils | 3.60 sec | 3.66 sec: 1.02x slower |
| sympy_integrate | 26.4 ms | 26.8 ms: 1.02x slower |
| scimark_monte_carlo | 90.7 ms | 92.6 ms: 1.02x slower |
| float | 102 ms | 105 ms: 1.02x slower |
| 2to3 | 429 ms | 439 ms: 1.02x slower |
| nbody | 151 ms | 155 ms: 1.03x slower |
| genshi_xml | 71.7 ms | 74.2 ms: 1.04x slower |
| Geometric mean | (ref) | 1.01x faster |

| Benchmark | clang.pgo.19.1.1.92e5f826ac | clang.pgo.19.1.1.16a7f4607e.pyHot |
|---|---|---|
| 2to3 | 465 ms | 380 ms: 1.22x faster |
| async_generators | 506 ms | 490 ms: 1.03x faster |
| coroutines | 27.1 ms | 26.4 ms: 1.03x faster |
| pidigits | 233 ms | 228 ms: 1.02x faster |
| pickle_dict | 27.8 us | 27.3 us: 1.02x faster |
| sympy_sum | 187 ms | 184 ms: 1.02x faster |
| typing_runtime_protocols | 186 us | 183 us: 1.02x faster |
| raytrace | 309 ms | 305 ms: 1.02x faster |
| unpickle | 16.6 us | 16.4 us: 1.01x faster |
| genshi_xml | 60.4 ms | 59.5 ms: 1.01x faster |
| regex_compile | 151 ms | 149 ms: 1.01x faster |
| scimark_sparse_mat_mult | 4.82 ms | 4.77 ms: 1.01x faster |
| unpack_sequence | 55.7 ns | 55.1 ns: 1.01x faster |
| sqlglot_parse | 1.42 ms | 1.41 ms: 1.01x faster |
| sqlglot_transpile | 1.75 ms | 1.73 ms: 1.01x faster |
| telco | 9.01 ms | 8.91 ms: 1.01x faster |
| logging_format | 12.9 us | 12.8 us: 1.01x faster |
| unpickle_list | 5.04 us | 4.99 us: 1.01x faster |
| nqueens | 95.3 ms | 94.4 ms: 1.01x faster |
| async_tree_eager_io | 720 ms | 714 ms: 1.01x faster |
| sympy_expand | 556 ms | 551 ms: 1.01x faster |
| scimark_lu | 124 ms | 123 ms: 1.01x faster |
| docutils | 3.09 sec | 3.07 sec: 1.01x faster |
| chaos | 69.1 ms | 68.7 ms: 1.01x faster |
| sqlglot_optimize | 63.5 ms | 63.1 ms: 1.01x faster |
| sympy_integrate | 22.9 ms | 22.7 ms: 1.01x faster |
| spectral_norm | 106 ms | 105 ms: 1.00x faster |
| scimark_fft | 352 ms | 351 ms: 1.00x faster |
| deepcopy | 298 us | 300 us: 1.00x slower |
| generators | 34.0 ms | 34.2 ms: 1.01x slower |
| meteor_contest | 119 ms | 119 ms: 1.01x slower |
| logging_silent | 106 ns | 106 ns: 1.01x slower |
| tomli_loads | 2.21 sec | 2.22 sec: 1.01x slower |
| pickle_pure_python | 367 us | 369 us: 1.01x slower |
| regex_effbot | 3.21 ms | 3.24 ms: 1.01x slower |
| pyflate | 514 ms | 518 ms: 1.01x slower |
| sqlite_synth | 3.41 us | 3.44 us: 1.01x slower |
| deltablue | 3.66 ms | 3.69 ms: 1.01x slower |
| unpickle_pure_python | 247 us | 249 us: 1.01x slower |
| nbody | 126 ms | 128 ms: 1.01x slower |
| scimark_sor | 140 ms | 141 ms: 1.01x slower |
| mdp | 3.13 sec | 3.16 sec: 1.01x slower |
| pprint_safe_repr | 891 ms | 899 ms: 1.01x slower |
| go | 126 ms | 127 ms: 1.01x slower |
| richards_super | 52.0 ms | 52.6 ms: 1.01x slower |
| async_tree_eager | 116 ms | 117 ms: 1.01x slower |
| regex_dna | 204 ms | 207 ms: 1.01x slower |
| create_gc_cycles | 1.49 ms | 1.51 ms: 1.01x slower |
| richards | 45.4 ms | 46.0 ms: 1.01x slower |
| deepcopy_memo | 33.4 us | 34.1 us: 1.02x slower |
| async_tree_eager_tg | 267 ms | 273 ms: 1.02x slower |
| json_loads | 31.2 us | 31.9 us: 1.02x slower |
| pprint_pformat | 1.79 sec | 1.85 sec: 1.03x slower |
| gc_traversal | 5.03 ms | 5.28 ms: 1.05x slower |
| xml_etree_parse | 208 ms | 220 ms: 1.06x slower |
| async_tree_io | 759 ms | 832 ms: 1.10x slower |
| asyncio_tcp | 1.38 sec | 1.52 sec: 1.10x slower |
| xml_etree_process | 78.5 ms | 87.4 ms: 1.11x slower |
| xml_etree_generate | 114 ms | 128 ms: 1.11x slower |
| async_tree_memoization_tg | 392 ms | 449 ms: 1.15x slower |
| async_tree_io_tg | 746 ms | 855 ms: 1.15x slower |
| async_tree_memoization | 414 ms | 477 ms: 1.15x slower |
| async_tree_none_tg | 325 ms | 382 ms: 1.17x slower |
| xml_etree_iterparse | 141 ms | 172 ms: 1.22x slower |
| Geometric mean | (ref) | 1.01x slower |

| Benchmark | clang.release.20.1.0-rc2.92e5f826ac | clang.release.20.1.0-rc2.16a7f4607e.pyHot |
|---|---|---|
| spectral_norm | 139 ms | 124 ms: 1.13x faster |
| pickle_list | 5.89 us | 5.46 us: 1.08x faster |
| sqlite_synth | 3.71 us | 3.51 us: 1.06x faster |
| pickle_dict | 32.3 us | 30.7 us: 1.05x faster |
| unpickle | 20.8 us | 20.0 us: 1.04x faster |
| json_loads | 43.0 us | 41.3 us: 1.04x faster |
| unpickle_list | 5.35 us | 5.15 us: 1.04x faster |
| mako | 16.9 ms | 16.3 ms: 1.04x faster |
| pprint_safe_repr | 1.01 sec | 976 ms: 1.03x faster |
| crypto_pyaes | 102 ms | 98.7 ms: 1.03x faster |
| coverage | 111 ms | 108 ms: 1.03x faster |
| coroutines | 30.3 ms | 29.5 ms: 1.03x faster |
| telco | 10.4 ms | 10.2 ms: 1.03x faster |
| json_dumps | 15.6 ms | 15.3 ms: 1.02x faster |
| asyncio_websockets | 547 ms | 534 ms: 1.02x faster |
| scimark_sparse_mat_mult | 5.94 ms | 5.81 ms: 1.02x faster |
| pprint_pformat | 2.07 sec | 2.02 sec: 1.02x faster |
| unpickle_pure_python | 300 us | 294 us: 1.02x faster |
| xml_etree_parse | 218 ms | 214 ms: 1.02x faster |
| xml_etree_generate | 135 ms | 133 ms: 1.02x faster |
| async_generators | 510 ms | 501 ms: 1.02x faster |
| typing_runtime_protocols | 217 us | 213 us: 1.02x faster |
| mdp | 3.72 sec | 3.67 sec: 1.02x faster |
| scimark_fft | 437 ms | 431 ms: 1.01x faster |
| bench_thread_pool | 1.79 ms | 1.77 ms: 1.01x faster |
| deepcopy_reduce | 3.71 us | 3.66 us: 1.01x faster |
| docutils | 3.56 sec | 3.52 sec: 1.01x faster |
| async_tree_memoization_tg | 433 ms | 428 ms: 1.01x faster |
| sqlglot_transpile | 2.02 ms | 2.00 ms: 1.01x faster |
| genshi_xml | 69.7 ms | 69.0 ms: 1.01x faster |
| xml_etree_process | 92.2 ms | 91.3 ms: 1.01x faster |
| fannkuch | 539 ms | 535 ms: 1.01x faster |
| sqlglot_normalize | 144 ms | 143 ms: 1.01x faster |
| float | 102 ms | 101 ms: 1.01x faster |
| raytrace | 361 ms | 358 ms: 1.01x faster |
| sqlglot_parse | 1.64 ms | 1.63 ms: 1.01x faster |
| gc_traversal | 4.84 ms | 4.80 ms: 1.01x faster |
| nqueens | 117 ms | 116 ms: 1.01x faster |
| meteor_contest | 123 ms | 123 ms: 1.01x faster |
| sqlglot_optimize | 71.4 ms | 71.1 ms: 1.00x faster |
| comprehensions | 23.0 us | 22.9 us: 1.00x faster |
| pidigits | 240 ms | 240 ms: 1.00x faster |
| unpack_sequence | 55.0 ns | 55.2 ns: 1.00x slower |
| chaos | 84.3 ms | 84.7 ms: 1.00x slower |
| dulwich_log | 126 ms | 126 ms: 1.00x slower |
| regex_compile | 165 ms | 166 ms: 1.00x slower |
| hexiom | 8.01 ms | 8.07 ms: 1.01x slower |
| async_tree_cpu_io_mixed_tg | 708 ms | 714 ms: 1.01x slower |
| async_tree_eager | 134 ms | 135 ms: 1.01x slower |
| richards_super | 73.2 ms | 73.9 ms: 1.01x slower |
| deltablue | 4.31 ms | 4.35 ms: 1.01x slower |
| asyncio_tcp_ssl | 3.59 sec | 3.64 sec: 1.01x slower |
| 2to3 | 418 ms | 423 ms: 1.01x slower |
| scimark_sor | 159 ms | 162 ms: 1.02x slower |
| python_startup | 40.9 ms | 41.7 ms: 1.02x slower |
| scimark_lu | 143 ms | 146 ms: 1.02x slower |
| async_tree_eager_cpu_io_mixed | 551 ms | 565 ms: 1.03x slower |
| go | 146 ms | 150 ms: 1.03x slower |
| generators | 38.2 ms | 39.7 ms: 1.04x slower |
| nbody | 136 ms | 142 ms: 1.05x slower |
| Geometric mean | (ref) | 1.01x faster |

| Benchmark | clang.pgo.20.1.0-rc2.92e5f826ac | clang.pgo.20.1.0-rc2.16a7f4607e.pyHot |
|---|---|---|
| pickle_pure_python | 383 us | 364 us: 1.05x faster |
| pprint_safe_repr | 863 ms | 840 ms: 1.03x faster |
| regex_effbot | 3.20 ms | 3.13 ms: 1.02x faster |
| pickle_list | 4.77 us | 4.66 us: 1.02x faster |
| typing_runtime_protocols | 178 us | 174 us: 1.02x faster |
| pprint_pformat | 1.78 sec | 1.74 sec: 1.02x faster |
| xml_etree_generate | 110 ms | 108 ms: 1.02x faster |
| richards | 45.1 ms | 44.3 ms: 1.02x faster |
| scimark_sor | 138 ms | 136 ms: 1.01x faster |
| gc_traversal | 5.21 ms | 5.15 ms: 1.01x faster |
| xml_etree_process | 76.3 ms | 75.5 ms: 1.01x faster |
| async_tree_eager | 113 ms | 111 ms: 1.01x faster |
| xml_etree_parse | 202 ms | 201 ms: 1.01x faster |
| nqueens | 92.3 ms | 91.4 ms: 1.01x faster |
| coroutines | 24.9 ms | 24.7 ms: 1.01x faster |
| mako | 13.4 ms | 13.3 ms: 1.01x faster |
| meteor_contest | 118 ms | 118 ms: 1.00x faster |
| unpickle_pure_python | 247 us | 246 us: 1.00x faster |
| sqlglot_normalize | 120 ms | 119 ms: 1.00x faster |
| deltablue | 3.69 ms | 3.71 ms: 1.00x slower |
| sympy_integrate | 22.6 ms | 22.7 ms: 1.00x slower |
| deepcopy | 289 us | 291 us: 1.01x slower |
| sympy_sum | 181 ms | 182 ms: 1.01x slower |
| unpack_sequence | 55.1 ns | 55.4 ns: 1.01x slower |
| 2to3 | 370 ms | 373 ms: 1.01x slower |
| asyncio_tcp_ssl | 3.52 sec | 3.55 sec: 1.01x slower |
| sqlite_synth | 3.20 us | 3.22 us: 1.01x slower |
| async_tree_eager_io | 701 ms | 707 ms: 1.01x slower |
| sqlglot_parse | 1.38 ms | 1.40 ms: 1.01x slower |
| pidigits | 228 ms | 230 ms: 1.01x slower |
| async_tree_io_tg | 727 ms | 735 ms: 1.01x slower |
| dulwich_log | 115 ms | 117 ms: 1.01x slower |
| python_startup | 39.4 ms | 39.9 ms: 1.01x slower |
| chaos | 67.0 ms | 67.9 ms: 1.01x slower |
| raytrace | 299 ms | 303 ms: 1.01x slower |
| nbody | 119 ms | 120 ms: 1.01x slower |
| async_tree_eager_tg | 260 ms | 264 ms: 1.01x slower |
| scimark_lu | 122 ms | 124 ms: 1.02x slower |
| python_startup_no_site | 34.0 ms | 34.5 ms: 1.02x slower |
| regex_dna | 204 ms | 208 ms: 1.02x slower |
| crypto_pyaes | 81.1 ms | 82.7 ms: 1.02x slower |
| scimark_fft | 341 ms | 349 ms: 1.02x slower |
| scimark_sparse_mat_mult | 4.53 ms | 4.65 ms: 1.03x slower |
| sympy_str | 320 ms | 329 ms: 1.03x slower |
| bench_thread_pool | 1.63 ms | 1.68 ms: 1.03x slower |
| deepcopy_reduce | 2.96 us | 3.06 us: 1.03x slower |
| tomli_loads | 2.20 sec | 2.28 sec: 1.03x slower |
| pathlib | 232 ms | 241 ms: 1.04x slower |
| telco | 8.45 ms | 8.77 ms: 1.04x slower |
| unpickle | 15.6 us | 16.2 us: 1.04x slower |
| pickle | 13.5 us | 14.3 us: 1.05x slower |
| async_tree_memoization | 405 ms | 428 ms: 1.06x slower |
| async_tree_io | 740 ms | 784 ms: 1.06x slower |
| Geometric mean | (ref) | 1.00x slower |

Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response
Linked PRs