gh-147991: Speed up tomllib import time #147992

Draft
vstinner wants to merge 5 commits into python:main from vstinner:lazy_tomllib

Conversation

Member

vstinner commented Apr 2, 2026

Defer the import of regular expressions until the first datetime, local time, or non-trivial number (anything other than plain decimal digits) is encountered.

Member Author

vstinner commented Apr 2, 2026

It might be interesting to replace `from types import MappingProxyType` with the built-in `frozendict`. But currently, the GitHub Actions CI runs mypy with Python 3.12, which doesn't have `frozendict`.

Member Author

vstinner left a comment

I marked the added constants and functions as private by adding a `_` prefix. I'm not sure it's needed: all other `_parser` APIs are "public" (no underscore prefix).

Member

hugovk commented Apr 2, 2026

> It might be interesting to replace `from types import MappingProxyType` with the built-in `frozendict`. But currently, the GitHub Actions CI runs mypy with Python 3.12, which doesn't have `frozendict`.

Adding `# type: ignore[name-defined]` is a quick fix.

This:

diff --git a/Lib/tomllib/_parser.py b/Lib/tomllib/_parser.py
index b59d0f7d54b..96f189537cf 100644
--- a/Lib/tomllib/_parser.py
+++ b/Lib/tomllib/_parser.py
@@ -4,7 +4,7 @@
 
 from __future__ import annotations
 
-from types import MappingProxyType
+__lazy_modules__ = ["tomllib._re"]
 
 from ._re import (
     RE_DATETIME,
@@ -42,7 +42,7 @@
 KEY_INITIAL_CHARS: Final = BARE_KEY_CHARS | frozenset("\"'")
 HEXDIGIT_CHARS: Final = frozenset("abcdef" "ABCDEF" "0123456789")
 
-BASIC_STR_ESCAPE_REPLACEMENTS: Final = MappingProxyType(
+BASIC_STR_ESCAPE_REPLACEMENTS: Final = frozendict(  # type: ignore[name-defined]
     {
         "\\b": "\u0008",  # backspace
         "\\t": "\u0009",  # tab

Gets us from 4 ms:

[image]

To ~0 ms:

[image]
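
For code that has to run on interpreters both with and without the built-in, the swap discussed above could be approximated with a fallback. This is only a sketch under assumptions: `ReadOnlyMapping`, `ESCAPES` and `is_read_only` are illustrative names that do not appear in the PR, and the PR itself uses `frozendict` directly with a `# type: ignore[name-defined]` comment.

```python
import builtins
from types import MappingProxyType

# Use the built-in frozendict when the interpreter provides one;
# otherwise fall back to a read-only view over a regular dict.
ReadOnlyMapping = getattr(builtins, "frozendict", MappingProxyType)

ESCAPES = ReadOnlyMapping(
    {
        "\\b": "\u0008",  # backspace
        "\\t": "\u0009",  # tab
    }
)


def is_read_only(mapping) -> bool:
    """Return True if item assignment on `mapping` raises TypeError."""
    try:
        mapping["\\n"] = "\n"
    except TypeError:
        return True
    return False
```

Either type rejects item assignment, so lookups such as `ESCAPES["\\t"]` behave the same on both branches.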

if pos >= end:
break
else:
if src[pos] != "\n":
Contributor

Can this happen? We could just return None and fall back to the original path.

Member Author

Yes, in many cases. See the added `test_parse_simple_number()`. Examples:

  • The test is true when parsing `1979-05-27`: the simple number parser cannot handle the date.
  • The test is false when parsing `1\n` (e.g. `value = 1\n`) or `23, 24]\n` (e.g. `list = [23, 24]\n`).
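
To make the two cases concrete, here is a rough sketch of a digits-only fast path that returns `None` when the caller should fall back to the regex-based parser. It is not the function from the PR: the name `parse_simple_number`, the return shape and the set of accepted terminator characters are all assumptions made for illustration.

```python
# Characters assumed to end a bare integer value in this sketch.
_TERMINATORS = frozenset(" \t\r\n,]}#")
_DIGITS = frozenset("0123456789")


def parse_simple_number(src: str, pos: int):
    """Return (new_pos, value) for a plain decimal integer, else None."""
    start = pos
    end = len(src)
    while pos < end and src[pos] in _DIGITS:
        pos += 1
    if pos == start:
        return None  # no digits at all
    if pos < end and src[pos] not in _TERMINATORS:
        # Something else follows the digits, e.g. "-" in 1979-05-27,
        # "." in 1.5 or "_" in 1_000: let the full parser handle it.
        return None
    if src[start] == "0" and pos - start > 1:
        return None  # leading zeros: leave validation to the full parser
    return pos, int(src[start:pos])
```

Under these assumptions, `parse_simple_number("1\n", 0)` and `parse_simple_number("23, 24]\n", 0)` succeed, while `parse_simple_number("1979-05-27\n", 0)` returns `None` and the caller takes the original path.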

Member Author

vstinner commented Apr 2, 2026

I updated the PR to replace `types.MappingProxyType` with the `frozendict` type, using a `# type: ignore[name-defined]` comment (to please the mypy gods).

I ran benchmarks on the latest PR using Python built in release mode (gcc -O3) on Fedora 43:

  • According to `-X importtime`, with this change, `import tomllib` takes 828 µs instead of 9.0 ms on main (10.9x faster).
  • Using the `python -m pyperf command` command with `./python -S`, with this change, `import tomllib` takes 0.98 ms instead of 9.8 ms (10x faster).
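
A measurement like the first one can be reproduced roughly with a small script. This is a sketch, not the procedure used for the numbers above: `cumulative_import_us` is a hypothetical helper that runs the current interpreter with `-X importtime` and reads the cumulative column for one module from stderr.

```python
import subprocess
import sys


def cumulative_import_us(module: str):
    """Return the cumulative import time of `module` in microseconds,
    as reported by -X importtime, or None if it was not reported
    (for example because the module was already loaded at startup)."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    prefix = "import time:"
    for line in proc.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        # Each row is "self [us] | cumulative | imported package".
        fields = line[len(prefix):].split("|")
        if len(fields) == 3 and fields[2].strip() == module:
            return int(fields[1])
    return None
```

Calling `cumulative_import_us("tomllib")` on two builds gives a quick comparison; a single run is noisy, so a tool such as pyperf is better suited to stable numbers.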

3 participants