When merging super-tables (e.g. [tool.ruff] into an existing [tool]), the parser was deep-copying the entire existing table just to append new entries alongside it. Since TOML forbids duplicate keys, the existing items are never modified — we can simply mutate the table in place, appending new entries directly.

The only place that needed protection was the out-of-order table validation path (OutOfOrderTableProxy.validate), which creates a temporary container and re-merges fragments to check for conflicts. Move the (shallow) copy there — it's a rare path that only runs when tables of the same name are separated by unrelated tables.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
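The idea can be sketched with plain dicts (tomlkit's real containers also track formatting, so `merge_super_table` here is a hypothetical stand-in, not the library's API): because TOML rejects duplicate keys, a merge only ever appends, so no copy of the existing table is needed.

```python
def merge_super_table(existing: dict, new_entries: dict) -> None:
    """Append entries from a later [tool.xxx] fragment into the existing
    [tool] table in place.  TOML forbids duplicate keys, so existing
    entries are never rewritten -- appending directly is safe and avoids
    deep-copying the whole table."""
    for key, value in new_entries.items():
        if key in existing:
            raise KeyError(f"duplicate key: {key}")
        existing[key] = value

tool = {"poetry": {"name": "demo"}}
merge_super_table(tool, {"ruff": {"line-length": 100}})
```
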
- StringType.is_*() methods: replace set membership tests with identity
comparison ("self is X or self is Y"). Enum singletons make "is"
correct and it avoids creating a temporary set and hashing enum values
on every call. These methods are called per-character during string
parsing.
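A minimal sketch of the pattern (member names here are illustrative, not tomlkit's exact enum): enum members are singletons, so `is` comparisons are exact and avoid building and hashing a set on every call.

```python
from enum import Enum

class StringType(Enum):
    # Hypothetical members standing in for the string delimiter kinds.
    SLB = '"'      # single-line basic
    MLB = '"""'    # multi-line basic
    SLL = "'"      # single-line literal
    MLL = "'''"    # multi-line literal

    def is_basic(self) -> bool:
        # Before: `self in {StringType.SLB, StringType.MLB}` builds a
        # temporary set and hashes enum values on every call.
        # After: identity tests -- allocation-free, and correct because
        # enum members are singletons.
        return self is StringType.SLB or self is StringType.MLB
```
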
- Parser._current/_idx/_marker: access Source._current etc. directly,
bypassing an unnecessary property-indirection layer. At ~2M accesses
per parse, this eliminates millions of redundant function calls.
- _parse_string: hoist loop-invariant delim.is_singleline(),
delim.is_multiline(), delim.is_basic(), and delim.unit into local
variables before the per-character loop. The delimiter type never
changes within the loop (it is set once, after the opening delimiter
is consumed).
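The hoisting pattern, sketched with a hypothetical `Delim` class in place of the real StringType: the loop-invariant method calls and attribute lookups are evaluated once before the per-character loop, then the loop compares against cheap locals.

```python
class Delim:
    """Illustrative stand-in for the string delimiter type."""
    def __init__(self, unit: str, basic: bool):
        self.unit = unit        # the quote character
        self._basic = basic
    def is_basic(self) -> bool:
        return self._basic

def scan(text: str, delim: Delim) -> str:
    # Hoist loop-invariant calls out of the per-character loop: the
    # delimiter never changes once the opening quote is consumed.
    is_basic = delim.is_basic()
    unit = delim.unit
    out = []
    for c in text:
        if c == unit:               # compare against a local, not delim.unit
            break
        if is_basic and c == "\\":  # escape handling would start here
            out.append(c)
            continue
        out.append(c)
    return "".join(out)
```
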
- Use tuple instead of list for "in" check on control char codes
(tuples are faster for containment tests).
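For example (the exact control-code set is illustrative, not tomlkit's real list):

```python
# A tuple of constants is built once and is slightly cheaper than a list
# for small membership scans.
INVALID_CONTROL = (0x00, 0x08, 0x0B, 0x0C, 0x0E, 0x1F, 0x7F)

def is_invalid_control(code: int) -> bool:
    return code in INVALID_CONTROL
```
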
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the TOMLChar wrapper class with plain module-level string constants (BARE, KV, NUMBER, SPACES, NL, WS). Character-class checks become simple `c in CONSTANT` instead of method calls on a str subclass, eliminating 710k object creations per parse.

Switch Source from an iterator over a pre-built list of (int, TOMLChar) tuples to direct string indexing. The _State context manager now saves/restores just _idx, _current, and _marker — three scalar assignments instead of copying a list iterator.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
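A minimal sketch of both ideas (a subset of the constants, and a simplified Source — the real class carries more state): membership tests on a plain str are C-level scans with no per-character wrapper object, and saving parser state is three scalar assignments.

```python
import string

# Module-level character-class constants; `c in BARE` replaces a method
# call on a str-subclass wrapper object.  (SPACES/NL/WS shown; KV and
# NUMBER are built the same way.)
BARE = string.ascii_letters + string.digits + "-_"
SPACES = " \t"
NL = "\n\r"
WS = SPACES + NL

class Source:
    """Simplified: indexes the string directly instead of iterating a
    pre-built list of (int, TOMLChar) tuples."""
    def __init__(self, text: str):
        self._text = text
        self._idx = 0
        self._marker = 0
        self._current = text[0] if text else ""

    def inc(self) -> bool:
        self._idx += 1
        if self._idx < len(self._text):
            self._current = self._text[self._idx]
            return True
        self._current = ""
        return False
```

Restoring a saved state is then just `src._idx, src._current, src._marker = saved` — no iterator to copy.
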
Three structural fixes that remove redundant work during parsing:

1. Table.raw_append: after appending a dotted key, it used to read the value back via Container.__getitem__ (which creates a throwaway SingleKey, searches the map, etc.). Now it uses dict.__getitem__ on the Container directly — same result, no intermediate objects.

2. Container.append: checked 'key in self', which goes through MutableMapping.__contains__ → __getitem__ → item() → SingleKey. Changed to 'key in self._map' — a direct dict lookup on the internal map, which already uses Key objects as keys.

3. SingleKey.__init__: the bare-key character check was rebuilding string.ascii_letters + string.digits + '-_' on every call. Now it uses the pre-computed BARE constant from toml_char.

Together these eliminate ~46k unnecessary SingleKey creations and ~26k unnecessary Container.__getitem__ calls per 500 parses.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
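The second fix can be sketched as follows (a toy Container, not tomlkit's — the real one maps keys to body indices with full formatting state): the duplicate-key check probes the internal dict directly instead of going through the MutableMapping mixin chain.

```python
class Container:
    """Toy sketch of the lookup-path change; names are illustrative."""
    def __init__(self):
        self._map = {}    # key -> index into _body, keyed by ready-made keys
        self._body = []

    def append(self, key, value):
        # Before: `key in self` -> MutableMapping.__contains__ ->
        #         __getitem__ -> item(), building a throwaway key object.
        # After: one direct dict lookup on the internal map.
        if key in self._map:
            raise KeyError(f"duplicate key: {key}")
        self._map[key] = len(self._body)
        self._body.append((key, value))
```
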
Structural changes to avoid unnecessary work:

1. Extract _parse_table_header() and pass pre-parsed headers to _parse_table(), eliminating the peek-then-reparse pattern where every table header was parsed twice (once to peek, once for real). Removes _peek_table() entirely.

2. Reorder _parse_array() to check for the closing bracket before attempting to parse a value, eliminating 3 speculative UnexpectedCharError constructions per parse (each requiring an expensive _to_linecol() call that scans the full source).

Total function calls reduced from 6.3M to 5.88M per 500 parses.
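The array reordering, sketched over a token list rather than raw source (so this is a shape illustration, not the real parser): the cheap closing-bracket check runs first, so no exception object is ever constructed on the happy path.

```python
def parse_array(tokens):
    """Parse tokens starting at '['; return (items, next_index).
    The ']' check comes *before* the attempt to read a value, so empty
    arrays and trailing commas never trigger a speculative error."""
    assert tokens[0] == "["
    i, items = 1, []
    while True:
        if tokens[i] == "]":      # cheap check first -- no exceptions built
            return items, i + 1
        items.append(tokens[i])   # stand-in for "parse one value"
        i += 1
        if tokens[i] == ",":
            i += 1
```
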
During parsing, Container.append performed expensive formatting work (indentation adjustment, display-name invalidation, insertion ordering) that was immediately short-circuited by the _parsed flag — but the isinstance checks guarding those blocks still ran through the slow ABCMeta path every time.

Guard all formatting-only logic with 'if not self._parsed:', avoiding ~250k isinstance calls (including ~188k through ABCMeta) and ~38k ends_with_whitespace / _previous_item calls per 500 parses.
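The shape of the change, in a toy Container (the real formatting logic is far larger; the point is only where the guard sits): the cheap boolean is tested before any ABCMeta-dispatched isinstance check can run.

```python
from collections.abc import MutableMapping  # isinstance here goes via ABCMeta

class Container:
    def __init__(self, parsed: bool = False):
        self._parsed = parsed   # True while loading a document
        self._body = []

    def append(self, key, value):
        # Cheap flag test first: during parsing, none of the
        # formatting-only isinstance checks below ever execute.
        if not self._parsed:
            if isinstance(value, MutableMapping):
                pass  # indentation / display-name / ordering adjustments
        self._body.append((key, value))
```
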
Replace the Parser delegation layer (self._current, self.inc(), self.mark(), self.extract(), self.end()) with direct access to the Source object (src = self._src; src._current, src.inc(), etc.) in all hot methods. This eliminates ~3M Python function calls per 500 parses — one extra frame per property access or method delegation.

The delegation wrappers are retained for use by less performance-sensitive code paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
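In sketch form (simplified classes, illustrative method names): a hot method binds the Source to a local once and then touches it directly, while the property remains for cold paths.

```python
class Source:
    def __init__(self, text: str):
        self._text, self._idx = text, 0
        self._current = text[0] if text else ""

    def inc(self) -> bool:
        self._idx += 1
        ok = self._idx < len(self._text)
        self._current = self._text[self._idx] if ok else ""
        return ok

class Parser:
    def __init__(self, text: str):
        self._src = Source(text)

    @property
    def _current(self):
        return self._src._current  # delegation kept for cold paths

    def _parse_word(self) -> str:
        # Hot path: bind the Source once, then use plain attribute access
        # and direct method calls -- no delegation frame per character.
        src = self._src
        out = []
        while src._current.isalnum():
            out.append(src._current)
            if not src.inc():
                break
        return "".join(out)
```
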
More from Claude, this time attacking performance. In my rough and ready benchmarking this series of commits makes tomlkit about 4 times faster parsing its own pyproject.toml.
(It still is the slowest toml parser that I know - but four times faster than it was).
Some of these commits are more attractive than others — e.g. the last one, which inlines some methods, is a performance win but perhaps a small loss for clean code.
Let me know if you want to keep some but not others. Or feel free to experiment, rearrange, or put commits onto your own branch; I don't care whether the improvements land from this pull request or another.