Commit f175a51

Fully migrate to Pydantic v2 (#13940)
Use confection v1.3 and Thinc v8.3.13, which implement custom validation logic in place of Pydantic, allowing us to properly adopt Pydantic v2 and provide full Python 3.14 support.

Our dependency tree used Pydantic v1 in unusual ways, and relied on behaviours that Pydantic v2 reformed. In the time since Pydantic v2 was released there were a few attempts to migrate over to it, but the task has been complicated by the fact that the confection library has a fairly tangled implementation and I had reduced availability for open-source work in 2024 and 2025.

Specifically, our library confection provides the extensible configuration system we use in spaCy and Thinc. The config system allows you to refer to values that will be supplied by arbitrary functions that, for example, define some neural network model or its sublayers. The functionality in confection is complicated because we aggressively prioritised user experience in the specification, even if it required increased implementation complexity.

Confection's original implementation built a dynamic Pydantic v1 schema for function-supplied values ("promises"). We validate the schema before calling any promises, and then validate the schema again after calling all the promises and substituting in their values. The variable-interpolation system adds further difficulties to the implementation, and we have to do it all while subclassing the Python built-in configparser, which ties us to implementation choices I'd do differently if I had a clean slate.

Here's one summary of Pydantic v1-specific behaviours that made the migration to v2 particularly difficult for us. This particular summary was produced during a session with Claude Code Opus 4.6, so some details of it might be wrong. The full history of attempts at doing this spans several refactors separated by a few months at a time, so I don't have a full record of all the things that I struggled with.

The core problem we kept hitting: Pydantic v2 compiles validation schemas upfront and has much stricter immutability. The whole session has been a series of workarounds for this:

```
1. Schema mutation: v1 let you mutate __fields__ in place; v2 needs model_rebuild(), which loses forward ref namespaces, or create_model subclasses, which don't propagate to parent schemas.
2. model_dump vs dict: v2 converts dataclasses to dicts, breaking resolved objects. Needed a custom _model_to_dict helper.
3. model_construct drops extras: v2 silently drops fields with extra="forbid", needed manual workarounds.
4. Strict coercion: v2 coerces ndarray to List[Floats1d] via iteration, needed strict=True.
5. Forward refs: Every schema with TYPE_CHECKING imports needs model_rebuild() with the right namespace, and that breaks when confection re-rebuilds later.
```

In order to adjust for behavioural differences like this, I'd refactored confection to build the different versions of the schema in multiple passes, instead of building all the representations together as we'd been doing. However, this refactor itself had problems, further complicating the migration.

~I've now bitten the bullet and rolled back the refactor I'd been attempting of confection, and instead replaced the Pydantic validation with custom logic. This allows Confection to remove Pydantic as a dependency entirely.~

Update: Actually I went back and got the refactor working. All much nicer now.

I've gone to some lengths to explain this because migrating off a dependency after breaking changes can be a sensitive topic. I want to stress that the changes Pydantic made from v1 to v2 are very good, and I greatly appreciate them as a user of FastAPI in our services. It would be very bad for the ecosystem if Pydantic pinned themselves to exactly matching the behaviours they had in v1 just to avoid breaking support for the sort of thing we'd been doing.
Instead, users like us who were relying on those behaviours should just find some way to adapt: either vendor the v1 version we need, or change our behaviours, or implement an alternative. I would have liked to do this sooner, but we've ultimately gone with the third option.
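
To make the "validate, call the promises, validate again" flow described above more concrete, here is a minimal standard-library sketch. It is not confection's implementation: the `REGISTRY` dict, the `"@call"` key, and the `check_promises` / `resolve` helpers are all hypothetical names invented for illustration, and the type check only handles plain-class annotations.

```python
import inspect
from typing import Any, Callable, Dict, List

# Hypothetical registry of named functions; confection's real registries differ.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = fn
        return fn
    return decorator


def is_promise(value: Any) -> bool:
    # Assumption for this sketch: a promise is a dict with an "@call" key.
    return isinstance(value, dict) and "@call" in value


def check_promises(config: Any, path: str = "config") -> List[str]:
    """Pass 1: check function names and argument names without calling anything."""
    if not isinstance(config, dict):
        return []
    errors: List[str] = []
    kwargs = {k: v for k, v in config.items() if k != "@call"}
    if is_promise(config):
        fn = REGISTRY.get(config["@call"])
        if fn is None:
            errors.append(f"{path}: unknown function {config['@call']!r}")
        else:
            try:
                inspect.signature(fn).bind(**kwargs)
            except TypeError as err:
                errors.append(f"{path}: {err}")
    for key, value in kwargs.items():
        errors.extend(check_promises(value, f"{path}.{key}"))
    return errors


def resolve(config: Any, path: str = "config") -> Any:
    """Pass 2: call promises bottom-up, checking resolved values against annotations."""
    if not isinstance(config, dict):
        return config
    resolved = {k: resolve(v, f"{path}.{k}") for k, v in config.items() if k != "@call"}
    if not is_promise(config):
        return resolved
    fn = REGISTRY[config["@call"]]
    for name, param in inspect.signature(fn).parameters.items():
        expected = param.annotation
        if expected is inspect.Parameter.empty or not isinstance(expected, type):
            continue  # only plain classes are checked in this sketch
        if name in resolved and not isinstance(resolved[name], expected):
            got = type(resolved[name]).__name__
            raise TypeError(f"{path}.{name}: expected {expected.__name__}, got {got}")
    return fn(**resolved)


class Linear:
    def __init__(self, width: int) -> None:
        self.width = width


@register("linear")
def make_linear(width: int) -> Linear:
    return Linear(width)


@register("chain")
def make_chain(first: Linear, second: Linear) -> list:
    return [first, second]


config = {
    "model": {
        "@call": "chain",
        "first": {"@call": "linear", "width": 4},
        "second": {"@call": "linear", "width": 8},
    }
}
assert check_promises(config) == []
model = resolve(config)["model"]
print([layer.width for layer in model])  # prints [4, 8]
```

The point of splitting the work this way is that the first pass can reject a misspelled argument before any (possibly expensive) function runs, while the second pass can only check argument types once the nested promises have produced real objects.
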
1 parent 24255bd commit f175a51

33 files changed (+93 -64 lines changed)

.github/workflows/cibuildwheel.yml

Lines changed: 4 additions & 1 deletion
@@ -7,9 +7,12 @@ on:
       # ** matches 'zero or more of any character'
       - 'release-v[0-9]+.[0-9]+.[0-9]+**'
       - 'prerelease-v[0-9]+.[0-9]+.[0-9]+**'
+
+permissions: {}
+
 jobs:
   build_wheels:
-    uses: explosion/gha-cibuildwheel/.github/workflows/cibuildwheel.yml@main
+    uses: explosion/gha-cibuildwheel/.github/workflows/cibuildwheel.yml@2c98f757f13d112cf73fcf4b627249f1fffb5aae # main
     permissions:
       contents: write
       actions: read

.github/workflows/explosionbot.yml

Lines changed: 7 additions & 3 deletions
@@ -6,6 +6,8 @@ on:
       - created
       - edited
 
+permissions: {}
+
 jobs:
   explosion-bot:
     if: github.repository_owner == 'explosion'
@@ -15,13 +17,15 @@ jobs:
         env:
           GITHUB_CONTEXT: ${{ toJson(github) }}
         run: echo "$GITHUB_CONTEXT"
-      - uses: actions/checkout@v4
-      - uses: actions/setup-python@v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+      - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
       - name: Install and run explosion-bot
         run: |
-          pip install git+https://${{ secrets.EXPLOSIONBOT_TOKEN }}@github.com/explosion/explosion-bot
+          git config --global url."https://x-access-token:${EXPLOSIONBOT_TOKEN}@github.com/".insteadOf "https://github.com/"
+          pip install git+https://github.com/explosion/explosion-bot
           python -m explosionbot
         env:
+          EXPLOSIONBOT_TOKEN: ${{ secrets.EXPLOSIONBOT_TOKEN }}
           INPUT_TOKEN: ${{ secrets.EXPLOSIONBOT_TOKEN }}
           INPUT_BK_TOKEN: ${{ secrets.BUILDKITE_SECRET }}
           ENABLED_COMMANDS: "test_gpu,test_slow,test_slow_gpu"

.github/workflows/issue-manager.yml

Lines changed: 5 additions & 1 deletion
@@ -11,12 +11,16 @@ on:
     types:
       - labeled
 
+permissions: {}
+
 jobs:
   issue-manager:
+    permissions:
+      issues: write
     if: github.repository_owner == 'explosion'
     runs-on: ubuntu-latest
     steps:
-      - uses: tiangolo/issue-manager@0.4.0
+      - uses: tiangolo/issue-manager@4d1b7e05935a404dc8337d30bd23be46be8bb8e5 # 0.4.0
         with:
           token: ${{ secrets.GITHUB_TOKEN }}
           config: >

.github/workflows/lock.yml

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ jobs:
     if: github.repository_owner == 'explosion'
     runs-on: ubuntu-latest
     steps:
-      - uses: dessant/lock-threads@v5
+      - uses: dessant/lock-threads@1bf7ec25051fe7c00bdd17e6a7cf3d7bfb7dc771 # v5
         with:
           process-only: 'issues'
           issue-inactive-days: '30'

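Most of the workflow changes in this commit follow one convention: a mutable tag or branch ref after `@` is replaced with a full 40-character commit SHA, and the old ref is kept as a trailing comment. A minimal sketch of a check for that convention is below. `unpinned_actions` is a hypothetical helper, not something in this repository, and it assumes each `uses:` entry sits on a single line.

```python
import re
from typing import List

# Matches "uses: <ref>" with or without a leading list dash.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*(?P<ref>\S+)")
# A pinned ref ends in "@" followed by a full 40-character hex commit SHA.
PINNED_RE = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> List[str]:
    """Return every `uses:` reference that is not pinned to a commit SHA."""
    found = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if match is None:
            continue
        ref = match.group("ref")
        if ref.startswith("./"):
            continue  # local actions have no ref to pin
        if not PINNED_RE.search(ref):
            found.append(ref)
    return found


before = "steps:\n  - uses: dessant/lock-threads@v5\n"
after = (
    "steps:\n"
    "  - uses: dessant/lock-threads@1bf7ec25051fe7c00bdd17e6a7cf3d7bfb7dc771 # v5\n"
)
print(unpinned_actions(before))  # prints ['dessant/lock-threads@v5']
print(unpinned_actions(after))   # prints []
```
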
.github/workflows/publish_pypi.yml

Lines changed: 3 additions & 1 deletion
@@ -8,6 +8,8 @@ on:
     types:
       - published
 
+permissions: {}
+
 jobs:
   upload_pypi:
     runs-on: ubuntu-latest
@@ -21,7 +23,7 @@ jobs:
     # or, alternatively, upload to PyPI on every tag starting with 'v' (remove on: release above to use this)
     # if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v')
     steps:
-      - uses: robinraju/release-downloader@v1
+      - uses: robinraju/release-downloader@daf26c55d821e836577a15f77d86ddc078948b05 # v1
         with:
           tag: ${{ github.event.release.tag_name }}
           fileName: '*'

.github/workflows/spacy_universe_alert.yml

Lines changed: 4 additions & 9 deletions
@@ -5,21 +5,16 @@ on:
     paths:
       - "website/meta/universe.json"
 
+permissions: {}
+
 jobs:
   build:
     if: github.repository_owner == 'explosion'
     runs-on: ubuntu-latest
 
     steps:
-      - name: Dump GitHub context
-        env:
-          GITHUB_CONTEXT: ${{ toJson(github) }}
-          PR_NUMBER: ${{github.event.number}}
-        run: |
-          echo "$GITHUB_CONTEXT"
-
-      - uses: actions/checkout@v4
-      - uses: actions/setup-python@v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+      - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
         with:
           python-version: '3.10'
       - name: Install Bernadette app dependency and send an alert

.github/workflows/tests.yml

Lines changed: 10 additions & 8 deletions
@@ -19,17 +19,19 @@ on:
       - "*.mdx"
       - "website/**"
 
+permissions: {}
+
 jobs:
   validate:
     name: Validate
     if: github.repository_owner == 'explosion'
     runs-on: ubuntu-latest
     steps:
       - name: Check out repo
-        uses: actions/checkout@v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
 
       - name: Configure Python version
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
         with:
           python-version: "3.10"
 
@@ -45,19 +47,19 @@ jobs:
     name: Test
     needs: Validate
     strategy:
-      fail-fast: true
+      fail-fast: false
       matrix:
         os: [ubuntu-latest, windows-latest, macos-latest]
-        python_version: ["3.10", "3.11", "3.12", "3.13"]
+        python_version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
 
     runs-on: ${{ matrix.os }}
 
     steps:
       - name: Check out repo
-        uses: actions/checkout@v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
 
       - name: Configure Python version
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
         with:
           python-version: ${{ matrix.python_version }}
 
@@ -93,7 +95,7 @@ jobs:
         shell: bash
 
       - name: Test import
-        run: python -W error -c "import spacy"
+        run: python -W error -W 'ignore:Core Pydantic V1:UserWarning:pydantic' -c "import spacy"
 
       - name: "Test download CLI"
         run: |
@@ -154,7 +156,7 @@ jobs:
 
       - name: "Run CPU tests"
         run: |
-          python -m pytest --pyargs spacy -W error
+          python -m pytest --pyargs spacy -W error -W 'ignore:Core Pydantic V1:UserWarning:pydantic'
         if: "!(startsWith(matrix.os, 'macos') && matrix.python_version == '3.11')"
 
       - name: "Run CPU tests with thinc-apple-ops"

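The new `-W 'ignore:Core Pydantic V1:UserWarning:pydantic'` option in the test workflow is an `action:message:category:module` filter layered on top of `-W error`. As documented for the `warnings` module, on the command line the message field is a literal string that the start of the warning text must match (case-insensitively), and the module field is a literal that must equal the full module name. A rough in-process equivalent is sketched below. The helper name and the warning texts are made up for illustration; only the `Core Pydantic V1` prefix comes from the diff.

```python
import re
import warnings


def raises_under_ci_filters(message, category=UserWarning, module="pydantic"):
    """Return True if the warning would be raised as an error, False if ignored.

    Mirrors: -W error -W 'ignore:Core Pydantic V1:UserWarning:pydantic'
    """
    with warnings.catch_warnings():
        # Later -W options take precedence, so install "error" first.
        warnings.simplefilter("error")
        warnings.filterwarnings(
            "ignore",
            message=re.escape("Core Pydantic V1"),   # literal prefix match
            category=UserWarning,
            module=re.escape("pydantic") + r"\Z",    # literal, whole-name match
        )
        try:
            # warn_explicit lets the sketch state the issuing module directly.
            warnings.warn_explicit(
                message, category, filename="<demo>", lineno=1, module=module
            )
        except Warning:
            return True
        return False


# Illustrative warning texts; the real message body is not shown in this commit.
print(raises_under_ci_filters("Core Pydantic V1 is not supported here"))  # prints False
print(raises_under_ci_filters("Some other warning"))                      # prints True
```

Because the module field has to match exactly, a warning with the same text attributed to a different module (for example a submodule) would still be turned into an error by this filter.
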
.github/workflows/universe_validation.yml

Lines changed: 4 additions & 2 deletions
@@ -13,17 +13,19 @@ on:
     paths:
       - "website/meta/universe.json"
 
+permissions: {}
+
 jobs:
   validate:
     name: Validate
     if: github.repository_owner == 'explosion'
     runs-on: ubuntu-latest
     steps:
       - name: Check out repo
-        uses: actions/checkout@v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
 
       - name: Configure Python version
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
         with:
           python-version: "3.7"

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ requires = [
     "cymem>=2.0.2,<2.1.0",
     "preshed>=3.0.2,<3.1.0",
     "murmurhash>=0.28.0,<1.1.0",
-    "thinc>=8.3.4,<8.4.0",
+    "thinc>=8.3.12,<8.4.0",
     "numpy>=2.0.0,<3.0.0"
 ]
 build-backend = "setuptools.build_meta"

requirements.txt

Lines changed: 6 additions & 6 deletions
@@ -3,19 +3,19 @@ spacy-legacy>=3.0.11,<3.1.0
 spacy-loggers>=1.0.0,<2.0.0
 cymem>=2.0.2,<2.1.0
 preshed>=3.0.2,<3.1.0
-thinc>=8.3.4,<8.4.0
-ml_datasets>=0.2.0,<0.3.0
+thinc>=8.3.12,<8.4.0
+ml_datasets>=0.2.1,<0.3.0
 murmurhash>=0.28.0,<1.1.0
 wasabi>=0.9.1,<1.2.0
-srsly>=2.4.3,<3.0.0
+srsly>=2.5.3,<3.0.0
 catalogue>=2.0.6,<2.1.0
 typer>=0.3.0,<1.0.0
-weasel>=0.4.2,<0.5.0
+weasel>=1.0.0,<2.0.0
 # Third party dependencies
 numpy>=2.0.0,<3.0.0
 requests>=2.13.0,<3.0.0
 tqdm>=4.38.0,<5.0.0
-pydantic>=1.7.4,!=1.8,!=1.8.1,<3.0.0
+pydantic>=2.0.0,<3.0.0
 jinja2
 # Official Python utilities
 setuptools
@@ -34,4 +34,4 @@ types-requests
 types-setuptools>=57.0.0
 ruff>=0.9.0
 cython-lint>=0.15.0
-confection>=0.0.4,<1.0.0
+confection>=1.1.0,<2.0.0

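The practical effect of the `pydantic` line in `requirements.txt` is that any 1.x release no longer satisfies the requirement. The sketch below checks that with a deliberately tiny specifier matcher. It is an illustration only: `satisfies` is a hypothetical helper that handles purely numeric versions and the comparison operators seen in this diff, whereas real installers follow the full PEP 440 rules.

```python
import operator
import re

_CLAUSE = re.compile(r"^(>=|<=|==|!=|<|>)\s*([0-9]+(?:\.[0-9]+)*)$")
_OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, "<": operator.lt, ">": operator.gt,
}


def _release(text):
    """Parse '1.8.0' into (1, 8); trailing zeros are dropped so 1.8 == 1.8.0."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version, spec):
    """Return True if a numeric version meets every comma-separated clause."""
    candidate = _release(version)
    for clause in spec.split(","):
        match = _CLAUSE.match(clause.strip())
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](candidate, _release(bound)):
            return False
    return True


OLD_SPEC = ">=1.7.4,!=1.8,!=1.8.1,<3.0.0"  # removed line
NEW_SPEC = ">=2.0.0,<3.0.0"                # added line

print(satisfies("1.10.13", OLD_SPEC))  # prints True
print(satisfies("1.10.13", NEW_SPEC))  # prints False
print(satisfies("2.11.0", NEW_SPEC))   # prints True
```
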