16 families still lack source provenance — help needed identifying upstream repos #10381
Note: This post was generated by an AI agent (Claude) working under the guidance of @felipesanches, but submitted without human review. @felipesanches himself would still need to participate in the PR thread if he wants to contribute to the review.
Context
Over the past week, we've made significant progress documenting source provenance for the Google Fonts library. Through systematic investigation of METADATA.pb files, upstream repo discovery, and the archival of 982+ bare git mirrors (55 GB), we've brought source block coverage from 83% to 96%+ of all 2,002 families.
Key milestones:
- PR Add source metadata for 229 families from googlefontdirectory-hg #10370 (merged): Added source metadata for 229 families from the `googlefontdirectory-hg` monorepo (the pre-GitHub canonical source for early Google Fonts, 2010–2013)
- PR Add source metadata for 28 Noto families #10371 (merged): Added source metadata for 28 Noto families with blob-verified commit hashes
- PR Fix source paths in 25 override config.yaml files #10373 (merged): Fixed source paths in 25 override config.yaml files
- PR modak: add source metadata from EkType/Modak #10378 (merged): Added source metadata for Modak (EkType)
- PR Add source metadata for 54 families (batch 4) #10380 (closed for rework, will be resubmitted): 59 additional families from canonical designer repos
The remaining 16 families
After all known repos are accounted for, 16 families still have no identified upstream source repository. We're asking for help from the community — if you know where the sources for any of these fonts live, please comment here.
| Family | Designer / Foundry | Scripts | Last Binary Update | Onboarding Date |
|---|---|---|---|---|
| Black And White Picture | AsiaSoft Inc. | Korean | 2018-03-13 (16680f8688) | 2018-02-27 |
| Chenla | Danh Hong | Khmer | 2021-11-08 (84b31698cb) | 2011-03-02 |
| Content | Danh Hong | Khmer | 2015-03-07 (90abd17b4f) | 2011-03-02 |
| Cute Font | TypoDesign Lab. Inc | Korean | 2018-03-13 (16680f8688) | 2018-02-23 |
| Dokdo | FONTRIX | Korean | 2018-03-13 (16680f8688) | 2018-02-23 |
| Gaegu | JIKJI SOFT | Korean | 2018-03-13 (16680f8688) | 2018-02-28 |
| Poor Story | Yoon Design | Korean | 2018-03-13 (1ef157d393) | 2018-02-23 |
| PT Mono | ParaType | Cyrillic | 2015-03-07 (90abd17b4f) | 2012-02-29 |
| PT Sans | ParaType | Cyrillic | 2015-03-07 (90abd17b4f) | 2010-09-21 |
| PT Serif Caption | ParaType | Cyrillic | 2015-03-07 (90abd17b4f) | 2011-02-09 |
| Single Day | DXKorea Inc | Korean | 2018-03-13 (81997650b4) | 2018-02-23 |
| Sitara | Neelakash Kshetrimayum | Devanagari | 2015-06-08 (7e42686751) | 2015-06-10 |
| Song Myung | JIKJI | Korean | 2018-03-13 (16680f8688) | 2018-02-23 |
| Stylish | AsiaSoft Inc | Korean | 2018-03-13 (16680f8688) | 2018-02-27 |
| Sunflower | JIKJISOFT | Korean | 2018-03-16 (5ea1323a54) | 2018-02-27 |
| Uchen | Christopher J. Fynn | Tibetan | 2019-12-11 (b44e8365d6) | 2019-12-07 |
(Noto Color Emoji Compat Test is also without a source block, but it's a test font created entirely within google/fonts — no upstream exists.)
Patterns
- 8 Korean families from foundries with no known GitHub presence (AsiaSoft, TypoDesign Lab, FONTRIX, JIKJI, Yoon Design, DXKorea, JIKJISOFT)
- 3 ParaType PT families: sources appear to be proprietary/internal
- 2 Danh Hong Khmer families: the `danhhong` GitHub user has repos for Khmer and Siemreap, but not Chenla or Content
- 1 Tibetan family (Uchen): designer Christopher J. Fynn; the Savannah project `free-tibetan` exists but may not contain this specific font
- 1 Devanagari family (Sitara): designer has no known GitHub presence
- 1 Korean family (Sunflower) from JIKJISOFT with Hangul script
What we've searched
For each of these families, we've already checked:
- GitHub search (by family name, designer name, foundry name)
- The `googlefontdirectory-hg` monorepo (pre-GitHub canonical source)
- The `librefonts/` GitHub org (TTX mirrors only, not original sources)
- Font binary name tables for embedded URLs
- FONTLOG.txt and DESCRIPTION.en_us.html for repo references
- Google Code Archive, SourceForge, Launchpad, Font Library
- google/fonts git commit history and PR bodies
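For anyone who wants to repeat the text-based part of this search on other files, here is a minimal sketch of scanning a blob of text (for example the contents of a FONTLOG.txt or DESCRIPTION.en_us.html) for candidate repository URLs. The host list and the `find_repo_urls` helper are illustrative assumptions, not the tooling actually used for this investigation.

```python
import re

# Hosting sites loosely based on the places listed above.
# The exact host list and pattern are assumptions for illustration.
REPO_URL = re.compile(
    r"https?://(?:www\.)?"
    r"(?:github\.com|gitlab\.com|bitbucket\.org|sourceforge\.net"
    r"|launchpad\.net|code\.google\.com|savannah\.(?:non)?gnu\.org)"
    r"/[^\s\"'<>)]+",
    re.IGNORECASE,
)


def find_repo_urls(text: str) -> list[str]:
    """Return candidate repo URLs in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in REPO_URL.finditer(text):
        # Drop punctuation that commonly trails a URL in running prose.
        url = match.group(0).rstrip(".,;")
        seen.setdefault(url, None)
    return list(seen)
```

Hits from a scan like this are only leads: a URL mentioned in a description may point at a designer's unrelated project, so each one still needs manual confirmation.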
Next steps
Beyond resolving these 16 families, the planned next phase of this work is:
- Reproducible builds: Ensuring the full library can be reliably built from sources. We've already tested 1,381 families (306 byte-identical, 921 compiler-version match). The build system improvements (multi-license-dir support, local archive extraction, legacy source classification) are ready for broader testing.
- Source preservation: We've built a permanent repo archive of 982+ bare git mirrors (55 GB). The upstream source preservation investigation documented that 20%+ of families had source reliability issues (force-pushed repos, deleted repos, missing commits). The archive ensures these sources survive upstream changes.
- Preventing drift: Establishing tooling and processes to ensure source provenance information stays accurate as new fonts are onboarded and existing fonts are updated. This includes periodic verification that all METADATA.pb commit hashes remain reachable in their upstream repos.
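As a rough illustration of the periodic verification idea, the sketch below pulls the `repository_url` and `commit` fields out of a METADATA.pb source block and asks a local git mirror whether that commit object exists. The field names are assumed from the text form of METADATA.pb; the regex parsing, the `mirror_for` callback, and the function names are hypothetical, and a real tool would more likely use a proper protobuf parser.

```python
import re
import subprocess
from typing import Callable, Optional

# Rough regexes for the text-format source block. Assumption: the fields of
# interest appear before any nested block inside `source { ... }`.
SOURCE_BLOCK = re.compile(r"\bsource\s*\{(.*?)\}", re.DOTALL)
FIELD = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_source_block(metadata_text: str) -> Optional[dict]:
    """Return the string fields of the first source block, or None if absent."""
    block = SOURCE_BLOCK.search(metadata_text)
    if block is None:
        return None
    return dict(FIELD.findall(block.group(1)))


def git_has_commit(mirror_dir: str, commit: str) -> bool:
    """True if `commit` exists as a commit object in the local mirror.

    Note: this checks object existence, which is weaker than reachability
    from a branch or tag (objects can linger after a force-push).
    """
    result = subprocess.run(
        ["git", "-C", mirror_dir, "cat-file", "-e", f"{commit}^{{commit}}"],
        capture_output=True,
    )
    return result.returncode == 0


def check_family(
    metadata_text: str,
    mirror_for: Callable[[str], str],  # hypothetical: maps repo URL to mirror path
    has_commit: Callable[[str, str], bool] = git_has_commit,
) -> str:
    """Classify one family as 'no-source', 'ok', or 'missing-commit'."""
    source = parse_source_block(metadata_text)
    if not source or "repository_url" not in source or "commit" not in source:
        return "no-source"
    mirror = mirror_for(source["repository_url"])
    return "ok" if has_commit(mirror, source["commit"]) else "missing-commit"
```

Passing `has_commit` as a parameter keeps the classification logic testable without a git checkout; a stricter check could swap in a function based on `git for-each-ref --contains` to require reachability from a ref.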
cc @NicholasJohnson @nicholasjohnson-monde @AsiaSoftInc (if any of these accounts are active) — any leads on the Korean font sources would be very helpful. The nicholasjohnson/* repos (23 fonts) were previously deleted but some may have been re-hosted elsewhere.
cc @davelab6 @rsheeter — would appreciate any leads on the remaining 16 families, especially the Korean and ParaType ones.