MilkMp/CIA-World-Factbooks-Archive-1990-2025
CIA World Factbooks Archive 1990-2025

A complete, structured archive of the CIA World Factbook spanning 36 years (1990-2025), covering 281 entities with 1,071,603 data fields in a normalized SQLite database.

The CIA World Factbook was discontinued on February 4, 2026. This archive preserves every edition published since 1990 and creates a structured, queryable dataset.

Search the interactive archive | Project documentation | Executive Summary

Data Integrity: No Factbook content is added or altered. The parsing process structures the CIA's raw text into queryable fields — removing formatting artifacts, sectioning headers, and deduplicating noise lines — but the actual data values are exactly as the CIA published them. The only additions to the source data are reference lookup tables (FIPS-to-ISO code mappings, entity classifications, COCOM regional assignments) that sit alongside the original data, not inside it. In FieldValues, a small number of rows are derived by computation from neighboring sub-values (e.g. total life expectancy averaged from male/female in pre-1995 data); these are clearly flagged with IsComputed = 1.
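For example, filtering on IsComputed separates verbatim CIA values from parser-derived ones. A minimal sketch against a mock FieldValues table (the sample rows are invented; column names follow this README):

```python
import sqlite3

# Miniature mock of the FieldValues table described in this README
# (SubField, NumericVal, SourceFragment, IsComputed). Rows are invented.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE FieldValues (
    FieldID INTEGER, SubField TEXT, NumericVal REAL,
    SourceFragment TEXT, IsComputed INTEGER)""")
db.executemany(
    "INSERT INTO FieldValues VALUES (?, ?, ?, ?, ?)",
    [
        (1, "life_expectancy_male",   70.1, "male: 70.1 years",   0),
        (1, "life_expectancy_female", 76.3, "female: 76.3 years", 0),
        # Derived by averaging the two rows above, so flagged IsComputed = 1
        (1, "life_expectancy_total",  73.2, None,                 1),
    ],
)

# Exclude parser-derived rows to see only what the CIA published verbatim
rows = db.execute(
    "SELECT SubField, NumericVal FROM FieldValues WHERE IsComputed = 0"
).fetchall()
print(rows)  # [('life_expectancy_male', 70.1), ('life_expectancy_female', 76.3)]
```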

Database Statistics

| Metric | Value |
|---|---|
| Years covered | 1990-2025 (36 editions) |
| Entities | 281 (192 sovereign states, 65 territories, 6 disputed, and more) |
| Country-year records | 9,536 |
| Category records | 83,682 |
| Data fields | 1,071,603 |
| Content size | ~662 MB (with FieldValues + FTS) |
| Field name variants | 1,132 mapped to 416 canonical names |
| Structured sub-values | 1,775,588 parsed from raw text (2,599 sub-fields) |

Data Sources

| Years | Source | Method |
|---|---|---|
| 1990-1995, 1997-1999 | Project Gutenberg | Plain-text parsing (4 format variants across the decade) |
| 1996 | CIA original + Gutenberg | CIA's own text file from the Wayback Machine (replaced truncated Gutenberg data for 7 countries) |
| 2000 | Wayback Machine | HTML zip download + classic-format parser |
| 2001 | Project Gutenberg | Text fallback (HTML zip was corrupted) |
| 2002-2020 | Wayback Machine | HTML zip archives from cia.gov, 4 parser generations |
| 2021-2025 | factbook/cache.factbook.json | Git history with year-end commit snapshots |

Year-by-Year Breakdown

| Year | Source | Countries | Fields |
|---|---|---|---|
| 1990 | Text | 249 | 15,750 |
| 1991 | Text | 247 | 14,903 |
| 1992 | Text | 264 | 17,372 |
| 1993 | Text | 266 | 18,509 |
| 1994 | Text | 266 | 28,633 |
| 1995 | Text | 266 | 19,599 |
| 1996 | Text | 266 | 21,118 |
| 1997 | Text | 266 | 23,405 |
| 1998 | Text | 266 | 23,524 |
| 1999 | Text | 266 | 25,178 |
| 2000 | HTML | 267 | 25,724 |
| 2001 | Text | 265 | 27,281 |
| 2002 | HTML | 268 | 27,430 |
| 2003 | HTML | 268 | 28,676 |
| 2004 | HTML | 271 | 28,958 |
| 2005 | HTML | 271 | 28,728 |
| 2006 | HTML | 274 | 28,962 |
| 2007 | HTML | 266 | 29,103 |
| 2008 | HTML | 263 | 30,755 |
| 2009 | HTML | 260 | 30,818 |
| 2010 | HTML | 262 | 30,805 |
| 2011 | HTML | 263 | 33,635 |
| 2012 | HTML | 271 | 35,192 |
| 2013 | HTML | 269 | 36,731 |
| 2014 | HTML | 268 | 36,680 |
| 2015 | HTML | 268 | 36,870 |
| 2016 | HTML | 268 | 36,804 |
| 2017 | HTML | 268 | 37,046 |
| 2018 | HTML | 268 | 37,285 |
| 2019 | HTML | 268 | 37,394 |
| 2020 | HTML | 268 | 36,687 |
| 2021 | JSON | 260 | 39,714 |
| 2022 | JSON | 260 | 37,344 |
| 2023 | JSON | 260 | 37,558 |
| 2024 | JSON | 260 | 34,838 |
| 2025 | JSON | 260 | 32,594 |

Repository Structure

data/
  master_countries.sql       # 281 canonical entities
  countries.sql              # 9,536 country-year records
  categories.sql             # 83,682 category records
  field_name_mappings.sql    # 1,132 field name standardization rules
  fields/
    country_fields_1990.sql.gz  # Split by year (36 gzipped files)
    ...
    country_fields_2025.sql.gz
schema/
  create_tables.sql          # DDL for all 5 tables
  create_field_values.sql    # DDL for FieldValues (structured sub-values)
etl/
  build_archive.py           # HTML parser (2000-2020)
  load_gutenberg_years.py    # Text parser (1990-2001)
  reload_json_years.py       # JSON loader (2021-2025)
  build_field_mappings.py    # Field name standardization
  classify_entities.py       # Entity type classification
  repair_1996_truncated.py   # CIA original text parser for 7 truncated 1996 countries
  validate_integrity.py      # Data quality checks
  export_to_sqlite.py        # SQL Server -> SQLite export (with FTS5)
  structured_parsing/
    parse_field_values.py    # Decompose text blobs into typed sub-values (55 parsers)
    validate_field_values.py # Validation: spot checks, coverage, numeric stats
    export_field_values_to_sqlite.py  # Export FieldValues + refs to SQLite
    dashboard_preview.py     # Local preview dashboard (8 chart panels)
    DESIGN.md                # Architecture and design document
  stardict/
    build_stardict.py        # Generate StarDict dictionaries for KOReader/GoldenDict
scripts/
  factbook_search.py         # Command-line search utility
  validate_cocom.py          # COCOM region validation
  validate_integrity.py      # Data integrity checks
queries/
  sample_queries.sql         # 18 analytical queries for Power BI / analysis
  search_cli.py              # Command-line search tool
docs/
  DATABASE_SCHEMA.md         # Table definitions and relationships
  ETL_PIPELINE.md            # How the archive was built
  FIELD_EVOLUTION.md         # How CIA field names changed over time
  METHODOLOGY.md             # Complete methodology: parsing, standardization, validation
  screenshots/               # 49 PNGs + 9 animated GIFs of the web application
  index.html                 # GitHub Pages static landing page

ETL Pipeline & Python Scripts

There are two ways to get the data:

  1. Import the pre-built SQL dumps (in data/) — no Python required. See How to Restore below.
  2. Re-run the ETL pipeline from scratch — the Python scripts in etl/ download raw Factbook data directly from their original sources (Wayback Machine, Project Gutenberg, GitHub), parse every format variant, and load the structured results into SQL Server via pyodbc. This is how the archive was originally built.

The raw CIA World Factbook changed format at least 10 times between 1990 and 2025. Every script in etl/ exists because a previous version of the parser broke on a new year's data. The pipeline handles 4 plain-text format variants, 5 HTML layout generations, and a final JSON era — each requiring its own parser.

Scripts

| Script | Lines | Years | What it does |
|---|---|---|---|
| build_archive.py | 986 | 2000-2020 | Downloads HTML zips from the Wayback Machine, detects which of 5 HTML layouts each year uses, and parses fields. Handles CDX API fallback when downloads fail, filters template/print pages, and deduplicates entries. |
| load_gutenberg_years.py | 1,043 | 1990-2001 | Parses plain-text Gutenberg editions with 4 distinct format variants (tagged, asterisk, at-sign, colon). Includes fuzzy country name matching to handle aliases like "Burma" and "Ivory Coast." |
| reload_json_years.py | 413 | 2021-2025 | Checks out year-end git commits from the factbook/cache.factbook.json repo and loads structured JSON. Strips embedded HTML from content fields. |
| build_field_mappings.py | 783 | All | Maps 1,132 raw field name variants to 416 canonical names using a 7-rule system: identity, dash normalization, CIA renames, consolidation, country-specific entries, noise detection, and manual review. Uses case-sensitive collation (COLLATE Latin1_General_CS_AS) so case variants get separate mapping rows, and writes pipe-delimited sub-field boundaries into Content across all eras. |
| classify_entities.py | 283 | All | Auto-classifies 281 entities into 9 types (sovereign, territory, disputed, etc.) based on Dependency Status and Government Type fields, with hardcoded overrides for edge cases. |
| repair_1996_truncated.py | 189 | 1996 | Parses CIA's original wfb-96.txt.gz (page-header format with centered country/section names) and replaces truncated Gutenberg entries for 7 countries: Venezuela, Armenia, Greece, Luxembourg, Malta, Monaco, Tuvalu. |
| validate_integrity.py | 296 | All | Read-only validation suite with 10 checks: field count benchmarks, US population/GDP ground truth, year-over-year consistency, source provenance, NULL detection, and unmapped field name gap analysis. |
| structured_parsing/parse_field_values.py | 1,400+ | All | Decomposes 1,071,603 raw text blobs into 1,775,588 typed sub-values using 55 field-specific parsers plus a generic fallback. Each row includes a SourceFragment column showing the exact text slice that produced the value, and an IsComputed flag distinguishing source-extracted values (0) from parser-derived values (1). Extracts land/water area, male/female life expectancy, age brackets, sex ratios, literacy, budget, elevation, dependency ratios, GDP composition, CO2 emissions, water/sanitation, and more. |
| structured_parsing/validate_field_values.py | 231 | All | Validates FieldValues against the source database: row counts, coverage, numeric extraction rate, spot checks against known ground-truth values (US population, Russia area, Japan military). |
| structured_parsing/export_field_values_to_sqlite.py | 269 | All | Exports FieldValues plus all reference tables to a self-contained SQLite database (factbook.db, ~662 MB with FTS5 + ISOCountryCodes when using --webapp). Includes a post-export integrity check that detects and backfills any unmapped field names. |

Why parsing was so difficult

The CIA never maintained a stable schema. Every few years the HTML layout changed completely, field names were renamed without notice, and entire categories were restructured. Examples:

  • 1990-1999 (plain text): Four different formatting conventions across the decade. 1990-1993 used `Country: Name` headers with indented fields. 1994 introduced tagged markers (`_@_`, `_*_`, `_#_`). 1996 switched to bare section headers with no markers at all. 1999 changed the delimiter scheme again. Each variant required its own regex-based parser.

  • 2000-2020 (HTML): The CIA redesigned the Factbook website at least 5 times. The 2000 edition used `<b>FieldName:</b>` inline formatting. By 2004 it had switched to `<td class="FieldLabel">` table layouts. 2008 introduced CollapsiblePanel JavaScript widgets. 2014 changed to expand/collapse `<h2>` sections. 2017 moved to field-anchor `<div>` structures. A parser that worked on 2006 data would produce garbage on 2010 data.

  • 2001: The Wayback Machine's HTML zip for 2001 was corrupted, so parsing fell back to the Project Gutenberg plain-text edition — the only year where the HTML and text pipelines had to swap.

  • Field name drift: The CIA renamed fields silently over the decades. "GDP - real growth rate" became "Real GDP growth rate." "Telephones" split into "Telephones - fixed lines" and "Telephones - mobile cellular." Oil sub-fields were consolidated into "Petroleum." The build_field_mappings.py script maps all 1,132 variants through 7 rule layers to maintain time-series continuity.

  • Country-specific noise: 1990s editions embedded government body names (parliaments, assemblies, courts) as top-level fields. Dissolved countries (Serbia and Montenegro, Netherlands Antilles) appear and disappear. Turkish Cyprus, Malaysian states, and Netherlands Antilles islands show up as sub-entries that need special handling.

See docs/METHODOLOGY.md and docs/ETL_PIPELINE.md for the full technical details.
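In practice, time-series continuity is recovered by joining through FieldNameMappings, so the old and new spellings of a field both resolve to one canonical name. A self-contained sketch with mock rows (table and column names follow the schema described in this README):

```python
import sqlite3

# Mock of how FieldNameMappings bridges a CIA rename: two raw spellings,
# one canonical name, one unbroken time series. Data rows are invented.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE FieldNameMappings (OriginalName TEXT, CanonicalName TEXT);
CREATE TABLE CountryFields (Year INTEGER, FieldName TEXT, Content TEXT);
INSERT INTO FieldNameMappings VALUES
  ('GDP - real growth rate', 'Real GDP growth rate'),
  ('Real GDP growth rate',   'Real GDP growth rate');
INSERT INTO CountryFields VALUES
  (2000, 'GDP - real growth rate', '5%'),
  (2020, 'Real GDP growth rate',   '2.2%');
""")

# One canonical name spans both naming eras
rows = db.execute("""
    SELECT cf.Year, cf.Content
    FROM CountryFields cf
    JOIN FieldNameMappings fm ON cf.FieldName = fm.OriginalName
    WHERE fm.CanonicalName = 'Real GDP growth rate'
    ORDER BY cf.Year
""").fetchall()
print(rows)  # [(2000, '5%'), (2020, '2.2%')]
```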

How to Restore

Prerequisites

  • Microsoft SQL Server (any edition)
  • The sqlcmd command-line tool (used by every step below)

Steps

  1. Create the database:

    CREATE DATABASE CIA_WorldFactbook;
  2. Run the schema script:

    sqlcmd -S localhost -d CIA_WorldFactbook -i schema/create_tables.sql
  3. Import data (in order):

    sqlcmd -S localhost -d CIA_WorldFactbook -i data/master_countries.sql
    sqlcmd -S localhost -d CIA_WorldFactbook -i data/countries.sql
    sqlcmd -S localhost -d CIA_WorldFactbook -i data/categories.sql
    sqlcmd -S localhost -d CIA_WorldFactbook -i data/field_name_mappings.sql
  4. Import field data (decompress and import each year):

    cd data/fields
    # On Linux/macOS:
    for f in *.sql.gz; do gunzip -k "$f" && sqlcmd -S localhost -d CIA_WorldFactbook -i "${f%.gz}"; done
    # On Windows (PowerShell):
    Get-ChildItem *.sql.gz | ForEach-Object {
        $sql = $_.FullName -replace '\.gz$',''
        $in  = [System.IO.File]::OpenRead($_.FullName)
        $gz  = [System.IO.Compression.GZipStream]::new($in, [System.IO.Compression.CompressionMode]::Decompress)
        $out = [System.IO.File]::Create($sql)
        $gz.CopyTo($out)
        # Close the streams so the .sql file is flushed before sqlcmd reads it
        $gz.Dispose(); $out.Dispose(); $in.Dispose()
        sqlcmd -S localhost -d CIA_WorldFactbook -i $sql
    }
  5. Verify:

    SELECT COUNT(*) FROM CountryFields;  -- Should return 1,071,603
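If neither shell loop is convenient, the decompression in step 4 can be sketched portably with Python's standard library (data/fields is the path from the repository layout above; running each .sql through sqlcmd is left as the final comment):

```python
import gzip
import shutil
import sys
from pathlib import Path

def decompress_year_files(fields_dir: str) -> list[Path]:
    """Decompress every *.sql.gz in fields_dir, keeping the originals.
    Returns the paths of the .sql files written."""
    written = []
    for gz_path in sorted(Path(fields_dir).glob("*.sql.gz")):
        sql_path = gz_path.with_suffix("")  # strip the trailing .gz
        with gzip.open(gz_path, "rb") as src, open(sql_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(sql_path)
    return written

if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "data/fields"
    for path in decompress_year_files(target):
        # Then import each file, e.g.:
        # sqlcmd -S localhost -d CIA_WorldFactbook -i <path>
        print(f"wrote {path}")
```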

Structured Field Values Database (NEW)

The raw text in CountryFields.Content has been decomposed into 1,775,588 typed sub-values across 2,599 distinct sub-fields using 55 dedicated parsers. Each row includes a SourceFragment showing the exact text slice that produced the value, and an IsComputed flag distinguishing values extracted directly from source text (0) from values derived by computation (1, e.g. averaging male/female life expectancy for pre-1995 data). Sub-field boundaries in Content use pipe (|) delimiters for unambiguous parsing. This enables SQL queries that were previously impossible without per-query regex — for example, ranking countries by land-vs-water ratio, comparing male vs female life expectancy, or charting budget deficit trends.

Download: factbook.db (~662 MB) from Release v3.5 — single self-contained database with all tables, FTS5 search index, and ISO country codes.

Live dashboard: worldfactbookarchive.org/analysis/structured-data — interactive charts showing new queries with SQL and source data tabs.

The database is self-contained with all reference tables included. Example query:

```sql
-- Top 10 countries by land area (2025) with land/water split
SELECT c.Name,
  MAX(CASE WHEN fv.SubField = 'land' THEN fv.NumericVal END) AS land_sqkm,
  MAX(CASE WHEN fv.SubField = 'water' THEN fv.NumericVal END) AS water_sqkm
FROM FieldValues fv
JOIN CountryFields cf ON fv.FieldID = cf.FieldID
JOIN Countries c ON cf.CountryID = c.CountryID
JOIN FieldNameMappings fnm ON cf.FieldName = fnm.OriginalName
WHERE c.Year = 2025 AND fnm.CanonicalName = 'Area'
GROUP BY c.Name
ORDER BY land_sqkm DESC LIMIT 10;
```

See etl/structured_parsing/DESIGN.md for the full architecture, parser registry, and sub-field catalog.

StarDict Dictionaries (for KOReader / GoldenDict)

Offline dictionaries in StarDict format for use with KOReader, GoldenDict-ng, and other StarDict-compatible apps. Each (country, field) pair is its own dictionary entry (e.g., "Afghanistan - Population"), matching the per-field format used by KOReader's existing factbook dictionary. Entries are searchable by country name, ISO Alpha-2 code, or FIPS 10-4 code combined with the field name.

36 dictionaries (one per year, 1990-2025), ~2.1 million entries total, ~130 MB compressed.

Download: StarDict tarballs from Release v3.5-stardict — grab individual years or all 36.

Generate locally:

```shell
pip install pyglossary python-idzip
python etl/stardict/build_stardict.py                    # all 36 years
python etl/stardict/build_stardict.py --years 2025       # just 2025
```

Output goes to data/stardict/. Each dictionary is a directory with .ifo, .idx, .dict.dz, and .syn files. Copy the directory into your dictionary app's data folder (e.g., koreader/data/dict/ for KOReader).
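To sanity-check a generated dictionary, note that the .ifo file is plain text: a magic first line followed by key=value pairs. A small parsing sketch (the sample values below are invented, not from an actual build):

```python
def parse_ifo(text: str) -> dict[str, str]:
    """Parse a StarDict .ifo file (a magic first line followed by
    key=value pairs) into a dict of metadata."""
    meta = {}
    for line in text.splitlines()[1:]:  # skip the magic header line
        if "=" in line:
            key, _, value = line.partition("=")
            meta[key.strip()] = value.strip()
    return meta

# Invented sample for illustration
sample = """StarDict's dict ifo file
version=3.0.0
bookname=CIA World Factbook 2025
wordcount=58000
"""
info = parse_ifo(sample)
print(info["bookname"], info["wordcount"])  # CIA World Factbook 2025 58000
```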

See etl/stardict/README.md for format details, synonym resolution, validation (16/16 tests), and installation guides.

Alternative: SQLite (No SQL Server Required)

A pre-built SQLite database (factbook.db, ~662 MB) is available as a release download for users who don't need SQL Server. The file exceeds GitHub's 100 MB file-size limit, so it is not included in the repository itself.

Download: factbook.db from the latest release

Place the downloaded file at data/factbook.db in the project root. SQLite requires no installation — Python's built-in sqlite3 module can query it directly.

| | SQL Server | SQLite |
|---|---|---|
| Setup | Install SQL Server + ODBC driver, run schema + import scripts | Download one .db file |
| Size | ~263 MB across 36 gzipped SQL files | ~662 MB single file |
| Query tool | SSMS, sqlcmd, pyodbc | Python sqlite3, DB Browser, any SQLite client |
| Best for | Power BI, enterprise analytics, large-scale joins | Quick exploration, scripting, lightweight apps |
| Schema | Identical 5-table structure | Identical 5-table structure |

The SQLite database contains the same tables (including FieldValues with IsComputed), same indexes, and same 1,071,603 fields as the SQL Server version, plus an FTS5 full-text search index for fast keyword and boolean search. This is what the live webapp runs on.
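The FTS5 index supports boolean keyword queries directly in SQL. A minimal, self-contained sketch of the same mechanism (mock documents; the real factbook.db ships with its index prebuilt):

```python
import sqlite3

# Tiny FTS5 demo mirroring the archive's keyword search. Rows are invented;
# factbook.db's actual FTS table covers all 1,071,603 fields.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE field_fts USING fts5(Country, FieldName, Content)")
db.executemany(
    "INSERT INTO field_fts VALUES (?, ?, ?)",
    [
        ("Norway",  "Natural resources", "petroleum, copper, natural gas, fish"),
        ("Iceland", "Natural resources", "fish, hydropower, geothermal power"),
    ],
)

# Boolean operators go straight into the MATCH expression
rows = db.execute(
    "SELECT Country FROM field_fts WHERE field_fts MATCH 'fish AND petroleum'"
).fetchall()
print(rows)  # [('Norway',)]
```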

Note on numeric values: The CountryFields table stores raw text content exactly as published by the CIA. There is no pre-computed Value column. The CSV and Excel exports on the webapp parse numeric values at download time using regex extraction. To extract numeric values yourself, join through FieldNameMappings and apply a pattern for the field you need:

```python
import sqlite3, re

db = sqlite3.connect("factbook.db")
db.row_factory = sqlite3.Row

# Example: extract Population as an integer
rows = db.execute("""
    SELECT c.Year, mc.CanonicalName, cf.Content
    FROM CountryFields cf
    JOIN Countries c ON cf.CountryID = c.CountryID
    JOIN MasterCountries mc ON c.MasterCountryID = mc.MasterCountryID
    JOIN FieldNameMappings fm ON cf.FieldName = fm.OriginalName
    WHERE fm.CanonicalName = 'Population'
    ORDER BY mc.CanonicalName, c.Year
""").fetchall()

for r in rows:
    m = re.search(r'[\d,]{5,}', r["Content"])
    value = int(m.group().replace(",", "")) if m else None
    if value:
        print(f"{r['CanonicalName']} ({r['Year']}): {value:,}")
```

Common extraction patterns for other fields:

| Field | Pattern | Example |
|---|---|---|
| Population | `re.search(r'[\d,]{5,}', text)` → remove commas, cast to int | 331,449,281 |
| Real GDP per capita | `re.search(r'\$([\d,]+)', text)` → remove commas, cast to int | $63,500 |
| Life expectancy at birth | `re.search(r'total population:\s*([\d.]+)', text)` → float | 78.5 years |
| Population growth rate | `re.search(r'(-?[\d.]+)%', text)` → float | 0.7% |
| Military expenditures | `re.search(r'([\d.]+)%\s*of\s*G[DN]P', text)` → float | 3.45% of GDP |
| Area | `re.search(r'total\s*:?\s*([\d,]+)\s*sq\s*km', text)` → remove commas, cast to int | 9,833,517 sq km |
| Birth/Death rate | `re.search(r'([\d.]+)\s*(?:births\|deaths)\s*/\s*1,000', text)` → float | 11.0 births/1,000 |

Live Web Application

The archive is served as a FastAPI + Jinja2 web application at worldfactbookarchive.org. Key features:

  • Full-text search with Library of Congress boolean syntax (AND, OR, NOT, "phrase", truncation)
  • Library — card-based overview of the entire archive collection
  • Browse archive by year (1990-2025) and country (281 entities)
  • Country profiles with category drill-down, field time series, and data export (CSV, Excel, PDF)
  • Text diff — side-by-side year comparison with highlighted changes for any country
  • Country dictionary with ISO codes, entity types, and COCOM region assignments
  • Interactive maps gallery — browsable country maps with search and filtering
  • Data export — CSV, Excel, print-ready PDF reports, and a complete analysis-ready dataset download
  • Intelligence analysis dashboards powered by Mapbox GL JS and Apache ECharts:
    • Regional Dashboard — global choropleth with 6 COCOM regions, hover popups, capital city markers, and click-to-zoom
    • COCOM Region Detail — per-region map with ranked bar charts for GDP, population, military spending
    • Timeline Map — animated choropleth across 36 years with multi-country time series and playback speed controls
    • Map Compare — two synced maps for side-by-side year comparison with shared color scale
    • Communications Analysis — internet, mobile, broadband penetration with digital divide indicators
    • Global Trends — multi-indicator time series for any country
    • Global Rankings — sortable country rankings by any indicator with year selection
    • Change Detection — year-over-year field changes with trend charts and region filtering
    • Field Explorer — browse all 414 canonical data fields with coverage statistics across 36 years
    • Advanced Analytics Explorer — correlation scatter, heatmap matrix, ranking race, and treemap visualizations
    • Scatter Plot Analysis — multi-indicator scatter plots with regression lines and outlier detection
    • Query Builder — custom analytical queries across all indicators with flexible filters
    • Dashboard Builder — custom multi-panel dashboards with drag-and-drop indicator selection
    • Trade Networks — geographic and force-directed graph of global import/export relationships
    • Organization Networks — international organization memberships and alliance visualization
    • Geopolitical Atlas — territorial disputes, infrastructure mapping, and missile site tracking
    • Political Stability — government type choropleth, regime change tracking, and regional peer comparison
    • Natural Resources & Economy — resource production maps, commodity scatter plots, and country profiles
    • Demographics — population pyramids with country comparison and year-over-year animation
    • Dissolved States — historical entities no longer in the Factbook with archived indicator data
    • Data Coverage — field availability matrix across all 36 years and 281 entities
    • Structured Field Data — interactive dashboard of 1,775,588 parsed sub-values with Chart/SQL/Source tabs showing exactly where each number was extracted from
  • CIA Studies in Intelligence — searchable archive of declassified CIA analytical journal articles with dashboard and browse views
  • World Leaders Database — comprehensive leadership data with governance analysis, power concentration metrics, security apparatus tracking, and interactive choropleth maps
  • Intelligence dossiers following ICD 203 analytic standards
  • Regional threat briefs with instability and security indicators
  • Factbook Quiz — 4 modes: country identification, capital cities, higher-or-lower, and flag recognition
  • Bug reporting via GitHub Issues integration

Stack: Python 3.12, FastAPI, Jinja2, SQLite (FTS5), Mapbox GL JS v3, Apache ECharts 5, deployed on Fly.io.

The web application source code is maintained in a separate private repository.

Entity Types

| Type | Count | Description |
|---|---|---|
| sovereign | 192 | Independent states |
| territory | 65 | Dependencies, overseas territories |
| misc | 7 | Oceans, World, EU |
| disputed | 6 | Kosovo, Gaza Strip, West Bank, etc. |
| crown_dependency | 3 | Jersey, Guernsey, Isle of Man |
| freely_associated | 3 | Marshall Islands, Micronesia, Palau |
| special_admin | 2 | Hong Kong, Macau |
| dissolved | 2 | Netherlands Antilles, Serbia and Montenegro |
| antarctic | 1 | Antarctica |

Field Name Standardization

The CIA renamed many fields over the 36-year span. The FieldNameMappings table maps 1,132 raw field name variants to 416 canonical names (case-sensitive, so Natural hazards and natural hazards get separate rows):

| Mapping Type | Count | Description |
|---|---|---|
| identity | 184 | Modern field names (unchanged) |
| rename | 159 | CIA renamed the field (e.g. "GDP - real growth rate" -> "Real GDP growth rate") |
| dash_format | 64 | Formatting differences (single vs double dashes) |
| consolidation | 48 | Sub-fields merged into parents (e.g. Oil production/consumption -> Petroleum) |
| country_specific | 354 | Regional sub-entries, government body names |
| noise | 281 | Parser artifacts, fragments (flagged IsNoise=1) |
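The case-sensitivity point can be demonstrated with SQLite's default BINARY collation, which behaves like the case-sensitive SQL Server collation this table uses (mock rows):

```python
import sqlite3

# With BINARY collation, 'Natural hazards' and 'natural hazards' are
# distinct OriginalName keys, each with its own mapping row. Mock data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE FieldNameMappings (OriginalName TEXT, CanonicalName TEXT)")
db.executemany("INSERT INTO FieldNameMappings VALUES (?, ?)", [
    ("Natural hazards", "Natural hazards"),
    ("natural hazards", "Natural hazards"),
])
n = db.execute(
    "SELECT COUNT(*) FROM FieldNameMappings WHERE OriginalName = 'Natural hazards'"
).fetchone()[0]
print(n)  # 1 -- only the exact-case row matches
```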

Sample Queries

See queries/sample_queries.sql for 18 ready-to-use analytical queries, including:

  • Population time series (single country)
  • GDP comparison across G7 nations
  • Military expenditure trends
  • Internet users growth
  • Countries that appear/disappear over time
  • Consolidated petroleum sub-field view

Screenshots

  • Homepage — Database statistics, navigation, and live search
  • Full-Text Search — Keyword search across 1,071,603 fields
  • Country Profile — Complete factbook data by category
  • Library — Card-based overview of the entire archive collection
  • Regional Dashboard — Mapbox GL JS choropleth with COCOM drill-down
  • Geopolitical Atlas — Territorial disputes, infrastructure, and missile sites
  • Timeline Map — Animated Mapbox choropleth 1990-2025 with time series
  • Communications Analysis — Internet, mobile, broadband indicators
  • Trade Networks — Geographic and force-directed graph of global trade
  • Global Rankings — Sortable country rankings by any indicator
  • Political Stability — Government type choropleth and regime change tracking
  • Demographics — Population pyramids with country comparison and animation
  • CIA Studies in Intelligence — Searchable archive of declassified CIA journal articles
  • World Leaders Map — Interactive choropleth of governance metrics

49 screenshots and 9 animated GIF demos are available in docs/screenshots/. See the full gallery on the project page.

License

The CIA World Factbook is a work of the United States Government and is in the public domain (17 U.S.C. § 105). This archive is released under CC0 1.0 Universal (Public Domain Dedication).

About

The CIA World Factbook was first published in 1962 as a classified document and made publicly available starting in 1971. The online edition launched in the 1990s and was updated weekly until its discontinuation on February 4, 2026.

This archive was built as a preservation project to ensure this unique longitudinal dataset remains accessible for researchers, analysts, and the public.
