goldilocks-core is a research-grade Python package for organizing and recommending DFT calculation inputs from structures, machine-learning models, and parsed pseudopotentials.
The project is designed around domain-focused modules such as k-mesh construction, pseudopotential parsing, recommendation advisors, and thin CLI entry points.
goldilocks-core currently focuses on two main workflows:
- recommending k-mesh settings from structure-aware logic and ML-predicted `k_index`
- parsing UPF pseudopotential files and building local pseudopotential registries
The package is intended to grow toward code- and task-aware input recommendation, where structure, pseudopotential choice, and calculation settings can be coordinated in a clean and testable way.
- generate candidate k-distance values from reciprocal lattice geometry
- convert k-distance values into Monkhorst-Pack-style meshes
- build indexed `KMeshEntry` objects
- compute mesh-related metadata such as k-point density intervals and reduced-k-point counts
- map ML-predicted `k_index` values onto concrete k-mesh recommendations
- expose a minimal CLI entry point for k-mesh recommendation
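
The k-mesh steps listed above can be illustrated with a small standard-library sketch. It is not the package's implementation. The `MeshEntry` class and the helper names are hypothetical stand-ins. The 2π factor in the reciprocal lattice and the ceiling rule used to turn a k-distance into a mesh are assumptions, and the conventions in `goldilocks_core.kmesh` may differ.

```python
import math
from dataclasses import dataclass

Vec = tuple[float, float, float]


def cross(a: Vec, b: Vec) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def reciprocal_lengths(lattice: tuple[Vec, Vec, Vec]) -> Vec:
    """Lengths of the reciprocal lattice vectors (2*pi convention assumed)."""
    a1, a2, a3 = lattice
    volume = abs(dot(a1, cross(a2, a3)))
    scale = 2.0 * math.pi / volume
    recip = (cross(a2, a3), cross(a3, a1), cross(a1, a2))
    return tuple(scale * math.sqrt(dot(v, v)) for v in recip)


def mesh_from_k_distance(lattice, k_distance: float) -> tuple[int, int, int]:
    """Smallest mesh whose spacing along each axis does not exceed k_distance."""
    if k_distance <= 0:
        raise ValueError("k_distance must be positive")
    return tuple(
        max(1, math.ceil(length / k_distance))
        for length in reciprocal_lengths(lattice)
    )


@dataclass(frozen=True)
class MeshEntry:  # hypothetical stand-in, not the package's KMeshEntry
    index: int
    mesh: tuple[int, int, int]
    k_distance: float


def build_entries(lattice, k_distances) -> list[MeshEntry]:
    """Index the distinct meshes from coarsest to finest."""
    entries: list[MeshEntry] = []
    seen = set()
    for kd in sorted(k_distances, reverse=True):  # large spacing = coarse mesh
        mesh = mesh_from_k_distance(lattice, kd)
        if mesh not in seen:
            seen.add(mesh)
            entries.append(MeshEntry(len(entries), mesh, kd))
    return entries


def pick_entry(entries: list[MeshEntry], k_index: int) -> MeshEntry:
    """Clamp a predicted index into the valid range and return that entry."""
    if not entries:
        raise ValueError("no mesh entries to choose from")
    return entries[min(max(k_index, 0), len(entries) - 1)]
```

Deduplicating meshes before indexing means that two k-distances mapping to the same mesh share one index, so a predicted `k_index` always refers to a distinct mesh. Clamping out-of-range predictions is one possible policy; raising an error instead would be equally defensible.
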
- parse real UPF files into structured metadata
- support both attribute-style and text-style `PP_HEADER`
- supplement header parsing with `PP_INFO` when needed
- normalize key fields such as:
  - `element`
  - `pseudo_type`
  - `functional`
  - `relativistic`
  - `z_valence`
- scan a local pseudo library into a list of `PseudoMetadata`
- filter registry entries by element
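
The difference between the two `PP_HEADER` styles can be shown with a minimal sketch. It assumes that attribute-style headers carry fields as XML attributes on the tag, and that text-style headers put one value per line followed by a descriptive label. The label map, the function names, and the returned dictionary shape are hypothetical; this is not the logic in `goldilocks_core.pseudo.parse_upf`.

```python
import re

ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')
HEADER_RE = re.compile(
    r"<PP_HEADER\b(?P<attrs>[^>]*?)(?:/>|>(?P<body>.*?)</PP_HEADER>)",
    re.DOTALL,
)

# Hypothetical map from trailing text labels to field names.
TEXT_LABELS = {
    "Element": "element",
    "Z valence": "z_valence",
}


def parse_header(text: str) -> dict[str, str]:
    """Return raw header fields from either header style."""
    match = HEADER_RE.search(text)
    if match is None:
        raise ValueError("no PP_HEADER block found")

    attrs = dict(ATTR_RE.findall(match.group("attrs")))
    if attrs:
        # Attribute-style: fields live on the tag itself.
        return {key: value.strip() for key, value in attrs.items()}

    # Text-style: each line is "<value>   <label>".
    fields: dict[str, str] = {}
    for line in (match.group("body") or "").splitlines():
        stripped = line.rstrip()
        for label, field in TEXT_LABELS.items():
            if stripped.endswith(label):
                fields[field] = stripped[: -len(label)].strip()
    return fields


def normalize(fields: dict[str, str]) -> dict[str, object]:
    """Normalize two of the key fields to consistent types and casing."""
    return {
        "element": fields["element"].strip().capitalize(),
        "z_valence": float(fields["z_valence"]),
    }
```

With this shape, both styles feed the same `normalize` step, so downstream code does not need to know which header style a file used.
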
This project uses uv for environment and dependency management.
Clone the repository and sync the environment:

    uv sync

If you want development tools as well:

    uv sync --group dev

Recommend k-points for a structure using a local model:

    from pathlib import Path

    from goldilocks_core.advisors import advise_kpoints
    from goldilocks_core.io.structures import load_structure
    from goldilocks_core.shared.types import ModelSpec

    structure = load_structure("path/to/structure.cif")
    spec = ModelSpec(
        name="local-kmesh-model",
        version="v0",
        model_type="random_forest",
        target="k_index",
        feature_set="cslr",
        source="local",
        location="path/to/model.joblib",
        revision=None,
    )
    advice = advise_kpoints(structure, spec)
    print(advice.grid)

Parse metadata from a single UPF file:

    from goldilocks_core.pseudo.parse_upf import parse_upf_metadata

    metadata = parse_upf_metadata("path/to/pseudo.UPF")
    print(metadata)

Scan a local pseudopotential directory and filter the result by element:

    from goldilocks_core.pseudo.registry import load_pseudo_metadata, filter_by_element

    metadata_list = load_pseudo_metadata("path/to/pseudopotentials")
    si_pseudos = filter_by_element(metadata_list, "Si")
    print(len(metadata_list))
    print(len(si_pseudos))

The current Python-facing entry points are:
- `goldilocks_core.advisors.advise_kpoints`
- `goldilocks_core.kmesh`
- `goldilocks_core.io.structures.load_structure`
- `goldilocks_core.pseudo.parse_upf.parse_upf_metadata`
- `goldilocks_core.pseudo.registry.load_pseudo_metadata`
- `goldilocks_core.pseudo.registry.filter_by_element`
- `goldilocks_core.shared.types`
This package is intended to be notebook-friendly, but the package modules and tests, not notebook-only logic, should remain the source of truth.
A minimal k-mesh CLI entry point is available.
Show help:

    uv run goldilocks-kmesh --help

Current usage pattern:

    uv run goldilocks-kmesh path/to/structure.cif --model path/to/model.joblib

At this stage, the CLI is intentionally small and thin. The main logic lives in the Python package APIs.
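
To illustrate what "thin" can mean here, the sketch below parses the two arguments shown in the usage pattern and delegates everything else to a callable. It is an assumption about the general shape, not the package's actual CLI module: the `run` function, the injected `recommend` callable, and treating `--model` as required are all hypothetical.

```python
import argparse
from pathlib import Path
from typing import Callable, Optional, Sequence

Grid = tuple[int, int, int]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="goldilocks-kmesh",
        description="Recommend a k-mesh for a structure file.",
    )
    parser.add_argument("structure", type=Path, help="path to a structure file")
    # Marking --model as required is an assumption made for this sketch.
    parser.add_argument("--model", type=Path, required=True, help="path to a model file")
    return parser


def run(
    argv: Optional[Sequence[str]],
    recommend: Callable[[Path, Path], Grid],
) -> int:
    """Parse arguments, delegate to the package API, print the grid."""
    args = build_parser().parse_args(argv)
    grid = recommend(args.structure, args.model)
    print(" ".join(str(n) for n in grid))
    return 0
```

Passing the recommender in as a parameter keeps the argument handling testable without loading a structure or a model, which is one way to keep the real logic in the Python APIs.
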
src/goldilocks_core/
├── advisors/
├── cli/
├── io/
├── kmesh.py
├── ml/
├── pseudo/
└── shared/
- `advisors/`: Coordinates recommendation workflows and policy decisions.
- `cli/`: Exposes thin command-line entry points.
- `io/`: Handles structure loading and normalization.
- `kmesh.py`: Contains k-mesh construction and interval logic.
- `ml/`: Contains feature extraction, model loading, and inference utilities.
- `pseudo/`: Contains UPF parsing and local pseudopotential registry logic.
- `shared/`: Contains reusable shared data models and type definitions.
For a fuller explanation, see docs/architecture.md.
Run the test suite:

    uv run pytest

Run formatting and checks:

    uv run pre-commit run --all-files

A typical development loop is:

    uv run pytest
    uv run pre-commit run --all-files

This project uses two complementary validation styles:
- portable tests built from synthetic fixtures under `tmp_path`
- local exploratory validation against real pseudopotential libraries and notebook experiments
When a local exploration reveals an important behavior, it should be turned into a focused regression test whenever possible.
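
As a sketch of that practice, the example below turns a made-up observation (an element symbol written in lowercase with extra padding) into a portable pytest test built on `tmp_path`. The `read_element` function is a hypothetical stand-in so the example is self-contained; a real test would import the package parser instead.

```python
from pathlib import Path


def read_element(upf_path: Path) -> str:
    """Hypothetical stand-in for the parser under test."""
    for line in upf_path.read_text().splitlines():
        if line.rstrip().endswith("Element"):
            return line.split()[0].capitalize()
    raise ValueError(f"no element line in {upf_path}")


def test_element_with_padded_lowercase_symbol(tmp_path: Path) -> None:
    # Synthetic fixture: just enough of a header to reproduce the behavior,
    # with no dependency on a real pseudopotential library.
    upf = tmp_path / "si.UPF"
    upf.write_text("<PP_HEADER>\n      si          Element\n</PP_HEADER>\n")

    assert read_element(upf) == "Si"
```

Because the fixture is written into `tmp_path`, the test runs anywhere `uv run pytest` does and keeps the regression pinned even if the local library that exposed it is unavailable.
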
This project is under active design and development.
The current codebase already has:
- a working ML-driven k-mesh recommendation path
- real UPF parsing across multiple pseudo-library styles
- a local pseudo registry foundation
- an evolving domain-oriented package structure
The next major steps are expected to include:
- richer pseudo registry filtering
- pseudopotential selection logic
- electron metadata derived from selected pseudos
- clearer user-facing workflows for local pseudo management