Set repository basepath to '~/.cache/skyllh' by default#307
Set repository basepath to '~/.cache/skyllh' by default#307tomaskontrimas wants to merge 3 commits intomasterfrom
'~/.cache/skyllh' by default#307Conversation
There was a problem hiding this comment.
Pull request overview
This PR sets a non-None default repository base path intended to make public-data analyses work out-of-the-box, and updates related docstrings to describe the new default.
Changes:
- Set
cfg['repository']['base_path']default to~/.cache/skyllh(expanded) in the global base config. - Update dataset and dataset-collection docstrings to mention the new default base path behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
skyllh/core/config.py |
Changes the default repository base path from None to a user-cache location. |
skyllh/core/dataset.py |
Updates Dataset constructor docs to mention the default base path. |
skyllh/datasets/i3/TestData.py |
Updates dataset-collection docs to mention the default base path. |
skyllh/datasets/i3/PublicData_10y_ps.py |
Updates dataset-collection docs to mention the default base path. |
skyllh/datasets/i3/PublicData_10y_ps_wMC.py |
Updates dataset-collection docs to mention the default base path. |
skyllh/datasets/i3/PublicData_14y_ps.py |
Updates dataset-collection docs to mention the default base path. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| 'repository': { | ||
| # A base path of repository datasets. | ||
| 'base_path': None, | ||
| 'base_path': Path('~/.cache/skyllh').expanduser(), |
There was a problem hiding this comment.
_BASECONFIG['repository']['base_path'] is now a Path object. Most of the codebase (and the surrounding docstrings/type hints) treat config values as plain serializable primitives (here: str | None). Keeping a Path in the config can surprise callers (e.g., string concatenation, YAML/JSON serialization) and diverges from the documented str type. Consider storing this as a string (e.g., expand ~ to a str) and only converting to Path at the call site when needed.
| 'base_path': Path('~/.cache/skyllh').expanduser(), | |
| 'base_path': str(Path('~/.cache/skyllh').expanduser()), |
| If set to ``None``, ``cfg['repository']['base_path']`` is used, | ||
| which defaults to ``~/.cache/skyllh``. |
There was a problem hiding this comment.
The base_path parameter docstring now refers to cfg['repository']['base_path'], but cfg is not a documented parameter of Dataset.__init__ (it’s only passed implicitly via **kwargs to HasConfig). This makes the docstring harder to follow and introduces an undefined name in the parameter docs. Consider rephrasing to reference the dataset’s config (self.cfg['repository']['base_path']) or the Config setting in general, without relying on an undocumented cfg symbol.
| If set to ``None``, ``cfg['repository']['base_path']`` is used, | |
| which defaults to ``~/.cache/skyllh``. | |
| If set to ``None``, the dataset configuration's repository base | |
| path setting is used, which defaults to ``~/.cache/skyllh``. |
Provide a default path that is mostly useful for public data analyses as i3skyllh overwrites it to
/data/ana/analyses(avoiding data copy).After this is merged we should update tutorial notebooks in the
documentationbranch, removing hardcoded paths.