cacheR tracks your data and code so you don't have to
Also available as a Python package: pycacheR (pip install pycacheR)
It automatically checks for changes in code and input data and re-runs the code if necessary.
It works like snakemake or nextflow, but on the fly.
- Keeping the analysis up to date
- Saving time
- Not using obsolete results
- Reusing heavy computations safely and transparently
# From GitHub
remotes::install_github("BIMSBbioinfo/cacheR")
# From Guix
guix install -f https://raw.githubusercontent.com/BIMSBbioinfo/cacheR/main/guix.scm
The package introduces:
- `cacheFile()`: a caching decorator
- `%@%`: an operator for applying decorators
- `cacheTree_*()`: functions for inspecting the cache tree
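If the decorator pattern is new to you, the sketch below shows the general idea in base R. It is not cacheR's implementation: the operator `%with%` and the decorator `log_calls()` are hypothetical names made up for this illustration. A decorator is a function that takes a function and returns a wrapped version of it, and the operator simply applies the decorator on its left to the function on its right.

```r
# Hypothetical operator (not cacheR's %@%): apply a decorator to a function.
`%with%` <- function(decorator, fn) decorator(fn)

# Hypothetical decorator factory: returns a decorator that announces each call
# and then delegates to the wrapped function.
log_calls <- function(label) {
  function(fn) {
    function(...) {
      message("calling ", label)
      fn(...)
    }
  }
}

# Same shape as the cacheFile(...) %@% function(...) usage in this README.
add_one <- log_calls("add_one") %with% function(x) x + 1
result <- add_one(2)
```

The wrapped function is called exactly like the original one; the decorator only adds behaviour around the call.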
library(cacheR)
cache_dir <- file.path(tempdir(), "cache_test")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
# Define cached functions
inner <- cacheFile(cache_dir) %@% function(x) x + 1
outer <- cacheFile(cache_dir) %@% function(x) inner(x) * 2
# Execute
outer(3)
#> 8
A cached call is reused only if all of the following are unchanged:
- The function body (including inline code changes)
- The arguments (up to hashing / comparison rules)
- The tracked files / directories, where relevant
- The package versions of any non-base functions used
- The environment variables used by the function
If any of these change, cacheR invalidates the old entry and recomputes.
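To make these rules concrete, here is a minimal base R sketch of how a cache key could combine some of these ingredients. It is an illustration only and makes no claim about how cacheR computes its keys: `hash_object()` and `cache_key()` are hypothetical helpers, and environment variables are left out for brevity.

```r
# Hypothetical helper (not part of cacheR): hash any R object with base R only
# by serialising it to a temporary file and taking that file's MD5 checksum.
hash_object <- function(obj) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(obj, NULL), tmp)
  unname(tools::md5sum(tmp))
}

# Hypothetical key builder: function body, arguments, file contents and
# package versions all feed into a single hash.
cache_key <- function(fn, args, files = character(), packages = character()) {
  hash_object(list(
    body     = deparse(body(fn)),
    args     = args,
    files    = unname(tools::md5sum(files)),
    packages = vapply(packages,
                      function(p) as.character(utils::packageVersion(p)),
                      character(1))
  ))
}

add_one <- function(x) x + 1
add_two <- function(x) x + 2

input <- tempfile(fileext = ".txt")
writeLines("a", input)

key_before   <- cache_key(add_one, list(3), files = input, packages = "stats")
key_same     <- cache_key(add_one, list(3), files = input, packages = "stats")
key_new_body <- cache_key(add_two, list(3), files = input, packages = "stats")
key_new_arg  <- cache_key(add_one, list(4), files = input, packages = "stats")

writeLines("b", input)  # change the tracked file's contents
key_new_file <- cache_key(add_one, list(3), files = input, packages = "stats")
```

In this sketch `key_same` equals `key_before`, while `key_new_body`, `key_new_arg` and `key_new_file` all differ from it, which is the condition under which a cache would have to recompute.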
- Package boundaries: cacheR stops tracking when it hits a function imported from a package. Instead, it records the package name and version. It does not inspect the internals of those functions.
- Native code / C / external tools: C/C++ code and external tools (e.g. system("bwa mem ...")) are not tracked. If they change, cacheR will not notice unless their inputs / outputs change in a tracked place.
- Side effects: Functions with side effects (writing to global variables, random seeds, databases, etc.) are not fully “safe” to cache. Prefer pure, data-in/data-out functions.
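The side-effect caveat applies to any cache, not just cacheR. The sketch below uses a deliberately naive in-memory memoiser, `memoise_sketch()`, which is a hypothetical helper written for this illustration and unrelated to cacheR's internals. It shows two typical surprises: a "random" function that keeps returning the same draw, and a global counter that stops being updated.

```r
# Hypothetical, naive memoiser keyed on a single argument (illustration only).
memoise_sketch <- function(fn) {
  store <- new.env()
  function(x) {
    key <- as.character(x)
    if (!exists(key, envir = store, inherits = FALSE)) {
      assign(key, fn(x), envir = store)
    }
    get(key, envir = store, inherits = FALSE)
  }
}

# 1. Random numbers: the second call replays the stored draw.
draw <- memoise_sketch(function(n) rnorm(n))
first_draw  <- draw(3)
second_draw <- draw(3)
replayed <- identical(first_draw, second_draw)  # TRUE: no new random numbers

# 2. Global state: the side effect only happens on the first, uncached call.
counter <- 0
bump <- memoise_sketch(function(x) {
  counter <<- counter + 1
  x * 2
})
bump(1)
bump(1)
counter  # 1, not 2
```

Pure functions avoid both problems because their result depends only on their inputs.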
- Highly stateful / interactive code where caching would confuse you more than it helps
- Situations where you need full workflow orchestration, scheduling, and cluster execution (use snakemake/nextflow/targets/etc. instead)
