cacheR

cacheR tracks your data and code so you don't have to

Also available as a Python package: pycacheR (pip install pycacheR)

What does cacheR do?

It automatically checks for changes in code and input data and re-runs the code if necessary.

It's like snakemake/nextflow, but on the fly

What is it useful for?

Keeping the analysis up to date
Saving time
Not using obsolete results
Reusing heavy computations safely and transparently

Installation

# From GitHub
remotes::install_github("BIMSBbioinfo/cacheR")

# From Guix
guix install -f https://raw.githubusercontent.com/BIMSBbioinfo/cacheR/main/guix.scm

Basic usage

The package introduces:

cacheFile() — a caching decorator
%@% — an operator for applying decorators
cacheTree_*() — functions for inspecting the cache tree

library(cacheR)

cache_dir <- file.path(tempdir(), "cache_test")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

# Define cached functions
inner <- cacheFile(cache_dir) %@% function(x) x + 1
outer <- cacheFile(cache_dir) %@% function(x) inner(x) * 2

# Execute: inner(3) returns 4, which outer() then doubles
outer(3)
#> 8
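
The decorator pattern used above can be sketched in base R. This is only an illustration of how an operator could apply a decorator to a function; it is not cacheR's implementation. The names %deco% and with_label are hypothetical, and %deco% is deliberately named differently so it does not clash with cacheR's %@%.

```r
# Hypothetical operator (not cacheR's %@%): applies a decorator to a function.
`%deco%` <- function(decorator, fun) decorator(fun)

# Toy decorator factory: returns a decorator that announces each call
# and then delegates to the wrapped function.
with_label <- function(label) {
  function(fun) {
    function(...) {
      message("[", label, "] calling")
      fun(...)
    }
  }
}

# Same shape as `cacheFile(cache_dir) %@% function(x) x + 1`
add_one <- with_label("add_one") %deco% function(x) x + 1
add_one(3)
#> 4
```

The function literal on the right-hand side extends to the end of the expression, which is why no extra parentheses are needed around it.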

How does cacheR decide to recompute?

A cached call is reused only if all of the following are unchanged:

The function body (including inline code changes)
The arguments (up to hashing / comparison rules)
The tracked files / directories, where relevant
The package versions of any non-base functions used
The environment variables used by the function

If any of these change, cacheR invalidates the old entry and recomputes.
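
To make the invalidation rule concrete, here is a minimal base-R sketch of a cache key built from the function body, the arguments, and the contents of tracked files. It is an assumption about how such a key could be computed, not a description of cacheR's internals; cache_key is a hypothetical helper, and package versions and environment variables are left out for brevity.

```r
# Hypothetical helper, not part of cacheR: derives a key from the function
# body, the argument values, and the content hashes of tracked files.
cache_key <- function(fun, args, files = character()) {
  parts <- list(
    body  = deparse(body(fun)),           # source text of the function body
    args  = args,                         # argument values
    files = unname(tools::md5sum(files))  # content hashes of tracked files
  )
  tmp <- tempfile()
  on.exit(unlink(tmp))
  saveRDS(parts, tmp, compress = FALSE)   # serialise the parts to disk
  unname(tools::md5sum(tmp))              # hash the serialised bytes
}

f <- function(x) x + 1
g <- function(x) x + 2
key_f3 <- cache_key(f, list(3))

identical(key_f3, cache_key(f, list(3)))  # TRUE: nothing changed
identical(key_f3, cache_key(g, list(3)))  # FALSE: the body differs
identical(key_f3, cache_key(f, list(4)))  # FALSE: the argument differs
```

A stored result would be reused only while its key is unchanged; any edit to the body, the arguments, or a tracked file produces a new key and therefore a recomputation.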

Limitations & caveats

Package boundaries: cacheR stops tracking when it reaches a function imported from a package. It records the package name and version instead and does not inspect the internals of that function.
Native code / C / external tools: C/C++ code and external tools (e.g. system("bwa mem ...")) are not tracked. If they change, cacheR will not notice unless their inputs or outputs change in a tracked place.
Side effects: Functions with side effects (writing to global variables, changing the random seed, writing to databases, etc.) are not fully “safe” to cache. Prefer pure, data-in/data-out functions.
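
The side-effect caveat can be illustrated with a toy memoiser written in base R. This is a sketch under the assumption that a cache hit skips the function body, as any result cache does; memoise_by_args is a hypothetical helper and not cacheR's code.

```r
# Hypothetical minimal memoiser, for illustration only (not cacheR's code).
memoise_by_args <- function(fun) {
  store <- new.env()
  function(x) {
    key <- as.character(x)
    if (is.null(store[[key]])) store[[key]] <- fun(x)  # run only on a miss
    store[[key]]
  }
}

calls <- 0
counted_square <- memoise_by_args(function(x) {
  calls <<- calls + 1  # side effect: writes to a global variable
  x^2
})

first  <- counted_square(4)  # cache miss: body runs, calls becomes 1
second <- counted_square(4)  # cache hit: body is skipped, calls stays 1
calls
#> 1
```

Both calls return 16, but the counter is incremented only once. Code that relies on the side effect happening on every call will silently misbehave once results are cached.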

When you probably shouldn’t use cacheR

Highly stateful / interactive code where caching would confuse you more than it helps
Situations where you need full workflow orchestration, scheduling, and cluster execution (use snakemake/nextflow/targets/etc. instead)
