This document outlines the style guide for the epinowcast family of packages.
This guide is a work in progress and will be updated as the community packages evolve.
We welcome contributions to this guide and encourage you to raise issues or submit PRs if you have any suggestions.
Generally, we follow the tidyverse style guide.
This guide provides extensions and exceptions to that tidyverse style guide.
All code should be wrapped at 80 characters per line. This applies to R code, Stan code, and other source files. One sentence per line takes priority in Markdown and Quarto files.
Short function calls should stay on a single line:
obs <- coerce_dt(obs, dates = TRUE)
data.table::setkeyv(metaobs, c(".group", "date"))

When a function call needs to wrap, the first argument stays on the same line as the function name if it fits within the 80-character limit. Subsequent arguments go on new lines, each indented by 2 spaces. The closing bracket goes on its own line:
# Good: first argument fits on same line
nowcast <- epinowcast(pobs,
  fit = enw_fit_opts(
    save_warmup = FALSE, pp = TRUE,
    chains = 2, iter_warmup = 500,
    iter_sampling = 500
  )
)

When the function name plus the first argument would exceed 80 characters, the first argument moves to a new line:
# Good: first arg on new line when the line would be too long
retro_nat_germany <- enw_filter_reference_dates(
  retro_nat_germany,
  include_days = 40
)
obs <- coerce_dt(
  obs,
  required_cols = c("new_confirm", "reference_date", "delay"),
  group = TRUE
)

Do not align arguments to the opening parenthesis in function calls:
# Bad: arguments aligned to opening parenthesis
reports <- data.table::dcast(obs,
                             .group + reference_date ~ delay,
                             value.var = "new_confirm",
                             fill = 0)

# Good: arguments indented 2 spaces
reports <- data.table::dcast(obs,
  .group + reference_date ~ delay,
  value.var = "new_confirm", fill = 0
)

- For most packages, we use a short prefix for exported functions (e.g. we use the `enw_` prefix for the `epinowcast` package). This helps prevent conflicts with other packages and also makes it more convenient for users to tab-complete or browse functions from the package.
- There are some exceptions:
  - Functions that work with S3 objects (e.g. `plot.epinowcast()`) must be named accordingly.
  - Internal functions (i.e. functions that are not exported) do not require this prefix. However, we do have standard internal prefixes as well: `check_` for functions that validate arguments, `coerce_` for functions that convert to a particular type.
  - Functions where the name is intended to leverage other R-wide naming conventions (e.g. `(d|r|p|q)DISTRO` style naming).

Also note that these are conventions, not hard rules.
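
The naming conventions above can be sketched in a small, self-contained example. Everything here is hypothetical and only for illustration: the `mp_` prefix, the `mp_counts` class, and the helpers `check_positive()` and `coerce_integer()` are not part of any epinowcast package.

```r
# Hypothetical package with the exported prefix "mp_" (illustration only).

# Internal validator: uses the check_ prefix and would not be exported.
check_positive <- function(x, name = "x") {
  if (!is.numeric(x) || anyNA(x) || any(x <= 0)) {
    stop(
      "`", name, "` must be numeric and strictly positive",
      call. = FALSE
    )
  }
  invisible(x)
}

# Internal converter: uses the coerce_ prefix and would not be exported.
coerce_integer <- function(x) {
  as.integer(round(x))
}

# Exported constructor: carries the package prefix.
mp_counts <- function(x) {
  check_positive(x, name = "x")
  structure(list(counts = coerce_integer(x)), class = "mp_counts")
}

# S3 method: named after the generic and the class, not the package prefix.
print.mp_counts <- function(x, ...) {
  cat("<mp_counts> with", length(x$counts), "values\n")
  invisible(x)
}

counts <- mp_counts(c(1.2, 3.7, 10))
print(counts)
```

Tab-completing `mp_` would surface only the exported function, while the `check_` and `coerce_` helpers stay grouped by purpose.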
In general we aim to minimise dependencies on packages outside the epinowcast community where possible. This makes it easier to maintain our packages and reduces the risk of breaking changes in other packages impacting our users. However, additional dependencies are sometimes necessary to improve the functionality of the package.
The following guidelines should be followed when adding dependencies:
- Add the dependency to the `Imports` or `Suggests` field of the `DESCRIPTION` file in alphabetical order. A dependency should be an `Imports` if it is required for the package to function and a `Suggests` if it is only required for certain non-core functions or vignettes.
- In the PR that adds the dependency, state this clearly in the PR description along with a justification for the dependency, the number and type of downstream dependencies, and an assessment of the risk of the dependency breaking. In general, the barrier for adding dependencies should be high but is lower for `Suggests` dependencies.
More generally, when using functions from external packages (i.e. even if they are already a dependency) the following should be followed:
- Documented in function documentation using the `@importFrom` tag.
- Used within functions using the `package::function` format (though we make an exception for functions from `data.table` as these are all imported by `epinowcast`).
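
As a minimal sketch of these two points together, the hypothetical function below (`summarise_delays()` is not part of any epinowcast package) documents its external functions with `@importFrom` and still calls them in the `package::function` format. It uses only the `stats` package that ships with R.

```r
#' Summarise reporting delays (hypothetical example)
#'
#' @param delays A numeric vector of delays in days.
#' @return A named numeric vector with the median and the 90% quantile.
#' @importFrom stats median quantile
summarise_delays <- function(delays) {
  c(
    median = stats::median(delays),
    q90 = unname(stats::quantile(delays, probs = 0.9))
  )
}

delay_summary <- summarise_delays(c(1, 2, 2, 3, 10))
```

Being explicit at the call site makes it clear to a reader where each function comes from, even though the `@importFrom` tag already makes it available in the package namespace.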
- Any required inputs should be clearly documented in the function documentation, particularly in terms of type, but also other constraints (e.g. presence/absence of columns).
- Any expressed constraint in exported functions should be verified using some sort of `check_` expression, and ideally unit tests written correspondingly to confirm that `check_` rejects bad input.
- Many of the methods across the `epinowcast` packages work with `data.frame` inputs (and subclasses of `data.frame` like `data.table` and `tibble`). Internally, for performance and syntax reasons, we prefer to use `data.table`s explicitly and that is generally the type returned by functions. However, we will continue to accept `data.frame`-like arguments as inputs.
- Translation from `data.frame` to `data.table` can be handled by `coerce_dt`, `check_dt` or `make_dt` (`coerce_dt` and `check_dt` combined) which will be provided in a to-be-released package `make_dt`.
- In general, functions should be side-effect-free on their arguments. This is generally the case in R, but notably `data.table` arguments may be modified inside functions. To maintain the benefits of using `data.table`, you may wish to allow side effects, either by method flag or with internal methods.
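
The side-effect point can be illustrated with a short sketch. It assumes `data.table` is installed, and the function `add_delay()` and its `by_reference` flag are hypothetical names chosen for this example rather than an existing epinowcast interface.

```r
library(data.table)

# Hypothetical helper: by default it works on a copy, so the caller's
# object is left untouched; by_reference = TRUE opts in to side effects.
add_delay <- function(obs, by_reference = FALSE) {
  if (!is.data.table(obs)) {
    # data.frame-like input is converted to a new data.table
    obs <- as.data.table(obs)
  } else if (!by_reference) {
    # explicit copy so that := below does not modify the caller's object
    obs <- copy(obs)
  }
  obs[, delay := as.integer(report_date - reference_date)]
  return(obs[])
}

obs <- data.table(
  reference_date = as.Date("2024-01-01") + 0:2,
  report_date = as.Date("2024-01-03") + 0:2
)

safe <- add_delay(obs)
unchanged_after_safe <- !("delay" %in% names(obs))

fast <- add_delay(obs, by_reference = TRUE)
changed_after_fast <- "delay" %in% names(obs)
```

The default keeps the function side-effect-free, while the flag lets performance-sensitive callers avoid the copy knowingly.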
In general, we aim to check the inputs for all external facing functions. This is to ensure that the user is aware of any issues with the input data and to provide a consistent error message. For an example of this philosophy, review usage in epinowcast more widely, such as the functions in R/data-converters.R.
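
A minimal sketch of this pattern in base R is shown below. The names `check_required_cols()` and `count_reports()`, the column names, and the error wording are assumptions made for illustration and do not reproduce the checks in `R/data-converters.R`.

```r
# Hypothetical internal validator; name and message are illustrative only.
check_required_cols <- function(obs, required_cols) {
  if (!is.data.frame(obs)) {
    stop("`obs` must be a data.frame or a subclass of it", call. = FALSE)
  }
  missing_cols <- setdiff(required_cols, names(obs))
  if (length(missing_cols) > 0) {
    stop(
      "The following columns are required but missing: ",
      toString(missing_cols),
      call. = FALSE
    )
  }
  invisible(obs)
}

# Hypothetical exported function that validates input before any work.
count_reports <- function(obs) {
  check_required_cols(obs, c("reference_date", "confirm"))
  sum(obs$confirm)
}

good <- data.frame(
  reference_date = as.Date("2024-01-01") + 0:1,
  confirm = c(3, 5)
)
total <- count_reports(good)

# Bad input fails early with the same message wherever the check is used.
msg <- tryCatch(
  count_reports(good["confirm"]),
  error = function(e) conditionMessage(e)
)
```

A unit test for the validator would then assert that the bad input raises this error, in line with the guidance on testing `check_` functions.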
- `data.table` objects are used for internal data manipulation. If you are unfamiliar with `data.table` please see the documentation and cheatsheet. Prototype code may be written with other tools but will generally need to be refactored to use `data.table` before submission (in PRs where help is needed with this please clearly state this).
- We aim to use more readable rather than more efficient `data.table` syntax where there is a trade-off (of course the exact trade-off requires developer judgement). For example, rather than bracket chaining we prefer the use of one-line statements with re-assignment. The following code demonstrates both patterns (and the reason why we avoid chaining - the chained dt actually yields a different result for `dt_chain` vs `dt`):
library(data.table)
# we prefer this
dt <- as.data.table(mtcars)
dt[, mpg := mpg + 1]
dt[mpg > 20, cyl := 10]
dt[, cyl := cyl + 1]
# over this
dt_chain <- as.data.table(mtcars)[, mpg := mpg + 1][mpg > 20, cyl := 10][, cyl := cyl + 1]

- We also use `list` structures for more complex objects or where `data.table` is not appropriate. If the appropriate data structure is unclear for the problem at hand please flag this in the issue you are addressing or in the PR discussion.
- For external functions we aim for the output to be a `data.table` object if possible unless a custom class is used (which we generally aim to inherit from the `data.table` class). This is to ensure consistency with the input types and to allow for easy chaining of functions.
- All returned `data.table` objects should be followed with `[]` as this ensures the object prints automatically. This holds for both internal and external functions in order to improve both the user and developer experience. The following functions demonstrate this pattern:
library(data.table)
no_print_iris <- function(dt) {
  dt <- coerce_dt(dt)
  return(dt)
}
print_iris <- function(dt) {
  dt <- coerce_dt(dt)
  return(dt[])
}
no_print_iris(iris)
print_iris(iris)
#      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#   1:          5.1         3.5          1.4         0.2    setosa
#   2:          4.9         3.0          1.4         0.2    setosa
#   3:          4.7         3.2          1.3         0.2    setosa
#   4:          4.6         3.1          1.5         0.2    setosa
#   5:          5.0         3.6          1.4         0.2    setosa
#  ---
# 146:          6.7         3.0          5.2         2.3 virginica
# 147:          6.3         2.5          5.0         1.9 virginica
# 148:          6.5         3.0          5.2         2.0 virginica
# 149:          6.2         3.4          5.4         2.3 virginica
# 150:          5.9         3.0          5.1         1.8 virginica