- Form hypotheses about what the issue could be.
- Find a way to test these hypotheses, and test them. If necessary, ask the human for assistance; they may e.g. need to interact manually with the software.
- If you accept a hypothesis, apply an appropriate fix. The fix may not work and the hypothesis may turn out to be false; in that case, undo the fix unless it actually improves code quality overall. Do not let unnecessary fixes for imaginary issues that never materialized clog up the code.
- Simplicity: Favor small, focused components and avoid unnecessary complexity in design or logic.
- This also means: avoid overly defensive code. Match the typical level of defensiveness you observe in the existing code.
- Idiomaticity: Solve problems the way they "should" be solved, in the respective language: the way a professional in that language would have approached it.
- Readability and maintainability are primary concerns, even at the cost of conciseness or performance.
- Doing it right is better than doing it fast. You are not in a rush. Never skip steps or take shortcuts.
- Tedious, systematic work is often the correct solution. Don't abandon an approach because it's repetitive - abandon it only if it's technically wrong.
- Honesty is a core value. Be honest about changes you have made and their potential negative effects; these are okay. Be honest about shortcomings of other team members' plans and implementations; we all care more about the project than our egos. Be honest if you don't know something: say "I don't know" when appropriate. </guiding_principles> <project_info>
mlr3pipelines is a package that extends the mlr3 ecosystem by adding preprocessing operations and a way to compose them into computational graphs.
- The package is very object-oriented; most things use R6.
- Coding style: we use `snake_case` for variables, `UpperCamelCase` for R6 classes. We use `=` for assignment and mostly follow the tidyverse style guide otherwise. We use block indent (two spaces), not visual indent; i.e., we don't align code with opening parentheses in function calls, we align by block depth.
- User-facing API (`@export`ed things, public R6 methods) always needs checkmate `assert_***()` argument checks. Otherwise don't be overly defensive; look at the other code in the project to see our desired level of paranoia.
- Always read at least `R/PipeOp.R` and `R/PipeOpTaskPreproc.R` to see the base classes you will need in almost every task.
- Read `R/Graph.R` and `R/GraphLearner.R` to understand the Graph architecture.
- Before you start coding, look at other relevant `.R` files that do something similar to what you are supposed to implement.
- We use `testthat`, and most test files are in `tests/testthat/`. Read the additional important helpers in `inst/testthat/helper_functions.R` to understand our `PipeOpTaskPreproc` auto-test framework.
- Always write tests, and execute them with `devtools::test(filter = )`; the entirety of our tests takes a long time, so only run tests for what you just wrote.
- Tests involving the `$man` field, and tests involving parallelization, do not work well when the package is loaded with `devtools::load_all()`, because of conflicts with the installed version. Ignore these failures; CI will take care of this.
- The quality of our tests is lower than it ideally should be. We are in the process of improving this over time. Always leave the `tests/testthat/` folder in a better state than you found it in!
- If `roxygenize()` / `document()` produce warnings that are unrelated to the code you wrote, ignore them. Do not fix code or formatting that is unrelated to what you are working on, but do mention bugs or problems that you noticed in your final report.
- When you write examples, make sure they work.
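As a quick illustration of the style rules above (the class and all names are made up for this sketch, not actual package code):

```r
# Hypothetical class, for illustration only: UpperCamelCase class name,
# snake_case fields/methods, `=` assignment, two-space block indent.
ScaleHelper = R6::R6Class("ScaleHelper",
  public = list(
    multiplier = NULL,
    initialize = function(multiplier = 1) {
      # user-facing API gets checkmate asserts
      checkmate::assert_number(multiplier, finite = TRUE)
      self$multiplier = multiplier
    },
    scale_values = function(values) {
      checkmate::assert_numeric(values)
      values * self$multiplier
    }
  )
)
```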
- A very small number of packages listed in `Suggests:` used by some tests / examples is missing; ignore warnings in that regard. You will never be asked to work on things that require these packages.
- Packages that we rely on; they generally have good documentation that can be queried, or they can be looked up on GitHub:
  - `mlr3`, provides `Task`, `Learner`, `Measure`, `Prediction`, various `***Result` classes; basically the foundation on which we build. https://github.com/mlr-org/mlr3
  - `mlr3misc`, provides a lot of helper functions that we prefer to use over base-R when available. https://github.com/mlr-org/mlr3misc
  - `paradox`, provides the hyperparameter / configuration space: `ps()`, `p_int()`, `p_lgl()`, `p_fct()`, `p_uty()` etc. https://github.com/mlr-org/paradox
  - For the mlr3 ecosystem as a whole, also consider the "mlr3 Book" as a reference: https://mlr3book.mlr-org.com/
- Semantics of paradox ParamSet parameters to pay attention to:
  - there is a distinction between "default" values and values that a parameter is initialized to: a "default" is the behaviour that happens when the parameter is not given at all; e.g. PipeOpPCA's `center` defaults to `TRUE`, since the underlying function (`prcomp`) does centering when the `center` argument is not given at all. In contrast, a parameter is "initialized" to some value if it is set to some value upon construction of a PipeOp. In rare cases, this can differ from the default, e.g. if the underlying default behaviour is suboptimal for use in preprocessing (e.g. it stores training data unnecessarily by default).
  - a parameter can be marked as "required" by having the tag `"required"`. It is a special tag that causes an error if the value is not set. A "required" parameter can not have a "default", since semantically this is a contradiction: "default" would describe what happens when the param is not set, but param-not-set is an error.
  - When we write preprocessing methods ourselves we usually don't do "default" behaviour and instead mark most things as "required". "default" is mostly for when we wrap some other library's function which itself has function argument default values.
  - We initialize a parameter by giving the `p_xxx(init = )` argument. Some old code does `param_set$values = list(...)` or `param_set$values$param = ...` in the constructor. This is deprecated; we do not unnecessarily change it in old code, but new code should use `init = `. A parameter should be documented as "initialized to" something if and only if the value is set through one of these methods in the constructor.
  - Inside the train / predict functions of PipeOps, hyperparameter values should be obtained through `pv = self$param_set$get_values(tags = )`, where `tags` is often `"train"`, `"predict"`, or some custom tag that groups hyperparameters by meaning somehow (e.g. everything that should be passed to a specific function). A nice pattern is to call a function `fname` with many options configured through `pv` while also explicitly passing some arguments, as `invoke(fname, arg1 = val1, arg2 = val2, .args = pv)`, using `invoke` from `mlr3misc`.
  - paradox does type-checking and range-checking automatically; `get_values()` automatically checks that `"required"` params are present and not `NULL`. Therefore, we only do additional parameter feasibility checks in the rarest of cases.
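A sketch of the default / initialized / required distinctions in a ParamSet (the parameter names here are invented for illustration):

```r
param_set = paradox::ps(
  # "default": documents what the wrapped function does when the
  # parameter is not given at all; paradox does not apply it itself
  center = paradox::p_lgl(default = TRUE, tags = "train"),
  # "initialized": set to a concrete value on construction via `init =`
  rank = paradox::p_int(lower = 1, init = 2, tags = "train"),
  # "required": error if not set; must not also declare a "default"
  method = paradox::p_fct(c("a", "b"), tags = c("train", "required"))
)
```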
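Inside a PipeOp's private train method, the `get_values()` / `invoke()` pattern might look like this (a sketch, not actual package code; it assumes `pv` contains only arguments that `prcomp()` accepts):

```r
.train_dt = function(dt, levels, target) {
  # hyperparameters tagged "train"; "required" presence is checked here
  pv = self$param_set$get_values(tags = "train")
  # explicit arguments first, everything else taken from the hyperparameters
  pca = mlr3misc::invoke(stats::prcomp,
    x = as.matrix(dt),
    .args = pv
  )
  self$state = pca
  # return the transformed data
  pca$x
}
```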
- Minor things to be aware of:
- Errors that are thrown in PipeOps are automatically wrapped by Graph to also mention the PipeOp ID, so it is not necessary to include that in error messages.
</project_info> <agent_notes>
- R unit tests in this repo assume the helper `expect_man_exists()` is available. If you need to call it in a new test and you are working without mlr3pipelines installed, define a local fallback at the top of that test file before `expect_man_exists()` is used.
- Revdep helper scripts live in
`attic/revdeps/`. `download_revdeps.R` downloads reverse dependency source tarballs; `install_revdep_suggests.R` installs Suggests for those revdeps without pulling in the revdeps themselves.
- When writing `paradox::ParamSet` custom checks (e.g. `p_uty(custom_check = ...)`), you do not need to special-case `TuneToken`s. paradox skips custom validators for `TuneToken` inputs before evaluating them, so the check only sees concrete values.
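So a custom check can assume it receives a concrete value (the `weights` parameter below is invented for illustration):

```r
param_set = paradox::ps(
  # `weights` is a hypothetical parameter: the custom check never sees
  # TuneToken inputs, so it can validate the concrete value directly
  weights = paradox::p_uty(
    custom_check = function(x) {
      # returns TRUE or an error message string, checkmate-style
      checkmate::check_numeric(x, lower = 0, any.missing = FALSE)
    }
  )
)
```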
</agent_notes> <your_task> Again, when implementing something, focus on:
- Think things through and plan ahead.
- Tests before implementation, if possible. In any case, write high quality tests, try to be better than the tests you find in this project.
- Once you have started, work independently; we can always undo things if necessary.
- Create sensible intermediate commits.
- Check your work, make sure tests pass. But do not run all tests, they take a long time.
- Write a report to the user at the end, informing them about decisions that were made autonomously, unexpected issues, etc.