- Form hypotheses about what the issue could be.
- Find a way to test these hypotheses, and test them. If necessary, ask the human for assistance; they may e.g. need to interact manually with the software.
- If you accept a hypothesis, apply an appropriate fix. The fix may not work and the hypothesis may turn out to be false; in that case, undo the fix unless it actually improves code quality overall. Do not let unnecessary fixes for imaginary issues that never materialized clog up the code.
- Simplicity: Favor small, focused components and avoid unnecessary complexity in design or logic.
- This also means: avoid overly defensive code. Match the typical level of defensiveness you observe in the existing code.
- Idiomaticity: Solve problems the way they "should" be solved, in the respective language: the way a professional in that language would have approached it.
- Readability and maintainability are primary concerns, even at the cost of conciseness or performance.
- Doing it right is better than doing it fast. You are not in a rush. Never skip steps or take shortcuts.
- Tedious, systematic work is often the correct solution. Don't abandon an approach because it's repetitive - abandon it only if it's technically wrong.
- Honesty is a core value. Be honest about changes you have made and their potential negative effects; these are okay. Be honest about shortcomings of other team members' plans and implementations; we all care more about the project than our egos. Be honest if you don't know something: say "I don't know" when appropriate. </guiding_principles> <project_info>
mlr3pipelines is a package that extends the mlr3 ecosystem by adding preprocessing operations and a way to compose them into computational graphs.
- The package is very object-oriented; most things use R6.
- Coding style: we use `snake_case` for variables, `UpperCamelCase` for R6 classes. We use `=` for assignment and mostly follow the tidyverse style guide otherwise. We use block indent (two spaces), not visual indent; i.e., we don't align code with opening parentheses in function calls, we align by block depth.
- User-facing API (`@export`ed things, public R6 methods) always needs checkmate `assert_***()` argument checks. Otherwise don't be overly defensive; look at the other code in the project to see our desired level of paranoia.
- Always read at least `R/PipeOp.R` and `R/PipeOpTaskPreproc.R` to see the base classes you will need in almost every task.
- Read `R/Graph.R` and `R/GraphLearner.R` to understand the Graph architecture.
- Before you start coding, look at other relevant `.R` files that do something similar to what you are supposed to implement.
- We use `testthat`, and most test files are in `tests/testthat/`. Read the additional important helpers in `inst/testthat/helper_functions.R` to understand our `PipeOpTaskPreproc` auto-test framework.
- Always write tests, and execute them with `devtools::test(filter = )`; the entirety of our tests takes a long time, so only run tests for what you just wrote.
- Tests involving the `$man` field, and tests involving parallelization, do not work well when the package is loaded with `devtools::load_all()`, because of conflicts with the installed version. Ignore these failures; CI will take care of this.
- The quality of our tests is lower than it ideally should be. We are in the process of improving this over time. Always leave the `tests/testthat/` folder in a better state than you found it in!
- If `roxygenize()` / `document()` produce warnings that are unrelated to the code you wrote, ignore them. Do not fix code or formatting that is unrelated to what you are working on, but do mention bugs or problems that you noticed in your final report.
- When you write examples, make sure they work.
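As a quick illustration of the style rules above (the class and all names are made up for this sketch, not actual package code):

```r
# Hypothetical class, for illustration only: UpperCamelCase class name,
# snake_case fields/methods, `=` assignment, two-space block indent.
ScaleHelper = R6::R6Class("ScaleHelper",
  public = list(
    multiplier = NULL,
    initialize = function(multiplier = 1) {
      # user-facing API gets checkmate asserts
      checkmate::assert_number(multiplier, finite = TRUE)
      self$multiplier = multiplier
    },
    scale_values = function(values) {
      checkmate::assert_numeric(values)
      values * self$multiplier
    }
  )
)
```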
- A very small number of packages listed in `Suggests:` used by some tests / examples is missing; ignore warnings in that regard. You will never be asked to work on things that require these packages.
- Packages that we rely on; they generally have good documentation that can be queried, or they can be looked up on GitHub:
  - `mlr3`, provides `Task`, `Learner`, `Measure`, `Prediction`, various `***Result` classes; basically the foundation on which we build. https://github.com/mlr-org/mlr3
  - `mlr3misc`, provides a lot of helper functions that we prefer to use over base-R when available. https://github.com/mlr-org/mlr3misc
  - `paradox`, provides the hyperparameter / configuration space: `ps()`, `p_int()`, `p_lgl()`, `p_fct()`, `p_uty()` etc. https://github.com/mlr-org/paradox
  - For the mlr3 ecosystem as a whole, also consider the "mlr3 Book" as a reference: https://mlr3book.mlr-org.com/
- Semantics of paradox ParamSet parameters to pay attention to:
  - there is a distinction between "default" values and values that a parameter is initialized to: a "default" is the behaviour that happens when the parameter is not given at all; e.g. PipeOpPCA's `center` defaults to `TRUE`, since the underlying function (`prcomp`) does centering when the `center` argument is not given at all. In contrast, a parameter is "initialized" to some value if it is set to some value upon construction of a PipeOp. In rare cases, this can differ from the default, e.g. if the underlying default behaviour is suboptimal for use in preprocessing (e.g. it stores training data unnecessarily by default).
  - a parameter can be marked as "required" by having the tag `"required"`. It is a special tag that causes an error if the value is not set. A "required" parameter can not have a "default", since semantically this is a contradiction: "default" would describe what happens when the param is not set, but param-not-set is an error.
  - When we write preprocessing methods ourselves we usually don't do "default" behaviour and instead mark most things as "required". "default" is mostly for when we wrap some other library's function which itself has function argument default values.
  - We initialize a parameter by giving the `p_xxx(init = )` argument. Some old code does `param_set$values = list(...)` or `param_set$values$param = ...` in the constructor. This is deprecated; we do not unnecessarily change it in old code, but new code should use `init = `. A parameter should be documented as "initialized to" something if and only if the value is set through one of these methods in the constructor.
  - Inside the train / predict functions of PipeOps, hyperparameter values should be obtained through `pv = self$param_set$get_values(tags = )`, where `tags` is often `"train"`, `"predict"`, or some custom tag that groups hyperparameters by meaning somehow (e.g. everything that should be passed to a specific function). A nice pattern is to call a function `fname` with many options configured through `pv` while also explicitly passing some arguments, as `invoke(fname, arg1 = val1, arg2 = val2, .args = pv)`, using `invoke` from `mlr3misc`.
  - paradox does type-checking and range-checking automatically; `get_values()` automatically checks that `"required"` params are present and not `NULL`. Therefore, we only do additional parameter feasibility checks in the rarest of cases.
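A sketch of the default / initialized / required distinctions in a ParamSet (the parameter names here are invented for illustration):

```r
param_set = paradox::ps(
  # "default": documents what the wrapped function does when the
  # parameter is not given at all; paradox does not apply it itself
  center = paradox::p_lgl(default = TRUE, tags = "train"),
  # "initialized": set to a concrete value on construction via `init =`
  rank = paradox::p_int(lower = 1, init = 2, tags = "train"),
  # "required": error if not set; must not also declare a "default"
  method = paradox::p_fct(c("a", "b"), tags = c("train", "required"))
)
```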
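Inside a PipeOp's private train method, the `get_values()` / `invoke()` pattern might look like this (a sketch, not actual package code; it assumes `pv` contains only arguments that `prcomp()` accepts):

```r
.train_dt = function(dt, levels, target) {
  # hyperparameters tagged "train"; "required" presence is checked here
  pv = self$param_set$get_values(tags = "train")
  # explicit arguments first, everything else taken from the hyperparameters
  pca = mlr3misc::invoke(stats::prcomp,
    x = as.matrix(dt),
    .args = pv
  )
  self$state = pca
  # return the transformed data
  pca$x
}
```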
- Minor things to be aware of:
- Errors that are thrown in PipeOps are automatically wrapped by Graph to also mention the PipeOp ID, so it is not necessary to include that in error messages.
</project_info> <agent_notes>
- R unit tests in this repo assume the helper `expect_man_exists()` is available. If you need to call it in a new test and you are working without mlr3pipelines installed, define a local fallback at the top of that test file before `expect_man_exists()` is used.
- Revdep helper scripts live in
`attic/revdeps/`. `download_revdeps.R` downloads reverse dependency source tarballs; `install_revdep_suggests.R` installs Suggests for those revdeps without pulling in the revdeps themselves.
- When writing `paradox::ParamSet` custom checks (e.g. `p_uty(custom_check = ...)`), you do not need to special-case `TuneToken`s. paradox skips custom validators for `TuneToken` inputs before evaluating them, so the check only sees concrete values.
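So a custom check can assume it receives a concrete value (the `weights` parameter below is invented for illustration):

```r
param_set = paradox::ps(
  # `weights` is a hypothetical parameter: the custom check never sees
  # TuneToken inputs, so it can validate the concrete value directly
  weights = paradox::p_uty(
    custom_check = function(x) {
      # returns TRUE or an error message string, checkmate-style
      checkmate::check_numeric(x, lower = 0, any.missing = FALSE)
    }
  )
)
```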
</agent_notes> <your_task> Again, when implementing something, focus on:
- Think things through and plan ahead.
- Tests before implementation, if possible. In any case, write high quality tests, try to be better than the tests you find in this project.
- Once you have started, work independently; we can always undo things if necessary.
- Create sensible intermediate commits.
- Check your work, make sure tests pass. But do not run all tests, they take a long time.
- Write a report to the user at the end, informing them about decisions that were made autonomously, unexpected issues, etc.