---
output: github_document
---
# tidypredict <a href="https://tidypredict.tidymodels.org"><img src="man/figures/logo.png" align="right" height="138" alt="tidypredict website" /></a>
[R-CMD-check](https://github.com/tidymodels/tidypredict/actions/workflows/R-CMD-check.yaml)
[CRAN status](https://CRAN.R-project.org/package=tidypredict)
[CRAN downloads](https://CRAN.R-project.org/package=tidypredict)
[Codecov test coverage](https://app.codecov.io/gh/tidymodels/tidypredict)
[Lifecycle](https://lifecycle.r-lib.org/articles/stages.html)
```{r pre, include = FALSE}
if (!rlang::is_installed("randomForest")) {
  knitr::opts_chunk$set(
    eval = FALSE
  )
}
```
```{r setup, include=FALSE}
library(dplyr)
library(tidypredict)
library(randomForest)
```
The main goal of `tidypredict` is to enable running predictions inside databases. It reads the model, extracts the components needed to calculate the prediction, and then creates an R formula that can be translated into SQL. For example, it can parse a model such as this one:
```{r}
model <- lm(mpg ~ wt + cyl, data = mtcars)
```
`tidypredict` can then return a SQL statement that is ready to run inside the database. Because it uses `dplyr`'s database interface, it works with several database back-ends, such as MS SQL:
```{r}
tidypredict_sql(model, dbplyr::simulate_mssql())
```
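The SQL above is translated from an intermediate R expression. As a minimal sketch, that expression can be inspected with `tidypredict_fit()`, and the same model can be rendered for another simulated back-end; `dbplyr::simulate_postgres()` is used here only as an illustrative second back-end, and the model is refitted so the chunk runs on its own.

```{r}
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

# The R expression that tidypredict builds from the fitted model
tidypredict_fit(model)

# The same model translated for a different simulated back-end
tidypredict_sql(model, dbplyr::simulate_postgres())
```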
## Installation
Install `tidypredict` from CRAN using:
```{r, eval = FALSE}
install.packages("tidypredict")
```
Or install the **development version** using `remotes` as follows:
```{r, eval = FALSE}
install.packages("remotes")
remotes::install_github("tidymodels/tidypredict")
```
## Functions
`tidypredict` has only a few functions, and that number is not expected to grow much. The main focus at this time is adding support for more models.

| Function                    | Description                                                                    |
|-----------------------------|--------------------------------------------------------------------------------|
|`tidypredict_fit()` | Returns an R formula that calculates the prediction |
|`tidypredict_sql()` | Returns a SQL query based on the formula from `tidypredict_fit()` |
|`tidypredict_to_column()` | Adds a new column using the formula from `tidypredict_fit()` |
|`tidypredict_test()` | Tests `tidypredict` predictions against the model's native `predict()` function |
|`tidypredict_interval()` | Same as `tidypredict_fit()` but for intervals (only works with `lm` and `glm`) |
|`tidypredict_sql_interval()` | Same as `tidypredict_sql()` but for intervals (only works with `lm` and `glm`) |
|`parse_model()` | Creates a list spec based on the R model |
|`as_parsed_model()` | Prepares an object to be recognized as a parsed model |
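A short sketch of some of the other functions in the table, using the `lm()` model from the introduction (refitted so the chunk runs on its own): `tidypredict_to_column()` adds the prediction to a data frame, in a column named `fit` by default; `tidypredict_test()` compares the result with the model's own `predict()`; and `tidypredict_interval()` returns the interval formula.

```{r}
library(dplyr)
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

# Add the prediction to a data frame as a new column ("fit" by default)
mtcars_pred <- tidypredict_to_column(mtcars, model)
head(select(mtcars_pred, mpg, wt, cyl, fit))

# Compare the tidypredict formula against the model's own predict()
tidypredict_test(model)

# Interval formula (only available for lm and glm models)
tidypredict_interval(model)
```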
## How it works
<img src="man/figures/howitworks.png">

Instead of translating directly to a SQL statement, `tidypredict` creates an R formula. That formula can then be used inside `dplyr`. The overall workflow is illustrated in the image above and described here:
1. Fit the model using base R, or one of the packages listed in [Supported models](#supported-models).
1. `tidypredict` reads the model and creates a list object with the necessary components to run predictions.
1. `tidypredict` builds an R formula based on the list object.
1. `dplyr` evaluates the formula created by `tidypredict`.
1. `dplyr` translates the formula into a SQL statement, or for any other interface it supports.
1. The database executes the SQL statement(s) created by `dplyr`.
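The steps above can be walked through by hand. This is a minimal sketch that uses `dbplyr::lazy_frame()` with a simulated MS SQL connection in place of a real database table; `db_cars`, `db_query`, `pm`, and `f` are illustrative names only.

```{r}
library(dplyr)
library(dbplyr)
library(tidypredict)

# Step 1: fit the model
model <- lm(mpg ~ wt + cyl, data = mtcars)

# Step 2: read the model into a parsed-model list
pm <- parse_model(model)

# Step 3: build the R formula from the parsed model
f <- tidypredict_fit(pm)

# Steps 4 and 5: dplyr evaluates the formula and dbplyr translates it to SQL.
# lazy_frame() stands in for a real database table in this sketch.
db_cars <- lazy_frame(wt = 0, cyl = 0, con = simulate_mssql())
db_query <- mutate(db_cars, fit = !!f)
show_query(db_query)
```

With a real connection, for example a table opened with `DBI::dbConnect()` and `tbl()`, calling `collect()` on the query would have the database execute it, which is the last step.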
### Parsed model spec
`tidypredict` writes and reads a spec based on a model. Instead of simply writing the R formula directly, splitting the spec from the formula adds the following capabilities:
1. No more saving models as `.rds` - specifically for cases where the model needs to be used for predictions in a Shiny app.
1. Beyond R models - technically, anything that can write a proper spec can be read into `tidypredict`. It also means that the parsed model spec can be a good alternative to *PMML*.
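As a sketch of the first capability above, the parsed spec can be written to a plain-text file and read back later without the original model object. This assumes the `yaml` package, which is not otherwise used in this README, and writes to a temporary file purely for illustration.

```{r, eval = rlang::is_installed("yaml")}
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

# Write the parsed spec to a YAML file instead of saving the model object
spec_file <- tempfile(fileext = ".yml")
yaml::write_yaml(parse_model(model), spec_file)

# Later (e.g. in a Shiny app): read the spec back and rebuild the formula
loaded <- as_parsed_model(yaml::read_yaml(spec_file))
tidypredict_fit(loaded)
```

Because YAML stores the coefficients as text, the reloaded formula may differ from the original in the last few decimal places.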
## Supported models
The following models are supported by `tidypredict`:
- Linear Regression - `lm()`
- Generalized linear models - `glm()`
- Elastic net models - `glmnet::glmnet()`
- Random Forest models - `randomForest::randomForest()`
- Random Forest models, via `ranger` - `ranger::ranger()`
- MARS models - `earth::earth()`
- Decision tree models - `rpart::rpart()`
- XGBoost models - `xgboost::xgb.Booster`
- LightGBM models - `lightgbm::lgb.Booster`
- CatBoost models - `catboost::catboost.Model`
- Cubist models - `Cubist::cubist()`
- Tree models, via `partykit` - `partykit::ctree()`
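These models go through the same functions shown earlier. For example, a logistic regression fitted with base R's `glm()`; the `am ~ wt + hp` model on `mtcars` is only an illustration.

```{r}
library(tidypredict)

glm_model <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# Formula for the fitted glm
tidypredict_fit(glm_model)

# Check the formula against the model's own predictions
tidypredict_test(glm_model)

# The same model as an MS SQL statement
tidypredict_sql(glm_model, dbplyr::simulate_mssql())
```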
### `parsnip`
`tidypredict` supports models fitted via the `parsnip` interface. The ones currently confirmed to work in `tidypredict` are:
- `lm()` - `parsnip`: `linear_reg()` with *"lm"* as the engine.
- `glmnet::glmnet()` - `parsnip`: `linear_reg()` or `logistic_reg()` with *"glmnet"* as the engine.
- `randomForest::randomForest()` - `parsnip`: `rand_forest()` with *"randomForest"* as the engine.
- `ranger::ranger()` - `parsnip`: `rand_forest()` with *"ranger"* as the engine.
- `earth::earth()` - `parsnip`: `mars()` with *"earth"* as the engine.
- `rpart::rpart()` - `parsnip`: `decision_tree()` with *"rpart"* as the engine.
- `xgboost::xgb.Booster` - `parsnip`: `boost_tree()` with *"xgboost"* as the engine.
- `lightgbm::lgb.Booster` - `parsnip`: `boost_tree()` with *"lightgbm"* as the engine (via `bonsai`).
- `catboost::catboost.Model` - `parsnip`: `boost_tree()` with *"catboost"* as the engine (via `bonsai`).
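A minimal sketch of the first entry in the list: a `linear_reg()` model with the *"lm"* engine, whose fitted `parsnip` object is passed to `tidypredict` directly. It assumes `parsnip` is installed; `spec` and `parsnip_model` are illustrative names.

```{r, eval = rlang::is_installed("parsnip")}
library(parsnip)
library(tidypredict)

# Specify and fit a linear regression through parsnip's "lm" engine
spec <- set_engine(linear_reg(), "lm")
parsnip_model <- fit(spec, mpg ~ wt + cyl, data = mtcars)

# The fitted parsnip object goes straight into tidypredict
tidypredict_fit(parsnip_model)
tidypredict_sql(parsnip_model, dbplyr::simulate_mssql())
```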
### `broom`
The `tidy()` function from `broom` works with linear models parsed via `tidypredict`:
```{r}
pm <- parse_model(lm(wt ~ ., mtcars))
tidy(pm)
```
## Contributing
This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on Posit Community](https://forum.posit.co/new-topic?category_id=15&tags=tidymodels,question).
- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/tidypredict/issues).
- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.
- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).