diff --git a/R/README.md b/R/README.md index fc6530b74..73bdef99a 100644 --- a/R/README.md +++ b/R/README.md @@ -1,6 +1,6 @@ -# Some Notes on the Design of `parsnip` +# Some Notes on the Design of parsnip -`parsnip` is trying to solve the issues of unified interfaces for the myriad R modeling functions that have very heterogeneous interfaces and return values. It defines a set of modules, which are specific tasks, such as +The parsnip package is trying to solve the issues of unified interfaces for the myriad R modeling functions that have very heterogeneous interfaces and return values. It defines a set of modules, which are specific tasks, such as * fitting the model * obtaining numeric predictions for regression models @@ -8,17 +8,17 @@ and so on. The list of modules is likely to grow over time to include variable importance scores and so on,. -`caret` was written for the same purpose. The approach there was to encapsulate the modules as functions (see [this directory](https://github.com/topepo/caret/tree/master/models/files) for examples). The issue with having these modules as functions are: +The caret package was written for the same purpose. The approach there was to encapsulate the modules as functions (see [this directory](https://github.com/topepo/caret/tree/master/models/files) for examples). The issue with having these modules as functions are: * A lot of code duplication. * More difficult to maintain. * Any functions in open code had to be a dependency of some sort. This led to a long ago version having about 200 package dependencies which was problematic. -To get around the last point, `caret` _compiles_ these modules into a large list and saves it in the package as an RData file. This avoids `R CMD check` from noticing that code and triggering warnings about dependencies. +To get around the last point, caret _compiles_ these modules into a large list and saves it in the package as an RData file. 
This avoids `R CMD check` from noticing that code and triggering warnings about dependencies. ## Model Fitting Modules -`parsnip` approaches the problem differently and relies more on using `call` objects for the modules. In the simple cases, the fit module is a list that contains information about the module including the package and function name for the call as well as any default options. For example, for logistic regression using `glm`, the module may look like: +parsnip approaches the problem differently and relies more on using `call` objects for the modules. In the simple cases, the fit module is a list that contains information about the module including the package and function name for the call as well as any default options. For example, for logistic regression using `glm`, the module may look like: ```r list( @@ -77,9 +77,9 @@ The same is true for quosures. Making predictions is done in a manner similar to fitting models; a call is created in the same way. However, there are additional complexities. -First, the data or model fit object may require some preprocessing to make the predict function work. This does _not_ include executing a formula method on the data but may include coercing the new data into an appropriate format. It can also be used to check for specific fit object requirements. For example, an additional option is required for the `ranger` package to compute class probabilities. The `pre` element of a prediction module can be used to check that the relevant option is set correctly. +First, the data or model fit object may require some preprocessing to make the predict function work. This does _not_ include executing a formula method on the data but may include coercing the new data into an appropriate format. It can also be used to check for specific fit object requirements. For example, an additional option is required for the ranger package to compute class probabilities. 
The `pre` element of a prediction module can be used to check that the relevant option is set correctly. -Second, there is a high likelihood that the results of executing the prediction code will require post-processing to put the results into a usable format. `ranger`, for example, returns an object of specific class that contains the predicted values for the new data. The `post` element of the prediction module would extract this value and put it into a more consistent format. +Second, there is a high likelihood that the results of executing the prediction code will require post-processing to put the results into a usable format. ranger, for example, returns an object of specific class that contains the predicted values for the new data. The `post` element of the prediction module would extract this value and put it into a more consistent format. The postprocessor can also be used to coerce the results into a [_tidy format_](https://tidymodels.github.io/model-implementation-principles/model-predictions.html#return-values). diff --git a/R/aaa_models.R b/R/aaa_models.R index 321f7de3b..7af9c94b9 100644 --- a/R/aaa_models.R +++ b/R/aaa_models.R @@ -511,7 +511,7 @@ check_interface_val <- function(x) { #' and `raw`. #' @param pkg An options character string for a package name. #' @param parsnip A single character string for the "harmonized" argument name -#' that `parsnip` exposes. +#' that parsnip exposes. #' @param original A single character string for the argument name that #' underlying model function uses. #' @param value A list that conforms to the `fit_obj` or `pred_obj` description @@ -525,7 +525,7 @@ check_interface_val <- function(x) { #' @keywords internal #' @details These functions are available for users to add their #' own models or engines (in a package or otherwise) so that they can -#' be accessed using `parsnip`. This is more thoroughly documented +#' be accessed using parsnip. 
This is more thoroughly documented #' on the package web site (see references below). #' #' In short, `parsnip` stores an environment object that contains diff --git a/R/add_in.R b/R/add_in.R index d17df0e31..ec6d8e712 100644 --- a/R/add_in.R +++ b/R/add_in.R @@ -1,7 +1,7 @@ #' Start an RStudio Addin that can write model specifications #' #' `parsnip_addin()` starts a process in the RStudio IDE Viewer window -#' that allows users to write code for `parsnip` model specifications from +#' that allows users to write code for parsnip model specifications from #' various R packages. The new code is written to the current document at the #' location of the cursor. #' diff --git a/R/engines.R b/R/engines.R index 79f75362f..51a22dbff 100644 --- a/R/engines.R +++ b/R/engines.R @@ -72,7 +72,7 @@ load_libs <- function(x, quiet, attach = FALSE) { #' #' - _Main arguments_ are more commonly used and tend to be available across #' engines. These names are standardized to work with different engines in a -#' consistent way, so you can use the \pkg{parsnip} main argument `trees`, +#' consistent way, so you can use the parsnip main argument `trees`, #' instead of the heterogeneous arguments for this parameter from \pkg{ranger} #' and \pkg{randomForest} packages (`num.trees` and `ntree`, respectively). Set #' these in your model type function, like `rand_forest(trees = 2000)`. @@ -154,10 +154,10 @@ set_engine.default <- function(object, engine, ...) { #' Display currently available engines for a model #' #' The possible engines for a model can depend on what packages are loaded. -#' Some \pkg{parsnip} extension add engines to existing models. For example, +#' Some parsnip extension add engines to existing models. For example, #' the \pkg{poissonreg} package adds additional engines for the [poisson_reg()] #' model and these are not available unless \pkg{poissonreg} is loaded. -#' @param x The name of a `parsnip` model (e.g., "linear_reg", "mars", etc.) 
+#' @param x The name of a parsnip model (e.g., "linear_reg", "mars", etc.) #' @return A tibble. #' #' @examplesIf !parsnip:::is_cran_check() diff --git a/R/extract.R b/R/extract.R index d85f05d3f..08da71cd5 100644 --- a/R/extract.R +++ b/R/extract.R @@ -30,7 +30,7 @@ #' importance/explainers. #' #' However, users should not invoke the `predict()` method on an extracted -#' model. There may be preprocessing operations that `parsnip` has executed on +#' model. There may be preprocessing operations that parsnip has executed on #' the data prior to giving it to the model. Bypassing these can lead to errors #' or silently generating incorrect predictions. #' diff --git a/R/model_object_docs.R b/R/model_object_docs.R index 6b59f35d0..c4175329d 100644 --- a/R/model_object_docs.R +++ b/R/model_object_docs.R @@ -38,7 +38,7 @@ #' software will be used. It can be a package name or a technology #' type. #' -#' This class and structure is the basis for how \pkg{parsnip} +#' This class and structure is the basis for how parsnip #' stores model objects prior to seeing the data. #' #' @section Argument Details: @@ -53,7 +53,7 @@ #' arguments. For example, when calling `mean(dat_vec)`, the object #' `dat_vec` is immediately evaluated inside of the function. #' -#' `parsnip` model functions do not do this. For example, using +#' parsnip model functions do not do this. For example, using #' #'\preformatted{ #' rand_forest(mtry = ncol(mtcars) - 1) diff --git a/R/repair_call.R b/R/repair_call.R index de6d5741f..fcede6649 100644 --- a/R/repair_call.R +++ b/R/repair_call.R @@ -7,10 +7,10 @@ #' #' `repair_call()` call can adjust the model objects call to be usable by other #' functions and methods. -#' @param x A fitted `parsnip` model. An error will occur if the underlying model +#' @param x A fitted parsnip model. An error will occur if the underlying model #' does not have a `call` element. #' @param data A data object that is relevant to the call. 
In most cases, this -#' is the data frame that was given to `parsnip` for the model fit (i.e., the +#' is the data frame that was given to parsnip for the model fit (i.e., the #' training set data). The name of this data object is inserted into the call. #' @return A modified `parsnip` fitted model. #' @examplesIf !parsnip:::is_cran_check() @@ -21,7 +21,7 @@ #' fit(mpg ~ ., data = mtcars) #' #' # In this call, note that `data` is not `mtcars` and the `model = ~TRUE` -#' # indicates that the `model` argument is an `rlang` quosure. +#' # indicates that the `model` argument is an rlang quosure. #' fitted_model$fit$call #' #' # All better: diff --git a/R/tidy_glmnet.R b/R/tidy_glmnet.R index 38309076e..59761ca68 100644 --- a/R/tidy_glmnet.R +++ b/R/tidy_glmnet.R @@ -1,8 +1,8 @@ #' tidy methods for glmnet models #' #' `tidy()` methods for the various `glmnet` models that return the coefficients -#' for the specific penalty value used by the `parsnip` model fit. -#' @param x A fitted `parsnip` model that used the `glmnet` engine. +#' for the specific penalty value used by the parsnip model fit. +#' @param x A fitted parsnip model that used the `glmnet` engine. #' @param penalty A _single_ numeric value. If none is given, the value specified #' in the model specification is used. #' @param ... Not used diff --git a/R/tidy_liblinear.R b/R/tidy_liblinear.R index eea2c2872..8cfb416e4 100644 --- a/R/tidy_liblinear.R +++ b/R/tidy_liblinear.R @@ -1,8 +1,8 @@ #' tidy methods for LiblineaR models #' #' `tidy()` methods for the various `LiblineaR` models that return the -#' coefficients from the `parsnip` model fit. -#' @param x A fitted `parsnip` model that used the `LiblineaR` engine. +#' coefficients from the parsnip model fit. +#' @param x A fitted parsnip model that used the `LiblineaR` engine. #' @param ... Not used #' @return A tibble with columns `term` and `estimate`. 
#' @keywords internal diff --git a/R/translate.R b/R/translate.R index d93c9d4de..b2c59f599 100644 --- a/R/translate.R +++ b/R/translate.R @@ -20,7 +20,7 @@ #' the model fitting function/engine. #' #' This function can be useful when you need to understand how -#' `parsnip` goes from a generic model specific to a model fitting +#' parsnip goes from a generic model specific to a model fitting #' function. #' #' **Note**: this function is used internally and users should only use it diff --git a/README.md b/README.md index e845cf8dd..1621cf67f 100644 --- a/README.md +++ b/README.md @@ -153,8 +153,8 @@ rand_forest(mtry = 10, trees = 2000) %>% #> Target node size: 5 #> Variable importance mode: impurity #> Splitrule: variance -#> OOB prediction error (MSE): 5.976917 -#> R squared (OOB): 0.8354559 +#> OOB prediction error (MSE): 5.725636 +#> R squared (OOB): 0.8423737 ``` A list of all parsnip models across different CRAN packages can be found diff --git a/issue_template.md b/issue_template.md index f39c391a7..e94fa125f 100644 --- a/issue_template.md +++ b/issue_template.md @@ -3,7 +3,7 @@ name: Bug report or feature request about: Describe a bug you've seen or make a case for a new feature --- -# PLEASE READ: Making a new issue for `parsnip` +# PLEASE READ: Making a new issue for parsnip Please follow the template below. diff --git a/man/details_logistic_reg_gee.Rd b/man/details_logistic_reg_gee.Rd index df4b6db1e..52f2d8072 100644 --- a/man/details_logistic_reg_gee.Rd +++ b/man/details_logistic_reg_gee.Rd @@ -62,7 +62,7 @@ call would look like: \if{html}{\out{
}}\preformatted{gee(breaks ~ tension, id = wool, data = warpbreaks, corstr = "exchangeable") }\if{html}{\out{
}} -With \code{parsnip}, we suggest using the formula method when fitting: +With parsnip, we suggest using the formula method when fitting: \if{html}{\out{
}}\preformatted{library(tidymodels) data("toenail", package = "HSAUR3") diff --git a/man/extract-parsnip.Rd b/man/extract-parsnip.Rd index f6544fbc9..8d893114a 100644 --- a/man/extract-parsnip.Rd +++ b/man/extract-parsnip.Rd @@ -55,7 +55,7 @@ model (via \code{print()}, \code{summary()}, \code{plot()}, etc.) or for variabl importance/explainers. However, users should not invoke the \code{predict()} method on an extracted -model. There may be preprocessing operations that \code{parsnip} has executed on +model. There may be preprocessing operations that parsnip has executed on the data prior to giving it to the model. Bypassing these can lead to errors or silently generating incorrect predictions. diff --git a/man/model_spec.Rd b/man/model_spec.Rd index 2342fa7ca..3345eb89e 100644 --- a/man/model_spec.Rd +++ b/man/model_spec.Rd @@ -39,7 +39,7 @@ software will be used. It can be a package name or a technology type. } -This class and structure is the basis for how \pkg{parsnip} +This class and structure is the basis for how parsnip stores model objects prior to seeing the data. } \section{Argument Details}{ @@ -55,7 +55,7 @@ For example, most R functions immediately evaluate their arguments. For example, when calling \code{mean(dat_vec)}, the object \code{dat_vec} is immediately evaluated inside of the function. -\code{parsnip} model functions do not do this. For example, using +parsnip model functions do not do this. For example, using \preformatted{ rand_forest(mtry = ncol(mtcars) - 1) diff --git a/man/parsnip_addin.Rd b/man/parsnip_addin.Rd index 0d019ca80..e008e6f1a 100644 --- a/man/parsnip_addin.Rd +++ b/man/parsnip_addin.Rd @@ -8,7 +8,7 @@ parsnip_addin() } \description{ \code{parsnip_addin()} starts a process in the RStudio IDE Viewer window -that allows users to write code for \code{parsnip} model specifications from +that allows users to write code for parsnip model specifications from various R packages. 
The new code is written to the current document at the location of the cursor. } diff --git a/man/repair_call.Rd b/man/repair_call.Rd index cc0653fda..1997ca969 100644 --- a/man/repair_call.Rd +++ b/man/repair_call.Rd @@ -7,11 +7,11 @@ repair_call(x, data) } \arguments{ -\item{x}{A fitted \code{parsnip} model. An error will occur if the underlying model +\item{x}{A fitted parsnip model. An error will occur if the underlying model does not have a \code{call} element.} \item{data}{A data object that is relevant to the call. In most cases, this -is the data frame that was given to \code{parsnip} for the model fit (i.e., the +is the data frame that was given to parsnip for the model fit (i.e., the training set data). The name of this data object is inserted into the call.} } \value{ @@ -36,7 +36,7 @@ fitted_model <- fit(mpg ~ ., data = mtcars) # In this call, note that `data` is not `mtcars` and the `model = ~TRUE` -# indicates that the `model` argument is an `rlang` quosure. +# indicates that the `model` argument is an rlang quosure. fitted_model$fit$call # All better: diff --git a/man/rmd/logistic_reg_gee.html b/man/rmd/logistic_reg_gee.html new file mode 100644 index 000000000..d8580b099 --- /dev/null +++ b/man/rmd/logistic_reg_gee.html @@ -0,0 +1,494 @@ + + + + + + + + + + + + + +logistic_reg_gee + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

For this engine, there is a single mode: classification

+
+

Tuning Parameters

+

This model has no formal tuning parameters. It may be beneficial to +determine the appropriate correlation structure to use, but this +typically does not affect the predicted value of the model. It +does have an effect on the inferential results and parameter +covariance values.

+
+
+

Translation from parsnip to the original package

+

The multilevelmod extension package is required to +fit this model.

+
library(multilevelmod)
+
+logistic_reg() %>% 
+  set_engine("gee") %>% 
+  translate()
+
## Logistic Regression Model Specification (classification)
+## 
+## Computational engine: gee 
+## 
+## Model fit template:
+## multilevelmod::gee_fit(formula = missing_arg(), data = missing_arg(), 
+##     family = binomial)
+

multilevelmod::gee_fit() is a wrapper around +gee::gee().

+
+
+

Preprocessing requirements

+

There are no specific preprocessing needs. However, it is helpful to +keep the clustering/subject identifier column as a factor or character +(instead of converting it into dummy variables). See the examples in the +next section.

+
+
+

Other details

+

The model cannot accept case weights.

+

Both gee::gee() and geepack::geeglm() specify +the id/cluster variable using an argument id that requires +a vector. parsnip doesn’t work that way so we enable this model to be +fit using an artificial function id_var() to be used in the +formula. So, in the original package, the call would look like:

+
gee(breaks ~ tension, id = wool, data = warpbreaks, corstr = "exchangeable")
+

With parsnip, we suggest using the formula method when fitting:

+
library(tidymodels)
+data("toenail", package = "HSAUR3")
+
+logistic_reg() %>% 
+  set_engine("gee", corstr = "exchangeable") %>% 
+  fit(outcome ~ treatment * visit + id_var(patientID), data = toenail)
+

When using tidymodels infrastructure, it may be better to use a +workflow. In this case, you can add the appropriate columns using +add_variables() then supply the GEE formula when adding the +model:

+
library(tidymodels)
+
+gee_spec <- 
+  logistic_reg() %>% 
+  set_engine("gee", corstr = "exchangeable")
+
+gee_wflow <- 
+  workflow() %>% 
+  # The data are included as-is using:
+  add_variables(outcomes = outcome, predictors = c(treatment, visit, patientID)) %>% 
+  add_model(gee_spec, formula = outcome ~ treatment * visit + id_var(patientID))
+
+fit(gee_wflow, data = toenail)
+

The gee::gee() function always prints out warnings and +output even when silent = TRUE. The parsnip +"gee" engine, by contrast, silences all console output +coming from gee::gee(), even if +silent = FALSE.
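
As a rough illustration of what "silencing all console output" can mean in base R, the sketch below wraps a noisy call so that printed output and warnings are both discarded. The helpers `noisy_fit()` and `quiet_call()` are hypothetical names made up for this example; this is not necessarily how the parsnip "gee" engine is implemented.

```r
# Hypothetical stand-in for a function that prints and warns while fitting
noisy_fit <- function() {
  cat("iteration 1\n")           # printed output (goes to stdout)
  warning("a fitting warning")   # warning condition
  42
}

# Hypothetical helper: discard printed output and warnings, keep the value
quiet_call <- function(expr) {
  res <- NULL
  # capture.output() swallows stdout; suppressWarnings() muffles warnings
  utils::capture.output(res <- suppressWarnings(expr))
  res
}

result <- quiet_call(noisy_fit())
result
```

The value of the call is still returned; only the side-effect output is dropped.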

+

Also, because of issues with the gee() function, a +supplementary call to glm() is needed to get the rank and +QR decomposition objects so that predict() can be used.
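
To make the "rank and QR decomposition objects" concrete, here is a small base R sketch showing that a `glm()` fit carries both components. The data set (`mtcars`) and formula are placeholders chosen only for illustration; how the engine actually combines these pieces with the `gee()` fit is not shown in the text and is not reproduced here.

```r
# A plain logistic regression fit with base R (illustrative data only)
glm_fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# The fitted object stores the numeric rank of the model matrix ...
glm_fit$rank

# ... and the QR decomposition, which predict methods can rely on
class(glm_fit$qr)
```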

+
+
+

Case weights

+

The underlying model implementation does not allow for case +weights.

+
+
+

References

+
    +
  • Liang, K.Y. and Zeger, S.L. (1986) Longitudinal data analysis +using generalized linear models. Biometrika, 73 13–22.

  • +
  • Zeger, S.L. and Liang, K.Y. (1986) Longitudinal data analysis for +discrete and continuous outcomes. Biometrics, 42 +121–130.

  • +
+
+ + + + + + + + + + + + + + + diff --git a/man/rmd/logistic_reg_gee.md b/man/rmd/logistic_reg_gee.md index 9a63f2c58..4ca4e3371 100644 --- a/man/rmd/logistic_reg_gee.md +++ b/man/rmd/logistic_reg_gee.md @@ -47,7 +47,7 @@ Both `gee:gee()` and `gee:geepack()` specify the id/cluster variable using an ar gee(breaks ~ tension, id = wool, data = warpbreaks, corstr = "exchangeable") ``` -With `parsnip`, we suggest using the formula method when fitting: +With parsnip, we suggest using the formula method when fitting: ```r library(tidymodels) diff --git a/man/rmd/nearest-neighbor.md b/man/rmd/nearest-neighbor.md index c89b534ac..3ae0d4674 100644 --- a/man/rmd/nearest-neighbor.md +++ b/man/rmd/nearest-neighbor.md @@ -44,7 +44,7 @@ nearest_neighbor() %>% ``` For `kknn`, the underlying modeling function used is a restricted version of -`train.kknn()` and not `kknn()`. It is set up in this way so that `parsnip` can +`train.kknn()` and not `kknn()`. It is set up in this way so that parsnip can utilize the underlying `predict.train.kknn` method to predict on new data. This also means that a single value of that function's `kernel` argument (a.k.a `weight_func` here) can be supplied diff --git a/man/set_engine.Rd b/man/set_engine.Rd index 327b7ed34..a032316db 100644 --- a/man/set_engine.Rd +++ b/man/set_engine.Rd @@ -42,7 +42,7 @@ Modeling functions in parsnip separate model arguments into two categories: \itemize{ \item \emph{Main arguments} are more commonly used and tend to be available across engines. These names are standardized to work with different engines in a -consistent way, so you can use the \pkg{parsnip} main argument \code{trees}, +consistent way, so you can use the parsnip main argument \code{trees}, instead of the heterogeneous arguments for this parameter from \pkg{ranger} and \pkg{randomForest} packages (\code{num.trees} and \code{ntree}, respectively). Set these in your model type function, like \code{rand_forest(trees = 2000)}. 
diff --git a/man/set_new_model.Rd b/man/set_new_model.Rd index 0f0435a06..bb08bb812 100644 --- a/man/set_new_model.Rd +++ b/man/set_new_model.Rd @@ -54,7 +54,7 @@ get_encoding(model) \item{eng}{A single character string for the model engine.} \item{parsnip}{A single character string for the "harmonized" argument name -that \code{parsnip} exposes.} +that parsnip exposes.} \item{original}{A single character string for the argument name that underlying model function uses.} @@ -96,7 +96,7 @@ package. \details{ These functions are available for users to add their own models or engines (in a package or otherwise) so that they can -be accessed using \code{parsnip}. This is more thoroughly documented +be accessed using parsnip. This is more thoroughly documented on the package web site (see references below). In short, \code{parsnip} stores an environment object that contains diff --git a/man/show_engines.Rd b/man/show_engines.Rd index cd4ba6c37..31656ed73 100644 --- a/man/show_engines.Rd +++ b/man/show_engines.Rd @@ -7,14 +7,14 @@ show_engines(x) } \arguments{ -\item{x}{The name of a \code{parsnip} model (e.g., "linear_reg", "mars", etc.)} +\item{x}{The name of a parsnip model (e.g., "linear_reg", "mars", etc.)} } \value{ A tibble. } \description{ The possible engines for a model can depend on what packages are loaded. -Some \pkg{parsnip} extension add engines to existing models. For example, +Some parsnip extension add engines to existing models. For example, the \pkg{poissonreg} package adds additional engines for the \code{\link[=poisson_reg]{poisson_reg()}} model and these are not available unless \pkg{poissonreg} is loaded. } diff --git a/man/tidy._LiblineaR.Rd b/man/tidy._LiblineaR.Rd index 03c9b7ba3..8c391f471 100644 --- a/man/tidy._LiblineaR.Rd +++ b/man/tidy._LiblineaR.Rd @@ -7,7 +7,7 @@ \method{tidy}{`_LiblineaR`}(x, ...) 
} \arguments{ -\item{x}{A fitted \code{parsnip} model that used the \code{LiblineaR} engine.} +\item{x}{A fitted parsnip model that used the \code{LiblineaR} engine.} \item{...}{Not used} } @@ -16,6 +16,6 @@ A tibble with columns \code{term} and \code{estimate}. } \description{ \code{tidy()} methods for the various \code{LiblineaR} models that return the -coefficients from the \code{parsnip} model fit. +coefficients from the parsnip model fit. } \keyword{internal} diff --git a/man/tidy._elnet.Rd b/man/tidy._elnet.Rd index fa55da639..01a0f8c1f 100644 --- a/man/tidy._elnet.Rd +++ b/man/tidy._elnet.Rd @@ -19,7 +19,7 @@ \method{tidy}{`_coxnet`}(x, penalty = NULL, ...) } \arguments{ -\item{x}{A fitted \code{parsnip} model that used the \code{glmnet} engine.} +\item{x}{A fitted parsnip model that used the \code{glmnet} engine.} \item{penalty}{A \emph{single} numeric value. If none is given, the value specified in the model specification is used.} @@ -32,6 +32,6 @@ multinomial mode is used, an additional \code{class} column is included. } \description{ \code{tidy()} methods for the various \code{glmnet} models that return the coefficients -for the specific penalty value used by the \code{parsnip} model fit. +for the specific penalty value used by the parsnip model fit. } \keyword{internal} diff --git a/man/translate.Rd b/man/translate.Rd index 4aa50d4d8..7a8c9c61d 100644 --- a/man/translate.Rd +++ b/man/translate.Rd @@ -34,7 +34,7 @@ It does contain the resolved argument names that are specific to the model fitting function/engine. This function can be useful when you need to understand how -\code{parsnip} goes from a generic model specific to a model fitting +parsnip goes from a generic model specific to a model fitting function. 
\strong{Note}: this function is used internally and users should only use it diff --git a/vignettes/articles/Submodels.Rmd b/vignettes/articles/Submodels.Rmd index 87158a4b6..7d0592b08 100644 --- a/vignettes/articles/Submodels.Rmd +++ b/vignettes/articles/Submodels.Rmd @@ -18,7 +18,7 @@ theme_set(theme_bw()) Some R packages can create predictions from models that are different than the one that was fit. For example, if a boosted tree is fit with 10 iterations of boosting, the model can usually make predictions on _submodels_ that have less than 10 trees (all other parameters being static). This is helpful for model tuning since you can cheaply evaluate tuning parameter combinations which often results in a large speed-up in the computations. -In `parsnip`, there is a method called `multi_predict()` that can do this. It's current methods are: +In parsnip, there is a method called `multi_predict()` that can do this. It's current methods are: ```{r methods} library(parsnip) diff --git a/vignettes/parsnip.Rmd b/vignettes/parsnip.Rmd index d4e8added..089477b93 100644 --- a/vignettes/parsnip.Rmd +++ b/vignettes/parsnip.Rmd @@ -30,7 +30,7 @@ This package provides functions and methods to create and manipulate functions c Modeling functions across different R packages can have very different interfaces. If you would like to try different approaches, there is a lot of syntactical minutiae to remember. The problem worsens when you move in-between platforms (e.g. doing a logistic regression in R's `glm` versus Spark's implementation). -`parsnip` tries to solve this by providing similar interfaces to models. For example, if you are fitting a random forest model and would like to adjust the number of trees in the forest there are different argument names to remember: +parsnip tries to solve this by providing similar interfaces to models. 
For example, if you are fitting a random forest model and would like to adjust the number of trees in the forest there are different argument names to remember: * `randomForest::randomForest` uses `ntree`, * `ranger::ranger` uses `num.trees`, @@ -51,7 +51,7 @@ Some terminology: * The **mode** of the model denotes how it will be used. Two common modes are _classification_ and _regression_. Others would include "censored regression" and "risk regression" (parametric and Cox PH models for censored data, respectively), as well as unsupervised models (e.g. "clustering"). * The **computational engine** indicates how the actual model might be fit. These are often R packages (such as `randomForest` or `ranger`) but might also be methods outside of R (e.g. Stan, Spark, and others). -`parsnip`, similar to `ggplot2`, `dplyr` and `recipes`, separates the specification of what you want to do from the actual doing. This allows us to create broader functionality for modeling. +The parsnip package, similar to ggplot2, dplyr and recipes, separates the specification of what you want to do from the actual doing. This allows us to create broader functionality for modeling. ### Placeholders for Parameters @@ -79,7 +79,7 @@ The arguments to the default function are: args(rand_forest) ``` -However, there might be other arguments that you would like to change or allow to vary. These are accessible using `set_engine`. For example, `ranger` has an option to set the internal random number seed. To set this to a specific value: +However, there might be other arguments that you would like to change or allow to vary. These are accessible using `set_engine`. For example, `ranger()` from the ranger package has an option to set the internal random number seed. To set this to a specific value: ```{r rf-seed} rf_with_seed <- @@ -151,7 +151,7 @@ rf_with_seed %>% Note that the call objects show `num.trees = ~2000`. 
The tilde is the consequence of `parsnip` using [quosures](https://adv-r.hadley.nz/evaluation.html#quosures) to process the model specification's arguments. -Normally, when a function is executed, the function's arguments are immediately evaluated. In the case of `parsnip`, the model specification's arguments are _not_; the [expression is captured](https://www.tidyverse.org/blog/2019/04/parsnip-internals/) along with the environment where it should be evaluated. That is what a quosure does. +Normally, when a function is executed, the function's arguments are immediately evaluated. In the case of parsnip, the model specification's arguments are _not_; the [expression is captured](https://www.tidyverse.org/blog/2019/04/parsnip-internals/) along with the environment where it should be evaluated. That is what a quosure does. -`parsnip` uses these expressions to make a model fit call that is evaluated. The tilde in the call above reflects that the argument was captured using a quosure. +parsnip uses these expressions to make a model fit call that is evaluated. The tilde in the call above reflects that the argument was captured using a quosure.
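
The delayed evaluation described above can be sketched directly with rlang. This is a minimal illustration of what a quosure does, assuming the rlang package is installed; the variable name `captured` is arbitrary and the snippet is not taken from parsnip's internals.

```r
library(rlang)

# Capture the expression together with its environment; nothing is computed yet
captured <- quo(ncol(mtcars) - 1)

# Inspect the stored, unevaluated expression
quo_get_expr(captured)

# Evaluate only when the value is actually needed
value <- eval_tidy(captured)
value
```

Printing a quosure shows a leading tilde, which is the same marker seen in `num.trees = ~2000`.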