ModelTrackeR
Description¶
This tool keeps track of every change you make to your models, along with any metrics you want to compute on them. For example, if you make many changes to your model's formula, just run
track(model)
after every iteration. Each call stores the model you tested as a row in a dataframe, sorted by time. It can also record attributes of the model's summary, the results of functions you want to run on your model, and custom metrics.
Usage¶
track(model, metrics=NULL, customFunctions=NULL, ...)
Arguments¶
model
The fitted model to track. Its formula is read with formula(model).
metrics
A list of strings naming attributes that exist on the model's summary object.
customFunctions
A list of strings naming functions you have defined. Each function takes the model as its only argument.
...
Additional named arguments to be entered as new columns in the MODELS dataframe.
Source code¶
library(plyr)   # provides rbind.fill
library(dplyr)

track <- function(model, metrics=NULL, customFunctions=NULL, ...){
    kwargs <- list(...);
    # Values recorded for every tracked model.
    environmentVars <- list(
        datetime = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
        username = Sys.info()[["user"]],
        className = class(model)[[1]]
    )
    # Start a one-row data frame holding the formula as a single string.
    form <- data.frame(formula=paste(deparse(formula(model)), collapse=''));
    for (env in names(environmentVars)){
        form[[env]] <- environmentVars[[env]];
    }
    # Attributes read from the model's summary object.
    if (!is.null(metrics)){
        for (metric in metrics){
            form[[metric]] <- summary(model)[[metric]];
        }
    }
    # User-defined functions, looked up by name and called on the model.
    if (!is.null(customFunctions)){
        for (cmetric in customFunctions){
            form[[cmetric]] <- get(cmetric)(model);
        }
    }
    # Custom metrics passed through ...
    for (arg in names(kwargs)){
        form[[arg]] <- kwargs[[arg]];
    }
    # Flag whether the requested columns differ (elementwise) from the previous call.
    DIRTY_METRICS <- FALSE;
    if (exists('METRICS_LIST')){
        DIRTY_METRICS <- FALSE %in%
            unique(
                append(
                    names(kwargs), unlist(
                        append(
                            metrics, customFunctions
                        )
                    )
                ) == METRICS_LIST
            )
        if (DIRTY_METRICS){
            MMODELS <<- form;
        }
    }
    # Remember the columns requested in this call.
    METRICS_LIST <<- append(
        names(kwargs), unlist(
            append(
                metrics, customFunctions
            )
        )
    )
    # The first call creates MODELS; later calls only bump the counter here.
    if (exists('MODELS')){
        COUNTER <<- COUNTER + 1;
    } else {
        MODELS <<- form;
        COUNTER <<- 1;
    }
    # && (not &) is the scalar logical operator intended for if conditions.
    if (exists('METRICS_LIST') && DIRTY_METRICS){
        for (env in names(environmentVars)){
            MMODELS[[env]] <<- environmentVars[[env]];
        }
        if (!is.null(metrics)){
            for (metric in metrics){
                MMODELS[[metric]] <<- summary(model)[[metric]];
            }
        }
        if (!is.null(customFunctions)){
            for (cmetric in customFunctions){
                MMODELS[[cmetric]] <<- get(cmetric)(model);
            }
        }
        for (arg in names(kwargs)){
            MMODELS[[arg]] <<- kwargs[[arg]];
        }
    }
    # Append the new row (rbind.fill pads missing columns with NA) and sort newest first.
    if (COUNTER > 1){
        if (DIRTY_METRICS){
            combinedDf <- rbind.fill(MODELS, MMODELS);
            MODELS <<- combinedDf[with(combinedDf, order(datetime, decreasing=TRUE)),];
        } else {
            combinedDf <- rbind.fill(MODELS, form);
            MODELS <<- combinedDf[with(combinedDf, order(datetime, decreasing=TRUE)),];
        }
    }
    return(form);
}
Read in data¶
df <- read.csv('middle_tn_schools.csv')
head(df)
ModelTracker default options¶
By default, ModelTracker records your model's formula, the execution timestamp, the username, and the model's class name.
model.lm <- lm(avg_score_16 ~ stu_teach_ratio + school_type, data=df)
summary(model.lm)
track(model.lm)
Details about the models you have tested are stored in MODELS.
MODELS
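Because track keeps its state in global variables (MODELS, MMODELS, COUNTER, and METRICS_LIST in the source code above), that state persists until you remove it. The following is a minimal sketch of one way to start over with an empty history; resetTracker is a hypothetical helper name and is not part of ModelTracker.

```r
# Hypothetical helper (not part of ModelTracker): removes the global
# variables that track() creates, so the next call starts a fresh MODELS.
resetTracker <- function() {
    trackerVars <- c('MODELS', 'MMODELS', 'COUNTER', 'METRICS_LIST')
    # Keep only the names that currently exist in the global environment.
    present <- trackerVars[sapply(trackerVars, exists,
                                  envir = globalenv(), inherits = FALSE)]
    rm(list = present, envir = globalenv())
    invisible(present)  # names that were removed
}

resetTracker()
```

After calling it, the next track call creates MODELS again from scratch.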
Metrics¶
track accepts an argument called metrics. Each entry has to be an attribute that exists on your model's summary object. To extract them, put the attribute names in a list of strings like
list('r.squared', 'sigma')
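To find out which names are valid, you can inspect the summary object directly. The sketch below uses R's built-in cars dataset as a stand-in for your own data (an assumption for illustration) and checks that each requested metric exists and holds a single value, which is the safe choice since each metric fills one cell of a one-row dataframe.

```r
# Stand-in model on the built-in cars dataset (illustrative only).
fit <- lm(dist ~ speed, data = cars)

# Every name listed here can be used as a metric for this model class.
available <- names(summary(fit))
print(available)

metrics <- list('r.squared', 'sigma')

# Check that each requested metric exists on the summary object ...
valid <- unlist(metrics) %in% available
# ... and that it is a single value rather than a vector or matrix.
sizes <- sapply(metrics, function(m) length(summary(fit)[[m]]))
print(data.frame(metric = unlist(metrics), valid = valid, size = sizes))
```

Other model classes expose different attributes on their summary objects, so it is worth running names(summary(model)) on your own model first.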
We'll also add another variable (state_percentile_15) to our formula to show how ModelTracker keeps track of the formulas.
model.lm <- lm(avg_score_16 ~ stu_teach_ratio + school_type + state_percentile_15, data=df)
metrics <- list('r.squared', 'sigma')
summary(model.lm)
track(model.lm, metrics=metrics)
Notice our MODELS object now contains two additional metrics, r.squared and sigma. The first model, since it wasn't tracking those metrics, shows NA for those values.
MODELS
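The NA values come from the way rows with different columns are combined. The source code above uses rbind.fill from plyr, which pads any column missing from one of its inputs with NA. Here is a small sketch of that step in isolation; the two rows and their values are made up for illustration.

```r
library(plyr)

# Two illustrative rows: the second one has an extra r.squared column.
first <- data.frame(formula = 'y ~ x',
                    datetime = '2024-01-01 10:00:00')
second <- data.frame(formula = 'y ~ x + z',
                     datetime = '2024-01-01 10:05:00',
                     r.squared = 0.42)

# rbind.fill keeps the union of the columns and fills the gaps with NA.
combined <- rbind.fill(first, second)

# Same ordering as track: newest timestamp first.
combined <- combined[order(combined$datetime, decreasing = TRUE), ]
print(combined)
```

The timestamps sort correctly as plain strings because the "%Y-%m-%d %H:%M:%S" format orders the same way alphabetically and chronologically.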
Custom functions¶
track also accepts an argument called customFunctions. Use it for custom functions that accept the model as input and return a single value. Be sure that each function accepts only one argument, your model.
For example, I define two functions, getr2 and getFstat.
getr2 <- function(model){
    return(summary(model)$r.squared);
}
getFstat <- function(model){
    return(summary(model)$fstatistic[['value']])
}
Now I will put the names of my custom functions in a list and pass it to track. Note that you must specify them as strings.
model.lm <- lm(avg_score_16 ~ reduced_lunch + poly(size, 2), data=df)
summary(model.lm)
customFunctions <- list('getr2', 'getFstat')
track(model.lm, customFunctions=customFunctions)
Notice that the MODELS object sorts your models by time in descending order, so the most recent model comes first.
MODELS
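The function names have to be strings because, in the source code above, each name does double duty: get looks up the function by that name, and the same string becomes the column name. A minimal sketch of that lookup on its own follows, using the built-in cars dataset and a hypothetical function getAdjR2 that is not part of the examples above.

```r
# Stand-in model on the built-in cars dataset (illustrative only).
fit <- lm(dist ~ speed, data = cars)

# Hypothetical custom function: takes the model, returns a single value.
getAdjR2 <- function(model) {
    summary(model)$adj.r.squared
}

# One-row data frame, built the same way track builds its row.
record <- data.frame(formula = paste(deparse(formula(fit)), collapse = ''))

for (name in list('getAdjR2')) {
    # get() resolves the string to the function; the string names the column.
    record[[name]] <- get(name)(fit)
}
print(record)
```

If a name does not match a defined function, get stops with an error, so define your functions before calling track.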
Custom metrics¶
Finally, my favorite. This feature lets you track any metric by calculating it before running track and then passing the value into the track call.
For instance, here I calculate pseudoR2 and retrieve the coefficient for reduced_lunch. Then I add them to track using whatever names I want; here I use pseudoR2 and reducedLunchCoef.
model.glm <- glm(avg_score_16 ~ I(reduced_lunch^2) + sqrt(size) + factor(school_type),
data=df)
summary(model.glm)
pseudoR2 <- 1-(model.glm$deviance/model.glm$null.deviance)
reducedLunchCoef <- summary(model.glm)$coefficients[2,1]
track(model.glm, pseudoR2=pseudoR2, reducedLunchCoef=reducedLunchCoef)
Whatever name you pass to track will show up as the column name.
MODELS
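The same pattern works for any value you can compute from a model. As a further sketch, the code below computes a root mean squared error and an AIC on the built-in cars dataset (a stand-in for your own data) and passes both to track under the illustrative names rmse and aic. The call is guarded so the snippet also runs if track has not been defined yet.

```r
# Stand-in model on the built-in cars dataset (illustrative only).
fit <- lm(dist ~ speed, data = cars)

# Compute the metrics first; each must be a single value.
rmse <- sqrt(mean(residuals(fit)^2))
aic <- AIC(fit)

# The argument names (rmse, aic) become the column names in MODELS.
if (exists('track', mode = 'function')) {
    track(fit, rmse = rmse, aic = aic)
}
```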