The purpose of the ‘mustashe’ R package is to save objects that result from some computation, then load the object from file the next time the computation is performed. In other words, the first time a chunk of code is evaluated, the output can be stashed for the next time the code chunk is run.
Below is a brief example outlining the use of the primary function from the package, stash(). First we must load the ‘mustashe’ library.
Say we are performing a long-running computation (simulated here using Sys.sleep() to pause for a few seconds) that produces the object named x. The name of the object to stash, "x", and the code itself are passed to stash() as follows. (I used the package ‘tictoc’ to time the execution of the code.)
tic("long-running computation")
stash("x", {
  Sys.sleep(5)
  x <- 5
})
#> Stashing object.
toc()
#> long-running computation: 5.459 sec elapsed
‘mustashe’ tells us that the object was stashed, and we can see that x was successfully assigned the value 5.
x
#> [1] 5
Say we are done for the day, so we close RStudio and go home. When we return the next day and continue on the same analysis, we really don’t want to have to run the same computation again since it will have the same result as yesterday. Thanks to ‘mustashe’, the code is not evaluated and, instead, the object x is loaded from file.
tic("long-running computation")
stash("x", {
  Sys.sleep(5)
  x <- 5
})
#> Loading stashed object.
toc()
#> long-running computation: 0.018 sec elapsed
That’s the basic use case of ‘mustashe’! Any issues and feedback can be submitted here. Continue reading below for explanations of other useful features of ‘mustashe’.
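The general idea of "compute once, then load from file" can be sketched with base R alone. This is only a toy illustration, not how ‘mustashe’ is implemented: the helper toy_cache(), the counter n_runs and the function slow_five() are hypothetical names made up for this sketch, and the sketch ignores code and dependency changes entirely.

```r
# Hypothetical helper: return the saved value if the file exists,
# otherwise run the computation and save its result for next time.
toy_cache <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- compute()
  saveRDS(value, path)
  value
}

cache_file <- tempfile(fileext = ".rds")

# Count how often the "expensive" computation is actually run.
n_runs <- 0
slow_five <- function() {
  n_runs <<- n_runs + 1
  5
}

first <- toy_cache(cache_file, slow_five)   # runs the computation
second <- toy_cache(cache_file, slow_five)  # reads the saved file instead
```

After both calls, the computation has run only once and both results are the same value.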
Why not use the ‘ProjectTemplate’ cache() function?
Originally I tried to use the cache() function from ‘ProjectTemplate’, but ran into a few problems.
The first was that, to use it without modification, I would need to be using the ‘ProjectTemplate’ system for my whole analysis project. The function first checks that all of the expected directories and components are in place and throws an error when they are not.
ProjectTemplate::cache("x")
#> Current Directory: mustashe is not a valid ProjectTemplate directory because one or more mandatory directories are missing. If you believe you are in a ProjectTemplate directory and seeing this message in error, try running migrate.project(). migrate.project() will ensure the ProjectTemplate structure is consistent with your version of ProjectTemplate.
#> Change to a valid ProjectTemplate directory and run cache() again.
#> Error in .quietstop():
I then tried copying the source code for the cache() function to my project and tweaking it to work (mainly removing internal checks for the ‘ProjectTemplate’ system). I thought it was working: on the first pass it would cache the result, and on the second it would load from the cache. However, in a new session of R, it would not load from the cache but would instead evaluate the code and cache the results again. After a bit of exploring the cache() source code, I realized the problem was that ‘ProjectTemplate’ compares the current value of the object to be cached with the object that is cached. Of course, this requires the object to be in the environment already, which it is in a ‘ProjectTemplate’ system after running load.project() because that loads the cache (lazily) into the R environment. I do not want this behavior, and thus the caching system used by ‘ProjectTemplate’ was insufficient for my needs.
That said, I heavily relied upon the code for cache() when creating stash(). This would have been far more difficult to do without reference to ‘ProjectTemplate’.
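There are two major features of the stash() function from ‘mustashe’ not covered in the basic example above:
1. ‘mustashe’ “remembers” the code passed to stash() and will re-evaluate the code if it has changed.
2. ‘mustashe’ can track the dependencies of a stashed object and will re-evaluate the code if one of them has changed.
These two features are demonstrated below.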
If the code that creates an object changes, then the object itself is likely to have changed. Thus, ‘mustashe’ “remembers” the code and re-evaluates the code if it has been changed. Here is an example, again using ‘tictoc’ to indicate when the code is evaluated.
However, ‘mustashe’ is insensitive to changes in comments and other style-based adjustments to the code. In the next example, a comment has been added, but we see that the object is loaded from the stash.
And below is the code from a horrible person, but ‘mustashe’ still loads the object from the stash.
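One way to see why comments and styling need not matter is to compare code after it has been parsed, since R's parser discards comments and whitespace. The sketch below uses only base R and is an assumption about how such a comparison could work, not a description of the internals of ‘mustashe’; normalize_code() is a hypothetical helper.

```r
# Hypothetical helper: reduce code text to a canonical form by parsing it
# (which drops comments and whitespace) and deparsing each expression.
normalize_code <- function(code_text) {
  exprs <- parse(text = code_text, keep.source = FALSE)
  unlist(lapply(exprs, deparse))
}

original  <- "{\n  a <- 1 + 2\n}"
commented <- "{\n  # add the numbers\n  a <- 1 + 2\n}"
restyled  <- "{a<-1+2}"
modified  <- "{\n  a <- 1 + 3\n}"

# A comment or a change in spacing leaves the canonical form untouched.
same_with_comment <- identical(normalize_code(original), normalize_code(commented))
same_with_restyle <- identical(normalize_code(original), normalize_code(restyled))

# A real change to the computation is detected.
differs_when_modified <- !identical(normalize_code(original), normalize_code(modified))
```

Under this scheme only the third comparison would trigger a re-evaluation.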
Dependencies can be explicitly linked to an object to make sure that if they change, the stashed object is re-evaluated. “Dependency” in this case could refer to data frames that are used to create another (e.g. summarizing a data frame’s columns), inputs to a function, etc.
The following demonstrates this with a simple example where x is used to calculate y. By passing "x" to the depends_on argument, when the value of x is changed, the code to create y is re-evaluated.
x <- 1
stash("y", depends_on = "x", {
  y <- x + 1
})
#> Stashing object.
# Value of `y`
y
#> [1] 2
The second time this is run without changing x, the value for y is loaded from the stash.
stash("y", depends_on = "x", {
  y <- x + 1
})
#> Loading stashed object.
However, if we change the value of x, then the code is re-evaluated and the stash for y is updated.
x <- 100
stash("y", depends_on = "x", {
  y <- x + 1
})
#> Updating stash.
# Value of `y`
y
#> [1] 101
Multiple dependencies can be passed as a vector to depends_on.
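To make the idea of tracking several dependencies concrete, here is a small base R sketch that records the values of a vector of object names and checks whether any of them has changed. It illustrates the concept only; snapshot_deps() is a hypothetical helper and is not part of ‘mustashe’.

```r
# Hypothetical helper: capture the current values of the named objects.
snapshot_deps <- function(dep_names, env = parent.frame()) {
  mget(dep_names, envir = env)
}

x <- 1
z <- "a"
before <- snapshot_deps(c("x", "z"))

# Nothing has changed, so a fresh snapshot matches the stored one.
unchanged <- identical(before, snapshot_deps(c("x", "z")))

# Changing any one dependency makes the snapshots differ.
x <- 100
changed <- !identical(before, snapshot_deps(c("x", "z")))
```

A change in either x or z is enough to make the comparison fail, which corresponds to the case where the stashed code would be re-evaluated.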
To round out the explanation of the ‘mustashe’ package, specific stashes can be removed using unstash() and the entire stash can be cleared using clear_stash().
unstash("a")
#> Unstashing 'a'.
clear_stash()
#> Clearing stash.
In the examples above, stash() does not return a value (actually, it invisibly returns NULL), instead assigning the result of the computation to an object named using the var argument. Frequently, though, a return value is desired. This behavior can be induced by setting the argument functional = TRUE.
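The difference between the two modes can be illustrated with a toy function written in base R. This is a sketch under my own assumptions rather than the actual stash() code, and it does no stashing at all: toy_stash() is a hypothetical stand-in that either assigns the result by name in the calling environment or returns it.

```r
# Hypothetical stand-in for the two calling styles (no caching involved).
toy_stash <- function(var, code, functional = FALSE) {
  caller <- parent.frame()
  # Evaluate the unevaluated code block in a child of the caller's environment.
  value <- eval(substitute(code), envir = new.env(parent = caller))
  if (functional) {
    return(value)
  }
  # Default style: assign by name and return NULL invisibly.
  assign(var, value, envir = caller)
  invisible(NULL)
}

# Default style: the object `a` is created as a side effect.
res <- toy_stash("a", {
  1 + 1
})

# Functional style: the value is returned and assigned by the caller.
b <- toy_stash("b", {
  2 + 2
}, functional = TRUE)
```

In the first call res is NULL and a holds the result, while in the second call the result is simply the return value.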
The stash() function can take other functions as dependencies. The body and formals components of the function object are checked to see if they have changed. (More information on the structure of function objects in R can be found in Hadley Wickham’s Advanced R - Functions: Function components.)
As an example, suppose you have a script with the following code. It is run, and the value of 5 is stashed for a and it is dependent on the function add_x().
add_x <- function(y, x = 2) {
  y + x
}
stash("a", depends_on = "add_x", {
  a <- add_x(3)
})
#> Stashing object.
a
#> [1] 5
You continue working and change the function add_x() to use the default value of 5 instead of 2. This change will cause the code for a to be re-run and a will be assigned the value 8. Note that the code in the code argument for stash() did not change; the code was re-run because a dependency changed.
add_x <- function(y, x = 5) {
  y + x
}
stash("a", depends_on = "add_x", {
  a <- add_x(3)
})
#> Updating stash.
a
#> [1] 8
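The formals and body of a function can be inspected directly with the base R functions formals() and body(), which makes it easy to see why the change above is detectable. The comparison below is a sketch of the idea only, not the check that ‘mustashe’ performs; fn_components() and the suffixed function names are hypothetical.

```r
# Hypothetical helper: collect the parts of a function that define its behavior.
# The body is deparsed to text so that source references do not affect the result.
fn_components <- function(f) {
  list(formals = formals(f), body = deparse(body(f)))
}

add_x_v1 <- function(y, x = 2) {
  y + x
}
add_x_v2 <- function(y, x = 5) {
  y + x
}
add_x_v1_again <- function(y, x = 2) {
  y + x
}

# Changing a default argument changes the formals, so the functions differ.
default_changed <- !identical(fn_components(add_x_v1), fn_components(add_x_v2))

# Re-defining the same function gives identical components.
same_definition <- identical(fn_components(add_x_v1), fn_components(add_x_v1_again))
```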
stash() in functions
Because of the careful management of R environments, stash() can be used inside of functions. In the example below, note that the stashed object will depend on the value of the magic_number object in the function.
magic_number <- 10
do_data_science <- function() {
  magic_number <- 5
  stash("rand_num", depends_on = c("magic_number"), {
    runif(1, 0, 10)
  })
  return(rand_num)
}
do_data_science()
#> Stashing object.
#> [1] 9.397425
Changing the value of the magic_number object in the global environment will not invalidate the stash.
magic_number <- 11
do_data_science()
#> Loading stashed object.
#> [1] 9.397425
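The reason the global value is ignored comes down to where a name is looked up. The following base R sketch shows a lookup that starts in the calling function's environment, so the local magic_number shadows the global one. It is an illustration of R's scoping, not the code used by ‘mustashe’; lookup_from_caller() and do_lookup() are hypothetical names.

```r
magic_number <- 10

# Hypothetical helper: look a name up starting from the caller's environment.
lookup_from_caller <- function(name, env = parent.frame()) {
  get(name, envir = env)
}

do_lookup <- function() {
  magic_number <- 5
  lookup_from_caller("magic_number")
}

# The local value inside do_lookup() is found first.
local_value <- do_lookup()

# Changing the global object does not affect what the function sees.
magic_number <- 11
still_local <- do_lookup()
```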
It is also possible to stash the results of sourcing an R script. If the script changes, it will be re-sourced the next time around. Also, the natural behavior of the source() function is maintained by returning the last evaluated value.
# Write a temporary R script.
temp_script <- tempfile()
write("print('Script to get 5 letters'); sample(letters, 5)", temp_script)
x <- stash_script(temp_script)
#> Stashing object.
#> [1] "Script to get 5 letters"
x
#> [1] "s" "m" "u" "k" "p"
x2 <- stash_script(temp_script)
#> Loading stashed object.
x2
#> [1] "s" "m" "u" "k" "p"
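Two base R building blocks are relevant to this behavior: source() invisibly returns a list whose value element is the last evaluated value, and a file's contents can be fingerprinted with tools::md5sum(). The sketch below combines them to show how a change to a script could be noticed. Whether ‘mustashe’ uses a file hash in this way is an assumption on my part.

```r
# Write a small temporary script whose last expression evaluates to 5.
script <- tempfile(fileext = ".R")
writeLines("x <- 2\nx + 3", script)

# Fingerprint the file contents.
hash_before <- unname(tools::md5sum(script))

# source() returns a list; `value` holds the last evaluated value.
result <- source(script, local = new.env())$value

# Editing the script changes its fingerprint.
writeLines("x <- 2\nx + 4", script)
hash_after <- unname(tools::md5sum(script))

script_changed <- hash_before != hash_after
```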
The ‘here’ package is useful for handling file paths in R projects, particularly when using an RStudio project. The main function, here::here(), can be used to create the file path for stashing an object by setting the ‘mustashe’ configuration option with the config_mustashe() function.
config_mustashe(use_here = TRUE)
This behavior can be turned off, too.
config_mustashe(use_here = FALSE)
Defaults for the verbose and functional (see above) arguments of stashing functions can also be configured. For example, you can have the functions run silently and return the result by default.
config_mustashe(verbose = FALSE, functional = TRUE)
Any issues and feedback on ‘mustashe’ can be submitted here. Alternatively, I can be reached through the contact form on my website or on Twitter @JoshDoesa