If you are familiar with the concept of package managers, you can skip to the next section.
To ensure that a project is reproducible you need to deal with at least four things:
- the version of R;
- the versions of the R packages;
- system dependencies (for example, a working Java installation is needed for the {rJava} R package on Linux);
- the hardware architecture the software has to run on.
For the first three bullet points, the consensus seems to be a
mixture of Docker to deal with system dependencies, {renv} for the
packages (or {groundhog}, or a fixed CRAN snapshot like those Posit
provides) and the R installation manager to install the correct version
of R (unless you use a Docker image as base that already ships the
required version by default).
As for the last bullet point, the only way out is to be able to compile the software for the target architecture. There are a lot of moving pieces, and a lot of knowledge is needed to get it right.
But it turns out that this is not the only solution. Docker +
{renv}
(or some other way to deal with R packages) is
likely the most popular way to ensure reproducibility of your projects,
but there are other tools to achieve this. One such tool is called
Nix.
Nix is a package manager for Linux distributions, macOS and it even works on Windows if you enable WSL2. What’s a package manager? If you’re not a Linux user, you may not be aware. Let me explain it this way: in R, if you want to install a package to provide some functionality not included with a vanilla installation of R, you’d run this:
install.packages("dplyr")
It turns out that Linux distributions, like Ubuntu for example, work in a similar way, but for software that you’d usually install using an installer (at least on Windows). For example you could install Firefox on Ubuntu using:
sudo apt-get install firefox
(there are also graphical interfaces that make this process “more
user-friendly”). In Linux jargon, packages are simply what we call
software (or I guess it’s all “apps” these days). These packages get
downloaded from so-called repositories (think of CRAN, the repository of
R packages) but for any type of software that you might need to make
your computer work: web browsers, office suites, multimedia software and
so on.
So Nix is just another package manager that you can use to install software.
But what interests us is not using Nix to install Firefox, but instead to install R and the R packages that we require for our analysis (or any other programming language that we need). But why use Nix instead of the usual ways to install software on our operating systems?
The first thing that you should know is that Nix’s repository,
nixpkgs, is huge. Humongously huge. As I’m writing these lines, there
are more than 120’000 pieces of software available, and the entirety of
CRAN and Bioconductor is also available through nixpkgs. So instead of
installing R as you usually do and then using install.packages() to
install packages, you could use Nix to handle everything. But still, why
use Nix at all?
Nix has an interesting feature: using Nix, it is possible to install software in (relatively) isolated environments. So using Nix, you can install as many versions of R and R packages as you need. Suppose that you start working on a new project. As you start the project, with Nix, you would install a project-specific version of R and R packages that you would only use for that particular project. If you switch projects, you’d switch versions of R and R packages.
If you are familiar with {renv}, you should see that this is very
similar: the difference is that not only will you have a
project-specific library of R packages, you will also have a
project-specific R version. So if you start a project now, you’d have R
version 4.2.3 installed (the latest version available in nixpkgs, though
not the latest release of R, more on this later), with the accompanying
versions of R packages, for as long as the project lives (which can be a
long time). If you start a project next year, then that project will
have its own R, maybe R version 4.4.2 or something like that, and the
set of required R packages that would be current at that time. This is
because Nix always installs the software that you need in separate,
isolated environments on your computer. So you can define an environment
for one specific project.
But Nix goes even further: not only can you install R and R packages
using Nix (in isolated, project-specific environments), Nix even
installs the required system dependencies. So for example if I need
{rJava}, Nix will make sure to install the correct version of Java as
well, always in that project-specific environment (so if you already
have some Java version installed on your system, there won’t be any
interference).
What’s also pretty awesome is that you can use a specific version of
nixpkgs to always get exactly the same versions of all the software
whenever you build that environment to run your project’s code. The
environment gets defined in a simple plain-text file, and anyone using
that file to build the environment will get exactly, byte by byte, the
same environment as you had when you initially started the project. And
this holds regardless of the operating system that is used.
Nix is a package manager that can be installed on your computer
(regardless of OS) and can be used to install software like with any
other package manager. If you’re familiar with the Ubuntu Linux
distribution, you likely have used apt-get to install software. On
macOS, you may have used homebrew for similar purposes. Nix functions in
a similar way, but has many advantages over classic package managers, as
it focuses on reproducible builds and downloads packages from nixpkgs,
currently the largest software repository.
This means that using Nix, it is possible to install not only R, but
also all the packages required for your project. The obvious question is
why use Nix instead of simply installing R and R packages as usual. The
answer is that Nix makes sure to install every dependency of any
package, up to required system libraries. For example, the {xlsx}
package requires the Java programming language to be installed on your
computer to successfully install. This can be difficult to achieve, and
{xlsx} has bullied many R developers over the years (especially those
using a Linux distribution; sudo R CMD javareconf still plagues my
nightmares).
But with Nix, it suffices to declare that we want the {xlsx} package
for our project, and Nix figures out automatically that Java is required
and installs and configures it. It all just happens without any required
intervention from the user. The second advantage of Nix is that it is
possible to pin a certain revision of the Nix packages’ repository
(called nixpkgs) for our project. Pinning a revision ensures that every
package that Nix installs will always be at exactly the same version,
regardless of when in the future the packages get installed.
The idea of {rix} is for you to declare the environment you need using
the provided rix() function. rix() is the package’s main function and
generates a file called default.nix which is then used by the Nix
package manager to build that environment. Ideally, you would set up
such an environment for each of your projects. You can then use this
environment to either work interactively, or run R scripts. It is
possible to have as many environments as projects, and software that is
common to several environments will simply be re-used and not get
re-installed, which saves space. Environments are isolated from each
other, but can still interact with your system’s files, unlike with
Docker where a volume must be mounted. Environments can also interact
with the software installed on your computer through the usual means,
which can sometimes lead to issues. For example, if you already have R
installed, and a user library of R packages, more caution is required to
properly use environments managed by Nix.
It is important at this stage to understand that you should not call
install.packages() from a running Nix environment. If you want to add
packages to a Nix environment while analyzing data, you need to add them
to the default.nix expression and rebuild the environment. This is
explained in greater detail in
vignette("d1-installing-r-packages-in-a-nix-environment").
To avoid interference between your main R library of packages and
your Nix environments, calling rix() will also run rix_init(), which
will create a custom .Rprofile in the project’s directory. This
.Rprofile will ensure that if you have a user library of packages, these
won’t get loaded by an R version running in a Nix shell. It will also
redefine install.packages() to throw an error if you try to use it.
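To give an idea of what such a redefinition looks like, here is a minimal sketch in plain R. It is not the code that rix_init() actually writes: the error message, and the use of tryCatch() to show the failure without stopping the session, are assumptions made for illustration.

```r
# Hypothetical override, similar in spirit to what a project-level
# .Rprofile could define. It masks utils::install.packages() for the session.
install.packages <- function(...) {
  stop(
    "This R session runs in a Nix environment: add the package to ",
    "default.nix and rebuild instead of calling install.packages().",
    call. = FALSE
  )
}

# Any attempt to install a package now fails right away, before anything
# is downloaded or written to a library. tryCatch() captures the message.
result <- tryCatch(
  install.packages("dplyr"),
  error = function(e) conditionMessage(e)
)
print(result)
```

Because the override lives in the project's .Rprofile, it only affects R sessions started from that project's directory.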
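rix() has several arguments, among them:
- the R version, or a date, to base the environment on;
- the R packages to install;
- local packages in the .tar.gz format to install;
- the IDE you intend to use;
- the path in which to write the default.nix file (by default the current working directory).
For example: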
The call above writes a default.nix file in the current working
directory. This default.nix can in turn be used by Nix to build an
environment containing the latest version of R included in the upstream
nixpkgs, with the {dplyr} and {chronicler} packages.
Take note of the ide = "other" argument: this argument, and the values
it can take, are further discussed in vignette("e-interactive-use"), but
first finish reading this vignette and then the vignettes whose names
start with a “d”.
You can instead provide a specific R version:
or a date:
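As a sketch, assuming the r_ver, date, r_pkgs, ide, project_path and overwrite arguments of rix(), and assuming that 2024-12-14 is one of the dates for which a snapshot is available (the folder names are hypothetical and only serve to keep the two generated files apart):

```r
library(rix)

# Hypothetical project folders under the session's temporary directory.
path_by_version <- file.path(tempdir(), "project_r_version")
path_by_date <- file.path(tempdir(), "project_date")
dir.create(path_by_version, showWarnings = FALSE)
dir.create(path_by_date, showWarnings = FALSE)

# Pin the environment to a specific release of R.
rix(
  r_ver = "4.4.2",
  r_pkgs = c("dplyr", "chronicler"),
  ide = "other",
  project_path = path_by_version,
  overwrite = TRUE
)

# Or pin it to a date: R and the packages are taken as of that day.
rix(
  date = "2024-12-14",
  r_pkgs = c("dplyr", "chronicler"),
  ide = "other",
  project_path = path_by_date,
  overwrite = TRUE
)
```

The two calls are alternatives: a given project is defined either by an R version or by a date.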
Providing a specific R version or a date will not use the
upstream/official nixpkgs package repository, but our rstats-on-nix
fork. We decided to use a fork instead of the official nixpkgs for older
releases, because it allows us to backport fixes and improve package
compatibility, which is especially important for Apple Silicon. It also
allows us to provide newer releases of R more quickly than through the
official channels. For example, as of writing (December 18th 2024), R
version 4.4.2 is not yet included in upstream nixpkgs but is already
available through our fork. Thanks to our fork, we are also able to
snapshot CRAN more frequently, and thus include more CRAN and
Bioconductor package versions than in the upstream nixpkgs repository.
{rix} also includes an renv2nix() function that converts an renv.lock
file into a valid Nix expression. Read vignette("f-renv2nix") to learn
more.
The Nix package manager can be used to build reproducible development
environments according to the specifications found in the generated
default.nix files, which contain a Nix expression. An expression is Nix
jargon for a function with multiple inputs and one output, this output
being our development environment. {rix} does not require Nix to be
installed to generate valid expressions (but does require an internet
connection), so you could generate expressions and use them on other
machines. To actually build an environment using a default.nix file, go
to where you chose to write it (ideally in a new, empty folder that will
be the root folder of your project) and use the Nix package manager to
build the environment. Run the following command in a terminal:
nix-build
Nix installs packages in a dedicated folder on your computer, called the Nix store.
Once Nix is done building the environment, you can start working on it interactively by using the following command in a terminal emulator (not the R console):
nix-shell
You will drop into a Nix shell which provides the installed software.
It is not mandatory to call nix-build first: you can immediately call
nix-shell. The advantage of using nix-build first is that it creates a
file called result which will prevent the environment from being garbage
collected if you clean the Nix store.
If you want to build an environment for an older version of R, you might get a warning telling you that you cannot build the expression, but that you can directly drop into it.
If you want to completely isolate your Nix environment from the rest
of the system, we recommend using nix-shell --pure to drop into the
environment, as described in the documentation of rix_init().
Finally, if you want to delete an environment, delete the result file
first (if you used nix-build) and then call nix-store --gc, which will
delete all the orphaned packages.
Now that you know more about Nix and {rix}, it is time to get these
tools installed on your system:
- on Linux or Windows, read vignette("b1-setting-up-and-using-rix-on-linux-and-windows");
- on macOS, read vignette("b2-setting-up-and-using-rix-on-macos").