General-purpose Anaconda tips

Published: January 31, 2021

I use Anaconda to manage my computational software environments. It’s flexible, easy to install (even if you don’t have sudo access), and has good support for computational biology tools thanks to the bioconda channel. Here are some pragmatic tips for making conda environments easier to deal with.

Use mamba

Anaconda is, at heart, a Python installation, and its package manager, conda, is written in Python. To work out which packages and versions to install, conda uses the pycosat package, a Python binding to the PicoSAT satisfiability solver. While excellent, it can be slow with large channels like bioconda and conda-forge, which makes resolving packaging conflicts slow and frustrating.

That’s where mamba comes in. Mamba is a reimplementation of conda, written largely in C++, that uses libsolv, a C library for resolving package dependencies that also powers Linux package managers such as openSUSE’s zypper and Fedora’s DNF. This makes mamba fast. It also has other advantages, like multi-threaded downloads and better repository indexing. Check out mamba’s announcement post for more details.

Installing it is easy.

conda install -c conda-forge mamba

And you’re ready to start using it. You can replace conda with mamba for almost every command in the rest of this post.
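If you write scripts that need to run on machines with or without mamba, a small wrapper can prefer mamba and fall back to conda. This is a minimal sketch: the function name cm is hypothetical, and it assumes the two tools accept the same arguments for the subcommands you use, which holds for everyday commands like install and create.

```shell
# Hypothetical helper: run a package command with mamba when it is
# available, otherwise fall back to conda with the same arguments.
cm() {
    if command -v mamba >/dev/null 2>&1; then
        mamba "$@"
    else
        conda "$@"
    fi
}

# Example usage:
#   cm install -c bioconda samtools
#   cm create -n myenv python=3.9
```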

Undo changes with revisions

Conda records the state of an environment every time you install, update, or remove packages. This makes it possible to roll back to an earlier set of packages if a new installation goes wrong. You can list these saved states (revisions) with

conda list --revisions

and roll back to a previous set of installed packages with

conda install --revision <N>

It’s a great little feature for when you accidentally update your R or Python version, which forces conda to update almost every other package in the environment along with it. I rarely see this feature mentioned, and when it is, it’s almost always in this blog post. It’s worth a read and expands a bit on what I’ve covered here.
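Rolling back one step is the most common case, so it can be handy to wrap conda list --revisions and conda install --revision together. This sketch assumes each revision header printed by conda list --revisions ends in (rev N), for example 2021-01-31 11:02:11  (rev 1); the function names are hypothetical.

```shell
# Hypothetical helper: print the newest revision number from
# `conda list --revisions` output read on stdin.
# Assumes each revision header line ends with "(rev N)".
latest_revision() {
    sed -n 's/.*(rev \([0-9][0-9]*\))[[:space:]]*$/\1/p' | tail -n 1
}

# Hypothetical helper: roll the active environment back by one revision.
undo_last_change() {
    local rev
    rev=$(conda list --revisions | latest_revision)
    if [ -z "$rev" ] || [ "$rev" -eq 0 ]; then
        echo "Nothing to undo" >&2
        return 1
    fi
    conda install --revision "$((rev - 1))"
}
```

Rolling back itself creates a new revision, so running undo_last_change twice in a row returns you to where you started.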

Exporting and importing environments

To help others reproduce your work, it’s useful to list the packages and versions you used. An easy way to do this is to pair conda env export with conda env create.

After installing the necessary packages via the command line, save the environment with

conda env export --no-builds > environment.yaml

The --no-builds flag leaves out each package’s build string, which is often specific to one operating system, like Windows or macOS. This makes the file more portable across platforms while still pinning each package’s version number.

With the environment.yaml file, a collaborator (or just yourself on another computer) can recreate the environment with

conda env create -n <NAME> -f environment.yaml

This will build an environment from the environment.yaml file on the new computer.
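The exported file also includes a prefix: line recording where the environment lives on your machine, which isn’t useful to anyone else. Here is a minimal sketch that strips it while exporting; the function name export_env is hypothetical.

```shell
# Hypothetical helper: export the active environment for sharing.
# --no-builds drops platform-specific build strings; grep -v removes the
# machine-specific "prefix:" line that `conda env export` writes.
export_env() {
    local out="${1:-environment.yaml}"
    conda env export --no-builds | grep -v '^prefix: ' > "$out"
}

# Example usage:
#   export_env environment.yaml
#   # then, on the other computer:
#   conda env create -n <NAME> -f environment.yaml
```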

Offline use

Let’s say you’re on a system that doesn’t have external internet access, like a private cluster partition. Not a problem! As long as the packages are available locally, you can install them in offline mode with

conda install --offline <PACKAGE>

Conda will check its package cache, the pkgs directory of your base installation (conda info lists its location as the package cache), for available packages and solve dependencies within that limitation. If a package can’t be installed because it requires a dependency that isn’t available locally, conda will tell you. You can copy packages into that folder, for example from a USB key, to build up your local collection.
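Copying files in by hand is easy to get wrong, so here is a minimal sketch that copies only conda package archives (.tar.bz2 and the newer .conda format) from one directory into another. The function name and the example paths are hypothetical; check your real package cache location with conda info.

```shell
# Hypothetical helper: copy conda package archives from a source
# directory (such as a mounted USB key) into a package cache directory.
stage_packages() {
    local src="$1" dest="$2" f
    mkdir -p "$dest"
    for f in "$src"/*.tar.bz2 "$src"/*.conda; do
        # Unmatched glob patterns expand to themselves; skip them.
        [ -e "$f" ] || continue
        cp "$f" "$dest"/
    done
}

# Example usage (paths are placeholders):
#   stage_packages /media/usb/conda-pkgs "$HOME/miniconda3/pkgs"
#   conda install --offline samtools
```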

Periodically clean your conda folder

Over time, the package cache (the pkgs folder of your base installation) will fill up with dozens or hundreds of packages. Conda uses these packages when solving dependencies, so if this folder becomes very cluttered, installation commands can take a long time to finish.

Periodically running conda clean removes downloaded packages from this folder, making future installation commands faster. Don’t do this if you rely on conda install --offline, since it deletes the cached packages that offline installs need. The clean subcommand can remove different types of files:

# remove only the package tarballs
conda clean -t

# remove extracted packages that aren't used by any environment
conda clean -p

# remove the cached channel index to force a fresh download
conda clean -i

# remove all of the above, plus lock files and log files
conda clean -a

I do this whenever my install commands start taking a long time. You could also set it up as a cron job so you get the benefits automatically without thinking about it.
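A minimal sketch of such a cron job, assuming conda is installed at ~/miniconda3 (adjust the path). The script name conda-clean.sh is hypothetical. Cron runs with a minimal environment, so the script calls conda by its full path, and it passes --yes because nobody is there to answer the confirmation prompt.

```shell
#!/bin/sh
# Hypothetical conda-clean.sh. Example crontab entry (add with `crontab -e`),
# running every Sunday at 03:00:
#   0 3 * * 0 /path/to/conda-clean.sh >> "$HOME/conda-clean.log" 2>&1

clean_conda_cache() {
    # CONDA_EXE is set by conda's shell hook, which usually hasn't run
    # under cron, so the fallback path matters. It is an assumption.
    conda_bin="${CONDA_EXE:-$HOME/miniconda3/bin/conda}"
    # --all: index cache, lock files, unused packages, tarballs, logfiles
    # --yes: skip the interactive confirmation prompt
    "$conda_bin" clean --all --yes
}

# Only run when executed as conda-clean.sh, so the function can also be
# sourced from other scripts without side effects.
if [ "${0##*/}" = "conda-clean.sh" ]; then
    clean_conda_cache
fi
```

Make the script executable with chmod +x so cron can run it.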

Build your own conda packages

Sometimes you find a Python package, or something from another language, that isn’t available from the default conda channels or conda-forge. No worries, you can often build your own packages pretty easily with conda build.

Even better, if the package you want to turn into a conda package is on a major repository like PyPI or CRAN, you can use conda skeleton. This generates a conda recipe using structured metadata from the repository itself.

In many cases, you don’t need to do anything special to build the package. If the original package requires special build commands or libraries, you may have to edit the meta.yaml, build.sh, or bld.bat files that conda skeleton generates.

I do this occasionally and keep a few custom conda packages in my own git repo. I upload the built packages to the Anaconda.org hosting service with anaconda-client and add my personal channel to my conda configuration with

conda config --add channels <my_anaconda.org_username>
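The skeleton, build, and upload steps can be chained into one helper. This is a sketch, assuming conda-build and anaconda-client are installed, that you have already logged in with anaconda login, and that conda skeleton pypi writes the recipe to a directory named after the package. The function name build_and_upload is hypothetical.

```shell
# Hypothetical helper chaining the skeleton -> build -> upload workflow.
build_and_upload() {
    local pkg="$1"
    # Generate a recipe (meta.yaml) in ./$pkg from PyPI metadata
    conda skeleton pypi "$pkg" || return 1
    # Edit ./$pkg/meta.yaml at this point if the package needs extra steps
    conda build "$pkg" || return 1
    # --output prints the path of the built package without rebuilding
    anaconda upload "$(conda build "$pkg" --output)"
}

# Example usage:
#   build_and_upload somepackage
```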

Conclusions

Anaconda is an extremely useful tool to help you get things done. I hope some of these tips prove useful for people trying it out for the first time, or for those hoping to find new ways to use it more effectively.