The woes of using bioinformatics software: a case study in trying to install ChAMP

Published: July 18, 2018

Making high-quality bioinformatics software is hard. Having tried to do so myself, I’ve come face to face with many of the problems that anyone who attempts it runs into. And while many, many people are dedicated to making research software easier to use and more manageable, there is still an overwhelming amount of poor-quality software out there, for a host of reasons.

What I’m not going to do in this post is lament the general state of bioinformatics software or discuss the causes behind it. People who’ve been around longer than me have a better perspective on this and on how to tackle these problems, and a number of blog posts and papers on the topic are worth reading if this interests you [1].

What I will do is give a detailed example of the consequences of trying to use software that’s not well designed. My hope is to document, in writing, how much time users of such tools waste just trying to get them to function in the first place.

I also don’t want this to be read as a rant against the developers of this tool. I’ve seen their work in other contexts and I respect what they’ve done. But I want to highlight that all their effort on this project may be in vain because the tool is so hard to use.

The case example is ChAMP, a tool for finding differentially methylated regions in DNA methylation data. My own work involves DNA methylation, and ChAMP is one of a handful of tools available for this use case.

Below are the notes I wrote while attempting to install ChAMP. I had tried once before, ran into minor issues, and gave up. This time I wanted to give it a legitimate try and keep a map of everything I tried in case I ever need to do this again.

Spoiler: I don’t end up installing the R package. But hopefully the detailed notes are useful for anyone else going down this path.

Environment

Component   Details
OS          macOS High Sierra
RAM         16 GB 1600 MHz DDR3
CPU         3.2 GHz Intel Core i5
Model       27” iMac, late 2013
conda       v4.5.5 with Python 3.6

Process

Fresh environment

I’m starting by creating a fresh environment using Anaconda.

> conda create -n champ-test r-base
> conda activate champ-test

Installing ChAMP from Bioconductor

Then, in a new R session, I install ChAMP following its Bioconductor page:

> source("https://bioconductor.org/biocLite.R")
> biocLite("ChAMP")

The installation produces a number of warnings, which seem to be related to the availability of certain dependencies. The official ChAMP documentation recommends re-running the installation a few times, since each run installs some more of the missing dependencies.

This is the mark of really well-designed software. I’m very happy to see they’ve noted this /s
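Since the documented workaround amounts to "run it again until it stops failing", that loop can at least be scripted. Below is a minimal bash sketch; `retry_install` is a hypothetical helper (not part of ChAMP or Bioconductor), and it assumes that `Rscript -e 'library(ChAMP)'` exiting successfully is a good enough signal that the install worked.

```shell
#!/usr/bin/env bash
# retry_install MAX CHECK INSTALL  (hypothetical helper, not from ChAMP)
# Runs the INSTALL command string until the CHECK command string succeeds,
# giving up after MAX install attempts. Returns the final CHECK status.
retry_install() {
  local max=$1 check=$2 install=$3
  local attempt
  for ((attempt = 1; attempt <= max; attempt++)); do
    if eval "$check"; then
      echo "check passed after $((attempt - 1)) install attempt(s)"
      return 0
    fi
    echo "install attempt $attempt of $max" >&2
    eval "$install" || true  # a failed run may still install some dependencies
  done
  eval "$check"
}

# Example for ChAMP (assumes Rscript is on PATH; ask = FALSE stops biocLite
# from prompting about package updates in a non-interactive session):
#   retry_install 3 \
#     "Rscript -e 'library(ChAMP)' >/dev/null 2>&1" \
#     "Rscript -e 'source(\"https://bioconductor.org/biocLite.R\"); biocLite(\"ChAMP\", ask = FALSE)'"
```

Using command strings with `eval` keeps the sketch short; the trade-off is that the strings must be quoted carefully, as in the commented example.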

This installation took over 10 minutes from scratch and failed on the first run. When asked, I chose to update certain Bioconductor packages, even though I had just installed them all for the first time.

The main package giving me errors is stringr, so I’ll install it from conda and try again.

> conda install r::r-stringr -y

Then from R:

> biocLite("ChAMP")

This attempt also failed. Again, I chose to update all packages when Bioconductor asked (the stringi and stringr updates failed).
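Picking the failed packages out of that much console output is tedious. A small bash sketch, assuming the install output was saved to a log file and that R reports each failure with its usual warning, installation of package ‘NAME’ had non-zero exit status (curly quotes in UTF-8 locales, straight quotes otherwise); `failed_packages` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# failed_packages LOGFILE  (hypothetical helper)
# Lists, once each, the packages that R reported as failing to install.
failed_packages() {
  grep -o 'installation of package [^ ]* had non-zero exit status' "$1" |
    awk '{ print $4 }' |      # the quoted package name, e.g. ‘XML’
    tr -d "‘’'\"" |           # strip curly or straight quotes
    LC_ALL=C sort -u
}

# Usage (assumes Rscript is on PATH):
#   Rscript -e 'source("https://bioconductor.org/biocLite.R"); biocLite("ChAMP", ask = FALSE)' \
#     2>&1 | tee champ-install.log
#   failed_packages champ-install.log
```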

rhdf5 was the first package that failed to install. A number of other packages also failed to install because of C compiler errors; I’m not sure whether this is a machine-specific problem. These packages are:

  • httpuv
  • XML
  • affyio
  • RCurl
  • igraph
  • preprocessCore

I’m installing them all via conda and trying again.

> conda install \
    bioconda::bioconductor-rhdf5 \
    r::r-httpuv \
    r::r-xml2 \
    bioconda::bioconductor-affyio \
    r::r-rcurl \
    r::r-igraph \
    bioconda::bioconductor-preprocessCore -y
> biocLite("ChAMP")

Note: this actually downgraded R (from 3.4.3 to 3.4.1) and a few other packages. As a result, R would no longer start, even after deactivating and reactivating the environment.
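One way to keep conda from silently downgrading R is to pin it. Per the conda documentation, a file named pinned in an environment's conda-meta directory lists package specs that conda should not change, so a plan that needs a different R should fail with a conflict instead. A minimal bash sketch; `pin_package` is a hypothetical helper, and `r-base ==3.4.3` is just the version from these notes.

```shell
#!/usr/bin/env bash
# pin_package PREFIX SPEC  (hypothetical helper)
# Adds SPEC (e.g. "r-base ==3.4.3") to PREFIX/conda-meta/pinned unless the
# exact line is already there. conda reads this file when solving, so a plan
# that would change a pinned package should fail instead of going ahead.
pin_package() {
  local prefix=$1 spec=$2
  local pinfile="$prefix/conda-meta/pinned"
  mkdir -p "$prefix/conda-meta"
  touch "$pinfile"
  if ! grep -qxF -- "$spec" "$pinfile"; then
    printf '%s\n' "$spec" >> "$pinfile"
  fi
}

# Usage inside the activated environment, before installing anything else:
#   pin_package "$CONDA_PREFIX" "r-base ==3.4.3"
```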

Resolving the dyld library issue

The error is

> R
dyld: Library not loaded: @rpath/libintl.9.dylib
  Referenced from: /Users/hawleyj/anaconda3/envs/champ-test/lib/R/lib/libR.dylib
  Reason: image not found
Abort trap: 6
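Before wiping anything, it can be worth checking whether the library named in the error is in the environment at all. On macOS, otool -L lists the libraries a binary links against and otool -l shows its LC_RPATH entries; the bash sketch below only checks the environment's lib directory. It assumes @rpath resolves to $CONDA_PREFIX/lib, which is typical for conda packages but worth confirming with otool -l; `find_lib` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# find_lib PREFIX LIBNAME  (hypothetical helper)
# Reports whether LIBNAME exists in PREFIX/lib. If it does not, lists any
# similarly named files (e.g. a different libintl version) to stderr.
find_lib() {
  local prefix=$1 lib=$2
  if [ -e "$prefix/lib/$lib" ]; then
    echo "found: $prefix/lib/$lib"
    return 0
  fi
  echo "missing: $lib is not in $prefix/lib" >&2
  ls "$prefix/lib" 2>/dev/null | grep -F "${lib%%.*}" >&2
  return 1
}

# On macOS, inside the activated environment (otool ships with the Xcode
# command line tools):
#   otool -L "$CONDA_PREFIX/lib/R/lib/libR.dylib"   # libraries libR links to
#   otool -l "$CONDA_PREFIX/lib/R/lib/libR.dylib" | grep -A2 LC_RPATH
#   find_lib "$CONDA_PREFIX" libintl.9.dylib
```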

I’ve seen this issue before, but I can’t remember how I resolved it. A couple of questions on Stack Overflow recommend removing everything and starting from scratch using CRAN instead of conda. That’s lame; I don’t want to do that.

After seeing this GitHub issue, I’m updating conda to the latest version (4.5.8), then installing r-essentials.

> conda activate base
> conda update conda
> conda activate champ-test
> conda install r::r-essentials -y

This didn’t work; I’m still getting the same error. My environment variables are currently set as follows:

Variable                     Value
DYLD_LIBRARY_PATH            NA
DYLD_FALLBACK_LIBRARY_PATH   /Users/hawleyj/anaconda3/lib:/usr/local/lib:/usr/lib:

I’m updating DYLD_LIBRARY_PATH like so:

> export DYLD_LIBRARY_PATH=/Users/hawleyj/anaconda3/lib:/usr/local/lib:/usr/lib:$DYLD_LIBRARY_PATH

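This didn’t solve the issue and actually created more errors. With Anaconda’s lib directory first on DYLD_LIBRARY_PATH, macOS’s ImageIO framework loaded Anaconda’s libJPEG.dylib instead of the system copy (hence the "Symbol not found: __cg_jpeg_resync_to_restart" lines), which broke conda itself: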

> conda activate base
Traceback (most recent call last):
  File "/Users/hawleyj/anaconda3/lib/python3.6/site-packages/conda/common/path.py", line 18, in <module>
    from urllib.request import url2pathname
  File "/Users/hawleyj/anaconda3/lib/python3.6/urllib/request.py", line 2585, in <module>
    from _scproxy import _get_proxy_settings, _get_proxies
ImportError: dlopen(/Users/hawleyj/anaconda3/lib/python3.6/lib-dynload/_scproxy.cpython-36m-darwin.so, 2): Symbol not found: __cg_jpeg_resync_to_restart
  Referenced from: /System/Library/Frameworks/ImageIO.framework/Versions/A/ImageIO
  Expected in: /Users/hawleyj/anaconda3/lib/libJPEG.dylib
 in /System/Library/Frameworks/ImageIO.framework/Versions/A/ImageIO

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/hawleyj/anaconda3/lib/python3.6/site-packages/conda/cli/main.py", line 97, in main
    from ..activate import main as activator_main
  File "/Users/hawleyj/anaconda3/lib/python3.6/site-packages/conda/activate.py", line 13, in <module>
    from .base.context import ROOT_ENV_NAME, context, locate_prefix_by_name
  File "/Users/hawleyj/anaconda3/lib/python3.6/site-packages/conda/base/context.py", line 23, in <module>
    from ..common.configuration import (Configuration, LoadError, MapParameter, PrimitiveParameter,
  File "/Users/hawleyj/anaconda3/lib/python3.6/site-packages/conda/common/configuration.py", line 33, in <module>
    from .path import expand
  File "/Users/hawleyj/anaconda3/lib/python3.6/site-packages/conda/common/path.py", line 21, in <module>
    from urllib import unquote, url2pathname  # NOQA
ImportError: cannot import name 'unquote'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/hawleyj/anaconda3/lib/python3.6/site-packages/conda/common/path.py", line 18, in <module>
    from urllib.request import url2pathname
  File "/Users/hawleyj/anaconda3/lib/python3.6/urllib/request.py", line 2585, in <module>
    from _scproxy import _get_proxy_settings, _get_proxies
ImportError: dlopen(/Users/hawleyj/anaconda3/lib/python3.6/lib-dynload/_scproxy.cpython-36m-darwin.so, 2): Symbol not found: __cg_jpeg_resync_to_restart
  Referenced from: /System/Library/Frameworks/ImageIO.framework/Versions/A/ImageIO
  Expected in: /Users/hawleyj/anaconda3/lib/libJPEG.dylib
 in /System/Library/Frameworks/ImageIO.framework/Versions/A/ImageIO

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/hawleyj/anaconda3/bin/conda", line 11, in <module>
    sys.exit(main())
  File "/Users/hawleyj/anaconda3/lib/python3.6/site-packages/conda/cli/main.py", line 108, in main
    init_loggers()
  File "/Users/hawleyj/anaconda3/lib/python3.6/site-packages/conda/cli/main.py", line 55, in init_loggers
    from ..gateways.logging import initialize_logging, set_verbosity
  File "/Users/hawleyj/anaconda3/lib/python3.6/site-packages/conda/gateways/logging.py", line 12, in <module>
    from ..common.io import attach_stderr_handler
  File "/Users/hawleyj/anaconda3/lib/python3.6/site-packages/conda/common/io.py", line 24, in <module>
    from .path import expand
  File "/Users/hawleyj/anaconda3/lib/python3.6/site-packages/conda/common/path.py", line 21, in <module>
    from urllib import unquote, url2pathname  # NOQA
ImportError: cannot import name 'unquote'
> conda deactivate
# prints the same chain of tracebacks as "conda activate base" above, ending in:
ImportError: cannot import name 'unquote'

I reverted the environment variable change with export DYLD_LIBRARY_PATH="", which got rid of these new errors but didn’t fix my R problem.

I’m going to take the Stack Overflow advice and start from scratch.

Installing ChAMP: Take 2

I’ve deleted the old conda environment and made a new one.

> conda deactivate
> conda env remove -n champ-test
> conda create -n champ-test2
> conda activate champ-test2

Then installed all the packages I think I require before installing ChAMP.

> conda install \
    r::r-base \
    r::r-stringr \
    bioconda::bioconductor-rhdf5 \
    r::r-httpuv \
    r::r-xml2 \
    bioconda::bioconductor-affyio \
    r::r-rcurl \
    r::r-igraph \
    bioconda::bioconductor-preprocessCore \
    bioconda::bioconductor-biocinstaller \
    bioconda::bioconductor-biobase \
    bioconda::bioconductor-biocgenerics \
    bioconda::bioconductor-genomicranges \
    bioconda::bioconductor-genomeinfodb \
    bioconda::bioconductor-summarizedexperiment \
    bioconda::bioconductor-annotate \
    bioconda::bioconductor-rtracklayer \
    -y

The dyld library error has returned. Something about the combination of packages is stopping R from running properly.

Installing ChAMP: Take 3

I’ve deleted the old conda environment and made a new one.

> conda deactivate
> conda env remove -n champ-test2 -y
> conda create -n champ-test3
> conda activate champ-test3

To make sure it’s working properly:

> conda install r::r-base -y
> R --version
R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
http://www.gnu.org/licenses/.

This works. I suspect the problem appears when R gets downgraded to v3.4.1, so let’s see which packages make that happen.
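Since the breakage seems to follow an R downgrade, a quick guard after each install step can catch it before R stops starting. A minimal bash sketch, assuming `R --version` prints `R version X.Y.Z ...` on its first line (as in the output above); the helper names are hypothetical, and the 3.4.2 cutoff is only an assumption based on the versions where the dyld error has shown up.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds if dotted version A >= B, comparing each
# component numerically (so 3.10.0 counts as newer than 3.4.2).
version_ge() {
  local -a a b
  local IFS=. i x y
  a=($1)
  b=($2)
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    x=${a[i]:-0}
    y=${b[i]:-0}
    if ((10#$x > 10#$y)); then return 0; fi
    if ((10#$x < 10#$y)); then return 1; fi
  done
  return 0
}

# r_version: prints the third word of the first line of `R --version`,
# e.g. "R version 3.4.3 (2017-11-30) ..." -> 3.4.3. R_BIN overrides "R".
r_version() {
  "${R_BIN:-R}" --version | awk 'NR == 1 { print $3 }'
}

# check_r MIN: fails (with a message on stderr) if the active R is older than MIN.
check_r() {
  local v
  v=$(r_version)
  if version_ge "$v" "$1"; then
    echo "R $v OK (>= $1)"
  else
    echo "R $v is older than $1" >&2
    return 1
  fi
}

# Usage after each conda install step:
#   check_r 3.4.2
```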

> conda install \
    r::r-stringr \
    r::r-httpuv \
    r::r-xml2 \
    r::r-rcurl \
    r::r-igraph \
    -y  # no issue here

It looks like the Bioconductor packages are incompatible with r::r-rcurl:

> conda install \
    bioconda::bioconductor-affyio \
    bioconda::bioconductor-rhdf5 \
    bioconda::bioconductor-preprocessCore
Solving environment: failed

UnsatisfiableError: The following specifications were found to be in conflict:
  - bioconda::bioconductor-affyio
  - r::r-rcurl
Use "conda info <package>" to see the dependencies for each package.
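Instead of installing candidates in batches and waiting for the environment to break, each package can be tried on its own with conda's --dry-run flag, which reports the install plan without changing anything. A bash sketch with some assumptions: `check_each_package` is a hypothetical helper, the parsing assumes downgrades are listed under a "The following packages will be DOWNGRADED:" heading with one package per line (how conda 4.x formats it, though the exact layout varies between versions), and CONDA_BIN only exists so the function can be pointed at a different conda executable.

```shell
#!/usr/bin/env bash
# check_each_package PKG...  (hypothetical helper)
# Asks conda for an install plan for each package on its own (--dry-run
# changes nothing) and reports whether solving failed, whether the plan
# would downgrade r-base, or whether it looks fine.
check_each_package() {
  local conda=${CONDA_BIN:-conda}
  local pkg plan
  for pkg in "$@"; do
    if ! plan=$("$conda" install --dry-run "$pkg" 2>&1); then
      echo "$pkg: solve failed"
    elif printf '%s\n' "$plan" | awk '
        # Assumed layout: a "... will be DOWNGRADED:" heading followed by one
        # package per line, e.g. "r-base: 3.4.3-... --> 3.4.1-...".
        /will be DOWNGRADED/ { section = 1; next }
        /will be/            { section = 0 }
        section && ($1 == "r-base:" || $1 == "r-base") { found = 1 }
        END { exit !found }'; then
      echo "$pkg: DOWNGRADES r-base"
    else
      echo "$pkg: ok"
    fi
  done
}

# Usage inside the activated environment:
#   check_each_package bioconda::bioconductor-affyio \
#     bioconda::bioconductor-rhdf5 bioconda::bioconductor-preprocessCore
```

A "solve failed" result covers both conflicts like the UnsatisfiableError above and packages conda cannot find, so the captured output is still worth reading.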

I’ll try installing these Bioconductor packages through Bioconductor itself instead of conda.

> source("https://bioconductor.org/biocLite.R")
> biocLite()  # update to newest version
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.6 (BiocInstaller 1.28.0), R 3.4.3 (2017-11-30).
Old packages: 'emdbook', 'RcppArmadillo', 'BH', 'httpuv', 'igraph', 'irlba',
  'Matrix', 'Rcpp', 'RCurl', 'stringi', 'stringr', 'xml2'
Update all/some/none? [a/s/n]: a
# ...
> biocLite("ChAMP")

Time check: it has now been 1.5 h since I started this process, and I’ve made little progress. I love this. This does wonders for my sanity /s

So this is still not installing all the packages I need. I’m going to see if I can figure out that C compiler error from before.

Fixing the C compiler

This Stack Overflow question suggests that my gcc is out of date, but the page is six years old.

> gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

As the "Apple LLVM" line shows, gcc on macOS is actually Apple’s clang. I’m upgrading GCC from Homebrew to see if that changes anything.

> brew upgrade gcc

I also need to make sure R uses the new compiler by default, so I created ~/.R/Makevars with these two lines:

    CC=gcc-8
    CXX=g++-8
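R reads ~/.R/Makevars whenever it compiles a package from source, so writing it by hand and overwriting an existing one is easy to get wrong. A small bash sketch; `write_makevars` is a hypothetical helper, and it only sets CC and CXX, the two variables used in these notes (R has further compiler variables for specific C++ standards that this does not touch).

```shell
#!/usr/bin/env bash
# write_makevars CC CXX  (hypothetical helper)
# Writes $HOME/.R/Makevars so R builds packages from source with the given
# C and C++ compilers. Any existing Makevars is copied to Makevars.bak first.
write_makevars() {
  local cc=$1 cxx=$2 dir="$HOME/.R" c
  for c in "$cc" "$cxx"; do
    command -v "$c" >/dev/null 2>&1 || echo "warning: $c not found on PATH" >&2
  done
  mkdir -p "$dir"
  if [ -f "$dir/Makevars" ]; then
    cp "$dir/Makevars" "$dir/Makevars.bak"
  fi
  printf 'CC=%s\nCXX=%s\n' "$cc" "$cxx" > "$dir/Makevars"
}

# Usage with the Homebrew compilers from these notes:
#   write_makevars gcc-8 g++-8
```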

Conclusion: This did not resolve my issues.

Back to Conda

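I’m going to try uninstalling r-xml2 and installing r-xml instead. The package that failed to compile earlier was XML, which is a different package from xml2, so installing r-xml2 never addressed that failure.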

> conda uninstall r-xml2 -y
> conda install r::r-xml -y
> source("https://bioconductor.org/biocLite.R")
> biocLite("ChAMP")

This might be working now. The rhdf5 package refused to install before, but now installs properly. I think this is because R now uses the Homebrew gcc by default. The installation still ran into some errors and ChAMP didn’t install, but this is some form of progress.

Time check: close to 2.5 h now, with minimal progress. I’m starting to think I should avoid ChAMP like the plague. This amount of effort to install an R package is insane.

Some more packages installed, but not all. I’ll try conda again.

> conda install bioconda::bioconductor-minfi -y

This won’t install because of package conflicts. It wants me to downgrade my R version to 3.3.1, which is likely to cause some major damage.

Installing ChAMP: Take 4 - Switching OS

I’m switching operating systems to see if that changes anything. I’ve run into the dyld issue on my Mac before, and it appears to occur only with R versions below 3.4.2. The CentOS cluster I have access to runs R 3.3.2, but dyld is macOS’s dynamic linker, so maybe I can get around some of the issues I’ve run into above.

Creating a new environment and installing as many of the dependencies listed on ChAMP’s Bioconductor page as I can:

> conda create -n champ-test
> conda activate champ-test
> conda install \
    r::r-base \
    r::r-stringr \
    r::r-httpuv \
    r::r-xml \
    r::r-rcurl \
    r::r-igraph \
    bioconda::bioconductor-rhdf5 \
    bioconda::bioconductor-affyio \
    bioconda::bioconductor-preprocessCore \
    bioconda::bioconductor-genomicranges \
    bioconda::bioconductor-minfi \
    bioconda::bioconductor-DMRcate \
    bioconda::bioconductor-illuminahumanmethylationepicmanifest \
    conda-forge::r-prettydoc \
    r::r-Hmisc \
    bioconda::bioconductor-globaltest \
    bioconda::bioconductor-sva \
    bioconda::bioconductor-illuminaio \
    r::r-rmarkdown \
    bioconda::bioconductor-IlluminaHumanMethylation450kmanifest \
    bioconda::bioconductor-IlluminaHumanMethylationEPICanno.ilm10b2.hg19 \
    bioconda::bioconductor-limma \
    bioconda::bioconductor-dnacopy \
    bioconda::bioconductor-impute \
    bioconda::bioconductor-marray \
    bioconda::bioconductor-wateRmelon \
    bioconda::bioconductor-goseq \
    bioconda::bioconductor-missMethyl \
    bioconda::bioconductor-qvalue \
    r::r-doParallel \
    bioconda::bioconductor-bumphunter \
    r::r-quadprog \
    -y

This worked without causing any conflicts. I then tried to install ChAMP with the remaining dependencies through R.

> source("https://bioconductor.org/biocLite.R")
> biocLite("ChAMP")

The only issue I had was with the dendextend package. Its dependency, fpc, wouldn’t install correctly, and I couldn’t make it work with either conda or Bioconductor.

So now I can add switching operating systems to the list of things I’ve tried. Sadly, it still didn’t work.

Giving up

This process has been a lot harder than it should be. After ~4 h of trying, I’ve failed to install ChAMP.

Here’s a summary of what I’ve tried:

  • following the official ChAMP documentation
  • new blank R installation
  • packages only from Bioconductor
  • packages from a mix of Bioconductor and conda
  • different version of R
  • setting library paths and environment variables
  • changing C compilers and default settings for R
  • packages from CRAN
  • switched OS
  • extensive troubleshooting

And none of it worked. I’m going to give up and see if I can make do with other software.

Conclusions

Your work as a developer is to tackle these types of issues so that others don’t have to. Not putting in the due diligence is a shame because it wastes time, energy, and money – both the users’ (because they try to use the software but can’t) and the developers’ (because no one wants to use their software, so the efforts are for naught).

Software may be new and clever, but if it isn’t reasonably easy to use, it’s useless.

References & Footnotes