Now is an exciting time for mathematicians in the biological sciences

Published: October 03, 2019

Biology is a very different field of research than it was 100 years ago. And that was a very different field of research than it was 100 years before that.

The statistical analysis brought to biology by Ronald Fisher and colleagues ushered in a new wave of analyses in the biological sphere, and I believe that we’ve seen another surge in mathematically-inspired methods with the arrival of genome sequencing data.

I studied applied mathematics in my undergrad, and was exposed to a variety of analytical topics used in particle physics, fluid dynamics, synthetic biology, and quantum information. Even with all that exposure, there were still so many areas of mathematics that I had not seen before, until I started my PhD in computational biology.

I think that this influx of mathematically-oriented papers aimed at understanding certain biological systems (or at the very least papers that use extremely novel mathematical techniques) demonstrates the breadth and depth of biology as a whole, and how we can and should think differently to interrogate it. It also demonstrates what an exciting time it is to be a mathematician looking for inspiration, and to be studying these fields from that more abstract perspective.

Here are a few examples of non-trivial mathematical tools and various papers they are used in.

  • Hidden Markov models
    • PhyloWGS [1]
    • Modelling cell division with DNA methylation errors [2]
    • Single-cell Hi-C clustering [3]
    • Phylo-HMRF using hidden Markov random fields [4]
  • Graph community detection
    • 3DNetMod TAD caller [5]
    • VIPER [6] and ARACNE [7]
  • de Bruijn graphs
    • de novo genome assembly [8]
    • k-mer based pseudoalignment [9]
    • Bcool graph-based sequence corrections [10]
  • High-dimensional data representations and reduction
    • t-SNE [11] and UMAP [12] for single-cell sequencing data [13]
    • PCA, MDS, NMF, ICA for dataset covariate detection [14]
    • (Semi-)automated segmentation algorithms (ChromHMM [15], Segway [16])
  • Random sampling, variance estimation, and bias estimation
    • Trickiest problems in RNA-seq/other differential analyses (Sleuth [17], DESeq2 [18], edgeR [19, 20])
    • Batch effect removal [21] (Jeff Leek, ComBat, SVA)
    • Evidence of positive selection in evolutionary dynamics [22]
  • Error correcting codes
    • Adjustments in single-cell barcode/UMIs [23]
    • DNA as a storage medium [24]

Each of these tools is relatively simple to define conceptually, but has a deep history of mathematical study. Now, these tools are used to study sub-disciplines of biology that didn’t even exist 20 years ago.
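As a small illustration of how simple these objects can be to define, here is a minimal sketch of a de Bruijn graph built from short reads. It is a toy, not how any of the tools cited above are implemented: the function name `de_bruijn_graph`, the example reads, and the choice of k = 3 are all assumptions made up for this example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers. Each distinct k-mer contributes one edge,
    from its first k-1 characters to its last k-1 characters.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    # Sort successors so the output is deterministic.
    return {node: sorted(successors) for node, successors in graph.items()}


# Hypothetical, error-free reads that overlap on "GGC".
reads = ["ATGGC", "GGCGT"]
graph = de_bruijn_graph(reads, k=3)
for node, successors in graph.items():
    print(node, "->", successors)
```

In this toy case the graph is a single path, and walking it spells out ATGGCGT, the sequence both reads came from. Real sequencing data has errors, repeats, and uneven coverage, which is where the deeper mathematics comes in.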

Coming from mathematics, one may not be able to see immediately interesting problems without some guidance. But there are very interesting problems to be found; some that require simple mathematical intuition and some that require really abstract reasoning.

Challenges for mathematicians

The challenge for mathematicians is to think about problems in the biological sciences in ways that formally-trained biologists can’t, since biologists haven’t spent the same amount of time on highly abstract thinking and visualization. Mathematicians also have to come up with realistic models of biological systems that are well supported by the data they find and generate.

Moreover, explaining these abstract ideas to non-mathematicians requires an extreme degree of intuition to make models palatable for biologists. This refinement requires drawing analogies and thinking of your models in a new light, which can provide a deeper understanding than you might initially expect.

Don’t mistake this kind of observation as saying that biologists aren’t intelligent or capable of abstract thought. That is patently false. They are extremely intelligent, but their intelligence is of a different kind, one that you as a mathematician need to come to understand if you are to succeed in your work and produce meaningful results that the rest of the scientific community can work with.

Mathematics allows you to abstract small ideas to the extreme. Biology requires that you bring these grand abstractions back to earth in a verifiably measurable way. This is something that is rarely easy to do, but can be done through collaboration and respect.

Tangent: wondering about biologists

As a bit of a tangent, but along the same lines, I wonder how it feels for formally trained biologists to see their field transform into something they don’t understand or never learned about. It must be quite jarring to have your field change around you and not know why or how to adapt. I knew coming into this field that there was a lot I didn’t know, since I had almost never studied any actual biology. But for biologists in research settings who were trained in biochemistry and molecular biology and who know nothing of machine learning or differential equations, it must feel bizarre for their own research to feel so foreign to what they read in journals nowadays.

I hope there are ways they can adapt, because the mathematically- and computationally-inclined people can’t leave them behind. We need them to develop a broader understanding of the field and to make sure our predictions are measurable and make biological sense.

Looking to the future

There are many more areas of biological study that are yet to be influenced by mathematical tools, but I can only assume there is more to come, given the past evidence of mathematics infecting and taking hold of other scientific fields.

But it’s an exciting time nonetheless to jump into a field of research and find so much to be discovered, using tools you’d never think to use.

References

  1. Deshwar, A. G. et al. PhyloWGS: Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors. Genome Biology 16, 35–35 (2015). doi: 10.1186/s13059-015-0602-8 

  2. Andrews, D. J., Lynch, A. G. & Tavaré, S. Using Methylation Patterns for Reconstructing Cell Division Dynamics. in Emerging Trends in Applications and Infrastructures for Computational Biology, Bioinformatics, and Systems Biology 3–15 (Elsevier, 2016). doi: 10.1016/B978-0-12-804203-8.00001-8 

  3. Zhou, J. et al. HiCluster: A Robust Single-Cell Hi-C Clustering Method Based on Convolution and Random Walk. bioRxiv 506717 (2018). doi: 10.1101/506717 

  4. Yang, Y., Zhang, Y., Ren, B., Dixon, J. R. & Ma, J. Comparing 3D Genome Organization in Multiple Species Using Phylo-HMRF. Cell Systems 8, 494-505.e14 (2019). doi: 10.1016/j.cels.2019.05.011 

  5. Norton, H. K. et al. Detecting hierarchical genome folding with network modularity. Nature Methods 15, 119–122 (2018). doi: 10.1038/nmeth.4560 

  6. Alvarez, M. J. et al. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nature Genetics 48, 838–847 (2016). doi: 10.1038/ng.3593 

  7. Margolin, A. A. et al. ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context. BMC Bioinformatics 7, S7 (2006). doi: 10.1186/1471-2105-7-S1-S7 

  8. Medvedev, P., Pham, S., Chaisson, M., Tesler, G. & Pevzner, P. Paired de Bruijn Graphs: A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers. Journal of Computational Biology 18, 1625–1634 (2011). doi: 10.1089/cmb.2011.0151 

  9. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology 34, 525–527 (2016). doi: 10.1038/nbt.3519 

  10. Limasset, A., Flot, J.-F. & Peterlongo, P. Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs. Bioinformatics doi: 10.1093/bioinformatics/btz102 

  11. van der Maaten, L. & Hinton, G. Visualizing Data Using t-SNE. Journal of Machine Learning Research 9, 2579–2605 (2008).

  12. McInnes, L. & Healy, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426 [cs, stat] (2018). arXiv: 1802.03426 

  13. Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. bioRxiv (2019). doi: 10.1101/576827 

  14. Stein-O’Brien, G. L. et al. Enter the Matrix: Factorization Uncovers Knowledge from Omics. Trends in Genetics 34, 790–805 (2018). doi: 10.1016/j.tig.2018.07.003 

  15. Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nature Methods 9, 215–216 (2012). doi: 10.1038/nmeth.1906

  16. Hoffman, M. M. et al. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nature Methods 9, 473–476 (2012). doi: 10.1038/nmeth.1937

  17. Pimentel, H., Bray, N. L., Puente, S., Melsted, P. & Pachter, L. Differential analysis of RNA-seq incorporating quantification uncertainty. Nature Methods 14, 687–690 (2017). doi: 10.1038/nmeth.4324 

  18. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15, 550 (2014). doi: 10.1186/s13059-014-0550-8 

  19. McCarthy, D. J., Chen, Y. & Smyth, G. K. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288–4297 (2012). doi: 10.1093/nar/gks042

  20. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010). doi: 10.1093/bioinformatics/btp616 

  21. Leek, J. T. et al. sva: Surrogate Variable Analysis. R package version 3.32.1 (2019). doi: 10.18129/B9.bioc.sva

  22. Tajima, F. Statistical Method for Testing the Neutral Mutation Hypothesis by DNA Polymorphism. Genetics 123, 585–595 (1989).

  23. Melsted, P., Ntranos, V. & Pachter, L. The barcode, UMI, set format and BUStools. Bioinformatics. doi: 10.1093/bioinformatics/btz279 

  24. Shipman, S. L., Nivala, J., Macklis, J. D. & Church, G. M. CRISPR–Cas encoding of a digital movie into the genomes of a population of living bacteria. Nature (2017). doi: 10.1038/nature23017