Biology is a very different field of research than it was 100 years ago. And that was a very different field of research than it was 100 years before that.
The statistical analysis brought to biology by the likes of Ronald Fisher and colleagues ushered in a new wave of analyses in the biological sphere, and I believe that we’ve seen another surge in mathematically-inspired methods with the arrival of genome sequencing data.
I studied applied mathematics in my undergrad, and was exposed to a variety of analytical topics used in particle physics, fluid dynamics, synthetic biology, and quantum information. Even with all that exposure, there were still so many areas of mathematics that I had not seen before, until I started my PhD in computational biology.
I think that this influx of mathematically-oriented papers (or at the very least papers that use extremely novel mathematical techniques) to understand certain biological systems demonstrates the breadth and depth of biology as a whole, and how we can and should think differently to interrogate these systems. Simultaneously, it demonstrates how exciting a time it is to be a mathematician looking for inspiration, and why it’s an exciting time to be studying these fields from that more abstract perspective.
Here are a few examples of nontrivial mathematical tools and various papers they are used in.
Hidden Markov models
  PhyloWGS^{1}
  Modelling cell division with DNA methylation errors^{2}
  Single-cell Hi-C clustering^{3}
  Phylo-HMRF using hidden Markov random fields^{4}
Graph community detection
  3dNetMod TAD caller^{5}
  VIPER^{6} and ARACNE^{7}
de Bruijn graphs
  de novo genome assembly^{8}
  k-mer based pseudoalignment^{9}
  Bcool graph-based sequence corrections^{10}
High-dimensional data representations and reduction
  t-SNE^{11} and UMAP^{12} for single-cell sequencing data^{13}
  PCA, MDS, NMF, ICA for dataset covariate detection^{14}
  (Semi-)automated segmentation algorithms (ChromHMM^{15}, Segway^{16})
Random sampling, variance estimation, and bias estimation
  Trickiest problems in RNA-seq/other differential analyses (Sleuth^{17}, DESeq2^{18}, edgeR^{19,20})
  Batch effect removal^{21} (Jeff Leek, ComBat, SVA)
  Evidence of positive selection in evolutionary dynamics^{22}
Error correcting codes
  Adjustments in single-cell barcode/UMIs^{23}
  DNA as a storage medium^{24}
Each of these tools is relatively simple to define conceptually, but has a deep history of mathematical study. Now, these tools are used to study subdisciplines of biology that didn’t even exist 20 years ago.
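To give a sense of how simple some of these definitions are, here is a minimal sketch of a de Bruijn graph in Python. It is not how any of the assemblers or tools cited above are implemented; the function name `de_bruijn_graph` and the toy reads are made up for illustration. The idea is only that every k-mer in a read becomes a directed edge from its first k-1 letters to its last k-1 letters.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph as an adjacency list.

    Nodes are (k-1)-mers. Each k-mer found in a read adds one directed
    edge from its prefix (first k-1 letters) to its suffix (last k-1 letters).
    Repeated k-mers add repeated edges.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy reads, invented for this example (not real sequencing data)
reads = ["ACGTC", "CGTCA"]
graph = de_bruijn_graph(reads, k=3)

for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

Walking the edges from `AC` through to `CA` spells out `ACGTCA`, the sequence the two overlapping reads came from. Real data adds sequencing errors, repeats, and both DNA strands, which is where the deeper mathematics comes in.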
Coming from mathematics, one may not immediately see interesting problems without some guidance. But there are very interesting problems to be found: some that require simple mathematical intuition and some that require really abstract reasoning.
Challenges for mathematicians
The challenge for mathematicians is to think of problems in the biological sciences in a way that formally trained biologists can’t, since biologists haven’t spent the same amount of time on highly abstract thinking and visualization. Mathematicians also have to come up with realistic models of biological systems that are well supported by the data they find and generate.
Moreover, explaining these abstract ideas to non-mathematicians requires an extreme degree of intuition to make models palatable for biologists. This refinement requires drawing analogies and thinking of your models in a new light, which can provide a deeper understanding than you initially expect.
Don’t mistake this kind of observation as saying that biologists aren’t intelligent or capable of abstract thought. That is patently false. They are extremely intelligent, but their intelligence is of a different kind, one that you as a mathematician need to come to understand if you are to succeed in your work and produce meaningful results that the rest of the scientific community can work with.
Mathematics allows you to abstract small ideas to the extreme. Biology requires that you bring these grand abstractions back to earth in a verifiably measurable way. This is something that is rarely easy to do, but can be done through collaboration and respect.
Tangent: wondering about biologists
As a bit of a tangent, but along the same lines, I wonder how it feels for formally trained biologists to see their field transform into something they don’t understand or never learned about. It must be quite jarring to have your field change around you and not know why or how to adapt. I knew coming into this field that there was a lot I didn’t know, since I had almost never studied any actual biology. But for biologists who were trained in biochemistry and molecular biology, who go on to research settings knowing nothing of machine learning or differential equations, it must feel bizarre for their own research to feel so foreign to what they read in journals nowadays.
I hope there are ways they can adapt, because the mathematically- and computationally-inclined people can’t leave them behind. We need them to develop a broader understanding of the field and to make sure our predictions are measurable and make biological sense.
Looking to the future
There are many more areas of biological study that are yet to be influenced by mathematical tools, but I can only assume there is more to come, given the past evidence of mathematics infecting and taking hold of other scientific fields.
But it’s an exciting time nonetheless to jump into a field of research and find so much to be discovered, using tools you’d never think to use.
References

1. Deshwar, A. G. et al. PhyloWGS: Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors. Genome Biology 16, 35 (2015). doi: 10.1186/s13059-015-0602-8
2. Andrews, D. J., Lynch, A. G. & Tavaré, S. Using Methylation Patterns for Reconstructing Cell Division Dynamics. in Emerging Trends in Applications and Infrastructures for Computational Biology, Bioinformatics, and Systems Biology 3–15 (Elsevier, 2016). doi: 10.1016/B978-0-12-804203-8.00001-8
3. Zhou, J. et al. HiCluster: A Robust Single-Cell Hi-C Clustering Method Based on Convolution and Random Walk. bioRxiv 506717 (2018). doi: 10.1101/506717
4. Yang, Y., Zhang, Y., Ren, B., Dixon, J. R. & Ma, J. Comparing 3D Genome Organization in Multiple Species Using Phylo-HMRF. Cell Systems 8, 494–505.e14 (2019). doi: 10.1016/j.cels.2019.05.011
5. Norton, H. K. et al. Detecting hierarchical genome folding with network modularity. Nature Methods 15, 119–122 (2018). doi: 10.1038/nmeth.4560
6. Alvarez, M. J. et al. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nature Genetics 48, 838–847 (2016). doi: 10.1038/ng.3593
7. Margolin, A. A. et al. ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context. BMC Bioinformatics 7, S7 (2006). doi: 10.1186/1471-2105-7-S1-S7
8. Medvedev, P., Pham, S., Chaisson, M., Tesler, G. & Pevzner, P. Paired de Bruijn Graphs: A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers. Journal of Computational Biology 18, 1625–1634 (2011). doi: 10.1089/cmb.2011.0151
9. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology 34, 525–527 (2016). doi: 10.1038/nbt.3519
10. Limasset, A., Flot, J.-F. & Peterlongo, P. Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs. Bioinformatics. doi: 10.1093/bioinformatics/btz102
11. van der Maaten, L. & Hinton, G. Visualizing Data Using t-SNE. Journal of Machine Learning Research 9, 2579–2605 (2008).
12. McInnes, L. & Healy, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426 [cs, stat] (2018).
13. Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. bioRxiv (2019). doi: 10.1101/576827
14. Stein-O’Brien, G. L. et al. Enter the Matrix: Factorization Uncovers Knowledge from Omics. Trends in Genetics 34, 790–805 (2018). doi: 10.1016/j.tig.2018.07.003
15. Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nature Methods 9, 215–216 (2012). doi: 10.1038/nmeth.1906
16. Hoffman, M. M. et al. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nature Methods 9, 473–476 (2012). doi: 10.1038/nmeth.1937
17. Pimentel, H., Bray, N. L., Puente, S., Melsted, P. & Pachter, L. Differential analysis of RNA-seq incorporating quantification uncertainty. Nature Methods 14, 687–690 (2017). doi: 10.1038/nmeth.4324
18. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15, 550 (2014). doi: 10.1186/s13059-014-0550-8
19. McCarthy, D. J., Chen, Y. & Smyth, G. K. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288–4297 (2012). doi: 10.1093/nar/gks042
20. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010). doi: 10.1093/bioinformatics/btp616
21. Leek, J. T., Johnson, W. E., Parker, H. S., Fertig, E. J., Jaffe, A. E., Storey, J. D., Zhang, Y. & Torres, L. C. sva: Surrogate Variable Analysis. R package version 3.32.1 (2019). doi: 10.18129/B9.bioc.sva
22. Tajima, F. Statistical Method for Testing the Neutral Mutation Hypothesis by DNA Polymorphism. Genetics 123, 585–595 (1989).
23. Melsted, P., Ntranos, V. & Pachter, L. The barcode, UMI, set format and BUStools. Bioinformatics. doi: 10.1093/bioinformatics/btz279
24. Shipman, S. L., Nivala, J., Macklis, J. D. & Church, G. M. CRISPR–Cas encoding of a digital movie into the genomes of a population of living bacteria. Nature (2017). doi: 10.1038/nature23017