Articles by category: work

Limit of a t distribution

I like to write out proofs when I need them but have a hard time tracking them down. Here is one about the limit of the t distribution as the degrees of freedom go to infinity.

Documents as code and LaTeX

Lots of work goes into writing documents. Lots of that work is rule-based and can be made programmatic. This makes treating documents as a form of code that needs to be compiled a useful way to think about them, as well as to work with them. LaTeX can handle these kinds of documents and may be the right place to start.

Illumina flow cell teardown

For many computational biologists, their work starts with sequencing data. But that data obviously doesn't appear out of nowhere. To bring that side of molecular biology a bit closer to home, I got my hands on an old flow cell and took it apart to see what it's made of.

Is your null hypothesis, or your model, more likely to be rejected?

Science and statistics are hard. There are lots of ways for things to go wrong, and it's important to remember that when looking at p-values and hypothesis tests.

Understanding file permissions

Here is a brief rundown of file permissions on Unix systems and how to change them.

General purpose Anaconda tips

I use Anaconda to manage my computational software environments. Here are some pragmatic tips for making conda environments easier to deal with.

A solution to dependency hell: static binaries by default

Dependencies are complicated for computational biologists. Adopting a different development strategy can help your end users.

Check your references

Relying on first-hand evidence is important for building an argument. Here I discuss how citation limits in journals may entice readers to read the review papers and not the original articles, and why doing the diligent work of reading the originals is important for research.

Experimenting with web monetization

This blog post is an experiment in running the web monetization protocol on this blog. If you don't have web monetization enabled in your browser, you won't see the contents of this post.

General purpose Emacs tips

Emacs is a text editor that has a lot of history and a lot of functionality. Because of its history and the philosophy behind it, it can be hard to find the "right" way to do anything with it. In this post, I want to compile some information that I've found over time, and things that have worked for me.

A more sane way to modify your PATH

The PATH environment variable is key for getting software to run on your computer. Sometimes you have to edit it by hand for your development purposes. Here is a tool to make that process a little more sane.

A collection of useful command line tools

Command line utilities are great. Here are a few of my favourites.

Aggregate peak analysis with Hi-C data

Hi-C data analysis is still a relatively new field in genomics. The data itself is quite large and expensive to make, which means datasets and exploration of the data are still immature compared to other technologies like RNA-seq. Here, I discuss aggregate peak analysis, a commonly used but poorly documented analytical technique to verify identified features in Hi-C data.

What is differential analysis, anyway?

Differential analysis using sequencing data is, at its heart, a very simple idea that involves a lot of complicated statistics. It makes explaining the simple idea to newcomers in bioinformatics very difficult. Here, I want to break down the motivation behind differential analysis and explain where the complicated statistics come from.

The Central Limit Theorem and estimation of variance

The Central Limit Theorem is a pillar of statistics. We can apply the proof of the CLT to understand how different estimators converge in distribution with large sample sizes.

Notation and thinking in math

Mathematical notation is a signature of math. Almost anyone can recognize it instantly, even if they don't know what it is. I want to talk a bit about why notation is useful, why it can be confusing, and tackle some examples in statistics that are often confusing, using some clear notation.

Tips for remote teaching

Like many academics, I've started giving presentations and tutorial sessions remotely. Here are some brief tips from my experience and resources for giving good lectures.

Teaching stats for statistical thinking

Some thoughts about what I'd like students taking the biostatistics course I'm TA'ing to take away from the class.

Anti-curl and Poincare's lemma

A brief look at potential functions in 3 dimensions, and how Poincare's lemma can make it easier to solve for vector potentials.

Physical symmetries result from what we think good scientific theories are

Why do physicists talk about symmetries and conservation laws all the time? It's because that's what a good scientific theory looks like.

A marketplace of half-baked ideas

A cursory look at the economics of scientific software, and the implications for its usability and longevity.

Creating simple genome annotation tables

Working with annotated genomes is not always an easy process. Here, I detail how to create tabular annotation data from GENCODE that can be easily used in any analysis.

Please, show your work

How incomprehensible machine learning models answer questions without providing the solutions we desire.

Creating a custom genome annotation for HiGlass

HiGlass is an interactive genome browser that's particularly useful for Hi-C data. Here, I describe how to create your own genome annotation file for HiGlass, allowing you to more easily display your work, regardless of the organism you work in.

Jeffrey Epstein, Harvard, and Martin Nowak

On May 1, 2020, Harvard published a report about the relationship between Harvard faculty and Jeffrey Epstein, detailing the numerous interactions, gifts, and acts of questionable behaviour or outright misconduct surrounding the now-deceased "scientific philanthropist".

Structuring code for ggplot

Brief thoughts on how to cleanly write ggplot code in R.

High impact papers are not how you learn science

Journal articles are one way in which scientific research is disseminated. But they're not how one learns how to do science.

Well-defined biology

Why are definitions important, and what makes them "good"? Here I focus on what it means for definitions to be "good" and "well-defined", and how to ask good quantitative questions in biology.

Now is an exciting time for mathematicians in the biological sciences

There are many reasons to be excited about scientific progress in the biological sciences, especially if you're a mathematician of almost any kind.

Pragmatic guidelines for bioinformatics software tools

I offer 10 practical suggestions for designing robust, intuitive, and user-friendly software tools for bioinformatics.

Building Conda Packages

A brief introduction to creating your own conda packages.

Designing dark academic posters

Many academic posters look boring: white backgrounds, black text, some shade of neutral blue as an accent colour, etc. I've designed some posters with dark backgrounds, and I've learned a thing or two from making them that I'd like to share.

Academic twitter isn't for me

I'll go against the current trend and say that I don't think you should use Twitter as a tool for working in academia.

The cleverness behind Tajima's D statistic

I want to highlight how clever the derivation of Tajima's statistic is, and a great idea he puts forward in his 1989 paper.

Don't trust your data at first glance

A cautionary tale about trusting data from another source.

"Support" over "coverage"

"Read coverage" in high throughput sequencing is a bit of an ambiguous term. Here, I make the argument for using the analogous term "support", coming from set theory and its interpretation.

The woes of using bioinformatics software: a case study in trying to install ChAMP

Making high-quality bioinformatics software is hard. Installing and using it shouldn't be, though. Here's a detailed description of all the work I did to try and install the ChAMP package.

PubMed + RSS for following specific authors

A quick method for keeping up to date on works published by specific authors using PubMed's not-so-well-known RSS feature.

Git flow for scientific analysis

Some thoughts about using Git's branching model for clean and clear scientific analysis.

How I Maintain a Reproducible Computational Environment

A brief description of how I try my best to keep a low-maintenance and reproducible software environment.

PubPeer for Microsoft Edge

I ported PubPeer's Chrome extension for Microsoft Edge.