"Support" over "coverage"

Published: August 24, 2018

High-throughput sequencing data comes in the form of a collection of reads[1]: short sequences of DNA that come from some location in the genome of the cells you've sequenced. Often, these reads need to be aligned to a reference genome to understand where they come from and how they relate to each other.

After this mapping process, it's often worth knowing how well represented each region of the genome is (i.e. how many reads are mapped to an individual nucleotide, or a contiguous interval of nucleotides). Some locations in the genome tend to have more reads mapped to them than others, and accounting for these differences can be important in analyses and interpretations of sequencing data. Typically, this concept is called depth of coverage, or just coverage, for short[2]. This property is important, so let's explicitly state it:

Property 1: the number of unique reads that include a given nucleotide (or set of nucleotides) in the reconstructed sequence

A related concept is how much of your genome has a read mapped to it. This, understandably, can also be called coverage (i.e. “how much of your genome is covered by reads?”). Again, let’s explicitly state this:

Property 2: the number (or percentage) of nucleotides in the reference sequence that are spanned by at least one read
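To make the distinction concrete, here is a minimal Python sketch that computes both properties for a toy alignment. It assumes the reads have already been aligned to a single reference and are given as half-open `[start, end)` intervals; the read coordinates, the reference length, and the function names `depth_per_position` and `positions_spanned` are hypothetical, chosen only for illustration. Each interval is treated as a distinct read (no duplicate removal).

```python
# Minimal sketch: reads are assumed to be already aligned to one reference
# and given as half-open intervals [start, end). Coordinates are hypothetical.


def depth_per_position(reads, ref_length):
    """Property 1: number of reads spanning each reference position."""
    depth = [0] * ref_length
    for start, end in reads:
        # Clip each read to the reference before counting it.
        for pos in range(max(start, 0), min(end, ref_length)):
            depth[pos] += 1
    return depth


def positions_spanned(depth):
    """Property 2: number of positions spanned by at least one read."""
    return sum(1 for d in depth if d > 0)


reads = [(0, 4), (2, 6), (3, 5), (8, 10)]
ref_length = 12

depth = depth_per_position(reads, ref_length)
spanned = positions_spanned(depth)

print(depth)
print(spanned, f"{100 * spanned / ref_length:.1f}%")
```

For these reads, `depth` is `[1, 1, 2, 3, 2, 1, 0, 0, 1, 1, 0, 0]` (Property 1 at each position), and 8 of the 12 positions (66.7%) are spanned by at least one read (Property 2).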

These two properties, both often referred to as coverage, are different but related. To avoid conflation of these ideas, the terms depth of coverage and genome coverage (or slight variants of these terms) have become popular to refer to Properties 1 and 2, respectively. See the BEDTools functions coverage[3] and genomecov[4] as examples of these definitions.

In my opinion, this is alright, but we can do better to avoid confusion.

Proposed terminology

I believe that depth is a good term for Property 1 since depth evokes layers or height. For example, "How many layers deep do the reads stack over a particular position?" is an intuitive question that aligns with the spirit of Property 1.

I propose an alternative term for Property 2 that is already well established in set theory: support[5].

Wikipedia’s definition for support is as follows:

Let \(f:X \rightarrow \mathbb{R}\) be a real-valued function whose domain is an arbitrary set, \(X\). The set-theoretic support of \(f\) over \(X\), denoted \(supp(f)\), is the set of elements of \(X\) where \(f\) is non-zero. Explicitly, \(supp(f) = \{ x \in X | f(x) \neq 0 \}\).

This is Property 2 restated in set-theoretic terms, which is what makes support such a good term, in my opinion. If \(f\) is the depth at each position of the reference, then \(supp(f)\) is exactly the set of positions spanned by at least one read, and Property 2 is its size (or its size relative to the length of the reference). The term already has a widely accepted mathematical meaning and aligns perfectly with the spirit of Property 2, which makes it easier to communicate between people in different fields. It also avoids conflation with Property 1, since the words cover and coverage are nowhere to be found (covers[6] are an entirely different topic in set theory and aren't worth discussing here).
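The link between the two ideas can be sketched in a few lines of Python: a generic `support` function that follows the set-theoretic definition, applied to a depth function over reference positions, returns the positions spanned by at least one read, and its size is Property 2. This is a minimal sketch under stated assumptions; the reads, the reference length, and the helper names `support` and `depth` are hypothetical, and reads are again assumed to be half-open `[start, end)` intervals on a single reference.

```python
# Minimal sketch of supp(f) = {x in X : f(x) != 0}, applied to a per-position
# depth function. Reads are hypothetical half-open intervals [start, end).
from typing import Callable, Iterable, Set


def support(f: Callable[[int], float], domain: Iterable[int]) -> Set[int]:
    """Return the elements of the domain where f is non-zero."""
    return {x for x in domain if f(x) != 0}


reads = [(0, 4), (2, 6), (8, 10)]
ref_length = 12
positions = range(ref_length)


def depth(pos: int) -> int:
    """Depth (Property 1): number of reads whose interval contains pos."""
    return sum(1 for start, end in reads if start <= pos < end)


supp = support(depth, positions)
print(sorted(supp))             # positions spanned by at least one read
print(len(supp))                # Property 2 as a count
print(len(supp) / ref_length)   # Property 2 as a fraction of the reference
```

Note that `support` knows nothing about sequencing: it works for any real-valued function on a finite domain, and only the choice of `depth` as that function ties it to Property 2.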

tl;dr

I propose using the term depth to refer to Property 1 and support to refer to Property 2. These terms have intuitive connotations for the Properties of interest, and are distinct enough that they don't conflate the ideas behind these Properties. These connotations exist in everyday language and are particularly strong in mathematical language, which lets the terms convey these Properties with clarity and precision.

References & Footnotes