Documents as code and LaTeX

Published: May 18, 2021

Any document you write comes with rules about how it should be formatted. These rules extend beyond purely visual elements, like font sizes and paragraph spacing, into elements of the text itself. Here are some common ones:

  • a table of contents should list chapter or section headers and their correct page numbers
  • citations within a document should link to a consistently-formatted bibliography
  • front matter page numbers may follow a different numbering scheme than pages in the main matter but should still be numbered in the correct order
  • figures and tables should be numbered in order of appearance, and those numbers should match wherever they are mentioned in the text
  • acronyms should be written out in full the first time they are used, then abbreviated subsequently

All of these rules can be followed manually, of course. But if you are writing a document, you know that there are intermediate versions of your final product. These intermediate versions should also follow these rules to make the document easier to understand and edit. But manually running through all of these rules again and again, as you write your document, is tedious and time-consuming.

Thinking of documents as code

Given that computers are designed to follow a strict set of rules and commands, writing a document as if it were a piece of code makes a lot of sense. Each version of the document is a list of instructions that roughly resembles the finished product, and the computer compiles those instructions according to what you’ve written and the rules you’ve set out for your document. Treating your document as code also lends itself nicely to tools like Git, which make tracking your changes and keeping distributed backups much easier.

While WYSIWYG editors like Microsoft Word and Google Docs let you manage some of these things, like page numbering and tables of contents, there are many rules that these tools handle poorly or not at all. Automatic numbering of references to tables and figures in the text, automatic styling of figure captions, integrated bibliographies that work with external databases, and more are awkward to set up and easy to break in these applications. It can also be harder to move content around in WYSIWYG editors, since the document must be re-rendered on the fly while you are in the midst of moving something. Do you want to check that the first occurrence of an abbreviation is still spelled out every time you move a paragraph or a sentence around? No way. And don’t even think about adding a custom feature of your own, because it is extremely hard.

This is where tools like LaTeX (stylized as \(\LaTeX\)) come in. \(\LaTeX\) treats your document as code, then compiles it into one of many output formats. I am using \(\LaTeX\) to write my PhD dissertation. Here is an annotated snippet of the packages I tell \(\LaTeX\) to load:

% Templates & Formatting
% ===============================================
% template for University of Toronto dissertation that handles the
% page numbering, spacing, etc
\documentclass[oneside,doublespacing]{ut-thesis}

% load math-specific fonts and symbols
\usepackage{amsmath}
\usepackage{amssymb}

% typeset source code listings (code blocks)
\usepackage{listings}

% special functions for handling the table of contents entries
\usepackage{tocloft}

% indent the first line of a paragraph everywhere
\usepackage{indentfirst}

% Include content from other files
% ===============================================
% Separate the text into multiple LaTeX files that can be linked together
% afterwards. This avoids having one gigantic file that is difficult to
% read by humans.
\usepackage{subfiles}

% commands for inserting images
\usepackage{graphicx}

% Intelligently handle long captions for figures and tables.
% This lets some captions be placed on the page before or after an image or table
% when they won't both fit on the same page.
\usepackage{ccaption}

% intelligently split long tables across pages
\usepackage{ltablex}

% customize how enumerated lists are rendered
\usepackage{enumerate}

% Extra features
% ===============================================
% commands for dealing with initialisms, abbreviations, and glossaries
% loading before hyperref to avoid links
% `nopostdot` removes the period after each description in the glossary
% `record` uses bib2gls to fetch glossary terms from the sourced file, later
% `toc` adds the glossary to the ToC
\usepackage[nopostdot, record, toc]{glossaries-extra}

% set glossary acronym style
\setabbreviationstyle[acronym]{long-short}
% \input rather than \include, since \include is meant for the document body
\input{glossary}

% Citations and references
% ===============================================
% handle citations to external documents
\usepackage[backend=biber, style=nature, sorting=none]{biblatex}

% create clickable hyperlinks for jumping around the document
\usepackage{hyperref}

% Handle internal document references to figures, tables, and more.
% This automatically labels them as Figure, or Table, or Equation
% with the correct numbering in the order that they appear in the document.
\usepackage[capitalize, noabbrev]{cleveref}

% load the BibTeX file containing all documents in my Zotero library
\addbibresource{thesis.bib}
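To give a sense of what a few of these packages do once you are in the body of the document, here is a minimal sketch. It is not taken from my thesis: the acronym, labels, and table contents are made up for illustration, it uses the standard article class instead of ut-thesis, and it leaves out the `record` option so that it compiles without bib2gls.

```latex
\documentclass{article}

% acronym handling; `record` is left out so bib2gls is not needed here
\usepackage[nopostdot]{glossaries-extra}
\setabbreviationstyle[acronym]{long-short}
% hypothetical acronym, defined inline instead of in a glossary file
\newacronym{pcr}{PCR}{polymerase chain reaction}

% automatic "Section N" / "Table N" style references
\usepackage[capitalize, noabbrev]{cleveref}

\begin{document}
\tableofcontents

\section{Methods}
\label{sec:methods}
% first use is written out in full with the short form in parentheses
Each sample was amplified with \gls{pcr}.
% later uses print only the short form
The \gls{pcr} primers are listed in \cref{tab:primers}.

\begin{table}
  \centering
  \caption{Primers used in this study}
  \label{tab:primers}
  \begin{tabular}{ll}
    Name & Sequence \\
    \hline
    P1 & ACGT \\
  \end{tabular}
\end{table}

\section{Results}
% cleveref fills in the word "Section" and the current number
The protocol in \cref{sec:methods} was repeated three times.
\end{document}
```

If the two sections are swapped, the numbers, the table of contents, and the first spelled-out use of the acronym all update on the next compile (two runs are needed for the references to settle).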

I can also make my own rules and tell \(\LaTeX\) to run them for me. Here is a snippet for how I want \(\LaTeX\) to handle my figures, since they all follow a similar structure:

% Typical figure layout and placement
% #1 = image file
% #2 = figure title
% #3 = figure caption
% #4 = figure label for internal referencing
% (the trailing % signs stop the line breaks from adding stray spaces)
\newcommand{\newfigure}[4]{%
  % Start figure environment
  \begin{figure}
    % horizontally align the figure to the centre of the page
    \centering
    % Include the figure (make it 90% the width of the text margins)
    \includegraphics[width=0.9\textwidth]{#1}
    % Add the figure caption with a bold title, followed by a period
    \caption[#2]{\textbf{#2.} #3}
    % Add the label for internal referencing
    \label{#4}
  \end{figure}%
}
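As a rough sketch of how a command like this gets used, here is a small standalone document. The title, caption, and label are hypothetical, and `example-image` is a placeholder graphic that ships with the mwe package, which I am assuming is installed; in a real document the first argument would be the path to your own image.

```latex
\documentclass{article}
\usepackage{graphicx}
\usepackage[capitalize, noabbrev]{cleveref}

% the \newfigure definition from above, condensed
\newcommand{\newfigure}[4]{%
  \begin{figure}
    \centering
    \includegraphics[width=0.9\textwidth]{#1}
    \caption[#2]{\textbf{#2.} #3}
    \label{#4}
  \end{figure}%
}

\begin{document}
% only the short title (argument #2) appears in the list of figures
\listoffigures

\section{Overview}
% cleveref expands this to "Figure" plus the figure's number
The workflow is summarized in \cref{fig:workflow}.

% placeholder image, hypothetical title, caption, and label
\newfigure{example-image}
  {Analysis workflow}
  {Each box is one step, from raw data to final plots.}
  {fig:workflow}
\end{document}
```

Passing the title as the optional argument of \caption keeps the list of figures short, while the full caption under the figure still gets the bold title followed by the longer description.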

As you can see, this declarative structure in the document itself gives the writer lots of control over what they want the document to look like and what rules they want the document to follow. You can also see an added benefit of writing your document as code: comments. It’s very easy and natural to include comments to highlight certain things about the document itself. These can be notes for yourself, headers for making the raw text file easy to look at, or little explanations of the way things work so you don’t get confused when you come back later.

Drawbacks of writing documents as code

But you can probably also see the immediate downside for most people when writing documents like this. There is a lot the writer has to specify about the document itself instead of just focusing on its content. This creates overhead when working with \(\LaTeX\) documents, and for people who aren’t used to writing code, that overhead is too high a price to pay. That is a legitimate concern and something power users routinely forget to consider when talking about \(\LaTeX\), or software in general.

\(\LaTeX\) also produces cryptic compilation logs that can make it hard to figure out what actually went wrong during compilation and what you need to fix. If you’re making your own commands or doing something tricky, tracking down the source of an error can take a while.

Other solutions for writing documents like code, such as Markdown + Pandoc or Org Mode, also have these problems. Whichever system you pick, there is going to be some configuration and a compilation step between you and your nicely compiled document. Learning that configuration and syntax is extra work that takes away from just writing.

“Documents as code” to combat complexity

In my opinion, the more strictly a document must adhere to a set of rules, the more valuable a tool like \(\LaTeX\) becomes. Writing a thesis or scientific article fits this bill. Just look at all the files and independent parts that need to come together for my thesis.

[Figure: Tree view of all the files I need to write my thesis]

There are pages and pages of formatting guidelines and rules for different journals, and expecting people to read, understand, and apply them almost perfectly is unreasonable. Providing templates that are easy to use and take care of these details behind the scenes makes a lot of sense and lets the writer focus on writing instead of wasting time adhering to a specific journal’s or university’s policies. It also gives flexibility to the writer, since they may have to submit the same article to a few different places before it finally gets published. Time spent manually re-writing an article to match a single journal’s specifications, then re-writing it again when it gets rejected, is valuable time wasted.
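Here is a minimal sketch of what that flexibility can look like in practice. The file name body.tex is hypothetical, and both \documentclass lines use the standard article class as stand-ins for real journal or university templates, which would each come with their own class file and options.

```latex
% Shared content, written once. filecontents writes body.tex on the
% first run if the file does not already exist (hypothetical file name).
\begin{filecontents}{body.tex}
\section{Introduction}
The same text is reused under every template.
\end{filecontents}

% Swap only this line to retarget the document.
\documentclass[twocolumn]{article}  % stand-in for one journal's class
% \documentclass[12pt]{article}     % stand-in for another venue's class

\begin{document}
\input{body}
\end{document}
```

This only works cleanly when the content sticks to commands that every target template provides, so some touch-ups are usually still needed, but they are far smaller than reformatting the whole article by hand.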

Useful tools for writing in LaTeX

Overleaf is an online tool for collaboratively writing \(\LaTeX\) documents. It takes care of all the technical details of actually running \(\LaTeX\) and it works extremely well for many purposes. My colleagues and I have used it in the past and enjoyed it.

If you don’t like working with web applications, you’ll need to install \(\LaTeX\) on your computer. I’d also highly recommend using Visual Studio Code with the following extensions:

  • LaTeX Workshop by James Yu
    • This extension contains many \(\LaTeX\) snippets and IntelliSense-like tooltips for selecting BibTeX citations and glossary terms.
    • It also has compilation and live-viewing features built in for quick rendering and document viewing.
  • LaTeX language support by Long Nhat Nguyen
    • This extension enables nice text highlighting features for \(\LaTeX\) environments and adds some more snippets.
  • Code Spell Checker by Street Side Software
    • This extension brings spell checking to your code, so you can still get this crucial feature without needing to copy-and-paste from something else.

Conclusions

\(\LaTeX\) is an approach to writing that scales to very complex documents. It’s not perfect, of course. Tools like Microsoft Word and Google Docs have real benefits, such as simultaneous editing and sharing over the internet, that are tough to replicate with \(\LaTeX\). But collaboration is beyond the scope of the \(\LaTeX\) format and its compiler, so it’s fair that this use case isn’t handled natively. Tools like Overleaf help by making \(\LaTeX\) more accessible and by providing some features of other writing tools.

All in all, I would argue that the value in using \(\LaTeX\) is directly tied to the number and the strictness of rules for the document you are attempting to write. If you

  1. don’t want to think of your document as code,
  2. don’t have strict rules about it, or
  3. want to write something quickly

then \(\LaTeX\) probably isn’t the right solution for you. But if you

  1. are used to writing code,
  2. want your document to look beautiful,
  3. have to write any math at all, or
  4. are making a highly structured document

then I would advocate for writing it in \(\LaTeX\).