A Core Calculus for Documents
Or, Lambda: The Ultimate Document
Posted on 28 December 2023.
Document languages like Markdown, LaTeX, PHP, and Liquid are widely used to generate digital documents like PDFs and web pages. Document languages often come with programming features like variables and macros to help authors write complex documents. However, these document programming features are often designed or implemented in problematic ways, especially by the standards of a modern programming language. For example:
- LaTeX’s macro system is not hygienic, which can cause macro arguments to conflict with a macro body.
- PHP’s templating system relies on mutating a global output buffer, which means document fragments are not first-class values.
- Liquid’s variable system provides a small and fixed set of data types, which limits the expressiveness of computations.
These problems could all be addressed in more carefully designed document languages (we are personally fans of Scribble and Typst). But those languages, too, have their own failure modes. Therein lies a deeper problem: there are no theoretical tools for reasoning about the design of a document language. Programming language theorists can use the lambda calculus to reason about the design of general-purpose programming languages. No such formal model exists for document languages.
Our work addresses this issue by providing a document calculus, or a formal model of the programmatic aspects of document languages. We designed the document calculus in a series of levels that each correspond to a family of existing document languages. Each level consists of a domain, or the thing produced by the language, and a constructor, or the feature used to construct elements of the domain. The levels are summarized in this table:
Domain | Ctor | Example Languages | Example Syntax |
---|---|---|---|
String | Literal | Text files, quoted strings | |
Program | PLs with string APIs, such as Javascript | ||
Template Literal | C printf , Python f-strings, Javascript template literals, Perl interpolated strings |
||
Template Program | C preprocessor, PHP, LaTeX, Jinja (Python), Liquid (Ruby), Handlebars (JS) |
|
|
Article | Literal | CommonMark Markdown, Pandoc Markdown, HTML, XML | |
Program | PLs with document APIs, such as Javascript | ||
Template Literal | JSX Javascript, Scala 2, VB.NET, Scribble Racket, MDX Markdown, Lisp quasiquotes |
|
|
Template Program | Typst, Razor C#, Svelte Javascript, Markdoc Markdown |
|
The document calculus can inform the design of document languages in a few ways:
- The levels form a taxonomy which can help designers identify different points in the document language design space.
- The calculus provides a reference semantics for a clean way to desugar key features like variables and loops.
- The type system shows how to type-check a document prior to desugaring, including a proof of syntactic type safety.
For more details, see the paper.