A Benchmark for Tabular Types

Posted on 21 November 2021.

Tables are Everywhere

Tables are ubiquitous in the world. Newspapers print tables. Reports include tables. Even children as young as middle schoolers work comfortably with tables. Tables are, of course, also ubiquitous in programming. Because they provide an easy-to-understand, familiar, already-parsed format, they are also valuable in programming education (e.g., DCIC works extensively with tables before moving on to other compound datatypes).

(Typed) Programming with Tables

When it comes to programming with tables, we have excellent tools like relational databases. However, using external databases creates impedance mismatches, so many programmers like to access tabular data directly from within the language rather than construct external calls. The popularity of language-embedded query has not diminished with time.

Programming with tables, however, requires attention to types. Tables are inherently heterogeneous: each column is welcome to use whatever type makes most sense. This is all the more so if tables are a part of the language itself: while external data tend to be limited to “wire-format” types like numbers and strings, inside the language they can contain images, functions, other tables, and more. (For instance, we use all of these in Pyret.)

What is the type of a table? To make the output of tabular operations useful, it can’t be something flat like just Table. Because tables are heterogeneous, they can’t have just a single type parameter (like Table&lt;T&gt;). It may conceptually make sense to have a type parameter for each column (e.g., Table&lt;String, Number&gt;), but real-world tables can have 17 or 37 columns! Programmers also like to access table columns by name, not only position. And so on.
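To make this design space concrete, here is one sketch (in TypeScript; the names and representation are ours, for illustration only, and not any particular system’s API): parameterize the table by the record type of its rows, so each column keeps its own type and can be accessed by name rather than by position.

```typescript
// Illustrative sketch only: one point in the design space, where a
// table is typed by the record type of its rows.
type Row = Record<string, unknown>;

class Table<R extends Row> {
  constructor(public rows: R[]) {}

  // Project a column by name; the element type is looked up per column.
  column<K extends keyof R>(name: K): R[K][] {
    return this.rows.map((r) => r[name]);
  }
}

// Columns are heterogeneous: `name` holds strings, `age` holds numbers.
const people = new Table([
  { name: "Alice", age: 30 },
  { name: "Bob", age: 41 },
]);

const ages: number[] = people.column("age"); // typed as number[]
// people.column("salary"); // rejected at compile time: no such column
```

A single row-type parameter sidesteps the 17-or-37-positional-parameters problem, but operations that change the set of columns (joins, column addition, projection) must then compute new row types, and that is where much of the difficulty lives.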

Making Results Comparable

In Spring 2021, we ran a seminar to understand the state of knowledge of type systems for tables. While we read several excellent papers, we also came away very frustrated: authors simply did not seem to agree on what a “table” was or what operations to support. The result was an enormous degree of incommensurability.

Therefore, rather than invent Yet Another Tabular Type System, we decided to take a step back and address the incommensurability problem. What we need as a community is a shared, baseline understanding of several aspects of tables. That is what this work does: create a tables benchmark. This is not a performance benchmark, however; rather, it’s an expressivity and design benchmark. We call it B2T2: The Brown Benchmark for Tabular Types.

The benchmark doesn’t spring out of thin air. Rather, we extensively studied tabular support in widely-used industrial languages/libraries: R, Python/Pandas, and Julia. To cover educational needs, we also studied the Pyret-based Bootstrap:Data Science curriculum. You will notice that all are based on dynamic languages. (Though Pyret has an optional static type system, it currently does not support tables in any meaningful manner, so tabular programming is essentially dynamic.) This is intentional! If you start with a typed language, you end up reflecting the (potentially artificial and overly-restrictive) constraints of that type system. Rather, it’s healthy to study what programmers (seem to) want to say and do, filter these for reasonability, and reconcile that with the needs of static types (like decidability).

Do Now!

What do you expect to find in a tabular programming benchmark?

Make a list before you read on!

Benchmark Components

B2T2 has the following parts:

  • A definition of a table. There is actually a large space of possibilities here. We’ve chosen a definition that is both broad and interesting without being onerous.

  • Examples of tables. Why bother to provide these? Because type systems may use all sorts of internal encodings. They are welcome to do so, but they cannot expect the outside world to conform to their representation. These examples therefore represent the canonical versions of the tables; explaining how they are converted to the internal format is the responsibility of the type system designers.

  • An API of table operations. This is, of course, the heart of the benchmark. In particular, different papers seem to use different subsets of operations. What is unclear is whether the missing operations are just as easy as the ones shown, difficult, or even impossible. This is therefore a big source of incommensurability.

  • Example programs. Depending on the representation of tables and the nature of the type systems and languages, these programs may have to be rewritten and may (to some observers) look quite unnatural.
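To see why an operations API stresses a type system, consider a single hypothetical operation, here called addColumn (the name and signature are ours for illustration, not B2T2’s actual API): the output table’s row type must be computed from the input’s.

```typescript
// Illustrative sketch of one table operation. The result's row type is
// the input row type extended with the new column's name and type.
// (Name and signature are hypothetical, not B2T2's exact API.)
type Row = Record<string, unknown>;

function addColumn<R extends Row, K extends string, V>(
  rows: R[],
  name: K,
  values: V[]
): (R & { [P in K]: V })[] {
  // Pair each row with the corresponding value for the new column.
  return rows.map((r, i) => ({ ...r, [name]: values[i] } as R & { [P in K]: V }));
}

const base = [{ name: "Alice" }, { name: "Bob" }];
const withAge = addColumn(base, "age", [30, 41]);
// withAge[0].age is typed as number; withAge[0].name is still string.
```

Even this one operation requires computing a new record type from an old one; operations like joins, grouping, and pivoting demand considerably more, which is why an agreed-upon list of operations matters for comparing systems.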

All these might be predictable with some thought. There are two more components that may be a little more surprising:

  • Erroneous programs. In all sophisticated systems, there is a trade-off between complexity and explainability. We are disturbed by how little discussion there is of error-reporting in the papers we’ve read, and think the community should re-balance its emphasis. Even those who only care about technical depth (boo!) can take solace: there can be rich technical work in explaining errors, too! Furthermore, by making errors an explicit component, a team that does research into human factors—even if they leave all other aspects alone—has a “place to stand” to demonstrate their contribution.

  • A datasheet. To improve commensurability, we want authors to tell each other—and their users—in a standard format not only what they did but also where the bodies are buried.
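As an illustration of the kind of erroneous program that component anticipates (our own example, not one drawn from B2T2): projecting a misspelled column name. A row-typed table can reject this statically; the benchmark then asks what the resulting error actually says to the programmer.

```typescript
// Hypothetical erroneous program: projecting a misspelled column.
// With a row-typed table, the lookup can be checked at compile time.
type Row = Record<string, unknown>;

function column<R extends Row, K extends keyof R>(rows: R[], name: K): R[K][] {
  return rows.map((r) => r[name]);
}

const people = [{ name: "Alice", age: 30 }];
const ok = column(people, "age");  // fine: typed as number[]
// column(people, "aeg");          // compile-time error: "aeg" is not a column
// The benchmark's question: how clearly is that rejection explained?
```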

Of course, all these parts are interesting even in the absence of types. We just expect that types will impose the most interesting challenges.

An Open Process

We expect this benchmark to grow and evolve. Therefore, we’ve put our benchmark in a public repository. You’re welcome to make contributions: correct mistakes, refine definitions, add features, provide more interesting examples, etc. You can also contribute solutions in your favorite language!

For More Details

You can read about all this in our paper and work with our repository.