Typechecking Uses of the jQuery Language

Posted on 17 January 2014.

Manipulating HTML via JavaScript is often a frustrating task: the APIs for doing so, known as the Document Object Model (or DOM), are implemented to varying degrees of fidelity across different browsers, leading to browser-specific workarounds that contort the code. Worse, the APIs are too low-level: they amount to an “assembly language for trees”, allowing programmers to navigate from a node to its parent, children or adjacent siblings; add, remove, or modify a node at a time; etc. And like assembly programming, these low-level operations are imperative and often error-prone.

Fortunately, developers have created libraries that abstract away from this low-level tedium to provide a powerful, higher-level API. The best-known example of these is jQuery, and its API is so markedly different from the DOM API that it might as well be its own language — a domain-specific language for HTML manipulation.

With great powerlanguage comes great responsibility

Every language has its own idiomatic forms, and therefore has its own characteristic failure modes and error cases. Our group has been studying various (sub-)languages of JavaScript— JS itself, ADsafety, private browsing violations— so the natural question to ask is, what can we do to analyze jQuery as a language?

This post takes a tour of the solution we have constructed to this problem. In it, we will see how a sophisticated technique, a dependent type system, can be specialized for the jQuery language in a way that:

Retains decidable type-inference,
Requires little programmer annotation overhead,
Hides most of the complexity from the programmer, and yet
Yields non-trivial results.

Engineering such a system is challenging, and is aided by the fact that jQuery’s APIs are well-structured enough that we can extract a workable typed description of them.

What does a jQuery program look like?

Let’s get a feel for those idiomatic pitfalls of jQuery by looking at a few tiny examples. We’ll start with a (seemingly) simple question: what does this one line of code do?

$(".tweet span").next().html()

With a little knowledge of the API, jQuery’s code is quite readable: get all the ".tweet span" nodes, get their next() siblings, and…get the HTML code of the first one of them. In general, a jQuery expression consists of three parts: a initial query to select some nodes, a sequence of navigational steps that select nearby nodes related to the currently selected set, and finally some manipulation to retrieve or set data on those node(s).

This glib explanation hides a crucial assumption: that there exist nodes in the page that match the initial selector in the first place! So jQuery programs’ behavior depends crucially on the structure of the pages in which they run.

Given the example above, the following line of code should behave analogously:

$(".tweet span").next().text()

But it doesn’t—it actually returns the concatenated text content of all the selected nodes, rather than just the first. So jQuery APIs have behavior that depends heavily on how many nodes are currently selected.

Finally, we might try to manipulate the nodes by mapping a function across them via jQuery’s each() method, and here we have a classic problem that appears in any language: we must ensure that the mapped function is only applied to the correct sorts of HTML nodes.

Our approach

The examples above give a quick flavor of what can go wrong:

The final manipulation might not be appropriate for the type of nodes actually in the query-set.
The navigational steps might “fall off the edge of the tree”, navigating by too many steps and resulting in a vacuous query-set.
The initial query might not match any nodes in the document, or match too many nodes.

Our tool of choice for resolving all these issues is a type system for JavaScript. We’ve used such a system before to analyze the security of Firefox browser extensions. The type system is quite sophisticated to handle JavaScript idioms precisely, and while we take advantage of that sophistication, we also completely hide the bulk of it from the casual jQuery programmer.

To adapt our prior system for jQuery, we need two technical insights: we define multiplicities, a new kind that allows us to approximate the size of a query set in its type and ensure that APIs are only applied to correctly-sized sets, and we define local structure, which allows developers to inform the type system about the query-related parts of the page.

Technical details

Multiplicities

Typically, when type system designers are confronted with the need to keep track of the size of a collection, they turn to a dependently typed system, where the type of a collection can depend on, say, a numeric value. So “a list of five booleans” has a distinct type from “a list of six booleans”. This is very precise, allowing some APIs to be expressed with exactitude. It does come at a heavy cost: most dependent type systems lose the ability to infer types, requiring hefty programmer annotations. But is all this precision necessary for jQuery?

Examining jQuery’s APIs reveals that its behavior can be broken down into five cases: methods might require their invocant query-set contain

Zero elements of any type T, writen 0<T>
One element of some type T, written 1<T>
Zero or one elements of some type T, written 01<T>
One or more elements of some type T, written 1+<T>
Any number (zero or more) elements of some type T, written 0+<T>

These cases effectively abstract the exact size of a collection into a simple, finite approximation. None of jQuery’s APIs actually care if a query set contains five or fifty elements; all the matters is that it contains one-or-more. And therefore our system can avoid the very heavyweight onus of a typical dependent type system, instead using this simpler abstraction without any loss of necessary precision.

Moreover, we can manipulate these cases using interval arithmetic: for example, combining one collection of zero-or-one elements with another collection containing exactly one element yields a collection of one-or-more elements: 01<T> + 1<T> = 1+<T>. This is just interval addition.

jQuery’s APIs all effectively map some behavior across a queryset. Consider the next() method: given a collection of at least one element, it transforms each element into at most one element (because some element might not have any next sibling). The result is a collection of zero-or-more elements (if every element does not have a next sibling). Symbolically: 1+<01<T>> = 0+<T>. This is just interval multiplication.

We can use these two operations to describe the types of all of jQuery’s APIs. For example, the next() method can be given the type Forall T, 1+<T> -> 0+<@next<T>>.

Local structure

Wait — what’s @next? Recall that we need to connect the type system to the content of the page. So we ask programmers to define local structures that describe the portions of their pages that they intend to query. Think of them as “schemas for page fragments”: while it is not reasonable to ask programmers to schematize parts of the page they neither control nor need to manipulate (such as 3^rd-party ads), they certainly must know the structure of those parts of the page that they query! A local structure for, say, a Twitter stream might say:

(Tweet : Div classes = {tweet} optional = {starred}
   (Author : Span classes = {span})
   (Time : Span classes = {time})
   (Content : Span classes = {content})

Read this as “A Tweet is a Div that definitely has class tweet and might have class starred. It contains three children elements in order: an Author, a Time, and a Content. An Author is a Span with class span…”

(If you are familiar with templating libraries like mustache.js, these local structures might look familiar: they are just the widgets you would define for template expansion.)

From these definitions, we can compute crucial relationships: for instance, the @next sibling of an Author is a Time. And that in turn completes our understanding of the type for next() above: for any local structure type T, next() can be called on a collection of at least one T and will return collection of zero or more elements that are the @next siblings of Ts.

Note crucially that while the uses of @next are baked into our system, the function itself is not defined once-and-for-all for all pages: it is computed from the local structures that the programmer defined. In this way, we’ve parameterized our type system. We imbue it with knowledge of jQuery’s fixed API, but leave a hole for programmers to provide their page-specific information, and thereby specialize our system for their code.

Of course, @next is not the only function we compute over the local structure. We can compute @parents, @siblings, @ancestors, and more. But all of these functions are readily deducible from the local structure.

Ok, so what does this all do for me?

Continuing with our Tweet example above, our system provides the following output for these next queries:

// Mistake made: one too many calls to .next(), so the query set is empty

$(".tweet").children().next().next().next().css("color", "red")
// ==> ERROR: 'css' expects 1+<Element>, got 0<Element>


// Mistake made: one too few calls to .next(), so the query set is too big

$(".tweet #myTweet").children().next().css("color")
// ==>; ERROR: 'css' expects 1<Element>, got 1+<Author+Time>


// Correct query

$(".tweet #myTweet").children().next().next().css("color")
// ==> Typechecks successfully

The paper contains additional examples, showing how the types progress across more elaborate jQuery expressions.

The big picture

We have defined a set of types for the jQuery APIs that captures their intended behavior. These types are expressed using helper functions such as @next, whose behavior is specific to each page. We ask programmers merely to define the local structures of their page, and from that we compute the helper functions we need. And finally, from that, our system can produce type errors whenever it encounters the problematic situations we listed above. No additional programmer effort is needed, and the type errors produced are typically local and pertinent to fixing buggy code.

The Brown PLT Blog