Typechecking Uses of the jQuery Language
Tags: Browsers, JavaScript, Types
Posted on 17 January 2014.Manipulating HTML via JavaScript is often a frustrating task: the APIs for doing so, known as the Document Object Model (or DOM), are implemented to varying degrees of fidelity across different browsers, leading to browser-specific workarounds that contort the code. Worse, the APIs are too low-level: they amount to an “assembly language for trees”, allowing programmers to navigate from a node to its parent, children or adjacent siblings; add, remove, or modify a node at a time; etc. And like assembly programming, these low-level operations are imperative and often error-prone.
Fortunately, developers have created libraries that abstract away from this low-level tedium to provide a powerful, higher-level API. The best-known example of these is jQuery, and its API is so markedly different from the DOM API that it might as well be its own language — a domain-specific language for HTML manipulation.
With great powerlanguage comes great responsibility
Every language has its own idiomatic forms, and therefore has its own characteristic failure modes and error cases. Our group has been studying various (sub-)languages of JavaScript— JS itself, ADsafety, private browsing violations— so the natural question to ask is, what can we do to analyze jQuery as a language?
This post takes a tour of the solution we have constructed to this problem. In it, we will see how a sophisticated technique, a dependent type system, can be specialized for the jQuery language in a way that:
- Retains decidable type-inference,
- Requires little programmer annotation overhead,
- Hides most of the complexity from the programmer, and yet
- Yields non-trivial results.
Engineering such a system is challenging, and is aided by the fact that jQuery’s APIs are well-structured enough that we can extract a workable typed description of them.
What does a jQuery program look like?
Let’s get a feel for those idiomatic pitfalls of jQuery by looking at a few tiny examples. We’ll start with a (seemingly) simple question: what does this one line of code do?
With a little knowledge of the API, jQuery’s code is quite
readable: get all the ".tweet span"
nodes, get
their next()
siblings, and…get the HTML code of
the first one of them. In general, a jQuery expression consists of
three parts: a initial query to select some nodes, a
sequence of navigational steps that select nearby nodes
related to the currently selected set, and finally
some manipulation to retrieve or set data on those node(s).
This glib explanation hides a crucial assumption: that there exist nodes in the page that match the initial selector in the first place! So jQuery programs’ behavior depends crucially on the structure of the pages in which they run.
Given the example above, the following line of code should behave analogously:
But it doesn’t—it actually returns the concatenated text content of all the selected nodes, rather than just the first. So jQuery APIs have behavior that depends heavily on how many nodes are currently selected.
Finally, we might try to manipulate the nodes by mapping a function
across them via jQuery’s each()
method, and here we
have a classic problem that appears in any language: we must ensure
that the mapped function is only applied to the correct sorts of
HTML nodes.
Our approach
The examples above give a quick flavor of what can go wrong:
- The final manipulation might not be appropriate for the type of nodes actually in the query-set.
- The navigational steps might “fall off the edge of the tree”, navigating by too many steps and resulting in a vacuous query-set.
- The initial query might not match any nodes in the document, or match too many nodes.
Our tool of choice for resolving all these issues is a type system for JavaScript. We’ve used such a system before to analyze the security of Firefox browser extensions. The type system is quite sophisticated to handle JavaScript idioms precisely, and while we take advantage of that sophistication, we also completely hide the bulk of it from the casual jQuery programmer.
To adapt our prior system for jQuery, we need two technical insights: we define multiplicities, a new kind that allows us to approximate the size of a query set in its type and ensure that APIs are only applied to correctly-sized sets, and we define local structure, which allows developers to inform the type system about the query-related parts of the page.
Technical details
Multiplicities
Typically, when type system designers are confronted with the need to keep track of the size of a collection, they turn to a dependently typed system, where the type of a collection can depend on, say, a numeric value. So “a list of five booleans” has a distinct type from “a list of six booleans”. This is very precise, allowing some APIs to be expressed with exactitude. It does come at a heavy cost: most dependent type systems lose the ability to infer types, requiring hefty programmer annotations. But is all this precision necessary for jQuery?
Examining jQuery’s APIs reveals that its behavior can be broken down into five cases: methods might require their invocant query-set contain
- Zero elements of any type
T
, writen0<T>
- One element of some type
T
, written1<T>
- Zero or one elements of some type
T
, written01<T>
- One or more elements of some type
T
, written1+<T>
- Any number (zero or more) elements of some type
T
, written0+<T>
These cases effectively abstract the exact size of a collection into a simple, finite approximation. None of jQuery’s APIs actually care if a query set contains five or fifty elements; all the matters is that it contains one-or-more. And therefore our system can avoid the very heavyweight onus of a typical dependent type system, instead using this simpler abstraction without any loss of necessary precision.
Moreover, we can
manipulate these cases using interval arithmetic: for
example, combining one collection of zero-or-one elements with
another collection containing exactly one element yields a
collection of one-or-more elements: 01<T> + 1<T> =
1+<T>
. This is just interval addition.
jQuery’s APIs all effectively map some behavior across a queryset.
Consider the next()
method: given a collection of at
least one element, it transforms each element into at most one
element (because some element might not have any next sibling). The
result is a collection of zero-or-more elements (if every element
does not have a next sibling).
Symbolically: 1+<01<T>> = 0+<T>
. This
is just interval multiplication.
We can use these two operations to describe the types of all of
jQuery’s APIs. For example, the next()
method can be
given the type Forall T, 1+<T> ->
0+<@next<T>>
.
Local structure
Wait — what’s @next
? Recall that we need to
connect the type system to the content of the page. So we ask
programmers to define local structures that describe the
portions of their pages that they intend to query. Think of them as
“schemas for page fragments”: while it is not reasonable
to ask programmers to schematize parts of the page they neither control
nor need to manipulate (such as 3rd-party ads), they
certainly must know the structure of those parts of the page that they
query! A local structure for, say, a Twitter stream might say:
(Tweet : Div classes = {tweet} optional = {starred} (Author : Span classes = {span}) (Time : Span classes = {time}) (Content : Span classes = {content})
Read this as “A Tweet
is a Div
that definitely has class tweet
and might have
class starred
. It contains three children elements in order:
an Author
, a Time
, and
a Content
. An Author
is
a Span
with class span
…”
(If you are familiar with templating libraries like mustache.js, these local structures might look familiar: they are just the widgets you would define for template expansion.)
From these definitions, we can compute crucial relationships: for
instance, the @next
sibling of an Author
is a Time
. And that in turn completes our
understanding of the type for next()
above: for
any local structure
type T
, next()
can be called on a
collection of at least one T
and will
return collection of zero or more elements that are
the @next
siblings of T
s.
Note crucially that while the uses of @next
are baked into our system, the function itself is not
defined once-and-for-all for all pages: it is computed from the
local structures that the programmer defined. In this way,
we’ve parameterized our type system. We imbue it with
knowledge of jQuery’s fixed API, but leave a hole for programmers to
provide their page-specific information, and
thereby specialize our system for their code.
Of course, @next
is not the only function we compute
over the local structure. We can
compute @parent
s, @siblings
, @ancestors
,
and more. But all of these functions are readily deducible from the
local structure.
Ok, so what does this all do for me?
Continuing with our Tweet example above, our system provides the following output for these next queries:The big picture
We have defined a set of types for the jQuery APIs that captures
their intended behavior. These types are expressed using helper
functions such as @next
, whose behavior is specific to
each page. We ask programmers merely to define the local structures
of their page, and from that we compute the helper functions we
need. And finally, from that, our system can produce type errors
whenever it encounters the problematic situations we listed above.
No additional programmer effort is needed, and the type errors
produced are typically local and pertinent to fixing buggy code.
Further reading
Obviously we have elided many of the nitty-gritty details that make our system work. We’ve written up our full system, with more formal definitions of the types and worked examples of finding errors in buggy queries and successfully typechecking correct ones. The writeup also explains some surprising subtleties of the type environment, and proposes some API enhancements to jQuery that were suggested by difficulties in engineering and using the types we defined.