S5: A Semantics for Today's JavaScript
Tags: JavaScript, Semantics, Programming Languages
Posted on 11 November 2011.The JavaScript language isn't static―the ECMAScript committee is working hard to improve the language, and browsers are implementing features both in and outside the spec, making it difficult to understand just what "JavaScript" means at any point in time. Existing implementations aren't much help―their goal is to serve pages well and fast. We need a JavaScript architecture that can help us make sense of the upcoming (and existing!) features of the language.
To this end, we've developed S5, an ECMAScript 5 runtime, built on λJS, with the explicit goal of helping people understand and tinker with the language. We built it to understand the features in the new standard, building on our previous efforts for the older standard. We've now begun building analyses for this semantics, and are learning more about it as we do so. We're making it available with the hope that you can join us in playing with ES5, extending it with new features, and building tools for it.
S5 implements the core features of ES5 strict mode. How do we know this? We've tested S5 against Test262 to measure our progress. We are, of course, not feature complete, but we're happy with our progress, which you can check out here.
A Malleable Implementation
The semantics of S5 is designed to be two things: a language for writing down the algorithms of the specification, and a translation target for JavaScript programs. We've implemented an interpreter for S5, and a desugaring function that translates JavaScript source into S5 programs.
We have a number of choices to make in defining desugaring. The ECMAScript standard defines a whole host of auxiliary functions and library routines that we must model. Putting these implementations directly in the desugaring function would work, but would make desugaring unnecessary brittle, and require recompilation on every minor change. Instead, we implement the bulk of this functionality as an S5 program. The majority of our work happens in an environment file that defines the spec in S5 itself. The desugaring defines a translation from the syntactic forms of JavaScript to the (smaller) language of S5, filled with calls into the functions defined in this environment.
This separation of concerns is what makes our implementation so amenable to
exploration. If you want to try something out, you can edit the environment
file and rerun whatever tests you care to learn about. Want to try a different
implementation of the ==
operator? Just change the definition, as
it was pulled from the spec, at line
300. Want a more expressive Object.toString()
that doesn't
just print "[object Object]"
? That's right
here. No changing an interpreter, no recompiling a desugaring function
necessary.
The environment we've written reflects the standard's algorithms as we understand them in terms of S5. The desugaring from JavaScript to S5 code with calls into this library is informed by the specification's definitions of expression and statement evaluation. We have confidence in the combination of desugaring and library implementation, given our increasing test coverage. Further, we know how to continue―implement more of the spec and pass more test cases!
More than λJS
S5 is built on λJS, but extends it in three significant ways:
- Explicit getters and setters;
- Object fields with attributes, like
writable
andconfigurable
, built-in; - Support for
eval().
Attributes on objects weren't treated directly in the original
λJS.
In 5th Edition, they are crucial to several security-relevant operations on
objects. For example, the standard specifies Object.freeze()
,
which makes an object's properties forever unwritable. S5 directly models the
writable
and configurable
attributes that make this
operation possible, and make its implementation in S5 easy to understand.
λJS explicitly elided eval()
from its semantics. S5
implements eval()
by performing desugaring within the
interpreter and then interpreting the desugared code. We implement only
the strict mode version of eval
, which restricts the environment
that the eval
'd code can affect. With these restrictions, we can
implement eval
in a straightforward way within our interpreter.
We'll cover the details of how we do this, and why it works, in another post.
Building on S5
There's a ton we can do with S5. More, in fact, than we can do by ourselves:- Experiment with Harmony features: ECMAScript 6, or Harmony, as it is often called, is being designed right now. Proxies, string interpolation, syntactic sugar for classes, and modules are just a few of the upcoming features. Modeling them in S5 would help us understand these features better as they get integrated into the language.
- Build Verification Tools: Verification based on objects' attributes is an interesting research problem―what can we prove about interacting programs if we know about unwritable fields and inextensible objects? Building this knowledge into a type-checker or a program analysis could give interesting new guarantees.
- Abstract Our Machine: Matt Might and David van Horn wrote about abstracting λJS for program analysis. We've added new constructs to the language since then. Do they make abstraction any harder?
- Complete the Implementation: We've made a lot of progress, but there's still more ground to cover. We need support for more language features, like JSON and regular expressions, that would move our implementation along immensely. We'll work on this more, but anyone who wants to get involved is welcome to help.
If any of this sounds interesting, or if you're just curious, go ahead and check out S5! It's open source and lives in a Github repository. Let us know what you do with it!