Scope Inference, a.k.a. Resugaring Scope Rules
Tags: Programming Languages, Semantics
Posted on 12 June 2017.This is the second post in a series about resugaring. It focuses on resugaring scope rules. See also our posts on resugaring evaluation steps and resugaring type rules.
Many programming languages have syntactic sugar. We would hazard to
guess that most modern languages do. This is when a piece of syntax
in a language is defined in terms of the rest of the language. As a
simple example, x += expression
might be shorthand for x = x +
expression
. A more interesting sugar is
Pyret’s for
loops.
For example:
for fold(p from 1, n from range(1, 6)):
p * n
end
computes 5 factorial, which is 120. This for
is a piece of sugar,
though, and the above code is secretly shorthand for:
fold(lam(p, n): p * n end, 1, range(1, 6))
Sugars like this are great for language development: they let you grow a language without making it more complicated.
Languages also have scoping rules that say where variables are in scope. For instance, the scoping rules should say that a function’s parameters are in scope in the body of the function, but not in scope outside of the function. Many nice features in editors depend on these scoping rules. For instance, if you use autocomplete for variable names, it should only suggest variables that are in scope. Similarly, refactoring tools that rename variables need to know what is in scope.
This breaks down in the presence of syntactic sugar, though: how can your editor tell what the scoping rules for a sugar are?
The usual approach is to write down all of the scoping rules for all of the sugars. But this is error prone (you need to make sure that what you write down matches the actual behavior of the sugars), and tedious. It also goes against a general principle we hold: to add a sugar to a language, you should just add the sugar to the language. You shouldn’t also need to update your scoping rules, or update your type system, or update your debugger: that should all be done automatically.
We’ve just published a paper at ICFP that shows how to automatically
infer the scoping rules for a piece of sugar, like the for
example
above. Here is the
paper and implementation.
This is the latest work we’ve done with the goal of making the above
principle a reality. Earlier, we showed
how to automatically find evaluation steps that show how your program
runs in the presence of syntatic sugar.
How it Works
Our algorithm needs two things to run:
- The definitions of syntactic sugar. These are given as pattern-based rewrite rules, saying what patterns match and what they should be rewritten to.
- The scoping rules for the base (i.e. core) language.
It then automatically infers scoping rules for the full language, that includes the sugars. The final step to make this useful would be to add these inferred scoping rules to editors that can use them, such as Sublime, Atom, CodeMirror, etc.
For example, we have tested it on Pyret (as well as other languages).
We gave it scoping rules for Pyret’s base language (which included
things like lambda
s and function application), and we gave it rules
for how for
desugars, and it determined the scoping rules of
for
. In particular:
- The variables declared in each
from
clause are visible in the body, but not in the argument of anyfrom
clause. - If two
from
clauses both declare the same variable, the second one shadows the first one.
This second rule is exactly the sort of thing that is easy to overlook if you try to write these rules down by hand, resulting in obscure bugs (e.g. when doing automatic variable refactoring).
Here are the paper and implementation, if you want to read more or try it out.