Scope Inference

Tags: Programming Languages, Semantics

Posted on 12 June 2017.

Many programming languages have syntactic sugar. We would hazard to guess that most modern languages do. This is when a piece of syntax in a language is defined in terms of the rest of the language. As a simple example, x += expression might be shorthand for x = x + expression. A more interesting sugar is Pyret’s for loops. For example:

for fold(p from 1, n from range(1, 6)):
  p * n
end

computes 5 factorial, which is 120. This for is a piece of sugar, though, and the above code is secretly shorthand for:

fold(lam(p, n): p * n end, 1, range(1, 6))

Sugars like this are great for language development: they let you grow a language without making it more complicated.

Languages also have scoping rules that say where variables are in scope. For instance, the scoping rules should say that a function’s parameters are in scope in the body of the function, but not in scope outside of the function. Many nice features in editors depend on these scoping rules. For instance, if you use autocomplete for variable names, it should only suggest variables that are in scope. Similarly, refactoring tools that rename variables need to know what is in scope.

This breaks down in the presence of syntactic sugar, though: how can your editor tell what the scoping rules for a sugar are?

The usual approach is to write down all of the scoping rules for all of the sugars. But this is error prone (you need to make sure that what you write down matches the actual behavior of the sugars), and tedious. It also goes against a general principle we hold: to add a sugar to a language, you should just add the sugar to the language. You shouldn’t also need to update your scoping rules, or update your type system, or update your debugger: that should all be done automatically.

We’ve just published a paper at ICFP that shows how to automatically infer the scoping rules for a piece of sugar, like the for example above. Here is the paper and implementation. This is the latest work we’ve done with the goal of making the above principle a reality. Earlier, we showed how to automatically find evaluation steps that show how your program runs in the presence of syntatic sugar.

How it Works

Our algorithm needs two things to run:

  • The definitions of syntactic sugar. These are given as pattern-based rewrite rules, saying what patterns match and what they should be rewritten to.
  • The scoping rules for the base (i.e. core) language.

It then automatically infers scoping rules for the full language, that includes the sugars. The final step to make this useful would be to add these inferred scoping rules to editors that can use them, such as Sublime, Atom, CodeMirror, etc.

For example, we have tested it on Pyret (as well as other languages). We gave it scoping rules for Pyret’s base language (which included things like lambdas and function application), and we gave it rules for how for desugars, and it determined the scoping rules of for. In particular:

  • The variables declared in each from clause are visible in the body, but not in the argument of any from clause.
  • If two from clauses both declare the same variable, the second one shadows the first one.

This second rule is exactly the sort of thing that is easy to overlook if you try to write these rules down by hand, resulting in obscure bugs (e.g. when doing automatic variable refactoring).


Here are the paper and implementation, if you want to read more or try it out.