A Third Perspective on Hygiene

Tags: Programming Languages, Resugaring, Scope, Semantics

Posted on 20 June 2017.

In the last post, we talked about scope inference: automatically inferring scoping rules for syntatic sugar. In this post we’ll talk about the perhaps surprising connection between scope inference and hygiene.

Hygiene can be viewed from a number of perspectives and defined in a number of ways. We’ll talk about two pre-existing perspectives, and then give a third perspective that comes from having scope inference.

First Perspective

The traditional perspective on hygiene (going all the way back to Kohlbecker in ’86) defines it by what shouldn’t happen when you expand macros / syntactic sugar. To paraphrase the idea:

Expanding syntactic sugar should not cause accidental variable capture. For instance, a variable used in a sugar should not come to refer to a variable declared in the program simply because it has the same name.

Thus, hygiene in this tradition is defined by a negative.

It has also traditionally focused strongly on algorithms. One would expect papers on hygiene to state a definition of hygiene, give an algorithm for macro expansion, and then prove that the algorithm obeys these properties. But this last step, the proof, is usually suspiciously missing. At least part of the reason for this, we suspect, is that definitions of hygiene have been too informal to be used in a proof.

And a definition of hygiene has been surprisingly hard to pin down precisely. In 2015, a full 29 years after Kohlbecker’s seminal work, Adams writes that “Hygiene is an essential aspect of Scheme’s macro system that prevents unintended variable capture. However, previous work on hygiene has focused on algorithmic implementation rather than precise, mathematical definition of what constitutes hygiene.” He goes on to discuss “observed-binder hygiene”, which is “not often discussed but is implicitly averted by traditional hygiene algorithms”. The point we’re trying to make is that the traditional view on hygiene is subtle.

Second Perspective

There is a much cleaner definition of hygiene, however, that is more of a positive statement (and subsumes the preceding issues):

If two programs are α-equivalent (that is, they are the same up to renaming variables), then when you desugar them (that is, expand their sugars) they should still be α-equivalent.

Unfortunately, this definition only makes sense if we have scope defined on both the full and base languages. Most hygiene systems can’t use this definition, however, because the full language is not usually given explicit scoping rules; rather, it’s defined implicitly through translation into the base language.

Recently, Herman and Wand have advocated for specifying the scoping rules for the full language (in addition to the base language), and then verifying that this property holds. If the property doesn’t hold, then either the scope specification or the sugar definitions are incorrect. This is, however, an onerous demand to place on the developer of syntactic sugar, especially since scope can be surprisingly tricky to define precisely.

Third Perspective

Scope inference gives a third perspective. Instead of requiring authors of syntactic sugar to specify the scoping rules for the full language, we give an algorithm that infers them. We have to then define what it means for this algorithm to work “correctly”.

We say that an inferred scope is correct precisely if the second definition of hygiene holds: that is, if desugaring preserves α-equivalence. Thus, our scope inference algorithm finds scoping rules such that this property holds, and if no such scoping rules exist then it fails. (And if there are multiple sets of scoping rules to choose from, it chooses the ones that put as few names in scope as possible.)

An analogy would be useful here. Think about type inference: it finds type annotations that could be put in your program such that it would type check, and if there are multiple options then it picks the most general one. Scope inference similarly finds scoping rules for the full language such that desugaring preserves α-equivalence, and if there are multiple options then it picks the one that puts the fewest names in scope.

This new perspective on hygiene allows us to shift the focus from expansion algorithms to the sugars themselves. When your focus is on an expansion algorithm, you have to deal with whatever syntactic sugar is thrown your way. If a sugar introduces unbound identifiers, then the programmer (who uses the macro) just has to deal with it. Likewise, if a sugar uses scope inconsistently, treating a variable either as a declaration or a reference depending on the phase of the moon, the programmer just has to deal with it. In contrast, since we infer scope for the full language, we check check weather a sugar would do one of these bad things, and if so we can call the sugar unhygienic.

To be more concrete, consider a desugaring rule for bad(x, expression) that sometimes expands to lambda x: expression and sometimes expands to just expression, depending on the context. Our algorithm would infer from the first rewrite that the expression must be in scope of x. However, this would mean that the expression was allowed to contain references to x, which would become unbound when the second rewrite was used! Our algorithm detects this and rejects this desugaring rule. Traditional macro systems allow this, and only detect the potential unbound identifier problem when it actually occurred. The paper contains a more interesting example called “lambda flip flop” that is rejected because it uses scope inconsistently.

Altogether, scope inference rules out bad sugars that cannot be made hygienic, but if there is any way to make a sugar hygienic it will find it.

Again, here’s the paper and implementation, if you want to read more or try it out.

The Brown PLT Blog