Articles by tag: Scope
A Third Perspective on Hygiene
Scope Inference, a.k.a. Resugaring Scope Rules
Towards a Notional Machine for Runtime Stacks and Scope
Tags: Education, Misconceptions, Scope, Semantics, User Studies
Posted on 07 July 2022.
Stacks are central to our understanding of program behavior; so is scope. These concepts become ever more important as ever more programming languages embrace concepts like closures and advanced control (like generators). Furthermore, stacks and scope interact in an interesting way, and these features really exercise their intersection.
Over the years we’ve seen students exhibit several problematic conceptions about stacks (and scope). For instance, consider a program like this:
def f(x):
    return g(x + 1)
def g(y):
    return y + x
f(3)
What is its value? You want an error: that x is not bound. But think
about your canonical stack diagram for this program. You have a frame
for g atop that for f, and you have been told that you “look down
the stack”. (Or vice versa, depending on how your stacks grow.) So
it’s very reasonable to conclude that this program produces 7, the
result produced by dynamic scope.
We see students thinking exactly this.
Consider this program:
def f(x):
    return lambda y: x + y
p = f(3)
p(4)
This one, conversely, should produce 7. But students who have been
taught a conventional notion of call-and-return assume that f’s
stack frame has been removed after the call completed (correct!), so
p(4) must result in an error that x is not bound.
We see students thinking exactly this, too.
The paper sets out to do several things.
First, we try to understand the conceptions of stacks that students have coming into an upper-level programming languages course. (It’s not great, y’all.)
Second, we create some tooling to help students learn more about stacks. More on that below. The tooling seems to work well for students who get some practice using it.
Third, we find that even after several rounds of direct instruction and practice, some misconceptions remain. In particular, students do not properly understand how environments chain to get scope right.
Fourth, in a class that had various interventions including interpreters, students did much better than in a class where students learned from interpreters alone. Though we love interpreters and think they have various valuable uses in programming languages education, our results make us question some of the community’s beliefs about the benefits of using interpreters. In particular, some notions of transfer that we would have liked to see do not occur. We therefore believe that the use of interpreters needs much more investigation.
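The environment-chaining misconception above can be made concrete. Here is a minimal sketch (our illustration, not the course's actual tooling) of a lexical environment chain in Python, showing why the first example program errors instead of producing 7: each frame's parent is the environment where the function was *defined*, not the frame of its caller.

```python
class Env:
    """A frame mapping names to values, chained to a lexical parent frame."""
    def __init__(self, bindings, parent=None):
        self.bindings = bindings
        self.parent = parent

    def lookup(self, name):
        # Walk the environment chain (lexical scope), not the call stack.
        if name in self.bindings:
            return self.bindings[name]
        if self.parent is not None:
            return self.parent.lookup(name)
        raise NameError(name)

top = Env({})                       # the global environment
f_env = Env({"x": 3}, parent=top)   # frame for f(3): parent is where f was defined
g_env = Env({"y": 4}, parent=top)   # frame for g(4): parent is ALSO the global
                                    # environment, not f's frame, even though
                                    # f called g
try:
    g_env.lookup("x")
except NameError:
    print("x is not bound")
```

Under dynamic scope, g's lookup would instead walk the caller's frame and find x = 3, yielding the 7 that students predict.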
As for the tooling: One of the things we learned from our initial study is that students simply do not have a standardized way of presenting stacks. What goes in them, and how, were all over the map. We conjecture there are many reasons: students mostly see stacks and are rarely asked to draw them; and when they do, they have no standard tools for doing so. So they invent various ad hoc notations, which in turn don’t necessarily reinforce all the aspects that a stack should represent.
We therefore created a small tool for drawing stacks. What we did was repurpose Snap! to create a palette of stack, environment, and heap blocks. It’s important to understand these aren’t runnable programs: these are just static representations of program states. But Snap! is fine with that. This gave us a consistent notation that we could use everywhere: in class, in the textbook, and in homeworks. The ability to make stacks very quickly with drag-and-drop was clearly convenient to students who gained experience with the tool, because many used it voluntarily; it was also a huge benefit for in-class instruction over a more conventional drawing tool. An unexpected success for block syntaxes!
For more details, see the paper.
A Third Perspective on Hygiene
Tags: Programming Languages, Resugaring, Scope, Semantics
Posted on 20 June 2017.
In the last post, we talked about scope inference: automatically inferring scoping rules for syntactic sugar. In this post we’ll talk about the perhaps surprising connection between scope inference and hygiene.
Hygiene can be viewed from a number of perspectives and defined in a number of ways. We’ll talk about two pre-existing perspectives, and then give a third perspective that comes from having scope inference.
First Perspective
The traditional perspective on hygiene (going all the way back to Kohlbecker in ’86) defines it by what shouldn’t happen when you expand macros / syntactic sugar. To paraphrase the idea:
Expanding syntactic sugar should not cause accidental variable capture. For instance, a variable used in a sugar should not come to refer to a variable declared in the program simply because it has the same name.
Thus, hygiene in this tradition is defined by a negative.
It has also traditionally focused strongly on algorithms. One would expect papers on hygiene to state a definition of hygiene, give an algorithm for macro expansion, and then prove that the algorithm obeys these properties. But this last step, the proof, is usually suspiciously missing. At least part of the reason for this, we suspect, is that definitions of hygiene have been too informal to be used in a proof.
And a definition of hygiene has been surprisingly hard to pin down precisely. In 2015, a full 29 years after Kohlbecker’s seminal work, Adams writes that “Hygiene is an essential aspect of Scheme’s macro system that prevents unintended variable capture. However, previous work on hygiene has focused on algorithmic implementation rather than precise, mathematical definition of what constitutes hygiene.” He goes on to discuss “observed-binder hygiene”, which is “not often discussed but is implicitly averted by traditional hygiene algorithms”. The point we’re trying to make is that the traditional view on hygiene is subtle.
Second Perspective
There is a much cleaner definition of hygiene, however, that is more of a positive statement (and subsumes the preceding issues):
If two programs are α-equivalent (that is, they are the same up to renaming variables), then when you desugar them (that is, expand their sugars) they should still be α-equivalent.
Unfortunately, this definition only makes sense if we have scope defined on both the full and base languages. Most hygiene systems can’t use this definition, however, because the full language is not usually given explicit scoping rules; rather, it’s defined implicitly through translation into the base language.
Recently, Herman and Wand have advocated for specifying the scoping rules for the full language (in addition to the base language), and then verifying that this property holds. If the property doesn’t hold, then either the scope specification or the sugar definitions are incorrect. This is, however, an onerous demand to place on the developer of syntactic sugar, especially since scope can be surprisingly tricky to define precisely.
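To make the α-equivalence definition concrete, here is a small sketch in Python, using a hypothetical tuple encoding of lambda terms (our own illustration, not the paper's formalism). Two terms are α-equivalent when corresponding bound variables point at the same binder position; a hygienic desugaring must preserve that relation.

```python
# Toy term encoding: ("var", x) | ("lam", x, body) | ("app", fn, arg)
#                  | ("let", x, e, body)   -- the sugar we will expand

def alpha_eq(t1, t2, env1=(), env2=()):
    """Alpha-equivalence: compare binder positions instead of binder names."""
    if t1[0] != t2[0]:
        return False
    if t1[0] == "var":
        if (t1[1] in env1) != (t2[1] in env2):
            return False
        if t1[1] in env1:                     # both bound: same binder depth?
            return env1.index(t1[1]) == env2.index(t2[1])
        return t1[1] == t2[1]                 # both free: names must agree
    if t1[0] == "lam":
        return alpha_eq(t1[2], t2[2], (t1[1],) + env1, (t2[1],) + env2)
    # "app": compare children pairwise under the same environments
    return all(alpha_eq(a, b, env1, env2) for a, b in zip(t1[1:], t2[1:]))

def desugar(t):
    """Expand the `let` sugar: let x = e in body  =>  (lam x. body)(e)."""
    if t[0] == "let":
        return ("app", ("lam", t[1], desugar(t[3])), desugar(t[2]))
    if t[0] in ("lam", "app"):
        return t[:1] + tuple(desugar(c) if isinstance(c, tuple) else c
                             for c in t[1:])
    return t

p1 = ("let", "x", ("var", "z"), ("var", "x"))
p2 = ("let", "y", ("var", "z"), ("var", "y"))
# p1 and p2 differ only in the bound name, so a hygienic expansion must
# produce alpha-equivalent results:
print(alpha_eq(desugar(p1), desugar(p2)))  # True
```

The hygiene property from the definition above is exactly that this check succeeds for every pair of α-equivalent input programs, not just this one.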
Third Perspective
Scope inference gives a third perspective. Instead of requiring authors of syntactic sugar to specify the scoping rules for the full language, we give an algorithm that infers them. We have to then define what it means for this algorithm to work “correctly”.
We say that an inferred scope is correct precisely if the second definition of hygiene holds: that is, if desugaring preserves α-equivalence. Thus, our scope inference algorithm finds scoping rules such that this property holds, and if no such scoping rules exist then it fails. (And if there are multiple sets of scoping rules to choose from, it chooses the ones that put as few names in scope as possible.)
An analogy would be useful here. Think about type inference: it finds type annotations that could be put in your program such that it would type check, and if there are multiple options then it picks the most general one. Scope inference similarly finds scoping rules for the full language such that desugaring preserves α-equivalence, and if there are multiple options then it picks the one that puts the fewest names in scope.
This new perspective on hygiene allows us to shift the focus from expansion algorithms to the sugars themselves. When your focus is on an expansion algorithm, you have to deal with whatever syntactic sugar is thrown your way. If a sugar introduces unbound identifiers, then the programmer (who uses the macro) just has to deal with it. Likewise, if a sugar uses scope inconsistently, treating a variable either as a declaration or a reference depending on the phase of the moon, the programmer just has to deal with it. In contrast, since we infer scope for the full language, we can check whether a sugar would do one of these bad things, and if so we can call the sugar unhygienic.
To be more concrete, consider a desugaring
rule for bad(x, expression) that sometimes expands
to lambda x: expression and sometimes expands to just expression,
depending on the context. Our algorithm would infer from the first
rewrite that the expression must be in scope of x. However, this would
mean that the expression was allowed to contain references to x,
which would become unbound when the second rewrite was used! Our
algorithm detects this and rejects the desugaring rule. Traditional
macro systems allow such a rule, and only detect the potential unbound
identifier problem when it actually occurs. The paper contains a
more interesting example called “lambda flip flop” that is rejected
because it uses scope inconsistently.
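The failure mode of bad(x, expression) can be simulated directly. Below is a hypothetical sketch in Python (names and the string-based expansion are ours, for illustration only): the two rewrite rules are modeled as string templates, and evaluating the second expansion shows the unbound-identifier error that scope inference rejects ahead of time.

```python
# Two expansions of the hypothetical sugar bad(x, expression):
def expand_rule_1(expression_src):
    return "lambda x: " + expression_src    # rewrite 1: binds x

def expand_rule_2(expression_src):
    return expression_src                   # rewrite 2: does NOT bind x

use_site = "x + 1"   # a use of x that rewrite 1's scope permits

f = eval(expand_rule_1(use_site))
print(f(3))          # 4: under rewrite 1, x is bound

try:
    eval(expand_rule_2(use_site))
except NameError:
    print("x is not bound")   # rewrite 2 leaves the same x unbound
```

A traditional macro expander only hits the NameError at runtime, and only on the executions that take the second rewrite; scope inference rejects the sugar once and for all.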
Altogether, scope inference rules out bad sugars that cannot be made hygienic, but if there is any way to make a sugar hygienic it will find it.
Again, here’s the paper and implementation, if you want to read more or try it out.
Scope Inference, a.k.a. Resugaring Scope Rules
Tags: Programming Languages, Pyret, Resugaring, Scope, Semantics
Posted on 12 June 2017.
This is the second post in a series about resugaring. It focuses on resugaring scope rules. See also our posts on resugaring evaluation steps and resugaring type rules.
Many programming languages have syntactic sugar. We would hazard a
guess that most modern languages do. This is when a piece of syntax
in a language is defined in terms of the rest of the language. As a
simple example, x += expression might be shorthand for x = x +
expression. A more interesting sugar is
Pyret’s for loops.
For example:
for fold(p from 1, n from range(1, 6)):
  p * n
end
computes 5 factorial, which is 120. This for is a piece of sugar,
though, and the above code is secretly shorthand for:
fold(lam(p, n): p * n end, 1, range(1, 6))
Sugars like this are great for language development: they let you grow a language without making it more complicated.
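As a minimal sketch of what a desugaring rewrite looks like in practice, here is the x += expression sugar expressed as a transformation over Python's own syntax trees (our illustration using the standard ast module, not the tool from the paper; it handles simple-name targets only).

```python
import ast

class DesugarAugAssign(ast.NodeTransformer):
    def visit_AugAssign(self, node):
        # Rewrite `x += e` into `x = x + e` (simple-name targets only).
        assert isinstance(node.target, ast.Name)
        read = ast.Name(id=node.target.id, ctx=ast.Load())
        write = ast.Name(id=node.target.id, ctx=ast.Store())
        new = ast.Assign(
            targets=[write],
            value=ast.BinOp(left=read, op=node.op, right=node.value))
        return ast.copy_location(new, node)

tree = ast.parse("x = 1\nx += 2")
tree = ast.fix_missing_locations(DesugarAugAssign().visit(tree))
print(ast.unparse(tree))   # the `+=` is gone: x = 1, then x = x + 2

ns = {}
exec(compile(tree, "<desugared>", "exec"), ns)
print(ns["x"])             # 3: same behavior as the sugared program
```

Note that this rewrite reads and writes the same variable, so it already has scope implications; that is exactly the kind of information the paper's algorithm infers automatically.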
Languages also have scoping rules that say where variables are in scope. For instance, the scoping rules should say that a function’s parameters are in scope in the body of the function, but not in scope outside of the function. Many nice features in editors depend on these scoping rules. For instance, if you use autocomplete for variable names, it should only suggest variables that are in scope. Similarly, refactoring tools that rename variables need to know what is in scope.
This breaks down in the presence of syntactic sugar, though: how can your editor tell what the scoping rules for a sugar are?
The usual approach is to write down all of the scoping rules for all of the sugars. But this is error prone (you need to make sure that what you write down matches the actual behavior of the sugars), and tedious. It also goes against a general principle we hold: to add a sugar to a language, you should just add the sugar to the language. You shouldn’t also need to update your scoping rules, or update your type system, or update your debugger: that should all be done automatically.
We’ve just published a paper at ICFP that shows how to automatically
infer the scoping rules for a piece of sugar, like the for example
above. Here is the
paper and implementation.
This is the latest work we’ve done with the goal of making the above
principle a reality. Earlier, we showed
how to automatically find evaluation steps that show how your program
runs in the presence of syntactic sugar.
How it Works
Our algorithm needs two things to run:
- The definitions of syntactic sugar. These are given as pattern-based rewrite rules, saying what patterns match and what they should be rewritten to.
- The scoping rules for the base (i.e. core) language.
It then automatically infers scoping rules for the full language, that includes the sugars. The final step to make this useful would be to add these inferred scoping rules to editors that can use them, such as Sublime, Atom, CodeMirror, etc.
For example, we have tested it on Pyret (as well as other languages).
We gave it scoping rules for Pyret’s base language (which included
things like lambdas and function application), and we gave it rules
for how for desugars, and it determined the scoping rules of
for. In particular:
- The variables declared in each from clause are visible in the body, but not in the argument of any from clause.
- If two from clauses both declare the same variable, the second one shadows the first one.
This second rule is exactly the sort of thing that is easy to overlook if you try to write these rules down by hand, resulting in obscure bugs (e.g. when doing automatic variable refactoring).
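A rough analogue of that shadowing rule, in Python rather than Pyret (our illustration, not the inferred rule itself): when nested binders reuse a name, the body sees the innermost one.

```python
# The body n + 1 sits under two binders named n; the inner one wins,
# just as the second `from` clause shadows the first:
outer = lambda n: (lambda n: n + 1)(n * 10)
print(outer(3))   # 31: the body sees the inner n (30), not the outer 3
```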
Here are the paper and implementation, if you want to read more or try it out.
