Mystery Languages

Tags: Programming Languages, Education

Posted on 05 July 2018.

How do you learn a new language? Do you simply read its reference manual, and then you’re good to go? Or do you also explore the language, trying things out by hand to see how they behave?

This skill—learning about something by poking at it—is frequently used but almost never taught. However, we have tried to teach it via what we dub mystery languages, and this post is about how we went about it.

Mystery Lanuages

A mystery language is exactly what it sounds like: a programming language whose behavior is a mystery. Each assignment comes with some vague documentation, and an editor in which you can write programs. However, when you run a program it will produce multiple answers! This is because there are actually multiple languages, with the same syntax but different semantics. The answers you get are the results of running your program in all of the different languages. The goal of the assignment is to find programs that tell the languages apart.

As an example, maybe the languages have the syntax a[i], and the documentation says that this “accesses an array a at index i”. That totally specifies the behavior of this syntax, right?

Not even close. For example, what happens if the index is out of bounds? Does it raise an error (like Java), or return undefined (like JavaScript), or produce nonsense (like C), or wrap the index to a valid one (like Python)? And what happens if the index is 2.5, or "2", or 2.0, or "two"?

(EDIT: Specifically, Python wraps negative indices that are smaller than the length of the array.)

Students engage with the mystery languages in three ways:

  1. The first part of each assignment is to find a set of programs that distinguishes the different languages (a.k.a. a classifier).
  2. The second part of the assignment is to describe a theory that explains the different behaviors of the languages. (For example, a theory about C’s behavior could include accessing heap addresses past the end of an array.)
  3. Finally, after an assignment is due, there’s an in-class discussion and explanation of the mystery languages. This is especially useful to provide terminology for behaviors that students encountered in the languages.

This example is somewhat superficial (the real mystery languages mostly use more significant features than array access), but you get the idea: every aspect of a programming language comes with many possible designs, and the mystery languages have students explore them first-hand.

Why Mystery Languages?

We hope to teach a number of skills through mystery languages:

  • Evaluating Design Decisions: The mystery languages are almost entirely based on designs chosen by real languages: either historical choices that have since been settled, or choices that are still up in the air and vary between modern languages. Students get to explore the consequences of these decisions themselves, so that they get a feel for them. And the next day in class they learn about the broader historical context.
  • Syntax vs. Semantics: We are frustrated by how frequently discussions about programming languages revolve around syntax. With luck, showing students first-hand that a single syntax can have a variety of semantics will bring their discussions to a somewhat higher level.
  • Adversarial Thinking: Notice that the array example was all about exploring edge cases. If you only ever wrote correct, safe, boring programs then you’d never notice the different ways that array access could work. This is very similar to the kind of thinking you need to do when reasoning about security, and we hope that students might pick up on this mindset.
  • Experimental Thinking: In each assignment, students are asked not only to find programs that behave differently across the mystery languages, but also to explain the behavior of the different languages. This is essentially a scientific task: brainstorm hypotheses about how the languages might work; experimentally test these hypotheses by running programs; discard any hypothesis that was falsified; and iterate.

Adopting Mystery Languages

If you want to use Mystery Languages in your course, email us and we’ll help you get started!

There are currently 13 mystery languages, and more in development. At Brown, we structured our programming languages course around these mystery languages: about half of the assignments and half of the lectures were about mystery languages. However, mystery languages are quite flexible, and could also be used as a smaller supplement to an existing course. One possibility is to begin with one or two simpler languages to allow students to learn how they work, and from there mix and match any of the more advanced languages. Alternatively, you could do just one or two mystery languages to meet specific course objectives.

Learn More

You can learn more about mystery languages in our SNAPL’17 paper, or dive in and try them yourself (see the assignments prefixed with “ML:”).