Privacy-Respecting Type Error Telemetry at Scale
Posted on 02 February 2024.Thesis: Programming languages would benefit hugely from telemetry. It would be extremely useful to know what people write, how they edit, what problems they encounter, etc. Problem: Observing programmers is very problematic. For students, it may cause anxiety and thereby hurt their learning (and grades). For professionals too it may cause anxiety, but it can also leak trade secrets.
One very partial solution is to perform these observations in controlled settings, such as a lab study. The downside is that it is hard to get diverse populations of users, it’s hard to retain them for very long, it’s hard to pay for these studies, etc. Furthermore, the activities they perform in a lab study may be very different from what they would do in real use: i.e., lab studies especially lack ecological validity when compared against many real-world programming settings.
We decided to instead study a large number of programmers doing their normal work—but in a privacy-respecting way. We collaborated with the Roblox Studio team on this project. Roblox is a widely-used platform for programming and deploying games, and it has lots of users of all kinds of ages and qualifications. They range from people writing their first programs to developers working for game studios building professional games.
In particular, we wanted to study a specific phenomenon: the uptake of types in Luau. Luau is an extension of Lua that powers Roblox. It supports classic Lua programs, but also lets programmers gradually add types to detect bugs at compile time. We specifically wanted to see what kind of type errors people make when using Luau, with the goal of improving their experience and thereby hopefully increasing uptake of the language.
Privacy-respecting telemetry sounds wonderful in theory but is very thorny in practice. We want our telemetry to have several properties:
-
It must not transmit any private information. This may be more subtle than it sounds. Error messages can, for instance, contain the names of functions. But these names may contain a trade secret.
-
It must be fast on the client-side so that the programmer experience is not disrupted.
-
It must transmit only small amount of data, so as not to overload the database servers.
(The latter two are not specific to privacy, but are necessary when working at scale.)
Our earlier, pioneering work on error message analysis was able to obtain a large amount of insight from logs. As a result of the above constraints, we cannot even pose many of the same questions in our setting.
Nevertheless, we were able to still learn several useful things about Luau. For more details, see the paper. But to us, this project is at least as interesting for the questions it inspires as for the particular solution or the insights gained from it. We hope to see many more languages incorporate privacy-respecting telemetry. (We’re pleased to see the Go team also thinking about these issues, as summarized in Russ Cox’s transparent telemetry notes. While there are some differences between our approaches, our overarching goals and constraints are very much in harmony.)