Developing Behavioral Concepts of HigherOrder Functions
Posted on 31 July 2021.
HigherOrder Functions (HOFs) are an integral part of the programming process. They are so ubiquitous, even Java had to bow and accept them. They’re especially central to cleanly expressing the stages of processing data, as the R community and others have discovered.
How do we teach higherorder functions? In some places, like
How to Design Programs,
HOFs are presented as abstractions over common patterns. That is,
you see a certain pattern of program behavior over and over, and
eventually learn to parameterize it and call it map
, filter
, and
so on. That is a powerful method.
In this work, we take a different perspective. Our students often write examples first, so they can think in terms of the behavior they want to achieve. Thus, they need to develop an understanding of HOFs as abstractions over common behaviors: “mapping”, “filtering”, etc.
Our goal in this work is to study how well students form behavioral abstractions having been taught code pattern abstractions. Our main instrument is sets of inputoutput behavior, such as
(list "red" "green" "blue")
>
(list 3 5 4)
(Hopefully that calls to mind “mapping”.) We very carefully designed a set of these to capture various specific similarities, overlaps, and impossibilities.
We then evaluated students using two main devices, inspired by activities in machine learning:

Clustering: Group behaviors into clusters of similar ones.

Classification: For each behavior, assign a label.
We also tried labeling over visual presentations.
Our paper describes these instruments in detail, and our outcomes. What is most interesting, perhaps, is not our specific outcomes, but this style of thinking about teaching HOFs. We think the materials—especially the inputoutput pairs, and a table of properties—will be useful to educators who are tackling this topic. More broadly, we think there’s a huge unexplored space of HOF pedagogy combined with meaningful evaluation. We hope this work inspires more work along these lines.
Adversarial Thinking Early in PostSecondary Education
Posted on 18 July 2021.
Adversarial Thinking (AT) is often described as “thinking like a hacker” or as a “security mindset”. These quasidefinitions are not only problematic in their own right (in some cases, they can be outright circular), they are also too narrow. We believe that AT applies in many other settings as well: in finding ways where machine learning can go wrong, for identifying problems with user interfaces, and for that matter even in software testing and verification.
All these are, however, quite sophisticated computer science concepts. Does that mean AT can only be covered in advanced computer science courses—security, machine learning, formal methods, and the like? Put differently, how much technical sophistication do students need before they can start to engage in it?
We believe AT can be covered starting from a fairly early stage. In this work, we’ve studied its use with (accelerated) introductory postsecondary (university) students. We find that they do very well, but also exhibit some weaknesses. We also find that they are able to reckon with the consequences of systems well beyond their technical capability. Finally, we find that they focus heavily on social issues, not just on technical ones.
In addition to these findings, we have also assembled a rich set of materials covering several aspects of computer science. Students generally found these engaging and thoughtprovoking, and responded to them with enthusiasm. We think educators would benefit greatly from this collection of materials.
Want to Learn More?
If you’re interested in this, and in the outcomes, please see our paper.
Teaching and Assessing PropertyBased Testing
Posted on 10 January 2021.
PropertyBased Testing (PBT) sees increasing use in industry, but lags significantly behind in education. Many academics have never even heard of it. This isn’t surprising; computing education still hasn’t come to terms with even basic software testing, even when it can address pedagogic problems. So this lag is predictable.
The Problem of Examples
But even people who want to use it often struggle to find good examples of it. Reversing a list drives people to drink, and math examples are hard to relate to. This is a problem from several respects. Without compelling examples, nobody will want to teach it. Even if they do, unless the examples are compelling, students will not pay attention to it. And if the students don’t, they won’t recognize opportunities to use it later in their careers.
This loses much more than a testing technique. We consider PBT a gateway to formal specification. Like a formal spec, it’s an abstract statement about behaviors. Unlike a formal spec, it doesn’t require learning a new language or mathematical formalism, it’s executable, and it produces concrete counterexamples. We therefore use it, in Brown’s Logic for Systems course, as the starting point to more formal specifications. (If they learn nothing about formal specification but simply become better testers, we’d still consider that a win.)
Therefore, for the past 10 years, and with growing emphasis, we’ve been teaching PBT: starting in our accelerated introductory course, then in Logic for Systems, and gradually in other courses as well. But how do we motivate the concept?
Relational Problems
We motivate PBT through what we call relational problems. What are those?
Think about your typical unit test. You write an inputoutput
pair: f(x) is y
. Let’s say it fails:

Usually, the function
f
is wrong. Congrats, you’ve just caught a bug! 
Sometimes, the test is wrong:
f(x)
is not, in fact,y
. This can take some reflection, and possibly reveals a misunderstanding of the problem.
That’s usually where the unittesting story ends. However, there is one more possibility:
 Neither is “wrong”.
f(x)
has multiple legal results,w
,y
, andz
; your test chosey
, but this particular implementation happened to returnz
orw
instead.
We call these “relational” because f
is clearly more a relation than
a function.
Some Examples
So far, so abstract. But many problems in computing actually have a relational flavor:

Consider computing shortest paths in a graph or network; there can be many shortest paths, not just one. If we write a test to check for one particular path, we could easily run into the problem above.

Many other graph algorithms are also relational. There are many legal answers, and the implementation happens to pick just one of them.

Nondeterministic data structures inspire relational behavior.

Various kinds of matching problems—e.g., the stablemarriage problem—are relational.

Combinatorial optimization problems are relational.

Even sorting, when done over nonatomic data, is relational.
In short, computing is full of relational problems. While they are not at all the only context in which PBT makes sense, they certainly provide a rich collection of problems that students already study that can be used to expose this idea in a nontrivial setting.
Assessing Student Performance
Okay, so we’ve been having students write PBT for several years now. But how well do they do? How do we go about measuring such a question? (Course grades are far too coarse, and even assignment grades may include various criteria—like code style—that are not strictly germane to this question.) Naturally, their core product is a binary classifier—it labels a purported implementation as valid or invalid—so we could compute precision and recall. However, these measures still fail to offer any semantic insight into how students did and what they missed.
We therefore created a new framework for assessing this. To wit, we took each problem’s abstract property statement (viewed as a formal specification), and subdivided it into a set of subproperties whose conjunction is the original property. Each subproperty was then turned into a test suite, which accepted those validators that enforced the property and rejected those that did not. This let us get a more finegrained understanding of how students did, and what kinds of mistakes they made.
Want to Learn More?
If you’re interested in this, and in the outcomes, please see our paper.
Students Testing Without Coercion
Posted on 20 October 2020.
It is practically a trope of computing education that students are overeager to implement, yet woefully undereager to confirm they understand the problem they are tasked with, or that their implementation matches their expectations. We’ve heard this stereotype couched in various degrees of cynicism, ranging from “students can’t test” to “students won’t test”. We aren’t convinced, and have, for several years now, experimented with nudging students towards early example writing and thorough testing.
We’ve blogged previously about our prototype IDE, Examplar — our experiment in encouraging students to write illustrative examples for their homework assignments before they actually dig into their implementation. Examplar started life as a separate, complementary tool from Pyret’s usual programming environment, providing a buffer just for the purposes of developing a test suite. Clicking Run in Examplar runs the student’s test suite against our implementations (not theirs) and then reports the degree to which the suite was valid and thorough (i.e., good at catching our buggy implementations). With this tool, students could catch their misconceptions before implementing them.
Although usage of this tool was voluntary for all but the first assignment, students relied on it extensively throughout the semester and the quality of final submissions improved drastically compared to prior offerings of the course. Our positive with this prototype encouraged us to fullyintegrate Examplar’s feedback into students’ development environment. Examplar’s successor provides a unified environment for the development of both tests and implementation:
This new environment — which provides Examplarlike feedback on every Run — no longer requires that students have the selfawareness to periodically switch to a separate tool. The environment also requires students to first click a “Begin Implementation” button before showing the tab in which they write their implementation.
This unified environment enabled us to study, for the first time, whether students wrote examples early, relative to their implementation progress. We tracked the maximum test thoroughness students had achieved prior to each editandrun of their implementations file. Since the IDE notified students of their test thoroughness upon each run, and since students could only increase their thoroughness via edits to their tests file, the mean of these scores summarizes how thoroughly a student explored the problem with tests before fully implementing it.
We find that nearly every student on nearly every assignment achieves some level of thoroughness before their implementation work:
To read more about our design of this environment, its pedagogic context, and our evaluation of students’ development process, check out the full paper here.
Using Design Alternatives to Learn About Data Organizations
Posted on 27 June 2020.
A large number of computer science education papers focus on data structures. By this, they mean the canon: lists, queues, stacks, heaps, and so on. These are certainly vital to the design of most programs.
However, there is another kind of “data structure” programmers routinely contend with: how to represent the world your program is about. Suppose, for instance, you’re trying to represent a family’s genealogy. You could:

Represent each person as a node and have references to their two biological parents, who in turn have references to their biological parents, and so on. The tree “bottoms out” when we get to people about whom we have no more information.

Represent each person as a node and have references to their children instead (a list, say, if we want to preserve their birth order). This tree bottoms out at people who have no children.

Represent each coupling as a node, and have references to their children (or issue, as genealogists like to say). Now you may have a kind of node for children and another for coupling.
And so on. There are numerous possibilities. Which one should we pick? It depends on (1) what information we even have, (2) what operations we want to perform, and (3) what complexity we need different operations to take.
Unfortunately, computing education research doesn’t talk about this problem very much at all; in fact, we don’t seem to even have terminology to talk about this issue. In a sense, this is also very much a matter of data structure, though of a different kind: whereas the purely abstract data structures of computer science we might call computational data structures, these — which center around directly representing realworld information — we might instead call representational data structures. That could get pretty confusing, though, so we’ve adopted the term data organization to refer to the latter.
Learning to think about data organization is an essential computing skill. But how early can we teach it? How well can students wrestle with it? What methods should we use? Do they need to be sophisticated programmers before they can engage in reasoning about representations?
Good news: we can begin this quite early, and students don’t need to be sophisticated computer scientists: they can just think about the world, and their experiences living in it, to reason about data organizations. Representational data structures probably do a far better job of drawing on their lived experience than computational ones do! (Unless they’ve previously lived as a computer.)
There are several ways we could introduce this topic. We chose to expose them to pairs of representations for the same domain, and have them compare the two. This is related to theories of perception. Read the paper to learn more!
Somewhat subtly, this also adds a dimension to “computational thinking” that is usually quite missing from standard discussions about it. Activities like those described in this paper generate new and engaging activities that many students can participate in. Indeed, computing background does not seem to matter much in our data, and a more diverse group of students is likely to make a much richer set of judgments—thereby enabling students in traditionally underrepresented groups to contribute based on their unique experiences, and also feel more valued.