Articles by tag: Differential Analysis

Differential Analysis: A Summary

Tags: Differential Analysis, Formal Methods, Properties, Verification

Posted on 27 June 2024.

For multiple decades, we have worked on the problem of differential analysis. This post explains where it comes from, what it means, and what its consequences are.

Context: Verification

For decades, numerous researchers have worked on the problem of verification. To a first approximation, we can describe this problem as follows:

P ⊧ ɸ

That is, checking whether some program P satisfies some property ɸ. There have been many productive discussions about exactly what methods we should use, but this remains the fundamental question.

Around 2004, we began building tools to verify a variety of interesting system descriptions (the Ps), starting with access-control policies. They could also be security policies, network configurations, and more. Crucially, we recognized that many of these system descriptions are (sufficiently) sub-Turing-complete, which means we can apply rich methods to precisely answer questions about them. That is, there is a rich set of problems we can verify.

The Problem

Attractive as this idea is, it runs into a significant problem in practice. When you speak to practitioners, you find that they are not short of system descriptions (Ps), but they are severely lacking in properties (ɸs). The problem is not what some might imagine, namely that they can’t express their properties in some precise logic (which is a separate problem!); rather, they struggle to express non-trivial properties at all. A typical conversation might go like this:

  • We have a tool for verification!
  • Nice, what does it do?
  • It consumes system descriptions of the kind you produce!
  • Oh, that’s great!
  • You just need to specify your properties.
  • My what?
  • Your properties! What you want your system to do!
  • I don’t have any properties!
  • But what do you want the system to do?
  • … Work correctly?

This is not to mock practitioners; quite the contrary! Even formal methods experts would struggle to precisely describe the expected behavior of complex systems. In fact, talk to formal methods researchers long enough and they’ll admit they know this is a problem. It’s just not something we like to think about.

An Alternative

In fact, the “practitioner” answer shows us the path forward. What does it mean for a system to “work”? How does a system’s maintainer know that it “works”?

As a practical matter, many things help us confirm that a system is working well enough: we might have test suites; we might monitor its execution; we observe it run and use it ourselves, as do many other people every day; we might even have verified a few properties of a few parts! The net result is that we have confidence in the system.

And then, things happen! Typically, one of two things:

  • We find a bug, and need to fix it.
  • We modify an existing feature or add a new one (or — all too rarely — remove one!).

So the problem we run into is the following:

How do we transfer the confidence we had
in the old version to the new one?

Put differently, the core of formal methods is checking for compatibility between two artifacts. Traditionally, we have a system description (P) and a property (ɸ); these are meant to be expressed independently of one another, so that compatibility gives us confidence and incompatibility indicates an error. But now we have two different artifacts: an old system (call it P) and a new one (call it P’). These are obviously not going to be the same (except in rare cases; see below), but broadly, we want to know: how are they different?

P - P'

Of course, what we care about is not the syntactic difference, but the semantic change. Large syntactic differences may have small semantic ones and vice versa.

Defining the Difference

Computing the semantic difference is often not easy. There is a long line of work on computing the difference of programs. However, it is difficult for Turing-complete languages, and it is not always clear what the type of the difference should be. Computing it is much easier when the language is sub-Turing-complete (as our papers show). Exactly what constitutes a “difference” is also an interesting question.

Many of the system description languages we have worked with tend to be of the form Request ⇒ Response. For instance, an access control policy might have the type:

Request ⇒ {Accept, Deny}

(In practice, policy languages can be much richer, but this suffices for illustration.) So what is the type of the difference? It maps every request to the cross-product of responses:

Request ⇒ {Accept↦Accept, Deny↦Deny, Accept↦Deny, Deny↦Accept}

That is: some requests that used to be accepted still are; some that were denied still are; but some that were accepted are now denied, and some that were denied are now accepted. (This assumes the domains are exactly the same; otherwise some requests that previously produced a decision no longer do, and vice versa. We’ll assume you can work out the details of domains with bottom values.) The difference is of course the requests whose outcomes change: in this case,

Request ⇒ {Accept↦Deny, Deny↦Accept}
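
To make this concrete, here is a small Alloy sketch of this difference type. The sketch is purely illustrative; all of the names (Decision, Request, Old, New, decide, diff) are invented for this example.

abstract sig Decision {}
one sig Accept, Deny extends Decision {}

sig Request {}

// Two versions of the policy, each mapping every request
// to exactly one decision.
one sig Old, New {
  decide: Request -> one Decision
}

// The semantic difference: each request whose decision changed,
// paired with its old and new decisions.
fun diff: Request -> Decision -> Decision {
  { r: Request, d1, d2: Decision |
      Old.decide[r] = d1 and New.decide[r] = d2 and d1 != d2 }
}

The relation diff has exactly the type above: it contains only the requests whose outcomes change, each tagged with its old and new decision.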

Using the Difference

Depending on how we compute the difference, we can treat it essentially as a database: a set of pairs of request and change-of-response. The database perspective is very productive, because we can do many things with this database:

  • Queries: What is the set of requests whose decisions go from Deny↦Accept? These are places where we might look for data leaks.
  • Views: What are all the requests whose decisions go from Accept↦Deny? These are all the requests that lost access. We might then want to perform queries over this smaller database: e.g., who are the entities whose requests fall in it?

And perhaps most surprisingly:

  • Verification: Confirm that as a result of this change, certain entities did not gain access.

In our experience, administrators who would not be able to describe properties of their system can define properties of their changes. Indeed, classical verification even has a name for some such properties: they’re called “frame conditions”, and mean, in effect, “and nothing else changed”. Here, we can actually check for these. It’s worth noting that these properties are not true of either the old or new systems! For instance, certain individuals may have had privileges before and will have them after the alteration; all we’re checking is that their privileges did not change.
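
Continuing the illustrative Alloy sketch from above, the data-leak query and a frame-condition check might look like this (the Principal and owns names are, again, invented for the example):

// Query: find a request whose decision went from Deny to Accept.
pred someLeak {
  some r: Request | r->Deny->Accept in diff
}
run someLeak for 4

// Frame condition: no request owned by Guest gained access,
// whatever its decisions were before and after the change.
sig Principal { owns: set Request }
one sig Guest extends Principal {}

assert guestGainedNothing {
  no r: Guest.owns | r->Deny->Accept in diff
}
check guestGainedNothing for 4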

Uses

Having a general “semantic difference engine”, and being able to query it (interactively), is very powerful. We can perform all the operations we have described above. We can use it to check the consequences of an intended edit. In some rare cases, we expect the difference to be empty: e.g., when we refactor the policy to clean it up syntactically, but expect that the refactoring had no semantic impact. Finally, a semantic differencing engine is also useful as an oracle when performing mutation testing, as Martin and Xie demonstrated.
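
In the Alloy sketch above (where, in a real tool, decide would be derived from the two actual policies rather than left unconstrained), the refactoring expectation is a one-line check:

// A pure refactoring should be a semantic no-op: the difference is empty.
assert refactorIsNoop { no diff }
check refactorIsNoop for 6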

A Cognitive Perspective

We think there are a few different, useful framings of differential analysis.

One might sound like a truism: we’re not very good at thinking about the things that we didn’t think about. That is, when we make a change to a system, there is some intent behind the change; but it can be very difficult to determine all the consequences of that change. Our focus on the intended change can easily blind us to its unintended consequences. Overcoming these blind spots is very difficult for humans. A semantic differential analysis engine lays them bare.

Another is that we lack good tools to figure out what the properties of a system should be. Model-exploration tools (such as Alloy, or our derivative of it, Forge) are useful for prompting people to think about how they expect systems to behave and not behave. Differential output can also be such a spur: in articulating why something should or should not happen with a change, we learn more about the system itself.

Finally, it’s worth distinguishing the different conditions that lead to system changes. When working on features, we can often do so with some degree of flexibility. But when fixing bugs, we’re often in a hurry: we need to make a change to immediately block, say, a data leak. If we’re being principled, we might add some tests to check for the intended behavior (and perhaps also to avoid regression); but at that moment, we are likely to be in especially poor shape to think through unintended consequences. Differential analysis helps ensure that fixing one problem does not quietly introduce another.

Readings

Some of our papers describe differential analysis (which we have also called “change-impact analysis” in the past).

Tierless Programming for SDNs: Differential Analysis

Tags: Differential Analysis, Flowlog, Programming Languages, Software-Defined Networking, Verification

Posted on 02 June 2015.

This post is part of our series about tierless network programming with Flowlog:
Part 1: Tierless Programming
Part 2: Interfacing with External Events
Part 3: Optimality
Part 4: Verification
Part 5: Differential Analysis


Verification is a powerful way to make sure a program meets expectations, but what if those expectations aren't written down, or the user lacks the expertise to write formal properties? Flowlog supports a powerful form of property-free analysis: program differencing.

When we make a program change, usually we're starting from a version that "works". We'd like to transfer what confidence we had in the original version to the new version, plus confirm our intuition about the changes. In other words, even if the original program had bugs, we'd like to at least confirm that the edit doesn't introduce any new ones.

Of course, taking the syntactic difference of two programs is easy — just use diff! — but usually that's not good enough. What we want is the behavioral, or semantic difference. Flowlog provides semantic differencing via Alloy, similarly to how it does property checking. We call Flowlog's differencing engine Chimp (short for Change-impact).

Differences in Output and State Transitions

Chimp translates both the old (prog1) and new (prog2) versions to Alloy, then supports asking questions like: Will the two versions ever handle packets differently? More generally, we can ask Chimp whether the program's output behavior ever differs: does there exist some program state and input event such that, in that state, the two programs will disagree on output?

pred changePolicyOut[st: State, ev: Event] {
  some out: Event |
    prog1/outpolicy[st,ev,out] && not prog2/outpolicy[st,ev,out] ||
    prog2/outpolicy[st,ev,out] && not prog1/outpolicy[st,ev,out]
}
Any time one program issues an output event that the other doesn't, Chimp displays an Alloy scenario.

We might also ask: When can the programs change state differently? Similarly to changePolicyOut above, Chimp defines changeStateTransition[st: State, ev: Event] as matching any of the following for each table T in the program:

some x0, ..., xn: univ | 
  prog1/afterEvent_T[prestate, ev, x0, ..., xn] && 
    not prog2/afterEvent_T[prestate, ev, x0, ..., xn] ||
  prog2/afterEvent_T[prestate, ev, x0, ..., xn] && 
    not prog1/afterEvent_T[prestate, ev, x0, ..., xn]

Recall that afterEvent_T keeps track of when each tuple is in the table T after an event is processed.
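
For instance, for a hypothetical one-column table seen(macaddr), that template instantiates to the following (using the afterEvent naming convention described above):

some x: Macaddr | 
  prog1/afterEvent_seen[prestate, ev, x] && 
    not prog2/afterEvent_seen[prestate, ev, x] ||
  prog2/afterEvent_seen[prestate, ev, x] && 
    not prog1/afterEvent_seen[prestate, ev, x]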

Refining Differential Analysis

The two predicates above are both built into Chimp. Using them as a starting point, users can ask pointed questions about the effects of the change. For instance, will any TCP packets be handled differently? Just search for a pre-state and a TCP packet that the programs disagree on:

some prestate: State, p: EVtcp_packet |
  changePolicyOut[prestate, p]
This lets users explore the consequences of their change without any formal guidance except their intuition about what the change should do.

Reachability

So far, these queries show scenarios where the programs differ, taking into consideration all potential inputs and starting states; this includes potentially unreachable starting states. We could, for instance, have two programs that behave differently if a table is populated (resulting in a non-empty semantic diff!) yet never actually insert rows into that table. Chimp provides optional reachability-checking to counter this, although users must cap the length of system traces being searched.
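
To give the flavor, here is a minimal sketch of what such a bounded-trace constraint could look like in Alloy; the transition predicate and the initial-emptiness condition are hypothetical stand-ins, not Chimp's actual encoding:

open util/ordering[State]

// Hypothetical sketch: arrange the states into a trace that starts
// empty and evolves via a per-program transition predicate, so any
// prestate considered is reachable within the trace bound.
fact prestatesAreReachable {
  no first.seen  // hypothetical: the trace begins with an empty table
  all s: State - last |
    some ev: Event | prog1/transition[s, ev, s.next]
}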

Schema Clashes

Suppose that we want to modify the original source-tracking example to keep track of flows by source and destination, rather than just source addresses. Now instead of one column:

TABLE seen(macaddr);

the seen table has two columns:

TABLE seen(macaddr, macaddr);

This poses a challenge for Chimp; what shape should the seen table be? If Chimp finds a scenario, should it show a seen table with one or two columns? We call this situation a schema clash, and Chimp addresses it by creating two separate tables in the prestate: one with one column (used by the first program) and another with two columns (used by the second program).

Doing this causes a new issue: Chimp searches for arbitrary states that satisfy the change-impact predicates. Since there is no constraint between the values of the two tables, Chimp might return a scenario where (say) the first seen table is empty, but the second contains tuples!

This doesn't match our intuition for the change: we expect that for every source in the first table, there is a source-destination pair in the second table, and vice versa. We can add this constraint to Chimp and filter the scenarios it shows us, but first, we should ask whether that constraint actually reflects the behavior of the two programs.

Differential Properties

Since it's based on Flowlog's verification framework, Chimp allows us to check properties stated over multiple programs. Our expectation above, stated in Alloy for an arbitrary state s, is:

all src: Macaddr | 
  src in s.seen_1 
  iff 
  some dst: Macaddr | src->dst in s.seen_2

Let's check that this condition holds for all reachable states. We'll proceed inductively. The condition holds trivially at the starting (empty) state; so we only need to show that it is preserved as the program transitions. We search for a counterexample:

some prestate: State, ev: Event | {
  // prestate satisfies the condition
  all src: Macaddr | src in prestate.seen_1 iff 
    some dst: Macaddr | src->dst in prestate.seen_2
	
  // poststate does not
  some src: Macaddr | 
    (prog1/afterEvent_seen_1[prestate,ev,src] and 
     all dst: Macaddr | not prog2/afterEvent_seen_2[prestate,ev,src,dst])
    ||
    (not prog1/afterEvent_seen_1[prestate,ev,src] and 
     some dst: Macaddr | prog2/afterEvent_seen_2[prestate,ev,src,dst])
}

Chimp finds no counterexample. Unfortunately, Chimp can't guarantee that this isn't a false negative; the query falls outside the class where Chimp can guarantee a complete search. Nevertheless, the lack of counterexample serves to increase our confidence that the change respects our intent.

After adding the constraint that, for every source in the first table, there is a source-destination pair in the second table, Chimp shows us that the new program will change the state (to add a new destination) even if the source is already in seen.

Further Reading

Chimp supports more features than discussed here; for instance, it can compare the behavior of any number of program versions, though comparing two versions is the most common case. You can read more about Chimp in our paper.