Tierless Programming for SDNs: Verification

Tags: Flowlog, Programming Languages, Software-Defined Networking, Verification

Posted on 17 April 2015.

This post is part of our series about tierless network programming with Flowlog:
Part 1: Tierless Programming
Part 2: Interfacing with External Events
Part 3: Optimality
Part 4: Verification
Part 5: Differential Analysis

The last post said what it means for Flowlog's compiler to be optimal, which prevents certain bugs from ever occurring. But what about the program itself? Flowlog has built-in features to help verify program correctness, independent of how the network is set up.

To see Flowlog's program analysis in action, let's first expand our watchlist program a bit more. Before, we just flooded packets for demo purposes:

DO forward(new) WHERE
    new.locPt != p.locPt;
Now we'll do something a bit smarter. We'll make the program learn which ports lead to which hosts, and use that knowledge to avoid flooding when possible (this is often called a "learning switch"):
TABLE learned(switchid, portid, macaddr);
ON packet(pkt):
  INSERT (pkt.locSw, pkt.locPt, pkt.dlSrc) INTO learned;

  DO forward(new) WHERE
    learned(pkt.locSw, new.locPt, pkt.dlDst);
    (NOT learned(pkt.locSw, ANY, pkt.dlDst) AND
     pkt.locPt != new.locPt);
The learned table stores knowledge about where addresses have been seen on the network. If a packet arrives with a destination the switch has seen before as a source, there's no need to flood! While this program is still fairly naive (it will fail if the network has cycles in it) it's complex enough to have a few interesting properties we'd like to check. For instance, if the learned table ever holds multiple ports for the same switch and address, the program will end up sending multiple copies of the same packet. But can the program ever end up in such a state? Since the initial, startup state is empty, this amounts to checking:
"Can the program ever transition from a valid state (i.e., one where every switch and address has at most one port in learned) into an invalid one?"

Verifying Flowlog

Each Flowlog rule defines part of an event-handling function saying how the system should react to each packet seen. Rules compile to logical implications that Flowlog's runtime interprets whenever a packet arrives.

Alloy is a tool for analyzing relational logic specifications. Since Flowlog rules compile to logic, it's easy to describe in Alloy how Flowlog programs behave. In fact, Flowlog can automatically generate Alloy specifications that describe when and how the program takes actions or changes its state.

For example, omitting some Alloy-language foibles for clarity, here's how Flowlog describes our program's forwarding behavior in Alloy.

pred forward[st: State, p: EVpacket, new: EVpacket] {
  // Case 1: known destination
  (p.locSw->new.locPt->p.dlDst) in learned and
   (p.locSw->new.locPt) in switchHasPort and ...)
  // Case 2: unknown destination
  (all apt: Portid | (p.locSw->apt->p.dlDst) not in learned and
   new.locPt != p.locPt and 
   (p.locSw->new.locPt) in switchHasPort and ...)
An Alloy predicate is either true or false for a given input. This one says whether, in a given state st, an arbitrary packet p will be forwarded as a new packet new (containing the output port and any header modifications). It combines both forwarding rules together to construct a logical definition of forwarding behavior, rather than just a one-way implication (as in the case of individual rules).

The automatically generated specification also contains other predicates that say how and when the controller's state will change. For instance, afterEvent_learned, which says when a given entry will be present in learned after the controller processes a packet. An afterEvent predicate is automatically defined for every state table in the program.

Using afterEvent_Learned, we can verify our goal: that whenever an event ev arrives, the program will never add a second entry (sw, pt2,addr) to learned:

assert FunctionalLearned {
  all pre: State, ev: Event |
    all sw: Switchid, addr: Macaddr, pt1, pt2: Portid |
      (not (sw->pt1->addr in pre.learned) or 
       not (sw->pt2->addr in pre.learned)) and
      afterEvent_learned[pre, ev, sw, pt1, addr] and
      afterEvent_learned[pre, ev, sw, pt2, addr] implies pt1 = pt2

Alloy finds a counterexample scenario (in under a second):

The scenario shows an arbitrary packet (L/EVpacket; the L/ prefix can be ignored) arriving at port 1 (L/Portid1) on an arbitrary switch (L/Switchid). The packet has the same source and destination MAC address (L/Macaddr). Before the packet arrived, the controller state had a single row in its learned table; it had previously learned that L/Macaddr can be reached out port 0 (L/Portid1). Since the packet is from the same address, but a different port, it will cause the controller to add a second row to its learned table, violating our property.

This situation isn't unusual if hosts are mobile, like laptops on a campus network are. To fix this issue, we add a rule that removes obsolete mappings from the table:

DELETE (pkt.locSw, pt, pkt.dlSrc) FROM learned WHERE
  not pt = pkt.locPt;
Alloy confirms that the property holds of the modified program. We now know that any reachable state of our program is valid.

Verification Completeness

Alloy does bounded verification: along with properties to check, we provide concrete bounds for each datatype. We might say to check up to to 3 switches, 4 IP addresses, and so on. So although Alloy never returns a false positive, it can in general produce false negatives, because it searches for counterexamples only up to the given bounds. Fortunately, for many useful properties, we can compute and assert a sufficient bound. In the property we checked above, a counterexample needs only 1 State (to represent the program's pre-state) and 1 Event (the arriving packet), plus room for its contents (2 Macaddrs for source and destination, etc.), along with 1 Switchid, 2 Portids and 1 Macaddr to cover the possibly-conflicted entries in the state. So when Alloy says that the new program satisfies our property, we can trust it.

Benefits of Tierlessness

Suppose we enhanced the POX version of this program (Part 1) to learn ports in the same way, and then wanted to check the same property. Since the POX program explicitly manages flow-table rules, and the property involves a mixture of packet-handling (what is sent up to the controller?) and controller logic (how is the state updated?), checking the POX program would mean accounting for those rules and how the controller updates them over time. This isn't necessary for the Flowlog version, because rule updates are all handled optimally by the runtime. This means that property checking is simpler: there's no multi-tiered model of rule updates, just a model of the program's behavior.

You can read more about Flowlog's analysis support in our paper.

In the next post, we'll finish up this sequence on Flowlog by reasoning about behavioral differences between multiple versions of the same program.