(Sub)Typing First Class Field Names

Tags: Programming Languages, Semantics, Types

Posted on 10 December 2012.

This post picks up where a previous post left off, and jumps back into the action with subtyping.

Updating Subtyping

There are two traditional rules for subtyping records, historically known as width and depth subtyping. Width subtyping allows generalization by dropping fields; one record is a supertype of another if it contains a subset of the sub-record's fields at the same types. Depth subtyping allows generalization within a field; one record is a supertype of another if it is identical aside from generalizing one field to a supertype of the type in the same field in the sub-record.

We would like to understand both of these kinds of subtyping in the context of our first-class field names. With traditional record types, fields are either mentioned in the record or not. Thus, for each possible field in both types, there are four combinations to consider. We can describe width and depth subtyping in a table:

T1 T2 T1 <: T2 if...
f: S f: T S <: T
f: - f: T Never
f: S f: - Always
f: - f: - Always

We read f: S as saying that T1 has the field f with type S, and we read f: - as saying that the corresponding type doesn't mention the field f. The first row of the table corresponds to depth subtyping, where the field f is still present, but at a more general type in T2. The second row is a failure to subtype, when T2 has a field that isn't mentioned at all in T1. The third row corresponds to width subtyping, where a field is dropped and not mentioned in the supertype. The last row is a trivial case, where neither type mentions the field.

For records with string patterns, we can extend this table with new combinations to account for ○ and ↓ annotations. The old rows remain, and become the ↓ cases, and new rows are added for ○ annotations:

T1 T2 T1 <: T2 if...
f: S f: T S <: T
f: S f: T Never
f: - f: T Never
f: S f: T S <: T
f: S f: T S <: T
f: - f: T Never
f: S f: - Always
f: S f: - Always
f: - f: - Always

Here, we see that it is safe to treat a definitely present field as a possibly-present one, in the case where we compare f:S to f:T). The dual of this case, treating a possibly-present field as definitely-present, is unsafe, as the comparison of f:S to f:T shows. Possibly present annotations do not allow us to invent fields, as having f: - on the left-hand-side is still only permissible if the right-hand-side also doesn't mention f.

Giving Types to Values

In order to ascribe these rich types to object values, we need rules for typing basic objects, and then we need to apply these subtyping rules to generalize them. As a working example, one place where objects with field patterns come up every day in practice is in JavaScript arrays. Arrays in JavaScript hold their elements in fields named by stringified numbers. Thus, a simplified type for a JavaScript array of booleans is roughly:

BoolArrayFull: { [0-9]+: Bool }

That is, each field made up of a sequence of digits is possibly present, and if it is there, it has a boolean value. For simplicity, let's consider a slightly neutered version of this type, where only single digit fields are allowed:

BoolArray: { [0-9]: Bool }

Let's think about how we'd go about typing a value that should clearly have this type: the array [true, false]. We can think of this array literal as desugaring into an object like (indeed, this is what λJS does):

{"0": true, "1": false}

We would like to be able to state that this object is a member of the BoolArray type above. The traditional rule for record typing would ascribe a type mapping the names that are present to the types of their right hand side. Since the fields are certainly present, in our notation we can write:

{"0": Bool, "1": Bool}

This type should certainly be generalizable to BoolArray. That is, it should hold (using the rules in the table above) that:

{"0": Bool, "1": Bool} <: { [0-9]: Bool }

Let's see what happens when we instantiate the table for these two types:

T1 T2 T1 <: T2 if...
0: Bool 0: Bool Bool <: Bool
1: Bool 1: Bool Bool <: Bool
3: - 3: Bool Fail!
4: - 4: Bool Fail!
... ... ...

(We cut off the table for 5-9, which are the same as the cases for 3 and 4). Our subtyping fails to hold for these types, which don't let us reflect the fact that the fields 3 and 4 are actually absent, and we should be allowed to consider them as possibly present at the boolean type. In fact, our straightforward rule for typing records is in fact responsible for throwing away this information! The type that it ascribed,

{"0": Bool, "1": Bool}

is actually the type of many objects, including those that happen to have fields like "banana" : 42. Traditional record typing drops fields when it doesn't care if they are present or absent, which loses information about definitive absence.

We extend our type language once more to keep track of this information. We add an explicit piece of a record type that tracks a description of the fields that are definitely absent on an object, and use this for typing object literals:

p = ○ | ↓
T = ... | { Lp : T ..., LA: abs }

Thus, the new type for ["0": true, "1": false] would be:

{"0": Bool, "1": Bool, ("0"|"1"): abs}

Here, the overbar denotes regular-expression complement, and this type is expressing that all fields other than "0" and "1" are definitely absent.

Adding another type of field annotation requires that we again extend our table of subtyping options, so we now have a complete description with 16 cases:

T1 T2 T1 <: T2 if...
f: S f: T S <: T
f: S f: T Never
f: abs f: T Never
f: - f: T Never
f: S f: T S <: T
f: S f: T S <: T
f: abs f: T Always
f: - f: T Never
f: S f: abs Never
f: S f: abs Never
f: abs f: abs Always
f: - f: abs Never
f: S f: - Always
f: S f: - Always
f: abs f: - Always
f: - f: - Always

We see that absent fields cannot be generalized to be definitely present (the abs to f case), but they can be generalized to be possibly present at any type. This is expressed in the case that compares f : abs to f: T, which always holds for any T. To see these rules in action, we can instantiate them for the array example we've been working with to ask a new question:

{"0": Bool, "1": Bool, ("0"|"1"): abs} <: { [0-9]: Bool }

And the table:

T1 T2 T1 <: T2 if...
0: Bool 0: Bool Bool <: Bool
1: Bool 1: Bool Bool <: Bool
3: abs 3: Bool OK!
4: abs 4: Bool OK!
... ... ...
9: abs 9: Bool OK!
foo: abs foo: - OK!
bar: abs bar: - OK!
... ... ...

There's two things that make this possible. First, it is sound to generalize the absent fields that are possibly present on the array type, because the larger type doesn't guarantee their presence either. Second, it is sound to generalize absent fields that aren't mentioned on the array type, because unmentioned fields can be present or absent with any type. The combination of these two features of our subtyping relation lets us generalize from particular array instances to the more general type for arrays.

Capturing the Whole Table

The tables above present subtyping on a field-by-field basis, and the patterns we considered at first were finite. In the last case, however, the pattern of “fields other than 0 and 1” was in fact infinite, and we cannot actually construct that infinite table to describe subtyping. The writeup and its associated proof document lay out an algorithmic version of the rules presented in the tables above, and also provides a proof of their soundness.

The writeup also discusses another interesting problem, which is the interaction between these pattern types and inheritance, where patterns on the child and parent objects may overlap in subtle ways. It goes further and discusses what happens in cases like JavaScript, where the field "__proto__" is an accessible member that has inheritance semantics. Check it all out here!