Typing First Class Field Names

Tags: Programming Languages, Semantics, Types

Posted on 03 December 2012.

In a previous post, we discussed some of the powerful features of objects in scripting languages. One feature that stood out was the use of first-class strings as member names for objects. That is, in programs like

var o = {name: "Bob", age: 22};
function lookup(f) {
  return o[f];
}
lookup("name");
lookup("age");

the name position in field lookup has been abstracted over. Presumably only a finite set of names actually works with the lookup (o appears to only have two fields, after all).

It turns out that so-called “scripting” languages aren't the only ones that compute fields for lookup. For example, even within the constraints of Java's type system, the Bean framework computes method names to call at runtime. Developers can provide information about the names of fields and methods on a Bean with a BeanInfo instance, but even if they don't provide complete information, “the rest will be obtained by automatic analysis using low-level reflection of the bean classes’ methods and applying standard design patterns.” These “standard design patterns” include, for example, concatenating the strings "get" and "set" onto field names to produce method names to invoke at runtime.

Traditional type systems for objects and records have little to say about these computed accesses. In this post, we're going to build up a description of object types that can describe these values, and explore their use. The ideas in this post are developed more fully in a writeup for the FOOL Workshop.

First-class Singleton Strings

In the JavaScript example above, we said that it's likely that the only intended lookup targets―and thus the only intended arguments to lookup―are "name" and "age". Giving a meaningful type to this function is easy if we allow singleton strings as types in their own right. That is, if our type language is:

T = s | Str | Num
  | { s : T ... } | T → T | T ∩ T

Where s stands for any singleton string, Str and Num are base types for strings and numbers, respectively, record types are a map from singleton strings s to types, arrow types are traditional pairs of types, and intersections are allowed to express a conjunction of types on the same value.

Given these definitions, we could write the type of lookup as:

lookup : ("name" → Str) ∩ ("age" → Num)

That is, if lookup is provided the string "name", it produces a string, and if it is provided the string "age", it produces a number.

In order to type-check the body of lookup, we need a type for o as well. That can be represented with the type { "name" : Str, "age" : Num }. Finally, to type-check the object lookup expression o[f], we need to compare the singleton string type of f with the fields of o. In this case, only the two strings that are already present on o are possible choices for f, so the comparison is easy and type-checking works out.

For a first cut, all we did was make the string labels on objects' fields a first-class entity in our type system, with singleton string types s. But what can we say about the Bean example, where get* and set* method invocations are computed rather than just used as first-class values?

String Pattern Types

In order to express the type of objects like Beans, we need to express field name patterns, rather than just singleton field names. For example, we might say that a Bean with Int-typed parameters has a type like:

IntBean = { ("get".+)  : → Int },
            ("set".+)  : Int → Void,
            "toString" : → Str }

Here, we are using .+ as regular expression notation for any non-empty string. We read the type above as saying that all fields that begin with get and end with any string are functions that return Int values. The same is true for "set" methods. The singleton string "toString" is also a field, and is simply a function that returns strings.

To express this type, we need to extend our type language to handle these string patterns, which we write down as regular expressions (the write-up outlines the actual limits on what kinds of patterns we can support). We extend our type language to include patterns as types, and as field names:

L = regular expressions
T = L | Str | Num
  | { L : T ... } | T → T | T ∩ T

This new specification gives us the ability to write down types like IntBean, which have field patterns that describe infinite sets of fields. Let's stop and think about what that means as a description of a runtime object. Our type for o above, { "name" : Str, "age" : Num }, says that values bound to o at runtime certainly have name and age fields at the listed types. The type for IntBean, on the other hand, seems to assert that these objects will have the fields getUp, getDown, getSerious, and infinitely more. But a runtime object can't actually have all of those fields, so a pattern indicating an infinite number of field names is describing a fundamentally different kind of value.

What an object type with an infinite pattern represents is that all the fields that match the pattern are potentially present. That is, at runtime, they may or may not be there, but if they are there, they must have the annotated type. We extend object types again to make this explicit with presence annotations, which explicitly list fields as definitely present, written ↓, or possibly present, written ○:

p = ○ | ↓
T = ... | { Lp : T ... }

In this notation, we would write:

IntBean = { ("get".+)  : → Int },
            ("set".+)  : Int → Void,
            "toString" : → Str }

Which indicates that all the fields in ("get".+) and ("set".+) are possibly present with the given arrow types, and toString is definitely present.

Subtyping

Now that we have these rich object types, it's natural to ask what kinds of subtyping relationships they have with one another. A detailed account of subtyping will come soon; in the meantime, can you guess what subtyping might look like for these types?

Update: The answer is in the next post.