Typing First Class Field Names
Tags: Programming Languages, Semantics, Types
Posted on 03 December 2012.In a previous post, we discussed some of the powerful features of objects in scripting languages. One feature that stood out was the use of first-class strings as member names for objects. That is, in programs like
the name position in field lookup has been abstracted over.
Presumably only a finite set of names actually works with the lookup
(o
appears to only have two fields, after all).
It turns out that so-called “scripting” languages aren't the only
ones that compute fields for lookup. For example, even within the
constraints of Java's type system, the Bean framework computes method
names to call at runtime. Developers can provide information about the
names of fields and methods on a Bean with a BeanInfo
instance, but even if they don't provide complete information, “the rest
will be obtained by automatic analysis using low-level reflection of the
bean classes’ methods and applying standard design patterns.” These
“standard design patterns” include, for example, concatenating the
strings "get"
and "set"
onto field names to
produce method names to invoke at runtime.
Traditional type systems for objects and records have little to say about these computed accesses. In this post, we're going to build up a description of object types that can describe these values, and explore their use. The ideas in this post are developed more fully in a writeup for the FOOL Workshop.
First-class Singleton Strings
In the JavaScript example above, we said that it's likely that the only
intended lookup targets―and thus the only intended arguments to
lookup
―are "name"
and "age"
.
Giving a meaningful type to this function is easy if we allow singleton
strings as types in their own right. That is, if our type language is:
T = s | Str | Num | { s : T ... } | T → T | T ∩ T
Where s
stands for any singleton string, Str
and Num
are base types for strings and numbers,
respectively, record types are a map from singleton strings
s
to types, arrow types are traditional pairs of types, and
intersections are allowed to express a conjunction of types on the same
value.
Given these definitions, we could write the type of lookup as:
lookup : ("name" → Str) ∩ ("age" → Num)
That is, if lookup
is provided the string
"name"
, it produces a string, and if it is provided the
string "age"
, it produces a number.
In order to type-check the body of lookup
, we need a
type for o
as well. That can be represented with the type
{ "name" : Str, "age" : Num }
. Finally, to type-check the
object lookup expression o[f]
, we need to compare the
singleton string type of f
with the fields of
o
. In this case, only the two strings that are already
present on o
are possible choices for f
, so
the comparison is easy and type-checking works out.
For a first cut, all we did was make the string labels on objects'
fields a first-class entity in our type system, with singleton string
types s
. But what can we say about the Bean example, where
get*
and set*
method invocations are computed
rather than just used as first-class values?
String Pattern Types
In order to express the type of objects like Beans, we need to express
field name patterns, rather than just singleton field names.
For example, we might say that a Bean with Int
-typed
parameters has a type like:
IntBean = { ("get".+) : → Int }, ("set".+) : Int → Void, "toString" : → Str }
Here, we are using .+
as regular expression notation
for any non-empty string. We read the type above as saying that all
fields that begin with get
and end with any string are
functions that return Int
values. The same is true for
"set"
methods. The singleton string "toString"
is also a field, and is simply a function that returns strings.
To express this type, we need to extend our type language to handle these string patterns, which we write down as regular expressions (the write-up outlines the actual limits on what kinds of patterns we can support). We extend our type language to include patterns as types, and as field names:
L = regular expressions T = L | Str | Num | { L : T ... } | T → T | T ∩ T
This new specification gives us the ability to write down types like
IntBean
, which have field patterns that describe
infinite sets of fields. Let's stop and think about what that
means as a description of a runtime object. Our type for o
above, { "name" : Str, "age" : Num }
, says that values
bound to o
at runtime certainly have name
and
age
fields at the listed types. The type for
IntBean
, on the other hand, seems to assert that these
objects will have the fields getUp
, getDown
,
getSerious
, and infinitely more. But a runtime object
can't actually have all of those fields, so a pattern
indicating an infinite number of field names is describing a
fundamentally different kind of value.
What an object type with an infinite pattern represents is that all the fields that match the pattern are potentially present. That is, at runtime, they may or may not be there, but if they are there, they must have the annotated type. We extend object types again to make this explicit with presence annotations, which explicitly list fields as definitely present, written ↓, or possibly present, written ○:
p = ○ | ↓ T = ... | { Lp : T ... }
In this notation, we would write:
IntBean = { ("get".+)○ : → Int }, ("set".+)○ : Int → Void, "toString"↓ : → Str }
Which indicates that all the fields in ("get".+)
and
("set".+)
are possibly present with the given
arrow types, and toString
is definitely present.
Subtyping
Now that we have these rich object types, it's natural to ask what kinds of subtyping relationships they have with one another. A detailed account of subtyping will come soon; in the meantime, can you guess what subtyping might look like for these types?
Update: The answer is in the next post.