Objects in Scripting Languages
Tags: JavaScript, Programming Languages, Semantics
Posted on 28 February 2012.
We've been studying scripting languages in some detail, and have collected a number of features of their object systems that we find unusually expressive. This expressiveness can be quite powerful, but it also challenges attempts to reason about and understand programs that use these features. This post outlines some of these exceptionally expressive features for those who may not be intimately familiar with them.
Dictionaries with Inheritance
Untyped scripting languages implement objects as dictionaries mapping member names (strings) to values. Inheritance affects member lookup, but does not affect updates or deletion. This won't surprise any experienced JavaScript programmer:
In other scripting languages, setting up this inheritance can't be done quite so directly. Still, the same effect can be achieved, and a similar object structure observed. For example, in Python:
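As a minimal sketch of this behaviour in Python (the class name Parent and the members x and z are chosen here purely for illustration), lookup falls back to the class, while assignment only touches the instance's own dictionary:

```python
class Parent(object):
    z = 9  # class member, found when lookup continues past the instance

    def __init__(self):
        self.x = 1  # instance member, stored in the instance's own dictionary


obj = Parent()

own_x = obj.x          # found directly on obj
inherited_z = obj.z    # not on obj, so lookup continues to Parent
obj.z = 50             # the update creates a new entry on obj only
shadowing_z = obj.z    # now found on obj, hiding Parent.z
class_z = Parent.z     # the class member was never modified

print(own_x, inherited_z, shadowing_z, class_z)  # 1 9 50 9
print(vars(obj))                                 # {'x': 1, 'z': 50}
```

The call to vars(obj) exposes the dictionary behind the instance, which makes the "objects are dictionaries" view directly visible.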
We can delete the field in both languages, which returns obj to its original state, before it was extended with a z member. In JavaScript:
This also works in Python:
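A small sketch of the Python case, again with the illustrative names Parent, obj, and z: deleting the member removes only the instance's own entry, after which lookup reaches the class again. Deletion itself does not follow inheritance, so a second delete fails.

```python
class Parent(object):
    z = 9  # class member


obj = Parent()
obj.z = 50                     # extend obj with its own z
del obj.z                      # remove only obj's entry
restored_z = obj.z             # lookup falls through to Parent again
has_own_z = "z" in vars(obj)   # obj no longer carries its own z

# Deleting again fails: there is nothing left on obj itself to delete,
# and deletion never reaches into Parent.
try:
    del obj.z
    second_delete_failed = False
except AttributeError:
    second_delete_failed = True

print(restored_z, has_own_z, second_delete_failed)  # 9 False True
```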
In both languages, we could have performed the assignments and lookups with computed strings as well:
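In Python, the built-in functions getattr, setattr, and delattr accept the member name as an ordinary string value. The following sketch reuses the illustrative Parent class and computes the name "z" at runtime:

```python
class Parent(object):
    z = 9  # class member


obj = Parent()
name = chr(ord("y") + 1)          # the string "z", computed at runtime

before = getattr(obj, name)       # same lookup as obj.z
setattr(obj, name, 50)            # same update as obj.z = 50
during = getattr(obj, name)
delattr(obj, name)                # same deletion as del obj.z
after = getattr(obj, name)

print(name, before, during, after)  # z 9 50 9
```

Because the name is just data, nothing in the program text has to mention z for the member to be read, written, or removed.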
We can go through this entire progression in Ruby, as well:
Classes Do Not Shape Objects
The upshot is that a class definition in a scripting language says little about the structure of its instances. This is in contrast to a language like Java, in which objects' structure is completely determined by their class, to the point where memory layouts can be predetermined for runtime objects. In scripting languages, this isn't the case. An object is an instance of a 'class' in JavaScript, Python, or Ruby merely by virtue of several references to other runtime objects. Some of these can be changed at runtime, others cannot, but in all cases, members can be added to and removed from the inheriting objects. This flexibility can lead to some unusual situations.
Brittle inheritance: Fluid classes make inheritance brittle. If we start with this Ruby class:
Then we might expect that the implementation of myMethod relies on @privateFld holding a numeric value. This assumption can be broken by subclasses, however:
Since both A and B use the same name, and it is simply a dictionary key, B instances violate the assumptions of A's methods:
Ruby's authors are well aware of this; the Ruby manual states "it is only safe to extend Ruby classes when you are familiar with (and in control of) the implementation of the superclass" (page 240).
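The hazard is not specific to Ruby. As a rough analogue in Python (the names A, B, my_method, and _private_fld are ours, loosely mirroring the Ruby names above, and the values are made up), a subclass can break a superclass method simply by reusing a member name:

```python
class A(object):
    def __init__(self):
        self._private_fld = 90  # A's methods assume this stays numeric

    def my_method(self):
        return self._private_fld * self._private_fld


class B(A):
    def __init__(self):
        super(B, self).__init__()
        self._private_fld = "string"  # same dictionary key, different type


a_result = A().my_method()

try:
    B().my_method()
    b_error = None
except TypeError as err:
    b_error = err  # multiplying two strings is not defined

print(a_result)                 # 8100
print(type(b_error).__name__)   # TypeError
```

Nothing in the definition of A can prevent this, because the member is only an entry in the instance's dictionary that any subclass may overwrite.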
Mutable inheritance: JavaScript and Python expose the inheritance chain through mutable object members. In JavaScript, we already saw that the member "__proto__" could be used to implement inheritance directly. The "__proto__" member is mutable, so class hierarchies can be changed at runtime. We found it a bit more surprising when we realized the same was possible in Python:
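One way this can look in Python is assignment to an instance's __class__ member. The sketch below uses two hypothetical classes A and B. As far as we know, CPython accepts the assignment only when the two classes have compatible instance layouts, which holds for simple classes like these:

```python
class A(object):
    def method(self):
        return "from class A"


class B(object):
    def method(self):
        return "from class B"


a = A()
before = a.method()
a.__class__ = B          # re-parent the existing instance at runtime
after = a.method()

print(before)                               # from class A
print(after)                                # from class B
print(isinstance(a, A), isinstance(a, B))   # False True
```

The object a is the same object throughout; only the reference that determines where method lookup continues has changed.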
Methods?
These scripting languages also have flexible, and different, definitions of "methods".
JavaScript simply does not have methods. The syntax obj.method(...) binds this to the value of obj in the body of method. However, the method member is just a function and can be easily extracted and applied:
Since f() does not use the method call syntax above, it is treated as a function call. In this case, it is a well-known JavaScript wart that this is bound to a default "global object" rather than obj.
Python and Ruby make a greater effort to retain a binding for the this parameter. Python doesn't care about the name of the parameter (though self is canonically used), and simply has special semantics for the first argument of a method. If a method is extracted via member access, the access returns a function that binds the object from the member access to the first parameter:
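A minimal sketch of this extraction, using a hypothetical Counter class whose method deliberately names its first parameter something other than self:

```python
class Counter(object):
    def __init__(self, start):
        self.count = start

    def add(this_object, amount):  # the first parameter's name is arbitrary
        this_object.count += amount
        return this_object.count


c = Counter(10)
f = c.add              # extraction via member access
result = f(5)          # c is supplied as the first argument automatically

print(result)          # 15
print(f.__self__ is c) # True
```

The extracted value remembers the object it came from (visible as __self__), which is why calling f later still updates c.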
If the same method is accessed as a field multiple times, it isn't the same function both times; a new function is created for each access:
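This can be observed in CPython by comparing two extractions of the same method by identity. The Counter class here is a hypothetical stand-in:

```python
class Counter(object):
    def add(self, amount):
        return amount


c = Counter()
m1 = c.add
m2 = c.add

same_object = m1 is m2   # identity: two distinct bound-method objects
equal = m1 == m2         # equality: same object bound to the same function

print(same_object, equal)  # False True
```

The two values compare equal because they wrap the same function and the same object, but each member access allocated a fresh wrapper.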
Python lets programmers access the underlying function, without the first parameter bound, through the member im_func (spelled __func__ in Python 3). This is actually the same reference across all extracted methods, regardless even of the original object of extraction:
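The sketch below assumes Python 3, where the member is spelled __func__ rather than im_func. The Counter class is hypothetical:

```python
class Counter(object):
    def __init__(self, start):
        self.count = start

    def add(self, amount):
        return self.count + amount


c1 = Counter(1)
c2 = Counter(100)

raw1 = c1.add.__func__   # underlying function, nothing bound
raw2 = c2.add.__func__

shared = raw1 is raw2                         # same function for both objects
from_class = raw1 is Counter.__dict__["add"]  # the function stored on the class
reapplied = raw1(c2, 5)                       # first argument supplied by hand

print(shared, from_class, reapplied)  # True True 105
```

Since the underlying function is shared, a function extracted from c1 can be reapplied with c2 (or any suitable object) as its first argument.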
Ruby has a similar treatment of methods, their extraction, and their reapplication to new arguments.
But Why?
These features aren't just curiosities; we've found examples where they are used in practice. For example, Django's ORM builds classes dynamically, modifying them based on strings that come from modules describing database tables and relationships (base.py):
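To give a flavour of the general technique, and not Django's actual code, here is a small sketch that builds a class from strings. The function build_model, the table description, and the generated get_* accessors are all hypothetical names invented for this example:

```python
def build_model(table_name, column_names):
    """Create a class whose members are derived from strings."""

    def init(self, **values):
        # Store one instance member per column name.
        for column in column_names:
            setattr(self, column, values.get(column))

    members = {"__init__": init, "table": table_name}
    model = type(table_name.capitalize(), (object,), members)

    for column in column_names:
        # The default argument pins the current column name for each accessor.
        def getter(self, _column=column):
            return getattr(self, _column)

        setattr(model, "get_" + column, getter)

    return model


Person = build_model("person", ["name", "email"])
p = Person(name="Ada", email="ada@example.org")

print(Person.__name__)               # Person
print(p.get_name())                  # Ada
print(hasattr(Person, "get_email"))  # True
```

Neither the class name nor the accessor names appear anywhere as identifiers in the program; they exist only because of the strings passed in.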
Ruby on Rails' ActiveRecord uses dynamic field names as well, iterating over fields and invoking methods only when their names match certain patterns (base.rb):
These applications use objects as dictionaries (with inheritance) to build up APIs that they couldn't otherwise.
These expressive features aren't without their perils. Django has explicit warnings that things can go awry if relationships between tables expressed in ORM classes overlap. And the fact that __proto__ is in the same namespace as the other members bit Google Docs, whose editor would crash if the string "__proto__" was entered. The implementation was using an object as a hashtable keyed by strings from the document, which led to an assignment to __proto__ that changed the behavior of the map.
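Python objects share this kind of hazard when their member namespace is used as a map. As a hedged illustration (the WordCounts class and the sample words are hypothetical, and this is an analogue of the problem, not a reconstruction of the Google Docs bug), a key that collides with a special member fails where ordinary keys succeed, while a plain dict has no such collision:

```python
class WordCounts(object):
    """An object misused as a hash table keyed by arbitrary strings."""


words = ["hello", "world", "__class__"]

counts = WordCounts()
failed_keys = []
for word in words:
    try:
        setattr(counts, word, 1)
    except TypeError:
        # "__class__" names the instance's class reference, not a free slot.
        failed_keys.append(word)

safe_counts = {}
for word in words:
    safe_counts[word] = 1  # dictionary keys live in their own namespace

print(failed_keys)        # ['__class__']
print(len(safe_counts))   # 3
```

The design lesson is the same in both languages: data keys and structural members should not share one namespace.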
So?
The languages presented here are widely adopted and used, and run critical systems. Yet, they contain features that defy conventional formal reasoning, at the very least in their object systems. Perhaps these features' expressiveness outweighs the cognitive load of using them. If it doesn't, and using these features is too difficult or error-prone, we should build tools to help us use them, or find better ways to implement the same functionality. And if it does, we should take notice and remember that we have these powerful techniques at our disposal in the next object system we design.