Objects in Scripting Languages
Tags: JavaScript, Programming Languages, Semantics
Posted on 28 February 2012.
We've been studying scripting languages in some detail, and have collected a number of features of their object systems that we find unusually expressive. This expressiveness can be quite powerful, but it also challenges attempts to reason about and understand programs that use these features. This post outlines some of these exceptionally expressive features for those who may not be intimately familiar with them.
Dictionaries with Inheritance
Untyped scripting languages implement objects as dictionaries mapping member names (strings) to values. Inheritance affects member lookup, but does not affect updates and deletion. This won't surprise any experienced JavaScript programmer:
var parent = {"z": 9};
// Using __proto__ sets up inheritance directly in most browsers
var obj = { "x": 1, "__proto__": parent};
obj.x // evaluates to 1
obj.z // evaluates to 9
obj.z = 50 // creates new field on obj
obj.z // evaluates to 50, z on parent is "overridden"
parent.z // evaluates to 9; parent.z was unaffected by obj.z = 50
In other scripting languages, this inheritance can't be set up quite so directly. Still, its effect can be achieved, and a similar object structure can be observed. For example, in Python:
class parent(object):
    z = 9 # class member
    def __init__(self):
        self.x = 1 # instance member
obj = parent()
obj.x # evaluates to 1
obj.z # evaluates to 9
obj.z = 50 # creates new field on obj
obj.z # evaluates to 50, z on parent is "overridden"
parent.z # evaluates to 9, just like JavaScript
We can delete the field in both languages, which returns obj to its original state, before it was extended with a z member. In JavaScript:
delete obj.z;
obj.z // evaluates to 9 again
This also works in Python:
delattr(obj, "z")
obj.z # evaluates to 9 again
In both languages, we could have performed the assignments and lookups with computed strings as well:
// JavaScript
obj["x " + "yz"] = 99 // creates a new field, "x yz"
obj["x y" + "z"] // evaluates to 99
# Python
setattr(obj, "x " + "yz", 99) # creates a new field, "x yz"
getattr(obj, "x y" + "z") # evaluates to 99
We can go through this entire progression in Ruby, as well:
class Parent; def z; return 9; end; end
obj = Parent.new
class << obj; def x; return 1; end; end
obj.x # returns 1
obj.z # returns 9
class << obj; def z; return 50; end; end
obj.z # returns 50
# no simple way to invoke shadowed z method
class << obj; remove_method :z; end
obj.z # returns 9
class << obj
define_method("xyz".to_sym) do; return 99; end
end
print obj.xyz # prints 99
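The lookup, update, and delete behavior described in this section can be made visible in Python by inspecting the instance dictionary directly. The following is a minimal sketch, assuming Python 3 and a class renamed to Parent for illustration; vars(obj) returns the dictionary that backs the instance.

```python
class Parent(object):
    z = 9                      # class member, stored on Parent

    def __init__(self):
        self.x = 1             # instance member, stored in the instance dict


obj = Parent()

# Lookup falls back to the class when the instance dictionary has no entry.
before = dict(vars(obj))       # snapshot of the instance dictionary
inherited = obj.z              # found on Parent, not on obj

# Assignment always writes to the instance dictionary, never to the class.
obj.z = 50
during = dict(vars(obj))

# Deletion removes only the instance entry, so lookup reaches the class again.
del obj.z
after = dict(vars(obj))
restored = obj.z

# Computed names are ordinary dictionary keys, even if they are not identifiers.
setattr(obj, "x " + "yz", 99)
computed = vars(obj)["x yz"]
```

The snapshots show that z appears in the instance dictionary only between the assignment and the deletion, while Parent.z is never touched.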
Classes Do Not Shape Objects
The upshot is that a class definition in a scripting language says little about the structure of its instances. This is in contrast to a language like Java, in which objects' structure is completely determined by their class, to the point where memory layouts can be predetermined for runtime objects. In scripting languages, this isn't the case. An object is an instance of a 'class' in JavaScript, Python, or Ruby merely by virtue of several references to other runtime objects. Some of these can be changed at runtime, others cannot, but in all cases, members can be added to and removed from the inheriting objects. This flexibility can lead to some unusual situations.
Brittle inheritance: Fluid classes make inheritance brittle. If we start with this Ruby class:
class A
def initialize; @privateFld = 90; end
def myMethod; return @privateFld * @privateFld; end
end
Then we might expect that the implementation of myMethod can assume a numeric type for @privateFld. This assumption can be broken by subclasses, however:
class B < A
def initialize; super(); @privateFld = "string (not num)"; end
end
Since both A and B use the same name, and it is simply a dictionary key, B instances violate the assumptions of A's methods:
obj = B.new
obj.myMethod # error: cannot multiply strings
Ruby's authors are well aware of this; the Ruby manual states "it is only safe to extend Ruby classes when you are familiar with (and in control of) the implementation of the superclass" (page 240).
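The same brittleness is easy to reproduce in Python, where instance fields are also plain dictionary keys shared by a class and its subclasses. This is a rough translation of the Ruby example above, not code from any real library; the names _private_fld and my_method are chosen for illustration only.

```python
class A(object):
    def __init__(self):
        self._private_fld = 90

    def my_method(self):
        # Written with a number in mind.
        return self._private_fld * self._private_fld


class B(A):
    def __init__(self):
        super(B, self).__init__()
        # Same dictionary key as in A, so this silently replaces A's value.
        self._private_fld = "string (not num)"


a_result = A().my_method()

try:
    B().my_method()
    b_error = None
except TypeError as exc:
    # Multiplying a str by a str is not defined.
    b_error = exc
```

Nothing in the definition of B signals a problem; the failure appears only when an inherited method runs against the overwritten field.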
Mutable Inheritance: JavaScript and Python expose the inheritance chain through mutable object members. In JavaScript, we already saw that the member "__proto__" could be used to implement inheritance directly. The "__proto__" member is mutable, so class hierarchies can be changed at runtime. We found it a bit more surprising when we realized the same was possible in Python:
class A(object):
    def method(self): return "from class A"
class B(object):
    def method(self): return "from class B"
obj = A()
obj.method() # evaluates to "from class A"
isinstance(obj, A) # evaluates to True
obj.__class__ = B # the __class__ member determines inheritance
obj.method() # evaluates to "from class B"
isinstance(obj, B) # evaluates to True: obj's 'class' has changed!
Methods?
These scripting languages also have flexible, and different, definitions of "methods".
JavaScript simply does not have methods. The syntax
obj.method(...)
binds this to the value of obj in the body of method. However, the method member is just a function and can be easily extracted and applied:
var f = obj.method; f(...);
Since f() does not use the method call syntax above, it is treated as a function call. In this case, it is a well-known JavaScript wart that this is bound to a default "global object" rather than obj.
Python and Ruby make a greater effort to retain a binding for the this parameter. Python doesn't care about the name of the parameter (though self is canonically used), and simply has special semantics for the first argument of a method. If a method is extracted via member access, it returns a function that binds the object from the member access to the first parameter:
class A(object):
    def __init__(self_in_init): self_in_init.myField = 900
    def method(self_in_method): return self_in_method.myField
obj = A()
f1 = obj.method # the access binds self_in_method to obj
f1() # evaluates to 900, using the above binding
If the same method is accessed as a field multiple times, it isn't the same function both times; a new function is created for each access:
obj = A()
f1 = obj.method # first extraction
f2 = obj.method # second extraction
f1 is f2 # evaluates to False, no reference equality
Python lets programmers access the underlying function, without the first parameter bound, through the member im_func (named __func__ in Python 3). This is actually the same reference across all extracted methods, regardless even of the object they were extracted from:
obj = A()
f1 = obj.method # first extraction
f2 = obj.method # second extraction
otherobj = A()
f3 = otherobj.method # extraction from another object
# evaluates to True, same function referenced from extractions on the
# same object
f1.im_func is f2.im_func
# evaluates to True, same function referenced from extractions on
# different objects
f2.im_func is f3.im_func
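The im_func member exists only in Python 2. A minimal sketch of the same experiment, assuming Python 3, uses __func__ for the underlying function and __self__ for the bound object; the field name my_field is illustrative.

```python
class A(object):
    def __init__(self):
        self.my_field = 900

    def method(self):
        return self.my_field


obj = A()
otherobj = A()

f1 = obj.method        # each access builds a fresh bound-method object
f2 = obj.method
f3 = otherobj.method

distinct_objects = f1 is not f2       # no reference equality
equal_bindings = f1 == f2             # same function, same receiver
different_receivers = f1 != f3        # same function, other receiver

# __func__ is the underlying plain function; __self__ is the bound receiver.
same_function = f1.__func__ is f3.__func__ is A.method
rebound = f1.__func__(otherobj)       # call it with an explicit receiver
```

In Python 3, accessing the method on the class (A.method) yields the plain function itself, which is why it is identical to each extraction's __func__.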
Ruby has a similar treatment of methods, their extraction, and their reapplication to new arguments.
But Why?
These features aren't just curiosities; we've found examples where they are used in practice. For example, Django's ORM builds classes dynamically, modifying them based on strings that come from modules describing database tables and relationships (base.py):
attr_name = '%s_ptr' % base._meta.module_name
field = OneToOneField(base, name=attr_name,
auto_created=True, parent_link=True)
new_class.add_to_class(attr_name, field)
Ruby on Rails' ActiveRecord uses dynamic field names as well, iterating over fields and invoking methods only when their names match certain patterns (base.rb):
attributes.each do |k, v|
  if k.include?("(")
    multi_parameter_attributes << [ k, v ]
  elsif respond_to?("#{k}=")
    if v.is_a?(Hash)
      nested_parameter_attributes << [ k, v ]
    else
      send("#{k}=", v)
    end
  else
    raise(UnknownAttributeError, "unknown attribute: #{k}")
  end
end
These applications use objects as dictionaries (with inheritance) to build up APIs that they couldn't otherwise.
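The pattern of building a member name from a string and dispatching on whether it exists can be sketched in Python as well. This is a hypothetical illustration, not code from Django or Rails: the Record class, its set_ prefix convention, and the UnknownAttributeError exception are all assumptions made for the example.

```python
class UnknownAttributeError(Exception):
    pass


class Record(object):
    """Hypothetical model class; not part of Django or Rails."""

    def set_name(self, value):
        self.name = value.strip()

    def set_age(self, value):
        self.age = int(value)

    def assign_attributes(self, attributes):
        for key, value in attributes.items():
            # Build the setter name from the key and look it up at runtime.
            setter = getattr(self, "set_" + key, None)
            if callable(setter):
                setter(value)
            else:
                raise UnknownAttributeError("unknown attribute: %s" % key)


record = Record()
record.assign_attributes({"name": " Ada ", "age": "36"})
```

Which keys are accepted is decided entirely by which members happen to exist on the object when assign_attributes runs, so adding a set_email method later extends the accepted input without touching the loop.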
These expressive features aren't without their perils. Django has explicit warnings that things can go awry if relationships between tables expressed in ORM classes overlap. And the fact that __proto__ is in the same namespace as the other members bit Google Docs, whose editor would crash if the string "__proto__" was entered. The implementation was using an object as a hashtable keyed by strings from the document, which led to an assignment to __proto__ that changed the behavior of the map.
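A comparable namespace collision can be reproduced in Python when an object's members are used as a string-keyed map. The sketch below is only an analogy under that assumption, not a reconstruction of the Google Docs code; the Counts class and both counting functions are hypothetical.

```python
class Counts(object):
    """Empty object used (unwisely) as a string-keyed map."""


def count_words_on_object(words):
    counts = Counts()
    for word in words:
        # Document text becomes a member name; inherited members leak in.
        setattr(counts, word, getattr(counts, word, 0) + 1)
    return counts


def count_words_in_dict(words):
    counts = {}
    for word in words:
        # A plain dict has no inherited keys to collide with.
        counts[word] = counts.get(word, 0) + 1
    return counts


ordinary = count_words_on_object(["a", "b", "a"])

try:
    count_words_on_object(["a", "__class__"])
    collision = None
except TypeError as exc:
    # getattr found the inherited __class__ member instead of the default 0.
    collision = exc

safe = count_words_in_dict(["a", "__class__", "a"])
```

Ordinary words behave as expected, but a word that matches a built-in member name is looked up through inheritance, so the map's contents depend on what the input happens to contain.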
So?
The languages presented here are widely adopted and used, and run critical systems. Yet they contain features that defy conventional formal reasoning, at the very least in their object systems. Perhaps these features' expressiveness outweighs the cognitive load of using them. If it doesn't, and using these features is too difficult or error-prone, we should build tools to help us use them, or find better ways to implement the same functionality. And if it does, we should take notice and recall that we have these powerful techniques at our disposal in the next object system we design.