Objects in Scripting Languages

Tags: JavaScript, Programming Languages, Semantics

Posted on 28 February 2012.

We've been studying scripting languages in some detail, and have collected a number of features of their object systems that we find unusually expressive. This expressiveness can be quite powerful, but it also challenges attempts to reason about and understand programs that use these features. This post outlines some of these exceptionally expressive features for those who may not be intimately familiar with them.

Dictionaries with Inheritance

Untyped scripting languages implement objects as dictionaries mapping member names (strings) to values. Inheritance affects member lookup, but does not affect updates and deletion. This won't surprise any experienced JavaScript programmer:

var parent = {"z": 9};
// Using __proto__ sets up inheritance directly in most browsers
var obj = { "x": 1, "__proto__": parent};

obj.x       // evaluates to 1
obj.z       // evaluates to 9
obj.z = 50  // creates new field on obj
obj.z       // evaluates to 50, z on parent is "overridden"
parent.z    // evaluates to 9; parent.z was unaffected by obj.z = 50

In other scripting languages, setting up this inheritance can't be done quite so directly. Still, its effect can be accomplished, and a similar object structure observed. For example, in Python:

class parent(object):
  z = 9                 # class member
  def __init__(self):
    self.x = 1          # instance member

obj = parent()

obj.x                 # evaluates to 1
obj.z                 # evaluates to 9
obj.z = 50            # creates new field on obj
obj.z                 # evaluates to 50, z on parent is "overridden"
parent.z              # evaluates to 9, just like JavaScript

We can delete the field in both languages, which returns obj to its original state, before it was extended with a z member. In JavaScript:

delete obj.z;
obj.z               // evaluates to 9 again

This also works in Python:

delattr(obj, "z")
obj.z               # evaluates to 9 again
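The lookup, update, and deletion rules above can be observed in Python by inspecting the dictionaries involved. This is a minimal sketch, assuming Python 3; the class is renamed Parent here, and the names before, during, and after are introduced only for illustration.

```python
class Parent(object):
    z = 9                    # class member, stored in Parent.__dict__

    def __init__(self):
        self.x = 1           # instance member, stored in obj.__dict__


obj = Parent()
before = dict(vars(obj))     # snapshot of obj's own dictionary

obj.z = 50                   # the update writes to obj's own dictionary
during = dict(vars(obj))

del obj.z                    # deletion removes only obj's own entry
after = dict(vars(obj))

print(before, during, after)
print(obj.z, Parent.__dict__["z"])  # lookup falls back to the class again
```

The snapshots suggest that z never lives in obj's own dictionary unless it is assigned there; only lookup consults the class.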

In both languages, we could have performed the assignments and lookups with computed strings as well:

// JavaScript
obj["x " + "yz"] = 99         // creates a new field, "x yz"
obj["x y" + "z"]              // evaluates to 99

# Python
setattr(obj, "x " + "yz", 99) # creates a new field, "x yz"
getattr(obj, "x y" + "z")     # evaluates to 99

We can go through this entire progression in Ruby, as well:

class Parent; def z; return 9; end; end
obj = Parent.new
class << obj; def x; return 1; end; end

obj.x # returns 1
obj.z # returns 9
class << obj; def z; return 50; end; end
obj.z # returns 50

# no simple way to invoke shadowed z method
class << obj; remove_method :z; end
obj.z # returns 9

class << obj
  define_method("xyz".to_sym) do; return 99; end
end
print obj.xyz # prints 99

Classes Do Not Shape Objects

The upshot is that a class definition in a scripting language says little about the structure of its instances. This is in contrast to a language like Java, in which objects' structure is completely determined by their class, to the point where memory layouts can be predetermined for runtime objects. In scripting languages, this isn't the case. An object is an instance of a 'class' in JavaScript, Python, or Ruby merely by virtue of several references to other runtime objects. Some of these can be changed at runtime and others cannot, but in all cases, members can be added to and removed from the inheriting objects. This flexibility can lead to some unusual situations.
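As a small illustration of how little a class pins down, the following sketch (assuming Python 3; Point, label, and describe are hypothetical names chosen for this example) gives two instances of the same class different sets of members, and then extends the class itself after the instances exist.

```python
class Point(object):
    def __init__(self):
        self.x = 0           # the only member the class "declares"


a = Point()
b = Point()

a.label = "extra"            # only a gains this member
del b.x                      # b loses the member set in __init__


def describe(self):
    # Reports whatever members the instance happens to have right now.
    return sorted(vars(self))


Point.describe = describe    # the class is extended at runtime

print(a.describe())          # members of a
print(b.describe())          # members of b
```

Both objects are still instances of Point, yet neither has the shape that the class definition alone would suggest.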

Brittle inheritance: Fluid classes make inheritance brittle. If we start with this Ruby class:

class A
  def initialize; @privateFld = 90; end

  def myMethod; return @privateFld * @privateFld; end
end

Then we might assume that the implementation of myMethod relies on @privateFld having a numeric type. This assumption can be broken by subclasses, however:

class B < A
  def initialize; super(); @privateFld = "string (not num)"; end
end

Since both A and B use the same name, and it is simply a dictionary key, B instances violate the assumptions of A's methods:

obj = B.new
obj.myMethod # error: cannot multiply strings

Ruby's authors are well aware of this; the Ruby manual states "it is only safe to extend Ruby classes when you are familiar with (and in control of) the implementation of the superclass" (page 240).
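The same brittleness can be reproduced in Python, where instance fields are likewise plain dictionary keys shared between a class and its subclasses. This is a sketch that mirrors the Ruby example, assuming Python 3; the names private_fld, my_method, and outcome are adapted or introduced for illustration.

```python
class A(object):
    def __init__(self):
        self.private_fld = 90

    def my_method(self):
        # Implicitly assumes private_fld is numeric.
        return self.private_fld * self.private_fld


class B(A):
    def __init__(self):
        super().__init__()
        # Same dictionary key as in A, so this overwrites A's field.
        self.private_fld = "string (not num)"


try:
    B().my_method()
    outcome = "no error"
except TypeError:
    # Multiplying a str by a str is not defined in Python.
    outcome = "TypeError"

print(A().my_method(), outcome)
```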

Mutable inheritance: JavaScript and Python expose the inheritance chain through mutable object members. In JavaScript, we already saw that the member "__proto__" could be used to implement inheritance directly. The "__proto__" member is mutable, so class hierarchies can be changed at runtime. We found it a bit more surprising when we realized the same was possible in Python:

class A(object):
  def method(self): return "from class A"

class B(object):
  def method(self): return "from class B"

obj = A()
obj.method()       # evaluates to "from class A"
isinstance(obj, A) # evaluates to True

obj.__class__ = B  # the __class__ member determines inheritance
obj.method()       # evaluates to "from class B"
isinstance(obj, B) # evaluates to True: obj's 'class' has changed!
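Python also exposes a class's own parents through the mutable __bases__ member, so an entire class can be re-parented and not just a single instance. The sketch below assumes Python 3 on CPython; the intermediate class C and the names before and after are hypothetical additions. CPython may reject reassignments it considers layout-incompatible by raising a TypeError, so this should be read as an illustration and not as a general recipe.

```python
class A(object):
    def method(self):
        return "from class A"


class B(object):
    def method(self):
        return "from class B"


class C(A):                  # C initially inherits from A
    pass


obj = C()
before = obj.method()        # resolved through A

C.__bases__ = (B,)           # re-parent the class itself at runtime
after = obj.method()         # the same object now resolves through B

print(before, after, isinstance(obj, A), isinstance(obj, B))
```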

Methods?

These scripting languages also have flexible, and different, definitions of "methods".

JavaScript simply does not have methods. The syntax

obj.method(...)

binds this to the value of obj in the body of method. However, the method member is just a function and can be easily extracted and applied:

var f = obj.method; f(...);

Since f() does not use the method call syntax above, it is treated as a function call. In this case, it is a well-known JavaScript wart that this is bound to a default "global object" (or, in strict mode, to undefined) rather than obj.

Python and Ruby make a greater effort to retain a binding for the this parameter. Python doesn't care about the name of the parameter (though self is canonically used), and simply has special semantics for the first argument of a method. If a method is extracted via member access, it returns a function that binds the object from the member access to the first parameter:

class A(object):
  def __init__(self_in_init): self_in_init.myField = 900
  def method(self_in_method): return self_in_method.myField

obj = A()
f1 = obj.method  # the access binds self_in_method to obj
f1()             # evaluates to 900, using the above binding

If the same method is accessed as a field multiple times, it isn't the same function both times: a new function is created for each access.

obj = A()
f1 = obj.method  # first extraction
f2 = obj.method  # second extraction

f1 is f2         # evaluates to False, no reference equality

Python lets programmers access the underlying function, without the first parameter bound, through the member im_func (named __func__ in Python 3). This is the same reference across all extracted methods, regardless of the object from which they were extracted:

obj = A()
f1 = obj.method  # first extraction
f2 = obj.method  # second extraction

otherobj = A()
f3 = otherobj.method  # extraction from another object

# evaluates to True, same function referenced from extractions on the
# same object
f1.im_func is f2.im_func

# evaluates to True, same function referenced from extractions on
# different objects
f2.im_func is f3.im_func
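In Python 3 the same underlying function is exposed as __func__, and the bound object as __self__. The following sketch, assuming Python 3, shows extraction, comparison, and reapplication of the underlying function to a different object; this variant of A takes a constructor argument, and the names raw, shared, and reapplied are introduced only for illustration.

```python
class A(object):
    def __init__(self, value):
        self.my_field = value

    def method(self):
        return self.my_field


obj = A(900)
other = A(7)

f1 = obj.method              # first extraction
f2 = obj.method              # second extraction

same_identity = f1 is f2     # each access builds a new bound method object
same_equality = f1 == f2     # bound methods compare by __self__ and __func__

raw = f1.__func__            # the underlying, unbound function
shared = raw is other.method.__func__

reapplied = raw(other)       # supply a different object as the first argument

print(same_identity, same_equality, shared, reapplied)
```

Calling raw(other) passes other explicitly as the first parameter, which is all that the bound method would otherwise do on the caller's behalf.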

Ruby has a similar treatment of methods, their extraction, and their reapplication to new arguments.

But Why?

These features aren't just curiosities; we've found examples where they are used in practice. For example, Django's ORM builds classes dynamically, modifying them based on strings that come from modules describing database tables and relationships (base.py):

attr_name = '%s_ptr' % base._meta.module_name
field = OneToOneField(base, name=attr_name,
        auto_created=True, parent_link=True)
new_class.add_to_class(attr_name, field)
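The general technique of building a class from strings can be sketched with Python's built-in type() and setattr(). This is not Django's implementation; build_class, its parameters, and the placeholder values are hypothetical, and the sketch assumes Python 3.

```python
def build_class(name, base_names):
    """Create a new class and add one '<base>_ptr' member per base name."""
    new_class = type(name, (object,), {})    # class created at runtime
    for base_name in base_names:
        attr_name = "%s_ptr" % base_name     # member name computed from a string
        # A placeholder string stands in for a real field object here.
        setattr(new_class, attr_name, "link to %s" % base_name)
    return new_class


Restaurant = build_class("Restaurant", ["place", "business"])
print(sorted(n for n in vars(Restaurant) if n.endswith("_ptr")))
```

Neither the class name nor its members appear literally in the source, which is what makes such code hard to analyze statically.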

Ruby on Rails' ActiveRecord uses dynamic field names as well, iterating over fields and invoking methods only when their names match certain patterns (base.rb):

attributes.each do |k, v|
  if k.include?("(")
    multi_parameter_attributes << [ k, v ]
  elsif respond_to?("#{k}=")
    if v.is_a?(Hash)
      nested_parameter_attributes << [ k, v ]
    else
      send("#{k}=", v)
    end
  else
    raise(UnknownAttributeError, "unknown attribute: #{k}")
  end
end
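A rough Python analogue of this dispatch-by-computed-name pattern uses getattr() where Ruby uses respond_to? and send. It is a simplified sketch, assuming Python 3, and not a translation of ActiveRecord; Record, the set_ prefix, and assign_attributes are hypothetical names.

```python
class Record(object):
    def set_name(self, value):
        self.name = value

    def set_age(self, value):
        self.age = int(value)


def assign_attributes(record, attributes):
    """Call record.set_<key>(value) for each pair, if such a method exists."""
    for key, value in attributes.items():
        setter = getattr(record, "set_" + key, None)  # lookup by computed name
        if callable(setter):
            setter(value)
        else:
            raise AttributeError("unknown attribute: %s" % key)


record = Record()
assign_attributes(record, {"name": "Ada", "age": "36"})

try:
    assign_attributes(record, {"salary": 1})
    outcome = "no error"
except AttributeError:
    outcome = "AttributeError"

print(record.name, record.age, outcome)
```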

These applications use objects as dictionaries (with inheritance) to build up APIs that they couldn't otherwise.

These expressive features aren't without their perils. Django has explicit warnings that things can go awry if relationships between tables expressed in ORM classes overlap. And the fact that __proto__ is in the same namespace as the other members bit Google Docs, whose editor would crash if the string "__proto__" was entered. The implementation was using an object as a hashtable keyed by strings from the document, which led to an assignment to __proto__ that changed the behavior of the map.
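Python has a comparable hazard when an object's attributes are used as a map keyed by untrusted strings, because special members such as __class__ share the attribute namespace. The sketch below, assuming Python 3, is only an analogy to the __proto__ problem and does not reproduce the Google Docs bug; AttributeMap and record are hypothetical names.

```python
class AttributeMap(object):
    """Hypothetical map that (unwisely) stores its keys as attributes."""

    def record(self, key, value):
        setattr(self, key, value)


unsafe = AttributeMap()
unsafe.record("hello", "some text")          # ordinary keys work fine

try:
    unsafe.record("__class__", "some text")  # collides with a special member
    outcome = "no error"
except TypeError:
    # __class__ may only be set to a class, so the assignment is rejected.
    outcome = "TypeError"

# A real dictionary keeps user keys apart from the object's own members.
safe = {}
safe["__class__"] = "some text"

print(outcome, safe["__class__"])
```

Keeping user-supplied keys in a dedicated dictionary avoids the collision, at the cost of the attribute-style access that made the object-as-map idiom attractive.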

So?

The languages presented here are widely adopted and used, and run critical systems. Yet they contain features that defy conventional formal reasoning, at the very least in their object systems. Perhaps these features' expressiveness outweighs the cognitive load of using them. If it doesn't, and using these features is too difficult or error-prone, we should build tools to help us use them, or find better ways to implement the same functionality. And if it does, we should take notice and recall that we have these powerful techniques at our disposal in the next object system we design.

The Brown PLT Blog