Friday, August 29, 2008

Python's vars()

The builtin function vars() in Python returns its argument's __dict__, that is, the namespace of the object. Note that this is the namespace of the object itself, not the object's names combined with those belonging to its class, which is an important distinction. In short, it is simply a syntactic nicety over referencing the __dict__ attribute directly. For example, contrast the following:


>>> namespace = vars(o)


With:


>>> names = o.__dict__
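
To make the equivalence concrete, here is a minimal sketch using a hypothetical class Point (an assumption for illustration, not part of the examples in this post). It suggests that vars(o) hands back the very same dictionary as o.__dict__, and that names defined on the class do not appear in it:

```python
class Point(object):
    # 'dimensions' is a class attribute: it lives in the class namespace.
    dimensions = 2

    def __init__(self, x, y):
        # 'x' and 'y' are instance attributes: they live in the instance namespace.
        self.x = x
        self.y = y


o = Point(1, 2)

# vars(o) returns the instance's own __dict__, not a copy of it.
same_object = vars(o) is o.__dict__

# Only the instance attributes show up in the instance namespace.
instance_names = sorted(vars(o))

# The class attribute is still reachable through normal attribute lookup.
reachable = o.dimensions
```

Here same_object is True and instance_names is ['x', 'y'], even though o.dimensions still resolves to 2 through the class.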


But for introspecting objects to produce some custom serialized form (JSON comes to mind), this is likely not what you want to do. This post will attempt to explain why not.

Say we want to explicitly convert an object to some more primitive form (a dictionary or list) and we use vars() to introspect the attributes of the object. Maybe our serialization function will also account for things we don't want to serialize - like things we don't want to expose over the network for security or bandwidth reasons. Here's a simple example:


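class Thing(object):
    def __init__(self, a, b, secret=None):
        self.a = a
        self.b = b
        self.secret = secret
        self._fd = open('file.txt')

def dictifyThing(thing):
    blacklisted = ['secret', '_fd']
    # Copy the namespace so removing keys does not alter the object itself.
    data = dict(vars(thing))
    for bl in blacklisted:
        # pop() with a default tolerates objects that lack a blacklisted name.
        data.pop(bl, None)
    return data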


That will work fine for this example. And so we're happy:


>>> thing = Thing( 1, 2, "don't tell anyone" )
>>> dictifyThing(thing)
{'a': 1, 'b': 2}


A great (though not unique) feature of Python is properties, which allow us to define derived read-only attributes on an object, or attributes that carry side effects when set. For example, we might have an extension of Thing where a is read-only and derived from the value of b:


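class AnotherThing(Thing):

    def __init__(self, b, secret=None):
        # Thing.__init__ is deliberately not called: it would try to assign
        # self.a, which is a read-only property here.
        self.b = b
        self.secret = secret
        self._fd = open('data.txt')

    @property
    def a(self):
        return self.b - 1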


However, dictifyThing() will no longer work the way we want it to:


>>> thing = AnotherThing(2, "open sesame")
>>> dictifyThing(thing)
{'b': 2}


One should expect this behavior because a, being a derived property, is defined on the class and isn't part of the namespace of an instance of AnotherThing. This also illustrates an advantage of properties: calculating the value of a and storing it as a normal attribute would be undesirable, both because it wastes space and because we would have no definite way of enforcing the numerical relationship between a and b.
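
Where the property actually lives can be checked directly. The following is a minimal sketch, assuming a stripped-down stand-in class called Derived (a hypothetical name; it omits the file handle and the secret so that it runs on its own):

```python
class Derived(object):
    def __init__(self, b):
        self.b = b

    @property
    def a(self):
        # Computed on every access, never stored on the instance.
        return self.b - 1


thing = Derived(2)

# The property is absent from the instance namespace...
in_instance = 'a' in vars(thing)

# ...but present in the class namespace, as a property object.
in_class = 'a' in vars(type(thing))
is_property = isinstance(vars(type(thing))['a'], property)

# Because a is derived, it tracks b automatically.
thing.b = 10
derived_value = thing.a

# With no setter defined, assigning to a raises AttributeError.
try:
    thing.a = 5
    assignable = True
except AttributeError:
    assignable = False
```

In this sketch in_instance is False while in_class is True, which is why vars() alone never sees a.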

Does this mean we can't write a decent version of dictifyThing()? No. The probably more commonly known dir() gives us all the referenceable names of an object, which necessarily includes class attributes. We can use this to write something nicer:


def dictifyThing(thing):
    blacklisted = ['secret']
    data = {}
    # dir() also lists names defined on the class, such as properties.
    for name in dir(thing):
        if name.startswith('_') or name in blacklisted:
            continue
        data[name] = getattr(thing, name)
    return data
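
One caveat worth checking before relying on dir(): it lists methods as well as data attributes, so a class with public methods would have bound methods copied into the result. A minimal sketch of one way to handle that, assuming a hypothetical class Gadget and a hypothetical helper dictify_public that skips callables (neither is part of the original example):

```python
class Gadget(object):
    def __init__(self, b):
        self.b = b

    @property
    def a(self):
        return self.b - 1

    def describe(self):
        # A public method: dir() lists it alongside the data attributes.
        return 'Gadget(b=%r)' % self.b


def dictify_public(obj, blacklisted=()):
    """Collect public, non-callable attributes of obj into a dict."""
    data = {}
    for name in dir(obj):
        if name.startswith('_') or name in blacklisted:
            continue
        value = getattr(obj, name)
        if callable(value):
            # Skip methods; only data attributes and properties are kept.
            continue
        data[name] = value
    return data


result = dictify_public(Gadget(2))
```

The trade-off is that a data attribute whose value happens to be callable would be dropped too, so this filter is only appropriate when that cannot occur.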


Now our encoding scheme works nicely for both Thing and AnotherThing. A small improvement would be to use a higher-order function to make defining new encoders simpler:


def _encode(include=(), exclude=()):
    def f(o):
        data = {}
        if include:
            # An explicit list of names takes precedence over introspection.
            for name in include:
                data[name] = getattr(o, name)
            return data
        for name in dir(o):
            if name.startswith('_') or name in exclude:
                continue
            data[name] = getattr(o, name)
        return data
    return f

# Note the trailing comma: ('secret') is just a string, not a tuple.
encodeThing = _encode(exclude=('secret',))
encodeFoo = _encode(include=('a', 'b', 'c'))
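
To tie this back to the JSON use case mentioned at the start, here is a minimal sketch of an encoder feeding json.dumps. It assumes a hypothetical class Sample and a compact stand-in called make_encoder, repeated here only so the snippet runs on its own; it is not meant as a definitive implementation:

```python
import json


def make_encoder(include=(), exclude=()):
    """Hypothetical compact stand-in for the encoder factory above."""
    def encode(o):
        # Use the explicit names if given, otherwise introspect with dir().
        names = include or [n for n in dir(o)
                            if not n.startswith('_') and n not in exclude]
        return dict((name, getattr(o, name)) for name in names)
    return encode


class Sample(object):
    def __init__(self, b, secret=None):
        self.b = b
        self.secret = secret

    @property
    def a(self):
        return self.b - 1


encode_sample = make_encoder(exclude=('secret',))
encode_b_only = make_encoder(include=('b',))

# sort_keys makes the serialized output deterministic.
payload = json.dumps(encode_sample(Sample(2, 'open sesame')), sort_keys=True)
b_only = encode_b_only(Sample(2))
```

With these definitions payload is the string '{"a": 1, "b": 2}': the derived property is included and the secret is left out.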

1 comment:

Juho Vepsäläinen said...

Thanks for an excellent post! I found it useful for my implementation of an unserializer (deserializer?). Particularly the point about properties was helpful.

I ended up doing something like this in my implementation: http://code.google.com/p/bui/source/browse/trunk/bui/backend/abstract.py?spec=svn115&r=115 (see AbstractObject). Apparently pre tags are not allowed so I suppose a link is just fine. :)
