Playing with Python references, mutability and object copies

A collection of notes I've made while preparing materials for my Python course. In this notebook I try to explain some language mechanisms which might be a bit confusing.

Python Variables and references

Python does not store values of it's variables, it stores references to the value object instead

In [1]:
a = "a" # variable a references a string object which contains the value: a
type(a)
Out[1]:
str
In [2]:
id(a) # let's check the identity of a
Out[2]:
4297970952
In [3]:
b = a # let's create another reference
print id(b)
print id(a)
print a == b # value coparison
print a is b # identity comparison
4297970952
4297970952
True
True
In [4]:
# Because the string object is immutable, changing the "value" of b won't affect the "value" of a
b = "b"
print id(b)
print a
print id(a)
4297970992
a
4297970952
In [5]:
a = "a"
b = "b"
print id(a+b)
print id(b+a)
print id(a+b), id(b+a)  # Note that ID's of these objects are the same, it's because Python 
                        # first concatenates "a" and "b" into newly allocated string "ab".
                        # This string is then passed to id() and deallocated, since it's no 
                        # longer used. Because of the way CPython works, the next concatenation 
                        # of "b" and "a" is allocated at the same location in memory, hence the result
4393414208
4393414528
4393414208 4393414208
In [6]:
print a+b is b+a        # Here Python has to store both concatenated strings in memory at the same time,
                        # so it can't allocate both to the same location which results in different IDs
False
In [7]:
print a+b
print a+b is "ab"       # Although both sides evaluate to "ab" string, the IDs will be diffrent because 
                        # internally the a+b results in a product of BINARY_ADD, which gets a new ID.
                        # The "ab" is a result of LOAD_CONST which creates an object and then all subsequent 
                        # references will point to the same object
ab
False
In [8]:
print 256 is 256        # Integers are immutable objects, these objects are usually cached
a = 256
b = 256
print a is b
True
True
In [9]:
print 257 is 257        # Python has to have both objects allocated at the same time (to evaluate the expression)
                        # so their IDs match
c = 257
d = 257
print c is d            # But they won't match if passed via reference, numbers above 256 (and strings 
                        # longer than 2 characters) are not cached internally
True
False

Notes on identity and mutability

  • In Python, everything is represented by objects, including code.
  • Every object has an identity, a type and value, once created, object's ID and type never changes.
  • The value of some objects can change, these are called mutable, otherwise an object is immutable
  • Immutable object's value can change if the value of a mutable object it contains/references changes
  • Objects are never explicitly destroyed, unreferenced objects, hovewer, might be wiped out by the garbage collector
  • Objects which reference external resources like files should be explicitly closed (ex file.close())
  • Immutable objects of the same value are not guaranteed to have the same ID
  • Mutable objects of the same value are guaranteed to have different IDs [] is [] => False
In [10]:
a = [1, 2, 3]
b = a                   # Passing the list by reference to variable b
b[1] = 100
# Because _list_ type is mutable and both variables reference the same list object,
# changing b will affect the shared object and consequently the value of a
print a
[1, 100, 3]
In [11]:
a = [1, 2, 3]
b = a[:]                # This creates a shallow copy of object referenced by a
b[1] = 100
print a
[1, 2, 3]
In [12]:
a = ([], [], [])        # Let's create a tuple of lists, the referenced objects cannot be replaced,
                        # IDs are protected

a[0] = [1, 2, 3]        # Trying to reference a new list object will raise **TypeError**
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-2235965d77e9> in <module>()
      2                         # IDs are protected
      3 
----> 4 a[0] = [1, 2, 3]        # Trying to reference a new list object will raise **TypeError**

TypeError: 'tuple' object does not support item assignment
In [13]:
print a
a[0].extend([1, 2])     # Extending existing (mutable) object is possible
print a
([], [], [])
([1, 2], [], [])
In [15]:
def foo(bar):
    bar.append("Bob")
    print id(bar)
    
some_list = []
foo(some_list)          # Passing reference to the list

print id(some_list)
print some_list         # List is a mutable object, it can be changed by any assigned variable
4393537408
4393537408
['Bob']

Default argument evaluation in funcitons

  • Funciton's default arguments are evaluated once, when the funciton is defined, not when it is called
  • If the mutable object is used as a default argument, once mutated, all subsequent calls to this function will see it mutated too
  • This functionality can be "exploited" to maintain a state between calls of a funciton (often used in caching functions)
In [19]:
def bar(sth = []):
    sth.append(1)
    print id(sth)
    print sth

bar()
bar()
bar()
4393535104
[1]
4393535104
[1, 1]
4393535104
[1, 1, 1]

Copying objects

  • Assignment statements in Python do not copy objects, they create a binding/reference between a target and an object
  • For objects/collections that are mutable/contain mutable items, a copy is needed in order to change one copy without changing the other
  • A shallow copy constructs a new object and then tries to copy object references found in the original
  • A deep copy will create a new object and recursively insert copies of all objects found in the original
  • Recursive objects (directly or indirectly referencing themselves) might cause a recursive loop while deep copying
  • Deep copy copies everything which might copy objects which should not be copied but shared instead
  • Deep copying mechanism can be controlled in a class by overriding __deepcopy__() method (__copy__ for shallow copy)
  • Lists can be shallow copied by assigning a slice of the entire list: copied_list = original_list[:]
  • Deep copying does not copy types like: module, file, socket, array or similar, __deepcopy__() should be overloaded to properly reinitialize these
In [44]:
import copy

f = open("/tmp/test001.txt", "w")
obj = object()
print id(f), id(obj)
4393424336 4410853408
In [45]:
a = {"file": f, "foo": obj}           # Reference objects
print id(a["file"]), id(a["foo"])
4393424336 4410853408
In [46]:
b = copy.copy(a)                      # Create a shallow copy
print id(b["file"]), id(b["foo"])     # the IDs are the same as original
print b["file"].closed                # File object is still open
4393424336 4410853408
False
In [47]:
c = copy.deepcopy(a)                  # Create a deep copy
print id(c["file"]), id(c["foo"])     # IDs are different
print c["file"].closed                # The file object is closed
4393425056 4410853376
True