Python Variables and references

Python does not store values in its variables. A variable holds a reference to an object, and the object holds the value.

In [1]:

a = "a" # variable a references a string object which contains the value: a
type(a)

Out[1]:

str

In [2]:

id(a) # let's check the identity of a

Out[2]:

4297970952

In [3]:

b = a # let's create another reference
print id(b)
print id(a)
print a == b # value comparison
print a is b # identity comparison

4297970952
4297970952
True
True

In [4]:

# Assigning to b rebinds the name b to a new object; a still references the original string.
# Strings are also immutable, so the shared object could not have been modified in place anyway
b = "b"
print id(b)
print a
print id(a)

4297970992
a
4297970952
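Rebinding a name behaves the same way when the object is mutable. A minimal sketch, using the hypothetical names first and second:

```python
# Hypothetical names; print(...) with a single argument runs on Python 2 and 3.
first = [1, 2, 3]
second = first              # both names reference the same list object
second = [4, 5, 6]          # rebinding: second now references a new list

print(first)                # the original list is untouched
print(first is second)      # the two names now reference different objects
```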

In [5]:

a = "a"
b = "b"
print id(a+b)
print id(b+a)
print id(a+b), id(b+a)  # Note that the IDs of these objects are the same. Python first
                        # concatenates "a" and "b" into a newly allocated string "ab".
                        # That string is passed to id() and then deallocated, since nothing
                        # references it any more. CPython then happens to allocate the next
                        # concatenation, "ba", at the same memory location, hence the result

4393414208
4393414528
4393414208 4393414208

In [6]:

print a+b is b+a        # Here Python has to store both concatenated strings in memory at the same time,
                        # so it can't allocate both to the same location which results in different IDs

False

In [7]:

print a+b
print a+b is "ab"       # Although both sides evaluate to the string "ab", the IDs differ:
                        # a+b is computed at run time by BINARY_ADD, which creates a new object.
                        # The literal "ab" is a constant loaded by LOAD_CONST, so every use of it
                        # in the same code object refers to one and the same object

ab
False

In [8]:

print 256 is 256        # Integers are immutable; CPython caches the small integers -5 through 256
a = 256
b = 256
print a is b

True
True

In [9]:

print 257 is 257        # Both literals sit in the same expression, so the compiler stores the
                        # constant 257 once and both operands refer to that single object
c = 257
d = 257
print c is d            # Here each 257 became its own object: CPython preallocates only the
                        # integers -5 through 256, and string sharing likewise depends on
                        # interning, so never rely on "is" to compare numbers or strings

True
False
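Since the caching shown above is a CPython implementation detail, value comparisons are safer written with ==. The sketch below builds two equal integers at run time (x and y are hypothetical names) and assumes nothing about whether they share an identity:

```python
x = int("257")              # built at run time, so no compile-time constant is shared
y = int("257")

print(x == y)               # value comparison: True
print(x is y)               # identity comparison: implementation-dependent, do not rely on it
```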

Notes on identity and mutability

In Python, everything is represented by objects, including code.
Every object has an identity, a type and a value. Once an object is created, its identity and type never change.
The value of some objects can change. Such objects are called mutable; objects whose value cannot change are immutable.
The value of an immutable container can still appear to change when a mutable object it references is modified.
Objects are never explicitly destroyed. Unreferenced objects, however, may be reclaimed by the garbage collector.
Objects that reference external resources such as files should be closed explicitly (e.g. file.close()).
Immutable objects with the same value are not guaranteed to have the same ID.
Two separately created mutable objects are guaranteed to be distinct, even when their values are equal: [] is [] => False
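The notes above can be checked with a short sketch. The names items, left and right are hypothetical:

```python
items = [1, 2]
identity_before = id(items)
type_before = type(items)

items.append(3)                       # mutate the value in place

print(id(items) == identity_before)   # True: the identity is unchanged
print(type(items) is type_before)     # True: the type is unchanged
print(items)                          # [1, 2, 3]: only the value changed

# Two separately created, equal lists that are alive at the same time
left, right = [], []
print(left == right)                  # True: same value
print(left is right)                  # False: distinct objects
```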

In [10]:

a = [1, 2, 3]
b = a                   # b now references the same list object as a
b[1] = 100
# Because the list type is mutable and both variables reference the same list object,
# changing it through b changes the shared object and consequently the value seen through a
print a

[1, 100, 3]

In [11]:

a = [1, 2, 3]
b = a[:]                # This creates a shallow copy of object referenced by a
b[1] = 100
print a

[1, 2, 3]
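A slice copy is shallow: the outer list is new, but nested objects are still shared. A minimal sketch with the hypothetical names outer and shallow:

```python
outer = [[1, 2], [3, 4]]
shallow = outer[:]              # new outer list, same inner list objects

shallow.append([5, 6])          # affects only the copy
shallow[0].append(99)           # mutates an inner list shared by both

print(outer)                    # [[1, 2, 99], [3, 4]]
print(shallow)                  # [[1, 2, 99], [3, 4], [5, 6]]
print(outer[0] is shallow[0])   # True
```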

In [12]:

a = ([], [], [])        # Let's create a tuple of lists, the referenced objects cannot be replaced,
                        # IDs are protected

a[0] = [1, 2, 3]        # Trying to reference a new list object will raise **TypeError**

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-2235965d77e9> in <module>()
      2                         # IDs are protected
      3 
----> 4 a[0] = [1, 2, 3]        # Trying to reference a new list object will raise **TypeError**

TypeError: 'tuple' object does not support item assignment

In [13]:

print a
a[0].extend([1, 2])     # Extending existing (mutable) object is possible
print a

([], [], [])
([1, 2], [], [])

In [15]:

def foo(bar):
    bar.append("Bob")
    print id(bar)
    
some_list = []
foo(some_list)          # Passing reference to the list

print id(some_list)
print some_list         # A list is mutable, so it can be changed through any name that references it

4393537408
4393537408
['Bob']
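Mutating an argument is visible to the caller, while rebinding the parameter name inside the function is not. A small sketch with the hypothetical functions mutate and rebind:

```python
def mutate(names):
    names.append("Bob")     # changes the caller's list in place

def rebind(names):
    names = ["Alice"]       # rebinds the local name only
    return names

guests = []
mutate(guests)
replacement = rebind(guests)

print(guests)               # ['Bob']
print(replacement)          # ['Alice']
```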

Default argument evaluation in functions

A function's default arguments are evaluated once, when the function is defined, not each time it is called.
If a mutable object is used as a default argument and is mutated, all subsequent calls to the function see the mutated object.
This behaviour can be "exploited" to keep state between calls of a function (often used in caching functions).

In [19]:

def bar(sth = []):
    sth.append(1)
    print id(sth)
    print sth

bar()
bar()
bar()
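The sketch below contrasts three variants: a shared mutable default, the common None-sentinel idiom that avoids sharing, and a default dict used deliberately as a cache. All function names and the _cache parameter are hypothetical.

```python
def append_shared(target=[]):
    # The same list object is reused by every call that omits the argument
    target.append(1)
    return target

def append_one(target=None):
    # A fresh list is created on every call that omits the argument
    if target is None:
        target = []
    target.append(1)
    return target

def square(n, _cache={}):
    # _cache is created once, at definition time, and shared by all calls
    if n not in _cache:
        _cache[n] = n * n
    return _cache[n]

print(append_shared())      # [1]
print(append_shared())      # [1, 1]
print(append_one())         # [1]
print(append_one())         # [1]
print(square(4))            # 16
print(square.__defaults__)  # ({4: 16},)
```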

Copying objects

Assignment statements in Python do not copy objects; they create a binding (reference) between a target and an object.
For collections that are mutable or contain mutable items, a copy is needed in order to change one copy without changing the other.
A shallow copy constructs a new compound object and then inserts into it references to the objects found in the original.
A deep copy constructs a new compound object and then recursively inserts into it copies of the objects found in the original.
Recursive objects (which reference themselves directly or indirectly) could send a naive deep copy into an infinite loop; copy.deepcopy() avoids this by keeping a memo dictionary of objects already copied.
A deep copy copies everything, so it may also copy objects that were meant to be shared rather than duplicated.
A class can control copying by defining the __deepcopy__() method (__copy__() for shallow copies).
Lists can be shallow copied by taking a slice of the entire list: copied_list = original_list[:]
The copy module does not copy types such as module, file, socket, array or similar; define __deepcopy__() on the containing class to reinitialize or share them properly.
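One way to keep a resource shared during a deep copy is to define __deepcopy__() on the containing class. The sketch below is only an illustration: Connection and Job are hypothetical classes, with Connection standing in for something like a socket or file.

```python
import copy

class Connection(object):
    """Hypothetical stand-in for a resource that should be shared, not copied."""

class Job(object):
    def __init__(self, connection, params):
        self.connection = connection
        self.params = params

    def __deepcopy__(self, memo):
        # Share the connection, deep copy the plain data
        clone = Job(self.connection, None)
        memo[id(self)] = clone          # register the copy before recursing
        clone.params = copy.deepcopy(self.params, memo)
        return clone

original = Job(Connection(), {"retries": [1, 2, 3]})
clone = copy.deepcopy(original)

print(clone.connection is original.connection)   # True: shared
print(clone.params is original.params)           # False: copied
print(clone.params == original.params)           # True: equal values
```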

In [44]:

import copy

f = open("/tmp/test001.txt", "w")
obj = object()
print id(f), id(obj)

4393424336 4410853408

In [45]:

a = {"file": f, "foo": obj}           # Reference objects
print id(a["file"]), id(a["foo"])

4393424336 4410853408

In [46]:

b = copy.copy(a)                      # Create a shallow copy
print id(b["file"]), id(b["foo"])     # the IDs are the same as original
print b["file"].closed                # File object is still open

4393424336 4410853408
False

In [47]:

c = copy.deepcopy(a)                  # Create a deep copy
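print id(c["file"]), id(c["foo"])     # IDs are different: both values were copied
print c["file"].closed                # The copied file object reports closed: it is a new,
                                      # uninitialized object, not a second handle to the open file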

4393425056 4410853376
True
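Recursive objects, mentioned in the notes above, are handled by copy.deepcopy() itself. A minimal sketch with a hypothetical self-referencing dict named node:

```python
import copy

node = {"name": "root"}
node["self"] = node             # the dict now references itself

clone = copy.deepcopy(node)     # terminates thanks to deepcopy's memo dictionary

print(clone is node)            # False: a new dict was created
print(clone["self"] is clone)   # True: the cycle is reproduced inside the copy
print(clone["name"])            # root
```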

Playing with Python references, mutability and object copies
