Effective Python Summary — Python3 — Part 2

Vu Quang Hoa
Jul 26, 2016

Chapter 2: Functions

1. Prefer exceptions to returning None

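I have written many utility functions and, of course, some of them return None. This is bad practice because None is falsy, just like other values that evaluate to False (0, an empty string, an empty dictionary, False, …), so a caller that tests the result with if cannot tell a failure from a valid result.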

>>> def divide(a, b):
        try:
            return a / b
        except ZeroDivisionError:
            return None
>>> if divide(0, 2):
        print('Valid result')
    else:
        print('Invalid input')
Invalid input
>>> # This is wrong: 0 / 2 is a valid result (0.0), but it is falsy, just like None

In this case, the function shouldn’t swallow the exception itself. It should raise an exception instead, so that failures are explicit and the calling function (the one that uses the utility function) decides how to handle them.

>>> def divide(a, b):
        try:
            return a / b
        except ZeroDivisionError:
            raise ValueError('Invalid input')

Then

>>> def execute(a, b):
        try:
            result = divide(a, b)
        except ValueError as e:
            print(str(e))
        else:
            print(result)
>>> execute(10, 0)
Invalid input
>>> execute(10, 5)
2.0

2. How closures interact with variable scope

Example of a closure:

>>> def print_msg(msg):
        """This is the outer enclosing function"""

        def printer():
            """This is the nested function"""
            print(msg)

        return printer  # return the nested function itself, without calling it

A closure is created when:

  • There is a nested function (function inside a function).
  • The nested function must refer to a value defined in the enclosing function.
  • The enclosing function must return the nested function.
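
The three conditions above can be sketched as follows. This is a minimal illustration, not code from the book: make_greeter is a hypothetical name, and the __closure__ lookup relies on how CPython stores captured variables.

```python
def make_greeter(msg):
    """Outer enclosing function (hypothetical name, for illustration)."""
    def greeter():
        # Refers to msg, which is defined in the enclosing function.
        return msg
    return greeter  # return the nested function without calling it


greet = make_greeter("Hello")

# make_greeter has already returned, yet the closure still remembers msg.
print(greet())

# The captured value lives in a closure cell attached to the function.
print(greet.__closure__[0].cell_contents)
```

Both print calls show Hello: the value outlives the call that created it.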

How do closures interact with variable scope?

From the textbook:

When you reference a variable in an expression, the Python interpreter will traverse the scope to resolve the reference in this order:

1. The current function’s scope
2. Any enclosing scopes (like other containing functions)
3. The scope of the module that contains the code (also called the global scope)
4. The built-in scope (that contains functions like len and str)
If none of these places have a defined variable with the referenced name, then a NameError exception is raised.
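
The lookup order quoted above can be tried out with a small sketch. All function and variable names here are made up for illustration; each function resolves a name in a different scope.

```python
label = "module"  # 3. module (global) scope


def outer():
    label = "enclosing"  # 2. enclosing scope

    def read_local():
        label = "local"  # 1. the current function's scope wins
        return label

    def read_enclosing():
        return label  # not local, so the enclosing scope is used

    return read_local(), read_enclosing()


def read_module():
    return label  # not local, no enclosing function: module scope


def read_builtin():
    return len("abc")  # len is found last, in the built-in scope


def read_missing():
    return undefined_name  # defined nowhere: raises NameError


print(outer())         # ('local', 'enclosing')
print(read_module())   # module
print(read_builtin())  # 3
```

Calling read_missing() raises NameError, because the name is not found in any of the four scopes.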

For example, this function returns a closure that checks whether an element is in in_list but not in ignore_list:

>>> def find_element(in_list, ignore_list):
        def find(element):
            if element in ignore_list:
                return False
            elif element in in_list:
                return True
            return False
        return find
>>> in_list = [1, 2, 3]
>>> ignore_list = [2]
>>> find = find_element(in_list, ignore_list)
>>> find(3)
True
>>> find(2)
False
>>> find(4)
False

What if we add a found variable, like this?

>>> def find_element(in_list, ignore_list):
        found = False
        def find(element):
            if element in ignore_list:
                found = True
                return False
            elif element in in_list:
                return True
            return False
        find(2)
        return found, find
>>> in_list = [1, 2, 3]
>>> ignore_list = [2]
>>> found, find = find_element(in_list, ignore_list)
>>> found # it should be True
False
Oops

The reason:
The nested function can read found from the enclosing scope, but an assignment inside find creates a new local variable named found instead of rebinding the outer one. Hence the local found is set to True, while the found variable in the enclosing function is still False.

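In Python 3 we can declare found as nonlocal inside find. Python 2 has no nonlocal, so there we can use the workaround below: wrap the value in a mutable list, so that the closure mutates the list instead of assigning to the name.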

>>> def find_element(in_list, ignore_list):
        found = [False]
        def find(element):
            if element in ignore_list:
                found[0] = True
                return False
            elif element in in_list:
                return True
            return False
        find(2)
        return found[0], find
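
For comparison, here is a sketch of the Python 3 variant with nonlocal. It reuses the find_element example and only adds the nonlocal declaration; the shortened return at the end of find is my own simplification.

```python
def find_element(in_list, ignore_list):
    found = False

    def find(element):
        nonlocal found  # assign to the enclosing variable, not a new local
        if element in ignore_list:
            found = True
            return False
        return element in in_list

    find(2)
    return found, find


found, find = find_element([1, 2, 3], [2])
print(found)    # True
print(find(3))  # True
```

The nonlocal statement only works for names in an enclosing function scope; it does not reach module-level variables.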

3. Use generators instead of returning lists

There are two reasons why we should return a generator instead of a list:

  • Large outputs: a generator does not have to hold the whole result in memory
  • Cleaner code, compared with the list-building version below

>>> def find_even_numbers(list_a, list_b):
        result = []
        for element in list_a:
            if element % 2 == 0:
                result.append(element)
        for element in list_b:
            if element % 2 == 0:
                result.append(element)
        return result

We can do better than the above

>>> def find_even_numbers(list_a, list_b):
        result = [a for a in list_a if a % 2 == 0]
        result.extend([b for b in list_b if b % 2 == 0])
        return result

However, if list_a and list_b are big lists and the result is also big, we should use a generator:

>>> def find_even_numbers(list_a, list_b):
        len_a = len(list_a)
        len_b = len(list_b)
        longest_len = max(len_a, len_b)
        for i in range(longest_len):
            if i < len_a and list_a[i] % 2 == 0:
                yield list_a[i]
            if i < len_b and list_b[i] % 2 == 0:
                yield list_b[i]
>>> list_a = [1, 2, 3]
>>> list_b = [4, 6, 7, 8]
>>> for i in find_even_numbers(list_a, list_b):
        print(i)
4
2
6
8
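
The generator above interleaves the two inputs and needs len() and indexing, so it only accepts sequences. As an alternative sketch (my own variation, not the author's or the book's code), itertools.chain keeps the same order as the list versions and also accepts inputs that are themselves generators. The count_up helper is a hypothetical input source.

```python
from itertools import chain


def find_even_numbers(list_a, list_b):
    """Lazily yield the even numbers of list_a, then those of list_b."""
    for element in chain(list_a, list_b):
        if element % 2 == 0:
            yield element


def count_up(limit):
    # Hypothetical input generator: values are produced one at a time.
    for i in range(1, limit + 1):
        yield i


evens = find_even_numbers(count_up(3), [4, 6, 7, 8])
print(next(evens))   # 2, computed on demand
print(list(evens))   # [4, 6, 8], the remaining even numbers
```

Nothing is computed until the caller asks for the next value, so neither the inputs nor the output have to fit in memory at once.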

4. Be defensive when iterating over arguments

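Iterators and generators save memory, because storing all of the input or output in a list could exhaust memory for large data. But a function that loops over its argument more than once has to be careful, because an iterator can only be consumed once.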

>>> def average_in_total(populations):
        total = sum(populations)
        return [population / float(total) for population in populations]
>>> populations = [1000, 2000, 3000]
>>> print(average_in_total(populations))
[0.16666666666666666, 0.3333333333333333, 0.5]

This works as expected because populations is a list. What if populations is an iterator?

>>> def yield_populations():
        for i in range(1000, 3000, 1000):
            yield i
>>> populations = yield_populations()
>>> print(average_in_total(populations))
[]

The reason:
sum() consumes the iterator completely, so the list comprehension in the return statement has nothing left to loop over.

Hence, before looping over a function argument more than once, we should make sure that it can be iterated repeatedly, for example by copying it into a list:

>>> def average_in_total(populations):
        populations = list(populations)
        total = sum(populations)
        return [population / float(total) for population in populations]
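
Copying into a list gives up the memory savings. Another option, sketched here under my own assumptions, is to pass an iterable container whose __iter__ method creates a fresh iterator for every loop, and to reject plain iterators up front. The Populations class and the error message are hypothetical; the check relies on the fact that calling iter() on an iterator returns the iterator itself.

```python
class Populations:
    """Iterable container: __iter__ returns a fresh iterator each time."""

    def __init__(self, start, stop, step):
        self.start, self.stop, self.step = start, stop, step

    def __iter__(self):
        return iter(range(self.start, self.stop, self.step))


def average_in_total(populations):
    # An iterator returns itself from iter(); a container returns a new one.
    if iter(populations) is iter(populations):
        raise TypeError("Must supply a container, not an iterator")
    total = sum(populations)
    return [population / total for population in populations]


print(average_in_total(Populations(1000, 4000, 1000)))
# [0.16666666666666666, 0.3333333333333333, 0.5]
```

With this check, passing a generator fails loudly with a TypeError instead of silently returning an empty list.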

5. Don’t use *args for large inputs; positional arguments are also error-prone.

>>> def show_error(message, errors=[]):
        if not errors:
            print(message)
        else:
            print("%s: %s" % (message, ', '.join(errors)))
>>> message = 'Those are errors'
>>> errors = ['exception at line 2', 'dead loop']
>>> show_error(message, errors)
Those are errors: exception at line 2, dead loop

When I want to add log_type as the first function argument, it looks like this:

>>> def show_error(log_type, message, errors=[]):
        if not errors:
            print("%s: %s" % (log_type, message))
        else:
            print("%s: %s: %s" % (log_type, message, ', '.join(errors)))
>>> message = 'Those are errors'
>>> errors = ['exception at line 2', 'dead loop']
>>> log_type = 'Warning'
>>> show_error(log_type, message, errors)
Warning: Those are errors: exception at line 2, dead loop
>>> show_error(message, errors)  # an old call that was not updated
Those are errors: ['exception at line 2', 'dead loop']
>>> # This is not what we expected => totally wrong.

There are some issues with positional arguments (*args):
1. A developer reading the call cannot tell which variable each value is bound to.
2. With *args, the developer has to loop through (or index into) all of the values to get exactly the one they want.
3. If the developer adds a new first argument, every existing call to the function breaks.

=> SOLUTION: use keyword arguments (**kwargs) when a function takes many arguments, and positional arguments (*args) only when it takes a few (<= 3 as a best practice).

>>> def show_error(log_type='Warning', message=None, errors=[]):
        if not errors:
            print("%s: %s" % (log_type, message))
        else:
            print("%s: %s: %s" % (log_type, message, ', '.join(errors)))
>>> message = 'Those are errors'
>>> errors = ['exception at line 2', 'dead loop']
>>> log_type = 'Warning'
>>> show_error(message=message, errors=errors) # This is clearer
Warning: Those are errors: exception at line 2, dead loop
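
The "large inputs" part of the heading can be illustrated with a small sketch. The names count_args and numbers are hypothetical. When a generator is unpacked with the * operator, Python first consumes it completely and builds a tuple, so every value is held in memory before the function body runs.

```python
def count_args(*args):
    # By the time the body runs, args is already a fully built tuple.
    return type(args).__name__, len(args)


def numbers():
    for i in range(5):
        yield i


gen = numbers()
kind, size = count_args(*gen)  # the * operator consumes the whole generator
print(kind, size)   # tuple 5
print(list(gen))    # [] because the generator is exhausted
```

For five values this is harmless, but for a very large generator the intermediate tuple can use a lot of memory.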

6. Optional behaviours with keyword arguments

>>> import string
>>> import random
>>> def generate_random_passcode_1(size, chars):
        return ''.join(random.choice(chars) for _ in range(size))
>>> def generate_random_passcode_2(
            size=8, chars=string.ascii_uppercase + string.digits):
        return ''.join(random.choice(chars) for _ in range(size))
>>> generate_random_passcode_1(3, ['a', 'b', 'c'])  # random, so the output varies
'cbc'
>>> generate_random_passcode_1(['a', 'b', 'c'], 3)
TypeError: 'list' object cannot be interpreted as an integer
>>> generate_random_passcode_2()  # 8 random characters
>>> generate_random_passcode_2(size=10)  # 10 random characters
>>> generate_random_passcode_2(chars=['a', 'b', 'c'])  # 8 random characters drawn from ['a', 'b', 'c']

generate_random_passcode_2 is better than generate_random_passcode_1:

  • It is easy to see which parameters and values the developer should pass to the function.
  • The calls are clearer.

However, callers only get these advantages if they actually pass the arguments by keyword.

7. Use None and docstrings to specify dynamic default arguments.

Default arguments are evaluated only once, when the function is defined (at module load time) => if we specify a mutable default such as {} or [], this will lead to odd behaviour, because every call shares the same object.

>>> def test(numbers=[]):
        numbers.append(10)
        print(numbers)
>>> test(numbers=[1, 2])
[1, 2, 10]
>>> test(numbers=[1, 2, 3])
[1, 2, 3, 10]
>>> # Now look carefully.
>>> test()
[10] # intended
>>> test()
[10, 10] # Oops
>>> test()
[10, 10, 10] # What is that?
>>> test()
[10, 10, 10, 10] # Boom...
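
The fix named in the heading can be sketched like this: default to None, create the list inside the function, and describe the real default in the docstring. The function name append_ten is hypothetical, and it returns the list instead of printing it so that the result is easy to check.

```python
def append_ten(numbers=None):
    """Append 10 to numbers and return it.

    Args:
        numbers: list to extend. Defaults to a new empty list on each call.
    """
    if numbers is None:
        numbers = []  # created at call time, so calls do not share it
    numbers.append(10)
    return numbers


print(append_ten())        # [10]
print(append_ten())        # [10] again, no state leaks between calls
print(append_ten([1, 2]))  # [1, 2, 10]
```

The test is "is None" rather than "not numbers", so an empty list passed in by the caller is still used and modified in place.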

8. Enforce clarity with keyword-only arguments.

No one can deny the advantages of keyword arguments. However, if a developer keeps calling the function with positional arguments, the call still works but is confusing to other developers:

>>> def show_error(log_type, message, errors=[]):
        if not errors:
            print("%s: %s" % (log_type, message))
        else:
            print("%s: %s: %s" % (log_type, message, ', '.join(errors)))
>>> message = 'Those are errors'
>>> errors = ['exception at line 2', 'dead loop']
>>> log_type = 'Warning'
>>> show_error(log_type, message, errors)
Warning: Those are errors: exception at line 2, dead loop

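However, we want to force other developers to pass errors by keyword when calling this function. Python 2 has no keyword-only syntax, so the workaround there is to accept **kwargs (this also works in Python 3):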

>>> def show_error(log_type, message, **kwargs):
        errors = kwargs.get('errors', [])
        if not errors:
            print("%s: %s" % (log_type, message))
        else:
            print("%s: %s: %s" % (log_type, message, ', '.join(errors)))
>>> message = 'Those are errors'
>>> errors = ['exception at line 2', 'dead loop']
>>> log_type = 'Warning'
>>> show_error(log_type, message, errors=errors)
Warning: Those are errors: exception at line 2, dead loop
>>> show_error(log_type, message, errors)
TypeError: show_error() takes 2 positional arguments but 3 were given
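
In Python 3 the same rule can be written directly in the signature: every parameter after a bare * is keyword-only. This is a minimal sketch that reuses the show_error example; it returns the message instead of printing it, which is my own change to make the result easy to check.

```python
def show_error(log_type, message, *, errors=None):
    # Everything after the bare * can only be passed by keyword.
    if not errors:
        return "%s: %s" % (log_type, message)
    return "%s: %s: %s" % (log_type, message, ", ".join(errors))


errors = ["exception at line 2", "dead loop"]
print(show_error("Warning", "Those are errors", errors=errors))
# Warning: Those are errors: exception at line 2, dead loop
```

Calling show_error("Warning", "Those are errors", errors) with errors as a third positional argument raises TypeError. Unlike the **kwargs version, a misspelled keyword such as erors=... is also rejected instead of being silently ignored.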

Part 3 (Classes) is coming.
