Lesson Plan

You may want to follow along with some of these examples. You can start an interactive Python prompt (an “interpreter”) like the one shown here by running python (the basic Python interpreter) or ipython (a friendlier interpreter).

$ python
Python 2.7.2+ (default, Oct  4 2011, 20:06:09)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print 'Hello, world!'
Hello, world!
>>>

You can exit the interpreter by hitting your platform’s <end of input> key combination. On Windows this is <ctrl-z><enter>. On Linux or Mac OS X it is <ctrl-d>.

Hello World!

Exercise

  • Exit the interpreter

    • we are going to make a Python script print out our traditional greeting
  • cd to the exercises directory

  • Edit the file helloworld.py

    • alter the script to print ‘Hello, world!’
    • the print statement is the same as we entered into the interpreter after the >>> prompt
  • Run the script from the command line:

    $ ./helloworld.py
    $ # or
    $ python helloworld.py
    

Basics

You will want to start a new python interpreter session and follow along with these examples. If something doesn’t work, or confuses you, ask.

$ python
  • variables, assignment, print
>>> count = 5.0
>>> print 'count', count
count 5.0
>>> count = count + 3.0
>>> print 'new count',count
new count 8.0
  • arithmetic
>>> count = 5.0
>>> count + 3
8.0
>>> count * 3
15.0
>>> count / 3
1.6666666666666667
>>> count ** 2 # exponentiation
25.0
>>> count ** .5
2.23606797749979
>>> count % 3 # remainder
2.0
>>> count // 3 # quotient
1.0
>>> count * 3 + 2
17.0
>>> count * (3 + 2)
25.0
  • comparisons between numbers
>>> 2 > 3
False
>>> 0 < 4
True
>>> 3 >= 3
True
>>> 4 <= 6
True
>>> 8 == 8
True
>>> 8 == 9
False
  • comparisons between strings
>>> "a" > "b"
False
>>> "a" < "b"
True
>>> "a" < "A"
False
>>> "z" < "Z"
False
>>> "this" > "th" # "th" is a prefix of "this", so the longer string is greater ('i' is compared to nothing)
True
>>> "this" > "tho" # the first difference determines the result ('i' is compared to 'o')
False

Note

Why is “a” > “A”?

Your computer represents the two characters with different numbers internally. Those numbers happen to be arranged such that “a” (97) is greater than “A” (65).
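You can look up those internal numbers yourself with the built-in ord function, and turn a number back into a character with chr. A small sketch:

```python
# ord() gives the number a character is stored as; chr() goes the other way
lower_a = ord('a')   # 97
upper_a = ord('A')   # 65

# comparing the strings gives the same answer as comparing these numbers
assert ('a' > 'A') == (lower_a > upper_a)
assert chr(97) == 'a'
```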

  • variables point to values (objects), not to other variables
>>> first = 1
>>> second = 2
>>> second = first
>>> second
1
>>> first = 3
>>> first
3
>>> second
1
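The same rule explains something that surprises many newcomers once lists (covered below) are involved: if two names point at the same list, changing the list through one name is visible through the other, because there is only one list. A short sketch:

```python
first = [1, 2]
second = first          # both names now point at the SAME list object
second.append(3)        # change that list through the name "second"

# "first" sees the change too, because there is only one list
assert first == [1, 2, 3]
assert first is second  # "is" checks whether two names point at the same object

# assigning a new value to a name just re-points that one name
second = [9]
assert first == [1, 2, 3]
```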
  • basic types
>>> count = 36
>>> print count
36
>>> count / 10 # surprising? in Python 2, int / int rounds down to a whole number
3
>>> irrational = 3.141592653589793
>>> irrational
3.141592653589793
>>> label = "irrational, 'eh"
>>> label2 = 'count "this"'
>>> label3 = '''python has these too\n'''
>>> label4 = """but they are just a different way to write the same thing"""
>>> print label, label2, label3, label4
irrational, 'eh count "this" python has these too
but they are just a different way to write the same thing
>>> print label + label2
irrational, 'ehcount "this"
>>> None # doesn't show up
>>> print None
None
>>> print True
True
>>> print False
False
>>> True == 1
True
>>> False == 1
False
>>> False == 0
True
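In Python 2, / between two integers rounds the result down to a whole number, which is why count / 10 gave 3 above; Python 3 changed / to return 3.6. If you need a particular result whichever version you run, be explicit. A small sketch:

```python
count = 36

# // always means "round down to a whole number" division, in Python 2 and 3
whole = count // 10            # 3

# making one side a float always gives true division, in Python 2 and 3
exact = count / 10.0           # 3.6
also_exact = float(count) / 10
```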
  • what type of object is something?
>>> type( 0 )
<type 'int'>
>>> type( 1 )
<type 'int'>
>>> type( 1.0 )
<type 'float'>
>>> type( [] )
<type 'list'>
>>> type( False )
<type 'bool'>
>>> type( 'blue' )
<type 'str'>

Note

What do those () characters mean in type(0)?

We are asking a “thing” (“object”) called type to “act” upon a single thing, which is our integer value 0. The thing type has a piece of code (a “function” or “method”) that tells it what to do when it is “asked to act” (“called”) on a set of things (“arguments” or “parameters”). Here the set of arguments we are passing is a single value, but later on we will see how to pass multiple arguments into functions which support multiple arguments.

We’ll see how to write our own functions later in this tutorial. A “method” is a function which is “attached” to an object. We’ll use methods throughout the tutorial, but this tutorial does not yet cover how to write our own objects.

  • type conversions: each type can normally be “called” to create a new value of that type
>>> string = '32'
>>> string
'32'
>>> int(string)
32
>>> float(string)
32.0
>>> str( int( string ))
'32'
>>> str( float( string ))
'32.0'
>>> string = '32.6'
>>> int(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '32.6'
>>> float( string )
32.6
>>> int( float (string ))
32
>>> int( round( float( string ), 0 ))
33
>>> round( 32.6, 0 )
33.0
>>> round( 32.6, 1 )
32.6

Exercise

In the interpreter, multiply the strings ‘10’ and ‘20’ to get the integer result 200:

>>> first = '10'
>>> second = '20'
...
>>> print first * second # should print 200

Lists

  • lists are “collections of things” which have a particular order
>>> integers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> integers
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> integers.append( 11 )
>>> integers
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11]
>>> integers.insert( 0, 12 )
>>> integers
[12, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11]
>>> len(integers)
12
>>> sorted(integers)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
>>> integers # why didn't integers change?
[12, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11]
>>> integers.sort()
>>> integers # the .sort() method did an "in place" sort
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
>>> integers.append( 'apple' )
>>> integers
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 'apple']

Note

What did integers.append( 11 ) mean?

Here the object integers had a piece of code attached to it (we call these “methods”) which was written to “add an object to the end of THIS list”. By writing integers.append we “looked up” this piece of code in the integers list. When we added the ( 11 ) to the statement we asked that piece of code (append) to act on a single integer object 11.

Note

Python’s interactive interpreter has a help function that allows you to get documentation on a particular type of object, such as list. The help text normally includes all of the “methods” that are available, a description of the parameters for each method, and normally a “docstring” (human description) for the method explaining what it does, and occasionally how it does it.

>>> help( list ) # type <q> to exit the help

Exercise

  • edit the file exercises/basicexercise.py
    • create, modify and display some variables
  • run the file with python basicexercise.py from the exercises directory or ./basicexercise.py if you prefer.
#! /usr/bin/env python
# basicexercise.py

# Create 4 variables pointing to each of the following:
#  An integer (int)
#  A (positive) floating-point number (float)
#  A list
#  A string (str)

# Print the square of the integer 

# Print the square root of the float 

# Append the float to the list 

# Insert the string in the list at index 0

# Insert the integer in the list at index 0

# Print the list

List Indexing

  • indexing

    [Figure: list indices aligned with the gap before each item; positive indices on the first line, negative indices on the second line]
    • alist[i] finds index i on the first line of the figure and returns the item just after it
    • alist[-i] finds index -i on the second line and returns the item just after it
>>> counts = [0,1,2,3,4]
>>> counts[0]
0
>>> counts[1]
1
>>> counts[2]
2
>>> counts[-1]
4
>>> counts[-2]
3
>>> counts[-4]
1
>>> counts[-5]
0
>>> counts[-6]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>>
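One way to think about the two lines of indices: for a list of length n, the negative index -k refers to the same item as the positive index n - k. This sketch checks that for every item in counts:

```python
counts = [0, 1, 2, 3, 4]
n = len(counts)

for i in range(n):
    # i - n is the negative index that lands on the same item as i
    assert counts[i] == counts[i - n]

# valid indices run from -n up to n - 1; anything outside raises IndexError
valid = list(range(-n, n))
```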
  • list slicing
    • alist[i:j] starts at index i and includes every item up to, but not including, index j
    • you can leave off the index for start/end
      • alist[:j] retrieves all items from the start (index 0) up to, but not including, index j; conveniently, these are the first j items
      • alist[i:] starts at index i and retrieves all items to the end; this “skips” the first i items
>>> counts = [1, 2, 3, 4, 5]
>>> counts
[1, 2, 3, 4, 5]
>>> counts[1:]
[2, 3, 4, 5]
>>> counts[:-1]
[1, 2, 3, 4]
>>> counts[1:-1]
[2, 3, 4]
>>> counts[99:]
[]
>>> counts[:-99]
[]
>>> counts[3:8]
[4, 5]
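Because slices stop *before* the end index, splitting a list at any index j with alist[:j] and alist[j:] gives two pieces that add back up to the original list, with nothing lost or repeated. A sketch:

```python
counts = [1, 2, 3, 4, 5]

for j in range(len(counts) + 1):
    head = counts[:j]    # the first j items
    tail = counts[j:]    # everything after the first j items
    assert len(head) == j
    assert head + tail == counts

# out-of-range slice bounds are clipped to the list, so there is no IndexError
assert counts[3:8] == [4, 5]
```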

Note

Bonus Material

You can also specify a “step” in your slices:

>>> counts[::2] # every other item, starting at 0
[1, 3, 5]
>>> counts[1::2] # every other item, starting at 1
[2, 4]
>>> counts[::-1] # the whole list, stepping backward
[5, 4, 3, 2, 1]
>>> counts[-1:1:-1] # start at index -1, step backward while index is > 1
[5, 4, 3]
  • convenience function for creating ranges of integers
>>> range( 5 )
[0, 1, 2, 3, 4]
>>> range( 2, 5 )
[2, 3, 4]

Exercise

  • slice and dice a list
#! /usr/bin/env python
# basicsliceexercise.py

# we create a list of integers...
integers = range( 0, 20 )

# Print the first item of the list 
# Print the last item of the list

# Print the first 5 items of the list (a slice)
# Print the last 5 items of the list (a slice)

# Print 5 items starting from index 5

Boolean Logic

Boolean logic reduces down to the statement: “if this is True, do this, otherwise do that”. Computers, being binary (on/off) machines, work very easily with on/off choices such as boolean logic.

  • almost any object can be tested for “boolean truth”
>>> bool( 0 )
False
>>> bool( 1 )
True
>>> bool( [] )
False
>>> bool( ['this'] )
True
>>> bool( 0.0 )
False
>>> bool( 1.0 )
True
>>> bool( '' )
False
>>> bool( 'this' )
True
>>> bool( None )
False
  • if, elif, else
    • only do a given “suite” of commands if the “check” matches
    • else is for when no other check matches (and is optional)
>>> x = 32
>>> if x < 5:
...     print 'hello'
... elif (x+4 > 33):
...     print 'hello world'
... else:
...     print 'world'
...
hello world
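Because almost any object has a boolean truth value, an if check does not need an explicit comparison: an empty list or an empty string behaves like False. A small sketch (try changing items to see the other branches run):

```python
items = []    # try ['this'] or ['this', 'that'] instead

# the if check uses the list's truth value directly
if not items:                  # an empty list is False
    message = 'nothing here'
elif len(items) == 1:
    message = 'one item'
else:
    message = '{0} items'.format(len(items))
# with items = [], message is 'nothing here'
```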

Note

Technical Tidbit

Your computer is formed of tiny electrical switches where a current in one “wire” can prevent or allow a current from flowing in another “wire”. Below all the levels of abstraction, when the computer decides “if this is True” it is checking whether current can flow through the second “wire”.

  • comparisons are boolean operators
    • == (are they equal) vs = (assign value)
    • >=, <=, != (not equal)
  • logical combinations allow you to string together boolean tests
    • and, or, not
>>> x = 23
>>> y = 42
>>> (x == y) or (x * 2 > y )
True
>>> (x == y) or (x > y)
False
>>> (x < y) and (y > 30)
True
>>> (x == y) or (not x > y)
True
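The != operator listed above is not used in the session, so here is a small sketch that exercises it alongside and and or, using the same x and y:

```python
x = 23
y = 42

not_equal = x != y                      # True: != means "not equal"
opposite = (x != y) == (not x == y)     # != is always the opposite of ==
between = (x > 20) and (x < 90)         # and: both tests must be True
either = (x > 30) or (y > 30)           # or: at least one test must be True
```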

Loops

  • while something is True, keep doing “this set of things”
>>> x = 10
>>> while x > 0:
...     print x
...     x = x - 1
...
10
9
8
7
6
5
4
3
2
1
>>> counts = [1, 2, 3, 4, 5]
>>> i = 0
>>> while i < len(counts):
...     count = counts[i]
...     print count
...     i = i + 1
...
1
2
3
4
5
  • a for x in y loop does the same job as that last while loop without the index bookkeeping; this pattern is called “iterating over” an object, and it is extremely common
>>> counts = [1, 2, 3, 4, 5]
>>> for count in counts:
...     print count
...
1
2
3
4
5
  • Python marks each “suite” of commands by indenting it, which is unusual: most languages use {} braces or pairs of keywords, such as do and done in this bash loop
for var in a b c d
do
    echo "Variable is ${var}"
    ls "${var}"
done
#! /usr/bin/env python
# iterforxiny.py

measurements = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print 'Squares'
total = 0
for item in measurements:
    print item, item ** 2
    total += item **2
print 'Sum of Squares:', total
  • the suites can “nest” with further for-loops (or other structures)
#! /usr/bin/env python
# iternest.py

rows = [
    ['Lightseagreen Mole', 24.0, -0.906, 0.424, -2.13, 0.0, 'green'],
    ['Indigo Stork', 51.0, 0.67, 0.742, 0.9, 9.0, 'yellow'],
]

for i,row in enumerate( rows ):
    print 'rows[{}]'.format( i )
    for measurement in row[1:-1]:
        print '  {}'.format( measurement )

Note

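The enumerate function we use in the above sample can be thought of as doing this (enumerate_like is a made-up name, used only for illustration):

def enumerate_like( sequence ):
    result = []
    for i in range( len( sequence )):
        result.append( (i, sequence[i]) )
    return result

but enumerate is actually more efficient: rather than building the whole list up front, it produces each (index, item) pair only when the loop asks for the next one.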

#! /usr/bin/env python
# iterfilter.py

measurements = range( 30 )

print 'Odd Triple Squares'
total = 0
for item in measurements:
    if item == 25:
        print '25 is cool, but not an odd triple'
    elif item % 2 and not item % 3:
        print item, item ** 2
        total += item **2
print 'Sum of Odd Triple Squares:', total

Exercise

  • construct lists by iterating over other lists
  • use conditions to only process certain items in a list
  • use conditions and a variable to track partial results
#! /usr/bin/env python
# iterexercise.py

rows = [
    ['Lightseagreen Mole', 24.0, -0.906, 0.424, -2.13, 0.0, 'green'],
    ['Springgreen Groundhog', 77.0, 1.0, -0.031, -32.27, 25.0, 'red'],
    ['Blue Marten', 100.0, -0.506, 0.862, -0.59, 16.0, 'yellow'],
    ['Red Bobcat', 94.0, -0.245, 0.969, -0.25, 36.0, 'green'],
    ['Ghostwhite Falcon', 31.0, -0.404, 0.915, -0.44, 49.0, 'green'],
    ['Indigo Stork', 51.0, 0.67, 0.742, 0.9, 9.0, 'yellow'],
]


# Create 2 lists holding the first two columns of *numeric* data 
# (second and third columns)

# Print those items in the second column which are greater than 20 and less than 90

# Print the largest value in the third column

String Manipulation

  • strip (remove whitespace or other characters)
>>> value = '  25.3  '
>>> value
'  25.3  '
>>> value.strip()
'25.3'
>>> quoted = '"this"'
>>> quoted
'"this"'
>>> quoted.strip('"')
'this'
  • split, join
>>> row = 'Silver Deer,69,-0.115,0.993,-0.12,25,violet'
>>> components = row.split( ',' )
>>> components
['Silver Deer', '69', '-0.115', '0.993', '-0.12', '25', 'violet']
>>> print "\n".join( components )
Silver Deer
69
-0.115
0.993
-0.12
25
violet
>>> not_all_strings = [ 'Silver Goat', 45, -.333, .75, .08, 5, 'violet' ]
>>> "\n".join( not_all_strings )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 1: expected string, int found
>>> count = 53
>>> mean = 37.036
>>> label = 'DMX Score'
>>> '{0},{1},{2}'.format( label, count, mean )
'DMX Score,53,37.036'
>>> '{0!r} for {1} items {2:0.2f}'.format( label, count, mean )
"'DMX Score' for 53 items 37.04"
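join only accepts strings, which is what the TypeError above is complaining about. Converting each item with str first fixes it, and the reverse trip (split, then strip each piece) is the clean-up you will want for messy rows. A sketch with made-up sample rows:

```python
not_all_strings = ['Silver Goat', 45, -.333, .75, .08, 5, 'violet']

# join() only accepts strings, so convert each item with str() first
as_strings = []
for item in not_all_strings:
    as_strings.append(str(item))
row = ','.join(as_strings)
# row is now 'Silver Goat,45,-0.333,0.75,0.08,5,violet'

# going the other way: split on commas, then strip stray whitespace from each piece
cleaned = []
for part in ' Peru Eagle  ,25, blue'.split(','):
    cleaned.append(part.strip())
# cleaned is now ['Peru Eagle', '25', 'blue']
```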

Dictionaries

  • a.k.a. hash tables in other languages; most scripting languages have special syntax for them
    • keys must be immutable (technically, hashable)
    • values (anything)
#! /usr/bin/env python
# dictdefinitions.py

dictionary = {}
dictionary2 = {
    'thar': 'thusly',
    'them': 'tharly',
}
dictionary3 = {2:3, 4:None, 5:18}
  • you can add, remove, reassign
>>> dictionary = {}
>>> dictionary
{}
>>> dictionary['this'] = 'those'
>>> dictionary
{'this': 'those'}
>>> dictionary['those'] = 23
>>> dictionary
{'this': 'those', 'those': 23}
>>> len(dictionary)
2
>>> dictionary['this'] == 'those'
True
>>> del dictionary['those']
>>> dictionary
{'this': 'those'}
>>> 'those' in dictionary
False
>>> 'this' in dictionary
True
>>> dictionary['those']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'those'
  • there is only one entry for each key; two keys count as the same key when they have the same “hash” and compare equal
    • you can thus use a dictionary to confirm or create uniqueness
    • this is “computer equal”, not “human equal”, though Python tries to make “computer equal” a bit more human, e.g. with floats and ints
>>> dictionary = {'this':'that'}
>>> dictionary[ ' this ' ] = 'thar'
>>> dictionary
{'this': 'that', ' this ': 'thar'}
>>> dictionary[ 45 ] = 8
>>> dictionary[ 45.0 ] = 9
>>> dictionary
{'this': 'that', ' this ': 'thar', 45: 9}
>>> # Super Bonus Ask During Coffee Question: why is the key 45 and not 45.0?
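A dictionary's one-entry-per-key rule also makes it a natural tally counter: each distinct value becomes a key, and its value counts how often it was seen. A sketch with a made-up list of colours:

```python
colours = ['yellow', 'blue', 'indigo', 'violet', 'blue', 'blue', 'red']

# count how many times each colour appears: one entry per distinct colour
tally = {}
for colour in colours:
    if colour in tally:
        tally[colour] = tally[colour] + 1
    else:
        tally[colour] = 1

# the keys are the distinct colours; sorted() gives them in a predictable order
distinct = sorted(tally.keys())
```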
  • iterable, but un-ordered, so don’t depend on the order of items
#! /usr/bin/env python
# dictiteration.py

dictionary = {'this':'that','those':'thar',23:18,None:5}

print 'items',dictionary.items()
print 'values',dictionary.values()
print 'keys', dictionary.keys()

for key in dictionary:
    print '{0!r} : {1}'.format( key, dictionary[key] )

Exercise

  • loop over a list of strings, split into key: value pairs and add to dictionary
#! /usr/bin/env python
# dictexercise.py

rows = [
    'Dodgerblue Lemming ,30,-0.988,0.154,-6.41,36,yellow',
    ' Orangered Myotis,88,0.035,0.999,0.04,0,blue',
    '  Aquamarine Falcon,68,-0.898,0.44,-2.04,16, indigo',
    'Lightsalmon Prairie-Dog,20,0.913,0.408,2.24,16,violet ',
    'Magenta Pigeon,25,-0.132,0.991,-0.13,1, blue',
    'Peru Eagle,25,-0.132,0.991,-0.13,1, blue',
    'Peru Eagle  ,25,-0.132,0.991,-0.13,1,red ',
]
# create a dictionary that maps column 1 (name) to the last column (colour)
# strip the name of any extra whitespace, same with the colour
# for each item in the dictionary, print the name and colour 
# what colour will be shown for "Peru Eagle"?

Reading a File

  • look at ../sample_data.csv, note how it looks like the data in the previous exercise
Subject,Count,DMX Score,Coda Score,Vinny Score,Zim Score,Subject Choice
Dodgerblue Lemming,30,-0.988,0.154,-6.41,36,yellow
Orangered Myotis,88,0.035,0.999,0.04,0,blue
Aquamarine Falcon,68,-0.898,0.44,-2.04,16,indigo
Lightsalmon Prairie-Dog,20,0.913,0.408,2.24,16,violet
Magenta Pigeon,25,-0.132,0.991,-0.13,1,blue
Peru Eagle,25,-0.132,0.991,-0.13,1,blue
Mintcream Caribou,10,-0.544,-0.839,0.65,4,green
Silver Deer,69,-0.115,0.993,-0.12,25,violet
Darkslateblue Ibis,34,0.529,-0.849,-0.62,4,violet
Olive Goshawk,74,-0.985,0.172,-5.74,4,blue
Lightcoral Seal,47,0.124,-0.992,-0.12,49,indigo
Red Vulture,37,-0.644,0.765,-0.84,25,yellow
Palegoldenrod Brown-Bear,11,-1.0,0.004,-225.95,9,blue
Firebrick Coyote,51,0.67,0.742,0.9,9,yellow
Thistle Bustard,84,0.733,-0.68,-1.08,16,red
Whitesmoke Lynx,57,0.436,0.9,0.48,1,orange
Beige Wolverine,90,0.894,-0.448,-2.0,4,violet
Darkorchid Grebe,85,-0.176,-0.984,0.18,25,orange
Ivory Wolf,18,-0.751,0.66,-1.14,4,blue
Fuchsia Moose,62,-0.739,0.674,-1.1,36,violet
  • this is a standard comma-separated values (CSV) data file, possibly from some survey which observed animals and subjected them to various (humane) tests which generated measurements. Let’s poke around in it:
>>> reader = open( '../sample_data.csv', 'r') # r is for "read" mode
>>> reader 
<open file '../sample_data.csv', mode 'r' at 0x...>
>>> content = reader.read()
>>> len(content)
995
>>> reader.close()
>>> lines = content.splitlines()
>>> len(lines)
21
>>> lines[0]
'Subject,Count,DMX Score,Coda Score,Vinny Score,Zim Score,Subject Choice'
>>> lines[1]
'Dodgerblue Lemming,30,-0.988,0.154,-6.41,36,yellow'
>>> lemming = lines[1]
>>> columns = lemming.split(',')
>>> columns
['Dodgerblue Lemming', '30', '-0.988', '0.154', '-6.41', '36', 'yellow']
>>> measurement = columns[2]
>>> measurement
'-0.988'
>>> type(measurement)
<type 'str'>
>>> measurement = float( measurement )
>>> measurement
-0.988
>>> type(measurement)
<type 'float'>
  • the previous example loaded the whole file into memory in one go; we could also have iterated over the file line by line.
>>> reader = open( '../sample_data.csv', 'r')
>>> header = reader.readline()
>>> header # note the '\n' character, you often need to do a .strip()!
'Subject,Count,DMX Score,Coda Score,Vinny Score,Zim Score,Subject Choice\n'
>>> for line in reader:
...     print float(line.split(',')[2])
...
-0.988
0.035
...
  • the special file sys.stdin can be used to process input which is being piped into your program at the bash prompt (we’ll see two more special pipes in Writing (Structured) Files below).
#! /usr/bin/env python
# argumentsstdin.py
import sys
header = sys.stdin.readline()
for line in sys.stdin:
    print float(line.split(',')[2])
$ cat ../sample_data.csv | ./argumentsstdin.py
-0.988
0.035
-0.898
0.913
...

Note

file objects keep an internal “pointer” (offset, bookmark) which they advance as you iterate through the file. Regular files on the file-system can be “rewound” or positioned explicitly. File-like objects such as pipes often cannot provide this functionality.
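For a regular file you can watch the pointer at work: after one read() it sits at the end, so a second read() returns an empty string until seek(0) moves it back to the start. A sketch that uses a throwaway file created with the standard tempfile module (read_twice is a hypothetical helper):

```python
import os
import tempfile

def read_twice(filename):
    """Read a whole file, show a second read gets nothing, then rewind and reread."""
    reader = open(filename, 'r')
    first = reader.read()    # reads to the end; the pointer is now at the end
    empty = reader.read()    # nothing is left after the pointer, so this is ''
    reader.seek(0)           # move the pointer back to offset 0 (the start)
    again = reader.read()    # reads the whole file a second time
    reader.close()
    return first, empty, again

# a small throwaway file, created only for this demonstration
handle, path = tempfile.mkstemp(suffix='.csv')
os.close(handle)
writer = open(path, 'w')
writer.write('Subject,Count\nPeru Eagle,25\n')
writer.close()

first, empty, again = read_twice(path)
os.remove(path)
```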

Exercise

  • read data from ../sample_data.csv so that you have a list of strings, one string for each line in the file
  • use code from dictexercise.py to again map names to colours and print the colour of a “Firebrick Coyote”
  • use code from iterexercise.py to turn data into columns and find out which animal was spotted most frequently
#! /usr/bin/env python
# filereadexercise.py

# read data from ../sample_data.csv

# use code from dictexercise.py to again map names to colours
# what colour is a "Firebrick Coyote"?

# for each column in the data file make a list containing
# that column's data
# which animal was seen most frequently?
# Hint: refer back to iterexercise.py and dictexercise.py
# Hint: watch out for the header row
# Hint: help(list.index)

Simple Functions

  • previous exercise introduced code reuse
  • simple functions, one returned value
#! /usr/bin/env python
# functionsimple.py

def double( value ):
    """returns the value multiplied by two"""
    return value * 2

def larger( first, second):
    """Return the larger of first and second"""
    if first >= second:
        return first 
    else:
        return second 

print 'double of the larger of 3 and 4:',double( larger( 3,4 ))
# print value # fails with a NameError: value only exists inside double()
  • variable scope
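The commented-out print value fails because names created inside a function (including its parameters) are local to that function. A sketch with made-up names add_to_total and reset_total:

```python
total = 10   # defined at the top level of the module: a "global" name

def add_to_total(amount):
    """'amount' and 'result' are local: they exist only while this function runs."""
    result = total + amount   # reading the global name inside a function works
    return result

def reset_total():
    """Assigning to 'total' here makes a NEW local name; the global one is untouched."""
    total = 0
    return total
```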

Exercise

  • copy your code from filereadexercise.py into readdata.py and turn the code that reads data from a file into a function that returns a list of strings
  • have the function take a file name as an argument
#! /usr/bin/env python
# readdata.py

# use your code from filereadexercise.py here. turn the code that reads
# from a file into a function that returns a list of strings from the
# file. make the function take a filename as an argument.
# Hint: make sure to call your function so the rest of your code works!

Functions as Building Blocks

  • grouping code in small, logical chunks helps you reuse it
  • docstrings
#! /usr/bin/env python
# functionreuse.py

def pretty_print_add(x, y):
    """
    nicely print the addition of two things
    """
    template = '{0} + {1} = {2}'
    print template.format(x, y, x + y)


pretty_print_add(8, 9)

pretty_print_add(4.5, 5.6)

pretty_print_add((1,2), (3,4))

pretty_print_add([5,6], [7,8])
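A function that *returns* its result is usually easier to reuse than one that prints it, since the caller decides whether to print, store or test the value. A sketch of that split (format_add is a hypothetical name); the docstring is also available at run time through help() or the __doc__ attribute:

```python
def format_add(x, y):
    """Return (rather than print) a string describing the addition of x and y."""
    template = '{0} + {1} = {2}'
    return template.format(x, y, x + y)

def pretty_print_add(x, y):
    """nicely print the addition of two things"""
    print(format_add(x, y))
```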

Exercise

  • in readdata.py group the code that makes a dictionary from the data into a function that returns the dictionary
  • also put the code that makes lists into a function that returns several lists
  • have both of these functions take a file name as an argument and call the file reading function you’ve already written
  • call these functions and use the data they return to make the rest of the code work

Modules

  • using code from other files, modules and importing
  • put all code into functions
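  • if __name__ == '__main__': runs the code beneath it only when the file is executed as a script, not when it is imported (importing functionreuse runs its four example calls, importing pretty_print does not)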
#! /usr/bin/env python
# moduledemo1.py

from functionreuse import pretty_print_add

pretty_print_add(145, 396)
#! /usr/bin/env python
# moduledemo2.py

from pretty_print import pretty_print_add

pretty_print_add(145, 396)
#! /usr/bin/env python
# pretty_print.py

def pretty_print_add(x, y):
    """
    nicely print the addition of two things
    """
    template = '{0} + {1} = {2}'
    print template.format(x, y, x + y)

if __name__ == '__main__':
    pretty_print_add(8, 9)

    pretty_print_add(4.5, 5.6)

    pretty_print_add((1,2), (3,4))

    pretty_print_add([5,6], [7,8])
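To see the guard in action without creating files by hand, this sketch writes a tiny hypothetical module, guarddemo.py, into a temporary directory, imports it, and checks that the code under its if __name__ == '__main__': line did not run. It uses only the standard os, sys and tempfile modules:

```python
import os
import sys
import tempfile

# source for the hypothetical module guarddemo.py, written out only for this demo
MODULE_SOURCE = (
    "ran_guarded_code = False\n"
    "if __name__ == '__main__':\n"
    "    ran_guarded_code = True\n"
)

def import_fresh_module():
    """Write guarddemo.py to a temporary directory, import it, and return it."""
    directory = tempfile.mkdtemp()
    writer = open(os.path.join(directory, 'guarddemo.py'), 'w')
    writer.write(MODULE_SOURCE)
    writer.close()
    sys.path.insert(0, directory)   # let "import" find the new file
    try:
        import guarddemo
    finally:
        sys.path.remove(directory)
    return guarddemo

module = import_fresh_module()
# when imported, __name__ is the module's own name, so the guarded code did not run
```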

Exercise

  • write a function that finds the mean of a list of numbers and use it to find the mean of each of the score columns in sample_data.csv
  • use your functions in readdata.py by importing them
    • you will need to modify readdata.py so that it doesn’t print anything when you import it
#! /usr/bin/env python
# moduleexercise.py

# import relevant function from ``readdata.py``. make sure nothing is
# printed to the screen when you do this.

# write a function that calculates the mean of a list of numbers
# Hint: help(sum) and help(len)

# find the mean of each of the score columns in sample_data.csv
# and print them
# put this code in a function too

if __name__ == '__main__':
    # call just one function here so that the means are printed
    pass  # replace this line with that call

Arguments and Return Codes

  • as you will recall from the bash session, programs have return codes which invoking programs will check to see whether the program succeeded
  • main function and the “entry point” for scripts
    • scripting languages such as Python run a script's top-level statements from top to bottom, so they don’t need an int main() {} entry point as in C
    • putting the main actions inside a function doesn’t seem that useful until you discover that Python packaging tools (for example, setuptools “console_scripts” entry points) can generate wrapper scripts that invoke a particular function (such as main, here)
#! /usr/bin/env python
# argumentsmain.py

import sys

def main():
    """Primary entry point for the script/module"""
    return 1

# A python-only idiom meaning "only execute this if we are the top-level script"
# i.e. do *not* run this if we are being imported as a module
if __name__ == "__main__":
    sys.exit( main())
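To see a return code from the invoking program's side, one Python program can run another and look at the code it exits with (at the bash prompt you would check $? instead). A sketch using the standard subprocess module; exit_code_of is a hypothetical helper:

```python
import subprocess
import sys

def exit_code_of(source):
    """Run a snippet of Python in a separate process and return its exit code."""
    return subprocess.call([sys.executable, '-c', source])

ok = exit_code_of('import sys; sys.exit(0)')      # 0 means success
failed = exit_code_of('import sys; sys.exit(1)')  # non-zero means failure
```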
  • command line arguments, sys.argv
#! /usr/bin/env python
# argumentsargv.py

import sys

def print_files( files ):
    """A function another module might want to invoke"""
    for filename in files:
        print filename

def main():
    """Primary entry point for the script/module"""
    if sys.argv[1:]:
        print_files( sys.argv[1:] )
        return 0
    else:
        sys.stderr.write( "You need to provide file[s]\n" )
        return 1

if __name__ == "__main__":
    sys.exit( main())
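A common variation, shown here as a sketch rather than as the tutorial's own code, is to let main accept the argument list as a parameter. sys.argv[0] is the script's own name, which is why the examples slice from index 1. Passing the list in makes main easy to call from another module or a quick test:

```python
import sys

def main(argv=None):
    """Entry point that takes its argument list explicitly (hypothetical variant).

    argv[0] is the script name; the real arguments start at argv[1].
    """
    if argv is None:
        argv = sys.argv
    files = argv[1:]
    if not files:
        sys.stderr.write("You need to provide file[s]\n")
        return 1
    for filename in files:
        sys.stdout.write(filename + "\n")
    return 0
```

In a script you would still end with if __name__ == "__main__": sys.exit( main()), exactly as argumentsargv.py does; calling main(['argumentsargv.py', 'a.csv']) from elsewhere returns 0 without touching the real command line.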

Exercise

  • modify your moduleexercise.py script to take the file to process from the (bash) command line

Writing (Structured) Files

  • while using print is fine when you are directly communicating with a user, you will often want to output data in a structured format for future processing
  • files can be opened in “write” mode by passing 'w' as the mode parameter
  • the standard module sys has two pipe handles already opened for output; these are similar to the pipe handle sys.stdin we saw in Reading a File.
    • stdout – where most client programs expect your primary output
    • stderr – where most client programs expect error messages, warnings etc.
#! /usr/bin/env python
# outputbasic.py
import sys

rows = [
    ['Lightseagreen Mole', 24.0, -0.906, 0.424, -2.13, 0.0, 'green'],
    ['Indigo Stork', 51.0, 0.67, 0.742, 0.9, 9.0, 'yellow'],
]

def format_row( row ):
    result = []
    for item in row:
        result.append( str(item))
    return ",".join( result )

def write_rows( rows, writer ):
    for row in rows:
        writer.write( format_row( row ))
        writer.write( '\n' )
    
def write_file( rows, filename='' ):
    if not filename:
        write_rows( rows, sys.stdout )
    else:
        writer = open( filename,'w')
        write_rows( rows, writer )
        writer.close()

if __name__ == "__main__":
    if sys.argv[1:]:
        write_file( rows, sys.argv[1] )
    else:
        write_file( rows )
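Writing structured output pays off when another program (or you, later) reads it back. This sketch reuses the format_row idea from outputbasic.py, writes rows to a throwaway file and reads them back; note that everything comes back as strings, so numbers need float() again. The helper names write_rows_to and read_rows_from are hypothetical:

```python
import os
import tempfile

def format_row(row):
    """Turn one row (a list) into a single comma-separated line of text."""
    result = []
    for item in row:
        result.append(str(item))
    return ','.join(result)

def write_rows_to(filename, rows):
    """Write each row on its own line."""
    writer = open(filename, 'w')
    for row in rows:
        writer.write(format_row(row))
        writer.write('\n')    # write() does not add a newline for you
    writer.close()

def read_rows_from(filename):
    """Read the lines back and split each one on commas."""
    reader = open(filename, 'r')
    result = []
    for line in reader:
        result.append(line.strip().split(','))
    reader.close()
    return result

rows = [['Indigo Stork', 51.0, 0.67, 'yellow']]
handle, path = tempfile.mkstemp(suffix='.csv')
os.close(handle)
write_rows_to(path, rows)
round_trip = read_rows_from(path)
os.remove(path)
# round_trip is [['Indigo Stork', '51.0', '0.67', 'yellow']]: every value is a string again
```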
    

Exercise

  • modify your moduleexercise.py script to write the summary information for each (numeric) column processed into a CSV file, where each row holds the column's label (from the first row of the input file) and the mean value for that column

Exceptions and Tracebacks

  • so far we’ve ignored situations where errors occurred, but real software needs to handle errors or unexpected conditions all the time
>>> value = ' Aquamarine Falcon '
>>> float( value )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float:  Aquamarine Falcon
  • when functions call other functions, the system creates a “stack” of “frames”, an uncaught error will, by default, print out a “traceback” of these frames
    • when something goes wrong, you use the traceback to help you find out where and what the problem was
    • in python the traceback is ordered from “top” to “bottom”, that is, the “frame” printed first in the traceback (“<stdin>” in the example below) is the “top level” caller
    • each frame is a function which was running (not yet complete) when the uncaught error was encountered
    • in python, the last line of the traceback is a string representation of the Exception which was raised, which generally attempts to be a useful description of what went wrong
>>> from functionarguments import *
>>> rows = split_rows( open('../sample_data.csv').read().splitlines()[1:] )
>>> first,second = extract_columns( rows, 1, -2 )
>>> first,second = extract_columns( rows, 30 )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "functionarguments.py", line 15, in extract_columns
    result.append( extract_column( rows, column ))
  File "functionarguments.py", line 8, in extract_column
    result.append( row[column] )
IndexError: list index out of range
  • it is possible to catch these Exceptions in Python by using a special type of block around the code in which the exception may occur
>>> value = '  Aquamarine Falcon '
>>> try:
...     value = float( value )
... except ValueError, err:
...     value = value.strip()
...
>>> value
'Aquamarine Falcon'

Note

We can catch multiple Exception types using except (ValueError,TypeError), err instead.

Note

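The syntax for catching exceptions changes between Python 2.x and 3.x. In Python 3.x you must write except (ValueError, TypeError) as err (the parentheses are required when catching several types); the as form also works in Python 2.6 and later.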
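Catching an exception is also a convenient way to supply a fallback value. This sketch (to_float is a hypothetical helper, and the 0.0 default is an assumption) returns a default whenever the text cannot be converted; it uses the plain except ValueError: form, which works in both Python 2 and 3:

```python
def to_float(text, default=0.0):
    """Convert text to a float, returning default when the conversion fails."""
    try:
        return float(text)      # float() ignores surrounding whitespace
    except ValueError:
        return default          # e.g. '' or a name in a numeric column
```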

Exercise

  • does your script fail if you point it at ../bad_sample_data.csv?
    • if not, congratulations; you pass
    • if so, what does the traceback tell you?
  • (if necessary) modify your moduleexercise.py so that it can parse ../bad_sample_data.csv as well as any file in the ../real_data/ directory
    • handle the case where the first column is a quoted, comma-separated name in last, first order, and convert it to first last
    • assume that missing (numeric) values should be set equal to 0.0
    • assume that comments (lines starting with ‘#’) and blank lines should be ignored

Bonus Exercise

  • modify your script to load multiple files passed from the command line
  • check for duplicate subject names

Using Existing Libraries

  • Generally speaking, you should prefer to use pre-written modules to handle common tasks. The Python standard library and the thousands of Python packages and extensions mean that you normally would not write this type of low-level code yourself.
#! /usr/bin/env python
# reusecsv.py
import csv
lines = list(csv.reader(open('../sample_data.csv')))[1:]
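The csv module earns its keep on fields that contain commas inside quotes, which a plain split(',') breaks apart. csv.reader accepts any sequence of lines, not just an open file, so this sketch uses a made-up two-line list:

```python
import csv

# csv.reader accepts any sequence of lines, not only an open file
lines = [
    'Subject,Count,Subject Choice',
    '"Eagle, Peru",25,blue',      # a made-up row whose quoted name contains a comma
]
rows = list(csv.reader(lines))
# rows[1] is ['Eagle, Peru', '25', 'blue']: the quoted comma did not split the field

naive = lines[1].split(',')
# naive is ['"Eagle', ' Peru"', '25', 'blue']: split() knows nothing about quotes
```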

Bonus Exercise

  • rewrite your code to use the python standard csv library to parse the CSV data
  • use the built-in min, max and sum functions to calculate summary information on your columns, rather than using your custom-written functions

Numpy

  • NumPy is a powerful package for scientific computation with Python
  • you can readily rewrite many of our samples (and far more involved processes) just by combining the tools NumPy already provides
#! /usr/bin/env python
# reusenumpy.py
from functionarguments import *
import csv, numpy
rows = list(csv.reader(open('../sample_data.csv')))[1:]

column = extract_column( rows,1 )
column = as_type(column,float)
print 'Max of column [1]',numpy.max( column )
print 'Mean of column [1]',numpy.mean( column )
print 'Median of column [1]',numpy.median( column )
print 'Standard deviation of column [1]',numpy.std( column )
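If NumPy is not installed, or you want to check what these functions compute, the same summary can be written with built-ins. A sketch (summarize is a hypothetical helper); like numpy.std by default, it computes the population standard deviation, dividing by n rather than n - 1, and like numpy.median it averages the two middle values when the length is even:

```python
import math

def summarize(column):
    """Return (max, mean, median, standard deviation) for a list of numbers."""
    ordered = sorted(column)             # sorted() returns a new, sorted list
    n = len(ordered)
    mean = sum(ordered) / float(n)       # float() avoids Python 2 integer division
    middle = n // 2
    if n % 2:                            # odd length: take the middle item
        median = ordered[middle]
    else:                                # even length: average the two middle items
        median = (ordered[middle - 1] + ordered[middle]) / 2.0
    squares = 0.0
    for value in ordered:
        squares += (value - mean) ** 2
    # population standard deviation: divide by n, as numpy.std does by default
    std = math.sqrt(squares / n)
    return max(ordered), mean, median, std
```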

Bonus Exercise

  • using numpy, load the sample_data.csv data-set and play with the columns of data to determine what relationship the columns have to one another