SciPy Conference 2007 -- Tutorial -- Day 1
Last edited August 14, 2007
By Christopher Hanley
IPython and Interactive Programming -- Morning Session

Introduction
 My notes on the morning session of the SciPy Conference 2007 Tutorials, Day 1
Authors 
 Fernando Perez and Brian Granger
Goals with IPython1 
  •  Make parallel computing easy
IPython Controller 
  •  Manages each engine (controls the threads)
  • Controls how clients connect to and use the engines
Startup Scripts 
  •  ipcontroller - starts the controller
  • ipengine - starts an engine; run it after the controller is up
  • ipcluster - starts one controller and N engines on localhost or an ssh-based cluster (the one you will use most of the time)
What can go wrong? 
  •  firewall ports must be open
  • engine started before controller
  • easy to mess up non-default ports
  • machine could have multiple network interfaces
  • The controller and each engine keep a log in ~/.ipython/log

If the controller fails, all the engines will fail.

Windows Usage 
 You will need two console windows open.  Change to c:\Python2.5\Scripts\, the directory that holds the ipcontroller scripts.

With Twisted you can run multiple engines in the same process without using threads 
 Because everything stays in one process, this does not give real parallel execution.
Start parallel on multicore laptop 
 prompt> ipcluster -n 4 (starts 4 engines on the local machine; the same can be done across machines connected over ssh)

The Basics 
  •  Start remote controller
    • import ipython1.kernel.api as kernel
    • ipc = kernel.RemoteController(('127.0.0.1',10105)) # tuple of ip address and port number
  • Get IDs
  • Execute
    • ipc.executeAll('a=5') # runs the statement on every engine
  • Execute on selected engines only
    • ipc.execute([0,1], 'b=10') # runs on engines 0 and 1
  • How do I handle bugs?
    • Provides a sensible way of handling errors raised during parallel computing and a means to debug them
  • auto parallel
    • autopx
    • all commands sent to the engines
    • type autopx again to turn off
  • objects
    • ipc.pushAll(a=10)
    • can send objects over the wire, including numpy arrays
    • ipc.pullAll('a') # pulls objects for all engines and returns as a list
  • dictionary interface
    • ipc['b'] = 10 # pushes 10 to all engines
    • ipc['b'] # pulls object from all engines
    • ipc.keysAll() # See the dictionary contents
  • ipc.scatterAll('a',a)
    • Evenly divides sequence up between the N engines you have
    • scatters array to all engines
  • ipc.gatherAll('a')
    • gathers array from all engines
  • ipc.block = False
    • execute will not block
    • In non-blocking mode you get back a pending result object
      • pr = _
      • pr.getResult() # Gets result if done otherwise blocks
  • ipc.map?
    • parallelize Python's builtin map
    • can parallelize things in 1 line
  • The remote controller interface does no load balancing
  • Underlying controller uses Twisted and sockets
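The scatter and gather steps in the list above can be illustrated without a running cluster. The following is a plain-Python sketch, not ipython1 code: `scatter` and `gather` are hypothetical helpers, and the exact way ipython1 partitions a sequence across engines is an assumption here (contiguous chunks of near-equal size).

```python
def scatter(seq, n_engines):
    """Split seq into n_engines contiguous chunks of near-equal size."""
    base, extra = divmod(len(seq), n_engines)
    chunks = []
    start = 0
    for engine_id in range(n_engines):
        # The first `extra` engines receive one additional element.
        size = base + (1 if engine_id < extra else 0)
        chunks.append(seq[start:start + size])
        start += size
    return chunks


def gather(chunks):
    """Concatenate the per-engine chunks back into one list."""
    result = []
    for chunk in chunks:
        result.extend(chunk)
    return result


data = list(range(10))
parts = scatter(data, 4)
print(parts)

# Each "engine" works on its own chunk; here every element is squared.
squared_parts = [[x ** 2 for x in part] for part in parts]
squared = gather(squared_parts)
print(squared)
```

Gathering after the per-chunk work returns the results in the original order, which is what makes the scatter, compute, gather sequence a drop-in replacement for a serial loop.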

Task Controller  Interface
 Allows for load balancing
Iterators 
  • Object that can be walked
  • The new O'Reilly book "Beautiful Code" has a chapter by Travis Oliphant on how he wrote the iterator into numpy.
    • def f(): # defining our generator function
      • yield 1
      • yield 3
      • print 'hi'
      • print 'scipy'
    • a = f() # a is a generator
  • mergesort example that pushes to multiple engines for processing (nwmerge.py,simpleiter.py)
  • For parallel computing with iterators all we need is something that can be walked over.
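As a small illustration of "something that can be walked over", here is the generator from the notes next to an ordinary list. It is written for Python 3 (the notes use Python 2 print statements), and `total` is a hypothetical helper added only for this sketch.

```python
def f():
    # Generator function from the notes: each yield hands back one value.
    yield 1
    yield 3
    print('hi')
    print('scipy')


def total(walkable):
    """Sum anything that can be walked with a for loop."""
    result = 0
    for item in walkable:
        result += item
    return result


a = f()                    # a is a generator; nothing in f has run yet
from_generator = total(a)  # walks 1 and 3, then the two prints run as f finishes
from_list = total([1, 3])  # a list is walked in exactly the same way
print(from_generator, from_list)
```

The consumer never asks what kind of object it was given, only that a for loop can walk it.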

Scalability 
  •  Has been run on everything from desktops to supercomputers
  • Can be limited by the number of network sockets the system allows.

MPI 
  •  All of this integrates with MPI and Python MPI tools.
IPython's Task Controller Interface 
  •  Can look like a remote control (low level control)
  • Can look like a farming system
    • Controller = task master
    • Engines = workers
  • More powerful but less general than RemoteController interface
  • Details of specific Engines are hidden
  • Both interfaces can be live at the same time. Can import TaskController and RemoteController in the same Python script.
  • When tasks fail the Controller will try to repeat them.
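The task-farming idea (a task master handing work to whichever worker is free, and repeating failed tasks) can be sketched with the standard library alone. This is not the IPython TaskController API: it uses `concurrent.futures` threads as stand-in workers, and `flaky_square` and `run_with_retry` are hypothetical names made up for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

attempts = {}  # task argument -> number of times it has been tried


def flaky_square(x):
    """Hypothetical task that fails on its first attempt when x == 3."""
    attempts[x] = attempts.get(x, 0) + 1
    if x == 3 and attempts[x] == 1:
        raise RuntimeError("transient failure")
    return x * x


def run_with_retry(task, arg, max_tries=3):
    """Re-run a failed task, as a task master might resubmit it."""
    for attempt in range(max_tries):
        try:
            return task(arg)
        except RuntimeError:
            if attempt == max_tries - 1:
                raise


# Three workers pull tasks as they become free, so no task is tied to a worker.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda x: run_with_retry(flaky_square, x), range(6)))

print(results)
```

The caller only sees tasks and results; which worker ran which task is hidden, which mirrors the "details of specific engines are hidden" point above.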
Idiomatic Python -- Afternoon Session

Introduction 
 My notes for SciPy 2007 Tutorial Afternoon Session on Idiomatic Python
Author 
 C. Titus Brown  -- Caltech post-doc
Slides for 3 day course on advanced topics 
 http://ivory.idyll.org/articles/advanced-swc/
Testing BoF 
  •  Wednesday Night
  • Time/Location TBD
Enumerator Idiom 
  •  for x in enumerate(z):
    • print x
  • x is an (index, item) tuple
    • tuples can be unpacked, e.g. a,b = b,a
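A short runnable version of the idiom, written for Python 3 (print is a function there). The list `z` is made-up sample data.

```python
z = ['a', 'b', 'c']

# Each item produced by enumerate is an (index, value) tuple.
pairs = [x for x in enumerate(z)]
print(pairs)

# Unpacking the tuple in the loop header avoids indexing into the pair.
for index, value in enumerate(z):
    print(index, value)

# The same unpacking swaps two names without a temporary variable.
a, b = 1, 2
a, b = b, a
print(a, b)
```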
Data Types 
  •  Tuples -- ordered collections of items, like lists that cannot be modified
    • a,b = b,a
  • Lists
    • append, extend, reverse, sort
      • def sort_by_second(a,b):
        • return cmp(a[1],b[1])
      • y.sort(sort_by_second)
  • Dictionaries -- an unsorted collection
    • Key, value pairs
    • items()
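The sort example above is Python 2 code: `cmp` and the comparison-function argument to `sort` were removed in Python 3. A sketch of the same sort in Python 3 follows, showing both a direct port through `functools.cmp_to_key` and the more common `key` function. The sample list `y` is made up.

```python
from functools import cmp_to_key

y = [('pear', 3), ('apple', 1), ('fig', 2)]


def sort_by_second(a, b):
    # Same result as Python 2's cmp(a[1], b[1]): negative, zero or positive.
    return (a[1] > b[1]) - (a[1] < b[1])


by_cmp = sorted(y, key=cmp_to_key(sort_by_second))  # direct port of the cmp style
by_key = sorted(y, key=lambda item: item[1])        # usual Python 3 spelling

# A dictionary holds key, value pairs; items() yields them as tuples.
d = dict(y)
pairs = sorted(d.items())
print(by_key)
print(pairs)
```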
List Comprehensions 
  •  An inline for loop that builds a list
    • z = [i**2 for i in range(0,5)]
  • Very useful for file parsing and simple math
  • Gets difficult to follow for multiple line expressions
    • if a list comprehension is longer than one line, rewrite it as a for loop
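A minimal sketch of both styles, assuming a made-up list of text lines standing in for a file. The first two comprehensions fit on one line; the last computation is written as a loop because it has more than one step.

```python
z = [i ** 2 for i in range(0, 5)]

# Simple file-style parsing: first number on every non-comment line.
lines = ["1.0 2.0", "# header", "3.5 4.5"]
first_column = [float(line.split()[0]) for line in lines if not line.startswith("#")]

# More involved work reads better as an explicit loop.
row_sums = []
for line in lines:
    if line.startswith("#"):
        continue
    numbers = [float(token) for token in line.split()]
    row_sums.append(sum(numbers))

print(z, first_column, row_sums)
```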
Building your own types 
  •  General Classes
class A:
    def __init__(self, item):
        self.item = item
    def hello(self):
        print 'hello,',self.item

x = A('world')
x.hello()

  • Defining special method names
    • __len__ # length function
    • __getitem__ # allows indexing
      • With these defined the class will behave like an indexable list
  • Defining your own mutable types
    • need to define both __getitem__ and __setitem__
    • useful for defining interfaces for object or relational databases
    • can use __getattr__ to figure out which special methods you need to define: have __getattr__ raise an exception
  • Defining your own types makes your own code more readable
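A minimal sketch of a mutable, indexable type built from the special methods listed above. The class name `Samples` and its float conversion are assumptions made for the illustration.

```python
class Samples:
    """A small mutable container that behaves like an indexable list."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        # Called by len(obj).
        return len(self._values)

    def __getitem__(self, index):
        # Called by obj[index]; enables read access.
        return self._values[index]

    def __setitem__(self, index, value):
        # Called by obj[index] = value; enables mutation.
        self._values[index] = float(value)


s = Samples([1.0, 2.0, 3.0])
s[1] = 5
length = len(s)
middle = s[1]
last = s[-1]
print(length, middle, last)
```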
Iterators in your custom data type
  •  More general implementation of a sequence protocol
  • define __iter__
  • enumerate is a form of an iterator
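A minimal sketch of `__iter__` on a custom type. `Countdown` is a hypothetical class, and the method is written as a generator, which is one common way to implement it.

```python
class Countdown:
    """Walks from start down to 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Writing __iter__ as a generator returns a fresh iterator per loop.
        n = self.start
        while n > 0:
            yield n
            n -= 1


c = Countdown(3)
walked = list(c)
walked_again = list(c)            # the object can be walked more than once
numbered = list(enumerate(c))     # enumerate accepts any iterable
print(walked, numbered)
```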
Generators 
  •  Python's implementation of routines that can be suspended and later resumed
  • def g():
    • for i in range(0,5):
      • yield i**2
  • x = g()
  • print x.next()
  • print 'second next'
  • print x.next()
  • Yields state of g at the time of the call to next()
  • This can also be done with a list
  • No requirement that generators return
    • This isn't true for a list since a list needs to have a finite size
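The generator above, in Python 3 form, where `next(x)` replaces `x.next()`. The second generator, `squares_forever`, is a hypothetical example of a generator that never returns; `itertools.islice` takes a finite slice of it.

```python
from itertools import islice


def g():
    for i in range(0, 5):
        yield i ** 2


x = g()
first = next(x)    # runs g up to the first yield
print('second next')
second = next(x)   # resumes g where it was suspended


def squares_forever():
    # No end condition: a list could not hold this, a generator can.
    i = 0
    while True:
        yield i ** 2
        i += 1


first_six = list(islice(squares_forever(), 6))
print(first, second, first_six)
```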
Assert 
  •  assert 0 or assert False raises an AssertionError, which kills the program if it is not caught
  • In scientific programming you often want a program to fail as soon as it gets garbage numbers
  • assert statements are removed from optimized python code (python -O test.py)
  • assert statements should not modify data (don't put side effects inside an assert).
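A small sketch of asserts guarding against garbage numbers. The function `mean` and its messages are made up for the illustration, and the checks only fire when Python runs without -O.

```python
import math


def mean(values):
    # Fail early on input that would make the result meaningless.
    assert len(values) > 0, "mean() needs at least one value"
    result = sum(values) / len(values)
    # The condition only reads data; it has no side effects.
    assert not math.isnan(result), "mean() produced NaN"
    return result


good = mean([1.0, 2.0, 3.0])

messages = []
for bad_input in ([], [1.0, float('nan')]):
    try:
        mean(bad_input)
    except AssertionError as err:
        messages.append(str(err))

print(good, messages)
```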
Programming for reusability 
  •  Distinguish between scripts and modules
  • packages
    • directory with an __init__.py
    • Python knows a directory with an __init__.py is a package
  • Packages are a good way to organize files
  • Packages can be nested
Naming and Formatting 
  •  Don't be foolishly consistent
  • Use 4 spaces instead of tabs in code
  • names
    • my_package
    • my_module
    • my_function
    • ThisIsMyClass - for class names
    • _my_function -- "_" means don't touch (use) this function in a class, private by convention
  • Doc strings
    • Provide a way for python doc utilities to pull human-readable comments out of your code
Sharing Data 
  •  Avoid sharing data
Module imports 
  •  create empty references to loaded modules even if they are not used yet
easy_install 
  •  easy_install package==versionNumber
  • just works
  • handles dependencies if they are properly defined
  • need to install ez_setup first
  • make sure your code works with easy_install
setuptools 
  •  import pkg_resources
  • pkg_resources.require('Quixote==1.2')
  • for different version
  • can set flags with easy install to require different versions of the software
Sets
  •  An unordered collection of values
  • s = set((1,2,3,4,5))
  • Only keeps 1 value if redundant values are added
  • Can union and intersect sets
  • Can check for supersets and subsets
  • Can convert between supersets and subsets easily
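A short sketch of the set operations listed above, using made-up values.

```python
s = set((1, 2, 3, 4, 5))
s.add(3)                      # already present, so the set is unchanged
size_after_duplicate = len(s)

t = set((4, 5, 6))
union = s | t                 # everything in either set
common = s & t                # only what both sets share

small = set((1, 2))
is_subset = small.issubset(s)
is_superset = s.issuperset(small)

# Sets are unordered; sort when a stable order is needed.
as_sorted_list = sorted(s)
print(union, common, is_subset, is_superset)
```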
Any and All in Python 2.5 
  •  any - returns True if any value is true
  • all - returns True only if all values are true
  • Shorthand way of demanding all or part of something be true in a sequence
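A small sketch of both builtins on made-up readings. Generator expressions let each test stop at the first decisive value.

```python
import math

readings = [0.5, 1.2, -0.3]

has_negative = any(r < 0 for r in readings)           # True: -0.3 is negative
all_finite = all(math.isfinite(r) for r in readings)  # True: no inf or nan
all_positive = all(r > 0 for r in readings)           # False: -0.3 fails

# Edge cases on an empty sequence.
empty_any = any([])
empty_all = all([])
print(has_negative, all_finite, all_positive, empty_any, empty_all)
```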
Exception hierarchies 
  • Never catch all exceptions:
    •  especially if you expect someone to use your code inside other code.
    • you could end up silently ignoring errors without dealing with them.
  • You can re-raise exceptions
    • can catch KeyboardInterrupt
    • can catch SystemExit(0)
  • except clauses are tried in the order they are specified
  • All exceptions inherit from BaseException.  Most inherit from Exception; KeyboardInterrupt and SystemExit inherit directly from BaseException
  • Define your own exception:
    • class TstException(Exception):
      • pass
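A sketch of a custom exception in use, in Python 3 syntax. `lookup` and `classify` are hypothetical helpers: the first translates a low-level error into the custom type, the second shows that except clauses are tried top to bottom.

```python
class TstException(Exception):
    """Custom exception, as defined in the notes."""


def lookup(table, key):
    try:
        return table[key]
    except KeyError:
        # Catch only the error we expect and raise our own type instead.
        raise TstException("no entry for %r" % (key,))


outcomes = []
for key in ("a", "z"):
    try:
        outcomes.append(lookup({"a": 1}, key))
    except TstException as err:
        outcomes.append(str(err))


def classify(exc):
    try:
        raise exc
    except TstException:
        return "specific"   # listed first, so it wins for TstException
    except Exception:
        return "general"


print(outcomes, classify(TstException()), classify(ValueError()))
# Catching Exception leaves these two alone, since they sit beside it.
print(issubclass(KeyboardInterrupt, Exception), issubclass(SystemExit, BaseException))
```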
subprocess module
  • runs a command
  • cross platform
  • p = subprocess.Popen(['/bin/echo','hello, world'],stdout = subprocess.PIPE)
  • (stdout,stderr) = p.communicate()
    • p.communicate() very useful
      • can pass in message so you know something that works
      • communicate sends any input to the process, closes standard input to signal that it is done, and then keeps reading stdout and stderr until the process finishes so that the process doesn't block
  • This module will replace your calls to os.system in your applications ( claimed by Titus Brown )
  • There are a lot of keyword arguments
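A runnable sketch of `Popen` plus `communicate`. To stay cross-platform it launches the current Python interpreter (`sys.executable`) instead of `/bin/echo`; that substitution and the one-line child programs are assumptions made for the example.

```python
import subprocess
import sys

# Child process that writes one line to stdout.
p = subprocess.Popen(
    [sys.executable, "-c", "print('hello, world')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
stdout, stderr = p.communicate()   # waits for the process and collects its output
text = stdout.decode().strip()

# communicate can also feed data to the child's standard input.
p2 = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
out2, _ = p2.communicate(b"scipy")
echoed = out2.decode().strip()

print(text, echoed, p.returncode)
```

In Python 3 the pipes carry bytes by default, hence the `decode()` calls.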
Testing Software 
  •  Testing is a solution to maintenance problems
  • 3 types of testing
    • unit tests
    • functional tests
    • regression tests
  • Unit tests and functional tests are more expectation based
  • unit tests rely on small isolated tests of functionality
  • functional tests tell you when the code is broken
  • unit tests tell you where the code is broken

  • doctest
    • useful way to write documentation
    • a demonstration of the code
    • all examples must work
    • leads to lots and lots of text in the code
  • unittest
    • Java-style unit testing framework
    • automated testing framework
  • nosetests
    • framework that combines all of your tests (unittest, doctests, etc.) into a unified output
    • automatically discovers all of the tests
    • Allows you to write easy tests without a lot of other overhead
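A minimal sketch of the first two styles applied to one function. `square` and `SquareTest` are made-up names, and the tests are run programmatically here so the example is self-contained; a test runner such as nose would normally discover and run them for you.

```python
import doctest
import io
import unittest


def square(x):
    """Return x squared.

    >>> square(3)
    9
    >>> square(-2)
    4
    """
    return x * x


class SquareTest(unittest.TestCase):
    def test_zero(self):
        self.assertEqual(square(0), 0)

    def test_negative(self):
        self.assertEqual(square(-4), 16)


# Run the examples embedded in the docstring.
parser = doctest.DocTestParser()
doc_test = parser.get_doctest(square.__doc__, {"square": square}, "square", None, 0)
doc_results = doctest.DocTestRunner(verbose=False).run(doc_test)

# Run the unittest cases, capturing the report in a string buffer.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SquareTest)
unit_results = unittest.TextTestRunner(stream=io.StringIO()).run(suite)

print(doc_results, unit_results.testsRun, unit_results.wasSuccessful())
```

The doctest doubles as documentation, while the unittest class keeps the checks out of the docstring.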

  • Adding tests to existing projects
    • tests should be simple code
    • setup and teardown code can be more complex
    • don't rewrite projects from scratch to add testing
  • 5 step process for retrofitting projects
  1. When a bug is found, write a test for it before fixing the bug
  2. Write complete tests for different portions of the code
  3. Get more and more positive feedback from the tests
  4. Repeat until you reach diminishing returns
  5. Use code coverage analysis to find the parts of the code that the tests have not covered so far
