Introduction My notes on the morning session of the SciPy Conference 2007 Tutorials, Day 1
Authors Fernando Perez and Brian Granger
Goals with IPython1 - Make parallel computing easy
IPython Controller - Manages each engine (controls the threads)
- Controls the way clients use and connect to the engines
Startup Scripts - ipcontroller - starts the controller
- ipengine - starts an engine; run it after the controller is up
- ipcluster - starts one controller and N engines on localhost or an ssh-based cluster (what you will use most of the time)
What can go wrong? - firewall ports must be open
- engine started before controller
- easy to mess up non-default ports
- machine could have multiple network interfaces
- The controller and each engine keep a log in ~/.ipython/log
If the controller fails, all the engines will fail.
Windows Usage Will need to have two console windows open. Change to c:\Python2.5\Scripts\(ipcontroller scripts)
With Twisted you can run multiple engines in the same process without using threads. This obviously doesn't give you parallel computing.
Start parallel on multicore laptop prompt> ipcluster -n 4 ( starts 4 engines on local machine, can also be done over ssh connected machines)
The Basics - Start remote controller
- import ipython1.kernel.api as kernel
- ipc = kernel.RemoteController(('127.0.0.1',10105)) # tuple of ip address and port number
- Get IDs
- Execute
- One engine executes on engine 0
- ipc.execute([0,1], 'b=10')
- How do I handle bugs?
- Provides a sensible way of handling errors in parallel code and a means to debug them
- auto parallel
- autopx
- all commands sent to the engines
- type autopx again to turn off
- objects
- ipc.pushAll(a=10)
- send over the wire any numpy objects (including numpy arrays)
- ipc.pullAll('a') # pulls objects for all engines and returns as a list
- dictionary interface
- ipc['b'] = 10 # pushes 10 to all engines
- ipc['b'] # pulls object from all engines
- ipc.keysAll() # See the dictionary contents
- ipc.scatterAll('a',a)
- Evenly divides sequence up between the N engines you have
- scatters array to all engines
- ipc.gatherAll('a')
- gathers array from all engines
- ipc.block = False
- execute will not block
- In non-blocking mode you get back a pending result object
- pr = _ # in IPython, _ holds the last result
- pr.getResult() # Gets result if done otherwise blocks
- ipc.map?
- parallelize Python's builtin map
- can parallelize things in 1 line
- The remote controller interface does no load balancing
- Underlying controller uses Twisted and sockets
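The scatter/gather idea above can be sketched in plain Python, without IPython1 itself. The helper names scatter and gather are hypothetical stand-ins, not the ipc.scatterAll / ipc.gatherAll implementation, and the exact way IPython1 partitions a sequence is an assumption here. The real calls also move the pieces over the network to the engines, which this sketch only pretends to do.

```python
def scatter(seq, n_engines):
    """Split seq into n_engines contiguous chunks of nearly equal size."""
    base, extra = divmod(len(seq), n_engines)
    chunks = []
    start = 0
    for i in range(n_engines):
        # The first `extra` chunks get one additional element.
        size = base + (1 if i < extra else 0)
        chunks.append(seq[start:start + size])
        start += size
    return chunks


def gather(chunks):
    """Concatenate the per-engine chunks back into one list."""
    result = []
    for chunk in chunks:
        result.extend(chunk)
    return result


a = list(range(10))
pieces = scatter(a, 4)                          # one chunk per pretend engine
squared = [[x * x for x in p] for p in pieces]  # each "engine" works on its chunk
print(pieces)
print(gather(squared))
```

Because the chunks are contiguous, gathering them in engine order restores the original ordering of the data.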
Task Controller Interface Allows for load balancing
Iterators - Object that can be walked
- Chapter in new O'Reilly book "Beautiful Code". One chapter on how Travis Oliphant wrote iterator into numpy.
- def f(): # defining our iterator
- yield 1
- yield 3
- print 'hi'
- print 'scipy'
- a = f() # a is a generator
- mergesort example that pushes to multiple engines for processing (nwmerge.py,simpleiter.py)
- For parallel computing with iterators all we need is something that can be walked over.
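As a rough illustration of the last point (this is not the nwmerge.py example from the tutorial, which is not reproduced in these notes), the merge step of a mergesort only needs inputs that can be walked over. The sketch uses the standard library's heapq.merge; the generator name sorted_chunk and the sample data are made up for illustration.

```python
import heapq


def sorted_chunk(values):
    """Yield the values of one chunk in sorted order, one at a time."""
    for v in sorted(values):
        yield v


# Three chunks, as if each had been sorted on a different engine.
chunks = [[9, 1, 5], [4, 8], [7, 2, 6, 3]]
streams = [sorted_chunk(c) for c in chunks]

# heapq.merge walks the iterators lazily and yields one sorted stream.
merged = list(heapq.merge(*streams))
print(merged)
```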
Scalability - Has been run on everything from desktops to supercomputers
- Can be limited by the number of sockets the system allows.
MPI - All of this integrates with MPI and Python MPI tools.
IPython's Task Controller Interface - Can look like a remote control (low level control)
- Can look like a farming system
- Controller = task master
- Engines = workers
- More powerful but less general than the RemoteController interface
- Details of specific engines are hidden
- Both interfaces can be live at the same time. You can import TaskController and RemoteController in the same Python script.
- When tasks fail, the controller will try to run them again.
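The farming model (a task master handing work to whichever worker is free, and rerunning failed tasks) can be sketched with the standard library's concurrent.futures thread pool as a stand-in for the controller and engines. This is not the IPython1 TaskController API; the names farm and flaky_square, the simulated failure, and the retry limit are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

attempts = {}  # task id -> number of times it has been tried


def flaky_square(task_id):
    """Square task_id, failing the first time for odd ids (simulated error)."""
    attempts[task_id] = attempts.get(task_id, 0) + 1
    if task_id % 2 == 1 and attempts[task_id] == 1:
        raise RuntimeError("simulated failure for task %d" % task_id)
    return task_id * task_id


def farm(func, task_ids, n_workers=4, max_tries=3):
    """Run func over task_ids on a worker pool, resubmitting failed tasks."""
    results = {}
    pending = list(task_ids)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_tries):
            if not pending:
                break
            # Free workers pick up tasks as they become available.
            futures = {pool.submit(func, t): t for t in pending}
            pending = []
            for fut in as_completed(futures):
                task = futures[fut]
                try:
                    results[task] = fut.result()
                except RuntimeError:
                    pending.append(task)  # try this task again next round
    return results, pending


results, failed = farm(flaky_square, range(6))
print(sorted(results.items()))
print(failed)
```

The caller never says which worker runs which task, which mirrors the point that details of specific engines are hidden.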
Introduction My notes for SciPy 2007 Tutorial Afternoon Session on Idiomatic Python
Author C. Titus Brown -- Caltech post-doc
Testing BoF - Wednesday Night
- Time/Location TBD
Enumerator Idiom - for x in enumerate(z):
- x is an (index, item) tuple
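A short example of the idiom, with a made-up list z. Each item produced by enumerate is a tuple of the position and the value, and it is usually unpacked directly in the loop header.

```python
z = ['alpha', 'beta', 'gamma']

# Each item produced by enumerate is an (index, value) tuple.
for x in enumerate(z):
    print(x)

# Unpacking the tuple in the loop header is the more common idiom.
for i, name in enumerate(z):
    print(i, name)

pairs = list(enumerate(z))
```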
Data Types - Tuples -- ordered collections of items, like lists that cannot be modified
- Lists
- append, extend, reverse, sort
- def sort_by_second(a,b):
- y.sort(sort_by_second)
- Dictionaries -- an unsorted collection
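A small sketch of the three types, using made-up data. The notes show sort_by_second(a, b), a Python 2 style comparison function whose body was not recorded. Python 3 removed the comparison argument of list.sort, so this sketch sorts on the second element with a key function instead.

```python
# Tuples cannot be modified after creation.
point = (1, 2)
try:
    point[0] = 10
except TypeError as err:
    print('tuples are immutable:', err)

# Lists can be changed in place.
y = [('a', 3), ('b', 1)]
y.append(('c', 2))
y.extend([('d', 0)])
y.reverse()


def second(pair):
    """Key function: sort on the second element of each pair."""
    return pair[1]


y.sort(key=second)
print(y)

# Dictionaries map keys to values.
counts = {'a': 3, 'b': 1}
counts['c'] = 2
print(counts['c'])
```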
List Comprehensions - An inline for loop that builds a list
- z = [i**2 for i in range(0, 5)]
- Very useful for file parsing and simple math
- Gets difficult to follow for multiple line expressions
- If a list comprehension is longer than one line, break it up into a for loop
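For comparison, here is the comprehension from the notes next to its explicit loop, plus a one-line parsing example. The sample lines (a header comment, a blank line, and two-column data) are invented for illustration.

```python
z = [i ** 2 for i in range(0, 5)]

# The equivalent explicit loop.
z_loop = []
for i in range(0, 5):
    z_loop.append(i ** 2)

# Parsing: take the second column of each data line,
# skipping blank lines and lines that start with '#'.
lines = ['# x y', '1 2.5', '', '2 3.5', '3 4.0']
ys = [float(s.split()[1]) for s in lines if s and s[0] != '#']

print(z, ys)
```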
Building your own types
class A:
    def __init__(self, item):
        self.item = item
    def hello(self):
        print 'hello,', self.item

x = A('world')
x.hello()
- Defining special method names
- __len__ # length function
- __getitem__ # allows indexing
- With these defined the class will behave like an indexable list
- Defining your own mutable types
- need to define both __getitem__ and __setitem__
- useful for defining interfaces for object or relational databases
- can use __getattr__ to figure out which special methods you need to define: have __getattr__ raise an exception
- Defining your own types makes your own code more readable
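A minimal sketch of a mutable, list-like type built from the special methods named above. The class name Table and its contents are hypothetical.

```python
class Table:
    """A tiny mutable, list-like container (hypothetical example)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        return self._rows[index]

    def __setitem__(self, index, value):
        self._rows[index] = value


t = Table(['a', 'b', 'c'])
t[1] = 'B'                     # uses __setitem__
print(len(t), t[0], t[1])      # uses __len__ and __getitem__
print(list(t))                 # iteration falls back to __getitem__ with 0, 1, 2, ...
```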
Iterators in your custom data type - More general implementation of a sequence protocol
- define __iter__
- enumerate is a form of an iterator
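A minimal sketch of a custom type that supports iteration by defining __iter__. The class name Countdown is made up, and writing __iter__ as a generator function is one convenient option, not the only one.

```python
class Countdown:
    """Iterable that counts down from start to 1 (hypothetical example)."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # A generator function is a convenient way to write __iter__.
        n = self.start
        while n > 0:
            yield n
            n -= 1


for value in Countdown(3):
    print(value)

# Anything with __iter__ works with enumerate, list, sum, and so on.
numbered = list(enumerate(Countdown(3)))
```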
Generators - Python's implementation of coroutine-like routines that can be suspended and resumed
- def g():
- x = g()
- print x.next()
- print 'second next'
- print x.next()
- Each call to next() resumes g where it left off and hands back the next yielded value
- This can also be done with a list
- No requirement that generators ever finish; they can keep yielding forever
- This isn't true for a list since a list needs to have a finite size
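The body of g was not captured in these notes, so the one below is invented to show the suspend-and-resume behaviour. The sketch uses Python 3, where x.next() is spelled next(x). The endless generator naturals is also a made-up example of a generator that never returns.

```python
from itertools import islice


def g():
    print('running up to the first yield')
    yield 1
    print('resumed, running up to the second yield')
    yield 2


x = g()            # nothing inside g has run yet
first = next(x)    # Python 3 spelling of x.next()
print('second next')
second = next(x)


def naturals():
    """An endless generator: it never returns."""
    n = 0
    while True:
        yield n
        n += 1


# Take only the first five values; a list could not hold all of them.
first_five = list(islice(naturals(), 5))
print(first_five)
```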
Assert - assert 0 or assert False raises an AssertionError, which kills the program unless it is caught
- In scientific programming you often want a program to fail as soon as it gets garbage numbers
- assert statements are removed from optimized python code (python -O test.py)
- assert statements should not modify data (don't put side effects in assert expressions).
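A small sketch of failing early on garbage numbers. The function name mean and the NaN check are illustrative choices, not something shown in the tutorial.

```python
import math


def mean(values):
    """Average of a non-empty sequence, failing loudly on garbage input."""
    assert len(values) > 0, 'mean() needs at least one value'
    assert not any(math.isnan(v) for v in values), 'NaN in input'
    # Bad idea: "assert values.pop()" would modify the data, and that
    # side effect would disappear when run with python -O.
    return sum(values) / len(values)


print(mean([1.0, 2.0, 3.0]))

caught = None
try:
    mean([1.0, float('nan')])
except AssertionError as err:
    caught = str(err)
    print('caught:', err)
```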
Programming for reuseability - Distinguish between scripts and modules
- packages
- directory with an __init__.py
- Python knows a directory with an __init__.py is a package
- Packages are a good way to organize files
- Packages can be nested
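One common way to keep the script/module distinction clear is the __name__ check, sketched below. The file name greet.py and the function names are hypothetical.

```python
"""greet.py (hypothetical file name): usable as a module and as a script."""


def greeting(name):
    """Return a greeting; importable from other code."""
    return 'hello, %s' % name


def main():
    print(greeting('world'))


if __name__ == '__main__':
    # Runs only when the file is executed as a script, not when imported.
    main()
```

Other code can then import greeting from the module without triggering the script behaviour.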
Naming and Formatting - Don't be foolishly consistent
- Use 4 spaces instead of tabs in code
- names
- my_package
- my_module
- my_function
- ThisIsMyClass - for class names
- _my_function -- the leading "_" means don't touch (use) this function from outside the class; private by convention
- Doc strings
- Provides way for python doc utilities to pull out human readable comments from your code
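The conventions above, put together in one made-up class. The names come from the list in the notes except _my_helper, which is invented here.

```python
class ThisIsMyClass:
    """Example class showing the naming conventions."""

    def my_function(self, value):
        """Return value doubled."""
        return self._my_helper(value)

    def _my_helper(self, value):
        # Leading underscore: private by convention, not enforced.
        return 2 * value


obj = ThisIsMyClass()
print(obj.my_function(21))

# Doc strings are available at run time, which is what doc tools rely on.
print(ThisIsMyClass.my_function.__doc__)
```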
Module imports - create empty references to loaded modules even if they are not used yet
easy_install - easy_install package==versionNumber
- just works
- handles dependencies if they are properly defined
- need to install easy setup
- make sure your code works with easy_install
setup tools - import pkg_resources
- pkg_resources.require('Quixote==1.2')
- for different version
- can set flags with easy install to require different versions of the software
Sets - An unordered collection of values
- s = set((1,2,3,4,5))
- Only keeps 1 value if redundant values are added
- Can union and intersect sets
- Can check for supersets and subsets
- Can convert between supersets and subsets easily
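A short example of the set operations listed above, with made-up values.

```python
s = set((1, 2, 3, 4, 5))
s.add(3)                      # already present, so nothing changes
t = set([4, 5, 6])

union = s | t                 # everything in either set
common = s & t                # only what is in both
print(union, common)

# Subset and superset checks.
print(common.issubset(s), s.issuperset(common))
```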
Any and All in Python 2.5 - any - returns True if any value is true
- all - returns True only if all values are true
- Shorthand way of demanding all or part of something be true in a sequence
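A brief example with made-up numbers. Both functions take any iterable, so they pair naturally with generator expressions.

```python
values = [0.5, 2.0, -1.0]

has_negative = any(v < 0 for v in values)
all_finite = all(abs(v) < float('inf') for v in values)
print(has_negative, all_finite)

# Edge cases: any of nothing is False, all of nothing is True.
print(any([]), all([]))
```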
Exception hierarchies - Never catch all exceptions:
- if you expect someone to use your code in some other code.
- you could end up silently ignoring errors without dealing with them.
- You can re-raise exceptions
- can catch KeyboardInterrupt
- can catch SystemExit(0)
- An exception is handled by the first matching except clause, in the order the clauses are specified
- All exceptions inherit from BaseException. Most ordinary exceptions inherit from Exception; KeyboardInterrupt and SystemExit do not.
- Define your own exception:
- class TstException(Exception): pass
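A sketch of these points using the TstException name from the notes. The functions risky and wrapper and the log list are invented for illustration.

```python
class TstException(Exception):
    """Application-specific error."""


def risky(flag):
    if flag:
        raise TstException('something specific went wrong')
    return 'ok'


log = []
try:
    risky(True)
except TstException as err:       # most specific handler first
    log.append('TstException: %s' % err)
except Exception as err:          # broader fallback, checked second
    log.append('other: %s' % err)


def wrapper():
    try:
        risky(True)
    except TstException:
        log.append('cleaning up')
        raise                     # re-raise the same exception


try:
    wrapper()
except TstException:
    log.append('re-raised')

print(log)
# KeyboardInterrupt derives from BaseException but not from Exception.
print(issubclass(KeyboardInterrupt, Exception))
```

Catching Exception rather than using a bare except leaves KeyboardInterrupt and SystemExit alone.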
subprocess module - runs a command
- cross platform
- p = subprocess.Popen(['/bin/echo','hello, world'],stdout = subprocess.PIPE)
- (stdout,stderr) = p.communicate()
- p.communicate() very useful
- can pass in a message so you can confirm that something works
- communicate() sends any input to the child's stdin, closes it, and then reads stdout and stderr until the process exits, so the process doesn't block
- This module will replace your calls to os.system in your applications (claimed by Titus Brown)
- There are a lot of keyword arguments
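A runnable variant of the Popen example above. To stay cross platform it launches the current Python interpreter instead of /bin/echo; that substitution is a choice made for this sketch, not part of the talk. In Python 3, communicate() returns bytes unless text mode is requested.

```python
import subprocess
import sys

# Use the running Python interpreter as the child so the example does not
# depend on /bin/echo being present.
cmd = [sys.executable, '-c', "print('hello, world')"]
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = p.communicate()   # waits for the child to finish

print(stdout.decode().strip())
print(p.returncode)
```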
Testing Software - Testing is a solution to maintenance problems
- 3 types of testing
- Unit tests and functional tests are more expectation based
- unit tests rely on small isolated tests of functionality
- functional tests tell you when the code is broken
- unit tests tell you where the code is broken
- doctest
- useful way to write documentation
- a demonstration of the code
- all examples must work
- leads to lots and lots of text in the code
- unittest
- Java-style (JUnit-like) unit testing framework
- automated testing framework
- nosetests
- framework that combines all of your tests (unittest, doctests, etc.) into a unified output
- automatically discovers all of the tests
- Allows you to write easy tests without a lot of other overhead
- Adding tests to existing projects
- tests should be simple code
- setup and teardown code can be more complex
- don't rewrite projects from scratch to add testing
- 5 step process for retrofitting projects
- When a bug is found, write a test for it before fixing the bug
- Write complete tests for different portions of code
- Get more and more positive feedback from test
- Repeat until you have diminishing returns
- Use code coverage analysis to get to the parts of code that haven't been covered so far
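A small sketch showing the same function covered by a doctest and by unittest, both run programmatically. The function celsius_to_kelvin and its tests are invented for illustration. A test runner such as nose would normally discover and run both kinds of test for you.

```python
import doctest
import unittest


def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin.

    >>> celsius_to_kelvin(0)
    273.15
    """
    if c < -273.15:
        raise ValueError('temperature below absolute zero')
    return c + 273.15


class TestCelsiusToKelvin(unittest.TestCase):
    def test_freezing_point(self):
        self.assertAlmostEqual(celsius_to_kelvin(0), 273.15)

    def test_rejects_impossible_input(self):
        self.assertRaises(ValueError, celsius_to_kelvin, -300)


# Run the unittest cases.
suite = unittest.TestLoader().loadTestsFromTestCase(TestCelsiusToKelvin)
result = unittest.TextTestRunner(verbosity=0).run(suite)

# Run the example embedded in the docstring.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
globs = {'celsius_to_kelvin': celsius_to_kelvin}
for test in finder.find(celsius_to_kelvin, 'celsius_to_kelvin', globs=globs):
    runner.run(test)

print(result.wasSuccessful(), runner.failures, runner.tries)
```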