Tuesday, September 1, 2009

My Python Journey

Today I was reading this article from Inabow, and realized something.

There is a lot about Python that I haven't got the foggiest about.

So to rectify this problem, I am going to learn something new about the Python "universe" every day until I run out of things to learn.  I am also going to document my findings here, since teaching something is supposed to be the best way to learn it.

For my first installment, I am going to talk about the yield statement (and generators, since it's impossible to talk about one without the other), because I have yet to be able to figure out what exactly it does simply by the context that I usually see it in.

Having a basic understanding of iteration definitely helped me understand generators and yield a little faster.  My understanding is that lists, strings, files, etc. are iterable, meaning each of them can hand out an iterator that walks over its items one at a time.  Basically it means that this is possible:
myList = [1, 2, 3]
for i in myList:
    print i    # 1, 2, 3 
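
To make that a little more concrete, here is a rough sketch of what I understand the for loop to be doing behind the scenes: it asks the list for an iterator with iter(), then keeps calling next() on it until StopIteration is raised.  The names my_list, it, seen and second_pass are just mine for illustration, and I wrote print(...) with parentheses so the snippet should run on Python 2.6+ as well as Python 3.

```python
my_list = [1, 2, 3]

it = iter(my_list)          # ask the list (an iterable) for an iterator
seen = []
while True:
    try:
        value = next(it)    # fetch the next item from the iterator
    except StopIteration:   # raised once nothing is left
        break
    seen.append(value)
    print(value)

# The list itself is untouched, so iter() can hand out a fresh iterator.
second_pass = list(iter(my_list))
```

So the list is the thing that holds the data, and the iterator is the throwaway object that remembers how far along you are.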

A list comprehension builds a brand new list, which can be iterated over in exactly the same way (despite the name, myIterator here is a plain list):
myIterator = [x * x for x in range(3)]
for i in myIterator:
    print i    # 0, 1, 4

(code excerpt taken from http://bit.ly/2GOT8x, which is also a great yield tutorial)

So now we know what iteration is, but what is a generator?  Basically, since a list stores all of its values in memory, you can loop over it more than once.  After you run
for i in myIterator:
    print i

the data doesn't go anywhere.  You can loop over it again, and the data will be right there.

However, what if the list you are iterating over is huge?  And I mean by-today's-hardware-capabilities-huge?  Do you really want to keep all that data in memory?  If you are going to use it more than once, then the answer might be yes.  But if you are only going to use the data and then throw it away, what is the point of wasting memory like that?  That is where generators come in.  A simple way to make a generator sure looks familiar:
myGenerator = (x*x for x in range(3))
for i in myGenerator:
    print i #0 1 4

It's almost the same as a list comprehension, except with parentheses instead of square brackets.  The difference is that the values are produced one at a time as the loop asks for them, and once you have finished iterating, the generator is exhausted: looping over myGenerator again gives you nothing, so to get the data back you have to create a new generator and compute it all again.  Not only that, but once 1 is generated, 0 is gone.  Kaput.  As in, it ain't there no mo'.  Bad for cases where the data needs to be reused (more CPU cycles), but good for cases where the data is used once then thrown away (less memory usage).
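
Here is a small experiment of my own to back that up; treat it as a sketch rather than a benchmark.  The names squares_list, squares_gen and small_gen are made up for this example.  sys.getsizeof only measures the container object itself (not the integers inside the list), and the exact numbers will differ between Python versions, so I only compare the two sizes instead of quoting them.

```python
import sys

squares_list = [x * x for x in range(100000)]   # every value stored at once
squares_gen = (x * x for x in range(100000))    # values produced on demand

list_size = sys.getsizeof(squares_list)   # grows with the number of items
gen_size = sys.getsizeof(squares_gen)     # small, whatever the range is

small_gen = (x * x for x in range(3))
first_pass = list(small_gen)     # consumes the generator
second_pass = list(small_gen)    # nothing left the second time around

print(first_pass)                # [0, 1, 4]
print(second_pass)               # []
print(gen_size < list_size)      # True
```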

So now we get to the real meat of this long-winded shite of a post...
The Python 2.6.2 docs say this about the yield statement:
The yield statement is only used when defining a generator function, and is only used in the body of the generator function. Using a yield statement in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
Ok, cool.  Sounds good.
But what's a generator function?  We know what a generator is, so it's not too hard to figure this one out.  Here's what the docs have to say:
When a generator function is called, it returns an iterator known as a generator iterator, or more commonly, a generator.
So a generator function returns a generator.  Makes sense.  One more thing from the docs:
The body of the generator function is executed by calling the generator’s next() method repeatedly until it raises an exception.

And that, my friends, is where the yield statement comes in handy.
You see, when yield is used within a function, that function becomes a generator function, which returns a generator.  The yield statement is used almost like the return statement in a regular function, but with one caveat: yield saves the state of the function after it executes.
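
A minimal sketch of that, using a made-up generator function called count_up_to (not from the docs or from either tutorial).  I use the built-in next(gen), which is available from Python 2.6 on and calls the generator's next() method for you (it is spelled __next__() in Python 3).

```python
def count_up_to(limit):
    """Hypothetical generator function: yields 1, 2, ..., limit."""
    n = 1
    while n <= limit:
        yield n        # hand back a value and pause right here
        n += 1         # execution resumes here on the following next()

gen = count_up_to(3)   # calling it returns a generator
first = next(gen)      # 1
second = next(gen)     # 2
third = next(gen)      # 3

try:
    next(gen)          # the body has run off the end
    finished = False
except StopIteration:  # this is the exception the docs are talking about
    finished = True

print([first, second, third])   # [1, 2, 3]
print(finished)                 # True
```

A for loop does the same thing, it just catches the StopIteration for you.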

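So: calling the generator function runs none of its body and just hands back a generator.  Each call to the generator's next() method runs the body until yield outputs a value, at which point the function's state (its local variables and the spot where it stopped) is frozen inside the generator object; the following next() picks up right after that yield, and the process repeats until the body finishes.  Sounds great.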
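
To convince myself of the order in which things happen, I put together this little trace.  chatty and log are hypothetical names of my own; the point is only to record when each part of the body runs and to show that the local variable total keeps its value between yields.

```python
log = []

def chatty():
    """Hypothetical generator function that records when its body runs."""
    log.append("body started")
    total = 0
    for step in (10, 20):
        total += step                    # total survives between yields
        log.append("about to yield %d" % total)
        yield total
    log.append("body finished")

gen = chatty()
log.append("generator created")          # nothing in the body has run yet
values = list(gen)                       # list() keeps asking until StopIteration

for line in log:
    print(line)
```

"generator created" comes out first, even though chatty() was called before that line was logged, because the body only starts running once something asks for the first value.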
For a useful example of this great tool, here's a code snippet from http://bit.ly/yF87Q:
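class Permutation:
    def __init__(self, justalist):
        self._data = justalist[:]    # copy, so the caller's list is left alone
        self._sofar = []             # the partial permutation built so far
    def __iter__(self):
        return self.next()           # next() contains yield, so this returns a generator
    def next(self):
        for elem in self._data:
            if elem not in self._sofar:
                self._sofar.append(elem)
                if len(self._sofar) == len(self._data):
                    yield self._sofar[:]      # a complete permutation (copied)
                else:
                    for v in self.next():     # recurse to fill the remaining slots
                        yield v
                self._sofar.pop()             # backtrack and try the next element
a = [1,2,3]
for i in Permutation(a):
    print i    # [1, 2, 3], [1, 3, 2], [2, 1, 3], ...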
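
To check that I understood it, I rewrote the same backtracking idea as a plain generator function and compared it against itertools.permutations, which has been in the standard library since Python 2.6.  permute and its sofar argument are my own hypothetical names, not part of the snippet above.  Like the class, this sketch relies on "elem not in sofar", so it assumes the input has no duplicate elements.

```python
from itertools import permutations

def permute(items, sofar=None):
    """Hypothetical function version: yields each ordering of items as a list."""
    if sofar is None:
        sofar = []
    for elem in items:
        if elem not in sofar:
            sofar.append(elem)
            if len(sofar) == len(items):
                yield sofar[:]                   # copy, since sofar keeps changing
            else:
                for v in permute(items, sofar):  # recurse to fill the remaining slots
                    yield v
            sofar.pop()                          # backtrack

result = list(permute([1, 2, 3]))
expected = [list(p) for p in permutations([1, 2, 3])]

print(result == expected)   # True
```

Only one permutation exists at a time while the loop is running; it is the list(...) call here that collects them all into memory.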
