ALEX MARTELLI: Hello everybody, and welcome to the second part of the “Python for Programmers” talk, also known as beginner’s night around here. The URL here is where you’ll be able to find all of the slides for this talk as a PDF, so it’s easy to check. If you want to write it down, this would be a good moment.

OK, now this is a description of the audience of this talk, a bit less crowded than the one I imagined when I wrote it. But you’re experienced programmers. You know what programming means, maybe in C, maybe in Java, or maybe in Ruby. You have some modest previous exposure to Python, like we did in the first night, which is here. Unfortunately, it looks like the video of the first night of “Python for Programmers” didn’t actually make it to Google Video. There is, however, a video of another talk of mine using roughly the same set of slides, pitched very differently because I was talking mostly to an audience of my colleagues at Google. Don’t try to write this down. Use the aleax.it/bayp_ [UNINTELLIGIBLE] and you’ll find this on the second slide.

Some parts will be very fast and technical. And I apologize in advance: the formatting, the choice of names, the idioms and so on are kind of constrained by the fact that I want to fit complete examples on a single slide. So don’t imitate this style unless you have to program on slides as well. Also, a shameless self-plug: some of the examples are adapted, simplified, and without all the wonderful discussions. You should really, definitely read the book I coedited with my wife, Anna, the second edition of the Python Cookbook.

All righty, so we have built-ins and we have libraries. The built-ins are always accessible any time you use Python. And the crucial idea behind a built-in is that you should never call a special method directly on an object. I mentioned in the first part that we make the names of special methods ugly on purpose, so as to deter people from actually going and calling them. So for example, if you want to know
the absolute value of a number, you do not call x.__abs__(). You call abs(x), using the built-in abs for absolute value. Why is that important? Well, because the built-in can implement a sort of template method design pattern. It can try to see if the object has a special method for absolute value. If it doesn’t, maybe it can compare it to 0 and try to change sign if needed. It can do a lot of things in a predetermined order to get your result. So never call the special method directly.

Use any and all of this enormous list of built-in types and functions, which we’ll then proceed to present with a bit more order. They’re just in alphabetical order here. Notice that these aren’t all of them. There are many more. These are those that you really should know to be good Python programmers. And if these look like a lot, you have to consider there are many more, both functions and types, in the standard library and in third-party Python extensions. And we’ll see a lot of the built-ins, several pieces of the standard library, and some mention of third-party extensions.

So for non-built-ins, for anything you get from the standard library or a third-party extension which you have installed, you have to use either the import statement, import modulename, or the from statement, from da-da-da import modulename. I suggest very strongly, as a Python best practice, that what you have after the word “import,” whether it’s the statement or the clause of the from statement, is always a module name. Python lets you import just about anything, but don’t. It’s very confusing to the reader of your code if you import some modules, some classes, some functions, and then just use them.
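As a minimal sketch of the two pieces of advice above (use the built-in rather than the special method, and import modules rather than the names inside them): the `Temperature` class is hypothetical, invented only for illustration, while `abs` and `math.sqrt` are the real built-in and standard library function.

```python
import math  # import the module itself, then use module.name


class Temperature:
    """Hypothetical class, invented only to illustrate a special method."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __abs__(self):
        # Invoked for us by the built-in abs(); not meant to be called directly.
        return Temperature(abs(self.degrees))


t = Temperature(-5)
magnitude = abs(t)        # preferred over t.__abs__()
root = math.sqrt(16.0)    # module-qualified access to a library function
print(magnitude.degrees, root)
```

A reader who sees `math.sqrt` knows at once where the function comes from, which is the point of importing only modules.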

Just import the module, and then use the normal syntax, module name, dot, function name, to access the function within the module. If that’s impractical because a module name is very long, or because two modules from different packages conflict, you can change the name with which you refer to it with the “as something” clause, and then use the something instead of the real name. Note that I break these rules even in these examples, for the usual reason: I really want to be compact and fit a lot of stuff within every slide.

So a very old but still valid slogan for Python is “Batteries included.” The standard Python library is very rich. Probably nowadays it is not quite as rich as the Java library, but when the slogan was coined, like 10 years ago, there was no language around with a standard library as big as this. There’s so much stuff you can’t really think you can memorize it all. There are 190 top-level modules, 13 packages with 300 modules in them, plus encodings, unit tests, and then Demo and Tools, which contain a lot of extremely useful code as well. Demo and Tools need not be packaged with prebuilt distributions of Python. So something I always suggest to anybody learning Python, no matter how you choose to install it or whether it comes with your machine, like it does with Ubuntu and many other Linux distributions nowadays, or with Macs of all sorts: get the sources as well. Just the ability to access all this stuff in Demo and Tools is well worth the little inconvenience of a download.

But this is just what’s included with Python. There are other batteries, plenty of other batteries. The official URL nowadays is PyPI, which isn’t particularly catchy. The name of the thing used to be the “Cheeseshop,” in reference to a Monty Python sketch. However, it was always a very inappropriate reference, because the point of the cheese shop in the sketch is that it’s very clean but it doesn’t actually have any cheese, while PyPI does have a lot of packages, each of them one or more modules. I
noticed, the first time I prepared the very first version of this talk a few months ago (that was April), I checked and it was the day of destiny. We had exactly 2,222 packages registered on the Cheeseshop. I checked yesterday, actually, not today, and there were exactly 400 more. I like round numbers. So that’s exactly four months, exactly 400 packages more, so we’re growing at exactly 100 packages a month. So that’s easy to explain and to remember.

And unfortunately, the classification doesn’t really help much. Like, more than 1/3 are classified as software development. Well, yeah, every piece of library can help me do software development. That doesn’t really help. But nevertheless, there’s plenty of interesting stuff, including, as is typical of these repositories of third-party modules, many ways to approach a single problem, which is very unPythonic. Unfortunately, there is no central authority saying, OK, so there are these seven packages to do graph processing; we’ll pick this one. All seven are going to be there, and you’ll just have to have a look, look for reviews, give them votes, and so on.

So at a very superficial level, what are all these third-party extensions? Well, there’s a lot of graphical user interfaces, databases and wrappers for databases, a lot of computation stuff, huge lots of network-oriented and web-oriented stuff, development environments and tools, stuff for games, multimedia, scientific visualization, integration with all sorts of languages, and more, and much more. Like, for example, nowhere in here is, what about the thing that lets you control OpenOffice from a Python script, all of the OpenOffice applications from Python? Well, it doesn’t really enter any of these categories, but it’s still there, as are 2,000 more.

So what do you do with a third-party extension? You have to install it on your computer. There are several approaches. The most traditional one is, you unpack it, typically a tar.gz or zip file. It contains a setup.py file, and you just run this with

your Python and the install verb. And depending on how your machine is set up, you may need to become root or use sudo if the Python installation is placed in a directory where you can’t write. The most modern idea is eggs, not necessarily this kind, which are part of setuptools. Some people love them. Some can’t stand them. And I plead the Fifth about which my [UNINTELLIGIBLE]. For Windows, many packages come with self-installing programs, like .exes. These are very popular in the Windows world. And similarly, for the Mac, you may find [INAUDIBLE] packaged as disk images, and so on and so forth. And for Linux distributions, particularly, a lot of the most popular packages come in the various .deb or .rpm forms of the various distributions, so you can use apt-get. For the Mac, in particular, there’s a repository of prebuilt stuff. You don’t really need it, because every Mac comes with a great C compiler for free. But people often don’t install it, so you may want to use this. One way or another, you’ve got to get the extension installed.

And now, finally, we get into the meat of the library. We’ll start with the most elementary type, numbers. Python has several built-in types of number. Int and long: this is a distinction which is traditional. The int is whatever fits in your machine word, typically 32 bits or less. And the long is anything else, limited only by your available memory. The distinction will collapse in Python 3000. Every integer number will be able to grow as much as needed. It’s already kind of like that: if you ask for the int of 999 to the 999th power, obviously it doesn’t fit in 32 bits. It gives you a long. Even though you’ve asked for the int, it gives you the long anyway. Float, which is actually a double-precision floating point number, and complex. As an old Fortran programmer, I never understood why most languages forgo the wonders of complex numbers. Notation is, like, digits. If there are few, you’ll have an int. If there are too many, you’ll have a long.
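A small sketch of the "grows as needed" behavior. It assumes Python 3, where the int/long distinction described above has already collapsed into a single `int` type; under the Python 2 of this talk the large value would be reported as a `long` instead.

```python
small = 2 ** 10        # fits comfortably in a machine word
big = 999 ** 999       # far too large for 32 (or 64) bits

# bit_length() reports how many bits are needed to represent the value.
print(type(small).__name__, small.bit_length())
print(type(big).__name__, big.bit_length() > 64)
```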
You can use hexadecimal. You can use a dot to make a floating point number. And you can use the plus and the j to make a complex number. The operators are arithmetic and bitwise, more or less what you’d expect. Star-star is raise to power. Again, as an old Fortran programmer, I don’t understand why most languages don’t have an operator [INAUDIBLE] raise to power. And the // is divide with truncation, while the /, at least in Python 3000, will be without truncation. Right now, to have division without truncation between integers, you have to use the from __future__ import, which we saw last time.

Several built-ins are targeted to numbers. We’ve already mentioned abs for absolute value. Minimum and maximum: you pass them a sequence of numbers, and they give you the smallest or biggest, respectively, of course. If you should try to use min and max on a sequence of complex numbers, they will raise an exception, because there is no such thing as greater-than and smaller-than comparison on complex numbers. Pow is like the star-star, but it has a three-argument form to do exponentiation modulo something. Round applies to floats, rounding them to a certain number of digits. Sum sums the sequence.

The standard library has several modules, such as math for, like, trigonometric functions, logarithms, exponentials, and so on, and cmath for the same thing applying to complex numbers. And it also offers another type. decimal is the name of the module; Decimal, with an upper case D, is the name of the type, which is a floating point decimal. Standard floating point is binary. So basically it gives you a well-known number of digits, which may be very important, particularly if you do financial computations. And it supports exactly the same operators and built-in functions as any other kind of number. And it has some methods, like d.sqrt(), which gives you the square root as a decimal.

In the third-party world, there’s plenty of number stuff. I point out gmpy because it’s my baby. It’s a module for unbounded
precision computations. It’s got integers, rational and floating point numbers, with an unlimited number of bits. Of course, it does things in software, so it’s going to be slower.
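The operators, built-ins, and standard library modules just listed can be sketched together as follows. This assumes Python 3 (so `/` already divides without truncation) and uses only the standard library; gmpy is left out because it is a third-party install. The specific numbers are arbitrary illustrations.

```python
import cmath
import math
from decimal import Decimal

power = 2 ** 10                 # ** raises to a power
truncated = 7 // 2              # // divides with truncation
true_quotient = 7 / 2           # / keeps the fraction (Python 3 behavior)
mod_power = pow(3, 4, 5)        # three-argument pow: (3 ** 4) % 5

numbers = [3, 1, 2]
smallest, biggest, total = min(numbers), max(numbers), sum(numbers)

z = 3 + 4j                      # complex literal: the plus and the j
magnitude = abs(z)              # abs works on complex numbers too
hypotenuse = math.hypot(3, 4)   # math works on floats
phase = cmath.phase(1j)         # cmath works on complex numbers

# Complex numbers have no ordering, so min and max refuse them.
try:
    min([1 + 2j, 3 + 4j])
    ordering_failed = False
except TypeError:
    ordering_failed = True

# Decimal keeps decimal digits exact, which binary floats cannot.
exact = Decimal("0.10") + Decimal("0.20")
binary = 0.10 + 0.20
decimal_root = Decimal(2).sqrt()
```

The last three lines show why Decimal matters for financial work: the decimal sum is exactly 0.30, while the binary float sum is only close to it.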

And it’s got, again, support for all the same operators. So if you are summing two numbers, you’re going to use plus, whatever kind of numbers they may be. And it has a lot of methods and functions. And there’s plenty of other stuff that you may be interested in. Clnum essentially covers the same space as gmpy, but it’s implemented in pure Python, very nice for learning the algorithms behind infinite-precision arithmetic. Gmpy is very much faster, of course. Dmath, and there are other alternatives, [UNINTELLIGIBLE]. It’s kind of like math and cmath, but it applies to decimals, so you can do sine and cosine in decimal if you need to. Pypol, that’s polynomial as a number type. This also has rational functions. This adds yet some other kind of complicated functions.

But let me give you one very simple example. One typical thing you may want to do if you’re into number theory is compute the factorial of a number. This is sometimes thought of as an expensive operation, simply because the results seem to be a bit big. For example, the factorial of 210 is the following 399-digit number. Gmpy has a fact function to compute factorials. And how long does it take?
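A rough way to try this kind of measurement yourself, timed with the standard library's timeit module. Note the assumption: `math.factorial` stands in here for gmpy's `fact`, since gmpy is a third-party package that may not be installed, so the timing will not match the figure quoted in the talk and will vary by machine.

```python
import math
import timeit

value = math.factorial(210)
num_digits = len(str(value))      # how many decimal digits in 210!

runs = 10000                      # arbitrary repetition count for this sketch
seconds = timeit.timeit("math.factorial(210)", setup="import math", number=runs)
microseconds_per_call = seconds / runs * 1e6

print(num_digits, "digits;", round(microseconds_per_call, 2), "microseconds per call")
```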
With the timeit module, which we can import from the standard library, we can measure speed. To do a million of these computations: 29 seconds. So computing the 400-digit factorial will take you about 29 microseconds on a typical laptop of today. If you have a good machine, it will be less, of course. But for a laptop, it’s not too bad.

Another elementary type, which is extremely important, right next to numbers, is strings. Python has two types: string proper, or plain, which uses bytes, and unicode, which uses Unicode characters. And that is unfortunate, because it complicates things a bit. But it’s a legacy aspect. We started out with only single-byte character strings, and now we’ve added the unicode ones, but keeping backwards compatibility; Python is always very keen on that. The whole point of Python 3000 is that, being a major release (technically, it’s 3.0), it can and will break backwards compatibility. So strings in Python 3000 will be exclusively Unicode, like in Java, making the language much simpler.

How do you denote a string? With single or double quotes; it’s irrelevant which. Or as raw, which means escape sequences aren’t expanded. Like, \n in a normal string stands for one single character, which is end of line, like in C. 
But if you want this to mean the character \ followed by the character n, you can write it as a raw string. And the u versions of all of these give you unicode strings. You can concatenate two strings by summing them. You can multiply a string by a number to have a certain number of repetitions. You can format strings with a very rich format language, which is kind of like printf. This is rather unPythonic. In Python 3000, instead of this, we’ll have a format method on string objects, taking a simpler language.

The built-ins (those of you with a Pascal background should recognize them): chr, given an ASCII code, returns a character; ord, given a character, returns its ASCII code. Actually, ord also works on unicode characters. To produce a unicode character from its Unicode code number, you will use unichr.

Strings are read-only sequences. They’re not exactly containers. You can’t put things in there. But they’re sequences, meaning you can ask, how long are they? You can do indexing and slicing. Like with other sequences, you can loop over them. And they’ve got so many methods I’ve lost count. Capitalize returns a copy of the string with the first character turned into upper case. Center returns a string padded with blanks to be centered in a certain field.

The standard library’s full of interesting modules to deal with strings. String itself is sometimes useful. Re gives you regular expressions, very close to Perl’s. Struct lets you pack and unpack binary values into a string. cStringIO lets you use a string as if it was a file, so you can write into it and then get the value, or read lines from it, or something like that. Textwrap lets you format strings, so like flowing paragraph things. Codecs and unicodedata give a lot of preparation for

transition between unicode and byte representations, or analysis of a unicode string: what kind of characters does it contain? And stringprep essentially implements an RFC about string preparation for network travel.

And again, there’s plenty of third-party stuff. Some of it is very funny, like Anagrammer, which is a module to help you do anagrams given various possible dictionaries. It supports many languages. Up to the extremely useful, like Lupy, a Python-compatible interface to Lucene, which is a very popular open source search engine. PyNum2Word: you give it a number and it spells it out in words, which can be fun or useful. PyStemmer: you give it words, and it extracts the semantic stem of the word, so the singular form, or the base form of a verb, and things like that. And of course, a huge amount of XML processing. That’s technically text, but not all that significant.

When you make a big string from pieces, like this ball of strings here: if I could say just one thing to a new Python programmer, it is, do not ever code this, never, ever. Forget it. It doesn’t exist. Forbidden. Why not? Well, if you are into Java, you know that you don’t code this in Python for the same reason you don’t in Java. Strings are immutable. Every time you do the big += p, the old value of big gets thrown away, and the new value of big, a new string, gets constructed and bound to that name. Say you’re going character by character: the first time you allocate and copy one character, then two, then three, then four, then five, then six. Each of these is a pretty costly operation, because there’s a free and an allocate. Plus, the total amount of bytes you’re copying is 1 plus 2 plus 3 plus 4, the sum of i for i from 1 to n, which is n times n plus 1, over 2. So you’re doing a quadratic amount of work if you ever code this in Java or Python. So don’t. You just take the sequence of pieces, make it up if you need to, build the list. 
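The pattern being warned against, next to the list-and-join alternative, might look like this. The `pieces` list is made up for illustration; the slide's actual variable names beyond `big` and `p` are not in the transcript.

```python
pieces = ["spam", "eggs", "ham"]   # hypothetical input pieces

# The pattern to avoid: each += builds a brand-new string,
# copying everything accumulated so far.
big = ""
for p in pieces:
    big += p

# The recommended pattern: collect the pieces in a list, then join once.
parts = []
for p in pieces:
    parts.append(p)          # appending to a list is cheap
joined = "".join(parts)      # join is a method of the (empty) separator string

print(joined)
```

Both produce the same string; the difference is only in how much copying happens along the way as the number of pieces grows.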
Building the list is very fast. Appending to a list is constant-time. And then you join them. You use the join method of the empty string. And if you really hate this, you could use cStringIO, but my wife was horrified when I mentioned that, so I don’t think I’ll get deep into that.

Moving on from elementary types, such as numbers and strings, we have an ample choice of container types. The most elementary one, because it doesn’t have any methods or anything, is the tuple, normally denoted by parentheses, though the parentheses are generally optional. Or you could do it with a keyword, with the name of the type, tuple. And commas, which are what really denote the type. Note the need for the comma here: (23) is not a tuple. It’s an expression which is worth the number 23. The parentheses are just overloaded to mean arithmetic precedence. The comma is what makes this a tuple. And the parentheses might not be needed. If you write a = 23, with the comma at the end, a is a tuple. The fact that there are no parentheses is irrelevant. The parentheses are not syntactically necessary.

Lists are maybe clearer. They’re a mutable sequence. The name is a bit tricky for some people. They’re not linked lists. They’re a compact in-memory structure, kind of like a vector or array, if you want. And there is no need for the comma, although you can have an extra redundant comma at the end. That’s permitted, but it’s not necessary in this case. Again, you can call the type to make one, like a constructor, so to speak.

Set and frozenset. Frozenset isn’t very often used, but it’s an immutable version of the set. A set is a simple hashtable. So the items need to be hashable, which normally means immutable, like numbers and strings. And this one doesn’t have a special notation. It will in Python 3000, but not now. To make an empty one, you just call it without any argument, as you could do with list or tuple. If you want, list() means an empty list, just like []. Or you can pass it any sequence.
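The notations for tuples, lists, and sets described above can be sketched as follows. The values are arbitrary; the code runs on Python 3 and the calls shown also exist in the Python 2 of the talk.

```python
number = (23)            # just the number 23 inside parentheses
one_tuple = (23,)        # the comma is what makes it a tuple
bare_tuple = 23,         # the parentheses are optional
from_call = tuple([1, 2, 3])   # calling the type by name

trailing = [1, 2, 3,]    # a trailing comma is permitted in a list, not required
empty_list = list()      # same as []
from_string = list("abc")

empty_set = set()              # no arguments: an empty set
unique = set([1, 2, 2, 3])     # any sequence; duplicates collapse
frozen = frozenset(unique)     # immutable, so it is hashable itself
lookup = {frozen: "usable as a dict key"}
```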

Note that if we didn’t have the comma here, it would be an error, because you cannot make a set by passing it a number. You have to pass it a sequence of numbers. So in this case, we’re passing a sequence of a single number, a tuple with a single element. Or of course, we can pass a string, which is a sequence of characters.

A dict, short for dictionary, is also a hashtable, but a hashtable used to do key-to-value mapping. So it uses braces as a constant notation, a colon to separate the key from the value, and commas to list them. Or if you want to call the type, you typically pass it keyword arguments. This uses ci, a two-character string, as the key, and ao, another two-character string, as the one value corresponding to that one string. All containers support len. Yes?

AUDIENCE: Does the ci need quote marks?

ALEX MARTELLI: No. That’s a big point. This is keyword notation, so no quote marks are needed. Actually, they would be syntactically incorrect. I’ll get back to that.

Len of a container is the number of items it contains. You can always loop on a container. And you can always check if something is a member of the container with an if ... in. What this means is pretty obvious for tuples, lists, and sets. For dictionaries, both the looping and the membership testing are only about the keys. So “if 23 in dict” checks whether 23 is a key in this dict. It doesn’t go and look at the values, only at the keys. And similarly, “for x in dict” will give x all the keys in the dictionary, one after the other. The order in looping, on either a set or a dict, is just about anything it wants, because it’s a hashtable. It doesn’t bother with order.

Except for tuple, the rest have a lot of very useful methods. And sets only, and frozensets, also have operators, because it’s sometimes a more natural way to use them. For example, to do the union of two sets, you can use a vertical bar, which, used on integers, would be the bitwise or; used on sets, it’s union. How do you make a list of lists? 
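Before moving on, the set and dict behavior just described can be sketched like this. The `ci`/`ao` pair comes from the talk; the other values are arbitrary illustrations.

```python
single = set((23,))       # a one-element tuple is a sequence, so this works
letters = set("ciao")     # a string is a sequence of characters

# Passing a bare number is an error: 23 is not a sequence.
try:
    set(23)
    bare_number_ok = True
except TypeError:
    bare_number_ok = False

d = {"ci": "ao"}          # brace notation: the key needs its quotes
same = dict(ci="ao")      # keyword notation: no quotes around ci

key_found = "ci" in d     # membership looks only at keys
value_found = "ao" in d   # ... so a value is not "in" the dict
keys_seen = [k for k in d]    # looping also yields keys

union = {1, 2} | {2, 3}   # on sets, | is union
bit_or = 5 | 3            # on integers, | is bitwise or
```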
This is a little recipe, pretty important. You don’t do this. This would take the list containing the single item, 0, and repeat it NC times, so that gives you one list with NC 0’s. And so far, so good. But then if you make a list of that and multiply it by NR, you get NR times the same list, which is hardly ever what you want. With this, the list comprehension notation, you get a new list for every row. You don’t need it both ways, so if you don’t mind the asymmetry, this is also perfectly acceptable.

Another trick which is very important: if you want to make a list of words containing no spaces (that happens very often), this is the canonical way. But note that my favorite way, which is to just have the string and split it, is a bit shorter, which is very important when you’re fitting code on slides. Actually, I find this more readable because it has less punctuation. This form has the brackets, a lot of quotes, the commas. I really like this. So much cleaner.

Now, tuples: as I mentioned, you don’t really need to have parentheses, just commas. And one thing they’re useful for is just to group a set of fields. In this case, firstname, lastname, year. You can unpack them by assignment. The advantage of tuples over lists is mostly that they can be elements of sets, keys in dictionaries, and so on, because they’re an immutable container.

To make a dictionary, this is the standard form. But if all keys are valid identifiers, you can use the keyword form. I personally find it more readable, but again, that’s only me. It’s got less punctuation, which I like. If you want to use attribute syntax for access: to access this, it would have to be d['zip']. A little bit too much punctuation. If you use this trick from the cookbook, you can then access b.zip. The only problem is that all the keys need to be valid identifiers. So for example, if one of your keys is the word “print,” you

cannot do this, because print is a keyword in Python, so you cannot use it as an identifier.

An operation you often want to do is not just go and get the value, which will give you an exception if it’s not there, but get it if present and get something else if the key’s not there. Dictionaries support this directly. The get method of a dictionary takes the key and a default value, and returns either the value corresponding to this key, or the default value if the key is not in the dictionary. For a list, you must implement it with a small function. And there are two approaches. One is just try doing it, just try returning the ith element. If this gives an IndexError, well, the index is outside the bounds of the list, and then and only then return the default. This is very elegant and compact. Unfortunately, it’s really only acceptable if either you don’t care about speed, which is in some cases a possibility, or almost always the index will be OK and it’s only going to raise the exception in a few cases, because raising exceptions and catching them isn’t exactly fast. This alternative is less elegant, takes more reading and so on. But if you have a lot of cases in which the index will be out of bounds, this will be much faster. Note that the index can go from negative len, included, up to positive len, excluded, because the indexes go from 0 to len minus 1 counting from left to right, and from minus 1 to minus len when you’re counting from the right.

So, to copy objects, and you can copy just about anything, you can import the standard library module copy and just go copy.copy. That’s a shallow copy. It’s a copy of the container, but all the items contained are identical, which doesn’t matter if they’re immutable, like numbers or strings, but sometimes could be confusing if you’re basically sharing two references to the same, for example, set. For those rare cases where you really want a complete duplication of the graph of objects contained and so on, you will use deepcopy. Be warned, it 
will take a lot more memory and it will be much slower. Another possibility is, if you know exactly what type you want in return: copy.copy goes to quite some trouble to find out exactly what type x is, and give you another x. If you know you want a list, it’s much simpler to call list of something. If you know you want a set, set(aset); if you know you want a dict, dict(adict); if you know you want a tuple, tuple(atuple). I’m sure you can generalize from here. In this case, it doesn’t really matter if what you’re passing is of that type. It guarantees the type of the return value, because it’s like a constructor. Some people like a weird syntax, which Python happens to support only for lists and tuples, which is [:]. I don’t think it makes any sense to use it ever, but I warn you because you may see it in existing Python code. I think calling list of the other list is so much more readable and pronounceable. Just imagine reading your code over the phone. Also, you can call all the types without any argument to make a new empty container. That’s maybe not as important because, to be honest, [] is a perfectly acceptable way to indicate a new empty list, but this is OK, too.

So strings, tuples, and lists are sequences. Dicts and sets are not, because essentially they’re not really order sensitive. They’re in any which way. The order doesn’t really matter. But strings, tuples, and lists are in a certain specific order, which is exactly the order in which you build them, and so they’re sequences. Every sequence can be repeated and concatenated, as I mentioned earlier for strings. They can all be indexed, and that also goes for dicts. But in the case of sequences, the index is always an integer. It can be 0 to len minus 1 as a positive index, or negative to index from the right. They can also be sliced. You can pass two indexes with a colon, or three indexes with two colons. And this basically means from the first one included to the last one excluded. And in the three-index form, you also 
give the step at which indexing must proceed. So for example, 'ciao'[2] is 'a': 0, 1, 2.
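Pulling together a few recipes from the preceding discussion (the list of lists, get-with-default for lists, and shallow versus deep copies), here is a hedged reconstruction. The slides are not part of the transcript, so the function names `list_get_try` and `list_get_check` and the sample values are assumptions; `NR` and `NC` are the names used in the talk.

```python
import copy

NR, NC = 2, 3   # numbers of rows and columns (values chosen arbitrarily)

# Tempting but wrong: the outer * repeats references to ONE inner list.
shared = [[0] * NC] * NR
shared[0][0] = 1            # the change shows up in every row

# List comprehension: a fresh inner list for every row.
grid = [[0] * NC for _ in range(NR)]
grid[0][0] = 1              # affects only the first row


def list_get_try(lst, i, default=None):
    """Optimistic version: fine when i is almost always a valid index."""
    try:
        return lst[i]
    except IndexError:
        return default


def list_get_check(lst, i, default=None):
    """Checking version: valid indexes run from -len (included) to len (excluded)."""
    if -len(lst) <= i < len(lst):
        return lst[i]
    return default


original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, same inner lists
deep = copy.deepcopy(original)     # inner lists duplicated as well
original[0].append(99)             # visible through shallow, not through deep
```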

Indexes always start from 0. 'ciao'[3:1:-1]: let’s work it out together. What’s at 3? It’s 'o'. It goes by a step of minus 1, so backwards, up to 1 excluded. So 'oa', and that’s the result. The fact that the first bound is always included and the last bound is always excluded is one of the best principles of programming. I met it first in the late ’70s, early ’80s, in Andrew Koenig’s book, C Traps and Pitfalls. And I really, really suggest you read it, because it’s a great piece of advice. Python does its best to almost always impose it on you.

Lists are mutable sequences, so you can assign to an indexing or a slice. The assignment to a slice can change the length: on the left of the equal sign I can have a three-element slice, on the right a seven-element sequence. It will stretch the list to fit things in. That doesn’t apply to extended slicing, the form with a step. That needs to keep exactly the same length. And as I mentioned, that doesn’t apply to dicts and sets.

So let’s see some examples of these. This is looping over the indexes and items of something iterable. I’ve seen many people code, because they come from languages where loops are mostly things for integers, for i in range of len of alist, item equals alist of i, blah, blah, blah. This is not good. If you need the index, then loop over the enumerate of the list, which yields the pairs index, value; index, value; index, value. And then you have the value, for example, to check. And you have the index, for example, if you need to assign back there. A good alternative, of course, is a list comprehension. Just build the new list. In this case, we could use the if/else ternary construct of Python 2.5. I don’t find it very readable, to be honest. But I guess it’s only been a few years, so I will eventually get used to it. It’s: what happens if the condition is satisfied, if, the condition, else, what happens if the condition is not satisfied. 
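The slicing rules and the enumerate idiom above, as a small sketch. The `'ciao'` examples are the ones from the talk; the lists and the "cap at 10" rule are assumptions made up for illustration, since the slide's actual condition is not in the transcript.

```python
word = "ciao"
third = word[2]             # indexes start at 0: c, i, a
backwards = word[3:1:-1]    # from 3 included, stepping -1, to 1 excluded
middle = word[1:3]          # first bound included, last bound excluded

numbers = [0, 1, 2, 3, 4, 5]
numbers[1:4] = ["a", "b", "c", "d", "e", "f", "g"]   # 3 items out, 7 in: list stretches

stepped = [0, 1, 2, 3]
stepped[::2] = ["x", "y"]   # extended slice: replacement must have the same length
try:
    stepped[::2] = ["only one"]
    length_mismatch_ok = True
except ValueError:
    length_mismatch_ok = False

# Loop with enumerate when you need both the value (to check it)
# and the index (to assign back).
values = [5, 12, 7, 30]
for i, item in enumerate(values):
    if item > 10:
        values[i] = 10

# Alternative: build a new list with the if/else conditional expression.
capped = [10 if item > 10 else item for item in [5, 12, 7, 30]]
```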
All containers are iterable. Other things are iterable, too: genexps, generators, all stuff we saw last time. And the built-ins are defined to work on iterables. So you don’t necessarily have to know what kind of iterable you’re working on. The built-in will deal with that, just as you could if you did a “for item in whatever iterable this is, colon.” I already mentioned most of the constructors. There’s frozenset, which I just barely mentioned. Range, to make a list that goes, essentially, like a slice, from something included to something excluded by some step. Xrange is almost like range but not quite, because it’s not a full-fledged list. It’s something you can only use as an iterable.

There are many accumulators, which take an iterable and boil it down to a value. For example, all takes an iterable and checks the truth of its items. We mentioned the concept of truth testing last time. If every single item in the iterable is true, then all of the iterable is true. If any, even one, is false, then all immediately terminates and it’s false. So for example, you want to check that every item in our list is greater than 7: all, open paren, item greater than 7 for item in list, close paren. This will return true if and only if every single thing in the list is greater than 7. Similarly, any returns true if any of the items in the iterable is true. Otherwise, as soon as it finds a false one, it returns false. I’m sorry, vice versa. As soon as it finds a true one, it returns true. Otherwise, if it gets to the end and none was true, then any is false. So any of an empty sequence is... who thinks it should be true? Who thinks it should be false? 99%. Yes, it’s false. There isn’t any, so no, false.

AUDIENCE: It can’t be true if there’s nothing true.

ALEX MARTELLI: There’s nothing.

Is there any policeman taller than seven feet? None? No, empty sequence. So no, false, obviously. All, vice versa: what should all be for an empty sequence? Who thinks it should be true? Who thinks it should be false? OK, a bit more confusion here. Are all policemen with green skin taller than eight feet? Well, there aren’t any policemen with green skin. That’s the assumption. So can you find a single counter-example, a policeman with green skin that is... No, you can’t. So all of an empty sequence is true.

So len, we already mentioned, is the length. Max and min are the highest and lowest. And sum is a summation. Be careful about sum. Yes?

AUDIENCE: I have a question on len. Does it calculate the len of a sequence? Or does it remember the len?

ALEX MARTELLI: It doesn’t actually work on every iterable. It only works on iterables that expose the special method __len__.

AUDIENCE: Then it...

ALEX MARTELLI: It just asks the sequence, how long are you?

AUDIENCE: [UNINTELLIGIBLE]?

ALEX MARTELLI: The sequence is supposed to... Yes. The sequence is supposed to...

AUDIENCE: [UNINTELLIGIBLE]?

ALEX MARTELLI: If the sequence doesn’t cooperate, then you get an exception.

AUDIENCE: I was asking because I was curious about your list get methods. You wrote two of them, and one of them called len twice. And yet, it was faster than the try/except.

ALEX MARTELLI: Oh yeah. Calling a simple built-in that just calls the special method on the object is very fast. It takes nothing. Yes?

AUDIENCE: Is there a perfect hash, like frozendict?

ALEX MARTELLI: No, there isn’t an immutable version of dict, not in the standard library or built-ins. OK.

So sum is intended to sum a list of numbers, or a sequence of numbers. I’ve seen people use it to sum a lot of lists. That is not good, for the same reason for which it is not good to make a big string out of small pieces by summing. You should use extend in the case of lists. 
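The accumulators discussed here, including the empty-sequence cases from the show of hands, can be checked with a short sketch. The sample lists are arbitrary illustrations.

```python
items = [8, 9, 12]
every_above_7 = all(item > 7 for item in items)    # every single item passes
some_above_10 = any(item > 10 for item in items)   # at least one item passes

all_of_empty = all([])   # no counter-example can exist
any_of_empty = any([])   # there isn't any

total = sum([1, 2, 3])   # sum is meant for numbers; it starts from 0

# To combine many lists, extend one list rather than summing them.
lists = [[1, 2], [3], [4, 5]]
flat = []
for sub in lists:
    flat.extend(sub)

# len only works on objects that expose __len__; a generator does not.
try:
    len(item for item in items)
    generator_has_len = True
except TypeError:
    generator_has_len = False
```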
And you should use join in the case of strings, and so on. Sum is really intended for a sequence of numbers. Indeed, if you don’t pass it an initial value, it chooses 0. So it works fine for a sequence of numbers, and that’s all it should be used for. OK, I’ve already mentioned enumerate. Iter simply takes an iterable and returns an iterator. Basically, you can make multiple iterators on the same list, for example, by repeatedly calling iter, and, on the same underlying list, they will advance independently. Map is a bit complicated and kind of out of– But it basically calls a function on every item of a sequence. Reversed gives you the reversed sequence. Sorted gives you the sorted sequence. Of course, the items must be comparable. And zip takes two or more sequences and gives you the sequence of tuples, one per– So here’s a few examples. A frequent task is to invert a dictionary. So as long as you have a dictionary with values which are also hashable and are unique– there’s no duplicate value– this is a simple way to invert a dict. The iteritems method lets you iterate one after the other on the key-value pairs. So we loop on key, value. And we simply want value and key swapped. Of course, if the values are not unique, then this basically associates to the value an arbitrary one of the many keys mapping to it. That’s hardly ever what you want. And you can do it another way, like build a dict of lists and so on, but that’s not exactly inversion, so this is inversion. Another possibility– Yes? AUDIENCE: Why would you use iteritems instead of just items? ALEX MARTELLI: Items returns a list.
Building a list only to iterate on it is wasted effort. If the dictionary is very big, the amount of memory that it needs to allocate to build the items list can be proportionally big. In Python 3.0, a crucial simplification, all of these methods will return iterables, kind of looking like a set, [UNINTELLIGIBLE], rather than a list. Originally in Python, a list was basically the only type of sequence, and so it was very natural to use that. OK, this basically builds a dictionary from the list. So

that for any value, we assume all the values are hashable and therefore acceptable as keys, it gives you the index. So this is very useful if you happen to do a lot of– You would normally do a lot of this list dot index something, dot index something else. Every time you call index on a list, it has to basically start from the beginning, and scan until it finds it. In this way, as long as there are no duplicates, you basically build it once and for all. If there is a duplicate value, then the later index will prevail, because this goes from left to right. If you want the earlier one to prevail, as it would in the method index, you need reversed. So you need to reverse the list. And that’s not trivial. Yes? AUDIENCE: Does reversed return a generator or a new– ALEX MARTELLI: Reversed returns an iterator. Yes, not a generator. But it takes a sequence, not a general iterator. AUDIENCE: And sorted as well? ALEX MARTELLI: No, sorted would need– sorted can take anything, any iterable, and return a list. Because there’s no way to sort anything until you’ve seen all of it. You cannot emit the first item until you’ve seen all of them, because it could be that the last one is going to be first, according to the gospels. And this is the way to build a dictionary from the keys and values using only built-ins. Zip takes the keys and values and makes the sequence of pairs, a list, actually. And dict takes the list of pairs and makes them into a dictionary. We start with a dictionary, and we want a list of the keys in the dictionary, each of them with its value. So sorted(d) takes the dictionary, meaning the keys, of course, and returns the sorted list.
Then we loop over it, and we build the list of tuples. So there’s all sorts of manipulation of this kind you can do. And the cookbook has an entire chapter called “The Shortcuts,” which explores all of them. And I just picked four that I consider particularly significant. Sorting, of course, is very important. This woman is sorting coffee beans. AUDIENCE: So on the review slides– ALEX MARTELLI: Yes. AUDIENCE: The sorted dict, could you, in most cases, do sorted [UNINTELLIGIBLE] items? ALEX MARTELLI: I don’t remember ever doing so. It wouldn’t make any difference, because– Well, it wouldn’t make any difference in any sensible case. It just [UNINTELLIGIBLE] it. So comparison is lexicographic. So if you sort a list of tuples, and there are never any two tuples where the first item is the same, then it’s just wasted effort to consider the second one. And in a dict, keys are unique, by definition. So there cannot ever be any consideration given to the further item of the tuple. AUDIENCE: So it should be equivalent without the– So you’re not doing the extra list comprehension on the outside. Or if you just do d.items, then that’s sorted. ALEX MARTELLI: Oh yeah, that’s another possibility. This is the fastest, very simply. So that’s why I give it. At least it was the fastest last time I measured. It could be something else has been optimized recently for a sufficiently big dictionary. So this is a very important feature of sorting, the key= optional parameter. If you want some sequence sorted on anything except the natural lexicographic comparison of elements, you pass the key extraction function to either the sort method or the sorted built-in with key=. So in this case, the sort– which is stable, anyway– will happen on this modified version.
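The four dictionary shortcuts and the key= parameter discussed above can be sketched roughly as follows. This is a reconstruction in Python 3 syntax, not the slide code: the talk's Python 2 version would loop on iteritems, and the sample data (d, seq, names) is invented for illustration.

```python
d = {"a": 1, "b": 2, "c": 3}   # hypothetical dict with unique, hashable values
seq = ["x", "y", "x"]          # hypothetical list with one duplicate
names = ["zebra", "Aardvark", "aardvark", "Yak"]  # hypothetical strings

# 1. Invert a dict: swap each key with its value.
inverse = dict((v, k) for k, v in d.items())

# 2. Map each list item to its index. Going through the pairs in
#    reverse makes the earliest index win, as list.index would.
index_of = dict((item, i) for i, item in reversed(list(enumerate(seq))))

# 3. Build a dict from parallel keys and values.
paired = dict(zip(["a", "b", "c"], [1, 2, 3]))

# 4. List of (key, value) pairs in sorted key order.
pairs = [(k, d[k]) for k in sorted(d)]

# key=: compare the lower-cased strings, but return the originals.
# The sort is stable, so equal keys keep their original order.
by_lower = sorted(names, key=str.lower)
```

In the second shortcut, enumerate has to be turned into a list before reversing because reversed needs a sequence rather than a general iterator.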

This is basically a precooked and highly simplified and extremely much faster version of the pattern which I used to call DSU, decorate-sort-undecorate. But you don’t need to know that. Simply, this prints all the strings in their original casing, but they’re sorted in a case-insensitive way. And the sort is stable: if you have a lower-case-a animal and an upper-case-A animal, whichever one comes first in the sequence xs will come first in the output. And this is the classic quicksort in three lines, taken from Haskell. It’s a divertissement, because Python’s own sort is much better. But it’s fun to see, and it’s the simplest way to teach quicksort. Essentially, quicksort means that you recursively quicksort everything that is smaller than the [UNINTELLIGIBLE], then the [UNINTELLIGIBLE], and then recursively sort everything that is larger than the [UNINTELLIGIBLE]. And this is quicksort. It’s kind of more elegant in Haskell. And of course, it’s only three lines if you have a wide enough screen, rather than these 30 characters I can fit here with a reasonable size font. There are several other containers in the standard library. Array.array is a very memory-compact container for homogeneous arrays of elementary types, typically numbers or characters. You don’t really use it much, but sometimes it can come in handy. Normally, you will go for third-party stuff if you really want to do big stuff with arrays of numbers. Module collections currently has only two collections, but they are really, really important: deque, a double-ended queue, and defaultdict, which is a kind of dictionary which, when accessed with an absent key, creates a value for it. Upper case Queue.Queue is a thread-safe first-in-first-out queue. We’ll see it when we mention threading briefly. This is a mixin class for multiple inheritance to make a skeleton mapping into a rich mapping. Shelve is a persistent dictionary with some limitations. Weakref, there’s two things I really suggest as opposed to the elementary weakref things,
which are WeakKeyDictionary and WeakValueDictionary. The references are weak in that they don’t keep an object alive. And I normally end up using the WeakValueDictionary. WeakValueDictionary, instead of keeping its values alive, if the value gets garbage collected, the whole entry in the dictionary goes away. WeakKeyDictionary is the same, but only works for hashable objects because it keeps– So for example, if you’re making a cache or something that needs to organize objects as long as they exist, but must not get in the way and keep objects alive when they should die, look into that. I also mention some support, like copy, particularly the deepcopy function. Bisect is for searching into a sorted list. Heapq is for arranging a list into a heap or priority queue. There are many ways to do persistence. And then there’s the absolutely precious itertools, which we will get into in more detail than just about any other module. So this is how you use defaultdict. You pass it a callable, typically a type, which can be called without any argument to provide an initial value for any key. So now, to count something, which is the data structure normally known as a bag or multiset, you simply add 1 to b[k]. But what if b[k] wasn’t there yet? Well, it gets automatically created by calling int without arguments, which of course gives a 0. So this is the way to count stuff in Python. And this is how we print it. I’m using as the key b.get, so the value is the key. So I’m printing, and I’m also asking sort to reverse. So I’m printing first those items which appear more often, and then in decreasing frequency. So this is basically a histogram. Count a histogram in four lines, not too bad. If you want to union, add, essentially, bags, this is how you do it. You make a copy of one, and item by item add the other. Note that we need to loop on the keys that are in the other bag, so they get added to 0, and the keys that are in the bag–
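The counting, histogram, and union steps just described can be sketched as follows. This is a minimal Python 3 reconstruction, not the slide code; the word list and the helper name bag_union are hypothetical.

```python
from collections import defaultdict

words = "the cat and the dog and the bird".split()  # hypothetical data

# Count occurrences: a missing key is created by calling int(), i.e. 0.
bag = defaultdict(int)
for word in words:
    bag[word] += 1

# Histogram: keys sorted by their count, most frequent first.
histogram = [(k, bag[k]) for k in sorted(bag, key=bag.get, reverse=True)]

def bag_union(b1, b2):
    """Return a new bag holding the summed counts of b1 and b2."""
    result = defaultdict(int, b1)   # start from a copy of the first bag
    for k in b2:                    # loop on the keys of the other bag
        result[k] += b2[k]          # absent keys start from 0
    return result

other = defaultdict(int, {"cat": 2, "fish": 1})
merged = bag_union(bag, other)
```

Looping over the keys of b2 rather than indexing it with arbitrary keys matters: indexing a defaultdict with an absent key would silently insert it. Later Python versions also ship collections.Counter, which covers the same ground.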

For intersection, you want a new empty bag, and basically, loop on one of the bags, and if the key’s also in the other bag, then take the minimum. You could do it without all of these precautions, but then 0’s would be inserted in one of the bags. You have to be kind of careful when you make a defaultdict not to index into it with keys that may not be there. Otherwise, it grows, because every time you index it with a previously absent key, it gets a new item. This is how to build an index of a text file. In this case, I’m using set as the built-in type, because, basically, what I want to do is connect to each word a set of line numbers in which that word is found. So I use the with statement, which requires a from future import in Python 2.5. It will be standard in 2.6. Because this way, the file gets automatically closed when I’m out of that statement, which is very handy. I enumerate the file so I get the line numbers. I split the line to get the words. And finally, I simply add this line number to whatever was previously there for that word. Of course, if nothing was there, then an empty set is created first, and then added to. So this has basically created my index. It’s not a very selective index, because every word goes there. And of course, it’s kind of very rudimentary. I haven’t bothered lower-casing the word, eliminating punctuation. So hello exclamation mark is one word, and so on, and so forth. But it’s just to show the idea. And then to display by alphabetically-ordered word, you sort it and loop on the sorting, and you also sort the index so you don’t print the pages in random order but in nicely increasing order. Note, of course, that I use a comma at the end of the print, so as to continue the line. And so I have to have a separate print when I’m done so the line gets terminated. Itertools is one of my favorite modules. It’s a very small but powerful collection of very fast, powerful, and general building tools for iteration. There are ones that build infinite iterators,
meaning ones you have to break out of somehow. Count starts with a number and keeps increasing by one, and gives the next one forever. Cycle, you give it a set of things, and it returns the first, the second, the third, the fourth, and then loops from the start, and keeps repeating forever. Repeat simply repeats whatever object you’ve given, either up to n times or forever. You can combine iterators. Chain, you give it some iterators and it does all the first and all the second and all the third. Izip and imap are like zip and map, but without having to build stuff in memory, so you can actually work with infinite sequences. And then there are filters. They take an iterator and only return some of it. You can filter by predicate: when a predicate is true, when a predicate is false, as long as a predicate is true. And drop the part in which the predicate is true and then return the rest. You can use indices, slice the iterator. Take the first four items. Take items from the third to the 17th. Take items from the third to the 100th every seven, like any other slice. Starmap, which is kind of obscure. Tee just makes replications. So you can tee an iterator, advance on one of the tees, and the other one remembers the place. It’s like a bookmark, so then you can go back and loop on it again. And groupby, which, given one iterator which tends to be sorted on something, gives you an iterator of iterators which have that something constant. We’ll see some examples. This is, I think, a reasonably cool recipe from the cookbook. It generates all primes by one of the most ancient algorithms known, the Sieve of Eratosthenes. Now, most people believe that the Sieve of Eratosthenes is an algorithm to generate all the primes up to n, and you have to give it n in advance. This is not true. This is a Sieve of Eratosthenes that just keeps going. Every time you call it– it’s a generator, of course– it yields the next prime. How does it do that? Well, by keeping a dictionary which maps composite numbers

that are to come in the loop to their first prime factors. So basically, it just goes from 2 onwards– 2, 3, 4, 5, 6, 7, that’s what itertools.count does– trying to pop the entry from the dictionary. If there was no entry in the dictionary, then since this dictionary maps all the composites to their first prime factor, well, it was not a composite. Therefore, it was a prime, and so we yield it. And then we record in the dictionary the first composite to exclude. What is the smallest composite number whose smallest prime factor is q? q squared. It has to be, because if it was q times x for any x, either x is smaller than q, which violates one condition, or else it’s not the smallest composite. So we only need to record one composite per prime we’ve found. What if q was composite? Well, then we need to get the next interesting composite into the dictionary, at index x. And basically, we do that by incrementing by this p factor every time, until we find one that wasn’t in the dictionary yet. This may be a bit subtle, so let’s add two print statements. This is what we print when we’ve found a prime. And this is what we print when we’ve found a composite. In each case, it’s what was the index in the loop, and what is the state of the dictionary at this time. This gives you a good idea of how the thing works. I skipped the first couple because, by the time we look at 3, which is prime, then we have, well, 9 corresponds to 3, and 4 corresponds to 2. Those are the squares that we record first. Then we look at 4. Well, 4 is composite. We know because we found it here. And so we’ve removed it. And so we increment it by 2. And the next composite to exclude is 6. And then we look at 5. 5 is what we’re yielding, so the next prime after 3. Again, we do the next. So now what do we do with 5?
We record 25, the smallest prime, blah, blah, the smallest composite, blah, blah. And then look at 6. Well, 6 was there, so 8 goes next as the next composite number to exclude whose smallest prime factor is 2. And so 7, great, that’s prime. And again, I’ll let you ponder through this and see why this happens at each step before we finally get 11. It’s a very cool trick. And so now that we have the way to generate all primes, what if we only want some? Well, this is something you should never do, but I just needed to squeeze space. And itertools is a long word, so I wrote– You could have import itertools as i. Don’t do it unless you have to program on slides. And I also need a function to check if something is less than something else. It needs to be a callable. And I don’t really like using lambda so much, so I gave it the name lt for less than. So let me get all primes such that they’re greater than 100 and less than 130. Well, I want to read from the inside out. I call eratosthenes to generate all primes. I drop them while they’re less than 100. And then I take them while they’re less than 130. And so I get this. What if we want seven primes that are greater than 200?
Well, from the inside, make all primes, drop them while they are smaller than– I’m sorry, this should be 200, there’s a typo here– and then slice the first seven of this. It’s a very high abstraction way to program. It’s been compared to functional programming. And indeed, some of these, like takewhile and dropwhile, are terms you’ll typically find in SML or Haskell. You reason at a very high level of abstraction. The cool thing about Python is that, given in good part, of course, the extreme care with which itertools has been optimized, the higher your level of abstraction, the faster your program will be. Programs using these become really– The main author of itertools, Raymond Hettinger, has been extremely helpful in writing the cookbook, and he’s got this great introduction about why it’s so fast and why you should really use it all the time. OK, this is, by the way, this pile of primes, which you’re welcome to look up and explore with these tools.
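The unbounded sieve and the two selections described above can be sketched as follows. This is a reconstruction from the spoken description in Python 3 syntax, not a copy of the slide: the dictionary name composites is an assumption, and lt is written here as a small function returning a predicate, which may differ from the slide's version.

```python
import itertools

def eratosthenes():
    """Yield the primes forever, sieving incrementally."""
    composites = {}                    # upcoming composite -> a prime factor
    for q in itertools.count(2):
        p = composites.pop(q, None)
        if p is None:
            yield q                    # q was never marked, so it is prime
            composites[q * q] = q      # first composite to exclude for q
        else:
            x = q + p                  # next multiple of p...
            while x in composites:     # ...that is not recorded yet
                x += p
            composites[x] = p

def lt(limit):
    """Return a predicate testing 'less than limit' (hypothetical helper)."""
    return lambda n: n < limit

# Read from the inside out: all primes, drop those below 100,
# then take those below 130.
primes_100_130 = list(itertools.takewhile(
    lt(130), itertools.dropwhile(lt(100), eratosthenes())))

# Seven primes greater than 200: drop those below 200, slice seven.
seven_after_200 = list(itertools.islice(
    itertools.dropwhile(lt(200), eratosthenes()), 7))
```

Because every stage is lazy, the infinite generator is only advanced as far as the outermost takewhile or islice needs.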

This is another way– I’m sorry, this should be green, ifilter. So anyway, this is a function which I’m building with a closure for simplicity. Last digit, of n, returns a function that checks whether its argument’s last digit is n, basically by taking modulo 10 and checking for equality to n. So I want primes that are between 333 and 555, and whose last digit is 7. OK, this is one of the subtlest of itertools. It’s groupby. So what does groupby do? Well, first of all, you tell it, OK, what’s the key? What is the key of each record I’m handling? In this case, the key is whether a string is space or not. Isspace is a method of strings; it returns true for non-empty strings which are all whitespace. So group these lines by the concept of whether it’s a space. So basically, this gives me a group on which I’m iterating for each value, with g being the value of the key, and lines the iterator over all consecutive items that have that value for the key. Then when the key changes, that’s a new group. So to make paragraphs from lines, I’m grouping by– I’m assuming paragraphs are groups of lines separated by empty lines. I group the lines by there being space or not. And I get g, lns. If not g– so if the value of the key is not string is space, so the strings have content– then I join the lines with the string join method. Otherwise, I’ve just been given a bunch of empty lines, so I go to the next group. So this is an example to understand better what is going on. Take these lines. I’m using the splitlines method, because this way I can keep the [UNINTELLIGIBLE] and keep the end-of-lines most easily. So I split the lines by calling splitlines of 1. So it’s a list: a end of line, b end of line, c end of line, and so on, with two newlines, one after the other, meaning an empty line in between, and therefore a paragraph break. And I call itertools.groupby on this. And to make it visible, I make a list out of it. It’s only three things, anyway. And see what it will be. False, some
iterator, true, some iterator, false, some iterator. What does it mean? Well, I first have a group of lines which are not all whitespace, and then one of lines which are all whitespace, and then one of lines which aren’t all whitespace, and so on. So every time the key value changes, I get a new group, which is something good. You find yourself programming this often if you don’t reason at a sufficiently high level of abstraction. You don’t have to, but it’s really, really much more fun. OK, back to third party. There’s plenty of containers. I singled out tables and graphs. But for example, Ordered Dictionary, something that’s– I want a dictionary, but it needs to keep track of the order in which items have been put in it. OK, so Ordered Dictionary does that. Oh, I want a dictionary which is always unique and keeps track of value-key correspondence. I don’t want to have to build it. Well, two_way_dict does that. I need an array of a billion bits. They must only take up one bit per. So use BitVector and BitBuffer. Or there’s also BitPacket, which is a slight variation. Anyway, I want to iterate on arrays. Use arrayterator. And so on and so forth. I want an LRU, a least recently used cache. Use lrucache. The containers are maybe the most interesting and well-fed groups of stuff. Something I just couldn’t escape, given that I wanted my wife to lend me a hand on checking out my slides, was, of course there’s a great way to deal with times, dates, and calendars. You’ve got several standard library modules, and many

third-party ones for date and time, and for calendar support. This in particular is Python support for the Maya calendar, which I think is extremely cool. It’s not entirely complete. It’s very old, from ’99, but it’s probably worth studying if you are at all interested in Mayan culture. But seriously, the cookbook has an entire chapter on this stuff. You basically want to use datetime, sometimes calendar, dateutil for very advanced manipulation, pytz for timezone support. And of course, if you want to support Gcal and make it into Python calendars and vice versa, you can use the GoogleCalendar. If you want to use iCal, you can use iCalendar. And so on, and so forth. Just to check to see if I’m totally– does anybody understand why I have this figure in a slide with this title? If anybody is willing to make a guess, please raise their hand. Yesterday, Today and Tomorrow was an absolutely great Italian movie with Sophia Loren and Marcello Mastroianni. And I carefully picked one that wouldn’t have Sophia Loren or Marcello Mastroianni, because I thought that would make it too easy. OK, sorry. Too obscure. All right, so this is how you tell the date of today. This is how you tell the date of yesterday. You basically need to– and tomorrow– add or subtract not just the number 1, which would be kind of, 1 what? It would be kind of ambiguous. One day, one month, one year? You have to make a timedelta, saying explicitly you mean days, or months, or whatever. And this is the way to compute the date of last Friday, which, again, is in the cookbook. In the cookbook, actually, the specification was very subtle. It’s last Friday, but if today is already Friday, then today. And that, I just couldn’t fit on the slide, so I fixed the code. Yes?
AUDIENCE: So the syntax for this, it would seem to me, would be a little bit simpler if you would say, today minus 1.day. So therefore, you would have 1.day, [INAUDIBLE] ALEX MARTELLI: Well, the syntax for this particular use might be simpler, but the syntax of the language would have to be totally perverted to allow that. AUDIENCE: So unlike Ruby, that has the ability for everything to be an object, that’s not the case in Python. ALEX MARTELLI: Oh, everything is an object, but you can’t change the methods of built-in objects. So with Ruby, if you want to make 2 plus 2 equal 22, you can. With Python, you can’t. You know what? I like Python. [LAUGHTER] ALEX MARTELLI: And similarly, 1 dot word, in Python, the parser says, OK, this is a floating point number followed by an identifier, which then is a syntax error. Because 1 point is a floating point number. It’s 1.0. Same thing. So date of last Friday: for concision, to avoid repeating it, I make once and for all the one-day timedelta. And I start with lastf at yesterday, so today minus one day, as I showed here. And basically, I loop until I get to Friday. And Friday as a constant is defined in calendar. You could hardcode it, I think it’s a 5, but that would make it completely unreadable. Who remembers whether they start from 0 or from 1, and whether they start from Sunday or from Monday?
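The date arithmetic described above can be sketched as follows. This is a minimal reconstruction from the spoken description; wrapping the loop in a function named last_friday that takes the starting date as a parameter is an assumption made here so the logic can be checked on fixed dates.

```python
import calendar
import datetime

def last_friday(today):
    """Return the most recent Friday strictly before 'today'."""
    one_day = datetime.timedelta(days=1)   # made once, reused in the loop
    lastf = today - one_day                # start from yesterday
    while lastf.weekday() != calendar.FRIDAY:
        lastf -= one_day                   # step back until we hit a Friday
    return lastf

today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
tomorrow = today + datetime.timedelta(days=1)
```

Using calendar.FRIDAY sidesteps the question of what number Friday is: date.weekday() counts from Monday as 0, and the constant follows the same convention.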
This is absolutely clear. Files, very useful, particularly if you’re ever in jail. The only built-in support is open. There is a thing called file, which is a type, but don’t use it. Believe me, don’t. Use open. It makes a file for you. In the future, it will make more, better, brilliant stuff. Just use open. And basically, you can read, write, seek, and so on. But there’s a lot of support for files in many ways. Os basically gives you access to whatever typically a reasonable operating system would support. Codecs.open lets you read files that use some encoding for Unicode, and read them as Unicode strings. And many other modules support various file formats. For example, comma-separated values. Gzip and bz2 compressed, and zipfile compressed. Or they support comparisons of files, temporary

files, looking for input from several files, matching file names. Shutil– it stands for shell utils; it’s not an immediately obvious name– is about copying, comparing, removing trees, entire trees of files, and so on. And of course, in the third party there is even more, including all sorts of file format magic. It’s like the file command in UNIX: it uses a magic number, a pretty big array of magic numbers, given an arbitrary file, to tell you what type it is, if it knows at all. Fdsend lets you send file descriptors between different processes in operating systems that support that. Rarfile supports the rar format, which is an ancient archive format you may have around. Dbfpy is to support the dbf format, which was used by dBASE V and other similar tools, and so on, and so forth. So what is all in the standard library? Well, there’s fundamental things, stuff that in other languages may be in the language themselves. Bisect, inspect for debugging. Copy for copying. The collections we’ve seen. Functools for functional programming support. And so on, and so forth. There’s testing and debugging. There is processing of files and text more generally. There’s persistence and databases, time and date support, something about numbers, and huge tons of network and web stuff. Network and web stuff is far too much to list.
For many things, email, which just lets you receive and send emails, will be plenty, but of course there’s low-level networking support. There is support for HTML and XML as formats. There’s a lot of support for protocols, such as NNTP, SMTP. Smtplib is for clients. Smtpd is a daemon, a server. FTP, HTTP, POP and other protocols. Wsgiref is a standard implementation for advanced web applications. Cgi is the old standard for web applications, much simpler. Urllib and urllib2 are for clients using URLs. Xmlrpclib is a client, and this one is a server for the XML-RPC protocol. And there are so many hundreds [UNINTELLIGIBLE PHRASE]. This guy, I don’t know if you’ve noticed– I was hoping he was in the audience, but apparently he isn’t– thinks this is the prettiest part of Python. Essentially, he never can find wget when he sits at an [UNINTELLIGIBLE] machine. Maybe it’s not installed. Maybe it’s not in the path. As long as he can have Python, he doesn’t care if wget is there, because he can write it very simply, including a nice little hook function which gets called periodically, once in a while, and prints what’s going on: how much has been retrieved out of how much?
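A wget-like helper of the kind described can be sketched as follows. This is not the program from the cookbook preface: it is a minimal Python 3 version, where the function lives at urllib.request.urlretrieve rather than urllib.urlretrieve, and the names report and fetch are made up for illustration.

```python
import urllib.request   # Python 2 spelled the call urllib.urlretrieve

def report(block_count, block_size, total_size):
    """Progress hook: print how much has been retrieved out of how much."""
    done = block_count * block_size
    if total_size > 0:
        done = min(done, total_size)   # the last block may be partial
    print("%d of %d bytes" % (done, total_size))

def fetch(url, filename):
    """Download url into filename, calling report as blocks arrive."""
    return urllib.request.urlretrieve(url, filename, report)
```

The hook receives the number of blocks transferred so far, the block size, and the total size, which may be reported as -1 when the server does not announce it.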
And basically, urllib.urlretrieve does all the work. So this is in Guido’s chapter, the chapter Guido wrote that prefaced our Python Cookbook. I decided, for the sake of this talk, to go a bit more complicated. Hey, but this thing gets the files one after the other. I want them all at once. I want them all now. Hey. So actually, it’s not too bad. We import all of this stuff, but then we make a queue. In that queue, we will post work requests, and we will have worker threads. 10. I decree exactly 10. Thou shall not have 11. Thou shall not have nine, unless it is on thy way to 10. We start 10 worker threads. And then put in the queue all our URLs, and we’re done. Actually, we’d better wait for the threads to be finished, join the threads. But I didn’t have an extra line to show the join call. And the reason we need to wait is that I set all the threads to be daemons, meaning they don’t keep the process alive. It’s like weak references applied to threads and processes. So I have slightly more detail in the hook, because I want to print the file name every time so the information doesn’t get too mixed. And basically, what a worker thread does is get the next

work request. A work request is a URL, and then it does what the little thing did in Guido’s program. So it’s not too bad. There’s a lot of overhead, but every thread is working independently. There is no need to worry about locking conditions. There is no shared variable at all, except the queue, which is thread-safe by definition. So that’s the only way you should ever do threading in Python. Or don’t do it at all. This is a third-party thing called Twisted. It’s probably too big to actually get into the Python standard library. But notice that it’s concise and powerful enough that I’ve actually managed to squeeze in the from __future__ import with_statement. It implements a few crucial design patterns connected with multiprogramming. I’m going to give a talk in September at the ACCU in San Jose about Python patterns for event-driven and multithreading. You should just go to the ACCU California web page. It’s a free night. Everybody’s invited. ACCU is the association of [UNINTELLIGIBLE]. Yes? AUDIENCE: I think now that you could set def equal to something– the third one from the bottom. ALEX MARTELLI: Whoops. You’re right. I abused the keyword. Thank you. That was bad. Let me fix that. [LAUGHTER] ALEX MARTELLI: Thanks. AUDIENCE: You should set def equal to sum. ALEX MARTELLI: Hmm? What?
I missed that one. So anyway, def being a reserved word, I was getting an error. So the point is, there’s these few design patterns, and then there’s a lot of detail, protocol support detail, and Twisted has a lot of it. So client, well, it’s actually twisted web client, so it’s an HTTP client. And it knows all the details of the protocol. You don’t have to teach it. Well, it’s HTTP 1.0. There’s not really all the refinements of 1.1. It’s like urllib versus urllib2. But for most clients, it’s kind of fine. So basically, in this case, we do, for every URL, we add the callback done to a deferred, which is the getPage of that URL. So we accumulate the requests not in a queue which different threads will serve, but, essentially, on a list on which a select or maybe a poll, epoll, or if we are on Windows, some devilish reactor implementation based on the Windows event loop, or whatever, will serve it as it becomes serviceable. And basically, getPage does all the hard work, just like urllib.urlretrieve. And when it has all the information, it does the callback. Note that I can prebind an argument there. And this is the data, which just gets written to the file object. Again, I’m using with, so the file object gets closed without me having to worry about such minutiae. And so that’s it. Giving the PDF URL again, just in case you forgot to write it down from the beginning. And I’m available for Q&A, although I think we only have five minutes. So sorry I ran a bit over the normal allotted time. Any questions? Yes? AUDIENCE: Just a URL, it’s the Silicon Valley chapter of the ACCU. ALEX MARTELLI: Silicon Valley, not California, chapter of the ACCU. Thank you. AUDIENCE: And it’s accu/usa.org. ALEX MARTELLI: accu/usa– and that’s in Silicon Valley. Is there a difference between the Silicon Valley and USA?– org. Anyway, just in case you’re interested, it’s September 12, in San Jose. So, questions?
Well, if there aren’t any, then thank you very much for your patience and for letting me correct that little error on the slide. Thank you very much. [APPLAUSE]