Selecting and operating on a subset of items from a list or group is a very common idiom in programming.
Python provides several built-in ways to do this task efficiently.
Python Filtering
1. Python Filter Function
The built-in filter() function operates on any iterable type (list, tuple, string, etc.).
It takes a function and an iterable as arguments. filter() invokes the function on each element of the iterable and returns a new iterable composed of only those elements for which the function returned a true value.
In Python 2, which the examples here use, the return type depends on the type of the iterable passed in. If the iterable is a string or a tuple, the return type matches the input type. Otherwise, filter() always returns a list.
2. Python Filter with Number
Let’s look at a couple of examples. We’ll work in the Python interpreter.
First let’s make a list of numbers:
>>> numbers = [1, 6, 3, 8, 4, 9]
Next, we’ll define a function to act as our criteria to filter on:
>>> def lessThanFive(element):
...     return element < 5
...
Notice that in order to work properly, our criteria function must take a single argument (filter() will call it on each element of the iterable, one at a time). Using our newly defined lessThanFive() criteria function, we expect filter() to return a list of only the elements with a value less than 5, and that’s exactly what we get:
>>> filter(lessThanFive, numbers)
[1, 3, 4]
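The session above reflects Python 2. As a minimal sketch, assuming Python 3 (where filter() returns a lazy iterator rather than a list), the same call has to be wrapped in list() before the results are visible. The snake_case name less_than_five is an illustrative rename, not part of the original example.

```python
# Assumes Python 3, where filter() returns an iterator instead of a list.
numbers = [1, 6, 3, 8, 4, 9]

def less_than_five(element):
    # Criteria function: takes one element, returns True to keep it.
    return element < 5

lazy_result = filter(less_than_five, numbers)  # nothing is evaluated yet
small_numbers = list(lazy_result)              # consume the iterator into a list
print(small_numbers)  # [1, 3, 4]

# The iterator is now exhausted, so a second pass yields nothing.
leftover = list(lazy_result)
print(leftover)  # []
```

If the filtered values are needed more than once, it is usually simplest to keep the list rather than the filter object.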
3. Python Filter with String
Let’s look at another example. This time, we’ll make a tuple of names:
>>> names = ('Jack', 'Jill', 'Steve', '')
Notice that the last name in the tuple is the empty string. If None is passed as the first argument to filter(), it falls back to using the identity function (a function that simply returns the element).
Since Python evaluates empty strings, zeros, and None as False in a boolean context, this default behaviour can be useful for removing the ‘empty’ elements of an iterable:
>>> filter(None, names)
('Jack', 'Jill', 'Steve')
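A short sketch of the same idea, assuming Python 3: tuple() rebuilds the tuple that Python 2 returned directly, and the second list (mixed, a made-up example) shows that every falsy value is dropped, not only empty strings.

```python
# Assumes Python 3; filter() returns an iterator, so we convert explicitly.
names = ('Jack', 'Jill', 'Steve', '')
non_empty_names = tuple(filter(None, names))
print(non_empty_names)  # ('Jack', 'Jill', 'Steve')

# Hypothetical mixed data: 0, None, '' and [] are all falsy and get removed.
mixed = [0, 1, None, 'a', '', [], 2]
truthy = list(filter(None, mixed))
print(truthy)  # [1, 'a', 2]
```

Be careful with numeric data: a legitimate 0 is removed along with the empty values.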
4. Python Filter with a Function
Let’s write a function to find only the names that start with ‘J’:
>>> def startsWithJ(element):
...     if len(element) > 0:
...         return element[0] == 'J'
...     return False
...
I check the length of the name first to ensure that there is in fact a first character.
We should expect to get back a tuple containing only ‘Jack’ and ‘Jill’ when we use this as our criteria function:
>>> filter(startsWithJ, names)
('Jack', 'Jill')
Again, notice that the return type is a tuple, and not a list.
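A shorter variant is possible with the built-in str.startswith() method, which returns False for an empty string and so needs no length check. This is a sketch assuming Python 3 (hence the tuple() call); starts_with_j is a hypothetical name used only for illustration.

```python
names = ('Jack', 'Jill', 'Steve', '')

def starts_with_j(element):
    # ''.startswith('J') is False, so empty strings need no special case.
    return element.startswith('J')

# Assumes Python 3: filter() yields an iterator, so rebuild the tuple.
j_names = tuple(filter(starts_with_j, names))
print(j_names)  # ('Jack', 'Jill')
```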
For more information about the filter function, type help(filter) in a Python interpreter to see its built-in help text, or browse the online Python documentation for filter().
List Comprehension
Another way to approach this idiom is to use a list comprehension. Essentially, a list comprehension is a compact for-loop that builds a list. On each iteration, an element can be appended to the list being built.
5. Basic List Comprehension Usage
The syntax is:
[ <output value> for <element> in <list> <optional criteria> ]
This looks like a lot, so let’s start off with a simple example. Remember the ‘numbers’ list we defined earlier?
>>> numbers
[1, 6, 3, 8, 4, 9]
We can use a list comprehension to simply ‘echo’ the list:
>>> [ num for num in numbers ]
[1, 6, 3, 8, 4, 9]
6. Process List as For Loop
Let’s break this expression down:
- The middle part of the comprehension, ‘for num in numbers’, looks exactly like a for loop. It tells us that we are iterating over the ‘numbers’ list, and binds the name ‘num’ to the current element each iteration.
- The leading ‘num’ tells us what to append to the list we are building. We could just as easily have written ‘num * 2’ in its place to build a list with each element of ‘numbers’ doubled:
>>> [ num * 2 for num in numbers ]
[2, 12, 6, 16, 8, 18]
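To make the "compact for-loop" reading concrete, here is a small sketch that writes the doubling comprehension out as an explicit loop. The variable names doubled_loop and doubled_comprehension are illustrative only.

```python
numbers = [1, 6, 3, 8, 4, 9]

# Explicit loop: start with an empty list and append one value per iteration.
doubled_loop = []
for num in numbers:
    doubled_loop.append(num * 2)

# The comprehension expresses the same steps in a single expression.
doubled_comprehension = [num * 2 for num in numbers]

print(doubled_loop)                           # [2, 12, 6, 16, 8, 18]
print(doubled_loop == doubled_comprehension)  # True
```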
7. If Condition in Python List
Let’s add a criterion to recreate what we did with the built-in filter function earlier:
>>> [ num for num in numbers if num < 5 ]
[1, 3, 4]
This time, we’re still building our new list from the original values, but we only append an original value to the new list if it satisfies the criterion (‘if num < 5’). I hope you can start to see the similarities between comprehensions and filtering.
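The similarity can be checked directly. The sketch below assumes Python 3 (so filter() is wrapped in list()) and uses a lambda in place of a named criteria function; it also shows that a comprehension can filter and transform in one pass.

```python
numbers = [1, 6, 3, 8, 4, 9]

# Same selection, expressed two ways.
via_filter = list(filter(lambda num: num < 5, numbers))
via_comprehension = [num for num in numbers if num < 5]
print(via_filter == via_comprehension)  # True

# Filter and transform together: keep values below 5, then double them.
small_doubled = [num * 2 for num in numbers if num < 5]
print(small_doubled)  # [2, 6, 8]
```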
8. Python List like an Array
Our intention doesn’t always have to be to create a new list when using list comprehensions, though. We can make use of side effects, carrying out behaviour on a subset of a list all in one step. Let’s look at a slightly less trivial example. Let’s say we have a list of customers:
>>> customers = [('Jack', 'jack@example.com', True), ('Jill', 'jill@example.com', False)]
Customers are represented by a 3-tuple containing their name, email address, and whether they want to receive email notifications, in that order. We can write:
>>> def subscribesForUpdates(customer):
...     return customer[2]
...
to capture the intent of this 3rd field.
Now, let’s say we have a message to send, but we want to send it only to those customers who subscribe for updates. The following function will be responsible for sending the notifications:
>>> def emailUpdate(customer):
...     # stub for actually sending the email
...     print 'emailing update to: %s' % customer[1]
...
This function is only a stub. If we were doing this for real, the logic to send an email would go here, but in our example, we simply print a message to show that the function was called for a given customer. Notice that emailUpdate does not return a customer, but instead carries out behaviour using one.
9. Function, If condition and For loop in Python List
Now, we have everything we need to use a list comprehension to send out our notifications:
>>> [ emailUpdate(customer) for customer in customers if subscribesForUpdates(customer) ]
emailing update to: jack@example.com
[None]
There are similar comprehension syntaxes for dictionaries and sets; the same form inside parentheses produces a generator expression rather than a tuple. To read more about list comprehensions, visit the official Python lists docs.
I hope you’re getting familiar with the list comprehension syntax, but let’s break this one down:
- The middle of the comprehension, ‘for customer in customers’, tells us that we are iterating over the list ‘customers’. It also tells us that the current list element for each iteration will be bound to the variable ‘customer’.
- The start of the comprehension, ‘emailUpdate(customer)’, tells us that we will invoke the emailUpdate function on each customer (this is where the side-effect behaviour is important), and then append its return value to the list we are building.
- The end of the comprehension, ‘if subscribesForUpdates(customer)’, says to only execute the emailUpdate function if the condition is true (i.e. if the customer subscribed for emails).
From the output, we see that the emailUpdate function was indeed only invoked on Jack (as Jill does not subscribe for emails).
We also see that the return value from the comprehension was [None]. Why is this?
Remember that list comprehensions build lists. This time, we are building a list using the return value of emailUpdate().
Since emailUpdate() doesn’t explicitly return a value, Python makes it implicitly return None. Since only Jack subscribed for email updates, the emailUpdate() function was called only once, and only one None was appended to the initially empty list that the comprehension built for us.
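If the list of None values is unwanted, one alternative is a plain for loop, which runs the side effect without building a throwaway list. This is only a sketch, not the approach used above: it assumes Python 3 (print() as a function), the snake_case names are illustrative renames, and the sent_to list is a hypothetical addition so the result can be inspected.

```python
customers = [('Jack', 'jack@example.com', True),
             ('Jill', 'jill@example.com', False)]

sent_to = []  # hypothetical log, added only so the outcome can be checked

def subscribes_for_updates(customer):
    # The third field records whether the customer wants notifications.
    return customer[2]

def email_update(customer):
    # Stub for actually sending the email.
    print('emailing update to: %s' % customer[1])
    sent_to.append(customer[1])

# Same selection as the comprehension, but no list of None values is built.
for customer in customers:
    if subscribes_for_updates(customer):
        email_update(customer)

print(sent_to)  # ['jack@example.com']
```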
In the little Python I’ve written, I have not used filter() or list comprehensions. I’ll add them to my bag of tricks. I found it useful to follow along with your examples using IDLE. That way I could play with variations of the examples. Good article – thanks.
Nice article.
I was aware of list comprehension. But I can see how elegantly you have used it in example 9 !
Hi, good article. But I have a few recommendations for you.
First of all, read PEP 8, especially this .
The second one, you should prefer list comprehension over ‘filter’ and ‘map’ functions.
Another thing, use ‘startswith’ method of str instead of your own function.
Good article, useful.
Great Article!
Thanks
Nice,
I really liked example 9
Python 3.4.0 (v3.4.0:04f714765c13, Mar 15 2014, 23:02:41)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type “copyright”, “credits” or “license()” for more information.
>>> WARNING: The version of Tcl/Tk (8.5.9) in use may be unstable.
Visit http://www.python.org/download/mac/tcltk/ for current information.
>>>
>>> number = [1, 6, 3, 8, 4, 9]
>>> print numbers
SyntaxError: invalid syntax
>>> print(numbers)
Traceback (most recent call last):
File “”, line 1, in
print(numbers)
NameError: name ‘numbers’ is not defined
>>> print(number)
[1, 6, 3, 8, 4, 9]
>>> def lessThanFive(element):
    return element < 5
>>> filter(lessThanFive, number)
>>> print(filter(lessThanFive, number))
>>> names = ('Jack', 'Jill', 'Steve', '')
>>> filter(None, names)
>>>
sorry but my copy and pastes above do not show the entire things that I pasted in.
not sure why but many lines were not pasted in, resulting in blank lines where there is supposed to be output from IDLE. Anyway, it just shows in those areas… not sure why it prints the object and not the results as in your post.
this is what sucks about learning to code.
Bill, the filter() function works differently in Python 3.x (the author used Python 2.x).
In Python 2, filter() returns a list, tuple or string depending on what you used as argument, but in Python 3 it constructs an iterator.
That means you can use a ‘for loop’ to print out all the values in the filter object
or you can call the list() or tuple() function to create a list or tuple: list(filter_object). Note that the filter object is empty after you do that. You could also use a generator comprehension (just like a list comprehension but with parentheses () ) instead of filter().
If something is wrong, please correct me. I’m also just a newbie. 🙂
I have to correct myself. The right term is “generator expression” not “generator comprehension”.
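A minimal sketch of the points in the reply above, assuming Python 3: a filter object can be consumed by a for loop, and a generator expression (comprehension syntax inside parentheses) gives the same values lazily. The names printed, small_gen and small_tuple are illustrative only.

```python
numbers = [1, 6, 3, 8, 4, 9]

# A filter object can be consumed directly by a for loop.
printed = []
for value in filter(lambda n: n < 5, numbers):
    print(value)
    printed.append(value)

# A generator expression: list-comprehension syntax inside parentheses.
small_gen = (n for n in numbers if n < 5)
small_tuple = tuple(small_gen)  # consume it into a tuple
print(small_tuple)  # (1, 3, 4)
```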