Iterators and generators in Python
Iterators and generators are powerful tools that can help you write cleaner and more expressive code. I review iterators, generators, and give some practice problems.
Iterators
In Python, iterators are objects that represent a sequence of values. Calling
next()
on an iterator returns you the next element in that sequence.
When an iterator is exhausted (i.e. has no more elements), next()
will throw
a StopIteration
error which you can catch. Alternatively, you can pass in a
default value to next
(next(it, default)
); then the iterator will return
that default value instead when it is exhausted.
Iterators are so core to the Python language that for
loops can be thought
of as “syntactic sugar”^{1} for while loops and iterators:


Uses
Iterators are useful when you need to process a sequence of values in order. Many algorithms need only linearly scan through a sequence once, so using an iterator interface makes it easy to write code generic over the underlying data structure.
In Python, because the for
statement was designed to use iterators, it is
natural to write such code.


Defining an iterator for a data structure also lets you pass that data structure into other functions that only rely on the “sequence” interface of iterators.
If this is too abstract, we’ll see concrete examples below and in the practice problems.
Iterable vs. Iterator
Python also has an “iterable” interface. Iterables are generally objects that
can be iterated over, like lists, sets, dictionaries, and even the range
object. They implement the __iter__
method (which constructs a new
iterator from the iterable).
Iterables are not to be confused with iterators! Iterables are not necessary
iterators, and vice versa. Python will error if you call next
on an object
that isn’t an iterator, even if it is an iterable.
So be careful if you receive an sequence and you want to treat it like an
iterator! It might be just an iterable, so always convert it first using
iter()
. (If it’s already an iterator, iter
will return it as is.)


Generators
Generators are, roughly, objects (created from generator functions) that
“yield” one or more values through the iterator interface. Since they implement
the iterator interface, you can call next
on them.


As such, you can use generators as the iteration target in for
loops:


Uses
Outputting a sequence of values
One basic use for a generator is for a function that returns a list of values: instead of returning a list, you could yield values one by one.


One major advantage to yielding values one by one is because the computation is
lazy. For example, it’s possible the code calling filter
might not need to
filter out all values as it might not consume the whole sequence.
The downside is that the extra communication overhead—resuming the generator, and getting the result—can be worse for performance overall. (But it’s important to measure your code’s performance before you optimize!)
Iterators for data structures
In a similar vein, generators can be used to implement iterators for custom data structures.
Generators work especially well for implementing iterators for recursive data structures like trees, as these iterators can be difficult to code imperatively because they need to carefully keep track of state (e.g., in a tree, the path to the current node). Generators keep track of that state for us using the call stack.


Infinite streams
Because generators are lazy, they can also represent infinite streams of values. A canonical example is a lazy infinite sequence of primes:


Because primes
is lazy, it only gives us a new prime every time we ask for
one. Calling list
directly on primes()
would infinitely loop; that’s why
we use islice
to only take the first 10 values.^{2}
Practice Problems
Many of these solutions involve writing a helper iterator function that captures the iteration logic. Then, use the helper function to concisely implement the main logic.
Warmup: Linked List
Find the maximum number in a linked list. Assume the 61A definition of linked
list: linked list nodes are represented by instances of a Link
class with a
first
and a rest
member.


Once you’re done with that, find the minimum number. How can you reuse the most amount of code possible?
Nested List
Flatten a nested list, which is a list whose elements may be other (potentially nested) lists or nonlist elements.
Hint: Use isinstance(value, list)
to check if a value is a list.


Consecutive Numbers
Find the largest difference between two consecutive numbers in a list.
Hint: Generators can yield tuples.


Now find the smallest difference.


Merge Sorted Interval Lists
Question from Facebook. Given two sorted interval lists, merge the intervals such that there are no overlaps. Output the combined list, after merging all intervals. For example,
 The intervals
[1, 7]
and[2, 10]
can be merged into[1, 10]
.  The intervals
[1, 5]
and[5, 10]
can be merged into[1, 10]
.  The intervals
[1, 5]
and[6, 10]
cannot be merged.
Hint: yield from
, and create a second generator to help with your logic.


Merge \(k\) Sorted Lists
There exists a \(O(n\log{k})\) solution using a heap, but there is a divide and conquer solution with the same time complexity. You can use iterators and generators to implement the divide and conquer solution. It’s a bit messy, but it’s good practice.


Permutations
Find all permutations of the set of integers from \(1\) to \(n\).
Hint: Create a second generator to help with your logic, and solve a more general problem.


In practice they generate different code; CPython has a special
FOR_ITER
instruction in its virtual machine: https://godbolt.org/z/1j1nKP5hn ↩︎This is inefficient as we loop through all numbers up to the square root when checking if a number is prime or not. Ideally, we could only loop through all primes. Unfortunately this is not as convenient to express in Python as every
primes()
generator is distinct. In truly lazy languages we could represent theprimes
list as a singular self referential data structure: theprimes
list would referenceis_prime
, which would recursively referenceprimes
to look up the primes up to \(\lfloor\sqrt{n}\rfloor\). ↩︎