Iterable objects and iterators

Motivation

The objective is to write more elegant code that the for(int i = 0; i < n; i++) style of code. For instance, the following code is ugly.

A = [1, 7, 9, 3, 4]
for i in range(len(A)):
    print(f"We have {A[i]} in the array")
On a 1 dans le tableau
On a 7 dans le tableau
On a 9 dans le tableau
On a 3 dans le tableau
On a 4 dans le tableau

Instead, write this:

A = [1, 7, 9, 3, 4]
for el in A:
    print(f"We have {el} in the array")
We have 1 in the array
We have 7 in the array
We have 9 in the array
We have 3 in the array
We have 4 in the array

Definition

Iterators

An iterator is an object that gives you elements when pressing the next button. Like a coffee machine or a food dispenser:

Iterable objects

An iterable object is an object that can be transformed into an iterator. In for x in BLOUP, BLOUP should be an iterable object.

List

[1, 6, 2]
[1, 6, 2]

In Python:

  • An iterable object can be transformed into an interator via __iter__()
  • An iterator is an object implementing __next__().
it = [1, 6, 2].__iter__()
print(it.__next__())
print(it.__next__())
print(it.__next__())
print(it.__next__()) # raise a StopIteration error
1
6
2



---------------------------------------------------------------------------

StopIteration                             Traceback (most recent call last)

Cell In[12], line 5
      3 print(it.__next__())
      4 print(it.__next__())
----> 5 print(it.__next__()) # raise a StopIteration error


StopIteration: 

Range

range(4)
range(0, 4)
for i in range(4):
    print(i)
0
1
2
3
it = range(4).__iter__()
print(it.__next__())
print(it.__next__())
print(it.__next__())
print(it.__next__())
print(it.__next__())  # raise a StopIteration error
0
1
2
3



---------------------------------------------------------------------------

StopIteration                             Traceback (most recent call last)

Cell In[17], line 6
      4 print(it.__next__())
      5 print(it.__next__())
----> 6 print(it.__next__())  # raise a StopIteration error


StopIteration: 

Many iterable objects and iterators are lazy data structures. For instance, range(100000000) will not create a list of 100000000 elements! It is inspired from constructions in Haskell.

Generators

A generator is a special kind of function that contains the keyword yield. It returns an iterable object.

def squares(start, stop):
    for i in range(start, stop):
        yield i * i
it = squares(1, 5).__iter__()
print(it.__next__())
print(it.__next__())
print(it.__next__())
print(it.__next__())
print(it.__next__())
1
4
9
16



---------------------------------------------------------------------------

StopIteration                             Traceback (most recent call last)

Cell In[10], line 6
      4 print(it.__next__())
      5 print(it.__next__())
----> 6 print(it.__next__())


StopIteration: 

slots and dict


for i in squares(1, 6):
    print(i)
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

Cell In[1], line 1
----> 1 for i in squares(1, 6):
      2     print(i)


NameError: name 'squares' is not defined
def all_integers():
    i = 0
    while True:
        yield i
        i += 1
N = all_integers()
iter = N.__iter__()
print(next(iter))
print(next(iter))
0
1

Generators can also be defined with a functional construction, using parenthesis.

all_squares = (a ** 2 for a in all_integers())
N = all_squares
iter = N.__iter__()
print(next(iter))
print(next(iter))
print(next(iter))
print(next(iter))
0
1
4
9

Nest generators

Generators can be nested but not in naive way:

def f():
    yield 1
    yield 2

def g():
    f()
    yield 3

for i in g():
    print(i)
3

You may write:

def f():
    yield 1
    yield 2

def g():
    for j in f():
        yield j
    yield 3

for i in g():
    print(i)
1
2
3

Or more elegantly:

def f():
    yield 1
    yield 2

def g():
    yield from f()
    yield 3

for i in g():
    print(i)
1
2
3

Converting iterable objects into whatever you want

list(ITERABLE) takes the iterable, get an iterator, and then press the next button until no more food gets out. All the items are then collected into a list.

list((1, 2, 3))
[1, 2, 3]
list(squares(1, 6))
[1, 4, 9, 16, 25]
list(all_integers())
---------------------------------------------------------------------------

KeyboardInterrupt                         Traceback (most recent call last)

Cell In[20], line 1
----> 1 list(all_integers())


Cell In[19], line 4, in all_integers()
      2 i = 0
      3 while True:
----> 4     yield i
      5     i += 1


KeyboardInterrupt: 
list({"a": 0, "b": 1})
['a', 'b']
set((1, 2, 3))
{1, 2, 3}
frozenset([1, 2, 3])
frozenset({1, 2, 3})
tuple([1,2, 3])
(1, 2, 3)

Enumerate

Sometimes you need the index and the element and you are stuck with: The enumerate function transforms any iterable object into a new iterable object that incorporates a counter.

fruits = ['banana', 'kiwi', 'strawberry']
for fruit in fruits:
    print(f"Fruit number ?? is a {fruits[i]}")
le fruit numéro ?? est une pomme
le fruit numéro ?? est une banane
le fruit numéro ?? est une orange

So you are lazy and you write:

fruits = ['banana', 'kiwi', 'strawberry']
for i in range(len(fruits)):
    fruit = fruits[i]
    print(f"Fruit number {i+1} is a {fruits[i]}")
le fruit numéro 1 est une pomme
le fruit numéro 2 est une banane
le fruit numéro 3 est une orange

But you can use enumerate that creates an interable object.

fruits = ['banana', 'kiwi', 'strawberry']
enumerate([fruits], start=1)
<enumerate at 0x7f87a626d200>
fruits = ['banana', 'kiwi', 'strawberry']

for i, fruit in enumerate(fruits, start=1):
    print(f"Fruit number {i+1} is a {fruit}")

1 pomme
2 banane
3 orange

Zip

Zip consists in transforming several iterables into a single one.

In order to see the result, we put list(....) to transform the produced iterable into a list.

fruits = ['banana', 'kiwi', 'strawberry']
prices = [2, 4, 8]

for i in len(fruits):
    print(f"{fruits[i]} and costs {prices[i]}")

The fruits[i] and prices[i] are ugly. The solution is to zip fruits and prices.

fruits = ['banana', 'kiwi', 'strawberry']
prices = [2, 4, 8]

for (fruit, price) in zip(fruits, prices):
    print(f"{fruit} and costs {price}")
banana and costs 2
kiwi and costs 4
strawberry and costs 8
fruits = ['banana', 'kiwi', 'strawberry']
prices = [2, 4, 8]

for i, (fruit, price) in enumerate(zip(fruits, prices), start=1):
    print(f"{i}: {fruit} and costs {price}")
1: banana and costs 2
2: kiwi and costs 4
3: strawberry and costs 8
zip(["a", "b", "c", "d"], (1, 2, 3, 4))
<zip at 0x7fda498d3b40>
list(zip(["a", "b", "c", "d"], (1, 2, 3, 4)))
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
zip(["a", "b", "c", "d"], (1, 2, 3, 4), ("alpha","beta", "gamma", "delta"))
<zip at 0x7fda498c3e40>

zip peut être défini comme ça :

def myZip(A, B):
    map(lambda x, y: (x, y), A, B)
list(zip(["a", "b", "c", "d"], (1, 2, 3, 4), ("alpha","beta", "gamma", "delta")))
[('a', 1, 'alpha'), ('b', 2, 'beta'), ('c', 3, 'gamma'), ('d', 4, 'delta')]
from itertools import zip_longest
list(zip_longest(["a", "b", "c"],(1, 2, 3, 4), ("alpha","beta", "gamma", "delta"), fillvalue=0))
[('a', 1, 'alpha'), ('b', 2, 'beta'), ('c', 3, 'gamma'), (0, 4, 'delta')]

Dictionnaries

costs = {"banana": 2, "kiwi": 4, "straberry": 8}
for key in costs.keys():
    print(f"{key} costs {costs[key]}")
banana costs 2
kiwi costs 4
straberry costs 8
costs = {"banana": 2, "kiwi": 4, "straberry": 8}
for key in costs:
    print(f"{key} costs {costs[key]}")
banana costs 2
kiwi costs 4
straberry costs 8
costs = {"banana": 2, "kiwi": 4, "straberry": 8}
for key, cost in costs.items():
    print(f"{key} costs {cost}")
banana costs 2
kiwi costs 4
straberry costs 8

Modifing a dictionnary during a loop

costs = {"banana": 2, "kiwi": 4, "straberry": 8}
for key in costs:
    costs[f"{key}'"] = costs[key]*2

print(costs)

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

Cell In[39], line 2
      1 cost = {"banana": 2, "kiwi": 4, "straberry": 8}
----> 2 for key in cost:
      3     cost[f"{key}'"] = cost[key]*2
      5 print(cost)


RuntimeError: dictionary changed size during iteration
costs = {"banana": 2, "kiwi": 4, "straberry": 8}
for key in list(costs):
    costs[f"{key}'"] = costs[key]*2
    
print(costs)
{'banana': 2, 'kiwi': 4, 'straberry': 8, "banana'": 4, "kiwi'": 8, "straberry'": 16}

Counting

import itertools
itertools.count(start=10, step=2)
count(10, 2)
import itertools
itertools.cycle(["player1", "player2"])
<itertools.cycle at 0x7fda4a716d00>
import itertools
itertools.repeat("X", 4)

Permutations

itertools.permutations(ITERABLE, k) returns an iterable object that enumerates all k-length tuples which are all possible orderings with no-repeated elements.

import itertools
list(itertools.permutations(range(4), 2))
[(0, 1),
 (0, 2),
 (0, 3),
 (1, 0),
 (1, 2),
 (1, 3),
 (2, 0),
 (2, 1),
 (2, 3),
 (3, 0),
 (3, 1),
 (3, 2)]
import itertools
list(itertools.permutations(range(4), 4))
[(0, 1, 2, 3),
 (0, 1, 3, 2),
 (0, 2, 1, 3),
 (0, 2, 3, 1),
 (0, 3, 1, 2),
 (0, 3, 2, 1),
 (1, 0, 2, 3),
 (1, 0, 3, 2),
 (1, 2, 0, 3),
 (1, 2, 3, 0),
 (1, 3, 0, 2),
 (1, 3, 2, 0),
 (2, 0, 1, 3),
 (2, 0, 3, 1),
 (2, 1, 0, 3),
 (2, 1, 3, 0),
 (2, 3, 0, 1),
 (2, 3, 1, 0),
 (3, 0, 1, 2),
 (3, 0, 2, 1),
 (3, 1, 0, 2),
 (3, 1, 2, 0),
 (3, 2, 0, 1),
 (3, 2, 1, 0)]

The following program computes all $n \times n$ chess configurations of $n$ non-attacking queens.

from itertools import permutations

"""
We represent a board of n-queens by an array board such that 
board[r] is the column index such that there is a queen at row r and column board[r]
"""

def printBoard(board):
  """
  print a board in the terminal
  """
  n = len(board)
  print('_' * (n+2))
  for r in board:
    print('|' + ' ' * r + '♛' + ' ' * (n-r-1) + '|')
  print('\n')
 
def nqueenboards(n):
  """
  A generator that generates all configurations of non-attacking n queens on a n*n boards
  """
  columns = range(n)
  for board in permutations(columns):
    s1 = set(board[i] + i for i in columns)
    s2 = set(board[i] - i for i in columns)
    if n == len(s1) == len(s2):
      yield board

def printAllBoards(N):
  for board in nqueenboards(N):
    printBoard(board)

printAllBoards(5)

_______
|♛    |
|  ♛  |
|    ♛|
| ♛   |
|   ♛ |


_______
|♛    |
|   ♛ |
| ♛   |
|    ♛|
|  ♛  |


_______
| ♛   |
|   ♛ |
|♛    |
|  ♛  |
|    ♛|


_______
| ♛   |
|    ♛|
|  ♛  |
|♛    |
|   ♛ |


_______
|  ♛  |
|♛    |
|   ♛ |
| ♛   |
|    ♛|


_______
|  ♛  |
|    ♛|
| ♛   |
|   ♛ |
|♛    |


_______
|   ♛ |
|♛    |
|  ♛  |
|    ♛|
| ♛   |


_______
|   ♛ |
| ♛   |
|    ♛|
|  ♛  |
|♛    |


_______
|    ♛|
| ♛   |
|   ♛ |
|♛    |
|  ♛  |


_______
|    ♛|
|  ♛  |
|♛    |
|   ♛ |
| ♛   |

Cartesian product

import itertools
list(itertools.product(["a", "b", "c"], [1, 2, 3, 4]))
[('a', 1),
 ('a', 2),
 ('a', 3),
 ('a', 4),
 ('b', 1),
 ('b', 2),
 ('b', 3),
 ('b', 4),
 ('c', 1),
 ('c', 2),
 ('c', 3),
 ('c', 4)]