Learn practical skills, build real-world projects, and advance your career

Data generators

In Python, a generator is a function that behaves like an iterator. It will return the next item. Here is a link to review python generators. In many AI applications, it is advantageous to have a data generator to handle loading and transforming data for different applications.

You will now implement a custom data generator, using a common pattern that you will use during all assignments of this course.
In the following example, we use a set of samples a, to derive a new set of samples, with more elements than the original set.

Note: Pay attention to the use of list lines_index and variable index to traverse the original list.

import random 
import numpy as np

# Example of traversing a list of indexes to create a circular list
a = [1, 2, 3, 4]
b = [0] * 10

a_size = len(a)
b_size = len(b)
lines_index = [*range(a_size)] # is equivalent to [i for i in range(0,a_size)], the difference being the advantage of using * to pass values of range iterator to list directly
index = 0                      # similar to index in data_generator below
for i in range(b_size):        # `b` is longer than `a` forcing a wrap
    # We wrap by resetting index to 0 so the sequences circle back at the end to point to the first index
    if index >= a_size:
        index = 0
    
    b[i] = a[lines_index[index]]     #  `indexes_list[index]` point to a index of a. Store the result in b
    index += 1
    
print(b)
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2]

Shuffling the data order

In the next example, we will do the same as before, but shuffling the order of the elements in the output list. Note that here, our strategy of traversing using lines_index and index becomes very important, because we can simulate a shuffle in the input data, without doing that in reality.

# Example of traversing a list of indexes to create a circular list
a = [1, 2, 3, 4]
b = []

a_size = len(a)
b_size = 10
lines_index = [*range(a_size)]
print("Original order of index:",lines_index)

# if we shuffle the index_list we can change the order of our circular list
# without modifying the order or our original data
random.shuffle(lines_index) # Shuffle the order
print("Shuffled order of index:",lines_index)

print("New value order for first batch:",[a[index] for index in lines_index])
batch_counter = 1
index = 0                # similar to index in data_generator below
for i in range(b_size):  # `b` is longer than `a` forcing a wrap
    # We wrap by resetting index to 0
    if index >= a_size:
        index = 0
        batch_counter += 1
        random.shuffle(lines_index) # Re-shuffle the order
        print("\nShuffled Indexes for Batch No.{} :{}".format(batch_counter,lines_index))
        print("Values for Batch No.{} :{}".format(batch_counter,[a[index] for index in lines_index]))
    
    b.append(a[lines_index[index]])     #  `indexes_list[index]` point to a index of a. Store the result in b
    index += 1
print()    
print("Final value of b:",b)
Original order of index: [0, 1, 2, 3] Shuffled order of index: [2, 1, 0, 3] New value order for first batch: [3, 2, 1, 4] Shuffled Indexes for Batch No.2 :[3, 1, 2, 0] Values for Batch No.2 :[4, 2, 3, 1] Shuffled Indexes for Batch No.3 :[1, 2, 0, 3] Values for Batch No.3 :[2, 3, 1, 4] Final value of b: [3, 2, 1, 4, 4, 2, 3, 1, 2, 3]