
Parameter Logging in DL/ML Models: Part 1

Overview

Analyzing model metrics is one of the most crucial tasks while training any Machine Learning or Deep Learning model. It gives us the ability to diagnose a model's statistics when its predictions are not up to the mark. In this article I will discuss how we can log model metrics (model configuration, hardware metrics and per-epoch data) in Wandb.ai.


Scope

In this article I will discuss Wandb.ai, a third-party experiment-logging platform that enables us to log Machine Learning (ML) or Deep Learning (DL) model data: training information, datasets, model weights and much other useful information related to the model we have trained. The following points are discussed in this article:

  • Setting up a Wandb.ai account and logging in
  • Connecting our notebook/IDE with Wandb.ai
  • Saving the model configuration in the Wandb.ai dashboard
  • Logging epoch information while training the model
  • Visualizing model statistics in the Wandb.ai dashboard

Note : This article is for readers who are comfortable with Colab, Jupyter notebooks, Python scripts and fundamental Keras APIs such as loading datasets, creating models, training models and callbacks.

Experiment Logging

Logging a model's parameters during training is a crucial part of model optimisation. After a training session we should analyze the model's parameters, as well as the metrics relevant to its use case (i.e. the sector in which the model will be deployed and which metrics deserve more focus than others), both visually and statistically, in order to judge where the model can be improved. ML/DL frameworks have built-in options for this, such as callbacks, a CSV logger, or manually extracting the model metrics into a CSV/Excel file and visualizing them by hand. But these give us limited power: we have to continuously monitor each epoch of the training process ourselves, and if the model stops training due to an error or some unfortunate circumstance we lose all the progress and have to start again.

To counter this problem, this article walks through the Python library wandb.ai, an experiment-tracking, dataset-management and versioning tool. Wandb.ai gives us the ability to log model parameters as well as to visualize metrics, in order to gain insight into the model for further optimization and inference. For contrast, a minimal sketch of the built-in CSV-logging approach is shown below.
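Here is that sketch, assuming a compiled Keras model named model and training arrays x_train and y_train (the log file name is arbitrary):

from tensorflow.keras.callbacks import CSVLogger

# Writes one row of metrics per epoch to a local CSV file; if the runtime
# crashes or disconnects, only this local file survives.
csv_logger = CSVLogger("training_log.csv", append=True)
model.fit(x_train, y_train, epochs=15, callbacks=[csv_logger])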

To follow along, you will need:

  1. A Wandb.ai account (free or paid)
  2. An IDE, Colab or Jupyter Notebook

This article is structured as a step-by-step walkthrough of integrating wandb.ai with a Jupyter notebook; the same procedure can be followed for a Python script. The steps are discussed below:

Step 1 : Creating an account on wandb.ai

Wandb.ai is a third-party app, available as both a free and a paid version. First we need to create our account, for example with our Gmail account. The steps are mentioned below:

  1. Open the browser and navigate to Link
  2. Click on the Signup button if you are registering for the first time; otherwise you can log in directly if you already have an account with wandb.ai.
  3. If you are a new user, create an account by submitting details such as email, name and other relevant information. In this case, I have already created my account, so I will log in directly.

Image 1. Wandb.ai Login

  4. Choose the option you are most comfortable with; for me it is Sign in with Google, so I will choose this option and proceed further.

Step 2 : Setting up a Wandb.ai Project and getting the API key

  1. After successfully registering on wandb.ai you will be redirected to the homepage, where Overview, Projects and Likes tabs are available. Select the Projects tab and click on the Create new project button.

Image 2. Wandb.ai Home Page

  2. Since the scope of this article is to explain how to use wandb.ai, we will create a new project. Clicking the Create new project button takes you to the page shown below. Give your project a name (I have used Mnist, because I will be demonstrating on the MNIST dataset) and click the Create project button.

Image 3. Wandb.ai Create Project

  3. After creating the project you will be directed to a page with basic information on how to integrate and run wandb.ai with the API key in a Jupyter notebook. The image below shows the page you will be redirected to after creating the project, along with the API key. Copy the API key; we will use it in our Colab notebook.

Image 4. API Key wandb.ai

Step 3 : Connecting Wandb.ai with Colab

In this section we will discuss how to integrate wandb.ai experiment logging with Colab, along with a little of wandb.ai's workflow.

  1. Installing Wandb.ai and TensorFlow

As described in Step 2, you now have an API key for your project. Open a Colab notebook, start a session and add the following code snippet.

This snippet installs wandb.ai and TensorFlow along with their required dependencies (the --q flag keeps pip's output quiet):

!pip install wandb --q
!pip install tensorflow --q

The next step is to log in to wandb.ai from Colab with your API key and connect Colab with your account. The script below logs in to wandb.ai and asks for authentication, i.e. the API key.

!wandb login
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc

After running the above snippet you will see an input box for the API key; paste your API key and hit enter, and output like the above will appear.
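If you prefer not to paste the key interactively (for example in automated runs), wandb.login() also accepts the key directly. A minimal sketch, assuming the key has been stored in the WANDB_API_KEY environment variable:

import os
import wandb

# Read the API key from the environment instead of an interactive prompt
wandb.login(key=os.environ["WANDB_API_KEY"])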

Now, after successfully initializing the session, we can start writing our Keras model and logging its metrics.

import numpy as np
import wandb
from wandb.keras import WandbCallback 
from tensorflow import keras
from tensorflow.keras import layers

The above snippet imports the required libraries for Keras and wandb.ai so that their classes and functions can be used in our code. The next step is to create our configuration variable, i.e. a dictionary. This config variable holds the model parameters, which will be used both when building the model and for logging in wandb.ai. The snippet below shows the config variable and model parameters.

# Model architecture and training hyperparameters, logged to wandb.ai
config = dict(
    num_classes=10,
    input_shape=(28, 28, 1),
    normalize_factor=255,
    Layer1_Conv2d_filter=32,
    Conv2d_kernel_size=(3, 3),
    Layer1_Conv2d_activation_size="relu",
    max_pool_size=(2, 2),
    Layer2_Conv2d_filter=64,
    Dropout=0.5,
    Layer2_Conv2d_activation_size="relu",
    classification_activation="softmax",
    batch_size=128,
    epochs=15,
    loss="categorical_crossentropy",
    optimizer="adam",
    validation_split=0.1,
)

Now we have to initialize our session. For this we run the following command with the config variable, along with our project name and username (entity), as shown in the snippet below.

wandb.init(project="Jovian_Article", entity="happyman", config=config)

After executing the above snippet you will see the session name, which is initialized randomly, and the path of the local wandb.ai log that is sent to the wandb.ai platform for tracking. Basically, wandb.ai creates its own log file structure by collecting the environment variables of the Colab notebook; these are then sent via the API to the wandb.ai server to populate the dashboard of the associated project.
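If you would rather have a predictable session name than the randomly generated one, wandb.init() also accepts a name argument. A minimal sketch (the run name below is just an example):

# Same initialization as above, but with an explicit run name
wandb.init(project="Jovian_Article", entity="happyman",
           config=config, name="mnist-cnn-baseline")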

If you open the wandb.ai dashboard you will see the CPU utilization metric and the configuration variable, as shown below.

Image 5. CPU utilization Wandb.ai

Now the session has successfully started and the config variable, i.e. the model parameters, has been successfully uploaded, so it is time to start preprocessing the dataset.


Step 4 : Dataset Preprocessing

For explanation purposes I have used the MNIST-Digit dataset, which consists of 70,000 greyscale images of handwritten digits from 0-9. The dataset is split into two sections: a train set of 60,000 images and a test set of 10,000 images. I have also preprocessed the dataset by normalizing the images and converting the labels into categorical values. The code snippet below depicts the preprocessing steps of the MNIST Digit dataset.

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 1s 0us/step

x_train = x_train.astype("float32") / config["normalize_factor"]
x_test = x_test.astype("float32") / config["normalize_factor"]
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, config["num_classes"])
y_test = keras.utils.to_categorical(y_test, config["num_classes"])

Step 5 : Keras Model

Now that the dataset is preprocessed, it is time to create the model which we will train. For the sake of simplicity I have constructed a simple 2D-CNN model with one dense layer, and initialized the model parameters using the config dictionary. Explaining model creation and data preprocessing in depth is out of the scope of this article, which is why I have not done so here. The code snippet below depicts the model creation.

model = keras.Sequential(
    [
        keras.Input(shape=config["input_shape"]),
        layers.Conv2D(config["Layer1_Conv2d_filter"], kernel_size=config["Conv2d_kernel_size"], activation=config["Layer1_Conv2d_activation_size"]),
        layers.MaxPooling2D(pool_size=config["max_pool_size"]),
        layers.Conv2D(config["Layer2_Conv2d_filter"], kernel_size=config["Conv2d_kernel_size"], activation=config["Layer2_Conv2d_activation_size"]),
        layers.MaxPooling2D(pool_size=config["max_pool_size"]),
        layers.Flatten(),
        layers.Dropout(config["Dropout"]),
        layers.Dense(config["num_classes"], activation=config["classification_activation"]),
    ]
)

After model creation we log the model summary into the config variable and display it in the output cell. Note that model.summary() prints the summary and returns None, so to store the text we capture it through the print_fn argument. The snippet is shown below.

# model.summary() prints the summary and returns None; capture the text via print_fn
summary_lines = []
model.summary(print_fn=summary_lines.append)
config["model_summary"] = "\n".join(summary_lines)
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d (Conv2D) (None, 26, 26, 32) 320 max_pooling2d (MaxPooling2D (None, 13, 13, 32) 0 ) conv2d_1 (Conv2D) (None, 11, 11, 64) 18496 max_pooling2d_1 (MaxPooling (None, 5, 5, 64) 0 2D) flatten (Flatten) (None, 1600) 0 dropout (Dropout) (None, 1600) 0 dense (Dense) (None, 10) 16010 ================================================================= Total params: 34,826 Trainable params: 34,826 Non-trainable params: 0 _________________________________________________________________

Step 6 : Compiling and Training Model

In this section we are going to compile our model. The hyperparameters/model parameters are taken from the config variable discussed in Step 3. The snippet below depicts the model compilation with the config variable parameters.

model.compile(loss=config["loss"], optimizer=config["optimizer"], metrics=["accuracy"])

After successful compilation of the model, our final step is to train it with a callback. The callback sends the data of each epoch during model training to wandb.ai, where it is displayed in our project dashboard. The snippet below depicts the code for model training with wandb.ai.

history = model.fit(x_train, y_train,
                    batch_size=config["batch_size"],
                    epochs=config["epochs"],
                    validation_split=config["validation_split"],
                    callbacks=[WandbCallback()])
wandb: WARNING The save_model argument by default saves the model in the HDF5 format that cannot save custom objects like subclassed models and custom layers. This behavior will be deprecated in a future release in favor of the SavedModel format. Meanwhile, the HDF5 model is saved as W&B files and the SavedModel as W&B Artifacts.
(After each epoch in which the validation loss improves, the log also prints "WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 2 of 2). These functions will not be directly callable after loading." and "wandb: Adding directory to artifact (/content/wandb/run-20221121_063632-259vi3mi/files/model-best)... Done. 0.1s" while the best model is uploaded; these repeated lines are omitted below for readability.)

Epoch 1/15
422/422 [==============================] - 49s 114ms/step - loss: 0.3768 - accuracy: 0.8868 - val_loss: 0.0852 - val_accuracy: 0.9785
Epoch 2/15
422/422 [==============================] - 48s 114ms/step - loss: 0.1160 - accuracy: 0.9650 - val_loss: 0.0600 - val_accuracy: 0.9817
Epoch 3/15
422/422 [==============================] - 49s 117ms/step - loss: 0.0867 - accuracy: 0.9739 - val_loss: 0.0474 - val_accuracy: 0.9878
Epoch 4/15
422/422 [==============================] - 51s 121ms/step - loss: 0.0733 - accuracy: 0.9776 - val_loss: 0.0439 - val_accuracy: 0.9882
Epoch 5/15
422/422 [==============================] - 52s 123ms/step - loss: 0.0636 - accuracy: 0.9804 - val_loss: 0.0403 - val_accuracy: 0.9883
Epoch 6/15
422/422 [==============================] - 50s 117ms/step - loss: 0.0572 - accuracy: 0.9821 - val_loss: 0.0371 - val_accuracy: 0.9908
Epoch 7/15
422/422 [==============================] - 48s 113ms/step - loss: 0.0534 - accuracy: 0.9835 - val_loss: 0.0336 - val_accuracy: 0.9917
Epoch 8/15
422/422 [==============================] - 49s 117ms/step - loss: 0.0497 - accuracy: 0.9846 - val_loss: 0.0334 - val_accuracy: 0.9915
Epoch 9/15
422/422 [==============================] - 51s 121ms/step - loss: 0.0450 - accuracy: 0.9859 - val_loss: 0.0306 - val_accuracy: 0.9915
Epoch 10/15
422/422 [==============================] - 50s 118ms/step - loss: 0.0432 - accuracy: 0.9865 - val_loss: 0.0301 - val_accuracy: 0.9927
Epoch 11/15
422/422 [==============================] - 48s 113ms/step - loss: 0.0413 - accuracy: 0.9870 - val_loss: 0.0317 - val_accuracy: 0.9917
Epoch 12/15
422/422 [==============================] - 50s 118ms/step - loss: 0.0390 - accuracy: 0.9876 - val_loss: 0.0301 - val_accuracy: 0.9917
Epoch 13/15
422/422 [==============================] - 50s 119ms/step - loss: 0.0369 - accuracy: 0.9880 - val_loss: 0.0287 - val_accuracy: 0.9928
Epoch 14/15
422/422 [==============================] - 51s 122ms/step - loss: 0.0348 - accuracy: 0.9893 - val_loss: 0.0259 - val_accuracy: 0.9927
Epoch 15/15
422/422 [==============================] - 50s 119ms/step - loss: 0.0341 - accuracy: 0.9891 - val_loss: 0.0292 - val_accuracy: 0.9927
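The HDF5 warning above comes from WandbCallback's default save_model=True, which uploads the best model after every improving epoch. If you only need the metric logging, the checkpoint uploads can be switched off. A minimal sketch of the same training call:

# save_model=False logs per-epoch metrics only and skips uploading checkpoints
history = model.fit(x_train, y_train,
                    batch_size=config["batch_size"],
                    epochs=config["epochs"],
                    validation_split=config["validation_split"],
                    callbacks=[WandbCallback(save_model=False)])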

Step 7 : Finishing the Run Session

After successfully running the training process, it is time to close the wandb.ai session by running the snippet below. The output will show the metric statistics and graphs.

wandb.finish()
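As an alternative to calling wandb.finish() manually, the run object returned by wandb.init() can be used as a context manager, which closes the session automatically even if training raises an exception. A minimal sketch:

# The session is finished automatically when the with-block exits
with wandb.init(project="Jovian_Article", entity="happyman", config=config):
    model.fit(x_train, y_train,
              batch_size=config["batch_size"],
              epochs=config["epochs"],
              validation_split=config["validation_split"],
              callbacks=[WandbCallback()])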

Step 8 : Wandb.ai Dashboard

Now it is time to log in to the wandb.ai project session dashboard and see the results.

First, we will start with the config variable which we saved while starting the session. For this, navigate to Project > Session > Information panel, and you will be able to see the config variable, as shown below.

Image 6. Config Variable Wandb.ai

We have also saved the model summary in the config variable; to see it, go to Project > Session > Model panel, where we can analyse the model summary.

And finally, if you navigate to Project > Session > Graphs, you can easily see the plotted graphs of the metrics used by the model.

Image 7. Plotted Metrics wandb.ai

In this article we have seen the basics of wandb.ai experiment metric logging. In the next article we will cover in detail how to upload a dataset and download it for model training, how to log custom metrics in wandb.ai, and how to modify the graphs; a quick preview of custom metric logging is sketched below.
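A minimal sketch, assuming an active run (the metric name and value are just examples):

# Logs an arbitrary key/value pair to the active run's dashboard
wandb.log({"custom_metric": 0.93})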

Special Thanks

As we say, a car is useless if it doesn't have a good engine; similarly, a student is lost without proper guidance and motivation. From the bottom of my heart, I would like to thank my Guru as well as my idols, "Dr. P. Supraja" and "A. Helen Victoria", who guided me throughout the journey. As a Guru, she has lighted the best available path for me and motivated me whenever I encountered failure or a roadblock; without her support and motivation this would have been an impossible task for me.

Conclusion

In this article, I have discussed how to set up our IDE or Jupyter notebook with Wandb.ai for logging parameters. Some important points from the article are mentioned below:

  • The experiment logging works by adding the callback while training the model.
  • We can visualize the metrics, loss and architecture of the model in Wandb.ai.
  • For every run, a new session is created in Wandb.ai.
  • Wandb.ai is compatible with Scikit-learn, PyTorch and Keras as well.

References

Wandb.ai: Link
Wandb.ai Docs: Link

Written by

Ravi Shekhar Tiwari
AI Engineer, Bengaluru, Karnataka, India

Website Portfolio Blog

Medium LinkedIn Twitter Facebook