Assignment 4 - Speech Command Recognition with M3, M5, M11, M18 CNN networks - 30 Epoch

Assignment 4 is the final course project for the jovian.ml Zero to GANs course on implementing neural networks with PyTorch.

This notebook implements speech command recognition using convolutional neural networks trained on the Google Speech Commands dataset.

The networks follow the 3-, 5-, 11-, and 18-layer convolutional architectures (M3, M5, M11, M18) described in the paper "Very Deep Convolutional Neural Networks for Raw Waveforms". They are trained directly on the time-domain waveforms of the Speech Commands dataset, with no spectrogram or other hand-crafted feature extraction step.
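To make the raw-waveform idea concrete, the sketch below traces how the time axis shrinks as a clip passes through an M5-style stack of convolution and pooling layers. The layer list (a first convolution with kernel 80 and stride 4, then kernel-3 convolutions, each followed by a max pool of 4), the absence of padding, and the 16,000-sample input (one second at 16 kHz) are assumptions made for illustration; the names `conv1d_out_len`, `M5_STAGES`, and `trace_lengths` are hypothetical and do not come from this notebook.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution or pooling layer (PyTorch's formula)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Assumed M5-style stack: (layer name, kernel_size, stride), no padding anywhere.
M5_STAGES = [
    ("conv1", 80, 4),
    ("pool1", 4, 4),
    ("conv2", 3, 1),
    ("pool2", 4, 4),
    ("conv3", 3, 1),
    ("pool3", 4, 4),
    ("conv4", 3, 1),
    ("pool4", 4, 4),
]


def trace_lengths(num_samples, stages=M5_STAGES):
    """Return [(layer name, output length)] for an input of num_samples samples."""
    lengths = []
    length = num_samples
    for name, kernel_size, stride in stages:
        length = conv1d_out_len(length, kernel_size, stride)
        lengths.append((name, length))
    return lengths


# One second of audio at 16 kHz (assumed input size).
for name, length in trace_lengths(16000):
    print(f"{name}: {length}")
```

Under these assumptions, the final pooling stage leaves only a handful of time steps, which a global average pool can then collapse to a single feature vector per clip before the classifier.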

The dataset is available through the torchaudio built-in datasets [https://pytorch.org/audio/stable/datasets.html]. There is more information on the dataset in the Speech Commands paper. It consists of more than 105,000 WAVE audio files of various speakers saying thirty-five different words, such as "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go", and the digits zero through nine. Much as MNIST does for images, the Speech Commands dataset offers a compact way to learn and practice the techniques involved in audio processing and recognition.
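Because a classifier predicts class indices rather than strings, each of the thirty-five words needs an integer label. The sketch below shows one possible mapping. The word list is an assumption (the vocabulary of version 0.02 of the dataset, sorted alphabetically), and `LABELS`, `label_to_index`, and `index_to_label` are hypothetical names, not necessarily those used later in this notebook.

```python
# Assumed vocabulary of Speech Commands v0.02 (35 words), sorted alphabetically.
LABELS = sorted([
    "backward", "bed", "bird", "cat", "dog", "down", "eight", "five",
    "follow", "forward", "four", "go", "happy", "house", "learn", "left",
    "marvin", "nine", "no", "off", "on", "one", "right", "seven", "sheila",
    "six", "stop", "three", "tree", "two", "up", "visual", "wow", "yes",
    "zero",
])

# Lookup table from word to class index.
LABEL_TO_INDEX = {label: index for index, label in enumerate(LABELS)}


def label_to_index(word):
    """Map a spoken word to its integer class index."""
    return LABEL_TO_INDEX[word]


def index_to_label(index):
    """Map an integer class index back to the spoken word."""
    return LABELS[index]


print(len(LABELS), label_to_index("yes"), index_to_label(0))
```

Sorting the list keeps the mapping deterministic, so indices stay stable across runs and saved checkpoints.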

project_name='Assignment 4 - Speech Command Recognition with M3, M5, M11, M18 CNN networks'

# Install pinned PyTorch builds for Google Colab: keep exactly one of the two pip lines below uncommented, matching the runtime type.

# CPU:
#!pip install torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

# GPU:
!pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

import os

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchaudio

import matplotlib.pyplot as plt
import IPython.display as ipd
from tqdm.notebook import tqdm

/usr/local/lib/python3.6/dist-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  '"sox" backend is being deprecated. '

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda