Learn how to implement speech command recognition with the M3, M5, M11, and M18 convolutional neural networks, trained on the Google SpeechCommand dataset. This project is part of the ZeroToGANS course on neural networks using PyTorch.
Assignment 4 - Final Course Project for jovian.ml ZeroToGANS course on the implementation of neural networks using PyTorch.
This notebook implements speech command recognition using convolutional neural networks trained on the Google SpeechCommand dataset.
The networks are 3-, 5-, 11-, or 18-layer convolutional neural networks (M3, M5, M11, M18), as described in the paper Very Deep Convolutional Neural Networks For Raw Waveforms. They are trained directly on the time-domain waveforms of the SpeechCommand dataset.
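To make the architecture concrete, here is a minimal sketch of an M5-style network: four 1-D convolution blocks followed by global average pooling and one linear layer. The class name `M5Sketch`, the filter counts (128, 128, 256, 512), the first-layer kernel of 80 with stride 4, and the 16,000-sample input are assumptions chosen for illustration and should be checked against the paper; this is not necessarily the model defined later in the notebook.

```python
import torch
import torch.nn as nn


class M5Sketch(nn.Module):
    """Illustrative M5-style network for raw waveforms (hypothetical name)."""

    def __init__(self, n_input=1, n_output=35, n_channel=128):
        super().__init__()

        def block(c_in, c_out, kernel_size, stride=1):
            # Conv1d -> BatchNorm -> ReLU -> MaxPool(4)
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=kernel_size, stride=stride),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
                nn.MaxPool1d(4),
            )

        # Assumed layer sizes: a wide first kernel, then small kernels of 3.
        self.features = nn.Sequential(
            block(n_input, n_channel, kernel_size=80, stride=4),
            block(n_channel, n_channel, kernel_size=3),
            block(n_channel, 2 * n_channel, kernel_size=3),
            block(2 * n_channel, 4 * n_channel, kernel_size=3),
        )
        # Global average pooling over time removes the dependence on clip length.
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Linear(4 * n_channel, n_output)

    def forward(self, x):
        # x has shape (batch, channels, time)
        x = self.features(x)
        x = self.pool(x).squeeze(-1)
        return self.fc(x)  # raw class scores (logits)


model = M5Sketch()
model.eval()
with torch.no_grad():
    # Two random stand-in clips of 16,000 samples each.
    logits = model(torch.randn(2, 1, 16000))
print(logits.shape)
```

The output has one row per clip and one column per word class (35 here, matching the number of words in the dataset).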
The dataset is one of the built-in torchaudio datasets [https://pytorch.org/audio/stable/datasets.html]. More information is available in the Speech Commands paper. The dataset consists of more than 105,000 WAVE audio files of various speakers saying thirty-five different words, such as "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go", and the digits 0-9. Much like the MNIST dataset for images, the SpeechCommand dataset makes it possible to understand and work with the techniques involved in audio processing and recognition.
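As a rough illustration of working with the dataset, the sketch below wraps the `torchaudio.datasets.SPEECHCOMMANDS` download in a function and adds a small helper that forces every clip to a fixed length. The helper names `load_speech_commands` and `pad_or_trim`, the `./data` directory, and the 16,000-sample target length are assumptions made for this example, not part of the original notebook. The download is large, so the example exercises the helper on a synthetic clip only.

```python
import os

import torch
import torch.nn.functional as F


def load_speech_commands(root="./data"):
    """Download (if needed) and return the SpeechCommands dataset.

    Each item is a tuple:
    (waveform, sample_rate, label, speaker_id, utterance_number).
    """
    import torchaudio  # imported here so pad_or_trim works without torchaudio

    os.makedirs(root, exist_ok=True)
    return torchaudio.datasets.SPEECHCOMMANDS(root, download=True)


def pad_or_trim(waveform, length=16000):
    """Force a (channels, time) waveform to exactly `length` samples."""
    n = waveform.shape[-1]
    if n >= length:
        return waveform[..., :length]
    # Zero-pad on the right so every clip has the same length.
    return F.pad(waveform, (0, length - n))


# Synthetic stand-in for a clip shorter than the target length.
short_clip = torch.randn(1, 12000)
fixed = pad_or_trim(short_clip)
print(fixed.shape)
```

Fixing the clip length this way lets clips be stacked into a single batch tensor before they are passed to a network.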
project_name='Assignment 4 - Speech Command Recognition with M3, M5, M11, M18 CNN networks'
# Uncomment the following line to run in Google Colab
# CPU:
#!pip install torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
# GPU:
!pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

import os

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchaudio

import matplotlib.pyplot as plt
import IPython.display as ipd
from tqdm.notebook import tqdm
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Requirement already satisfied: torch==1.7.0+cu101 in /usr/local/lib/python3.6/dist-packages (1.7.0+cu101)
Requirement already satisfied: torchvision==0.8.1+cu101 in /usr/local/lib/python3.6/dist-packages (0.8.1+cu101)
Collecting torchaudio==0.7.0
Downloading https://files.pythonhosted.org/packages/3f/23/6b54106b3de029d3f10cf8debc302491c17630357449c900d6209665b302/torchaudio-0.7.0-cp36-cp36m-manylinux1_x86_64.whl (7.6MB)
|████████████████████████████████| 7.6MB 11.9MB/s
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from torch==1.7.0+cu101) (1.19.5)
Requirement already satisfied: future in /usr/local/lib/python3.6/dist-packages (from torch==1.7.0+cu101) (0.16.0)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.6/dist-packages (from torch==1.7.0+cu101) (126.96.36.199)
Requirement already satisfied: dataclasses in /usr/local/lib/python3.6/dist-packages (from torch==1.7.0+cu101) (0.8)
Requirement already satisfied: pillow>=4.1.1 in /usr/local/lib/python3.6/dist-packages (from torchvision==0.8.1+cu101) (7.0.0)
Installing collected packages: torchaudio
Successfully installed torchaudio-0.7.0
/usr/local/lib/python3.6/dist-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  '"sox" backend is being deprecated. '
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)