Forex Project
Using LSTM to predict Dow Jones price
This was a great class and really got me thinking about how to apply deep learning solutions.
One of the things I'm interested in is analysis and prediction of time-series data.
I did some reading and practicing outside of the course and learned something about the LSTM architecture. An LSTM is a type of recurrent neural network that lets the model leverage information contained in sequences of data. In other words, the model can use relationships between successive data points, not just individual data points.
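To make the "sequences of data" idea concrete, here is a minimal sketch of how a PyTorch LSTM consumes a batch of short sequences. The sizes (8 samples, 5 time steps, 5 features, 16 hidden units) and the linear output layer are assumptions chosen for illustration, not the model used later in this project.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: 8 samples, each a window of 5 days with 5 features per day
batch_size, window, n_features, hidden_size = 8, 5, 5, 16

# batch_first=True means the input is shaped (batch, time steps, features)
lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)
head = nn.Linear(hidden_size, 1)  # maps the last hidden state to one number

x = torch.randn(batch_size, window, n_features)

# output holds the hidden state at every time step;
# h_n and c_n hold the final hidden and cell states
output, (h_n, c_n) = lstm(x)

# Use the hidden state from the last time step as a summary of the whole window
prediction = head(output[:, -1, :])

print(output.shape)      # (batch, time steps, hidden units)
print(prediction.shape)  # (batch, 1)
```

Because each hidden state depends on the steps before it, the value at the last step carries information about the whole window, which is what makes the architecture a natural fit for time series.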
I also learned how to get started with the Skorch library, a wrapper around PyTorch that allows PyTorch models to be used in a typical Scikit-Learn pipeline. Since I use Scikit-Learn at work, I thought it might be a good idea to get up to speed with Skorch.
So my general approach was to predict a time series using:
- Skorch
- LSTM
I'm joining two daily datasets:
- Dow Jones Industrial Average
- Foreign exchange rates (Euro, Canadian Dollar, Chinese Yuan, Mexican Peso) per US Dollar
The goal is to use the previous 5 days of data (i.e. a moving window) to predict the closing Dow Jones price on the following day.
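The moving-window idea can be sketched with plain NumPy as follows. The helper name `make_windows` and the toy arrays are hypothetical, made up for illustration; the actual preprocessing in this project may differ (for example, in how the features are scaled).

```python
import numpy as np


def make_windows(features, target, window=5):
    """Pair each run of `window` consecutive days with the next day's target.

    features: array of shape (n_days, n_features)
    target:   array of shape (n_days,), e.g. the daily closing price
    Returns X of shape (n_days - window, window, n_features)
    and y of shape (n_days - window, 1).
    """
    features = np.asarray(features, dtype=np.float32)
    target = np.asarray(target, dtype=np.float32)
    n_samples = len(features) - window
    if n_samples <= 0:
        raise ValueError("need more rows than the window length")
    # Sample i covers days i .. i+window-1; its label is day i+window
    X = np.stack([features[i:i + window] for i in range(n_samples)])
    y = target[window:].reshape(-1, 1)
    return X, y


# Toy data: 8 days, 2 features per day
features = np.arange(16).reshape(8, 2)
close = np.arange(100, 108)

X, y = make_windows(features, close, window=5)
print(X.shape, y.shape)
```

With 8 days and a window of 5, this gives 3 samples: days 0-4 predict day 5, days 1-5 predict day 6, and days 2-6 predict day 7.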
import pandas as pd
import numpy as np
import sqlite3
from sklearn.preprocessing import StandardScaler
import torch
import torch.nn as nn
import torch.nn.functional as F
from skorch import NeuralNetRegressor
import seaborn as sns
import matplotlib.pyplot as plt
import pickle
con = sqlite3.connect(":memory:")
#read in stock market data:
df_djia = pd.read_csv("/mnt/c/Users/jdbri/Downloads/^DJI.csv",index_col="Date")
print(df_djia.head())
df_djia.to_sql("djia", con, if_exists="replace")
Open High Low Close \
Date
2000-01-03 11501.849609 11522.009766 11305.690430 11357.509766
2000-01-04 11349.750000 11350.059570 10986.450195 10997.929688
2000-01-05 10989.370117 11215.099609 10938.669922 11122.650391
2000-01-06 11113.370117 11313.450195 11098.450195 11253.259766
2000-01-07 11247.059570 11528.139648 11239.919922 11522.559570
Adj Close Volume
Date
2000-01-03 11357.509766 169750000
2000-01-04 10997.929688 178420000
2000-01-05 11122.650391 203190000
2000-01-06 11253.259766 176550000
2000-01-07 11522.559570 184900000
/home/jeff/.local/lib/python3.8/site-packages/pandas/core/generic.py:2602: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
sql.to_sql(
#read in forex data:
# first 5 rows are metadata
df = pd.read_csv("./FRB_H10.csv", na_values=["ND"], skiprows=5, index_col="Time Period").dropna()
df.columns = "EUR CAD CNY MXN".split()
df["EUR"] = 1 / df["EUR"]  # the EUR column is originally in $/EUR; convert to EUR/$
print(df.head())
df.to_sql("forex",con,if_exists="replace")
EUR CAD CNY MXN
Time Period
2000-01-03 0.984737 1.4465 8.2798 9.4015
2000-01-04 0.970026 1.4518 8.2799 9.4570
2000-01-05 0.967586 1.4518 8.2798 9.5350
2000-01-06 0.968617 1.4571 8.2797 9.5670
2000-01-07 0.971440 1.4505 8.2794 9.5200
/home/jeff/.local/lib/python3.8/site-packages/pandas/core/generic.py:2602: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
sql.to_sql(
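With both tables in the same SQLite connection, joining them on the date could look something like the sketch below. The tiny DataFrames are toy stand-ins (rounded from the printouts above), and the query itself is an assumption about how the join might be written, not necessarily the one used in the rest of this project.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")

# Toy stand-ins for the two tables, keeping the same index names as above
djia = pd.DataFrame(
    {"Close": [11357.5, 10997.9, 11122.7]},
    index=pd.Index(["2000-01-03", "2000-01-04", "2000-01-05"], name="Date"),
)
forex = pd.DataFrame(
    {"EUR": [0.985, 0.970], "MXN": [9.4015, 9.4570]},
    index=pd.Index(["2000-01-03", "2000-01-04"], name="Time Period"),
)
djia.to_sql("djia", con, if_exists="replace")
forex.to_sql("forex", con, if_exists="replace")

# Column names containing spaces need double quotes in SQLite
query = """
SELECT d."Date", d."Close", f."EUR", f."MXN"
FROM djia AS d
INNER JOIN forex AS f ON d."Date" = f."Time Period"
ORDER BY d."Date"
"""
joined = pd.read_sql_query(query, con, index_col="Date")
print(joined)
```

An inner join keeps only the dates present in both tables, so days where either market has no data drop out. Here 2000-01-05 is missing from the toy forex table and does not appear in the result.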