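Let's see what the NumPy library (through pandas, which is built on top of it) offers for computing statistics on an n-dimensional array. First, download the CSV files for AAPL, GOOG, IBM and GLD from this page: https://finance.yahoo.com/quote/AAPL/history?p=AAPL. The script also reads SPY and looks for every file in a data directory, so save SPY.csv there along with the others.
Then plot the data and calculate the mean, median, standard deviation and sum.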
import os

import pandas as pd
import matplotlib.pyplot as plt


def symbol_to_path(symbol, base_dir="data"):
    """Return the CSV file path for the given ticker symbol."""
    return os.path.join(base_dir, "{}.csv".format(str(symbol)))


def get_data(symbols, dates):
    """Read adjusted close prices for the given symbols from CSV files."""
    df = pd.DataFrame(index=dates)
    if 'SPY' not in symbols:
        # Build a new list so the caller's list is not modified
        symbols = ['SPY'] + list(symbols)
    for symbol in symbols:
        path = symbol_to_path(symbol)
        newDf = pd.read_csv(path,
                            index_col='Date',
                            parse_dates=True,
                            usecols=['Date', 'Adj Close'],
                            na_values=['nan'])
        newDf = newDf.rename(columns={'Adj Close': symbol})
        df = df.join(newDf)
    # Drop the dates on which any symbol has no price (e.g. non-trading days)
    df = df.dropna()
    return df


def plot_data(df):
    """Plot one line per symbol."""
    df.plot()
    plt.show()


def test_run():
    # Define a date range
    dates = pd.date_range('2010-01-22', '2012-01-26')
    # Choose stock symbols to read
    symbols = ['GOOG', 'IBM', 'GLD', 'AAPL']
    # Get stock data
    df = get_data(symbols, dates)
    # Each statistic is computed per column, i.e. per symbol
    print("Mean:")
    print(df.mean())
    print("Median:")
    print(df.median())
    print("Standard deviation:")
    print(df.std())
    print("Sum:")
    print(df.sum())
    plot_data(df)


if __name__ == "__main__":
    test_run()
Here is the output of the script. First, let's have a look at the plot, so we get a better understanding of what the data looks like.
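Now we can check the calculated numbers. Look at the standard deviation: the highest value is for Google, because its stock price goes up and down by a lot. Standard deviation measures how far the data spread around their center, the mean. Keep in mind that it is expressed in the same units as the price, so stocks trading at a higher price level tend to show larger values.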
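To make the definition concrete, here is a minimal sketch that computes the same statistics directly with NumPy on a small 2-D array. The prices and the column names A and B are made up for illustration and are not taken from the CSV files. It also shows one detail worth knowing: NumPy's std defaults to the population formula (ddof=0), while pandas' std defaults to the sample formula (ddof=1), so the two libraries give slightly different numbers unless ddof is set explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical prices (rows = days, columns = symbols); not real market data.
prices = np.array([
    [10.0, 100.0],
    [12.0, 110.0],
    [11.0,  90.0],
    [13.0, 120.0],
])

# axis=0 aggregates down the rows, giving one value per column (symbol).
col_mean = prices.mean(axis=0)
col_median = np.median(prices, axis=0)
col_sum = prices.sum(axis=0)

# NumPy defaults to the population standard deviation (ddof=0).
std_population = prices.std(axis=0)
# Passing ddof=1 gives the sample standard deviation instead.
std_sample = prices.std(axis=0, ddof=1)

# pandas uses ddof=1 by default, so it should match std_sample.
df = pd.DataFrame(prices, columns=["A", "B"])
pandas_std = df.std().to_numpy()

# The definition written out: square root of the mean squared
# deviation from the mean (population version).
manual_std = np.sqrt(((prices - col_mean) ** 2).mean(axis=0))

print("mean:            ", col_mean)
print("median:          ", col_median)
print("sum:             ", col_sum)
print("std (ddof=0):    ", std_population)
print("std (ddof=1):    ", std_sample)
print("pandas df.std(): ", pandas_std)
print("manual std:      ", manual_std)
```

With two years of daily prices the difference between the two formulas is tiny, but on short series like this one it is clearly visible.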