To get up and running, install Anaconda (https://www.continuum.io/downloads). Anaconda includes the Spyder IDE, SciPy, and scikit-learn. You can then install additional packages using conda or pip.
First, install pip: download get-pip.py from http://pip.readthedocs.io/en/stable/installing/ and save it to a folder of your choice.
In cmd (or the command prompt in Spyder under Tools), change to the directory that contains get-pip.py and run this:
python get-pip.py
Then install each package from the command prompt like this:
pip install numpy
pip install yahoo_finance
To see which version of Python is installed, open cmd and type: python --version
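The same check can also be made from inside Python. This is a minimal sketch using the standard library's sys module; the variable name version is only illustrative.

```python
# Report the running interpreter's version from within Python
import sys

version = sys.version_info
print('python: {}.{}.{}'.format(version.major, version.minor, version.micro))
```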
Next, confirm that SciPy, NumPy, Matplotlib, pandas, and scikit-learn are installed. Together these make up the Python ecosystem used here.
In [6]:
#scipy
import scipy
print('scipy: {}'.format(scipy.__version__))
#numpy
import numpy
print('numpy: {}'.format(numpy.__version__))
# matplotlib
import matplotlib
print('matplotlib: {}'.format(matplotlib.__version__))
# pandas
import pandas
print('pandas: {}'.format(pandas.__version__))
# scikit-learn
import sklearn
print('sklearn: {}'.format(sklearn.__version__))
In [1]:
# Strings
data = 'hello world'
print(data[0])
print(len(data))
print(data)
Indexing starts at 0 in Python, so the first letter of 'hello world' is at index 0.
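A short sketch of zero-based indexing on the same string follows. The variable names first, last, and word are only illustrative.

```python
# Zero-based indexing and slicing on a string
data = 'hello world'
first = data[0]    # character at index 0
last = data[-1]    # negative indexes count from the end
word = data[0:5]   # slice from index 0 up to, but not including, index 5
print(first, last, word)
```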
In [2]:
# Numbers
value = 123.1
print(value)
value = 10
print(value)
In [3]:
# Boolean
a = True
b = False
print(a, b)
In [4]:
# Multiple Assignment
a, b, c = 1, 2, 3
print(a, b, c)
In [5]:
# No value
a = None
print(a)
In [6]:
# If-Then-Else Conditional
value = 99
if value == 99:
    print('That is fast')
elif value > 200:
    print('That is too fast')
else:
    print('That is safe')
Notice the colon (:) at the end of each condition and the meaningful indentation of the code block under the condition.
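To illustrate how indentation alone decides which block a statement belongs to, here is a minimal sketch with a nested condition. The value 150 and the messages list are made up for the example.

```python
# Indentation determines block membership
value = 150
messages = []
if value > 100:
    messages.append('over 100')      # runs only when value > 100
    if value > 120:
        messages.append('over 120')  # nested one level deeper
messages.append('done')              # not indented, so it always runs
print(messages)
```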
In [7]:
# For-Loop
for i in range(10):
    print(i)
In [8]:
# While-Loop
i = 0
while i < 10:
    print(i)
    i += 1
In [9]:
# Tuple
# Tuples are read-only collections of items.
a = (1, 2, 3)
print(a)
In [10]:
# List
# Lists use the square bracket notation and can be indexed using array notation.
mylist = [1, 2, 3]
print("Zeroth Value: %d" % mylist[0])
mylist.append(4)
print("List Length: %d" % len(mylist))
for value in mylist:
    print(value)
In [11]:
# Dictionary
# Dictionaries are mappings of names to values, like key-value pairs. Note the use of the curly bracket and colon notations when defining the dictionary.
mydict = {'a': 1, 'b': 2, 'c': 3}
print("A value: %d" % mydict['a'])
mydict['a'] = 11
print("A value: %d" % mydict['a'])
print("Keys: %s" % mydict.keys())
print("Values: %s" % mydict.values())
for key in mydict.keys():
    print(mydict[key])
Functions
The biggest gotcha with Python is the whitespace: the body of a function must be indented, and in an interactive prompt you need an empty line after the indented code to end the block. The example below defines a new function to calculate the sum of two values and calls the function with two arguments.
In [12]:
# Sum function
def mysum(x, y):
    return x + y
# Test sum function
result = mysum(1, 3)
print(result)
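As a further sketch of how indentation structures a function, the example below nests a for-loop inside a function body. The helper mysum_list is hypothetical and not part of the original notes.

```python
# Sum all items in a list (hypothetical helper for illustration)
def mysum_list(values):
    total = 0
    for value in values:
        total += value   # inside the loop
    return total         # inside the function, after the loop

result = mysum_list([1, 2, 3])
print(result)
```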
In [13]:
# define an array
import numpy
mylist = [1, 2, 3]
myarray = numpy.array(mylist)
print(myarray)
print(myarray.shape)
Notice how we easily converted a Python list to a NumPy array.
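The conversion also works in the other direction. The sketch below inspects the array's element type and dimensions and then converts it back to a list. The name backtolist is only illustrative, and the exact dtype printed depends on the platform.

```python
# Inspect a NumPy array and convert it back to a Python list
import numpy

mylist = [1, 2, 3]
myarray = numpy.array(mylist)
print(myarray.dtype)   # integer type chosen by NumPy (platform dependent)
print(myarray.ndim)    # number of dimensions
backtolist = myarray.tolist()
print(backtolist == mylist)
```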
In [14]:
# Access Data
# Array notation and ranges can be used to efficiently access data in a NumPy array.
# access values
import numpy
mylist = [[1, 2, 3], [3, 4, 5]]
myarray = numpy.array(mylist)
print(myarray)
print(myarray.shape)
print("First row: %s" % myarray[0])
print("Last row: %s" % myarray[-1])
print("Specific row and col: %s" % myarray[0, 2])
print("Whole col: %s" % myarray[:, 2])
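Ranges can be combined with array notation to select blocks of values. A minimal sketch on the same 2x3 array follows; the variable names are only illustrative.

```python
# Select sub-blocks of a 2-D array with ranges
import numpy

myarray = numpy.array([[1, 2, 3], [3, 4, 5]])
first_two_cols = myarray[:, 0:2]   # all rows, columns 0 and 1
last_of_first_row = myarray[0, -1] # row 0, last column
print(first_two_cols)
print(last_of_first_row)
```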
In [15]:
# Arithmetic
# NumPy arrays can be used directly in arithmetic.
import numpy
myarray1 = numpy.array([2, 2, 2])
myarray2 = numpy.array([3, 3, 3])
print("Addition: %s" % (myarray1 + myarray2))
print("Multiplication: %s" % (myarray1 * myarray2))
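Note that the multiplication above is element by element. The sketch below contrasts it with scaling by a single number and with a dot product via numpy.dot. The names scaled and dot are only illustrative.

```python
# Element-wise scaling versus a dot product
import numpy

myarray1 = numpy.array([2, 2, 2])
myarray2 = numpy.array([3, 3, 3])
scaled = myarray1 * 10               # the scalar is applied to every element
dot = numpy.dot(myarray1, myarray2)  # sum of element-wise products
print("Scaled: %s" % scaled)
print("Dot product: %s" % dot)
```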
In [17]:
# Line Plot
# example below creates a simple line plot from one dimensional data
# basic line plot
import matplotlib.pyplot as plt
import numpy
myarray = numpy.array([1, 2, 3])
plt.plot(myarray)
plt.xlabel('some x axis')
plt.ylabel('some y axis')
plt.show()
In [18]:
# Scatter Plot
# a simple example of creating a scatter plot from two dimensional data
# basic scatter plot
import matplotlib.pyplot as plt
import numpy
x = numpy.array([1, 2, 3])
y = numpy.array([2, 4, 6])
plt.scatter(x,y)
plt.xlabel('some x axis')
plt.ylabel('some y axis')
plt.show()
In [19]:
# Series
# A series is a one-dimensional array of data where the rows are labeled with an index (often a time axis).
import numpy
import pandas
myarray = numpy.array([1, 2, 3])
rownames = ['a', 'b', 'c']
myseries = pandas.Series(myarray, index=rownames)
print(myseries)
In [20]:
# You can access the data in a series by position, like a NumPy array, and by label, like a dictionary, for example
print(myseries.iloc[0])
print(myseries['a'])
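pandas also provides explicit accessors for the two styles of access: .iloc for positions and .loc for labels. A minimal sketch follows, rebuilding the same series so that it runs on its own; the variable names are only illustrative.

```python
# Explicit positional and label-based access on a Series
import pandas

myseries = pandas.Series([1, 2, 3], index=['a', 'b', 'c'])
by_position = myseries.iloc[0]       # first row, by position
by_label = myseries.loc['a']         # row labeled 'a'
subset = myseries.loc[['a', 'c']]    # several labels at once
print(by_position, by_label)
print(subset)
```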
In [21]:
# DataFrame
# A data frame is a two-dimensional table where both the rows and the columns can be labeled.
import numpy
import pandas
myarray = numpy.array([[1, 2, 3], [4, 5, 6]])
rownames = ['a', 'b']
colnames = ['one', 'two', 'three']
mydataframe = pandas.DataFrame(myarray, index=rownames, columns=colnames)
print(mydataframe)
In [22]:
# Data can be indexed using column names.
print("method 1:")
print("one column:\n%s" % mydataframe['one'])
print("method 2:")
print("one column:\n%s" % mydataframe.one)
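Rows and single cells can be selected by label as well, using .loc. The sketch below rebuilds the same data frame so that it runs on its own; the names row_a and cell are only illustrative.

```python
# Select a row and a single cell by label
import numpy
import pandas

myarray = numpy.array([[1, 2, 3], [4, 5, 6]])
mydataframe = pandas.DataFrame(myarray, index=['a', 'b'],
                               columns=['one', 'two', 'three'])
row_a = mydataframe.loc['a']        # whole row labeled 'a'
cell = mydataframe.loc['b', 'two']  # row 'b', column 'two'
print(row_a)
print(cell)
```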
You can load your CSV data using pandas and the pandas.read_csv() function. This function is very flexible and returns a pandas.DataFrame. See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
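To try read_csv without a file on disk, it can also read from an in-memory text buffer. In this sketch the two rows of numbers are made-up sample values, not rows from a real dataset, and only three of the column names are used.

```python
# Load CSV text from memory (the values are made-up sample data)
from io import StringIO
from pandas import read_csv

csv_text = "1,100,70\n2,120,80\n"
names = ['preg', 'plas', 'pres']
sample = read_csv(StringIO(csv_text), names=names)  # names supplies the header
print(sample.shape)
print(sample)
```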
In [ ]:
# Load CSV using Pandas
# https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
from pandas import read_csv
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(filename, names=names)
print(data.shape)
In [26]:
# Load CSV using Pandas from URL
from pandas import read_csv
url = 'https://goo.gl/vhm1eU'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(url, names=names)
print(data.shape)
In [27]:
# Look at the raw data
# The first column of the output is the row index
peek = data.head(20)
print(peek)
In [30]:
# Dimensions of your data
# You can review the shape and size of your dataset by printing the shape property on the Pandas DataFrame.
shape = data.shape
print(shape)
The output shows 768 rows and 9 columns.
In [31]:
# Data Type For Each Attribute
#The type of each attribute is important.
#Strings may need to be converted to floating point values or
#integers to represent categorical or ordinal values. You can get an idea
#of the types of attributes by peeking at the raw data, as above.
#You can also list the data types used by the DataFrame to
#characterize each attribute using the dtypes property.
types = data.dtypes
print(types)
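When a column has been read as strings, it can be converted with astype. The sketch below uses a small made-up data frame (the name frame and its values are only illustrative) to show the data types before and after the conversion.

```python
# Convert a string column to integers (made-up sample data)
import pandas

frame = pandas.DataFrame({'age': ['50', '31'], 'mass': [33.6, 26.6]})
print(frame.dtypes)                      # 'age' holds strings at this point
frame['age'] = frame['age'].astype(int)  # parse the strings as integers
print(frame.dtypes)
```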
Stopped on page 33.