This is a simple dataset with 10,000 rows that I use for my deep learning course. There is no noise and there are no missing values, and the weight is 100% predictable if you know the rule. The following fields are present:
The following code shows 10 sample rows.
In [31]:
import os
import pandas as pd
path = "./data/"
filename = os.path.join(path,"toy1.csv")
df = pd.read_csv(filename,na_values=['NA','?'])
df[0:10]
Out[31]:
This dataset contains the weights of geometric solids of different shapes, sizes, and metals. Use the columns metal, shape, height, length, and width to determine the weight.
The geometric solids are:
Box
$ V = lwh $
Cylinder
$ V = \pi r^2 h = \pi \left(\frac{w}{2}\right)^2 h $
Sphere
$ V = \frac{4}{3} \pi r^3 = \frac{4}{3} \pi \left(\frac{h}{2}\right)^3 $
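As a quick sanity check on the three formulas, here is a minimal sketch that evaluates each one directly. The helper names (`box_volume`, `cylinder_volume`, `sphere_volume`) are made up for illustration and are not part of the dataset or the course code. It assumes the width is the cylinder's diameter and the height is the sphere's diameter, as the formulas above imply.

```python
import math

def box_volume(l, w, h):
    # V = l * w * h
    return l * w * h

def cylinder_volume(w, h):
    # The width is treated as the diameter, so r = w / 2
    return math.pi * (w / 2.0) ** 2 * h

def sphere_volume(h):
    # The height is treated as the diameter, so r = h / 2
    return (4.0 / 3.0) * math.pi * (h / 2.0) ** 3

print(box_volume(2, 2, 10))     # 40
print(cylinder_volume(6, 10))   # pi * 3^2 * 10
print(sphere_volume(6))         # (4/3) * pi * 3^3
```

Multiplying any of these volumes by the density of the metal gives the weight.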
The following code shows how to calculate the exact weight for any row in the dataset. The goal, of course, is to build a model that can reproduce the same value.
In [35]:
import math
def calculate_weight(metal,shape,h,l,w):
    metal_name = ['gold','silver','bronze','tin','platinum']
    metal_density = [19.32,10.49, 9.29,7.31, 21.09 ]
    shape_name = ['sphere','box','cylinder']
    metal = metal_name.index(metal)
    shape = shape_name.index(shape)
    if shape==0:
        # sphere
        vol = (4.0/3.0) * math.pi * ((l/2.0)**3)
    elif shape==1:
        # box
        vol = l * w * h
    elif shape==2:
        # cylinder
        vol = math.pi * ((w/2.0)**2.0) * h
    weight = vol * metal_density[metal]
    return weight
print(calculate_weight('silver', 'cylinder', 6, 5, 5))
print(calculate_weight('bronze', 'cylinder', 2, 6, 6))
print(calculate_weight('bronze', 'sphere', 2, 2, 2))
print(calculate_weight('silver', 'sphere', 6, 6, 6))
print(calculate_weight('tin', 'cylinder', 10, 6, 6))
print(calculate_weight('tin', 'box', 10, 2, 2))
print(calculate_weight('tin', 'box', 6, 1, 4))
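The same rule can also be written as a pair of lookup tables, which may be easier to apply row by row. This is only a sketch of an alternative formulation, not the code used to generate the dataset. The names `DENSITY`, `VOLUME`, and `weight_from_row` are hypothetical, and the dictionary keys assume the columns are named metal, shape, height, length, and width as described above. The two sample rows reuse arguments from the calls above.

```python
import math

# Densities copied from calculate_weight above
DENSITY = {'gold': 19.32, 'silver': 10.49, 'bronze': 9.29,
           'tin': 7.31, 'platinum': 21.09}

# One volume function per shape; each takes (h, l, w) like calculate_weight
VOLUME = {
    'sphere': lambda h, l, w: (4.0 / 3.0) * math.pi * (l / 2.0) ** 3,
    'box': lambda h, l, w: l * w * h,
    'cylinder': lambda h, l, w: math.pi * (w / 2.0) ** 2 * h,
}

def weight_from_row(row):
    # row is any mapping with the assumed column names as keys
    vol = VOLUME[row['shape']](row['height'], row['length'], row['width'])
    return vol * DENSITY[row['metal']]

# Hypothetical rows built from the example calls above
rows = [
    {'metal': 'tin', 'shape': 'box', 'height': 10, 'length': 2, 'width': 2},
    {'metal': 'silver', 'shape': 'sphere', 'height': 6, 'length': 6, 'width': 6},
]
for row in rows:
    print(row['metal'], row['shape'], weight_from_row(row))
```

Because a pandas row supports the same `row['shape']` style of lookup, a function like this could be passed to `df.apply(weight_from_row, axis=1)` and compared against the weight column, provided the column names match the assumption above.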