In [2]:
import csv
import yaml
In [3]:
reader = csv.reader(open("../data/questions.csv"))
Read the first line and see the structure.
In [4]:
question_1 = next(reader)
In [5]:
question_1
Out[5]:
Yes, each line is converted into a list, and it has 6 items as expected. However, how can we use the last item? It is a string, but it looks like a dictionary or JSON.
OK, let's try to convert it into a dictionary.
In [6]:
yaml.safe_load(question_1[-1].replace(": u'", ": '"))
Out[6]:
Now you know how to convert the fields of a CSV file into the format you want, so you can handle all the given files.
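If the last field is really the printed form of a Python dictionary (which the u'...' prefixes suggest), another option is ast.literal_eval from the standard library. The snippet below is only a sketch: the string in raw is a made-up stand-in, since the real contents of question_1[-1] are not shown here.

```python
import ast

# Hypothetical stand-in for question_1[-1]; the real field is not reproduced here.
raw = "{'tag': u'example', 'count': 3}"

# literal_eval only evaluates Python literals (dicts, lists, strings, numbers),
# so it does not execute arbitrary code the way eval would.
parsed = ast.literal_eval(raw)

print(type(parsed))
print(parsed["count"])
```

With this approach there is no need to strip the u prefix first, because it is valid Python literal syntax.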
In [12]:
reader = csv.reader(open("../data/train.csv"))
However, train.csv has a header row, which is not data we want to use, so you need to skip the first line. Also note that the reader returned by csv.reader is an iterator, not a list, so you can walk through it only once. If you want to go through the rows again, you need to call csv.reader again on a freshly opened file.
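Here is a small sketch of that single-pass behaviour. It does not touch train.csv; the rows in lines are invented for the illustration, and csv.reader is given a plain list of strings instead of a file.

```python
import csv

# Made-up rows standing in for a CSV file with a header.
lines = ["id,value", "1,a", "2,b"]

reader = csv.reader(lines)
header = next(reader)        # consume the header row
first_pass = list(reader)    # reads the two data rows
second_pass = list(reader)   # the reader is exhausted, so this is empty

# To read again, build a new reader. With a real file you would also
# have to reopen it (or seek back to the start) first.
reader = csv.reader(lines)
next(reader)
third_pass = list(reader)

print(header)
print(first_pass)
print(second_pass)
print(third_pass)
```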
In [13]:
next(reader)
Out[13]:
OK, now the reader is on the 2nd line of the csv file. Let's collect the remaining rows into a list.
In [14]:
train_set = []
for row in reader:
    train_set.append(row)
In [15]:
print(len(train_set))
In [16]:
print(len(train_set[0]))
In [18]:
print(train_set[0])
print(train_set[-1])
I guess you now realize why csv.reader returns an iterator instead of a list. It is because we don't know the size of the given csv file in advance. If the file is too big, loading every row into a list could exhaust memory. So, in this case, an iterator is much better than a list.
In other words, if you don't need the whole csv file as a list, don't convert it into one; process the rows one at a time to save memory.
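As a rough sketch of that idea, the helper below counts data rows while holding only one row in memory at a time. The name count_rows and the sample rows are made up for this illustration; in practice you would pass an open file object instead of a list of strings.

```python
import csv

def count_rows(lines):
    """Count data rows (header excluded) without building a list."""
    reader = csv.reader(lines)
    next(reader, None)              # skip the header; None avoids an error on empty input
    return sum(1 for _ in reader)   # rows are read and discarded one by one

# Made-up rows standing in for a file such as train.csv.
sample = ["id,value", "1,a", "2,b", "3,c"]
n_rows = count_rows(sample)
print(n_rows)
```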