# Reading CSV
In this section we read a less regularly structured data set: the output of a Git log in the following format.
<timestamp><whitespace><timezone><tabulator><author>
It contains two different separators: whitespace and a tab character. Here is the content of the file datasets/mixed_dataset.csv:
1514531161 -0800 Linus Torvalds
1514489303 -0500 David S. Miller
1514487644 -0800 Tom Herbert
1514487643 -0800 Tom Herbert
1514482693 -0500 Willem de Bruijn
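To make the line format concrete, here is a minimal sketch that splits a single log line with Python's standard `re` module. The helper `parse_log_line` and the pattern `LINE_PATTERN` are hypothetical names introduced only for illustration, and the sketch assumes the timezone is always a sign followed by four digits, as in the sample above.

```python
import re

# Hypothetical pattern, not part of the original notebook.
# Assumes: digits, whitespace, a signed 4-digit timezone, whitespace (the tab), author.
LINE_PATTERN = re.compile(r"^(\d+)\s+([+-]\d{4})\s+(.*)$")


def parse_log_line(line):
    """Split one log line into (timestamp, timezone, author)."""
    match = LINE_PATTERN.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")
    timestamp, timezone, author = match.groups()
    return int(timestamp), timezone, author


sample = "1514489303 -0500\tDavid S. Miller"
print(parse_log_line(sample))
# → (1514489303, '-0500', 'David S. Miller')
```

Because the author is captured as everything after the second separator, names that contain spaces or periods stay in one piece.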
We can read this kind of data by passing a regular expression as the separator:
In [54]:
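import pandas as pd

# A regular-expression separator requires the Python parsing engine.
# The pattern matches a whole line and captures the three fields.
pd.read_csv(
    "datasets/mixed_separators.txt",
    sep=r"^([0-9]*?) (.*?)\t(.*?)$",
    engine='python',
    names=['timestamp', 'timezone', 'author'],
    header=None)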
Out[54]: