This is a simple function that loads, from a text file, a particular block of data headed by some identification text.
Given a text file containing blocks of data (each block headed by a title or comment), the function Data_Block_Loader() (see below) loads the requested data block and returns a list of data columns (tuples). It has been tested in Python 2.7.6 and Python 3.4.1, and it uses only the standard library (no NumPy or SciPy).
For very large data sets, a NumPy implementation would be preferable because it avoids the cost of explicit Python loops. For data sets of ordinary size, this function should be more than enough.
In [8]:
%%file Data_Block_Loader.py
def Data_Block_Loader(fname, keytext, instance=1, skip=0):
    '''Loads a block of data from a file and returns a list of tuples,
    with each tuple corresponding to one column of the data block.

    Parameters
    ----------
    fname : str
        The filename (including the full path if the file is not in the
        working directory).
    keytext : str
        Any identification text in the header of the data block to be loaded.
    instance : int, optional
        If keytext appears in more than one data block header, load the
        block at the position given by instance (1 = first occurrence).
    skip : int, optional
        The number of lines to skip just after the header of the data block
        before starting to load data.

    Returns
    -------
    A list of tuples (one per column), or None if the block was not found
    or is empty.
    '''
    #- Read the list of file lines.
    with open(fname) as f:
        lines = f.readlines()
    #- Eliminate newline and carriage-return characters, if any.
    lines = [line.replace('\n', '').replace('\r', '') for line in lines]
    numlines = len(lines)
    #- Loop until keytext is found, then load data line by line as long as
    #- the line is not empty and its fields can be cast to float.
    rows = []
    inst = 1
    for i in range(numlines):
        if keytext in lines[i] and inst == instance:
            for j in range(i + 1 + skip, numlines):
                try:
                    #- Split the line. Replace commas (if any) because split()
                    #- does not treat them as separators.
                    splitted_list = lines[j].replace(',', ' ').split()
                    if len(splitted_list) == 0:
                        raise ValueError
                    rowdata = [float(x) for x in splitted_list]
                    rows.append(rowdata)
                except ValueError:
                    #- The data block has ended: the line is empty or cannot be cast to float.
                    break
            break
        elif keytext in lines[i]:
            inst += 1
    #- Ensure that all rows have the same number of points. If not, pad with nan (not a number).
    if len(rows) > 0:
        max_length = max(len(row) for row in rows)
        for row in rows:
            row.extend([float('nan')] * (max_length - len(row)))
    #- Transpose the data so that every tuple is a column of the data block.
    block = list(zip(*rows))  # list() was added to make it Python 3 compatible.
    #- Inform and return.
    if len(block) > 0:
        print('{0} columns were loaded, each with {1} rows.'.format(len(block), len(block[0])))
        return block
    else:
        print('The block was not found or is empty.')
        return None
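The last step of the function pads short rows with nan and then transposes the rows into columns with zip(). That step can be checked in isolation. Below is a minimal sketch. The helper name `pad_and_transpose` is hypothetical and not part of the original function. Unlike the original, it copies each row instead of extending it in place.

```python
def pad_and_transpose(rows):
    """Pad every row with nan up to the longest row, then return the columns as tuples.

    Hypothetical helper mirroring the final steps of Data_Block_Loader().
    """
    if not rows:
        return []
    max_length = max(len(row) for row in rows)
    # Copy each row so the caller's lists are not modified.
    padded = [list(row) + [float('nan')] * (max_length - len(row)) for row in rows]
    # zip(*padded) turns rows into columns; list() is needed in Python 3.
    return list(zip(*padded))

# Rows of uneven length: the last row has only one value.
rows = [[2.346, 3.624, 6.346, 3.645],
        [8.135, 7.234, 5.245, 1.324],
        [7.123, 2.234, 9.134, 2.234],
        [3.141]]
columns = pad_and_transpose(rows)
print(len(columns))   # 4
print(columns[1])     # (3.624, 7.234, 2.234, nan)
```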
Consider a data file called MyData.txt with the following contents:
Results of the simulation blah blah
Potentials generated with A
2.346 3.624 6.346 3.645
8.135 7.234 5.245 1.324
7.123 2.234 9.134 2.234
3.141
Potentials generated with B
and considering such and such
6.234 3.243 6.234 2.345
5.234 6.234 2.345 6.345
9.123 4.134 6.134 3.123
2.718 9.123
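To reproduce the examples that follow, the sample file can be created from Python. This is a minimal sketch that writes exactly the contents listed above to MyData.txt in the current working directory. Note that it overwrites any existing file with that name.

```python
# Contents of MyData.txt, exactly as listed above.
SAMPLE = """Results of the simulation blah blah
Potentials generated with A
2.346 3.624 6.346 3.645
8.135 7.234 5.245 1.324
7.123 2.234 9.134 2.234
3.141
Potentials generated with B
and considering such and such
6.234 3.243 6.234 2.345
5.234 6.234 2.345 6.345
9.123 4.134 6.134 3.123
2.718 9.123
"""

# Write the file to the current working directory (overwrites any existing MyData.txt).
with open('MyData.txt', 'w') as f:
    f.write(SAMPLE)
```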
In [2]:
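from Data_Block_Loader import Data_Block_Loader  # %%file only writes the module; it must be imported
myblock = Data_Block_Loader('MyData.txt','Potentials generated with A')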
myblock is now a list of tuples (each representing a column). Let's look at the contents of the second column:
In [3]:
myblock[1]
Out[3]:
The same result is obtained if the parameter keytext is set to 'with A', because keytext only needs to appear somewhere in the header line:
In [4]:
myblock = Data_Block_Loader('MyData.txt','with A')
To load the second block using a keytext that also appears in the first block (for example, 'Potentials'), we can use the parameter instance and set it to 2:
In [5]:
myblock = Data_Block_Loader('MyData.txt','Potentials',instance=2)
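In this case, the function reports that the block was not found or is empty. This is because an extra line of text lies between the line containing the keytext 'Potentials' and the data itself. That first line after the header cannot be cast to float, so loading stops immediately. In such cases, we can use the skip parameter: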
In [6]:
myblock = Data_Block_Loader('MyData.txt','Potentials',instance=2,skip=1)
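How instance and skip interact can be seen by locating, by hand, the first line that the function tries to read as data. This is a minimal sketch. The helper `find_data_start` is hypothetical, not part of Data_Block_Loader(). It assumes the same substring matching of headers, so 'Potentials' matches both headers. It returns the index of that first data line (i + 1 + skip), or None if the requested instance does not exist.

```python
def find_data_start(lines, keytext, instance=1, skip=0):
    """Return the index of the first line read as data, or None.

    Hypothetical helper: counts header lines containing keytext (substring
    match, as in Data_Block_Loader) and stops at the requested instance.
    """
    count = 0
    for i, line in enumerate(lines):
        if keytext in line:
            count += 1
            if count == instance:
                # Data reading starts right after the header, plus any skipped lines.
                return i + 1 + skip
    return None

# The lines of MyData.txt, without newline characters.
lines = ['Results of the simulation blah blah',
         'Potentials generated with A',
         '2.346 3.624 6.346 3.645',
         '8.135 7.234 5.245 1.324',
         '7.123 2.234 9.134 2.234',
         '3.141',
         'Potentials generated with B',
         'and considering such and such',
         '6.234 3.243 6.234 2.345',
         '5.234 6.234 2.345 6.345',
         '9.123 4.134 6.134 3.123',
         '2.718 9.123']

print(lines[find_data_start(lines, 'with A')])                          # 2.346 3.624 6.346 3.645
print(lines[find_data_start(lines, 'Potentials', instance=2)])          # and considering such and such
print(lines[find_data_start(lines, 'Potentials', instance=2, skip=1)])  # 6.234 3.243 6.234 2.345
```

The second call lands on a text line, which is why the block comes back empty without skip=1.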
We use the basic plot function from matplotlib.pyplot to display, for example, column 0 vs. column 3:
In [7]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(myblock[3],myblock[0],'bo',ms=10)
plt.show()