This is a simple function that loads a particular block of data from a text file, where the block is identified by some header text.

The function

Given a text file with blocks of data (each block headed by a title or comment), the function Data_Block_Loader() (see below) loads one data block from the file and returns a list of data columns (tuples). It has been tested in Python 2.7.6 and Python 3.4.1, and it uses only the standard library (no NumPy or SciPy).

For very large data sets, a NumPy implementation would be desirable to avoid the cost of Python-level loops. For normal data sets, this function should be more than enough.


In [8]:
%%file Data_Block_Loader.py
def Data_Block_Loader(fname,keytext,instance=1,skip=0):
    '''Loads a block of data from a file and returns a list of tuples,
    with each tuple corresponding to one column of the data block.
    
    Parameters
    ----------
    fname : a string representing the filename 
            (including a full path if the file is not in the working directory).

    keytext : any identification text at the head of the data block that is to be loaded.
    
    instance : (optional) If keytext appears in more than one data block header, 
                load the block in the position indicated by instance.

    skip : (optional) The number of lines to skip just after the header of the data block
            before starting to load data.
    '''
    
    #- Read the list of file lines.
    with open(fname) as f:
        lines = f.readlines()

    #- Eliminate newline and return characters, if any.
    numlines = len(lines)
    lines = [line.replace('\n','').replace('\r','') for line in lines]
    
    #- Loop until keytext is found, then load data line by line as long as the line is not empty and can be cast to float.
    rows = []
    inst = 1
    
    for i in range(numlines):
        if keytext in lines[i] and inst == instance:
            for j in range(i+1+skip,numlines):
                try:
                    #- Split the line text. Replace commas (if any) because split() won't interpret them as separators.
                    fields = lines[j].replace(',',' ').split()
                    if len(fields) == 0:
                        raise ValueError
                    rowdata = [float(field) for field in fields]
                    rows.append(rowdata)
                except ValueError: #- The data block has finished because the line is empty or cannot be cast to float.
                    break
            break
        elif keytext in lines[i]:
            inst += 1

    #- Ensure that all rows have the same number of points. If not, fill with nan (not a number)
    if len(rows) > 0:
        rows_lengths = [len(rows[i]) for i in range(len(rows))]
        max_length = max(rows_lengths)
    
        for i in range(len(rows)):
            for j in range(max_length-rows_lengths[i]):
                rows[i].append(float('nan'))
    
    #- Transpose the data so that every tuple is a column of the data block.
    block = list(zip(*rows)) #The function list() was added to make it Python 3 compatible.
    
    #- Inform and return.
    if len(block) > 0:
        print('{0} columns were loaded, each with {1} rows.'.format(len(block),len(block[0])))
        return block
    else:
        print('The block was not found or is empty.')
        return None


Overwriting Data_Block_Loader.py
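
The last two steps of the function (padding short rows with nan and transposing rows into columns) can be illustrated on their own. The following is a minimal sketch with made-up values, not data from any real file:

```python
# Stand-alone illustration of the padding and transposition steps.
# The rows below are made-up values used only for this example.
rows = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0]]            # a short last row

# Pad every row with nan up to the length of the longest row.
max_length = max(len(row) for row in rows)
padded = [row + [float('nan')] * (max_length - len(row)) for row in rows]

# zip(*padded) groups the first items of every row, then the second items, etc.
columns = list(zip(*padded))

print(len(columns))    # number of columns
print(columns[0])      # first column as a tuple
```

After padding, every row has the same length, so zip() does not silently drop the values of the longer rows.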

Example

Consider a data file called MyData.txt. The file has the following contents:

    Results of the simulation blah blah

    Potentials generated with A
    2.346   3.624   6.346   3.645
    8.135   7.234   5.245   1.324
    7.123   2.234   9.134   2.234
    3.141 

    Potentials generated with B
    and considering such and such
    6.234   3.243   6.234   2.345
    5.234   6.234   2.345   6.345
    9.123   4.134   6.134   3.123
    2.718   9.123
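
To reproduce the example, the file can be created from Python. This is only a sketch: the helper name write_sample_file is hypothetical, and the text is assumed to match the listing above with the leading indentation dropped (the loader ignores leading whitespace in any case).

```python
import os

# Assumed contents of MyData.txt, copied from the listing above.
SAMPLE_TEXT = """Results of the simulation blah blah

Potentials generated with A
2.346   3.624   6.346   3.645
8.135   7.234   5.245   1.324
7.123   2.234   9.134   2.234
3.141

Potentials generated with B
and considering such and such
6.234   3.243   6.234   2.345
5.234   6.234   2.345   6.345
9.123   4.134   6.134   3.123
2.718   9.123
"""

def write_sample_file(path='MyData.txt'):
    '''Write the sample data to path and return the absolute path.

    Hypothetical helper, used only to set up the example.
    '''
    with open(path, 'w') as f:
        f.write(SAMPLE_TEXT)
    return os.path.abspath(path)

if __name__ == '__main__':
    print(write_sample_file())
```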

Load the data

Let's load the first block into a variable called myblock:


In [2]:
myblock = Data_Block_Loader('MyData.txt','Potentials generated with A')


4 columns were loaded, each with 4 rows.

myblock is now a list of tuples (each representing a column). Let's see the contents of the second column:


In [3]:
myblock[1]


Out[3]:
(3.624, 7.234, 2.234, nan)
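
The last entry is nan because the final row of the block holds a single value, so the remaining columns are padded. If the padding gets in the way of later calculations, it can be filtered out with the standard library. A minimal sketch, with the column values typed in by hand so that it runs on its own:

```python
import math

# The second column as returned above; the last entry is the nan padding.
column = (3.624, 7.234, 2.234, float('nan'))

# nan never compares equal to anything (not even itself), so use math.isnan().
valid = [x for x in column if not math.isnan(x)]

print(valid)
print(sum(valid) / len(valid))   # mean of the valid entries
```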

The same result would be obtained if the parameter keytext is set to 'with A':


In [4]:
myblock = Data_Block_Loader('MyData.txt','with A')


4 columns were loaded, each with 4 rows.

To load the second block using a keytext that is also present in the first block (for example, 'Potentials'), we can set the parameter instance to 2:


In [5]:
myblock = Data_Block_Loader('MyData.txt','Potentials',instance=2)


The block was not found or is empty.

In this case, no data was loaded because there is an extra text line between the line containing the keytext 'Potentials' and the data itself, and the loader stops at the first line that cannot be converted to float. In such cases, we can use the skip parameter:


In [6]:
myblock = Data_Block_Loader('MyData.txt','Potentials',instance=2,skip=1)


4 columns were loaded, each with 4 rows.
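
The need for skip=1 can be checked with a small stand-alone test that mimics the condition the loader uses to decide where a block ends. The helper is_data_line below is hypothetical (it is not part of Data_Block_Loader.py) and is only a sketch of that condition:

```python
def is_data_line(text):
    '''Return True if every whitespace- or comma-separated field is a float.

    Hypothetical helper that mirrors the loader's stop condition.
    '''
    fields = text.replace(',', ' ').split()
    if not fields:
        return False          # an empty line ends the block
    try:
        for field in fields:
            float(field)
    except ValueError:
        return False          # a non-numeric field ends the block
    return True

print(is_data_line('and considering such and such'))   # text line
print(is_data_line('6.234   3.243   6.234   2.345'))   # data line
print(is_data_line('1.5, 2.5, 3.5'))                   # comma-separated data
print(is_data_line(''))                                # empty line
```

Any header line for which this check fails has to be counted in skip; otherwise the block is read as empty.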

Display the data

We use the basic plot function from matplotlib.pyplot to display, for example, column 0 (vertical axis) against column 3 (horizontal axis):


In [7]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(myblock[3],myblock[0],'bo',ms=10)
plt.show()