Identifying Hard Coded Single Precision Variables

In looking to merge MARCS with DSEP, it is immediately clear that the two codes are incompatible when it comes to passing variables. MARCS is written with single precision declarations for real variables and DSEP is written with double precision declarations. Type conversions can be carried out in several different ways, but perhaps it is worth upgrading MARCS to double precision. However, while it is simple enough to redefine real declarations to real(dp), where dp is a double precision type declaration, there are many instances of hard coded single precision variables. That is, variables defined with X.XeYYY instead of X.XdYYY. Finding such declarations is non-trivial.


In [1]:
import fileinput as fi

An initial attempt to extract all instances of hard coded single precision real variables was done using grep and a regular expression,

grep -i "[0-9].e.[0-9]" *.f > marcs_single_precision.txt

where -i flags the result to be case insensitive. Let's read in the result.


In [2]:
single_precision_vars = [line.rstrip('\n') for line in fi.input('marcs_single_precision.txt')]

Take a quick look at the first few lines,


In [3]:
print single_precision_vars[:5]


['CIAh2h.f:75:        propac=1.e-30', 'CIAh2h2.f:75:        propac=1.e-30', 'CIAh2he.f:75:        propac=1.e-30', 'CIAhhe.f:75:        propac=1.e-30', 'archiv.f:156:cc        print 215, k,log10(max(1.e-30,xmettryck(k,1))),']

Everything looks reasonble, but I know for a fact that scientific notation can also be declared in FORMAT statements. Therefore, it's necessary to remove these from the entry list.


In [4]:
single_precision_vars = [line for line in single_precision_vars if line.lower().find('format') == -1]

Make sure we haven't removed valid entries, as compared to the initial output.


In [5]:
print single_precision_vars[:5]


['CIAh2h.f:75:        propac=1.e-30', 'CIAh2h2.f:75:        propac=1.e-30', 'CIAh2he.f:75:        propac=1.e-30', 'CIAhhe.f:75:        propac=1.e-30', 'archiv.f:156:cc        print 215, k,log10(max(1.e-30,xmettryck(k,1))),']

We can now organize the data by subroutine and output the lines on which a single precision real is expcted to occur. There will be false positives, but those can be checked by eye.


In [6]:
file_stream = open('single_precision_index.txt', 'w')
routine = ''
for line in single_precision_vars:
    line = line.split(':')
    if line[0] == routine:
        file_stream.write('\t{:4s} :: {:50s}\n'.format(line[1].strip(), line[2].strip()))
    else:
        routine = line[0]
        file_stream.write('\n\n{:s}\n\n'.format(line[0].rstrip('.f').upper()))
        file_stream.write('\t{:4s} :: {:70s}\n'.format(line[1].strip(), line[2].strip()))
file_stream.close()

Markdown is also a helpful format.


In [7]:
file_stream = open('single_precision_index.md', 'w')
routine = ''
for line in single_precision_vars:
    line = line.split(':')
    if line[0] == routine:
        file_stream.write('\t{:4s} :: {:50s}\n'.format(line[1].strip(), line[2].strip()))
    else:
        routine = line[0]
        file_stream.write('\n\n# {:s}\n\n'.format(line[0].rstrip('.f').upper()))
        file_stream.write('\t{:4s} :: {:70s}\n'.format(line[1].strip(), line[2].strip()))
file_stream.close()

Changing these can be done by a single person, but would perhaps benefit from multiple editors working on a subset of the declarations.


In [8]:
print "There are {:.0f} lines that need editing.".format(len(single_precision_vars))


There are 287 lines that need editing.

Which means that, split among 3 people, it's approximately 96 lines of code per person. There are some routines that contain significantly more lines that require editing, as compared to others. We can create a histogram showing number of lines per subroutine.


In [9]:
N_lines = []
routine = ''
n = 0
for line in single_precision_vars:
    line = line.split(':')
    if line[0] == routine:
        n += 1
    else:
        # output previous routine count
        N_lines.append([routine, n])
        n = 0
        
        # start a new routine
        routine = line[0]
        n += 1
N_lines.pop(0)


Out[9]:
['', 0]

Now we should aim to sort and then distribute routines to be modified.


In [10]:
N_lines = sorted(N_lines, key = lambda routine: routine[1])

In [11]:
print N_lines
print "\n \t There are {:.0f} subroutines that need editing.".format(len(N_lines))


[['CIAh2h.f', 1], ['CIAh2h2.f', 1], ['CIAh2he.f', 1], ['CIAhhe.f', 1], ['bpl.f', 1], ['checkpartf.f', 1], ['gausi.f', 1], ['momeqcheck.f', 1], ['scale.f', 1], ['setdis.f', 1], ['oslistmo.f', 2], ['osmet_35.f', 2], ['osmet_separate_35.f', 2], ['pemake.f', 2], ['injon.f', 3], ['molfys.f', 3], ['startm.f', 3], ['inabs.f', 4], ['tranfr.f', 4], ['archiv.f', 5], ['osopac_35.f', 5], ['die_pe.f', 7], ['die_pe_lu.f', 7], ['jon.f', 7], ['osmainb.f', 8], ['eqmol_pe.f', 10], ['eqmol_pe_lu.f', 10], ['ossolve.f', 10], ['hydropacmodif.f', 32], ['detabs.f', 43], ['hlinopbpz.f', 53], ['hlinopmodif.f', 53]]

 	 There are 32 subroutines that need editing.

This is not too unreasonable for a single person, but additional checking for single precision declarations and functions would be beneficial. This requires that we also consider the total number of lines in each file that needs to be edited.


In [12]:
lines_in_routines = [line.split() for line in fi.input('marcs_routines_nlines.txt')]

In [13]:
lines_in_routines = [[x[1], int(x[0])] for x in lines_in_routines]

We now have a record for number of single precision statements to be changed and the number of lines in the entire routine.


In [14]:
total_lines = []
for routine in lines_in_routines:
    n_edits = [x[1] for x in N_lines if x[0] == routine[0]]
    if n_edits == []:
        n_edits = 0
    else:
        n_edits = int(n_edits[0])
    total_lines.append([routine[0], routine[1] + n_edits])

In [15]:
total_lines = sorted(total_lines, key = lambda routine: routine[1])

We've now combined the total number of lines in each file with the number of edits to fixed single precision constants that need to be made. Of course, there are instances where function calls need to be verified, but one expects this is somewhat proportional to the number of lines in the entire file.


In [16]:
print "\t There are {:.0f} subroutines that need checking.".format(len(total_lines))


	 There are 77 subroutines that need checking.

If divided equally, that means each of 3 people will need to check 26 separate subroutines. However, these should we weighted so that each person searches roughly the same number of routines and the same number of lines.


In [18]:
person_1 = []
person_2 = []
person_3 = []
person_4 = []
for i, routine in enumerate(total_lines):
    if i % 4 == 0:
        person_1.append(routine)
    elif i % 4 == 1:
        person_2.append(routine)
    elif i % 4 == 2: 
        person_3.append(routine)
    elif i % 4 == 3:
        person_4.append(routine)
    else:
        print "Whoops, misplaced", routine

Now, let's confirm that everything looks about equal.


In [19]:
print len(person_1), len(person_2), len(person_3), len(person_4)


20 19 19 19

and the number of lines of code


In [20]:
for person in [person_1, person_2, person_3, person_4]:
    print sum([x[1] for x in person])


6086
4167
4646
4853

So that didn't work out quite as well as I had anticipated. Certainly person 2 will have considerably more work cut out for them, of order 1400 lines of code.

Looking at the routines in each set, it's possible to even up the 1st and 3rd sets in terms of lines of code by switching one entry with approx. 150 lines of code.


In [24]:
print person_1
print
print person_2
print
print person_3
print
print person_4


[['kap5.f', 13], ['newsta.f', 15], ['zerof.f', 19], ['lubksb.f', 30], ['qas.f', 49], ['setdis.f', 60], ['ludcmp.f', 66], ['CIAh2h.f', 81], ['osdata_35.f', 81], ['scale.f', 99], ['trrays_pr.f', 127], ['spline.f', 161], ['osmet_35.f', 213], ['startm.f', 258], ['osmainb.f', 291], ['archiv.f', 384], ['marcs35sph.f', 489], ['oslistmo.f', 618], ['die_pe.f', 754], ['ossolve.f', 2278]]

[['problemstop.f', 13], ['lenstr.f', 18], ['dumin.f', 20], ['clock.f', 31], ['checkpartf.f', 50], ['gausi.f', 61], ['pemake.f', 72], ['CIAh2h2.f', 81], ['readabund.f', 96], ['algebn.f', 105], ['modjon.f', 135], ['onfrom.f', 169], ['takemolec.f', 216], ['makeabund.f', 268], ['oscontmol.f', 296], ['hydropacmodif.f', 389], ['injon.f', 498], ['die_pe_lu.f', 691], ['osopac_35.f', 958]]

[['initjn.f', 14], ['osopacr.f', 19], ['bpl.f', 29], ['four.f', 35], ['momeqcheck.f', 54], ['termo.f', 65], ['matinv.f', 73], ['CIAh2he.f', 81], ['tausca.f', 97], ['trrays.f', 106], ['osmet_separate_35.f', 151], ['tabs.f', 185], ['traneq.f', 243], ['osinit.f', 268], ['oldsta.f', 302], ['tranfr.f', 435], ['inabs.f', 548], ['eqmol_pe.f', 744], ['hlinopmodif.f', 1197]]

[['vvmlt.f', 14], ['rossop.f', 19], ['mult.f', 29], ['molfys.f', 43], ['qtrav.f', 57], ['termon.f', 65], ['vaagl.f', 74], ['CIAhhe.f', 81], ['transc.f', 98], ['partf.f', 122], ['molecpartf.f', 153], ['funcv.f', 200], ['oldsta_test.f', 253], ['osabsko.f', 277], ['tryck.f', 344], ['jon.f', 456], ['detabs.f', 587], ['eqmol_pe_lu.f', 747], ['hlinopbpz.f', 1234]]

In [26]:
person_2.append(person_4.pop(-5))

Now, trying this again,


In [27]:
for person in [person_1, person_2, person_3, person_4]:
    print sum([x[1] for x in person])


6086
4511
4646
4509

Much better!

From these lists, we can create simple markdown and text file to-do lists for each person.


In [33]:
for i, person in enumerate([person_1, person_2, person_3, person_4]):
    text_stream = open('Person_{:02.0f}.txt'.format(i + 1), 'w')
    mark_stream = open('Person_{:02.0f}.md'.format(i + 1), 'w')
    for routine in person:
        s = '{:20s}  ::  {:4.0f} lines \n'.format(routine[0], routine[1])
        text_stream.write(s)
        mark_stream.write('### ' + s)
    text_stream.close()
    mark_stream.close()

In [ ]: