Using Bioinformatic Algorithms to Analyze the Politics of Form in Modernist Urdu Poetry

A. Sean Pue, Michigan State University, pue@msu.edu, @seanpue

HASTAC 2015, 28 May 2015


In [1]:
import settings
import re
from scanner import Scanner

Here, we are dealing with Miraji's poem "Nāg sabhā kā nāch". First, let's transliterate it.


In [3]:
from IPython.display import Image
Image('nagsabha.png')


Out[3]:

Here is the original Urdu text:


In [4]:
original_poem = """ناگ راج سے، ناگ راج سے ملنے جاؤں آج، 
ناگ راج ساگر میں بیٹھے سر پر پہنے تاج، 
ناگ راج کی سبھا جمی ہے خوشبوئیں لہرائیں، 
بہتی، رُکتی، اُلجھتی جاتی، من کو مست بنائیں، 
چندرماں کی کرنیں آئیں بل کھائیں۔۔۔ بل کھائیں، 
ننّھے ننّھے، ہلکے ہلکے، میٹھے گیت سنائیں، 
گاتے گاتے تھکتی جائیں، سوئیں سُکھ کی نِنید، 
(ناگ سبھا میں) ہلکی ہلکی، میٹھی میٹھی نیند، 
کچھ گھڑیاں یوں بِیتیں اور پھر سنکھ بجائیں ناگ، 
وحشی اور بے باک، انوکھے نشّے لائیں ناگ، 
سوئی کرنیں جاگ اُٹھیں اور ناچیں سندر ناچ، 
دیواداسی یاد آجائے، ہاں۔۔۔ اور مندر۔۔۔ ناچ، 
ناگ سبھا کے ناچ انوکھے، سارا ساگر۔۔۔ ناچ، 
میرا من بھی بنتا جائے دیکھ دیکھ کر۔۔۔ ناچ، 
"""

Let's assume we've magically transliterated it below.


In [5]:
poem="""naag raaj se naag raaj se milne jaa))uu;n aaj 
naag raaj saagar me;n bai;the sar par pahne taaj 
naag raaj kii sabhaa jamii hai ;xvush buu))e;n lahraa))e;n 
bahtii ruktii ulajhtii jaatii man ko mast banaa))e;n 
chandar-maa;n kii kirne;n aa))e;n bal khaa))e;n bal khaa))e;n 
nanhe nanhe halke halke mii;the giit sunaa))e;n 
gaate gaate thaktii jaa))e;n so))e;n sukh kii nii;nd
naag sabhaa me;n halkii halkii mii;thii mii;thii nii;nd 
kuchh gha;riyaa;n yuu;n biite;n aur phir sankh bajaa))e;n naag 
va;hshii aur be-baak anokhe nashshe laa))e;n naag 
so))ii kirne;n jaag u;the;n aur naache;n sundar naach
devaadaasii yaad aa jaa))e haa;n aur mandir naach
naag sabhaa ke naach anokhe saaraa saagar naach
meraa man bhii bantaa jaa))e dekh dekh kar naach"""
lines = poem.split('\n')
lines = [' '+l for l in lines]
#lines

Now, let's load the metrical parser and scan each line for all of its possible scansions.


In [6]:
s = Scanner()

line_scans=[]
for line in lines:
    y = s.scan(line,known_only=False)
    line_scans.append(y)

Each scan contains the results of the original parse into metrical building blocks (b = word break, c = consonant, v = long vowel, s = short vowel).


In [7]:
test_id=0
test = line_scans[test_id] # keys are ['results', 'tkns', 'orig_parse']
print lines[test_id],'==>',''.join(test['tkns'])


 naag raaj se naag raaj se milne jaa))uu;n aaj  ==> bcvcbcvcbcvbcvcbcvcbcvbcsccvbcvcvnbvcb
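
The token string is easier to read once it is split at each word break. The following is a minimal sketch, not part of the scanner: `split_words` is a hypothetical helper, and the `n` token that appears in the output is left unglossed because the legend above does not define it.

```python
# Legend taken from the text above; 'n' also occurs in the output
# but is not glossed there, so it is deliberately left out.
TOKEN_NAMES = {'b': 'word break', 'c': 'consonant',
               'v': 'long vowel', 's': 'short vowel'}

def split_words(token_string):
    """Split a token string into one token group per word (hypothetical helper)."""
    return [group for group in token_string.split('b') if group]

# Token string for line 1, copied from the output above.
tokens = 'bcvcbcvcbcvbcvcbcvcbcvbcsccvbcvcvnbvcb'
words = split_words(tokens)
```

For line 1 this gives nine groups, one per transliterated word, starting with `cvc` for "naag".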

The results of a scan store the details of the matches as well as the scan itself (where a long syllable is '=' and a short one is '-').


In [8]:
test_results = test['results'][0] # keys are ['matches', 'index', 'meter_string', 'scan']
print 'the first scan of',len(test['results']),'is',test_results['scan']


the first scan of 16 is =-=-==-=-======-

In [9]:
print 'Details of the first few matches are:\n'
test_results['matches'][0:3]


Details of the first few matches are:

Out[9]:
[{'meter_string': '=',
  'rule': {'production': 'l_bcv', 'tokens': ['b', 'c', 'v']},
  'rule_id': 33,
  'start': 0,
  'tokens': ['b', 'c', 'v']},
 {'meter_string': '-',
  'rule': {'production': 's_c', 'tokens': ['c']},
  'rule_id': 49,
  'start': 3,
  'tokens': ['c']},
 {'meter_string': '=',
  'rule': {'production': 'l_bcv', 'tokens': ['b', 'c', 'v']},
  'rule_id': 33,
  'start': 4,
  'tokens': ['b', 'c', 'v']}]
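
The match records can be read as a tiling of the token string. The sketch below works only from the three matches printed above (with the nested `rule` entry trimmed) and assumes, without confirmation from the scanner's source, that a scan is the concatenation of each match's `meter_string`. The helper names `scan_prefix` and `is_contiguous` are hypothetical.

```python
# The first three matches, copied from Out[9] above ('rule' entries trimmed).
matches = [
    {'meter_string': '=', 'rule_id': 33, 'start': 0, 'tokens': ['b', 'c', 'v']},
    {'meter_string': '-', 'rule_id': 49, 'start': 3, 'tokens': ['c']},
    {'meter_string': '=', 'rule_id': 33, 'start': 4, 'tokens': ['b', 'c', 'v']},
]

def scan_prefix(matches):
    """Join the meter strings of consecutive matches (assumed to form the scan)."""
    return ''.join(m['meter_string'] for m in matches)

def is_contiguous(matches):
    """Check that each match starts where the previous one's tokens end."""
    return all(a['start'] + len(a['tokens']) == b['start']
               for a, b in zip(matches, matches[1:]))

prefix = scan_prefix(matches)
contiguous = is_contiguous(matches)
```

Here `prefix` is `'=-='`, which agrees with the first three characters of the scan printed for this line.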

There are quite a number of possibilities without any constraints.

The possibilities for the first line are...


In [10]:
scan_results = [ result['scan'] for result in test['results']]
print "There are ",len(scan_results),"matches"
scan_results


There are  16 matches
Out[10]:
['=-=-==-=-======-',
 '=-=-==-=-====-=-',
 '=-=-==-=-==-===-',
 '=-=-==-=-==-=-=-',
 '=-=-==-=--=====-',
 '=-=-==-=--===-=-',
 '=-=-==-=--=-===-',
 '=-=-==-=--=-=-=-',
 '=-=--=-=-======-',
 '=-=--=-=-====-=-',
 '=-=--=-=-==-===-',
 '=-=--=-=-==-=-=-',
 '=-=--=-=--=====-',
 '=-=--=-=--===-=-',
 '=-=--=-=--=-===-',
 '=-=--=-=--=-=-=-']

The number of possibilities for each line is...


In [11]:
print 'The # of possibilities for all of the lines are...'

all_scan_results= [ [result['scan'] for result in scan['results']] for scan in line_scans]
for i,ls in enumerate(all_scan_results):
    print '  line',i+1,'has',len(ls)#,lines[i]


The # of possibilities for all of the lines are...
  line 1 has 16
  line 2 has 8
  line 3 has 64
  line 4 has 64
  line 5 has 64
  line 6 has 64
  line 7 has 64
  line 8 has 64
  line 9 has 32
  line 10 has 128
  line 11 has 64
  line 12 has 48
  line 13 has 32
  line 14 has 16

Is there any meter in common across all the lines?

Let's check the line with the fewest possible scans against the others.


In [12]:
fewest, fewest_idx = min((len(val), idx) for (idx, val) in enumerate(all_scan_results)) 
print "Trying line",fewest_idx+1,"which has only",fewest,"possible scans"


Trying line 2 which has only 8 possible scans

In [13]:
fewest_results = all_scan_results[fewest_idx]

for i,ls in enumerate(all_scan_results):
    if i == fewest_idx: continue
    matched = set(ls) & set(fewest_results)
    if matched: print 'It only matched with line',i+1, 'which contains the',matched


It only matched with line 10 which contains the set(['=-=-==-=======-', '=-=-==-=-===-=-', '=-=-==-=====-=-', '=-=-==-=-=====-'])
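
The pairwise check above can be generalised to ask whether any single scan string is shared by every line. This is a minimal sketch with a hypothetical helper, `common_scans`, run on made-up toy patterns rather than the poem's data.

```python
def common_scans(scans_per_line):
    """Return the scans shared by every line (an empty set if there are none)."""
    scan_sets = [set(scans) for scans in scans_per_line]
    if not scan_sets:
        return set()
    return set.intersection(*scan_sets)

# Made-up three-syllable patterns, purely for illustration (not from the poem).
toy = [['=-=', '==-'], ['==-', '---'], ['==-', '=-=']]
shared = common_scans(toy)
```

Since line 2 overlaps only with line 10, the same intersection over the poem's fourteen lines would be empty, which suggests that comparing raw scan strings is too strict a test for a shared meter.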

So what's the meter?


In [14]:
save_scan = [] # list for each line, holding list of scan ids
matching_scan_line = [] # best choice (used later)

for i,sr in enumerate(all_scan_results):
    fewest_short_clusters = 99
    save_scan.append([])
    matching_scan_line.append('None')
    print '\n***** Line',(i+1),lines[i]
    for si,l in enumerate(sr):
        count = 0
        counts = []
     
        for c in l:
            if c =='=': 
                count+=2
            else: 
                count+=1
            counts.append(count)
    
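        # keep scans that total 27 counts, have a syllable boundary at count 16,
        # contain no run of three shorts, and end on a short syllable
        if count==27 and 16 in counts and '---' not in l and l[-1]=='-':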
            tkns = line_scans[i]['tkns']
            matches = line_scans[i]['results'][si]['matches']
            part_two = counts.index(16)
            x = [''.join(m['tokens']) for m in matches]
            part_one_tkns = ''.join(x[0:part_two+1])
            part_two_tkns = ''.join(x[part_two+1:])
            
            # check that word break starts second part of line
            if not part_two_tkns[0]=='b':
                continue
            num_short_clusters = len(re.findall('[-]+',l))
            if num_short_clusters < fewest_short_clusters:
                fewest_short_clusters = num_short_clusters
                matching_scan_line[-1]=l
            print l,'scan #', si, num_short_clusters
            print part_one_tkns,'/',part_two_tkns
            
            save_scan[-1].append(si)


***** Line 1  naag raaj se naag raaj se milne jaa))uu;n aaj 
=-=-==-=-======- scan # 0 5
bcvcbcvcbcvbcvcbcvcbcv / bcsccvbcvcvnbvc

***** Line 2  naag raaj saagar me;n bai;the sar par pahne taaj 
=-=-==========- scan # 0 3
bcvcbcvcbcvcscbcvnbcvcv / bcscbcscbcsccvbcvc

***** Line 3  naag raaj kii sabhaa jamii hai ;xvush buu))e;n lahraa))e;n 
=-=-=-=-=======- scan # 0 5
bcvcbcvcbcvbcscvbcscvbcv / bc<;xv>scbcvcvnbcsccvcvn

***** Line 4  bahtii ruktii ulajhtii jaatii man ko mast banaa))e;n 
====-===-===--=- scan # 4 4
bcsccvbcsccvbscsccvbcvcv / bcscbcvbcsccbcscvcvn
====-=-=====--=- scan # 8 4
bcsccvbcsccvbscsccvbcvcv / bcscbcvbcsccbcscvcvn
===--=======--=- scan # 16 3
bcsccvbcsccvbscsccvbcvcv / bcscbcvbcsccbcscvcvn
=-==-=======--=- scan # 32 4
bcsccvbcsccvbscsccvbcvcv / bcscbcvbcsccbcscvcvn

***** Line 5  chandar-maa;n kii kirne;n aa))e;n bal khaa))e;n bal khaa))e;n 
=============- scan # 0 1
bcsccscbcvnbcvbcsccvnbvcvn / bcscbcvcvnbcscbcvcvn

***** Line 6  nanhe nanhe halke halke mii;the giit sunaa))e;n 
===========--=- scan # 0 2
bcsccvbcsccvbcsccvbcsccv / bcvcvbcvcbcscvcvn

***** Line 7  gaate gaate thaktii jaa))e;n so))e;n sukh kii nii;nd
=============- scan # 0 1
bcvcvbcvcvbcsccvbcvcvn / bcvcvnbcscbcvbcvnc

***** Line 8  naag sabhaa me;n halkii halkii mii;thii mii;thii nii;nd 
=--===========- scan # 0 2
bcvcbcscvbcvnbcsccvbcsccv / bcvcvbcvcvbcvnc

***** Line 9  kuchh gha;riyaa;n yuu;n biite;n aur phir sankh bajaa))e;n naag 
=--=======--===- scan # 2 3
bcscbcscscvnbcvnbcvcvnbv<aur>cbcsc / bcsccbcscvcvnbcvc
=--===-=-==--===- scan # 4 5
bcscbcscscvnbcvnbcvcvnbv<aur>cbcsc / bcsccbcscvcvnbcvc
=--=-===-==--===- scan # 8 5
bcscbcscscvnbcvnbcvcvnbv<aur>cbcsc / bcsccbcscvcvnbcvc

***** Line 10  va;hshii aur be-baak anokhe nashshe laa))e;n naag 
===-==--=-=====- scan # 4 4
bcsccvbv<aur>cbcvbcvcbscvcv / bcsccvbcvcvnbcvc
===-==-=======- scan # 8 3
bcsccvbv<aur>cbcvbcvcbscvcv / bcsccvbcvcvnbcvc
===--=--=======- scan # 16 3
bcsccvbv<aur>cbcvbcvcbscvcv / bcsccvbcvcvnbcvc
=====--=======- scan # 32 2
bcsccvbv<aur>cbcvbcvcbscvcv / bcsccvbcvcvnbcvc
=-=-==--=======- scan # 64 4
bcsccvbv<aur>cbcvbcvcbscvcv / bcsccvbcvcvnbcvc

***** Line 11  so))ii kirne;n jaag u;the;n aur naache;n sundar naach
=====--=======- scan # 2 2
bcvcvbcsccvnbcvcbscvnbv<aur>c / bcvcvnbcsccscbcvc
=====-==-=====- scan # 8 3
bcvcvbcsccvnbcvcbscvnbv<aur>c / bcvcvnbcsccscbcvc
===-=--==-=====- scan # 16 4
bcvcvbcsccvnbcvcbscvnbv<aur>c / bcvcvnbcsccscbcvc
=-===--==-=====- scan # 32 4
bcvcvbcsccvnbcvcbscvnbv<aur>c / bcvcvnbcsccscbcvc

***** Line 12  devaadaasii yaad aa jaa))e haa;n aur mandir naach
=============- scan # 1 1
bcvcvcvcvbcvcbvbcvcv / bcvnbv<aur>cbcsccscbcvc
========-=-===- scan # 2 3
bcvcvcvcvbcvcbvbcvcv / bcvnbv<aur>cbcsccscbcvc
=====-==-=====- scan # 13 3
bcvcvcvcvbcvcbvbcvcv / bcvnbv<aur>cbcsccscbcvc
=====-==--=-===- scan # 14 4
bcvcvcvcvbcvcbvbcvcv / bcvnbv<aur>cbcsccscbcvc
=====--=======- scan # 17 2
bcvcvcvcvbcvcbvbcvcv / bcvnbv<aur>cbcsccscbcvc
=====--==-=-===- scan # 18 4
bcvcvcvcvbcvcbvbcvcv / bcvnbv<aur>cbcsccscbcvc
===-=-========- scan # 33 3
bcvcvcvcvbcvcbvbcvcv / bcvnbv<aur>cbcsccscbcvc
===-=-===-=-===- scan # 34 5
bcvcvcvcvbcvcbvbcvcv / bcvnbv<aur>cbcsccscbcvc

***** Line 13  naag sabhaa ke naach anokhe saaraa saagar naach
=--===--=======- scan # 0 3
bcvcbcscvbcvbcvcbscvcv / bcvcvbcvcscbcvc

***** Line 14  meraa man bhii bantaa jaa))e dekh dekh kar naach
=========-=-==- scan # 0 3
bcvcvbcscbcvbcsccvbcvcv / bcvcbcvcbcscbcvc

Answer: An Urdu metrical adaptation of a Braj pada in sarasī meter (16+11 mātrās), where = (long) counts as 2 and - (short) counts as 1.
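
The counting behind this answer can be shown on its own. The following sketch covers only the weight test (27 counts in total, with a syllable boundary falling exactly at count 16). It leaves out the word-break and short-run checks that the notebook cell also applies, and the helper names are hypothetical.

```python
def running_counts(scan):
    """Running totals for a scan string: '=' adds 2, '-' adds 1."""
    total = 0
    running = []
    for syllable in scan:
        total += 2 if syllable == '=' else 1
        running.append(total)
    return running

def fits_16_plus_11(scan):
    """True if the scan totals 27 counts and a syllable ends exactly at count 16."""
    running = running_counts(scan)
    return bool(running) and running[-1] == 27 and 16 in running

# Two of the sixteen candidate scans listed for line 1 above.
first = fits_16_plus_11('=-=-==-=-======-')
second = fits_16_plus_11('=-=-==-=-====-=-')
```

The first candidate passes (its tenth syllable ends at count 16 and the total is 27), while the second totals only 26 and is rejected.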

Limiting the results to scans with a total count of 27 and a word break immediately after the 16th count, we get...


In [16]:
save_scan


Out[16]:
[[0],
 [0],
 [0],
 [4, 8, 16, 32],
 [0],
 [0],
 [0],
 [0],
 [2, 4, 8],
 [4, 8, 16, 32, 64],
 [2, 8, 16, 32],
 [1, 2, 13, 14, 17, 18, 33, 34],
 [0],
 [0]]

How do we choose the best ones? Let's try to find the ones with the smallest number of clusters of short syllables (runs of - or --).


In [17]:
for i,sis in enumerate(save_scan):
    print "Line ",(i+1),": ",lines[i],matching_scan_line[i]
for i,sis in enumerate(save_scan):
    print matching_scan_line[i]


Line  1 :   naag raaj se naag raaj se milne jaa))uu;n aaj  =-=-==-=-======-
Line  2 :   naag raaj saagar me;n bai;the sar par pahne taaj  =-=-==========-
Line  3 :   naag raaj kii sabhaa jamii hai ;xvush buu))e;n lahraa))e;n  =-=-=-=-=======-
Line  4 :   bahtii ruktii ulajhtii jaatii man ko mast banaa))e;n  ===--=======--=-
Line  5 :   chandar-maa;n kii kirne;n aa))e;n bal khaa))e;n bal khaa))e;n  =============-
Line  6 :   nanhe nanhe halke halke mii;the giit sunaa))e;n  ===========--=-
Line  7 :   gaate gaate thaktii jaa))e;n so))e;n sukh kii nii;nd =============-
Line  8 :   naag sabhaa me;n halkii halkii mii;thii mii;thii nii;nd  =--===========-
Line  9 :   kuchh gha;riyaa;n yuu;n biite;n aur phir sankh bajaa))e;n naag  =--=======--===-
Line  10 :   va;hshii aur be-baak anokhe nashshe laa))e;n naag  =====--=======-
Line  11 :   so))ii kirne;n jaag u;the;n aur naache;n sundar naach =====--=======-
Line  12 :   devaadaasii yaad aa jaa))e haa;n aur mandir naach =============-
Line  13 :   naag sabhaa ke naach anokhe saaraa saagar naach =--===--=======-
Line  14 :   meraa man bhii bantaa jaa))e dekh dekh kar naach =========-=-==-
=-=-==-=-======-
=-=-==========-
=-=-=-=-=======-
===--=======--=-
=============-
===========--=-
=============-
=--===========-
=--=======--===-
=====--=======-
=====--=======-
=============-
=--===--=======-
=========-=-==-
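
The selection rule used above, fewest clusters of short syllables, can be sketched in isolation. The helper names are hypothetical, and ties go to the earliest candidate, which is the same outcome as the strict `<` comparison in the notebook cell.

```python
import re

def short_clusters(scan):
    """Count the runs of one or more '-' in a scan string."""
    return len(re.findall(r'-+', scan))

def pick_best(scans):
    """Return the scan with the fewest short clusters (first one wins ties)."""
    return min(scans, key=short_clusters)

# The four candidate scans that survived filtering for line 4, copied from above.
line_4 = ['====-===-===--=-', '====-=-=====--=-',
          '===--=======--=-', '=-==-=======--=-']
best = pick_best(line_4)
```

For line 4 this returns `'===--=======--=-'`, the same scan that appears in the listing above.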

And now we have a model of what the poem sounds like. What now?

1. The possibility of correlating this model with actual performance.

2. Discerning patterns in sound, alongside the semantic level, including across languages and in relation to music.

3. Mapping on a macro level the evolution of poetic forms, including free verse, across languages.

4. Cultural heritage preservation, making annotated texts and performances available and open to new discoveries, both scholarly and creative.

