Data loading, storage, and file formats

Python has become a beloved language for text and file munging due to its simple syntax for interacting with files, intuitive data structures, and convenient features like tuple packing and unpacking. pandas features a number of functions for reading tabular data as a DataFrame object, though read_csv and read_table are likely the ones you’ll use the most.


In [30]:
from __future__ import division
from numpy.random import randn
import numpy as np
import os
import sys
import matplotlib.pyplot as plt
np.random.seed(12345)
plt.rc('figure', figsize=(10, 6))
from pandas import Series, DataFrame
import pandas as pd
np.set_printoptions(precision=4)

In [2]:
%pwd


Out[2]:
u'/Users/pmui/datascience/lecture03.more.wrangling'

Reading and Writing Data in Text Format

Type inference is one of the more important features of these functions; that means you don’t have to specify which columns are numeric, integer, boolean, or string. Handling dates and other custom types requires a bit more effort, though. Let’s start with a small comma-separated (CSV) text file:


In [3]:
!cat ch06/ex1.csv


a,b,c,d,message
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo

In [31]:
df = pd.read_csv('ch06/ex1.csv')
df


Out[31]:
a b c d message
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo
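
The type inference mentioned above is easy to check: inspect the resulting dtypes. For this file, the four numeric columns come back as integers while message stays as strings (object dtype):

In [ ]:
df.dtypes   # a, b, c, d -> int64; message -> object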

In [34]:
!cat ch06/test.csv


message,a,b,c,d
hello,1,2,3,4
world,5,6,7,8

In [33]:
dfx = pd.read_csv('ch06/test.csv')
dfx


Out[33]:
message a b c d
0 hello 1 2 3 4
1 world 5 6 7 8

In [5]:
pd.read_table('ch06/ex1.csv', sep=',')


Out[5]:
a b c d message
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo

A file will not always have a header row. Consider this file:


In [6]:
!cat ch06/ex2.csv


1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo

To read this in, you have a couple of options. You can allow pandas to assign default column names, or you can specify names yourself:


In [36]:
pd.read_csv('ch06/ex2.csv', names=['a', 'b', 'c', 'd', 'message'])


Out[36]:
a b c d message
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo

In [8]:
pd.read_csv('ch06/ex2.csv', header=None)


Out[8]:
0 1 2 3 4
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo

Suppose you wanted the message column to be the index of the returned DataFrame. You can either indicate you want the column at index 4 or named 'message' using the index_col argument:


In [37]:
names = ['a', 'b', 'c', 'd', 'message']
pd.read_csv('ch06/ex2.csv', names=names, index_col='message')


Out[37]:
a b c d
message
hello 1 2 3 4
world 5 6 7 8
foo 9 10 11 12

In the event that you want to form a hierarchical index from multiple columns, just pass a list of column numbers or names:


In [10]:
!cat ch06/csv_mindex.csv
parsed = pd.read_csv('ch06/csv_mindex.csv', index_col=['key1', 'key2'])
parsed


key1,key2,value1,value2
one,a,1,2
one,b,3,4
one,c,5,6
one,d,7,8
two,a,9,10
two,b,11,12
two,c,13,14
two,d,15,16
Out[10]:
value1 value2
key1 key2
one a 1 2
b 3 4
c 5 6
d 7 8
two a 9 10
b 11 12
c 13 14
d 15 16

In some cases, a table might not have a fixed delimiter, using whitespace or some other pattern to separate fields. In these cases, you can pass a regular expression as a delimiter for read_table. Consider a text file that looks like this:


In [11]:
list(open('ch06/ex3.txt'))


Out[11]:
['            A         B         C\n',
 'aaa -0.264438 -1.026059 -0.619500\n',
 'bbb  0.927272  0.302904 -0.032399\n',
 'ccc -0.264273 -0.386314 -0.217601\n',
 'ddd -0.871858 -0.348382  1.100491\n']

While you could do some munging by hand, in this case fields are separated by a variable amount of whitespace. This can be expressed by the regular expression \s+, so we have:


In [12]:
result = pd.read_table('ch06/ex3.txt', sep='\s+')
result


Out[12]:
A B C
aaa -0.264438 -1.026059 -0.619500
bbb 0.927272 0.302904 -0.032399
ccc -0.264273 -0.386314 -0.217601
ddd -0.871858 -0.348382 1.100491

Because there was one fewer column name than the number of data columns, read_table infers that the first column should be the DataFrame’s index in this special case.

The parser functions have many additional arguments to help you handle the wide variety of exceptional file formats that occur. For example, you can skip the first, third, and fourth rows of a file with skiprows:


In [13]:
!cat ch06/ex4.csv
pd.read_csv('ch06/ex4.csv', skiprows=[0, 2, 3])


# hey!
a,b,c,d,message
# just wanted to make things more difficult for you
# who reads CSV files with computers, anyway?
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
Out[13]:
a b c d message
0 1 2 3 4 hello
1 5 6 7 8 world
2 9 10 11 12 foo
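
Since the skipped lines here are all comments, the comment option is an equivalent alternative in this particular case: any line beginning with the given character is dropped before parsing.

In [ ]:
pd.read_csv('ch06/ex4.csv', comment='#')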

Handling missing values is an important and frequently nuanced part of the file parsing process. Missing data is usually either not present (empty string) or marked by some sentinel value. By default, pandas uses a set of commonly occurring sentinels, such as NA, -1.#IND, and NULL:


In [14]:
!cat ch06/ex5.csv
result = pd.read_csv('ch06/ex5.csv')
result
pd.isnull(result)


something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo
Out[14]:
something a b c d message
0 False False False False False True
1 False False False True False False
2 False False False False False False

The na_values option can take either a list or set of strings to consider missing values:


In [15]:
result = pd.read_csv('ch06/ex5.csv', na_values=['NULL'])
result


Out[15]:
something a b c d message
0 one 1 2 3.0 4 NaN
1 two 5 6 NaN 8 world
2 three 9 10 11.0 12 foo

Different NA sentinels can be specified for each column in a dict:


In [16]:
sentinels = {'message': ['foo', 'NA'], 'something': ['two']}
pd.read_csv('ch06/ex5.csv', na_values=sentinels)


Out[16]:
something a b c d message
0 one 1 2 3.0 4 NaN
1 NaN 5 6 NaN 8 world
2 three 9 10 11.0 12 NaN
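
If you would rather pandas ignore its built-in sentinel list entirely and trust only yours, switch off keep_default_na; a sketch using the same file (empty fields then come through as empty strings rather than NaN):

In [ ]:
# only the strings in na_values are treated as missing now
pd.read_csv('ch06/ex5.csv', keep_default_na=False, na_values=['NA'])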

Reading text files in pieces

When processing very large files, or while figuring out the right set of arguments to correctly process a large file, you may want to read in only a small piece of the file or iterate through it in smaller chunks.


In [38]:
result = pd.read_csv('ch06/ex6.csv')
result


Out[38]:
one two three four key
0 0.467976 -0.038649 -0.295344 -1.824726 L
1 -0.358893 1.404453 0.704965 -0.200638 B
2 -0.501840 0.659254 -0.421691 -0.057688 G
3 0.204886 1.074134 1.388361 -0.982404 R
4 0.354628 -0.133116 0.283763 -0.837063 Q
5 1.817480 0.742273 0.419395 -2.251035 Q
6 -0.776764 0.935518 -0.332872 -1.875641 U
7 -0.913135 1.530624 -0.572657 0.477252 K
8 0.358480 -0.497572 -0.367016 0.507702 S
9 -1.740877 -1.160417 -1.637830 2.172201 G
10 0.240564 -0.328249 1.252155 1.072796 8
11 0.764018 1.165476 -0.639544 1.495258 R
12 0.571035 -0.310537 0.582437 -0.298765 1
13 2.317658 0.430710 -1.334216 0.199679 P
14 1.547771 -1.119753 -2.277634 0.329586 J
15 -1.310608 0.401719 -1.000987 1.156708 E
16 -0.088496 0.634712 0.153324 0.415335 B
17 -0.018663 -0.247487 -1.446522 0.750938 A
18 -0.070127 -1.579097 0.120892 0.671432 F
19 -0.194678 -0.492039 2.359605 0.319810 H
20 -0.248618 0.868707 -0.492226 -0.717959 W
21 -1.091549 -0.867110 -0.647760 -0.832562 C
22 0.641404 -0.138822 -0.621963 -0.284839 C
23 1.216408 0.992687 0.165162 -0.069619 V
24 -0.564474 0.792832 0.747053 0.571675 I
25 1.759879 -0.515666 -0.230481 1.362317 S
26 0.126266 0.309281 0.382820 -0.239199 L
27 1.334360 -0.100152 -0.840731 -0.643967 6
28 -0.737620 0.278087 -0.053235 -0.950972 J
29 -1.148486 -0.986292 -0.144963 0.124362 Y
... ... ... ... ... ...
9970 0.633495 -0.186524 0.927627 0.143164 4
9971 0.308636 -0.112857 0.762842 -1.072977 1
9972 -1.627051 -0.978151 0.154745 -1.229037 Z
9973 0.314847 0.097989 0.199608 0.955193 P
9974 1.666907 0.992005 0.496128 -0.686391 S
9975 0.010603 0.708540 -1.258711 0.226541 K
9976 0.118693 -0.714455 -0.501342 -0.254764 K
9977 0.302616 -2.011527 -0.628085 0.768827 H
9978 -0.098572 1.769086 -0.215027 -0.053076 A
9979 -0.019058 1.964994 0.738538 -0.883776 F
9980 -0.595349 0.001781 -1.423355 -1.458477 M
9981 1.392170 -1.396560 -1.425306 -0.847535 H
9982 -0.896029 -0.152287 1.924483 0.365184 6
9983 -2.274642 -0.901874 1.500352 0.996541 N
9984 -0.301898 1.019906 1.102160 2.624526 I
9985 -2.548389 -0.585374 1.496201 -0.718815 D
9986 -0.064588 0.759292 -1.568415 -0.420933 E
9987 -0.143365 -1.111760 -1.815581 0.435274 2
9988 -0.070412 -1.055921 0.338017 -0.440763 X
9989 0.649148 0.994273 -1.384227 0.485120 Q
9990 -0.370769 0.404356 -1.051628 -1.050899 8
9991 -0.409980 0.155627 -0.818990 1.277350 W
9992 0.301214 -1.111203 0.668258 0.671922 A
9993 1.821117 0.416445 0.173874 0.505118 X
9994 0.068804 1.322759 0.802346 0.223618 H
9995 2.311896 -0.417070 -1.409599 -0.515821 L
9996 -0.479893 -0.650419 0.745152 -0.646038 E
9997 0.523331 0.787112 0.486066 1.093156 K
9998 -0.362559 0.598894 -1.843201 0.887292 G
9999 -0.096376 -1.012999 -0.657431 -0.573315 0

10000 rows × 5 columns

If you want to read only a small number of rows (avoiding reading the entire file), specify that with nrows:


In [18]:
pd.read_csv('ch06/ex6.csv', nrows=5)


Out[18]:
one two three four key
0 0.467976 -0.038649 -0.295344 -1.824726 L
1 -0.358893 1.404453 0.704965 -0.200638 B
2 -0.501840 0.659254 -0.421691 -0.057688 G
3 0.204886 1.074134 1.388361 -0.982404 R
4 0.354628 -0.133116 0.283763 -0.837063 Q

To read a file in pieces, specify a chunksize as a number of rows:


In [40]:
chunker = pd.read_csv('ch06/ex6.csv', chunksize=1000)
chunker


Out[40]:
<pandas.io.parsers.TextFileReader at 0x1023d34d0>

The TextFileReader object returned by read_csv allows you to iterate over the parts of the file according to the chunksize. For example, we can iterate over ex6.csv, aggregating the value counts in the 'key' column like so:


In [42]:
chunker = pd.read_csv('ch06/ex6.csv', chunksize=1000)

tot = Series([], dtype='float64')  # give the empty Series an explicit dtype
for piece in chunker:
    tot = tot.add(piece['key'].value_counts(), fill_value=0)

tot = tot.sort_values(ascending=False)

In [43]:
tot[:10]


Out[43]:
E    368.0
X    364.0
L    346.0
O    343.0
Q    340.0
M    338.0
J    337.0
F    335.0
K    334.0
H    330.0
dtype: float64
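
TextFileReader also has a get_chunk method, which lets you read a piece of an arbitrary size rather than the fixed chunksize; a quick sketch:

In [ ]:
chunker = pd.read_csv('ch06/ex6.csv', chunksize=1000)
piece = chunker.get_chunk(10)   # DataFrame holding the next 10 rows
piece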

Writing data out to text format

Data can also be exported to delimited format. Let’s consider one of the CSV files read above:


In [22]:
data = pd.read_csv('ch06/ex5.csv')
data


Out[22]:
something a b c d message
0 one 1 2 3.0 4 NaN
1 two 5 6 NaN 8 world
2 three 9 10 11.0 12 foo

Using DataFrame’s to_csv method, we can write the data out to a comma-separated file:


In [44]:
data.to_csv('ch06/out.csv')
!cat ch06/out.csv


,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,two,5,6,,8,world
2,three,9,10,11.0,12,foo

Other delimiters can be used, of course (writing to sys.stdout so it just prints the text result; make sure to import sys):


In [24]:
data.to_csv(sys.stdout, sep='|')


|something|a|b|c|d|message
0|one|1|2|3.0|4|
1|two|5|6||8|world
2|three|9|10|11.0|12|foo

Missing values appear as empty strings in the output. You might want to denote them by some other sentinel value:


In [46]:
data.to_csv(sys.stdout, na_rep='HELLO')


,something,a,b,c,d,message
0,one,1,2,3.0,4,HELLO
1,two,5,6,HELLO,8,world
2,three,9,10,11.0,12,foo

With no other options specified, both the row and column labels are written. Both of these can be disabled:


In [26]:
data.to_csv(sys.stdout, index=False, header=False)


one,1,2,3.0,4,
two,5,6,,8,world
three,9,10,11.0,12,foo

You can also write only a subset of the columns, and in an order of your choosing:


In [27]:
data.to_csv(sys.stdout, index=False, columns=['a', 'b', 'c'])


a,b,c
1,2,3.0
5,6,
9,10,11.0

Series also has a to_csv method:


In [28]:
dates = pd.date_range('1/1/2000', periods=7)
ts = Series(np.arange(7), index=dates)
ts.to_csv('ch06/tseries.csv')
!cat ch06/tseries.csv


2000-01-01,0
2000-01-02,1
2000-01-03,2
2000-01-04,3
2000-01-05,4
2000-01-06,5
2000-01-07,6

With a bit of wrangling (no header, first column as index), you can read a CSV version of a Series with read_csv, but there is also a from_csv convenience method that makes it a bit simpler:


In [29]:
Series.from_csv('ch06/tseries.csv', parse_dates=True)


Out[29]:
2000-01-01    0
2000-01-02    1
2000-01-03    2
2000-01-04    3
2000-01-05    4
2000-01-06    5
2000-01-07    6
dtype: int64
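
The read_csv route mentioned above looks like the following; since from_csv was deprecated and later removed in newer pandas versions, this is the durable approach (note it returns a one-column DataFrame rather than a Series):

In [ ]:
pd.read_csv('ch06/tseries.csv', header=None, index_col=0, parse_dates=True)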

Manually working with delimited formats

Most forms of tabular data can be loaded from disk using functions like pandas.read_table. In some cases, however, manual processing may be necessary. It’s not uncommon to receive a file with one or more malformed lines that trip up read_table. To illustrate the basic tools, consider a small CSV file:


In [ ]:
!cat ch06/ex7.csv

In [ ]:
import csv
f = open('ch06/ex7.csv')

reader = csv.reader(f)

In [ ]:
# iterating through the reader yields lists of strings,
# with any quote characters removed
for line in reader:
    print(line)

In [ ]:
lines = list(csv.reader(open('ch06/ex7.csv')))
header, values = lines[0], lines[1:]
# zip(*values) transposes the rows into columns, pairing each with its header
data_dict = {h: v for h, v in zip(header, zip(*values))}
data_dict

In [ ]:
class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL

In [ ]:
with open('mydata.csv', 'w') as f:
    writer = csv.writer(f, dialect=my_dialect)
    writer.writerow(('one', 'two', 'three'))
    writer.writerow(('1', '2', '3'))
    writer.writerow(('4', '5', '6'))
    writer.writerow(('7', '8', '9'))

In [ ]:
%cat mydata.csv
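
Dialect options can also be passed as individual keywords to csv.reader or csv.writer without defining a subclass; a quick sketch reading back the file just written:

In [ ]:
with open('mydata.csv') as f:
    for row in csv.reader(f, delimiter=';'):
        print(row)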

JSON data


In [ ]:
obj = """
{"name": "Wes",
 "places_lived": ["United States", "Spain", "Germany"],
 "pet": null,
 "siblings": [{"name": "Scott", "age": 25, "pet": "Zuko"},
              {"name": "Katie", "age": 33, "pet": "Cisco"}]
}
"""

In [ ]:
import json
result = json.loads(obj)
result

In [ ]:
asjson = json.dumps(result)

In [ ]:
siblings = DataFrame(result['siblings'], columns=['name', 'age'])
siblings
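
pandas can also consume JSON directly with read_json; for a JSON array of objects, the default is to treat each object as a row. A minimal sketch with an inline string (wrapped in StringIO, since newer pandas versions warn on literal JSON strings):

In [ ]:
from io import StringIO
records = '[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'
pd.read_json(StringIO(records))   # each object becomes a row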

XML and HTML, Web scraping

NB. The Yahoo! Finance API has changed and this example no longer works
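
For table extraction specifically, pandas.read_html (which uses lxml or html5lib under the hood) is a far less manual route than what follows: it parses every <table> on a page into a list of DataFrames. A sketch against a hypothetical page:

In [ ]:
# hypothetical URL; returns a list of DataFrames, one per <table> element
# tables = pd.read_html('http://example.com/page_with_tables.html')
# tables[0]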


In [ ]:
from lxml.html import parse
from urllib2 import urlopen   # Python 2; on Python 3 use: from urllib.request import urlopen

parsed = parse(urlopen('http://finance.yahoo.com/q/op?s=AAPL+Options'))

doc = parsed.getroot()

In [ ]:
links = doc.findall('.//a')
links[15:20]

In [ ]:
lnk = links[28]
lnk
lnk.get('href')
lnk.text_content()

In [ ]:
urls = [lnk.get('href') for lnk in doc.findall('.//a')]
urls[-10:]

In [ ]:
tables = doc.findall('.//table')
calls = tables[9]
puts = tables[13]

In [ ]:
rows = calls.findall('.//tr')

In [ ]:
def _unpack(row, kind='td'):
    elts = row.findall('.//%s' % kind)
    return [val.text_content() for val in elts]

In [ ]:
_unpack(rows[0], kind='th')
_unpack(rows[1], kind='td')

In [ ]:
from pandas.io.parsers import TextParser

def parse_options_data(table):
    rows = table.findall('.//tr')
    header = _unpack(rows[0], kind='th')
    data = [_unpack(r) for r in rows[1:]]
    return TextParser(data, names=header).get_chunk()

In [ ]:
call_data = parse_options_data(calls)
put_data = parse_options_data(puts)
call_data[:10]

Parsing XML with lxml.objectify


In [ ]:
%cd ch06/mta_perf/Performance_XML_Data

In [ ]:
!head -21 Performance_MNR.xml

In [ ]:
from lxml import objectify

path = 'Performance_MNR.xml'
parsed = objectify.parse(open(path))
root = parsed.getroot()

In [ ]:
data = []

skip_fields = ['PARENT_SEQ', 'INDICATOR_SEQ',
               'DESIRED_CHANGE', 'DECIMAL_PLACES']

# build one dict per <INDICATOR> element, skipping the fields above;
# child.pyval converts each leaf element to an appropriate Python type
for elt in root.INDICATOR:
    el_data = {}
    for child in elt.getchildren():
        if child.tag in skip_fields:
            continue
        el_data[child.tag] = child.pyval
    data.append(el_data)

In [ ]:
perf = DataFrame(data)
perf
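
XML data can get much more complicated than this example; each tag can hold metadata, too. Consider an HTML link tag, which is also valid XML; the next few cells inspect the root of such a tag:

In [ ]:
from io import StringIO
tag = '<a href="http://www.google.com">Google</a>'
root = objectify.parse(StringIO(tag)).getroot()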

In [ ]:
root

In [ ]:
root.get('href')

In [ ]:
root.text

Binary data formats


In [ ]:
cd ../..

In [ ]:
frame = pd.read_csv('ch06/ex1.csv')
frame
frame.to_pickle('ch06/frame_pickle')

In [ ]:
pd.read_pickle('ch06/frame_pickle')
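
Note that pickle is best treated as a short-term storage format only: there is no guarantee that a pickle written with one version of pandas will remain readable by a much later version.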

Using HDF5 format


In [ ]:
store = pd.HDFStore('mydata.h5')
store['obj1'] = frame
store['obj1_col'] = frame['a']
store

In [ ]:
store['obj1']

In [ ]:
store.close()
os.remove('mydata.h5')

Interacting with HTML and Web APIs


In [ ]:
import requests
url = 'https://api.github.com/repos/pydata/pandas/milestones/28/labels'
resp = requests.get(url)
resp

In [ ]:
data = resp.json()   # decode the JSON response body into Python objects
data[:5]

In [ ]:
issue_labels = DataFrame(data)
issue_labels

Interacting with databases


In [1]:
import sqlite3

query = """
CREATE TABLE test
(a VARCHAR(20), b VARCHAR(20),
 c REAL,        d INTEGER
);"""

con = sqlite3.connect(':memory:')
con.execute(query)
con.commit()

In [ ]:
data = [('Atlanta', 'Georgia', 1.25, 6),
        ('Tallahassee', 'Florida', 2.6, 3),
        ('Sacramento', 'California', 1.7, 5)]
stmt = "INSERT INTO test VALUES(?, ?, ?, ?)"

con.executemany(stmt, data)
con.commit()

In [ ]:
cursor = con.execute('select * from test')
rows = cursor.fetchall()
rows

In [ ]:
cursor.description

In [ ]:
# each entry in cursor.description is a tuple whose first field is the column name
DataFrame(rows, columns=[desc[0] for desc in cursor.description])

In [ ]:
import pandas.io.sql as sql
sql.read_sql('select * from test', con)
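
For databases beyond sqlite3, the usual route is an SQLAlchemy engine, which pandas.read_sql accepts as well; a minimal sketch, assuming SQLAlchemy is installed and that a 'test' table exists in the (hypothetical) database file shown:

In [ ]:
import sqlalchemy as sqla
db = sqla.create_engine('sqlite:///mydata.sqlite')   # hypothetical database file
pd.read_sql('select * from test', db)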