JPK archive

JPK files are zipped archives of data.

  • There is a header file at the top-level
  • Header files are normal text files, nothing special needed to read them
  • There is a segments folder
    • the segments folder contains numbered folders, one per segment
    • each folder in segments contains another header file
    • each folder in segments contains a folder named channels
      • each channels folder contains several data files
      • data files contain data in C short format, at least in my example file
      • there seems to be no header in the .dat files, only pure (integer, i.e. short) data
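To experiment with this layout without a real measurement file, one can build a miniature archive in memory. This is only a sketch: `build_demo_archive` is a hypothetical helper, the header contents are placeholders, and the big-endian byte order of the samples is an assumption taken from the `unpack` calls further down. It is written for Python 3.

```python
import io
import struct
import zipfile


def build_demo_archive(num_segments=2, num_points=4):
    """Return an in-memory zip archive laid out like the structure above."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Top-level header: a plain text file with key=value lines.
        archive.writestr("header.properties",
                         "#demo header\ntype=force-scan-series\n")
        for segment in range(num_segments):
            prefix = "segments/%d/" % segment
            # Each numbered segment folder has its own header ...
            archive.writestr(
                prefix + "segment-header.properties",
                "#demo segment header\n"
                "force-segment-header.num-points=%d\n" % num_points)
            # ... and a channels folder with headerless data files.
            # Placeholder samples, stored as big-endian C shorts (assumption).
            samples = struct.pack(">%dh" % num_points, *range(num_points))
            archive.writestr(prefix + "channels/height.dat", samples)
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


demo = build_demo_archive()
for name in demo.namelist():
    print(name)
```

The returned object behaves like a `ZipFile` opened from disk, so the cells below can be tried against it.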

Reading data from JPK archives

1. Open the zipped archive using zipfile


In [1]:
from zipfile import ZipFile
fname = "../examples/force-save-2016.06.15-13.17.08.jpk-force"

In [2]:
z = ZipFile(fname)

You can get the list of files stored in the zip archive from the instance's filelist attribute, and you can open individual files using the instance's open function.


In [3]:
list_of_files = z.filelist
for f in list_of_files:
    print f.filename


header.properties
segments/
segments/0/
segments/0/segment-header.properties
segments/0/channels/height.dat
segments/0/channels/vDeflection.dat
segments/0/channels/strainGaugeHeight.dat
segments/0/channels/hDeflection.dat
segments/0/channels/error.dat
segments/1/
segments/1/segment-header.properties
segments/1/channels/height.dat
segments/1/channels/vDeflection.dat
segments/1/channels/strainGaugeHeight.dat
segments/1/channels/hDeflection.dat
segments/1/channels/error.dat
segments/2/
segments/2/segment-header.properties
segments/2/channels/height.dat
segments/2/channels/vDeflection.dat
segments/2/channels/strainGaugeHeight.dat
segments/2/channels/hDeflection.dat
segments/2/channels/error.dat
segments/3/
segments/3/segment-header.properties
segments/3/channels/height.dat
segments/3/channels/vDeflection.dat
segments/3/channels/strainGaugeHeight.dat
segments/3/channels/hDeflection.dat
segments/3/channels/error.dat
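Since the listing follows a fixed pattern, the channel files can be grouped by segment number. A minimal sketch in Python 3; `channels_by_segment` is a hypothetical helper, and it assumes that every data file sits at exactly `segments/<number>/channels/<channel>.dat`, as in the listing above.

```python
from collections import defaultdict


def channels_by_segment(names):
    """Map segment number -> list of channel names found in the archive."""
    grouped = defaultdict(list)
    for name in names:
        parts = name.split("/")
        # Expect exactly: segments/<number>/channels/<channel>.dat
        if (len(parts) == 4 and parts[0] == "segments"
                and parts[2] == "channels" and parts[3].endswith(".dat")):
            grouped[int(parts[1])].append(parts[3][:-len(".dat")])
    return dict(grouped)


names = [
    "header.properties",
    "segments/",
    "segments/0/",
    "segments/0/segment-header.properties",
    "segments/0/channels/height.dat",
    "segments/0/channels/vDeflection.dat",
]
print(channels_by_segment(names))  # {0: ['height', 'vDeflection']}
```

With a real archive, `z.namelist()` returns the same names as the `filename` attributes printed above and can be passed in directly.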

In [4]:
print list_of_files[0].filename
f = z.open(list_of_files[0].filename)
lines = f.readlines()
print lines[0]
print lines[1]
print lines[2]


header.properties
#Wed Jun 15 13:17:28 CEST 2016

jpk-data-file=spm-forcefile

file-format-version=0.12

2. Parse header files to dictionaries

As printed above, the first line of the top-level header.properties file contains date and time, preceded by a '#'.
The following lines contain properties of the form "key=value".

To extract the time, one can use dateutil.parser:


In [6]:
from dateutil import parser

In [7]:
t = parser.parse(lines[0][1:])
print t


2016-06-15 13:17:28+02:00
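As far as I know, dateutil resolves an abbreviation like CEST reliably only when it matches the local time zone or when an explicit `tzinfos` mapping is passed, so the offset above may depend on the machine the notebook ran on. A sketch of a standard-library alternative in Python 3 follows. The offset table and the name `parse_header_timestamp` are assumptions for illustration, and `%a`/`%b` expect English day and month names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only these two abbreviations occur; extend the table as needed.
UTC_OFFSET_HOURS = {"CET": 1, "CEST": 2}


def parse_header_timestamp(line):
    """Parse a header comment such as '#Wed Jun 15 13:17:28 CEST 2016'."""
    fields = line.lstrip("#").split()
    zone_name = fields.pop(4)  # the time zone abbreviation, e.g. 'CEST'
    naive = datetime.strptime(" ".join(fields), "%a %b %d %H:%M:%S %Y")
    offset = timedelta(hours=UTC_OFFSET_HOURS[zone_name])
    return naive.replace(tzinfo=timezone(offset, zone_name))


print(parse_header_timestamp("#Wed Jun 15 13:17:28 CEST 2016\n"))
# 2016-06-15 13:17:28+02:00
```

In Python 3 the lines read from the archive are bytes, so they would have to be decoded before being passed to this function.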

The remainder of the lines should contain properties following the syntax mentioned above. They can easily be parsed to a dictionary.


In [8]:
_properties = {}
for line in lines[1:]:
    # Split on the first "=" only, in case a value contains "=" itself.
    key, value = line.split("=", 1)
    # str.strip() returns a new string, so the result has to be assigned.
    value = value.strip()
    _properties[key] = value

In [9]:
for p in _properties:
    print p," = ",_properties[p]


force-scan-series.header.force-settings.retract-scan-time  =  10.0
force-scan-series.header.force-settings.pause-before-first.pause-option.type  =  constant-height
force-scan-series.description.comment  =  no comment entry
force-scan-series.header.force-settings.force-baseline-adjust-settings.beginOfLine  =  true
force-scan-series.header.force-settings.retract-k-length  =  60000
force-scan-series.header.force-settings.control-settings-type  =  segment-control-settings
force-scan-series.header.force-settings.extended-pause-k-length  =  30000
force-scan-series.header.force-settings.z-end-pause-option.type  =  constant-height
force-scan-series.description.source-software  =  4.0.33
force-scan-series.header.force-settings.data-description.comment  =  
force-scan-series.header.force-settings.pause-before-first.height-limit  =  NaN
force-scan-series.description.name  =  no name entry
force-scan-series.header.force-settings.force-baseline-adjust-settings.deadtimeBeforeSamples  =  100
force-scan-series.header.force-settings.retracted-pause-time  =  5.0
force-scan-series.description.probe  =  no probe entry
force-scan-series.header.force-settings.pause-before-first.identifier.type  =  standard
force-scan-series.header.force-settings.relative-z-end  =  0.0
force-scan-series.header.type  =  simple-force-scan-series-header
force-scan-series.force-segments.count  =  5
force-scan-series.header.force-settings.data-description.source-software  =  
force-scan-series.header.force-settings.force-baseline-adjust-settings.enabled  =  true
file-format-version  =  0.12
force-scan-series.description.user-name  =  jpkuser
force-scan-series.description.modification-software  =  
force-scan-series.header.force-settings.extend-k-length  =  60000
force-scan-series.header.force-settings.pause-before-first.identifier.name  =  pause-cellhesion200
force-scan-series.header.force-settings.type  =  relative-force-settings
force-scan-series.header.force-settings.force-baseline-adjust-settings.interval  =  1
force-scan-series.header.force-settings.data-description.modification-software  =  
force-scan-series.description.instrument  =  JPK00842-CellHesion-200
force-scan-series.header.force-settings.closed-loop  =  true
force-scan-series.header.force-settings.start-option.type  =  continue
force-scan-series.header.force-settings.data-description.instrument  =  
force-scan-series.header.force-settings.data-description.user-name  =  
force-scan-series.header.force-settings.extended-pause-time  =  5.0
type  =  force-scan-series
force-scan-series.header.force-settings.z-start-pause-option.type  =  constant-height
force-scan-series.header.force-settings.relative-z-start  =  5.0E-5
jpk-data-file  =  spm-forcefile
force-scan-series.header.force-settings.pause-before-first.num-points  =  0
force-scan-series.header.force-settings.extend-scan-time  =  10.0
force-scan-series.header.force-settings.data-description.probe  =  
force-scan-series.header.force-settings.pause-before-first.type  =  constant-height-pause
force-scan-series.header.force-settings.data-description.name  =  
force-scan-series.header.force-settings.force-baseline-adjust-settings.averageSamples  =  100
force-scan-series.header.force-settings.relative-setpoint  =  0.3278660666914438
force-scan-series.header.force-settings.pause-before-first.style  =  pause
force-scan-series.header.force-settings.start-with-retract  =  false
force-scan-series.header.force-settings.pause-before-first.duration  =  0.0

2.1 Parsing properties into tree-like dictionary

Properties seem to have a tree-like structure, with node labels separated by dots. It appears more appropriate to parse them recursively into a dictionary of sub-dictionaries.


In [10]:
properties = {}
for line in lines[1:]:
    # Split on the first "=" only, in case a value contains "=" itself.
    key, value = line.split("=", 1)
    value = value.strip()

    # Walk down the tree along the dotted key,
    # creating sub-dictionaries as needed.
    split_key = key.split(".")
    d = properties
    for s in split_key[:-1]:
        d = d.setdefault(s, {})
    d[split_key[-1]] = value

In [11]:
for p in properties:
    print p, " = ",properties[p]


type  =  force-scan-series
file-format-version  =  0.12
jpk-data-file  =  spm-forcefile
force-scan-series  =  {'header': {'force-settings': {'data-description': {'comment': '', 'source-software': '', 'user-name': '', 'name': '', 'probe': '', 'instrument': '', 'modification-software': ''}, 'relative-z-start': '5.0E-5', 'extended-pause-k-length': '30000', 'start-option': {'type': 'continue'}, 'force-baseline-adjust-settings': {'averageSamples': '100', 'deadtimeBeforeSamples': '100', 'interval': '1', 'enabled': 'true', 'beginOfLine': 'true'}, 'retracted-pause-time': '5.0', 'retract-scan-time': '10.0', 'relative-z-end': '0.0', 'extend-scan-time': '10.0', 'extended-pause-time': '5.0', 'pause-before-first': {'style': 'pause', 'pause-option': {'type': 'constant-height'}, 'num-points': '0', 'height-limit': 'NaN', 'duration': '0.0', 'identifier': {'type': 'standard', 'name': 'pause-cellhesion200'}, 'type': 'constant-height-pause'}, 'z-end-pause-option': {'type': 'constant-height'}, 'z-start-pause-option': {'type': 'constant-height'}, 'start-with-retract': 'false', 'retract-k-length': '60000', 'closed-loop': 'true', 'relative-setpoint': '0.3278660666914438', 'type': 'relative-force-settings', 'extend-k-length': '60000', 'control-settings-type': 'segment-control-settings'}, 'type': 'simple-force-scan-series-header'}, 'description': {'comment': 'no comment entry', 'source-software': '4.0.33', 'user-name': 'jpkuser', 'name': 'no name entry', 'probe': 'no probe entry', 'instrument': 'JPK00842-CellHesion-200', 'modification-software': ''}, 'force-segments': {'count': '5'}}

In [12]:
properties['force-scan-series']['header']['force-settings']['force-baseline-adjust-settings']


Out[12]:
{'averageSamples': '100',
 'beginOfLine': 'true',
 'deadtimeBeforeSamples': '100',
 'enabled': 'true',
 'interval': '1'}
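All values in the tree are still strings, and long chains of square brackets get tedious. The following sketch (Python 3) adds a lookup by dotted key and a best-effort type conversion. Both `get_property` and `convert_value` are hypothetical helpers, and the conversion rules are my guess from the values seen in this one file. Note that version-like entries such as `file-format-version` would be turned into floats, which may not be wanted.

```python
def get_property(tree, dotted_key):
    """Follow a dotted key such as 'a.b.c' down a nested dictionary."""
    node = tree
    for part in dotted_key.split("."):
        node = node[part]
    return node


def convert_value(text):
    """Best-effort conversion of a property string to bool, int or float."""
    if text in ("true", "false"):
        return text == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # leave everything else (names, empty strings) untouched


# A small excerpt of the tree parsed above.
tree = {"force-scan-series": {"header": {"force-settings": {
    "force-baseline-adjust-settings": {"averageSamples": "100",
                                       "enabled": "true"},
    "relative-z-start": "5.0E-5",
    "type": "relative-force-settings",
}}}}

base = "force-scan-series.header.force-settings."
print(convert_value(get_property(tree, base + "relative-z-start")))  # 5e-05
print(convert_value(get_property(
    tree, base + "force-baseline-adjust-settings.enabled")))  # True
```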

2.2 Lower level header files appear to have a slightly different header with one additional line

So here one has to skip one more line at the start (two comment lines instead of one); apart from that, the format seems to be identical.


In [13]:
fname = z.filelist[-6].filename
print fname


segments/3/segment-header.properties

In [14]:
f = z.open(fname)
lines = f.readlines()
print(lines[0])
print(lines[1])


### ----------- internal settings, do not edit this file -----------

#Wed Jun 15 13:17:28 CEST 2016
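Instead of hard-coding how many lines to skip, one could treat every line starting with '#' as a comment, which would cover both header variants. A sketch in Python 3, where lines read from the archive are bytes and have to be decoded first. `read_property_lines` is a hypothetical helper and the latin-1 encoding is an assumption; the demo archive only mimics the two header styles shown above.

```python
import io
import zipfile


def read_property_lines(archive, member, encoding="latin-1"):
    """Return (comment lines, key=value lines) of a header file."""
    with archive.open(member) as handle:
        text = handle.read().decode(encoding)
    comments, entries = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("#"):
            comments.append(line)
        else:
            entries.append(line)
    return comments, entries


# Demo archive with one header of each style.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as out:
    out.writestr("header.properties",
                 "#Wed Jun 15 13:17:28 CEST 2016\n"
                 "jpk-data-file=spm-forcefile\n")
    out.writestr("segments/3/segment-header.properties",
                 "### ----------- internal settings, do not edit this file -----------\n"
                 "#Wed Jun 15 13:17:28 CEST 2016\n"
                 "force-segment-header.num-points=60000\n")
buffer.seek(0)

archive = zipfile.ZipFile(buffer)
for member in archive.namelist():
    comments, entries = read_property_lines(archive, member)
    print(member, len(comments), entries)
```

In both cases the timestamp is the last comment line, so `comments[-1]` can be handed to the date parser.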

3. Read data from files

Data files (.dat) apparently contain data exclusively in C short format. To convert them to Python integers, use the struct module.


In [14]:
from struct import unpack

In [15]:
fname = z.filelist[-12].filename
print fname


segments/2/channels/height.dat

In [16]:
f = z.open(fname)
content = f.read()
print(len(content))


120000

According to the JPKay guys, every 4 items (i.e. bytes) make one data point:


In [17]:
content[0], content[1], content[2], content[3]


Out[17]:
('8', '\x02', '8', '\x02')

In [18]:
data = unpack(">i", content[0:4])
print data


(939669506,)

According to the struct documentation, however, a short occupies 2 bytes, so every 2 items should make one data point in short format. I don't get it, because the header says the format is short ...


In [19]:
fname = z.filelist[-13].filename
print fname
f = z.open(fname)
lines = f.readlines()
properties = {}
# Segment headers start with two comment lines, so skip both.
for line in lines[2:]:
    key, value = line.split("=", 1)
    value = value.strip()

    # Walk down the tree along the dotted key,
    # creating sub-dictionaries as needed.
    split_key = key.split(".")
    d = properties
    for s in split_key[:-1]:
        d = d.setdefault(s, {})
    d[split_key[-1]] = value
print properties['channel']['height']['data']['type']


segments/2/segment-header.properties
short

... and they still use 4 bytes per data point instead of 2.


In [20]:
data = unpack(">h", content[0:2])
print data


(14338,)

With 120000 bytes per data file and, apparently, 60000 points ...


In [21]:
properties['force-segment-header']['num-points']


Out[21]:
'60000'

... 2 bytes per data point has to be right, not 4.
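The byte arithmetic can be checked with struct.calcsize, and a whole file can be decoded in one unpack call. A minimal sketch in Python 3; `decode_shorts` is a hypothetical helper, and the big-endian byte order (">") is simply carried over from the cells above rather than confirmed by the header.

```python
import struct


def decode_shorts(content, byte_order=">"):
    """Decode a bytes object into a tuple of C shorts (2 bytes each)."""
    size = struct.calcsize(byte_order + "h")
    if len(content) % size:
        raise ValueError("length %d is not a multiple of %d"
                         % (len(content), size))
    count = len(content) // size
    return struct.unpack("%s%dh" % (byte_order, count), content)


# The first four bytes of height.dat shown in Out[17].
sample = b"\x38\x02\x38\x02"

print(struct.calcsize(">h"), struct.calcsize(">i"))  # 2 4
print(decode_shorts(sample))  # (14338, 14338)
```

For the file read above, `len(decode_shorts(content))` should come out as 60000, matching num-points in the segment header.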

