CorMel data comprises segments that are conceptually nodes in a graph. This is not surprising, since the CorMel data structures represent transaction trees. Representing and analysing them as graphs in a graph database such as Neo4j offers many opportunities to gain insight into transaction trees, both individually and as groups sharing properties such as whether the transaction tree completed successfully. This type of analysis can be done in the graph database itself, in applications downstream of the database, or in both.
However, a significant amount of processing is needed to support this analysis, and this processing can be arranged in a pipeline, as described in more detail below.
The CorMel system takes data from the logs and generates the transaction trees.
This data is then serialised in Avro format and
stored efficiently in (binary) data files. A single example file was provided
for development purposes and placed in the cormel2neo/input
directory. In Avro
format it occupies 13.9MB and contains 24,499 transaction tree records, as
counted by the avrocount command-line
tool:
java -jar ~/tools/avro/avrocount-0.3.0.jar\
input/par_U170504_010000_S170504_005800_D60_lgcaa101_20205_0000.gz.avro\
2> /dev/null
The Avro format, being binary, is not suitable for inspecting the data.
However, it can be converted easily to JSON format. First, we use the
avro-tools tool to derive the CorMel Avro schema cormel.avsc:
java -jar ~/tools/avro/avro-tools-1.8.2.jar getschema\
input/par_U170504_010000_S170504_005800_D60_lgcaa101_20205_0000.gz.avro\
> cormel.avsc
We can now use avro-tools to generate the JSON format, using the --pretty
option; otherwise the generated JSON lines are extremely long and hard to read:
java -jar ~/tools/avro/avro-tools-1.8.2.jar tojson --pretty\
input/par_U170504_010000_S170504_005800_D60_lgcaa101_20205_0000.gz.avro\
> input/converted.json
Alternatively, jq can be used; its default operation is to pretty-print with
2 spaces of indentation:
jq . < (input).json > (input)pp.json
For reference, the resulting input/converted.json occupies 395.2MB and has
14,470,577 lines of text.
Once the data is in CorMel Avro format, it is possible to investigate it and generate subsets.
One investigation concerned whether the DcxId field in each Usegment is
sufficient to identify transaction trees uniquely. The following command
counted the unique DcxId values:
grep '^ "DcxId" :' input/converted.json | sort | uniq | wc -l
which showed fewer unique values (19,815) than there were transaction trees
(24,499). Further inspection indicated that the combination of DcxId and
TreeId appears to be unique to each transaction tree.
This analysis proved to be useful during development when it was found that
some records were not being uploaded to Neo4j. By identifying an example
DcxId = "08044X4RIR6H1CW6S739ZB#T91" that was associated with 3 transaction
trees, it was possible to check that the revised version was loading all the
transactions, as required. First, we used jq to extract the 3 transaction tree
records into a smaller CorMel JSON file, input/filtered.json:
jq '. | select(.DcxId | contains("08044X4RIR6H1CW6S739ZB#T91"))?' <\
input/converted.json > input/filtered.json
The avro-tools tool was then used to convert the input/filtered.json file back
to Avro format as input/filtered.avro, for a test upload to Neo4j.
java -jar ~/tools/avro/avro-tools-1.8.2.jar fromjson --schema-file\
cormel.avsc input/filtered.json > input/filtered.avro
In general, however, it is more convenient to work with the (space-efficient) Avro format directly, especially since avro-tools provides bindings for common languages (such as Java and python) to perform operations (notably serialising and deserialising) to/from Avro data.
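For example, the following minimal Java sketch deserialises the Avro file with the generic API and counts the distinct (DcxId, TreeId) pairs. It assumes the Apache Avro Java library is on the classpath and that DcxId and TreeId are top-level fields of each record, as the JSON dump above suggests; it is an illustration rather than part of the application code.

import java.io.File;
import java.util.HashSet;
import java.util.Set;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class CountCormelTrees {
    public static void main(String[] args) throws Exception {
        File avroFile = new File(args[0]);   // e.g. the input/par_U170504_... file
        Set<String> treeKeys = new HashSet<>();
        long records = 0;
        // The schema is embedded in the Avro file, so no .avsc file is needed here.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(avroFile, new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord record : reader) {
                records++;
                treeKeys.add(record.get("DcxId") + "|" + record.get("TreeId"));
            }
        }
        System.out.println(records + " records, "
                           + treeKeys.size() + " distinct (DcxId, TreeId) pairs");
    }
}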
Neo4j claims to be the world's leading graph database platform. It is relatively mature and so offers many tools, particularly in its APOC extension suite, for graph data analysis and other advanced operations. Thus it was the obvious graph platform choice.
To upload the data into Neo4j, it is necessary to parse the CorMel data
hierarchy for each transaction tree, and to map each entity into a Neo4j node,
with the edges in the transaction tree being mapped to Neo4j relationships. The
data fields in each CorMel segment are mapped to properties in the associated
Neo4j node. The node types (A, E, H, T and U) become Neo4j node labels, and are
also added as an extra node property for convenience when creating queries.
As with any database, it is necessary to define constraints so that duplicate nodes and relationships are not generated when parsing the transaction trees. With the legacy indexes of Neo4j 1.x, it was the developer's responsibility to write the business logic enforcing the constraints that prevent entity duplication. Since Neo4j version 2.0, database-level index operations have become available, and much of this logic can be defined declaratively at the database level, as is common in an RDBMS. For Neo4j Enterprise customers, uniqueness constraints can be defined over a combination of property fields, but each constraint in Neo4j Community Edition is limited to a single property field. In RDBMS terms, the distinction is between simple and compound indexes. As a workaround, we derived a key based on a concatenation of the relevant properties in each node, stored it in the node as an additional property, and defined the uniqueness constraint in terms of this derived key.
This workaround achieves the objective of guaranteeing uniqueness, but at the cost of increasing the space requirement for each node.
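As a minimal sketch of this workaround (the property and label names are illustrative rather than necessarily those used in the application), the derived key can be built by concatenating the identifying property values in a fixed order, with the uniqueness constraint declared once per label:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DerivedKeys {
    // Build the derived key for a node by concatenating its identifying
    // property values in a fixed order.
    public static String derivedKey(Map<String, Object> props, List<String> keyFields) {
        return keyFields.stream()
                        .map(f -> String.valueOf(props.get(f)))
                        .collect(Collectors.joining("|"));
    }
    // The uniqueness constraint is then declared once per label, e.g. for U
    // nodes (Cypher, Neo4j 3.x syntax):
    //   CREATE CONSTRAINT ON (u:U) ASSERT u.derivedKey IS UNIQUE
}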
Initially, Neo4j was deployed as a service running in a docker container on the
development laptop. This worked well, until we needed to deploy the latest
version of APOC, and could not find a suitable container definition. We then
switched to a local (in the sense of being installed into ~/tools) installation
of the Neo4j service.
The upload application is written in Java and interacts with the Neo4j server
by issuing parametrised Cypher commands via the Java database driver over
Neo4j's Bolt binary protocol. Generally, the parameters define the node
properties (key-value pairs) containing the data that needs to be uploaded from
the CorMel segments comprising the transaction tree.
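A minimal sketch of this interaction, using the 1.x Java driver, is shown below; the Cypher statement, property names and credentials are illustrative only, not those of the actual application.

import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Session;
import static org.neo4j.driver.v1.Values.parameters;

public class UploadExample {
    public static void main(String[] args) {
        String dcxId = "08044X4RIR6H1CW6S739ZB#T91";   // example values
        long treeId = 2;
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                                                   AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {
            // MERGE on the derived key so that re-running the upload does not
            // create duplicate nodes; the remaining properties are set afterwards.
            session.run("MERGE (u:U {derivedKey: $key}) "
                      + "SET u.DcxId = $dcxId, u.TreeId = $treeId, u.nodeType = 'U'",
                        parameters("key", dcxId + "|" + treeId,
                                   "dcxId", dcxId, "treeId", treeId));
        }
    }
}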
The CorMel upload runs in its own JVM, outside the JVM used by Neo4j, but sharing resources such as the laptop's CPU, memory and disk. Therefore efficient use of these resources is a priority.
Initial upload runs had poor performance. Switching to a more powerful laptop (with a Xeon server-class processor and 64GB of memory) brought little improvement. By instrumenting the code, we discovered that performance dropped as more data was uploaded: transaction tree load times started at about 14 seconds and steadily increased as more transaction trees were added to the database. At that rate, a full load (of approximately 25K transaction trees) would take days to complete. CPU activity was very high, so the laptop fan needed to work hard to keep the machine cool. We therefore stopped the uploads as soon as it became apparent they were making slow progress.
Further analysis indicated that initial versions of the upload application did not rebuild the Neo4j indexes after refreshing the database and so were unable to benefit from these indexes when enforcing the constraints, resulting in the graph database equivalent of "full table scans". When this problem was fixed, the overall run time dropped to less than 7 minutes for the full set of transaction trees.
Resource usage remained quite high, so more flexible transaction handling was
introduced. Initial versions opened a single session for each file upload,
which had the effect of beginning a transaction that was closed when the
session closed, after all the CorMel data in that file had been processed.
However, Neo4j allows developers to create transactions explicitly. It is even
possible, though discouraged in the documentation, to insert beginTransaction()
and corresponding success() (commit) and close() method calls in the code. This
fine degree of transaction control was added, shaving about 20 seconds off the
overall run time and reducing the resource usage (as seen in the output of the
top command).
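A minimal sketch of this pattern, again using the 1.x Java driver API named above (the Cypher statement and method structure are illustrative), is:

import org.neo4j.driver.v1.Session;
import org.neo4j.driver.v1.Transaction;
import static org.neo4j.driver.v1.Values.parameters;

public class ExplicitTransactionExample {
    // Begin a transaction, run parametrised statements, mark success, then
    // close (which commits, or rolls back if success() was never called).
    static void uploadOneTree(Session session, String key) {
        Transaction tx = session.beginTransaction();
        try {
            tx.run("MERGE (u:U {derivedKey: $key})", parameters("key", key));
            // ... further statements for the other segments of this tree ...
            tx.success();
        } finally {
            tx.close();
        }
    }
}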
Timing data for each run can be found in output/timings/yyyymmdd_HHMMSS.txt,
where yyyymmdd_HHMMSS is the timestamp of the start of the run. Timing data can
be plotted using the octave function script/plotTimings.m as follows:
echo "cd script; plotTimings(\"../output/timings/20170802_153002.txt\")"\
| octave -qf 2> /dev/null
where the resulting plot can be found in output/timings/20170802_153002.pdf.
For convenience, the PDF can be cropped as follows:
pdfcrop --margins 5 output/timings/20170802_153002.pdf\
graphics/20170802_153002-CROPPED.pdf
and converted to PNG (for insertion into MS Office documents on MS Windows; PDF, being a vector format, gives better results in LaTeX and LibreOffice documents, and in MS Office documents on macOS) using
pdftoppm -f 1 -singlefile -png graphics/20170802_153002-CROPPED.pdf\
graphics/20170802_153002-CROPPED
The resulting plot can be viewed in Figure 1 below.
For convenience, it is possible to recreate this and other PNG files using
script/pdf2png.sh 1024
where 1024 in this example represents the desired resolution, in pixels, of the
longest side of the image. Note that script/pdf2png.sh is designed not to
overwrite existing PNG files in the set it generates.
The upload application has many transitive dependencies and so was built using
maven. It is run as a command-line application with two arguments: the location
of the Avro input file, and a string (which defaults to partial) indicating
whether this is a full (all records) or partial (just a subset of the records)
upload. For a partial upload, progress is reported for every transaction tree
(written to stdout and sent to a timings file) and there is a single
transaction per session. For a full upload, the progress reporting frequency is
reduced (to once every 500 transaction tree uploads, say) and there are
multiple transactions per session (a transaction is committed, closed and a new
one opened after 2500 transaction trees have been uploaded, say).
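A sketch of this batching policy for a full upload is shown below. The batch sizes are the illustrative figures mentioned above; session and trees are assumed to come from the driver and Avro reader sketched earlier, and uploadTree() is a hypothetical helper that issues the Cypher statements for one tree.

// Full upload: report progress every 500 trees, commit every 2500 trees.
int reportEvery = 500;
int commitEvery = 2500;
int uploaded = 0;
Transaction tx = session.beginTransaction();
for (GenericRecord tree : trees) {
    uploadTree(tx, tree);                       // hypothetical helper
    uploaded++;
    if (uploaded % reportEvery == 0) {
        System.out.println(uploaded + " transaction trees uploaded");
    }
    if (uploaded % commitEvery == 0) {
        tx.success();
        tx.close();                             // commit this batch
        tx = session.beginTransaction();        // start the next batch
    }
}
tx.success();
tx.close();                                     // commit any remainder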
Upload runs have the following form:
script/uploadCormelAvroToNeo4j.sh\
-r\
-t full\
input/par_U170504_010000_S170504_005800_D60_lgcaa101_20205_0000.gz.avro
or
script/uploadCormelAvroToNeo4j.sh -r -t partial input/sample.avro
This bash script stops the Neo4j database if it is running, resets the database and restarts it before invoking the Java application with the appropriate arguments. The Java application does the work.
Since the Java application was developed using maven, it can be rebuilt using
cd <path_to_cormel2neo>
mvn clean generate-sources compile
In the future, other build tasks might be added. Therefore a convenience script has been provided, which can be invoked as follows:
script/rebuild.sh
Neo4j, via its APOC extension package, offers the following graph analysis
algorithms: apoc.algo.closeness(...), apoc.algo.betweenness(...),
apoc.algo.pageRankWithConfig(...) and apoc.algo.pageRank(...).
Because of the structure of the graph, the results are not particularly interesting. However, if the transaction trees were classified into "successful" and "failed" categories, it might be possible to use such per-node scores to suggest interesting discriminating features.
Neo4j provides a basic visualisation using a force-directed layout. While this is adequate for development and testing purposes, it is difficult to see the underlying "forest of trees" structure of the graph.
yEd and Gephi are attractive tools for visualising graphs. The former is more of a business diagram drawing tool with particularly good support for graph-based diagrams. The latter is intended more for visualising graphs in their own right; it also has extensive graph metric calculations.
There are potentially ways to stream Neo4j data to Gephi for plotting, but no equivalent support is offered for yEd. Therefore, the approach that offers the most flexibility is to serialise Neo4j data to a common format. In that regard, graphml is the most attractive format, as it can be exported from Neo4j (via APOC) and imported by both yEd and Gephi.
The main downside is that graphml is a verbose XML-based format and hence results in relatively large files.
The following command can be used to export the Neo4j database in graphml format:
script/exportNeo4jToGraphml.sh script/exportNeo4jToGraphml.cql\
script/full.cql output/graphml/full.graphml
The problem with this procedure is that it does not scale well with the size of the database and has never run to completion in any test so far. In any case, it would be impossible to interpret a visualisation of 24,499 transaction trees, so it makes sense to derive a random sample of transaction trees and to export that sample to graphml instead.
This stackexchange question indicates two ways to extract a random subforest of
the full forest. Neo4j JVM memory management problems arose with both, even
with the JVM memory limit setting in neo4j.conf increased to 4GB (from 512MB).
A common error message was "GC overhead limit exceeded", where GC is the JVM's
Garbage Collection process. The problem persisted even when the queries alone
were run, without attempting to serialise the query results to graphml.
Therefore, an alternative approach was needed.
There is a tool (ratatool) that extracts a sample of records from Avro files.
A bash script was written to provide a more convenient interface. An example
invocation of this bash script is:
script/randomSampleFromAvro.sh -i\
input/par_U170504_010000_S170504_005800_D60_lgcaa101_20205_0000.gz.avro\
-o input/sample20.avro -n 20
This sample data can be uploaded, replacing any existing CorMel data in the Neo4j database, using
script/uploadCormelAvroToNeo4j.sh input/sample20.avro partial
This data can then be exported as graphml using either
script/exportNeo4jToGraphml.sh script/exportNeo4jToGraphml.cql\
script/full.cql output/graphml/sample20.graphml
or
script/exportNeo4jToGraphml.sh script/exportNeo4jToGraphml.cql\
script/fullAPOC.cql output/graphml/sample20.graphml
Both yEd and Gephi import the exported graphml files without complaint.
However, there is some data loss. This is because the APOC graphml export
function does not "register" graphml <key ...> elements for each of the Neo4j
node properties stored in graphml <data ...> elements. Consequently, these
<data ...> elements are dropped silently on import. The properties and their
values provide vital context for each node, notably their Neo4j label
(equivalently, their CorMel segment type) among others, so it is necessary to
ensure that this data is not lost on import.
A python script was written to add the missing elements, and also to add data
that is interpreted by the importing application (yEd or Gephi) to display the
nodes in colour. This type of additional data is application-specific
(different XML attributes are used for the Neo4j properties in each <key ...>
definition, and node colours are specified differently too), so two variants of
extended graphml are needed (one for Gephi, one for yEd):
script/neo4jGraphmlToOtherGraphml.py output/graphml/sample20.graphml\
output/graphml/sample20gephi.graphml gephi
and
script/neo4jGraphmlToOtherGraphml.py output/graphml/sample20.graphml\
output/graphml/sample20yed.graphml yed
The files relevant to each application can be uploaded without data loss, and the nodes will be coloured according to their CorMel segment (equivalently, Neo4j label).
Several different layouts can be used to display transaction trees in both
Gephi and yEd. Gephi offers a larger choice of options, but in practice the
graphviz/dot layout offers the best way to visualise the transaction trees.
yEd offers fewer layout choices, but arguably more of them are suited to
transaction tree visualisation. In particular, two pairs of layouts are most
helpful: hierarchical and series-parallel; and circular and tree-balloon.
The first two are similar to Gephi's graphviz/dot layout, and represent trees
in the traditional fashion, which works well for relatively narrow, deep trees.
The second two work better for broad, shallow trees. In the samples we have
seen, transaction trees can take either "shape".
Even with a sample of just 20 transaction trees, it can be difficult to interpret the graph visualisations. This is because some transaction nodes are degenerate in the sense that relatively few fields have values assigned to them. When such transaction nodes are shared between separate transaction trees, this adds complexity to the graph because the corresponding transaction trees overlap each other.
Thus there is a visualisation dilemma: visualise single trees (thereby losing some information and hiding some of the complexity), or visualise the sample of trees (keeping all the information but making interpretation more difficult). The latter is more realistic but the resulting "tangle" of transaction trees looks like overgrown woodland rather than a forestry plantation.
The solution was to show transaction trees in context, but to highlight the edges belonging to a specified tree. The transaction tree we wish to highlight is displayed with black edges but the edges of other trees are light grey.
To achieve this type of labelling, it was necessary to enhance the Neo4j data
model as it applies to the relationships in the graph. In particular, it was
necessary to determine a combination of identifiers that is unique per
transaction tree. Since each tree is rooted in a single U node, it makes sense
to label the tree according to an identifier based on the fields (Neo4j
properties) in its root U node. While the concatenation of all field values in
that node would serve as a possible tree identifier, it is long and cumbersome
to use. A sufficient subset comprises the DcxId and TreeId fields. These
attributes were added to all the edges in the transaction tree, and were
assigned so that every edge "under" the root U node has the same values of
DcxId and TreeId as the root node.
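One way to assign these edge properties is with a single parametrised Cypher statement per tree, issued via the same driver session as before. This is a sketch only: the untyped, variable-length relationship pattern is an assumption about the shape of the CorMel graph, not the statement used in the application.

// Propagate the root U node's DcxId and TreeId to every relationship below it,
// so that each edge can be attributed to its transaction tree.
session.run("MATCH (u:U {DcxId: $dcxId, TreeId: $treeId})-[rels*1..]->() "
          + "FOREACH (r IN rels | SET r.DcxId = $dcxId, r.TreeId = $treeId)",
            parameters("dcxId", dcxId, "treeId", treeId));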
There were some technical issues to overcome, in the sense that yEd tended to
"drop" the value of the DcxId field from the edges when exporting the diagram
as graphml. The TreeId field did not have this problem, as it was an integer.
It appeared that yEd might drop the values because of the presence of
non-alphanumeric characters (such as "$" and "#") in the DcxId strings.
Wrapping the DcxId values in CDATA sections protected them from the first stage
of yEd exports, but not from the second. Therefore, it was decided to derive a
numeric surrogate key for transaction trees, and to add this field, alongside
the CorMel-derived DcxId and TreeId fields, to the transaction tree edges.
A further technical issue was caused by the fact that some transaction tree edges, and not just transaction nodes, are "shared" between transaction trees. Such shared edges occur when a shared transaction node is connected to another shared transaction node in the same transaction tree. While shared edges share the same start and end nodes, they have different property values, and hence Neo4j sees them as different edges. This complication has two effects:
Gephi ignores the edge property values and focuses just on the start and end
nodes of each edge; thus it sees "shared" edges as repeated edges, and displays
a single edge instead, whose weight (hence line width) is the sum of the
weights of the individual edges sharing those start and end nodes. yEd, by
contrast, notices the differing property values between the "shared" edges and
so does nothing special with such edges. It was decided that Gephi's
interpretation was unwelcome, because it drew excessive attention to such
shared edges.
Consequently, a processing stage was added while populating the Neo4j database
to create a table whose columns were a) the fromNodeId, b) the toNodeId and c)
the corresponding list of edgeIds. In most cases, that list contained a single
element, but where "shared" edges occurred, two or more edgeIds could be found.
This list was added as a further Neo4j edge property (Rel_TreeList), which is
exported to graphml as edgeTreeList. From the edgeTreeList property of each
edge, a derived card value (short for "cardinality") is computed as the number
of Neo4j relationships sharing that edge. The exporter for Gephi was modified
so that the edge weight was assigned 1/card instead of 1 as before.
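The cardinality computation itself is straightforward. The sketch below assumes per-relationship records with fromNodeId, toNodeId and edgeId fields (the class and field names are illustrative) collected from the database:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EdgeCardinality {
    static class EdgeRecord {
        long edgeId, fromNodeId, toNodeId;
        double weight;
    }

    // Group relationships by their (from, to) node pair; the group size is
    // "card", and the Gephi export uses weight = 1/card so that shared edges
    // are not drawn with exaggerated line widths.
    static void assignWeights(List<EdgeRecord> edges) {
        Map<String, List<Long>> edgeIdsByEndpoints = new HashMap<>();
        for (EdgeRecord e : edges) {
            String key = e.fromNodeId + "->" + e.toNodeId;
            edgeIdsByEndpoints.computeIfAbsent(key, k -> new ArrayList<>()).add(e.edgeId);
        }
        for (EdgeRecord e : edges) {
            int card = edgeIdsByEndpoints.get(e.fromNodeId + "->" + e.toNodeId).size();
            e.weight = 1.0 / card;
        }
    }
}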
A race condition is introduced by the presence of potentially contrasting line colours for the "same" edge. That is, if the highlighted transaction tree "shares" an edge with one or more standard transaction trees, the colour of that edge depends on which tree has the highest surrogate key value, and not on whether the transaction tree is highlighted or not. The solution was not to change the colour, but to change the line width of all non-selected transaction tree edges in that set of shared edges, provided the selected tree is one of them. Since that line width is set to zero, those edges become invisible and so cannot overwrite the edge if it was already drawn as a highlighted edge.
When output/graphml/sample20gephi.graphml or output/graphml/sample20yed.graphml
has been generated as described earlier, it is possible to upload it into the
relevant application (Gephi or yEd, respectively). After the layout algorithm
has been applied, it is then possible to export the resulting graph (with
position information) to files such as:
output/graphml/sample20gephi_dot.graphml
output/graphml/sample20yed_circular.graphml
output/graphml/sample20yed_hierarchical.graphml
output/graphml/sample20yed_seriesParallel.graphml
output/graphml/sample20yed_treeBalloon.graphml
Such files show all transaction trees together. Some trees overlap, making interpretation difficult, as described earlier.
To highlight each of the transaction trees, for each of the yEd-based layouts ("circular", "hierarchical", "series-parallel" and "tree-balloon"), the following script is convenient:
script/highlightTypesRangeOfTrees.sh 20
where it has been assumed that the input file names follow the pattern above. The resulting graphml files take the form:
output/graphml/sample${numTrees}${app}_${layout}_hl${i}.graphml
with examples such as
output/graphml/sample20yed_circular_hl01.graphml
...
output/graphml/sample20yed_circular_hl20.graphml
output/graphml/sample20yed_hierarchical_hl01.graphml
...
output/graphml/sample20yed_hierarchical_hl20.graphml
output/graphml/sample20yed_seriesParallel_hl01.graphml
...
output/graphml/sample20yed_seriesParallel_hl20.graphml
output/graphml/sample20yed_treeBalloon_hl01.graphml
...
output/graphml/sample20yed_treeBalloon_hl20.graphml
For publication purposes, it is necessary to convert the 20*4 = 80 such files from graphml to pdf. This is achieved by opening each file in yEd and exporting it (in PDF format) to a file with the same name but a pdf extension. Unfortunately this is a manual operation: yEd is a freeware graph viewer and its parent company (yWorks GmbH) sells a suite of software that can be used to automate such processes.
For including the PDF graph "pictures", it is more convenient to trim the unnecessary whitespace added by yEd so that the graphs fit in an A4 page. This can be achieved using
script/cropPdfs.sh 20
which defaults to PDF filenames with the following pattern:
output/graphml/sample${numTrees}${app}_${layout}_hl${i}.pdf
Optionally, it is possible to combine these PDF files by yEd layout type, using
script/assemblePdfs.sh
giving
output/pdf/sample20yed_circular_hl.pdf
output/pdf/sample20yed_hierarchical_hl.pdf
output/pdf/sample20yed_seriesParallel_hl.pdf
output/pdf/sample20yed_treeBalloon_hl.pdf
It is then possible to open such files in a PDF viewer and compare different transaction trees in a more convenient fashion, e.g., by viewing them as "animations" by advancing the page: human visual perception is relatively good at noticing differences between successive "frames" (graph plots).
CorMel focuses on building transaction trees, and uses time-based criteria to decide when the transaction tree is complete, and when to flush the current set of transaction trees to disk.
Meanwhile, the logging system captures ApplicationEvents which include the transactions but also other data, including error and warning messages.
To obtain transaction-related AppEvent data, it is necessary to filter the
enormous volume of such records. This can be done by issuing an elasticsearch
query (with suitable criteria) via a webform on Amadeus' Sentinel logging
portal. The results are provided as indented JSON records to the client's
browser. Because of rendering/memory limitations, the results need to be
limited to 5000 records per query.
CorMel U segments are keyed by the DcxId and TreeId fields. The first
identifies the distributed context of the root transaction (a T segment). The
second reflects the fact that the DcxId is not unique per session. However, the
combination is unique per transaction tree over all sessions.
CorMel T segments inherit the DcxId of their parent U segment. Combining this
with their TrxNb field, which indicates their position in the transaction tree,
identifies each transaction uniquely; e.g., TrxNb = 2-3-4 is the fourth
"grandchild" transaction of the third "child" transaction of the root
transaction with TreeId = 2.
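Since the TreeId corresponds to the leading component of the TrxNb, it can be recovered with a one-line parse; a minimal sketch, assuming the dash-separated form shown above:

public class TrxNbUtil {
    // "2-3-4" -> 2 : the leading component of TrxNb gives the TreeId of the
    // transaction tree that the transaction belongs to.
    public static int treeIdFromTrxNb(String trxNb) {
        return Integer.parseInt(trxNb.split("-", 2)[0]);
    }
}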
By contrast, AppEvents have different keys depending on the source of the
events. In particular, for DCS (i.e., (aircraft) Departure Control System)
events, the key field is named Prefix, which is generally a concatenation of a
code, followed by the DcxId, followed by the TrxNb. Therefore, by parsing this
Prefix, it is possible in principle to match an AppEvent involving a specific
transaction to the same transaction in CorMel. Unfortunately, some AppEvents
have degenerate Prefix fields, usually when they contain just the DcxId
component. In such cases, the AppEvent is arbitrarily assigned a TrxNb of 0
(which is not used in CorMel). Note that AppEvent data is processed using the
fum2neo project, which is a sister project to this cormel2neo project.
The CorMel extract process, which saves CorMel data in Avro files, failed
during May. However, we fixed the extract code in the cormel-tree-parser Java
project, so that it can convert CorMel text file output to CorMel Avro output,
as required by the cormel2neo project. This conversion from text to Avro needs
to be run manually, because the fixes have not yet been approved to run on
production servers.
A period of (localised) network outage was identified, and a query specifying
DCS-generated events, with a non-empty TransactionStatus field and timestamps
in that period, was issued to the elasticsearch processor. The results
comprised 7.4MB of indented JSON AppEvent records. In concert with this, a
minute's worth of CorMel data (2.2GB across 128 files) was collected, converted
to Avro format (typically about 10 seconds per file) and prepared for upload.
Unfortunately, since this upload process takes at least 6 minutes per file, and the Neo4j database is not distributed, file uploads need to be run sequentially and so the elapsed time to upload all 128 files is of the order of 14 hours.
Consequently, a different "filter first" approach was taken. According to this
strategy, fum2neo generates two data structures from the AppEvent extract. The
first combines DcxId, TrxNb and TransactionStatus, and can be used to update a
specific CorMel-supplied transaction with its AppEvent status. The second
combines DcxId, TreeId (derived from the TrxNb field) and a Count of all the
AppEvent records with this combination of DcxId and TreeId. The latter can be
used to filter transaction trees from CorMel on upload: if a tree does not
exist in this filter data structure, it is not uploaded to Neo4j.
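A minimal sketch of these two data structures is given below; the class, field and method names are illustrative, and in practice the maps are populated by fum2neo from the AppEvent extract.

import java.util.HashMap;
import java.util.Map;

public class AppEventFilter {
    // First structure: (DcxId, TrxNb) -> TransactionStatus, used to label
    // individual CorMel transactions with their AppEvent status.
    final Map<String, String> statusByTransaction = new HashMap<>();

    // Second structure: (DcxId, TreeId) -> count of matching AppEvent records,
    // used to decide whether a CorMel transaction tree is uploaded at all.
    final Map<String, Integer> appEventCountByTree = new HashMap<>();

    String transactionKey(String dcxId, String trxNb) { return dcxId + "|" + trxNb; }
    String treeKey(String dcxId, int treeId)          { return dcxId + "|" + treeId; }

    // "Filter first": skip any transaction tree with no matching AppEvents.
    boolean shouldUpload(String dcxId, int treeId) {
        return appEventCountByTree.containsKey(treeKey(dcxId, treeId));
    }
}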
With this change, the time taken to read the CorMel files and upload the
relevant transaction trees (and label the relevant transactions in these trees
with their TransactionStatus) is reduced to about 12 minutes, by contrast with
approximately 15 hours for a "label afterwards" strategy. The disk space
requirement for the database (to store data as nodes and relationships, with
indexes to ensure good query performance) is also greatly reduced, of course.
Figure 1 below is a summary of the processing pipeline described above.
For testing and other purposes, extra options have been added. In particular,
it is convenient to write the transaction tree keys to text files in the
cormel2neo output directory. This data can also be obtained by other means, but
it is convenient to have it in text files for comparison. As an example, we can
use the following:
cd ~/cormel2neo
appEventFile=$HOME/fum2neo/output/FUM_DCS_NW_2017_08_29.gz.avro
cormelFile=$HOME/data/cormel/par_U170829_093700_S170829_093500_D60_lgcap602_34547_0097.gz.avro
script/uploadCormelAvroToNeo4j.sh -l -k -a $appEventFile $cormelFile
Note the -k option to generate the key files
output/cormelTreeKeys/par_U170829_093700_S170829_093500_D60_lgcap602_34547_0097.txt
output/fumTreeKeys/FUM_DCS_NW_2017_08_29.txt
and the -l option to upload into the Neo4j database. Also, the filtered CorMel
files can be found in
output/filteredCormel/par_U170829_093700_S170829_093500_D60_lgcap602_34547_0097.gz.avro
etc.
Also, a set of CorMel transaction tree files can be processed with a single invocation, by changing the definition of cormelFile
above to something like
cormelFile=$HOME/data/cormel/incident
which is a directory of files containing CorMel data, in Avro format, relevant to an incident, say.