.. highlight:: none
.. _data-model-terms:
Data Model - Terminology
========================
**Note:** This documentation presents the data model from a User or Analyst perspective. See the online documentation on Types_ and Forms_ or the Synapse source code_ for more detailed information.
Recall that **Synapse is a distributed key-value hypergraph analysis framework.** That is, Synapse is a particular implementation of a hypergraph model, where an instance of a hypergraph is called a Cortex. In our brief discussion of graphs and hypergraphs, we pointed out some fundamental concepts related to the Synapse hypergraph implementation:
- **Everything is a node.** There are no pairwise ("two-dimensional") edges in a hypergraph the way there are in a directed graph. While Synapse includes some edge-like nodes (digraph nodes or "relationship" nodes) in its data model, they are still nodes.
- **Tags act as hyperedges.** In a directed graph, an edge connects exactly two nodes. In Synapse, tags are labels that can be applied to an arbitrary number of nodes. These tags effectively act as an n-dimensional edge that can connect any number of nodes – a hyperedge.
- **(Almost) every navigation of the graph is a pivot.** Since there are no pairwise edges in a hypergraph, you can’t query or explore the graph by traversing its edges. Instead, navigation primarily consists of pivoting from the properties of one set of nodes to the properties of another set of nodes. (Since tags are hyperedges, there are ways to lift by or "pivot through" tags to effectively perform "hyperedge traversal"; but most navigation is via pivots.)
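
The three points above can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not how a Cortex stores data. Nodes are keyed by a hypothetical ``(form, value)`` pair. The helper names ``nodes_with_tag`` and ``pivot`` are invented for illustration. The edge-like ``inet:dns:a`` node is simplified to two secondary properties.

.. code-block:: python

   # Toy model for illustration only; not Synapse's storage layout or API.

   # Every "thing" is a node keyed by a hypothetical (form, primary value) pair.
   nodes = {
       ('inet:fqdn', 'woot.com'): {'props': {}, 'tags': {'cno.infra.sink.hole'}},
       ('inet:ipv4', '1.2.3.4'): {'props': {}, 'tags': {'cno.infra.sink.hole'}},
       # An edge-like "relationship" node is still just a node.
       ('inet:dns:a', ('woot.com', '1.2.3.4')): {
           'props': {'fqdn': 'woot.com', 'ipv4': '1.2.3.4'},
           'tags': set(),
       },
   }


   def nodes_with_tag(tag):
       """A tag acts as a hyperedge: it joins every node that carries it."""
       return [key for key, node in nodes.items() if tag in node['tags']]


   def pivot(src_keys, dst_form, dst_prop):
       """Pivot from the primary values of src_keys to nodes of dst_form whose
       secondary property dst_prop holds one of those values."""
       values = {valu for _form, valu in src_keys}
       return [key for key, node in nodes.items()
               if key[0] == dst_form and node['props'].get(dst_prop) in values]

For example, ``pivot([('inet:fqdn', 'woot.com')], 'inet:dns:a', 'fqdn')`` returns the single ``inet:dns:a`` node. ``nodes_with_tag('cno.infra.sink.hole')`` returns both tagged nodes regardless of their forms.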
To start building on those concepts, you need to understand the basic elements of the Synapse data model. The fundamental terms and concepts you should be familiar with are:
- Type_
- Form_
- Node_
- Property_
- Tag_
Synapse uses a query language called **Storm** (see :ref:`storm-ref-intro`) to interact with data in the hypergraph. Storm allows a user to lift, filter, and pivot across data based on node properties, values, and tags. **Understanding these model structures will significantly improve your ability to use Storm and interact with Synapse data.**
.. _data-type:
Type
----
A **type** is the definition of a data element within the Synapse data model. A type describes what the element is and enforces how it should look, including how it should be normalized, if necessary, for both storage (including indexing) and representation (display).
The Synapse data model includes standard types such as integers and strings, as well as common types defined within or specific to Synapse, including globally unique identifiers (``guid``), date/time values (``time``), time intervals (``ival``), and tags (``syn:tag``). Many objects (forms; see :ref:`data-form`) within the Synapse data model are built upon (are extensions of) a subset of these common types.
In addition, knowledge domain-specific objects may themselves be specialized types. For example, an IPv4 address (``inet:ipv4``) is its own specialized type. While an IPv4 address is ultimately stored as an integer, the type has additional constraints (i.e., to ensure that IPv4 objects in the Cortex can only be created using integer values that fall within the allowable IPv4 address space). These constraints may be defined by a constructor (``ctor``) that defines how a property of that type can be created (constructed).
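
The IPv4 constraint described above can be sketched with Python's standard ``ipaddress`` module. This is an illustration only, not Synapse's ``inet:ipv4`` implementation. The function names ``norm_ipv4`` and ``repr_ipv4`` are hypothetical. The point is that the stored value is an integer, but only integers inside the IPv4 address space (0 through ``0xFFFFFFFF``) are accepted.

.. code-block:: python

   import ipaddress


   def norm_ipv4(valu):
       """Hypothetical normalizer: return the integer used for storage and
       reject anything outside the IPv4 address space."""
       if isinstance(valu, str):
           # Raises ValueError (AddressValueError) for strings like '1.2.3.999'.
           return int(ipaddress.IPv4Address(valu))
       if isinstance(valu, int) and not isinstance(valu, bool):
           if not 0 <= valu <= 0xFFFFFFFF:
               raise ValueError(f'{valu} is outside the IPv4 address space')
           return valu
       raise TypeError(f'cannot normalize {valu!r} as an IPv4 address')


   def repr_ipv4(valu):
       """Render the stored integer back into dotted-quad form for display."""
       return str(ipaddress.IPv4Address(valu))

``norm_ipv4('1.2.3.4')`` returns ``16909060``, which ``repr_ipv4`` renders back as ``'1.2.3.4'``. ``norm_ipv4(2**32)`` raises ``ValueError``.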
Users typically will not interact with types directly; they are primarily used "behind the scenes" to define and support the Synapse data model. From a user perspective, it is important to keep the following points in mind for types:
- **Every element in the Synapse data model must be defined as a type.** Synapse uses **forms** to define the objects that can be represented (modeled) within a Synapse hypergraph. Forms have **properties** (primary and secondary) and every property must be explicitly defined as a particular type.
- **Type enforcement is essential to Synapse’s functionality.** Type enforcement means every property is defined as a type, and Synapse enforces rules for how elements of that type can (or can’t) be created. This means that elements of the same type are always created, stored, and represented in the same way which ensures consistency and helps prevent "bad data" from getting into a Cortex.
- **Type awareness facilitates interaction with a Synapse hypergraph.** Synapse and the Storm query language are "model aware" and know which types are used for each property in the model. At a practical level, this allows users to use a more concise syntax when using the Storm query language, because in many cases the query parser "understands" which navigation options make sense, given the types of the properties used in the query. It also allows users to use wildcards to pivot (see :ref:`storm-ref-pivot`) without knowing the "destination" forms or nodes; Synapse "knows" which forms can be reached from the current set of data based on types.
- **It is still possible to navigate (pivot) between elements of different types that have the same value.** Type enforcement simplifies pivoting, but does not restrict you to only pivoting between properties of the same type. For example, a Windows registry value may be a string (type ``str``), but that string may represent a file path (type ``file:path``). While the Storm query parser would not automatically "recognize" that as a valid pivot (because the property types differ), it is possible to explicitly tell Storm to pivot from a specific ``file:path`` node to any registry value nodes whose string property value (``it:dev:regval:str``) matches that path.
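
The last three points can be sketched together. The sketch below is hypothetical. The normalizers, the ``PROP_TYPES`` table, and the helper functions are illustrative stand-ins (including an invented rule that a path is stored lowercased with forward slashes), not Synapse's actual type definitions.

.. code-block:: python

   # Hypothetical sketch: types, declarations, and rules are illustrative only.

   def norm_str(valu):
       return str(valu)


   def norm_path(valu):
       # Invented rule for illustration: lowercase and use forward slashes.
       return str(valu).lower().replace('\\', '/')


   TYPES = {'str': norm_str, 'file:path': norm_path}

   # Every property is explicitly declared as a particular type.
   PROP_TYPES = {'file:path': 'file:path', 'it:dev:regval:str': 'str'}


   def norm_prop(prop, valu):
       """Type enforcement: a value always passes through its property's type,
       so equivalent inputs are stored and represented the same way."""
       return TYPES[PROP_TYPES[prop]](valu)


   def implicit_pivot_ok(src_prop, dst_prop):
       """A model-aware parser only infers a pivot between same-typed props."""
       return PROP_TYPES[src_prop] == PROP_TYPES[dst_prop]


   def explicit_pivot(valu, regvals):
       """An explicit pivot matches on value alone, ignoring declared types."""
       return [rv for rv in regvals if rv == valu]

Here ``norm_prop('file:path', 'C:\\Windows\\evil.exe')`` yields ``'c:/windows/evil.exe'``. ``implicit_pivot_ok('file:path', 'it:dev:regval:str')`` is ``False``, yet ``explicit_pivot`` still finds a registry string holding that same value.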
Type-Specific Behavior
++++++++++++++++++++++
Synapse implements various type-specific optimizations to improve performance and functionality. Some of these are "back end" optimizations (i.e., for indexing and storage) while some are more "front end" in terms of how users interact with data of certain types via Storm. See :ref:`storm-ref-type-specific` for additional detail.
Viewing or Working with Types
+++++++++++++++++++++++++++++
Types (both base and model-specific) are defined within the Synapse source code. An auto-generated dictionary (from current source code) of Types_ can be found in the online documentation.
Types can also be viewed within a Cortex. A full list of current types can be displayed with the following Storm command:
``cli> storm syn:type``
See :ref:`storm-ref-model-introspect` for additional detail on working with model elements within Storm.
Type Example
++++++++++++
The data associated with a type’s definition is displayed slightly differently between the Synapse source code, the auto-generated online documents, and from the Storm command line. Users wishing to review type structure or other elements of the Synapse data model are encouraged to use the source(s) that are most useful to them.
The example below shows the type for a fully qualified domain name (``inet:fqdn``) as it is represented in the Synapse source code, the online documents, and from Storm.
Source Code
***********
.. parsed-literal::

   ('inet:fqdn', 'synapse.models.inet.Fqdn', {}, {
       'doc': 'A Fully Qualified Domain Name (FQDN).',
       'ex': 'vertex.link'}),
Auto-Generated Online Documents
*******************************
**inet:fqdn**
A Fully Qualified Domain Name (FQDN). It is implemented by the following class: ``synapse.models.inet.Fqdn``.
An example of ``inet:fqdn``:
- ``vertex.link``
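
Because the online entry is generated from the source code, each field in it can be traced back to the tuple shown under Source Code. The sketch below rebuilds the entry's text from that tuple. It is not the actual documentation generator. The function name ``render_type_doc`` and the reading of the four tuple fields (name, implementing class, options, info) are assumptions based on this single example.

.. code-block:: python

   # The type definition tuple from the Source Code excerpt above.
   TYPEDEF = ('inet:fqdn', 'synapse.models.inet.Fqdn', {}, {
       'doc': 'A Fully Qualified Domain Name (FQDN).',
       'ex': 'vertex.link'})


   def render_type_doc(typedef):
       """Rebuild the text of the auto-generated entry from a type tuple.

       Field meanings are assumed from this one example: type name,
       implementing class, type options, and descriptive info.
       """
       name, ctor, _opts, info = typedef
       lines = [
           name,
           f"{info['doc']} It is implemented by the following class: {ctor}.",
       ]
       if 'ex' in info:
           lines.append(f"An example of {name}: {info['ex']}")
       return '\n'.join(lines)


   print(render_type_doc(TYPEDEF))
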
Storm
*****
.. _data-form:
Form
----
A **form** is the definition of an object in the Synapse data model. A form acts as a "template" that tells you how to create an object (Node_). While the concepts of form and node are closely related, it is useful to maintain the distinction between the template for creating an object (form) and an instance of a particular object (node). ``inet:fqdn`` is a form; ``inet:fqdn = woot.com`` is a node.
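
The template/instance distinction maps naturally onto a class and its instances. In this hypothetical sketch, a ``Form`` records a form's name and the types of its secondary properties, and ``make_node`` creates a ``Node`` from it, rejecting properties the form does not declare. The secondary property names ``domain`` and ``host`` are illustrative; consult the Forms_ reference for the real ``inet:fqdn`` definition.

.. code-block:: python

   from dataclasses import dataclass, field


   @dataclass
   class Form:
       """Template: a form name plus the types of its secondary properties."""
       name: str
       secondary: dict = field(default_factory=dict)


   @dataclass
   class Node:
       """Instance: one specific object built from a form."""
       form: Form
       valu: str
       props: dict = field(default_factory=dict)


   def make_node(form, valu, **props):
       # The form acts as a template: only declared properties are allowed.
       unknown = set(props) - set(form.secondary)
       if unknown:
           raise KeyError(f'{form.name} has no properties {sorted(unknown)}')
       return Node(form, valu, props)


   # Hypothetical secondary properties, for illustration only.
   fqdn = Form('inet:fqdn', {'domain': 'inet:fqdn', 'host': 'str'})
   node = make_node(fqdn, 'woot.com', domain='com', host='woot')
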
Form with secondary properties:
.. _data-node:
Node
----
A **node** is a unique object within the Synapse hypergraph. In Synapse, nodes represent standard objects ("nouns") such as IP addresses, files, people, bank accounts, or chemical formulas. However, nodes also represent relationships ("verbs"), because what would be an edge in a directed graph is also a node in a Synapse hypergraph. It may be better to think of a node generically as a "thing": any "thing" you want to model within Synapse (entity, relationship, event) is represented as a node.
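
Because a node is unique, creating the "same" node twice should yield one node, not two. A minimal sketch, assuming a hypothetical in-memory ``NodeStore`` keyed by form and primary value. The case-folding used here stands in for whatever normalization the real type applies and is illustrative only.

.. code-block:: python

   class NodeStore:
       """Hypothetical store holding at most one node per (form, primary value)."""

       def __init__(self):
           self._nodes = {}

       def add(self, form, valu, **props):
           # Illustrative normalization: case-fold the primary value.
           key = (form, valu.lower())
           node = self._nodes.setdefault(
               key, {'form': form, 'valu': key[1], 'props': {}, 'tags': set()})
           # Adding an existing node only updates its secondary properties.
           node['props'].update(props)
           return node

       def __len__(self):
           return len(self._nodes)


   store = NodeStore()
   first = store.add('inet:fqdn', 'woot.com')
   second = store.add('inet:fqdn', 'WOOT.com', domain='com')

Here ``first is second`` is ``True`` and ``len(store)`` is ``1``.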
Every node consists of the following components:
- A **primary property** that consists of the Form_ of the node plus its specific value. All primary properties must be unique for a given form.
In the output above:
- ``inet:fqdn = google.com`` is the **primary property**.
Secondary property:
.. _data-tag:
Tag
---
**Tags** are annotations applied to nodes. Simplistically, they can be thought of as labels that provide context to the data represented by the node.
Broadly speaking, within Synapse:
- Nodes represent **things:** objects, relationships, or events. In other words, nodes typically represent facts or observables that are objectively true and unchanging.
- Tags typically represent **assessments:** judgements that could change if the data or the analysis of the data changes.
For example, an Internet domain is an "objectively real thing" - a domain exists, was registered, etc. and can be created as a node such as ``inet:fqdn = woot.com``. Whether a domain has been sinkholed (i.e., where a supposedly malicious domain is taken over or re-registered by a researcher to identify potential victims attempting to resolve the domain) is an assessment. A researcher may need to evaluate data related to that domain (such as domain registration records or current and past IP resolutions) to decide whether the domain appears to be sinkholed. This assessment can be represented by applying a tag such as ``#cno.infra.sink.hole`` to the ``inet:fqdn = woot.com`` node.
Tags are unique within the Synapse model because tags are both **nodes** and **labels applied to nodes.** Tags are nodes based on a form (``syn:tag``, of type ``syn:tag``) defined within the Synapse data model. That is, the tag ``#cno.infra.sink.hole`` can be applied to another node; but the tag itself also exists as the node ``syn:tag = cno.infra.sink.hole``. This difference is illustrated in the example below.
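
The dual role of a tag can be sketched as follows. This is a deliberately simplified, hypothetical model; the ``nodes`` dictionary and the ``add_node`` and ``add_tag`` helpers are invented for illustration, and other tag behavior is out of scope. Applying a tag labels the target node and also ensures that a ``syn:tag`` node with the same name exists.

.. code-block:: python

   # Hypothetical, simplified model; not Synapse's API.
   nodes = {}


   def add_node(form, valu):
       # setdefault keeps exactly one node per (form, value).
       return nodes.setdefault((form, valu), {'tags': set()})


   def add_tag(form, valu, tag):
       """Apply tag (written without the leading '#') to a node."""
       add_node('syn:tag', tag)                # the tag as a node
       add_node(form, valu)['tags'].add(tag)   # the tag as a label


   add_tag('inet:fqdn', 'woot.com', 'cno.infra.sink.hole')

Applying the same tag to a second node reuses the existing ``syn:tag`` node rather than creating another one.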
Tags are introduced here but are discussed in greater detail in :ref:`analytical-model-tags`.
Viewing or Working with Tags
++++++++++++++++++++++++++++
As tags are nodes (data) within the Synapse data model, they can be viewed and operated upon just like other data in a Cortex. Users typically interact with Cortex data via the Synapse cmdr command line interface (:ref:`syn-tools-cmdr`) using the Storm query language (:ref:`storm-ref-intro`).
See :ref:`storm-ref-model-introspect` for additional detail on working with model elements within Storm.
Tag Example
+++++++++++
The Storm query below displays the **node** for the tag ``cno.infra.sink.hole``:

``cli> storm syn:tag=cno.infra.sink.hole``
The Storm query below displays the **tag** ``#cno.infra.sink.hole`` applied to the **node** ``inet:fqdn = hugesoft.org``:

``cli> storm inet:fqdn=hugesoft.org``
Note that a tag **applied to a node** uses the "tag" symbol (``#``). This is a visual cue to distinguish tags on a node from the node's secondary properties. The symbol is also used within the Storm syntax to reference a tag as opposed to a ``syn:tag`` node.
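
The ``#`` cue is easy to apply mechanically. The hypothetical helper below (not part of Storm or Synapse) distinguishes a reference to a tag, such as ``#cno.infra.sink.hole``, from a reference to the ``syn:tag`` node, such as ``syn:tag=cno.infra.sink.hole``. It handles only these two simple shapes.

.. code-block:: python

   def classify_tag_ref(ref):
       """Return ('tag', name) for '#name' or ('node', name) for 'syn:tag=name'."""
       ref = ref.strip()
       if ref.startswith('#'):
           # A tag as a label on a node.
           return ('tag', ref[1:])
       form, sep, valu = ref.partition('=')
       if sep and form.strip() == 'syn:tag':
           # The tag itself as a syn:tag node.
           return ('node', valu.strip())
       raise ValueError(f'not a tag reference: {ref!r}')
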
.. _Types: https://vertexprojectsynapse.readthedocs.io/en/latest/autodocs/datamodel_types.html
.. _Forms: https://vertexprojectsynapse.readthedocs.io/en/latest/autodocs/datamodel_forms.html
.. _code: https://github.com/vertexproject/synapse
.. _section: https://vertexprojectsynapse.readthedocs.io/en/latest/autodocs/datamodel_forms.html#universal-properties
.. _Majestic: https://majestic.com/reports/majestic-million