You are browsing the docs for Nexus v1.6.x, the latest release is available here
Graph analytics
Graph analytics is functionality introduced by the graph-analytics
plugin and rooted in the /v1/graph-analytics/{org_label}/{project_label}
collection. It provides insights into the data and its relationships in terms of types (the @type field).
When reading graph analytics, the caller must have resources/read
permissions on the current path of the project or the ancestor paths.
Please visit the Authentication & authorization section to learn more about it.
The described endpoints are experimental and the response structure might change in the future.
Fetch relationships
Obtains all the @type relationships that exist in a project, expressed as nodes and edges, together with their counts.
The edges are the properties linking different nodes, and the nodes are the resources containing a certain @type.
GET /v1/graph-analytics/{org_label}/{project_label}/relationships
Example
- Request
curl "http://localhost:8080/v1/graph-analytics/myorg/myproj/relationships"
- Response
{
  "@context": "https://bluebrain.github.io/nexus/contexts/relationships.json",
  "_edges": [
    {
      "_count": 2,
      "_path": [
        {
          "@id": "http://schema.org/brother",
          "_name": "brother"
        }
      ],
      "_source": "http://schema.org/Person",
      "_target": "http://schema.org/Person"
    }
  ],
  "_nodes": [
    {
      "@id": "http://schema.org/Person",
      "_count": 3,
      "_name": "Person"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/View",
      "_count": 2,
      "_name": "View"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/DiskStorage",
      "_count": 1,
      "_name": "DiskStorage"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/ElasticSearchView",
      "_count": 1,
      "_name": "ElasticSearchView"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/InProject",
      "_count": 1,
      "_name": "InProject"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/Project",
      "_count": 1,
      "_name": "Project"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/Resolver",
      "_count": 1,
      "_name": "Resolver"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/SparqlView",
      "_count": 1,
      "_name": "SparqlView"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/Storage",
      "_count": 1,
      "_name": "Storage"
    }
  ]
}
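As a minimal sketch of how a client might consume this response, the Python snippet below turns each entry of `_edges` into a readable `source -[property]-> target` line, resolving names through `_nodes`. The `summarize_edges` helper and the trimmed `sample` payload are illustrative assumptions and not part of the Nexus API; only the field names are taken from the response above.

```python
import json

# Trimmed copy of the response above (only two nodes kept for brevity).
sample = json.loads("""
{
  "_edges": [
    {
      "_count": 2,
      "_path": [{"@id": "http://schema.org/brother", "_name": "brother"}],
      "_source": "http://schema.org/Person",
      "_target": "http://schema.org/Person"
    }
  ],
  "_nodes": [
    {"@id": "http://schema.org/Person", "_count": 3, "_name": "Person"},
    {"@id": "https://bluebrain.github.io/nexus/vocabulary/View", "_count": 2, "_name": "View"}
  ]
}
""")


def summarize_edges(relationships):
    """Return one readable line per edge, using node names where known."""
    names = {node["@id"]: node["_name"] for node in relationships["_nodes"]}
    lines = []
    for edge in relationships["_edges"]:
        # An edge path may have several steps; join their short names.
        path = " / ".join(step["_name"] for step in edge["_path"])
        source = names.get(edge["_source"], edge["_source"])
        target = names.get(edge["_target"], edge["_target"])
        lines.append(f"{source} -[{path}]-> {target} (count: {edge['_count']})")
    return lines


for line in summarize_edges(sample):
    print(line)
```

Falling back to the full IRI when a node name is missing keeps the summary usable even if an edge points at a type that is not listed in `_nodes`.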
Fetch properties
Obtains all the @type properties and their counts.
The difference between properties and relationships is that properties are enclosed inside the same resource, while relationships are statements between different resources.
GET /v1/graph-analytics/{org_label}/{project_label}/properties/{type}
…where {type} is an IRI defining the @type for which we want to retrieve the properties.
Example
- Request
curl "http://localhost:8080/v1/graph-analytics/myorg/myproj/properties/schema:Person"
- Response
{
  "@context": "https://bluebrain.github.io/nexus/contexts/properties.json",
  "@id": "http://schema.org/Person",
  "_count": 3,
  "_name": "Person",
  "_properties": [
    {
      "@id": "http://schema.org/givenName",
      "_count": 3,
      "_name": "givenName"
    },
    {
      "@id": "http://schema.org/brother",
      "_count": 3,
      "_name": "brother"
    },
    {
      "@id": "http://schema.org/address",
      "_count": 3,
      "_name": "address",
      "_properties": [
        {
          "@id": "http://schema.org/zipcode",
          "_count": 3,
          "_name": "zipcode"
        },
        {
          "@id": "http://schema.org/street",
          "_count": 3,
          "_name": "street"
        }
      ]
    }
  ]
}
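Because `_properties` can nest (as `address` does above), a client may want a flat view of the result. The sketch below is one possible way to do that in Python. Both helpers, `properties_url` and `flatten_properties`, are hypothetical names introduced for illustration. Percent-encoding a full IRI before placing it in the `{type}` segment is an assumption about how a non-prefixed type would be passed; the example request above uses the compact form `schema:Person` instead.

```python
from urllib.parse import quote


def properties_url(base, org, project, type_iri):
    """Build the endpoint URL, percent-encoding the type so it fits in one path segment."""
    encoded = quote(type_iri, safe="")
    return f"{base}/v1/graph-analytics/{org}/{project}/properties/{encoded}"


def flatten_properties(node, prefix=()):
    """Yield (path, count) pairs for every entry nested under `_properties`."""
    for prop in node.get("_properties", []):
        path = prefix + (prop["_name"],)
        yield " / ".join(path), prop["_count"]
        # Recurse into embedded properties, keeping the parent names as prefix.
        yield from flatten_properties(prop, path)


# Same content as the response above, without the @context.
sample = {
    "@id": "http://schema.org/Person",
    "_count": 3,
    "_name": "Person",
    "_properties": [
        {"@id": "http://schema.org/givenName", "_count": 3, "_name": "givenName"},
        {"@id": "http://schema.org/brother", "_count": 3, "_name": "brother"},
        {
            "@id": "http://schema.org/address",
            "_count": 3,
            "_name": "address",
            "_properties": [
                {"@id": "http://schema.org/zipcode", "_count": 3, "_name": "zipcode"},
                {"@id": "http://schema.org/street", "_count": 3, "_name": "street"},
            ],
        },
    ],
}

print(properties_url("http://localhost:8080", "myorg", "myproj", "http://schema.org/Person"))
for path, count in flatten_properties(sample):
    print(f"{path}: {count}")
```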
Fetch progress
GET /v1/graph-analytics/{org_label}/{project_label}/progress
It returns:
- the dateTime of the latest consumed event (lastProcessedEventDateTime).
- the number of consumed events (eventsCount).
- the number of consumed resources (resourcesCount). A resource might be made of multiple events (create, update, deprecate), so this number will always be smaller than or equal to eventsCount.
Example
- Request
curl "http://localhost:8080/v1/graph-analytics/myorg/myproj/progress"
- Response
{
  "@context": "https://bluebrain.github.io/nexus/contexts/statistics.json",
  "delayInSeconds": 0,
  "discardedEvents": 0,
  "evaluatedEvents": 8,
  "failedEvents": 0,
  "lastEventDateTime": "2021-09-21T06:44:24.530Z",
  "lastProcessedEventDateTime": "2021-09-21T06:44:24.530Z",
  "processedEvents": 8,
  "remainingEvents": 0,
  "totalEvents": 8
}
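A client could use this response to decide whether the analytics are up to date before querying relationships or properties. The following Python sketch shows one way to do so. The helper names are hypothetical, and the interpretation of `remainingEvents`, `failedEvents`, `processedEvents` and `totalEvents` is inferred from the field names in the example response rather than from a documented contract.

```python
def is_caught_up(progress):
    """True when no events remain to be processed and none have failed."""
    return progress["remainingEvents"] == 0 and progress["failedEvents"] == 0


def completion_ratio(progress):
    """Fraction of events already processed; 1.0 for a project with no events."""
    total = progress["totalEvents"]
    return 1.0 if total == 0 else progress["processedEvents"] / total


# Same content as the response above, without the @context.
sample = {
    "delayInSeconds": 0,
    "discardedEvents": 0,
    "evaluatedEvents": 8,
    "failedEvents": 0,
    "lastEventDateTime": "2021-09-21T06:44:24.530Z",
    "lastProcessedEventDateTime": "2021-09-21T06:44:24.530Z",
    "processedEvents": 8,
    "remainingEvents": 0,
    "totalEvents": 8,
}

print(is_caught_up(sample), completion_ratio(sample))
```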
Internals
In order to implement the described endpoints we needed a way to transform our data so that it would answer the desired questions in a performant manner.
The proposed solution was to stream our data, transform it and push it to a dedicated ElasticSearch index (one index per project). Then at query time we can run term aggregations in order to get the desired counts.
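To illustrate what a term aggregation at query time could look like, the sketch below builds a request body in the Elasticsearch query DSL that counts documents per `@type`, alongside a pure-Python equivalent of the same counting. It is a simplified assumption, not the query the plugin actually sends: it presumes `@type` is indexed as a keyword field, and the function names and the `types` aggregation name are made up for this example.

```python
from collections import Counter


def type_counts_query(max_types=100):
    """Request body that counts documents per @type and returns no hits."""
    return {
        "size": 0,  # only the aggregation result is needed, not the documents
        "aggs": {
            "types": {"terms": {"field": "@type", "size": max_types}},
        },
    }


def type_counts_locally(documents):
    """Pure-Python equivalent of the aggregation, for illustration only."""
    return Counter(doc["@type"] for doc in documents)


docs = [
    {"@id": "http://example.com/Anna", "@type": "http://schema.org/Person"},
    {"@id": "http://example.com/Robert", "@type": "http://schema.org/Person"},
]

print(type_counts_query())
print(type_counts_locally(docs))
```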
Document structure
An example ElasticSearch document looks as follows:
{
  "@type": "http://schema.org/Person",
  "@id": "http://example.com/Anna",
  "properties": [
    {
      "path": "http://schema.org/address",
      "isInArray": false
    },
    {
      "dataType": "string",
      "path": "http://schema.org/address / http://schema.org/street",
      "isInArray": false
    },
    {
      "dataType": "numeric",
      "path": "http://schema.org/address / http://schema.org/zipcode",
      "isInArray": false
    },
    {
      "@id": "http://example.com/Robert",
      "path": "http://schema.org/brother",
      "isInArray": false
    },
    {
      "dataType": "string",
      "path": "http://schema.org/givenName",
      "isInArray": false
    },
    {
      "path": "http://schema.org/studies",
      "isInArray": true
    },
    {
      "dataType": "string",
      "path": "http://schema.org/studies / http://schema.org/name",
      "isInArray": true
    },
    {
      "path": "http://schema.org/studies",
      "isInArray": true
    },
    {
      "dataType": "string",
      "path": "http://schema.org/studies / http://schema.org/name",
      "isInArray": true
    }
  ],
  "relationshipCandidates": [
    {
      "found": true,
      "@id": "http://example.com/Robert",
      "path": "http://schema.org/brother",
      "isInArray": false
    }
  ],
  "relationships": [
    {
      "@id": "http://example.com/Robert",
      "@type": "http://schema.org/Person",
      "path": "http://schema.org/brother",
      "isInArray": false
    }
  ]
}
… where:
- properties - Json Object Array: A flat collection of fields present on a resource.
- relationshipCandidates - Json Object Array: A flat collection of fields present on a resource that could be potential candidates for relationships (they do have an @id).
- relationships - Json Object Array: A flat collection of @id(s) that have been found in other resources.
- path - String: The flat expanded path of a field present on a resource. A path of an embedded field will be encoded as follows: parent / child.
- isInArray - Boolean: Flag to inform whether the current path (or its parent) is part of an array.
- dataType - String: The type of the value present in the current path. Possible values are: string, numeric and boolean.
- found - Boolean: Flag to inform whether a path inside relationshipCandidates is now promoted to relationships.
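The field definitions above can be sketched as a small transformation. The Python snippet below is an illustrative approximation and not the plugin's actual implementation. It flattens a resource into `properties` entries using the `parent / child` path encoding, then derives `relationshipCandidates` and `relationships` from a lookup of known resource types. The function names, the `known_types` lookup and the literal values in `anna` (street, zipcode, study names) are assumptions made for the example.

```python
def data_type(value):
    """Map a JSON literal to one of: string, numeric, boolean."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    return "string"


def flatten(resource, parent="", in_array=False):
    """Return flat `properties` entries for a resource keyed by expanded IRIs."""
    entries = []
    for key, value in resource.items():
        if key.startswith("@"):  # skip @id, @type and other keywords
            continue
        path = f"{parent} / {key}" if parent else key
        is_array = in_array or isinstance(value, list)
        for item in value if isinstance(value, list) else [value]:
            entry = {"path": path, "isInArray": is_array}
            if isinstance(item, dict):
                if "@id" in item:
                    entry["@id"] = item["@id"]
                entries.append(entry)
                entries.extend(flatten(item, path, is_array))
            else:
                entry["dataType"] = data_type(item)
                entries.append(entry)
    return entries


def link(entries, known_types):
    """Split entries carrying an @id into candidates and resolved relationships."""
    candidates, relationships = [], []
    for entry in entries:
        if "@id" not in entry:
            continue
        target_type = known_types.get(entry["@id"])
        candidates.append({**entry, "found": target_type is not None})
        if target_type is not None:
            relationships.append({**entry, "@type": target_type})
    return candidates, relationships


anna = {
    "@id": "http://example.com/Anna",
    "@type": "http://schema.org/Person",
    "http://schema.org/address": {
        "http://schema.org/street": "Main Street",  # invented value
        "http://schema.org/zipcode": 8001,  # invented value
    },
    "http://schema.org/brother": {"@id": "http://example.com/Robert"},
    "http://schema.org/givenName": "Anna",
    "http://schema.org/studies": [
        {"http://schema.org/name": "Mathematics"},  # invented value
        {"http://schema.org/name": "Physics"},  # invented value
    ],
}

properties = flatten(anna)
candidates, relationships = link(
    properties, {"http://example.com/Robert": "http://schema.org/Person"}
)
print(len(properties), candidates, relationships)
```

Each element of the `studies` array produces its own pair of entries, which is why the example document lists the `studies` paths twice.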