Graph analytics
Graph analytics is a feature introduced by the graph-analytics plugin and rooted in the /v1/graph-analytics/{org_label}/{project_label} collection.
It runs for each project, parsing and breaking down non-deprecated resources to analyse their structure. For each of these resources, it extracts the following information:
- Its properties: their path and the type of the associated value
- Its relationships, that is to say the other resources in the same project it points to.
When reading graph analytics, the caller must have resources/read permissions on the current path of the project or the ancestor paths.
Please visit the Authentication & authorization section to learn more about it.
Graph analytics indexing routines are also subject to passivation.
Please visit the Passivation section to learn more about it.
The described endpoints are experimental and the structure of their responses might change in the future.
Fetch relationships
Obtains all the @type relationships and their counts - defined by nodes and edges - existing on a project.
The edges are the properties linking different nodes, and the nodes are the resources containing a certain @type.
GET /v1/graph-analytics/{org_label}/{project_label}/relationships
Example
- Request
curl "http://localhost:8080/v1/graph-analytics/myorg/myproj/relationships"
- Response
{
  "@context": "https://bluebrain.github.io/nexus/contexts/relationships.json",
  "_edges": [
    {
      "_count": 2,
      "_path": [
        { "@id": "http://schema.org/brother", "_name": "brother" }
      ],
      "_source": "http://schema.org/Person",
      "_target": "http://schema.org/Person"
    }
  ],
  "_nodes": [
    { "@id": "http://schema.org/Person", "_count": 3, "_name": "Person" },
    { "@id": "https://bluebrain.github.io/nexus/vocabulary/View", "_count": 2, "_name": "View" },
    { "@id": "https://bluebrain.github.io/nexus/vocabulary/DiskStorage", "_count": 1, "_name": "DiskStorage" },
    { "@id": "https://bluebrain.github.io/nexus/vocabulary/ElasticSearchView", "_count": 1, "_name": "ElasticSearchView" },
    { "@id": "https://bluebrain.github.io/nexus/vocabulary/InProject", "_count": 1, "_name": "InProject" },
    { "@id": "https://bluebrain.github.io/nexus/vocabulary/Project", "_count": 1, "_name": "Project" },
    { "@id": "https://bluebrain.github.io/nexus/vocabulary/Resolver", "_count": 1, "_name": "Resolver" },
    { "@id": "https://bluebrain.github.io/nexus/vocabulary/SparqlView", "_count": 1, "_name": "SparqlView" },
    { "@id": "https://bluebrain.github.io/nexus/vocabulary/Storage", "_count": 1, "_name": "Storage" }
  ]
}
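To make the node and edge structure more concrete, the following Python sketch summarises a relationships response. It assumes the payload has the shape of the example response; the function name `summarise_relationships` and the ` / ` separator used to display multi-step paths are illustrative choices, not part of Nexus.

```python
import json


def summarise_relationships(payload):
    """Return (node_counts, edges) extracted from a relationships response."""
    # Map each node @id (a @type IRI) to the number of resources of that type.
    node_counts = {node["@id"]: node["_count"] for node in payload.get("_nodes", [])}
    edges = []
    for edge in payload.get("_edges", []):
        # An edge path is a list of properties; join their names for display.
        path = " / ".join(step["_name"] for step in edge["_path"])
        edges.append((edge["_source"], path, edge["_target"], edge["_count"]))
    return node_counts, edges


# Trimmed copy of the example response.
response = json.loads("""
{
  "_edges": [
    {
      "_count": 2,
      "_path": [{"@id": "http://schema.org/brother", "_name": "brother"}],
      "_source": "http://schema.org/Person",
      "_target": "http://schema.org/Person"
    }
  ],
  "_nodes": [
    {"@id": "http://schema.org/Person", "_count": 3, "_name": "Person"},
    {"@id": "https://bluebrain.github.io/nexus/vocabulary/View", "_count": 2, "_name": "View"}
  ]
}
""")

node_counts, edges = summarise_relationships(response)
for source, path, target, count in edges:
    print(f"{source} --{path}--> {target} ({count})")
```

Here the single edge reads as: two `brother` links go from resources of type `Person` to resources of type `Person`.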
Fetch properties
Obtains all the @type properties and their counts.
The difference between properties and relationships is that properties are enclosed inside the same resource, while relationships are statements between different resources.
GET /v1/graph-analytics/{org_label}/{project_label}/properties/{type}
…where {type} is an IRI defining for which @type we want to retrieve the properties.
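Since {type} is an IRI placed inside a URL path, a full IRI generally has to be percent-encoded so that its `:` and `/` characters are not read as path separators. The sketch below builds such a URL with the Python standard library. The helper name `properties_url` is hypothetical, and whether a given deployment also accepts a compact form such as `schema:Person` is assumed to depend on the project configuration.

```python
from urllib.parse import quote


def properties_url(base, org, project, type_iri):
    """Build the properties endpoint URL for a given @type IRI."""
    # safe="" also encodes "/" and ":" so the IRI stays a single path segment.
    encoded_type = quote(type_iri, safe="")
    return f"{base}/v1/graph-analytics/{org}/{project}/properties/{encoded_type}"


url = properties_url("http://localhost:8080", "myorg", "myproj", "http://schema.org/Person")
print(url)
```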
Example
- Request
curl "http://localhost:8080/v1/graph-analytics/myorg/myproj/properties/schema:Person"
- Response
{
  "@context": "https://bluebrain.github.io/nexus/contexts/properties.json",
  "@id": "http://schema.org/Person",
  "_count": 3,
  "_name": "Person",
  "_properties": [
    { "@id": "http://schema.org/givenName", "_count": 3, "_name": "givenName" },
    { "@id": "http://schema.org/brother", "_count": 3, "_name": "brother" },
    {
      "@id": "http://schema.org/address",
      "_count": 3,
      "_name": "address",
      "_properties": [
        { "@id": "http://schema.org/zipcode", "_count": 3, "_name": "zipcode" },
        { "@id": "http://schema.org/street", "_count": 3, "_name": "street" }
      ]
    }
  ]
}
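Because `_properties` can be nested, for example `address` containing `street` and `zipcode`, a client may want to flatten the response into one row per property. The following sketch does this with a small recursive generator. It assumes the response shape of the example; the function name `flatten_properties` is hypothetical.

```python
def flatten_properties(node, parent_path=()):
    """Yield (path, count) for every property, depth first."""
    for prop in node.get("_properties", []):
        path = parent_path + (prop["_name"],)
        yield " / ".join(path), prop["_count"]
        # Recurse into embedded properties such as address -> street.
        yield from flatten_properties(prop, path)


# Copy of the example response, without the @context.
response = {
    "@id": "http://schema.org/Person",
    "_count": 3,
    "_name": "Person",
    "_properties": [
        {"@id": "http://schema.org/givenName", "_count": 3, "_name": "givenName"},
        {"@id": "http://schema.org/brother", "_count": 3, "_name": "brother"},
        {
            "@id": "http://schema.org/address",
            "_count": 3,
            "_name": "address",
            "_properties": [
                {"@id": "http://schema.org/zipcode", "_count": 3, "_name": "zipcode"},
                {"@id": "http://schema.org/street", "_count": 3, "_name": "street"},
            ],
        },
    ],
}

flat = list(flatten_properties(response))
for path, count in flat:
    print(f"{path}: {count}")
```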
Fetch progress
GET /v1/graph-analytics/{org_label}/{project_label}/progress
It returns:
- the dateTime of the latest consumed event (lastProcessedEventDateTime).
- the number of consumed events (eventsCount).
- the number of consumed resources (resourcesCount). A resource might be made of multiple events (create, update, deprecate), so this number will always be smaller than or equal to eventsCount.
Example
- Request
curl "http://localhost:8080/v1/graph-analytics/myorg/myproj/progress"
- Response
{
  "@context": "https://bluebrain.github.io/nexus/contexts/statistics.json",
  "delayInSeconds": 0,
  "discardedEvents": 0,
  "evaluatedEvents": 8,
  "failedEvents": 0,
  "lastEventDateTime": "2021-09-21T06:44:24.530Z",
  "lastProcessedEventDateTime": "2021-09-21T06:44:24.530Z",
  "processedEvents": 8,
  "remainingEvents": 0,
  "totalEvents": 8
}
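A client can use this response to decide whether the graph analytics index has caught up with the project. The sketch below is a minimal illustration that relies on the field names of the example response (`processedEvents`, `totalEvents`, `remainingEvents`). The helper name `indexing_progress` is hypothetical, and treating an empty project as fully indexed is an assumption.

```python
def indexing_progress(stats):
    """Return the fraction of events already processed, between 0.0 and 1.0."""
    total = stats.get("totalEvents", 0)
    if total == 0:
        # Assumption: with no events at all there is nothing left to index.
        return 1.0
    return stats.get("processedEvents", 0) / total


# Values copied from the example response.
stats = {
    "delayInSeconds": 0,
    "discardedEvents": 0,
    "evaluatedEvents": 8,
    "failedEvents": 0,
    "processedEvents": 8,
    "remainingEvents": 0,
    "totalEvents": 8,
}

progress = indexing_progress(stats)
is_caught_up = stats["remainingEvents"] == 0
print(f"progress={progress:.0%} caught_up={is_caught_up}")
```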
Search
POST /v1/graph-analytics/{org_label}/{project_label}/_search
{...}
Searches the documents in a given project’s Graph Analytics view.
The supported payload is defined in the ElasticSearch documentation.
Example
- Request
curl -XPOST \
  -H "Content-Type: application/json" \
  "http://localhost:8080/v1/graph-analytics/myorg/myproj/_search" -d \
  '{ "query": { "term": { "@id": "http://example.com/person" } } }'
- Response
{
  "hits": {
    "hits": [
      {
        "_id": "http://example.com/person",
        "_index": "delta_ga_myorg_myproj",
        "_score": 1.3121864,
        "_source": {
          "@id": "http://example.com/person",
          "@type": "http://schema.org/Person",
          "properties": [
            {
              "@id": "http://example.com/epfl",
              "dataType": "object",
              "isInArray": false,
              "path": "http://schema.org/worksFor"
            }
          ],
          "references": [
            {
              "@id": "http://example.com/epfl",
              "@type": [ "http://schema.org/EducationalOrganization" ],
              "dataType": "object",
              "found": true,
              "isInArray": false,
              "path": "http://schema.org/worksFor"
            }
          ],
          "relationships": [
            {
              "@id": "http://example.com/epfl",
              "@type": [ "http://schema.org/EducationalOrganization" ],
              "dataType": "object",
              "isInArray": false,
              "path": "http://schema.org/worksFor"
            }
          ],
          "remoteContexts": [
            {
              "@type": "ProjectRemoteContextRef",
              "iri": "https://bbp.epfl.ch/contexts/person",
              "resource": {
                "id": "https://bbp.epfl.ch/contexts/person",
                "project": "myorg/myproj",
                "rev": 1
              }
            }
          ],
          "_createdAt": "2023-08-08T15:49:14.081Z",
          "_createdBy": { "@type": "User", "realm": "internal", "subject": "delta" },
          "_deprecated": false,
          "_project": "myorg/myproj",
          "_rev": 1,
          "_updatedAt": "2023-08-08T15:49:14.081Z",
          "_updatedBy": { "@type": "User", "realm": "internal", "subject": "delta" }
        }
      }
    ],
    "max_score": 1.3121864,
    "total": { "relation": "eq", "value": 1 }
  },
  "timed_out": false,
  "took": 0,
  "_shards": { "failed": 0, "skipped": 0, "successful": 1, "total": 1 }
}
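Search payloads can also be assembled programmatically. The sketch below builds, without sending it, a POST request whose body combines two `term` filters in a `bool` query, using standard ElasticSearch query DSL. It assumes that `@type` and `_deprecated` are indexed in a way that supports exact `term` matching, which the source does not state. The helper name `build_search_request` is hypothetical.

```python
import json
from urllib.request import Request


def build_search_request(base, org, project, type_iri, size=10):
    """Build a POST request that searches non-deprecated documents of one @type."""
    body = {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    # Assumption: both fields support exact term matching.
                    {"term": {"@type": type_iri}},
                    {"term": {"_deprecated": False}},
                ]
            }
        },
    }
    return Request(
        f"{base}/v1/graph-analytics/{org}/{project}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "http://localhost:8080", "myorg", "myproj", "http://schema.org/Person"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a server. An authenticated deployment would also need an `Authorization` header.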
Internals
In order to implement the described endpoints we needed a way to transform our data so that it would answer the desired questions in a performant manner.
The proposed solution was to stream our data, transform it and push it to a dedicated ElasticSearch index (one index per project). Then at query time we can run term aggregations in order to get the desired counts.
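As an in-memory analogy of what a term aggregation computes, the sketch below counts documents per `@type` and relationships per (source type, path, target type) with `collections.Counter`. It illustrates the idea only; the actual aggregation queries issued by the plugin are not shown here, and the three sample documents are made up to mirror the relationships example (three `Person` resources, two `brother` links).

```python
from collections import Counter

PERSON = "http://schema.org/Person"
BROTHER = "http://schema.org/brother"

# Made-up, heavily trimmed documents: only the fields needed for counting.
documents = [
    {"@type": PERSON, "relationships": [{"@type": PERSON, "path": BROTHER}]},
    {"@type": PERSON, "relationships": [{"@type": PERSON, "path": BROTHER}]},
    {"@type": PERSON, "relationships": []},
]

# Nodes: how many documents carry each @type.
node_counts = Counter(doc["@type"] for doc in documents)

# Edges: how many relationships link a source type to a target type via a path.
edge_counts = Counter(
    (doc["@type"], rel["path"], rel["@type"])
    for doc in documents
    for rel in doc["relationships"]
)

print(node_counts)
print(edge_counts)
```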
Document structure
An example ElasticSearch document looks as follows:
{
"@id": "http://example.com/Anna",
"@type": "http://schema.org/Person",
"_project": "myorg/myproject",
"_rev": 4,
"_deprecated": false,
"_createdAt": "2023-06-01T00:00:00Z",
"_createdBy": { "@type": "User", "realm": "bbp", "subject": "Bob" },
"_updatedAt": "2023-06-12T00:00:00Z",
"_updatedBy": { "@type": "User", "realm": "bbp", "subject": "Alice" },
"properties": [
{
"dataType": "object",
"path": "http://schema.org/address",
"isInArray": false
},
{
"dataType": "string",
"path": "http://schema.org/address / http://schema.org/street",
"isInArray": false
},
{
"dataType": "number",
"path": "http://schema.org/address / http://schema.org/zipcode",
"isInArray": false
},
{
"dataType": "object",
"@id": "http://example.com/Robert",
"path": "http://schema.org/brother",
"isInArray": false
},
{
"dataType": "string",
"path": "http://schema.org/givenName",
"isInArray": false
},
{
"dataType": "object",
"path": "http://schema.org/studies",
"isInArray": true
},
{
"dataType": "string",
"path": "http://schema.org/studies / http://schema.org/name",
"isInArray": true
}
],
"references": [
{
"found": true,
"@id": "http://example.com/Robert"
}
],
"relationships": [
{
"dataType": "object",
"@id": "http://example.com/Robert",
"@type": "http://schema.org/Person",
"path": "http://schema.org/brother",
"isInArray": false
}
],
"remoteContexts": [
{
"@type": "ProjectRemoteContextRef",
"iri": "https://bbp.epfl.ch/contexts/person",
"resource": {
"id": "https://bbp.epfl.ch/contexts/person",
"project": "myorg/myproject",
"rev": 1
}
}
]
}
… where:
- properties - Json Object Array: A flat collection of fields present on a resource.
- references - Json Object Array: A flat collection of fields present on a resource that could be potential candidates for relationships (they do have an @id).
- relationships - Json Object Array: A flat collection of @id(s) that have been found in other resources in the same project.
- path - String: The flat expanded path of a field present on a resource. A path of an embedded field will be encoded as follows: parent / child.
- isInArray - Boolean: Flag to inform whether the current path (or its parent) is part of an array.
- dataType - String: The type of the value present in the current path. Possible values are: string, numeric and boolean.
- found - Boolean: Flag to inform whether an @id inside references has been resolved as a relationship.
- remoteContexts - Json Object Array: A collection of remote contexts detected during the JSON-LD resolution for this resource. See the Resources - Fetch remote contexts operation to learn about the remote context types.
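The `parent / child` path encoding and the `isInArray` flag can be illustrated with a small recursive flattening routine. This is a sketch under stated assumptions, not the plugin's implementation: the functions `data_type` and `flatten` are hypothetical, the input is assumed to be an already expanded JSON-LD resource, and the `dataType` labels follow the values seen in the example document (`string`, `number`, `object`).

```python
def data_type(value):
    """Label a JSON value using the names seen in the example document."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "object"


def flatten(resource, parent=None, in_array=False, entries=None):
    """Collect one entry per distinct (path, dataType, isInArray) combination."""
    entries = [] if entries is None else entries
    for key, value in resource.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @id and @type
        path = key if parent is None else f"{parent} / {key}"
        # A field is flagged when it, or one of its parents, is an array.
        is_array = in_array or isinstance(value, list)
        for item in value if isinstance(value, list) else [value]:
            entry = {"dataType": data_type(item), "path": path, "isInArray": is_array}
            if isinstance(item, dict) and "@id" in item:
                entry["@id"] = item["@id"]
            if entry not in entries:
                entries.append(entry)
            if isinstance(item, dict):
                flatten(item, path, is_array, entries)
    return entries


anna = {
    "@id": "http://example.com/Anna",
    "http://schema.org/givenName": "Anna",
    "http://schema.org/address": {
        "http://schema.org/street": "Main Street",
        "http://schema.org/zipcode": 1015,
    },
    "http://schema.org/brother": {"@id": "http://example.com/Robert"},
    "http://schema.org/studies": [
        {"http://schema.org/name": "Physics"},
        {"http://schema.org/name": "Biology"},
    ],
}

properties = flatten(anna)
for entry in properties:
    print(entry)
```

Entries that carry an `@id`, such as `brother`, are the ones that would become candidates for `references`.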