Graph analytics

Graph analytics is functionality provided by the graph-analytics plugin and rooted in the /v1/graph-analytics/{org_label}/{project_label} collection. It gives insights into the data and its relationships in terms of types (the @type field).

Authorization notes

When reading graph analytics, the caller must have the resources/read permission on the path of the project or on one of its ancestor paths.

Please visit the Authentication & authorization section to learn more about it.

Note

The endpoints described here are experimental and the structure of their responses might change in the future.

Fetch relationships

Obtains all the @type relationships existing in a project, expressed as nodes and edges, together with their counts.

The edges are the properties linking different nodes, and the nodes are the resources containing a certain @type.

GET /v1/graph-analytics/{org_label}/{project_label}/relationships

Example

Request
curl "http://localhost:8080/v1/graph-analytics/myorg/myproj/relationships"
Response
{
  "@context": "https://bluebrain.github.io/nexus/contexts/relationships.json",
  "_edges": [
    {
      "_count": 2,
      "_path": [
        {
          "@id": "http://schema.org/brother",
          "_name": "brother"
        }
      ],
      "_source": "http://schema.org/Person",
      "_target": "http://schema.org/Person"
    }
  ],
  "_nodes": [
    {
      "@id": "http://schema.org/Person",
      "_count": 3,
      "_name": "Person"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/View",
      "_count": 2,
      "_name": "View"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/DiskStorage",
      "_count": 1,
      "_name": "DiskStorage"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/ElasticSearchView",
      "_count": 1,
      "_name": "ElasticSearchView"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/InProject",
      "_count": 1,
      "_name": "InProject"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/Project",
      "_count": 1,
      "_name": "Project"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/Resolver",
      "_count": 1,
      "_name": "Resolver"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/SparqlView",
      "_count": 1,
      "_name": "SparqlView"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/Storage",
      "_count": 1,
      "_name": "Storage"
    }
  ]
}
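To make the shape of this response easier to read, the following is a minimal Python sketch that turns each edge into a one-line summary. The function name describe_edges and the " / " separator used for multi-step paths are illustrative choices, not part of the API; the sample dictionary is a trimmed copy of the response above.

```python
def describe_edges(response):
    """Return one 'Source -[path]-> Target (count)' line per edge."""
    # Map node IRIs to their short names so edges can be printed compactly.
    names = {node["@id"]: node["_name"] for node in response.get("_nodes", [])}
    lines = []
    for edge in response.get("_edges", []):
        # _path is an array; joining with " / " is a display choice made here.
        path = " / ".join(step["_name"] for step in edge["_path"])
        source = names.get(edge["_source"], edge["_source"])
        target = names.get(edge["_target"], edge["_target"])
        lines.append(f"{source} -[{path}]-> {target} ({edge['_count']})")
    return lines


# Trimmed copy of the example response shown above.
sample = {
    "_edges": [
        {
            "_count": 2,
            "_path": [{"@id": "http://schema.org/brother", "_name": "brother"}],
            "_source": "http://schema.org/Person",
            "_target": "http://schema.org/Person",
        }
    ],
    "_nodes": [
        {"@id": "http://schema.org/Person", "_count": 3, "_name": "Person"},
    ],
}

for line in describe_edges(sample):
    print(line)
```

Nodes that appear in an edge but not in _nodes fall back to their full IRI, so the summary never fails on a partial response.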

Fetch properties

Obtains all the properties of a given @type and their counts.

The difference between properties and relationships is that properties are enclosed inside the same resource, while relationships are statements between different resources.

GET /v1/graph-analytics/{org_label}/{project_label}/properties/{type}

…where {type} is the IRI of the @type whose properties we want to retrieve.
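Because {type} occupies a single path segment, a full IRI such as http://schema.org/Person would normally have to be percent-encoded before being placed in the URL. The sketch below shows one way to build the request URL in Python; it assumes the service accepts percent-encoded IRIs in this position, which the text above does not state explicitly. The helper name properties_url is hypothetical.

```python
from urllib.parse import quote


def properties_url(base, org, project, type_iri):
    """Build the 'fetch properties' URL for a given @type IRI."""
    # safe="" also encodes "/" and ":", so the IRI stays inside one path segment.
    # Assumption: the service decodes the segment back into the IRI.
    encoded = quote(type_iri, safe="")
    return f"{base}/v1/graph-analytics/{org}/{project}/properties/{encoded}"


print(properties_url("http://localhost:8080", "myorg", "myproj",
                     "http://schema.org/Person"))
```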

Example

Request
curl "http://localhost:8080/v1/graph-analytics/myorg/myproj/properties/schema:Person"
Response
{
  "@context": "https://bluebrain.github.io/nexus/contexts/properties.json",
  "@id": "http://schema.org/Person",
  "_count": 3,
  "_name": "Person",
  "_properties": [
    {
      "@id": "http://schema.org/givenName",
      "_count": 3,
      "_name": "givenName"
    },
    {
      "@id": "http://schema.org/brother",
      "_count": 3,
      "_name": "brother"
    },
    {
      "@id": "http://schema.org/address",
      "_count": 3,
      "_name": "address",
      "_properties": [
        {
          "@id": "http://schema.org/zipcode",
          "_count": 3,
          "_name": "zipcode"
        },
        {
          "@id": "http://schema.org/street",
          "_count": 3,
          "_name": "street"
        }
      ]
    }
  ]
}
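Since _properties can be nested (address contains zipcode and street), a client usually needs to walk the tree. The following Python sketch flattens the response into (path, count) pairs; the function name flatten_properties and the "parent / child" display format are choices made for this example, and the sample is a trimmed copy of the response above.

```python
def flatten_properties(node, prefix=""):
    """Return (path, count) pairs for every property, depth first."""
    rows = []
    for prop in node.get("_properties", []):
        name = f"{prefix} / {prop['_name']}" if prefix else prop["_name"]
        rows.append((name, prop["_count"]))
        # Recurse into embedded properties, if any.
        rows.extend(flatten_properties(prop, name))
    return rows


# Trimmed copy of the example response shown above.
sample = {
    "@id": "http://schema.org/Person",
    "_count": 3,
    "_name": "Person",
    "_properties": [
        {"@id": "http://schema.org/givenName", "_count": 3, "_name": "givenName"},
        {
            "@id": "http://schema.org/address",
            "_count": 3,
            "_name": "address",
            "_properties": [
                {"@id": "http://schema.org/zipcode", "_count": 3, "_name": "zipcode"},
                {"@id": "http://schema.org/street", "_count": 3, "_name": "street"},
            ],
        },
    ],
}

for path, count in flatten_properties(sample):
    print(f"{path}: {count}")
```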

Fetch progress

GET /v1/graph-analytics/{org_label}/{project_label}/progress

It returns:

  • the dateTime of the latest consumed event (lastProcessedEventDateTime).
  • the number of consumed events (eventsCount).
  • the number of consumed resources (resourcesCount). A resource might be made of multiple events (create, update, deprecate), so this number will always be less than or equal to eventsCount.

Example

Request
curl "http://localhost:8080/v1/graph-analytics/myorg/myproj/progress"
Response
{
  "@context": "https://bluebrain.github.io/nexus/contexts/statistics.json",
  "delayInSeconds": 0,
  "discardedEvents": 0,
  "evaluatedEvents": 8,
  "failedEvents": 0,
  "lastEventDateTime": "2021-09-21T06:44:24.530Z",
  "lastProcessedEventDateTime": "2021-09-21T06:44:24.530Z",
  "processedEvents": 8,
  "remainingEvents": 0,
  "totalEvents": 8
}
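A client may want to know whether the analytics index has caught up with the project. The Python sketch below derives that from the field names in the sample response above; reading processedEvents / totalEvents as a completion ratio and remainingEvents == 0 as "caught up" is an interpretation of those names, not documented behaviour.

```python
def progress_ratio(stats):
    """Fraction of events already processed, in the range 0.0 to 1.0."""
    total = stats["totalEvents"]
    if total == 0:
        # Nothing to process: treat an empty project as fully indexed.
        return 1.0
    return stats["processedEvents"] / total


def is_caught_up(stats):
    """True when no events are waiting to be processed (assumed meaning)."""
    return stats["remainingEvents"] == 0


# Counters copied from the example response shown above.
sample = {
    "processedEvents": 8,
    "remainingEvents": 0,
    "totalEvents": 8,
}

print(progress_ratio(sample), is_caught_up(sample))
```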

Internals

In order to implement the described endpoints we needed a way to transform our data so that it would answer the desired questions in a performant manner.

The proposed solution was to stream our data, transform it and push it to a dedicated ElasticSearch index (one index per project). At query time, we run terms aggregations on that index to get the desired counts.
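To illustrate what such an aggregation computes, here is an in-memory Python stand-in that counts documents per @type (nodes) and per (source type, path, target type) triple (edges). It is not the query the plugin actually sends to ElasticSearch; it only mimics the grouping and counting, and it assumes documents shaped like the one described under Document structure, with a single string @type. The third document (Carl) is made up for the example.

```python
from collections import Counter

PERSON = "http://schema.org/Person"
BROTHER = "http://schema.org/brother"


def count_nodes(documents):
    """Number of documents per @type (what a terms aggregation on @type yields)."""
    return Counter(doc["@type"] for doc in documents)


def count_edges(documents):
    """Number of relationships per (source type, path, target type)."""
    counts = Counter()
    for doc in documents:
        for rel in doc.get("relationships", []):
            counts[(doc["@type"], rel["path"], rel["@type"])] += 1
    return counts


def person(name, brother=None):
    """Build a minimal, hypothetical indexed document for the example."""
    rels = []
    if brother is not None:
        rels.append({"@id": f"http://example.com/{brother}", "@type": PERSON,
                     "path": BROTHER, "isInArray": False})
    return {"@id": f"http://example.com/{name}", "@type": PERSON,
            "relationships": rels}


documents = [person("Anna", "Robert"), person("Robert", "Anna"), person("Carl")]

print(count_nodes(documents))
print(count_edges(documents))
```

With these three documents the counts line up with the relationships example earlier on this page: three Person nodes and two Person-brother-Person edges.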

Document structure

An example ElasticSearch document looks as follows:

{
  "@type": "http://schema.org/Person",
  "@id": "http://example.com/Anna",
  "properties": [
    {
      "path": "http://schema.org/address",
      "isInArray": false
    },
    {
      "dataType": "string",
      "path": "http://schema.org/address / http://schema.org/street",
      "isInArray": false
    },
    {
      "dataType": "numeric",
      "path": "http://schema.org/address / http://schema.org/zipcode",
      "isInArray": false
    },
    {
      "@id": "http://example.com/Robert",
      "path": "http://schema.org/brother",
      "isInArray": false
    },
    {
      "dataType": "string",
      "path": "http://schema.org/givenName",
      "isInArray": false
    },
    {
      "path": "http://schema.org/studies",
      "isInArray": true
    },
    {
      "dataType": "string",
      "path": "http://schema.org/studies / http://schema.org/name",
      "isInArray": true
    },
    {
      "path": "http://schema.org/studies",
      "isInArray": true
    },
    {
      "dataType": "string",
      "path": "http://schema.org/studies / http://schema.org/name",
      "isInArray": true
    }
  ],
  "relationshipCandidates": [
    {
      "found": true,
      "@id": "http://example.com/Robert",
      "path": "http://schema.org/brother",
      "isInArray": false
    }
  ],
  "relationships": [
    {
      "@id": "http://example.com/Robert",
      "@type": "http://schema.org/Person",
      "path": "http://schema.org/brother",
      "isInArray": false
    }
  ]
}

… where:

  • properties - Json Object Array: A flat collection of fields present on a resource.
  • relationshipCandidates - Json Object Array: A flat collection of fields present on a resource that could be potential candidates for relationships (they do have an @id).
  • relationships - Json Object Array: A flat collection of @id(s) that have been found in other resources.
  • path - String: The flat expanded path of a field present on a resource. A path of an embedded field will be encoded as follows: parent / child.
  • isInArray - Boolean: Flag to inform whether the current path (or its parent) is part of an array.
  • dataType - String: The type of the value present in the current path. Possible values are: string, numeric and boolean.
  • found - Boolean: Flag to inform whether a path inside relationshipCandidates is now promoted to relationships.
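The field definitions above can be sketched as a small Python transformation from a resource to the properties, relationshipCandidates and relationships arrays. This is a simplified illustration under stated assumptions, not the plugin's implementation: it expects a resource whose keys are already expanded IRIs, it skips JSON-LD keywords (keys starting with @), and known_types is a hypothetical lookup from @id to @type standing in for "resources found in the project".

```python
def data_type(value):
    """Map a JSON scalar to one of: string, numeric, boolean."""
    # bool is tested first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    return "string"


def flatten(resource, parent=None, in_array=False):
    """Produce the flat 'properties' entries for a resource."""
    entries = []
    for key, value in resource.items():
        if key.startswith("@"):
            continue  # @id, @type, ... are not properties
        path = key if parent is None else f"{parent} / {key}"
        is_list = isinstance(value, list)
        flag = in_array or is_list  # inherited from the parent path
        for item in (value if is_list else [value]):
            if isinstance(item, dict):
                entry = {"path": path, "isInArray": flag}
                if "@id" in item:
                    entry["@id"] = item["@id"]
                entries.append(entry)
                entries.extend(flatten(item, path, flag))
            else:
                entries.append({"dataType": data_type(item), "path": path,
                                "isInArray": flag})
    return entries


def promote(entries, known_types):
    """Split entries carrying an @id into candidates and found relationships."""
    candidates, relationships = [], []
    for entry in entries:
        if "@id" not in entry:
            continue
        target_type = known_types.get(entry["@id"])
        candidates.append({**entry, "found": target_type is not None})
        if target_type is not None:
            relationships.append({**entry, "@type": target_type})
    return candidates, relationships


anna = {
    "@id": "http://example.com/Anna",
    "@type": "http://schema.org/Person",
    "http://schema.org/givenName": "Anna",
    "http://schema.org/brother": {"@id": "http://example.com/Robert"},
    "http://schema.org/studies": [{"http://schema.org/name": "Maths"}],
}
properties = flatten(anna)
candidates, relationships = promote(
    properties, {"http://example.com/Robert": "http://schema.org/Person"})
print(properties)
print(candidates)
print(relationships)
```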