Graph analytics

Graph analytics is a feature introduced by the graph-analytics plugin and rooted in the /v1/graph-analytics/{org_label}/{project_label} collection.

It runs on each project, parsing the non-deprecated resources and breaking them down to analyse their structure. For each of these resources, it extracts the following information:

  • Its properties: their path and the type of the associated value.
  • Its relationships, that is, the other resources of the same project that it points to.
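
The property extraction can be pictured with the following Python sketch. It is a simplification under stated assumptions, not the plugin's implementation: it walks a plain nested dict standing in for an expanded JSON-LD resource, and emits one entry per field with its path, the type of its value and whether it sits in an array, joining nested paths with " / " as the index documents described under Internals do. The names `flatten_properties` and `_data_type`, and the type labels, are hypothetical.

```python
from __future__ import annotations

from typing import Any


def _data_type(value: Any) -> str:
    # Coarse JSON type names; the labels are assumptions for this sketch.
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, dict):
        return "object"
    return "string"  # everything else is treated as a string here


def flatten_properties(resource: dict, parent: str = "", in_array: bool = False) -> list[dict]:
    """Hypothetical sketch: flatten a nested resource into property entries."""
    entries: list[dict] = []

    def add(entry: dict) -> None:
        # Array items sharing a path would produce duplicates, so keep one copy.
        if entry not in entries:
            entries.append(entry)

    for key, value in resource.items():
        if key.startswith("@"):
            continue  # JSON-LD keywords such as @id and @type are not properties
        path = f"{parent} / {key}" if parent else key
        is_array = in_array or isinstance(value, list)
        items = value if isinstance(value, list) else [value]
        for item in items:
            entry = {"dataType": _data_type(item), "path": path, "isInArray": is_array}
            if isinstance(item, dict) and "@id" in item:
                entry["@id"] = item["@id"]  # a candidate reference to another resource
            add(entry)
            if isinstance(item, dict):
                for child in flatten_properties(item, path, is_array):
                    add(child)
    return entries


anna = {
    "@id": "http://example.com/Anna",
    "@type": "http://schema.org/Person",
    "http://schema.org/givenName": "Anna",
    "http://schema.org/address": {
        "http://schema.org/street": "Main Street",
        "http://schema.org/zipcode": 1015,
    },
    "http://schema.org/brother": {"@id": "http://example.com/Robert"},
    "http://schema.org/studies": [{"http://schema.org/name": "Biology"}],
}
entries = flatten_properties(anna)
```

For this input the sketch yields seven entries, one per path, with the brother entry carrying Robert's @id, which makes it a candidate relationship.
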

Authorization notes

When reading graph analytics, the caller must have resources/read permissions on the project's path or on any of its ancestor paths.

Please visit the Authentication & authorization section to learn more about it.

Note

The described endpoints are experimental and the structure of their responses might change in the future.

Fetch relationships

Obtains all the relationships between @types existing in a project, together with their counts, expressed as nodes and edges.

The nodes are the @types present in the project, each counted by the number of resources carrying it, and the edges are the properties linking resources of a source @type to resources of a target @type.
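
To make the node and edge counts concrete, here is an in-memory Python sketch. It only illustrates what is being counted; the plugin itself computes these counts with ElasticSearch aggregations, as explained under Internals. It assumes each resource has been reduced to its @type and a list of (path, target @type) pairs for its relationships; `count_nodes_and_edges` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter


def count_nodes_and_edges(resources: list[dict]) -> tuple[Counter, Counter]:
    """Hypothetical sketch: count resources per @type and links per (source, path, target)."""
    nodes: Counter = Counter()
    edges: Counter = Counter()
    for resource in resources:
        types = resource["@type"]
        types = types if isinstance(types, list) else [types]
        nodes.update(types)  # one node hit per @type carried by the resource
        for path, target_type in resource.get("relationships", []):
            for source_type in types:
                edges[(source_type, path, target_type)] += 1
    return nodes, edges


PERSON = "http://schema.org/Person"
BROTHER = "http://schema.org/brother"
people = [
    {"@type": PERSON, "relationships": [(BROTHER, PERSON)]},
    {"@type": PERSON, "relationships": [(BROTHER, PERSON)]},
    {"@type": PERSON},
]
nodes, edges = count_nodes_and_edges(people)
print(nodes[PERSON], edges[(PERSON, BROTHER, PERSON)])  # prints 3 2
```

Three Person resources, two of which point to a brother, give a Person node counted 3 and a brother edge counted 2, the same shape as the Person entries of the example response below.
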

GET /v1/graph-analytics/{org_label}/{project_label}/relationships

Example

Request
curl "http://localhost:8080/v1/graph-analytics/myorg/myproj/relationships"
Response
{
  "@context": "https://bluebrain.github.io/nexus/contexts/relationships.json",
  "_edges": [
    {
      "_count": 2,
      "_path": [
        {
          "@id": "http://schema.org/brother",
          "_name": "brother"
        }
      ],
      "_source": "http://schema.org/Person",
      "_target": "http://schema.org/Person"
    }
  ],
  "_nodes": [
    {
      "@id": "http://schema.org/Person",
      "_count": 3,
      "_name": "Person"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/View",
      "_count": 2,
      "_name": "View"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/DiskStorage",
      "_count": 1,
      "_name": "DiskStorage"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/ElasticSearchView",
      "_count": 1,
      "_name": "ElasticSearchView"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/InProject",
      "_count": 1,
      "_name": "InProject"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/Project",
      "_count": 1,
      "_name": "Project"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/Resolver",
      "_count": 1,
      "_name": "Resolver"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/SparqlView",
      "_count": 1,
      "_name": "SparqlView"
    },
    {
      "@id": "https://bluebrain.github.io/nexus/vocabulary/Storage",
      "_count": 1,
      "_name": "Storage"
    }
  ]
}
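
A client can turn this response into readable lines. The following Python sketch uses only the standard library and assumes, as the curl example does, a Delta instance at http://localhost:8080 reachable without authentication; `fetch_relationships` and `describe_edges` are hypothetical helper names.

```python
from __future__ import annotations

import json
import urllib.request


def fetch_relationships(org: str, project: str, base: str = "http://localhost:8080") -> dict:
    # GET the relationships endpoint and decode the JSON body (no authentication assumed).
    url = f"{base}/v1/graph-analytics/{org}/{project}/relationships"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def describe_edges(payload: dict) -> list[str]:
    # Prefer the short _name of each node; fall back to the full IRI.
    names = {node["@id"]: node["_name"] for node in payload.get("_nodes", [])}
    lines = []
    for edge in payload.get("_edges", []):
        path = " / ".join(step["_name"] for step in edge["_path"])
        source = names.get(edge["_source"], edge["_source"])
        target = names.get(edge["_target"], edge["_target"])
        lines.append(f"{source} -[{path}]-> {target} ({edge['_count']})")
    return lines


sample = {
    "_edges": [{
        "_count": 2,
        "_path": [{"@id": "http://schema.org/brother", "_name": "brother"}],
        "_source": "http://schema.org/Person",
        "_target": "http://schema.org/Person",
    }],
    "_nodes": [{"@id": "http://schema.org/Person", "_count": 3, "_name": "Person"}],
}
print(describe_edges(sample))  # ['Person -[brother]-> Person (2)']
```

Against a running instance, `describe_edges(fetch_relationships("myorg", "myproj"))` would produce the same kind of lines for the whole project.
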

Fetch properties

Obtains all the properties of a given @type and their counts.

The difference between properties and relationships is that properties are enclosed inside the same resource, while relationships are statements between different resources.

GET /v1/graph-analytics/{org_label}/{project_label}/properties/{type}

…where {type} is an IRI identifying the @type whose properties should be retrieved.
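
When {type} is a full IRI rather than a compact one such as schema:Person, its / characters have to be percent-encoded so that the IRI stays within a single path segment. A minimal Python sketch with the standard library, assuming the local base URL of the examples; `properties_url` is a hypothetical helper.

```python
from urllib.parse import quote


def properties_url(org: str, project: str, type_iri: str,
                   base: str = "http://localhost:8080") -> str:
    # Encode "/" (and other reserved characters such as "#" or "?") but keep ":"
    # so that compact IRIs like schema:Person are left unchanged.
    segment = quote(type_iri, safe=":")
    return f"{base}/v1/graph-analytics/{org}/{project}/properties/{segment}"


print(properties_url("myorg", "myproj", "http://schema.org/Person"))
# http://localhost:8080/v1/graph-analytics/myorg/myproj/properties/http:%2F%2Fschema.org%2FPerson
```

Keeping ":" unencoded is a readability choice: the colon is allowed in a path segment, so the compact form produces exactly the URL used in the curl example below.
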

Example

Request
curl "http://localhost:8080/v1/graph-analytics/myorg/myproj/properties/schema:Person"
Response
{
  "@context": "https://bluebrain.github.io/nexus/contexts/properties.json",
  "@id": "http://schema.org/Person",
  "_count": 3,
  "_name": "Person",
  "_properties": [
    {
      "@id": "http://schema.org/givenName",
      "_count": 3,
      "_name": "givenName"
    },
    {
      "@id": "http://schema.org/brother",
      "_count": 3,
      "_name": "brother"
    },
    {
      "@id": "http://schema.org/address",
      "_count": 3,
      "_name": "address",
      "_properties": [
        {
          "@id": "http://schema.org/zipcode",
          "_count": 3,
          "_name": "zipcode"
        },
        {
          "@id": "http://schema.org/street",
          "_count": 3,
          "_name": "street"
        }
      ]
    }
  ]
}
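
Nested properties, such as address above, appear under their parent's _properties. The Python sketch below flattens the decoded response into (path, count) pairs, joining the short _name values with " / " in the style of the index paths described under Internals; `flatten_counts` is a hypothetical helper.

```python
from __future__ import annotations


def flatten_counts(node: dict, parent: str = "") -> list[tuple[str, int]]:
    """Hypothetical helper: turn nested _properties into (path, count) pairs."""
    pairs: list[tuple[str, int]] = []
    for prop in node.get("_properties", []):
        path = f"{parent} / {prop['_name']}" if parent else prop["_name"]
        pairs.append((path, prop["_count"]))
        pairs.extend(flatten_counts(prop, path))  # descend into nested properties
    return pairs


person = {
    "_properties": [
        {"_name": "givenName", "_count": 3},
        {"_name": "address", "_count": 3, "_properties": [
            {"_name": "zipcode", "_count": 3},
            {"_name": "street", "_count": 3},
        ]},
    ]
}
for path, count in flatten_counts(person):
    print(f"{path}: {count}")
```

The loop prints givenName, address, address / zipcode and address / street, each with a count of 3.
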

Fetch progress

GET /v1/graph-analytics/{org_label}/{project_label}/progress

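It returns the indexing statistics of the project's graph analytics:

  • the dateTime of the latest available event (lastEventDateTime) and of the latest consumed event (lastProcessedEventDateTime), as well as the delay between them (delayInSeconds).
  • the total number of events (totalEvents), the number of consumed events (processedEvents) and the number of events still to be consumed (remainingEvents).
  • how the consumed events were handled: evaluated (evaluatedEvents), discarded (discardedEvents) or failed (failedEvents).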

Example

Request
curl "http://localhost:8080/v1/graph-analytics/myorg/myproj/progress"
Response
{
  "@context": "https://bluebrain.github.io/nexus/contexts/statistics.json",
  "delayInSeconds": 0,
  "discardedEvents": 0,
  "evaluatedEvents": 8,
  "failedEvents": 0,
  "lastEventDateTime": "2021-09-21T06:44:24.530Z",
  "lastProcessedEventDateTime": "2021-09-21T06:44:24.530Z",
  "processedEvents": 8,
  "remainingEvents": 0,
  "totalEvents": 8
}
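
A client that needs up-to-date counts could poll this endpoint until nothing remains to be consumed. A hedged Python sketch, assuming the fields of the response above and treating remainingEvents equal to 0 as "caught up"; `progress_summary` and `wait_until_caught_up` are hypothetical helpers, and `fetch` stands for any function returning the decoded progress response.

```python
from __future__ import annotations

import time
from typing import Callable


def progress_summary(stats: dict) -> tuple[bool, float]:
    """Return (caught_up, processed_ratio) from a progress response."""
    total = stats.get("totalEvents", 0)
    processed = stats.get("processedEvents", 0)
    ratio = processed / total if total else 1.0  # an empty event log counts as done
    return stats.get("remainingEvents", 0) == 0, ratio


def wait_until_caught_up(fetch: Callable[[], dict], interval: float = 1.0,
                         attempts: int = 30) -> bool:
    # Poll `fetch` (e.g. a GET on the progress endpoint) until nothing remains.
    for _ in range(attempts):
        caught_up, _ratio = progress_summary(fetch())
        if caught_up:
            return True
        time.sleep(interval)
    return False


example = {"processedEvents": 8, "remainingEvents": 0, "totalEvents": 8}
print(progress_summary(example))  # (True, 1.0)
```

Passing the fetch function in keeps the polling logic independent of how the HTTP call and authentication are done.
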

Search

POST /v1/graph-analytics/{org_label}/{project_label}/_search
  {...}

Searches the documents in a given project's Graph Analytics view.

The supported payload is the one defined in the ElasticSearch search API documentation.

Example

Request
curl -XPOST \
-H "Content-Type: application/json" \
"http://localhost:8080/v1/graph-analytics/myorg/myproj/_search" -d \
'{
  "query": {
    "term": {
      "@id": "http://example.com/person"
    }
  }
}'
Response
{
  "hits": {
    "hits": [
      {
        "_id": "http://example.com/person",
        "_index": "delta_ga_myorg_myproj",
        "_score": 1.3121864,
        "_source": {
          "@id": "http://example.com/person",
          "@type": "http://schema.org/Person",
          "properties": [
            {
              "@id": "http://example.com/epfl",
              "dataType": "object",
              "isInArray": false,
              "path": "http://schema.org/worksFor"
            }
          ],
          "references": [
            {
              "@id": "http://example.com/epfl",
              "@type": [
                "http://schema.org/EducationalOrganization"
              ],
              "dataType": "object",
              "found": true,
              "isInArray": false,
              "path": "http://schema.org/worksFor"
            }
          ],
          "relationships": [
            {
              "@id": "http://example.com/epfl",
              "@type": [
                "http://schema.org/EducationalOrganization"
              ],
              "dataType": "object",
              "isInArray": false,
              "path": "http://schema.org/worksFor"
            }
          ],
          "remoteContexts": [
            {
              "@type": "ProjectRemoteContextRef",
              "iri": "https://bbp.epfl.ch/contexts/person",
              "resource": {
                "id": "https://bbp.epfl.ch/contexts/person",
                "project": "myorg/myproj",
                "rev": 1
              }
            }
          ],
          "_createdAt": "2023-08-08T15:49:14.081Z",
          "_createdBy": {
            "@type": "User",
            "realm": "internal",
            "subject": "delta"
          },
          "_deprecated": false,
          "_project": "myorg/myproj",
          "_rev": 1,
          "_updatedAt": "2023-08-08T15:49:14.081Z",
          "_updatedBy": {
            "@type": "User",
            "realm": "internal",
            "subject": "delta"
          }
        }
      }
    ],
    "max_score": 1.3121864,
    "total": {
      "relation": "eq",
      "value": 1
    }
  },
  "timed_out": false,
  "took": 0,
  "_shards": {
    "failed": 0,
    "skipped": 0,
    "successful": 1,
    "total": 1
  }
}
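
A search like the one above can also be sent from Python with the standard library. The sketch builds the request without sending it and assumes the local, unauthenticated Delta of the curl examples; `search_request` is a hypothetical helper and any ElasticSearch query body can be passed to it.

```python
import json
import urllib.request


def search_request(org: str, project: str, query: dict,
                   base: str = "http://localhost:8080") -> urllib.request.Request:
    # Build (but do not send) a POST to the _search endpoint with a JSON body.
    return urllib.request.Request(
        f"{base}/v1/graph-analytics/{org}/{project}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = search_request(
    "myorg", "myproj", {"query": {"term": {"@id": "http://example.com/person"}}}
)
# To send it:
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```
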

Internals

To implement the endpoints described above, we needed a way to transform our data so that these questions could be answered efficiently.

The solution we adopted was to stream our data, transform it and push it to a dedicated ElasticSearch index (one index per project). At query time, we then run terms aggregations to get the desired counts.
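
As an illustration, node counts could be obtained with a terms aggregation on @type, a body that could in principle be posted to the _search endpoint described above. The Python sketch only builds the request body and reads the standard terms-aggregation response (buckets with key and doc_count). The aggregation name `types`, the bucket size, the helper names and the assumption that @type is indexed as a keyword field are ours, not taken from the plugin.

```python
from __future__ import annotations


def node_count_query(size: int = 100) -> dict:
    # Assumes @type is indexed as a keyword field; "types" is an arbitrary aggregation name.
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "aggs": {"types": {"terms": {"field": "@type", "size": size}}},
    }


def nodes_from_buckets(es_response: dict) -> list[dict]:
    # A terms aggregation answers with buckets of {"key": ..., "doc_count": ...}.
    buckets = es_response["aggregations"]["types"]["buckets"]
    return [{"@id": b["key"], "_count": b["doc_count"]} for b in buckets]


sample_response = {
    "aggregations": {"types": {"buckets": [
        {"key": "http://schema.org/Person", "doc_count": 3},
    ]}}
}
print(nodes_from_buckets(sample_response))
```

Setting "size" to 0 at the top level skips the hits entirely, so only the per-type counts travel back, which is what makes this approach cheaper than scanning documents.
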

Document structure

An example of such an ElasticSearch document looks as follows:

{
  "@id": "http://example.com/Anna",
  "@type": "http://schema.org/Person",
  "_project": "myorg/myproject",
  "_rev": 4,
  "_deprecated": false,
  "_createdAt": "2023-06-01T00:00:00Z",
  "_createdBy": { "@type": "User", "realm": "bbp",  "subject": "Bob" },
  "_updatedAt": "2023-06-12T00:00:00Z",
  "_updatedBy": { "@type": "User", "realm": "bbp",  "subject": "Alice" },
  "properties": [
    {
      "dataType": "object",
      "path": "http://schema.org/address",
      "isInArray": false
    },
    {
      "dataType": "string",
      "path": "http://schema.org/address / http://schema.org/street",
      "isInArray": false
    },
    {
      "dataType": "number",
      "path": "http://schema.org/address / http://schema.org/zipcode",
      "isInArray": false
    },
    {
      "dataType": "object",
      "@id": "http://example.com/Robert",
      "path": "http://schema.org/brother",
      "isInArray": false
    },
    {
      "dataType": "string",
      "path": "http://schema.org/givenName",
      "isInArray": false
    },
    {
      "dataType": "object",
      "path": "http://schema.org/studies",
      "isInArray": true
    },
    {
      "dataType": "string",
      "path": "http://schema.org/studies / http://schema.org/name",
      "isInArray": true
    }
  ],
  "references": [
    {
      "found": true,
      "@id": "http://example.com/Robert"
    }
  ],
  "relationships": [
    {
      "dataType": "object",
      "@id": "http://example.com/Robert",
      "@type": "http://schema.org/Person",
      "path": "http://schema.org/brother",
      "isInArray": false
    }
  ],
  "remoteContexts": [
    {
      "@type": "ProjectRemoteContextRef",
      "iri": "https://bbp.epfl.ch/contexts/person",
      "resource": {
        "id": "https://bbp.epfl.ch/contexts/person",
        "project": "myorg/myproject",
        "rev": 1
      }
    }
  ]
}

… where:

  • properties - Json Object Array: A flat collection of fields present on a resource.
  • references - Json Object Array: A flat collection of fields present on a resource that are candidates for relationships, because they have an @id.
  • relationships - Json Object Array: A flat collection of references whose @id matches another resource in the same project, enriched with that resource's @type.
  • path - String: The flat expanded path of a field present on a resource. A path of an embedded field will be encoded as follows: parent / child.
  • isInArray - Boolean: Flag indicating whether the current path (or one of its parents) is part of an array.
  • dataType - String: The type of the value present in the current path, for example object, string, number or boolean.
  • found - Boolean: Flag indicating whether an @id inside references has been resolved as a relationship.
  • remoteContexts - Json Object Array: A collection of remote contexts detected during the JSON-LD resolution for this resource. See the Resources - Fetch remote contexts operation to learn about the remote context types.
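
The link between references, found and relationships can be sketched in Python as follows. The sketch assumes the @id to @type lookup of the project's resources is available as a plain dict, a hypothetical stand-in for the resolution the plugin performs; `resolve_references` is likewise a hypothetical name. Every property entry with an @id becomes a reference; the ones that resolve also become relationships carrying the target's @type.

```python
from __future__ import annotations


def resolve_references(properties: list[dict], project_types: dict[str, str]):
    """Hypothetical sketch: derive references and relationships from property entries."""
    references, relationships = [], []
    for prop in properties:
        target = prop.get("@id")
        if target is None:
            continue  # only fields carrying an @id are candidate references
        found = target in project_types  # does the @id match a resource of the project?
        references.append({"found": found, "@id": target})
        if found:
            # A resolved reference becomes a relationship enriched with the target's @type.
            relationships.append({**prop, "@type": project_types[target]})
    return references, relationships


properties = [
    {"dataType": "string", "path": "http://schema.org/givenName", "isInArray": False},
    {"dataType": "object", "@id": "http://example.com/Robert",
     "path": "http://schema.org/brother", "isInArray": False},
]
refs, rels = resolve_references(
    properties, {"http://example.com/Robert": "http://schema.org/Person"}
)
```

With Robert known as a Person in the project, the result matches the references and relationships entries of the example document above; with an empty lookup, the reference would remain with found set to false and no relationship would be produced.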