Quick Start

Overview

This example-driven tutorial presents 5 steps to get started with Blue Brain Nexus to build and query a simple knowledge graph. The goal is to go over some capabilities of Blue Brain Nexus enabling:

  • The creation of a project as a protected data space to work with
  • An easy ingestion of dataset and management of it’s lifecycle
  • Querying a dataset to retrieve various information
  • Sharing a dataset by making it public

For that we will work with the small version of the MovieLens dataset containing a set of movies (movies.csv) along with their ratings (ratings.csv) and tags (tags.csv) made by users. An overview of this dataset can be found here.

Note

Let’s get started.

Set up

Install and setup the Nexus CLI

Since the CLI is written in python, you may want to create a virtual environment for a clean set up. To do so, Conda can be used. If you don’t have it installed follow the instructions here.

conda create -n nexus-cli python=3.5
conda activate nexus-cli
pip install git+https://github.com/BlueBrain/nexus-cli

Create and select a ‘tutorial’ profile

To ease the usage of the CLI, we will create a profile named ‘tutorial’ storing locally various configurations such as the Nexus deployment url.

Command
nexus profiles create tutorial https://sandbox.bluebrainnexus.io/v1 && nexus profiles list
Full source at GitHub
Output
Profile created.
+------------+----------+-------------------------------------+-------+
| Profile    | Selected | URL                                 | Token |
+------------+----------+-------------------------------------+-------+
| tutorial   |   Yes    | https://sandbox.bluebrainnexus.io/v1         |  None |
+------------+----------+-------------------------------------+-------+
Full source at GitHub

Let select the tutorial profile we just created.

Command
nexus profiles select tutorial
Full source at GitHub
Output
Selected profile: tutorial
+------------+----------+-------------------------------------+-------+
| Profile    | Selected | URL                                 | Token |
+------------+----------+-------------------------------------+-------+
| tutorial   |   Yes    | https://sandbox.bluebrainnexus.io/v1         |  None |
+------------+----------+-------------------------------------+-------+
Full source at GitHub

Login

A bearer token is needed to authenticate to Nexus. For the purpose of this tutorial, you’ll login using your github account.

Note
  • If you don’t have a github account, please follow the instructions on this page to create one.

The following command will open a browser window from where you can login using your github account.

Command
nexus auth login
Full source at GitHub
Output
A browser window will now open, please login, copy your token and use the auth set-token command to store it in the CLI
Press ENTER to continue...
Full source at GitHub

From the opened web page, click on the login button on the right corner and follow the instructions.

login-ui

At the end you’ll see a token button on the right corner. Click on it to copy the token.

login-ui

The token can now be added to the tutorial profile. In the output of the following command you should see that the token column has now an expiry date.

Command
nexus auth set-token $TOKEN && nexus profiles list
Full source at GitHub
Output
+--------------+----------+-------------------------------------+-----------------------------+
| Profile      | Selected | URL                                 |            Token            |
+--------------+----------+-------------------------------------+-----------------------------+
| tutorial     |   Yes    | https://sandbox.bluebrainnexus.io/v1         | Expiry: YYYY-MM-DD HH:mm:s |
+--------------+----------+-------------------------------------+-----------------------------+
Full source at GitHub

Create a project

Projects in BlueBrain Nexus are spaces where data can be:

  • managed: created, updated, deprecated, validated, secured;
  • accessed: directly by ids or through various search interfaces;
  • shared: through fine grain Access Control List.

A project is always created within an organization just like a git repository is created in a github organization. Organizations can be understood as accounts hosting multiple projects.

Select an organization

Note

A public organization named tutorialnexus is already created for the purpose of this tutorial. All projects will be created under this organization.

The following command should list the organizations you have access to. The tutorialnexus organization should be listed and tagged as non-deprecated in the output.

Command
nexus orgs list
Full source at GitHub
Output
+----------------+-------------------+-----------------------------------------------------+------------+
| Name           | Description       | Id                                                  | Deprecated |
+----------------+-------------------+-----------------------------------------------------+------------+
| tutorialnexus  | Nexus sandbox     | https://sandbox.bluebrainnexus.io/v1/tutorialnexus     | False      |
Full source at GitHub

Let select the tutorialnexus organization.

Command
nexus orgs select tutorialnexus
Full source at GitHub
Output
tutorialnexus organization selected.
Full source at GitHub

Create a project

A project is created with a label and within an organization. The label should be made of alphanumerical characters and its length should be between 3 and 32 (it should match the regex: [a-zA-Z0-9-_]{3,32}).

Pick a label (hereafter referred to as $PROJECTLABEL) and create a project using the following command. It is recommended to use your username to avoid collision of projects labels within an organization.

Command
nexus projects create $PROJECTLABEL && nexus projects list
Full source at GitHub
Output
Project created (id: https://sandbox.bluebrainnexus.io/v1/projects/tutorialnexus/$PROJECTLABEL)
+---------------+-------------+------------------------------------------------------------------------+------------+
| Label         | Description | Id                                                                     | Deprecated |
+---------------+-------------+------------------------------------------------------------------------+------------+
| $PROJECTLABEL |             | https://sandbox.bluebrainnexus.io/v1/projects/tutorialnexus/$PROJECTLABEL | False      |
+---------------+-------------+------------------------------------------------------------------------+------------+
Full source at GitHub

By default, created projects are private meaning that only the project creator (you) has read and write access to it. We’ll see below how to make a project public.

The output of the previous command shows the list of projects you have read access to. The project you just created should be the only one listed at this point. Let select it.

Command
nexus projects select $PROJECTLABEL && nexus projects list
Full source at GitHub
Output
$PROJECTLABEL project selected
+---------------+-------------+------------------------------------------------------------------------+------------+
| Label         | Description | Id                                                                     | Deprecated |
+---------------+-------------+------------------------------------------------------------------------+------------+
| $PROJECTLABEL |             | https://sandbox.bluebrainnexus.io/v1/projects/tutorialnexus/$PROJECTLABEL | False      |
+---------------+-------------+------------------------------------------------------------------------+------------+
Full source at GitHub

We are all set to bring some data within the project we just created.

Ingest data

Download the dataset

The MovieLens dataset can be downloaded either directly on a browser or using a curl command as shown below.

The following command download, unzip the dataset in the folder ~/ml-latest-small and list the files. The downloaded MovieLens dataset is made of four csv files as shown in the output tab.

Command
cd ~ && curl -s -O http://files.grouplens.org/datasets/movielens/ml-latest-small.zip && unzip -qq ml-latest-small.zip && cd ml-latest-small && ls
Full source at GitHub
Output
README.txt	links.csv	movies.csv	ratings.csv	tags.csv
Full source at GitHub

Load the dataset

Let first load the movies and merge them with the links.

nexus resources create -f ~/ml-latest-small/movies.csv -t Movie --format csv --idcolumn movieId --mergewith ~/ml-latest-small/links.csv --mergeon movieId --max-connections 4

Then we can load the tags.

nexus resources create -f ~/ml-latest-small/tags.csv -t Tag --format csv --max-connections 50

And finally load the ratings. Loading 100837 resources might take some time and also it is not needed to load them all to follow this tutorial. The maximum number of concurrent connections (–max-connections) can be increased for better loading performance.

nexus resources create -f ~/ml-latest-small/ratings.csv -t Rating --format csv --max-connections 50

Access data

View data in Nexus Web

Nexus is deployed with a web application allowing to browse organizations, projects, data and schemas you have access to. You can go to the address https://sandbox.bluebrainnexus.io/web and browse the data you just loaded.

List data

The simplest way to accessed data within Nexus is by listing them. The following command lists 5 resources:

Command
nexus resources list --size 5
Full source at GitHub
Output
+------------------------------------------------------------------------------------+----------------------------------------------------------------------------+----------+------------+
| Id                                                                                 | Type                                                                       | Revision | Deprecated |
+------------------------------------------------------------------------------------+----------------------------------------------------------------------------+----------+------------+
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Movie_1  | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Movie | 1        | False      |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Movie_9  | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Movie | 1        | False      |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Movie_12 | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Movie | 1        | False      |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Movie_7  | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Movie | 1        | False      |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Movie_8  | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Movie | 1        | False      |
+------------------------------------------------------------------------------------+----------------------------------------------------------------------------+----------+------------+
Full source at GitHub

The full payload of the resources are not retrieved when listing them: only identifier, type as well as Nexus added metadata are. But the result list can be scrolled and each resource fetched by identifier.

Command
nexus resources fetch https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Movie_1
Full source at GitHub
Output
{
  "@context": [
    {
      "@base": "https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/",
      "@vocab": "https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/"
    },
    "https://bluebrain.github.io/nexus/contexts/resource.json"
  ],
  "@id": "Movie_1",
  "@type": "Movie",
  "genres": "Adventure|Animation|Children|Comedy|Fantasy",
  "imdbId": 114709,
  "movieId": 1,
  "title": "Toy Story (1995)",
  "tmdbId": 862.0,
  "_self": "../../resource/https%3A%2F%2Fbbp.epfl.ch%2Fnexus%2Fv1%2Fresources%2Ftutorialnexus%2F$PROJECTLABEL%2F_%2FMovie_1",
  "_constrainedBy": "nxs:resource.json",
  "_project": "../../../../../projects/tutorialnexus/$PROJECTLABEL",
  "_createdAt": "2019-01-17T10:32:02.221Z",
  "_createdBy": "....",
  "_updatedAt": "2019-01-17T10:32:02.221Z",
  "_updatedBy": "....",
  "_rev": 1,
  "_deprecated": false
}
Full source at GitHub

Whenever a resource is created, Nexus injects some useful metadata. The table below details some of them:

Metadata Description Value Type
@id Generated resource identifier. The user can provide its own identifier. URI
@type The type of the resource if provided by the user. URI
_self The resource address within Nexus. It contains the resource management details such as the organization, the project and the schema. URI
_createdAt The resource creation date. DateTime
_createdBy The resource creator. DateTime

Note that Nexus uses JSON-LD as data exchange format.

Filters are available to list specific resources. For example a list of resources of type Rating can be retrieved by running the following command:

Command
nexus resources list --type Rating --size 5
Full source at GitHub
Output
+------------------------------------------------------------------------------------+----------------------------------------------------------------------------+----------+------------+
| Id                                                                                 | Type                                                                       | Revision | Deprecated |
+------------------------------------------------------------------------------------+----------------------------------------------------------------------------+----------+------------+
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Rating_1  | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Rating | 1        | False      |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Rating_9  | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Rating | 1        | False      |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Rating_12 | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Rating | 1        | False      |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Rating_7  | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Rating | 1        | False      |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Rating_8  | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Rating | 1        | False      |
+------------------------------------------------------------------------------------+----------------------------------------------------------------------------+----------+------------+
Full source at GitHub
Listing with various filters using the CLI

As an exercise try to filter by tag to only retrieve resources of type Tag.

Query data

Listing is usually not enough to select specific subset of data. Data ingested within each project can be searched through two complementary search interfaces called views.

View Description
ElasticSearchView Exposes data in ElasticSearch a document oriented search engine and provide access to it using the ElasticSearch query language.
SparqlView Exposes data as a graph and allows to navigate and explore the data using the W3C Sparql query language.

Query data using the ElasticSearchView

The ElasticSearchView URL is available at the address [https://sandbox.bluebrainnexus.io/v1/views/tutorialnexus/$PROJECTLABEL/documents/_search].

Select queries
# Select 5 ratings sorted by creation date in descending order
nexus views query-es --data \
'{
     "size":5,
     "sort" : [
       {
        "_createdAt" : {"order" : "desc"}
       }
     ],
     "query": {
     	"terms" : {"@type":["https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Rating"]}
     }
 }'
Full source at GitHub
Graph navigation queries
# Not possible by default.
# Relationships handling needs to be made explicit to ElasticSearch through a mapping.

Full source at GitHub

Query data using the SparqlView

The SparqlView is available at the address [https://sandbox.bluebrainnexus.io/v1/views/tutorialnexus/$PROJECTLABEL/graph/sparql]. The following diagram shows how the MovieLens data is structured in the default Nexus SparqlView. Note that the ratings, tags and movies are joined by the movieId property.

Movielens-default_nexus_graph.png

Select queries
# Select 5 ratings sorted by creation date in descending order
nexus views query-sparql --data \
'
PREFIX vocab: <https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/>
PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>

Select ?userId ?moviedId ?rating ?createdAt
 WHERE  {

    ?ratingNode a vocab:Rating.
    ?ratingNode nxv:createdAt ?createdAt.
    ?ratingNode vocab:userId  ?userId.
    ?ratingNode vocab:movieId ?moviedId.
    ?ratingNode vocab:rating  ?rating.
}
ORDER BY DESC (?creationDate)
LIMIT 5'
Full source at GitHub
Graph navigation queries
# Average rating score for movies tagged as funny
nexus views query-sparql --data \
'
PREFIX vocab: <https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/>
PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>

Select (AVG(?ratingvalue) AS ?score)
 WHERE  {
    # Select movies
    ?movie a vocab:Movie.

    # Select their movieId values
    ?movie vocab:movieId ?movieId.

    # Keep movies with 'funny' tags
    ?tag a vocab:Tag.
    ?tag vocab:movieId ?movieId.
    ?tag vocab:tag ?tagvalue.
    FILTER(?tagvalue = "funny").
    
    # Keep movies with ratings
    ?rating a vocab:Rating.
    ?rating vocab:movieId ?ratingmovieId.
    FILTER(xsd:integer(?ratingmovieId) = xsd:integer(?movieId))
    ?rating vocab:rating ?ratingvalue.

}'
Full source at GitHub

Share data

Making a dataset public means granting read permissions to “anonymous” user.

$ nexus acls make-public

To check that the dataset is now public:

  • Ask the person next to you to list resources in your project.
  • Or create and select another profile named public-tutorial (following the instructions in the Set up. You should see the that the public-tutorial is selected and its corresponding token column is None.
Output
Selected profile: tutorial
+-------------------+----------+-------------------------------------+------------------+
| Profile           | Selected | URL                                 |       Token      |
+-------------------+----------+-------------------------------------+------------------+
| tutorial          |          | https://sandbox.bluebrainnexus.io/v1         |  Expiry: 2019... |
| public-tutorial   |   Yes    | https://sandbox.bluebrainnexus.io/v1         |       None       |
+-------------------+----------+-------------------------------------+------------------+
Full source at GitHub
  • Resources in your project should be listed with the command even though you are not authenticated.
Command
nexus resources list --size 5 -o tutorialnexus -p $PROJECTLABEL
Full source at GitHub