You are browsing the docs for Nexus v1.6.x.
MovieLens Tutorial using the Nexus Python CLI
Overview
This example-driven tutorial presents 5 steps to get started with Blue Brain Nexus to build and query a simple knowledge graph. The goal is to go over some capabilities of Blue Brain Nexus enabling:
- The creation of a project as a protected data space to work in
- An easy ingestion of a dataset within a given project
- The listing and querying of a dataset
- Sharing a dataset by making it public
This quick start tutorial makes use of:
- an AWS deployment of Blue Brain Nexus available at https://sandbox.bluebrainnexus.io.
- the Nexus Python CLI, a Python command-line interface.
Let’s get started.
Set up
Install the Nexus Python CLI:
pip install git+https://github.com/BlueBrain/nexus-cli
Create a project
Projects in Blue Brain Nexus are spaces where data can be:
- managed: created, updated, deprecated, validated, secured;
- accessed: directly by ids or through various search interfaces;
- shared: through fine-grained Access Control Lists (ACLs).
A project is always created within an organization, just as a Git repository is created within a GitHub organization. Organizations can be understood as accounts hosting multiple projects.
Select an organization
A public organization named tutorialnexus is already created for the purpose of this tutorial. All projects will be created under this organization.
The following command should list the organizations you have access to. The tutorialnexus organization should be listed and tagged as non-deprecated in the output.
Command:
nexus orgs list
Output:
+----------------+-------------------+-----------------------------------------------------+------------+
| Name           | Description       | Id                                                  | Deprecated |
+----------------+-------------------+-----------------------------------------------------+------------+
| tutorialnexus  | Nexus sandbox     | https://sandbox.bluebrainnexus.io/v1/tutorialnexus  | False      |
Let's select the tutorialnexus organization:
nexus orgs select tutorialnexus
If the tutorialnexus organization is not available, pick an organization label (value of $ORGLABEL) and create an organization using the following command:
Command:
nexus orgs create $ORGLABEL && nexus orgs select $ORGLABEL && nexus orgs list
Output:
Organization created (id: https://sandbox.bluebrainnexus.io/v1/orgs/$ORGLABEL)
organization selected.
+---------------+-------------+------------------------------------------------------------------------+------------+
| Label         | Description | Id                                                                     | Deprecated |
+---------------+-------------+------------------------------------------------------------------------+------------+
| $ORGLABEL     |             | https://sandbox.bluebrainnexus.io/v1/orgs/$ORGLABEL                    | False      |
+---------------+-------------+------------------------------------------------------------------------+------------+
Create your own project
A project is created with a label and within an organization. The label should be made of alphanumeric characters, hyphens or underscores, and its length should be between 3 and 32 characters (it should match the regex: [a-zA-Z0-9-_]{3,32}).
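The label rule above can be checked locally before calling the CLI. The following is a minimal sketch, not part of the Nexus CLI: the helper name `is_valid_label` is hypothetical, and the hyphen is moved to the end of the character class so that it is unambiguously a literal (the accepted characters are the same as in the regex quoted above).

```python
import re

# Same character set as the documented regex [a-zA-Z0-9-_]{3,32};
# the hyphen is placed last so it cannot be read as a range.
LABEL_PATTERN = re.compile(r"[a-zA-Z0-9_-]{3,32}")


def is_valid_label(label: str) -> bool:
    """Return True when the whole label matches the documented pattern."""
    return LABEL_PATTERN.fullmatch(label) is not None


# Hypothetical candidate labels, for illustration only.
for candidate in ["jdoe", "my-project_01", "ab", "has space"]:
    print(f"{candidate!r}: {is_valid_label(candidate)}")
```

Using `fullmatch` rather than `search` matters here: the pattern has no anchors, so a partial match inside a longer or invalid string would otherwise be accepted.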
Pick a label (hereafter referred to as $PROJECTLABEL) and create a project using the following command. It is recommended to use your username to avoid project label collisions within an organization.
Command:
nexus projects create $PROJECTLABEL && nexus projects list
Output:
Project created (id: https://sandbox.bluebrainnexus.io/v1/projects/tutorialnexus/$PROJECTLABEL)
+---------------+-------------+------------------------------------------------------------------------+------------+
| Label         | Description | Id                                                                     | Deprecated |
+---------------+-------------+------------------------------------------------------------------------+------------+
| $PROJECTLABEL |             | https://sandbox.bluebrainnexus.io/v1/projects/tutorialnexus/$PROJECTLABEL | False |
+---------------+-------------+------------------------------------------------------------------------+------------+
By default, created projects are private, meaning that only the project creator (you) has read and write access to them. We’ll see below how to make a project public.
The output of the previous command shows the list of projects you have read access to. The project you just created should be the only one listed at this point. Let's select it.
Command:
nexus projects select $PROJECTLABEL && nexus projects list
Output:
$PROJECTLABEL project selected
+---------------+-------------+------------------------------------------------------------------------+------------+
| Label         | Description | Id                                                                     | Deprecated |
+---------------+-------------+------------------------------------------------------------------------+------------+
| $PROJECTLABEL |             | https://sandbox.bluebrainnexus.io/v1/projects/tutorialnexus/$PROJECTLABEL | False |
+---------------+-------------+------------------------------------------------------------------------+------------+
We are all set to bring some data into the project we just created.
Ingest data
The CLI supports the ingestion of datasets in two formats: JSON and CSV.
Ingest JSON
Ingest JSON from a payload
Command:
nexus resources create -d \
'{
  "movieId": "1",
  "title": "Toy Story (1995)",
  "genres": "Adventure|Animation|Children|Comedy|Fantasy"
}'
Note that ingesting a JSON array is not supported.
By default Nexus generates an identifier (in fact a URI) for a created resource as shown in the output of the above command. Furthermore, it is possible to provide:
- a specific identifier by setting the --id option
- and a type by setting the --type option
Command:
nexus resources create --id https://movies.com/movieId/1 \
  --type https://schema.org/Movie -d \
'{
  "movieId": "1",
  "title": "Toy Story (1995)",
  "genres": "Adventure|Animation|Children|Comedy|Fantasy"
}'
Output:
Resource created (id: https://movies.com/movieId/1)
Identifiers and types can also be provided directly in the JSON payload using the @id and @type keys, respectively.
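As an illustration of a payload that carries its own identifier and type, the sketch below builds the same Toy Story resource with `@id` and `@type` keys and prints a shell-safe `nexus resources create -d` command. The helper `build_movie_payload` is hypothetical and only assembles text; it does not call Nexus.

```python
import json
import shlex


def build_movie_payload(movie_id: str, title: str, genres: str) -> dict:
    """Build a payload whose identifier and type live in the JSON itself."""
    return {
        "@id": f"https://movies.com/movieId/{movie_id}",
        "@type": "https://schema.org/Movie",
        "movieId": movie_id,
        "title": title,
        "genres": genres,
    }


payload = build_movie_payload(
    "1", "Toy Story (1995)", "Adventure|Animation|Children|Comedy|Fantasy"
)

# shlex.quote wraps the JSON in single quotes so the shell passes it verbatim.
command = "nexus resources create -d " + shlex.quote(json.dumps(payload))
print(command)
```

With the identifier and type in the payload, the --id and --type options shown earlier should no longer be needed for this resource.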
The created resource identified by https://movies.com/movieId/1 can then be fetched using the following command:
Command:
nexus resources fetch https://movies.com/movieId/1
Output:
{
  "@context": [
    {
      "@base": "https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/",
      "@vocab": "https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/"
    },
    "https://bluebrain.github.io/nexus/contexts/resource.json"
  ],
  "@id": "https://movies.com/movieId/1",
  "@type": "https://schema.org/Movie",
  "genres": "Adventure|Animation|Children|Comedy|Fantasy",
  "movieId": "1",
  "title": "Toy Story (1995)",
  "_self": "https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/https%3A%2F%2Fmovies.com%2FmovieId%2F1",
  "_constrainedBy": "https://bluebrain.github.io/nexus/schemas/unconstrained.json",
  "_project": "https://sandbox.bluebrainnexus.io/v1/projects/tutorialnexus/$PROJECTLABEL",
  "_rev": 1,
  "_deprecated": false,
  "_createdAt": "2019-06-28T21:26:29.197Z"
}
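The `_self` value in the output above appears to be the project's resources address followed by the percent-encoded resource identifier. The sketch below reproduces that string under this assumption; the function `self_address` and the project label `myproject` are hypothetical, and the URL layout is read off the output rather than taken from an API reference.

```python
from urllib.parse import quote


def self_address(base: str, org: str, project: str, resource_id: str) -> str:
    """Rebuild a _self-style address: base/resources/org/project/_/<encoded id>."""
    # safe="" forces ':' and '/' to be percent-encoded as well.
    encoded_id = quote(resource_id, safe="")
    return f"{base}/resources/{org}/{project}/_/{encoded_id}"


address = self_address(
    "https://sandbox.bluebrainnexus.io/v1",
    "tutorialnexus",
    "myproject",  # stands in for $PROJECTLABEL
    "https://movies.com/movieId/1",
)
print(address)
```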
Ingest JSON from a file
A JSON payload can be ingested from a file.
nexus resources create --file /path/to/file.json
A directory (/path/to/dir) of JSON files can be ingested by using the following looping command:
find /path/to/dir -name '*.json' -exec nexus resources create --file {} \;
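Since a JSON array cannot be ingested directly, one way to prepare such data for the directory loop above is to write each array element to its own file first. This is a minimal sketch with hypothetical names (`split_json_array`, the `resource_00000.json` naming scheme); it only prepares files and does not call the CLI.

```python
import json
import tempfile
from pathlib import Path


def split_json_array(array_file: Path, out_dir: Path) -> list:
    """Write each element of a top-level JSON array to its own .json file."""
    items = json.loads(array_file.read_text(encoding="utf-8"))
    if not isinstance(items, list):
        raise ValueError("expected a top-level JSON array")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, item in enumerate(items):
        target = out_dir / f"resource_{index:05d}.json"
        target.write_text(json.dumps(item, indent=2), encoding="utf-8")
        written.append(target)
    return written


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "movies.json"
    source.write_text(
        json.dumps([{"movieId": "1", "title": "Toy Story (1995)"}]),
        encoding="utf-8",
    )
    for path in split_json_array(source, Path(tmp) / "split"):
        print(path.name)
```

The resulting directory can then be passed to the find command in place of /path/to/dir.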
Ingested resources can be listed using the following command:
nexus resources list --size 10
Ingest CSV files
To illustrate how to load CSV files we will work with the small version of the MovieLens dataset containing a set of movies (movies.csv) along with their ratings (ratings.csv) and tags (tags.csv) made by users. An overview of this dataset can be found here.
Download the dataset
The MovieLens dataset can be downloaded either directly in a browser or using a curl command as shown below.
The following command downloads the dataset, unzips it into the folder ~/ml-latest-small and lists the files. The downloaded MovieLens dataset is made of four CSV files, as shown in the output.
Command:
cd ~ && curl -s -O http://files.grouplens.org/datasets/movielens/ml-latest-small.zip && unzip -qq ml-latest-small.zip && cd ml-latest-small && ls
Output:
README.txt links.csv movies.csv ratings.csv tags.csv
Load the dataset
Let's first load the movies and merge them with the links.
nexus resources create -f ~/ml-latest-small/movies.csv -t Movie --format csv --idcolumn movieId --mergewith ~/ml-latest-small/links.csv --mergeon movieId --max-connections 4
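To make the effect of --mergewith and --mergeon more concrete, the sketch below joins two small CSV snippets on movieId in plain Python. It is an assumption that the CLI behaves like this kind of key-based join (rows of the main file enriched with the matching row of the second file); the function `merge_on` is hypothetical and the sample values are taken from the Toy Story resource shown in this tutorial.

```python
import csv
import io

# Column layout as in movies.csv and links.csv; one sample row each.
MOVIES_CSV = "movieId,title,genres\n1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
LINKS_CSV = "movieId,imdbId,tmdbId\n1,114709,862\n"


def merge_on(primary_rows, secondary_rows, key):
    """Enrich each primary row with the secondary row sharing the same key."""
    lookup = {row[key]: row for row in secondary_rows}
    merged = []
    for row in primary_rows:
        combined = dict(row)
        # Rows without a match are kept unchanged (an assumed, left-join style).
        combined.update(lookup.get(row[key], {}))
        merged.append(combined)
    return merged


movies = list(csv.DictReader(io.StringIO(MOVIES_CSV)))
links = list(csv.DictReader(io.StringIO(LINKS_CSV)))
merged = merge_on(movies, links, "movieId")
print(merged[0])
```

Each merged row then carries the movie fields together with imdbId and tmdbId, which matches the shape of the Movie resources fetched later in this tutorial.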
Then we can load the tags.
nexus resources create -f ~/ml-latest-small/tags.csv -t Tag --format csv --max-connections 50
And finally load the ratings. Loading 100837 resources might take some time, and it is not necessary to load them all to follow this tutorial. The maximum number of concurrent connections (--max-connections) can be increased for better loading performance.
nexus resources create -f ~/ml-latest-small/ratings.csv -t Rating --format csv --max-connections 50
Access data
View data in Nexus Web
Nexus is deployed with a web application that lets you browse the organizations, projects, data and schemas you have access to. You can go to https://sandbox.bluebrainnexus.io/web and browse the data you just loaded.
List data
The simplest way to access data within Nexus is by listing it. The following command lists 5 resources:
Command:
nexus resources list --size 5
Output:
+------------------------------------------------------------------------------------+----------------------------------------------------------------------------+----------+------------+
| Id | Type | Revision | Deprecated |
+------------------------------------------------------------------------------------+----------------------------------------------------------------------------+----------+------------+
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Movie_1 | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Movie | 1 | False |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Movie_9 | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Movie | 1 | False |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Movie_12 | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Movie | 1 | False |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Movie_7 | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Movie | 1 | False |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Movie_8 | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Movie | 1 | False |
+------------------------------------------------------------------------------------+----------------------------------------------------------------------------+----------+------------+
The full payload of each resource is not retrieved when listing: only the identifier, the type and the metadata added by Nexus are returned. The result list can, however, be scrolled and each resource fetched by its identifier.
Command:
nexus resources fetch https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Movie_1
Output:
{
  "@context": [
    {
      "@base": "https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/",
      "@vocab": "https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/"
    },
    "https://bluebrain.github.io/nexus/contexts/resource.json"
  ],
  "@id": "Movie_1",
  "@type": "Movie",
  "genres": "Adventure|Animation|Children|Comedy|Fantasy",
  "imdbId": 114709,
  "movieId": 1,
  "title": "Toy Story (1995)",
  "tmdbId": 862.0,
  "_self": "../../resource/https%3A%2F%2Fbbp.epfl.ch%2Fnexus%2Fv1%2Fresources%2Ftutorialnexus%2F$PROJECTLABEL%2F_%2FMovie_1",
  "_constrainedBy": "nxs:resource.json",
  "_project": "../../../../../projects/tutorialnexus/$PROJECTLABEL",
  "_createdAt": "2019-01-17T10:32:02.221Z",
  "_createdBy": "....",
  "_updatedAt": "2019-01-17T10:32:02.221Z",
  "_updatedBy": "....",
  "_rev": 1,
  "_deprecated": false
}
Whenever a resource is created, Nexus injects some useful metadata. The table below details some of them:
Metadata | Description | Value Type |
---|---|---|
@id | Generated resource identifier. Users can provide their own identifier. | URI |
@type | The type of the resource if provided by the user. | URI |
_self | The resource address within Nexus. It contains the resource management details such as the organization, the project and the schema. | URI |
_createdAt | The resource creation date. | DateTime |
_createdBy | The resource creator. | URI |
Note that Nexus uses JSON-LD as data exchange format.
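In the fetched Movie resource above, `@id` and `@type` are compact values ("Movie_1", "Movie") that the `@base` and `@vocab` entries of the `@context` turn into full URIs. The sketch below illustrates that idea for these two keys only; it is not a JSON-LD processor, the helper `expand_id_and_type` is hypothetical, and `myproject` stands in for $PROJECTLABEL.

```python
from urllib.parse import urljoin


def expand_id_and_type(resource: dict) -> tuple:
    """Resolve @id against @base and prefix a bare @type with @vocab."""
    # The inline context is the dictionary entry of the @context list.
    context = next(e for e in resource["@context"] if isinstance(e, dict))
    expanded_id = urljoin(context["@base"], resource["@id"])
    raw_type = resource["@type"]
    # Types that are already absolute URIs are left untouched.
    expanded_type = raw_type if "://" in raw_type else context["@vocab"] + raw_type
    return expanded_id, expanded_type


resource = {
    "@context": [
        {
            "@base": "https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/myproject/_/",
            "@vocab": "https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/myproject/",
        },
        "https://bluebrain.github.io/nexus/contexts/resource.json",
    ],
    "@id": "Movie_1",
    "@type": "Movie",
}

full_id, full_type = expand_id_and_type(resource)
print(full_id)
print(full_type)
```

The two printed URIs have the same form as the Id and Type columns of the resource listing.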
Filters are available to list specific resources. For example a list of resources of type Rating can be retrieved by running the following command:
Command:
nexus resources list --type Rating --size 5
Output:
+------------------------------------------------------------------------------------+----------------------------------------------------------------------------+----------+------------+
| Id | Type | Revision | Deprecated |
+------------------------------------------------------------------------------------+----------------------------------------------------------------------------+----------+------------+
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Rating_1 | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Rating | 1 | False |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Rating_9 | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Rating | 1 | False |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Rating_12 | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Rating | 1 | False |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Rating_7 | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Rating | 1 | False |
| https://sandbox.bluebrainnexus.io/v1/resources/tutorialnexus/$PROJECTLABEL/_/Rating_8 | https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Rating | 1 | False |
+------------------------------------------------------------------------------------+----------------------------------------------------------------------------+----------+------------+
As an exercise, try filtering by type to retrieve only resources of type Tag.
Query data
Listing is usually not enough to select a specific subset of data. Data ingested within each project can be searched through two complementary search interfaces called views.
View | Description |
---|---|
ElasticSearchView | Exposes data in ElasticSearch, a document-oriented search engine, and provides access to it using the ElasticSearch query language. |
SparqlView | Exposes data as a graph and lets you navigate and explore the data using the W3C SPARQL query language. |
Note that the following queries (ElasticSearch and SPARQL) contain the variable $PROJECTLABEL, which must be replaced by the label of your current project. Copy each query and use a text editor to replace $PROJECTLABEL.
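As an alternative to editing each query by hand, the replacement can be scripted. The sketch below uses Python's `string.Template`, whose `$NAME` placeholder syntax happens to match the `$PROJECTLABEL` convention of this tutorial; the query text is the ElasticSearch select query from the next section, and `render_query` and the label `jdoe` are hypothetical.

```python
import json
from string import Template

ES_QUERY_TEMPLATE = Template("""{
  "size": 5,
  "sort": [{"_createdAt": {"order": "desc"}}],
  "query": {
    "terms": {
      "@type": ["https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Rating"]
    }
  }
}""")


def render_query(template: Template, project_label: str) -> str:
    """Replace $PROJECTLABEL; substitute() raises KeyError for any other placeholder."""
    return template.substitute(PROJECTLABEL=project_label)


rendered = render_query(ES_QUERY_TEMPLATE, "jdoe")
# Parsing the result is a cheap check that the query is still valid JSON.
print(json.dumps(json.loads(rendered), indent=2))
```

The rendered text can then be passed to the --data option of the query commands.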
Query data using the ElasticSearchView
The ElasticSearchView URL is available at the address https://sandbox.bluebrainnexus.io/v1/views/tutorialnexus/$PROJECTLABEL/documents/_search.
Select queries:
# Select 5 ratings sorted by creation date in descending order
nexus views query-es --data \
'{
  "size": 5,
  "sort": [ { "_createdAt": { "order": "desc" } } ],
  "query": {
    "terms": { "@type": ["https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/Rating"] }
  }
}'
Graph navigation queries:
# Not possible by default.
# Relationships handling needs to be made explicit to ElasticSearch through a mapping.
Query data using the SparqlView
The SparqlView is available at the address https://sandbox.bluebrainnexus.io/v1/views/tutorialnexus/$PROJECTLABEL/graph/sparql.
The following diagram shows how the MovieLens data is structured in the default Nexus SparqlView. Note that the ratings, tags and movies are joined by the movieId property.
Select queries:
# Select 5 ratings sorted by creation date in descending order
nexus views query-sparql --data \
'
PREFIX vocab: <https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/>
PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
SELECT ?userId ?movieId ?rating ?createdAt
WHERE {
  ?ratingNode a vocab:Rating.
  ?ratingNode nxv:createdAt ?createdAt.
  ?ratingNode vocab:userId ?userId.
  ?ratingNode vocab:movieId ?movieId.
  ?ratingNode vocab:rating ?rating.
}
ORDER BY DESC(?createdAt)
LIMIT 5'
Graph navigation queries:
# Average rating score for movies tagged as funny
nexus views query-sparql --data \
'
PREFIX vocab: <https://sandbox.bluebrainnexus.io/v1/vocabs/tutorialnexus/$PROJECTLABEL/>
PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT (AVG(?ratingvalue) AS ?score)
WHERE {
  # Select movies
  ?movie a vocab:Movie.
  # Select their movieId values
  ?movie vocab:movieId ?movieId.
  # Keep movies with "funny" tags
  ?tag a vocab:Tag.
  ?tag vocab:movieId ?movieId.
  ?tag vocab:tag ?tagvalue.
  FILTER(?tagvalue = "funny").
  # Keep movies with ratings
  ?rating a vocab:Rating.
  ?rating vocab:movieId ?ratingmovieId.
  FILTER(xsd:integer(?ratingmovieId) = xsd:integer(?movieId))
  ?rating vocab:rating ?ratingvalue.
}'
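To clarify what the graph navigation query computes, the sketch below mirrors its joins in plain Python: tags are matched to movies by movieId, ratings are matched to those movies by integer-compared movieId, and the matching rating values are averaged. The rows are made-up toy data, not MovieLens values, and `average_rating_for_tag` is a hypothetical helper.

```python
from statistics import mean

# Made-up rows for illustration only.
movies = [{"movieId": 1}, {"movieId": 2}]
tags = [{"movieId": 1, "tag": "funny"}, {"movieId": 2, "tag": "dark"}]
ratings = [
    {"movieId": "1", "rating": 4.0},
    {"movieId": "1", "rating": 5.0},
    {"movieId": "2", "rating": 2.0},
]


def average_rating_for_tag(movies, tags, ratings, wanted_tag):
    """Average the ratings of movies carrying the wanted tag, or None if none match."""
    movie_ids = {int(movie["movieId"]) for movie in movies}
    scores = []
    for tag in tags:
        if tag["tag"] != wanted_tag or int(tag["movieId"]) not in movie_ids:
            continue
        for rating in ratings:
            # Same idea as FILTER(xsd:integer(?ratingmovieId) = xsd:integer(?movieId)).
            if int(rating["movieId"]) == int(tag["movieId"]):
                scores.append(float(rating["rating"]))
    return mean(scores) if scores else None


print(average_rating_for_tag(movies, tags, ratings, "funny"))
```

As in the SPARQL query, a movie tagged "funny" several times contributes its ratings once per matching tag, so the result is weighted by the number of tags.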
Share data
Making a dataset public means granting read permissions to the “anonymous” user.
nexus acls make-public
To check that the dataset is now public:
- Ask the person next to you to list resources in your project.
- Or create and select another profile named public-tutorial (following the instructions in the Set up section). You should see that the public-tutorial profile is selected and that its corresponding token column is None.
Output:
Selected profile: tutorial
+-------------------+----------+-------------------------------------+------------------+
| Profile | Selected | URL | Token |
+-------------------+----------+-------------------------------------+------------------+
| tutorial | | https://sandbox.bluebrainnexus.io/v1 | Expiry: 2019... |
| public-tutorial | Yes | https://sandbox.bluebrainnexus.io/v1 | None |
+-------------------+----------+-------------------------------------+------------------+
- Resources in your project should be listed with the following command even though you are not authenticated.
Command:
nexus resources list --size 5 -o tutorialnexus -p $PROJECTLABEL
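Public access can also be checked without the CLI by sending an HTTP request that carries no token. The sketch below only builds such a request. The listing address (`/resources/{org}/{project}`) is an assumption inferred from the `_self` addresses shown earlier, the `size` query parameter is assumed to mirror the --size option, and `anonymous_list_request` and the label `jdoe` are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def anonymous_list_request(base: str, org: str, project: str, size: int = 5) -> Request:
    """Build a GET request for a project's resources with no Authorization header."""
    path = f"{base}/resources/{quote(org, safe='')}/{quote(project, safe='')}"
    url = f"{path}?{urlencode({'size': size})}"
    return Request(url, headers={"Accept": "application/json"}, method="GET")


request = anonymous_list_request(
    "https://sandbox.bluebrainnexus.io/v1", "tutorialnexus", "jdoe"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; if the project is public, the response is expected to succeed without credentials.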