v1.3 To v1.4 Migration

As part of the v1.4 release, the Cassandra schema has changed due to a dependency update. Additionally, the three existing keyspaces (iam, admin and kg) must be merged into a single keyspace (delta).

Steps

The migration steps are as follows:

  1. Stop the 3 services: iam, admin and kg.
  2. Back up the Cassandra store.
  3. Delete all ElasticSearch indices:
    curl -XDELETE 'http://{elasticsearch_host}/kg_*'
    
  4. Delete all BlazeGraph namespaces:

    # Extract each namespace URL from the description, skip URLs containing "kb" or "LBS", and delete the rest.
    for i in $(curl -s 'http://{blazegraph_host}/blazegraph/namespace?describe-each-named-graph=false' | grep sparqlEndpoint | grep -o "rdf:resource=\"[^\"]*" | sed 's/rdf:resource="//' | sed 's#/sparql$##' | grep -v kb | grep -v LBS)
    do
       curl -X DELETE "$i"
    done
    
  5. Drop the kg.projections_progress and the kg.projections_failures tables:

    drop table kg.projections_progress;
    drop table kg.projections_failures;
    
  6. Make sure Cassandra has enough disk space: at least twice the space currently used by the iam, admin and kg keyspaces. The migration copies their messages tables into the new keyspace, so the data exists twice until the old keyspaces are dropped.

  7. Create the delta keyspace and tables:
    CREATE KEYSPACE IF NOT EXISTS delta WITH replication = {'class': 'NetworkTopologyStrategy', '<your_dc_name>' : 3 };
    
    CREATE KEYSPACE IF NOT EXISTS delta_snapshot WITH replication = {'class': 'NetworkTopologyStrategy', '<your_dc_name>' : 3 }; 
    
    CREATE TABLE IF NOT EXISTS delta.messages (
      persistence_id text,
      partition_nr bigint,
      sequence_nr bigint,
      timestamp timeuuid,
      timebucket text,
      writer_uuid text,
      ser_id int,
      ser_manifest text,
      event_manifest text,
      event blob,
      meta_ser_id int,
      meta_ser_manifest text,
      meta blob,
      tags set<text>,
      PRIMARY KEY ((persistence_id, partition_nr), sequence_nr, timestamp))
      WITH gc_grace_seconds =864000
      AND compaction = {
        'class' : 'SizeTieredCompactionStrategy',
        'enabled' : true,
        'tombstone_compaction_interval' : 86400,
        'tombstone_threshold' : 0.2,
        'unchecked_tombstone_compaction' : false,
        'bucket_high' : 1.5,
        'bucket_low' : 0.5,
        'max_threshold' : 32,
        'min_threshold' : 4,
        'min_sstable_size' : 50
        };
    
    CREATE TABLE IF NOT EXISTS delta.tag_views (
      tag_name text,
      persistence_id text,
      sequence_nr bigint,
      timebucket bigint,
      timestamp timeuuid,
      tag_pid_sequence_nr bigint,
      writer_uuid text,
      ser_id int,
      ser_manifest text,
      event_manifest text,
      event blob,
      meta_ser_id int,
      meta_ser_manifest text,
      meta blob,
      PRIMARY KEY ((tag_name, timebucket), timestamp, persistence_id, tag_pid_sequence_nr))
      WITH gc_grace_seconds =864000
      AND compaction = {
        'class' : 'SizeTieredCompactionStrategy',
        'enabled' : true,
        'tombstone_compaction_interval' : 86400,
        'tombstone_threshold' : 0.2,
        'unchecked_tombstone_compaction' : false,
        'bucket_high' : 1.5,
        'bucket_low' : 0.5,
        'max_threshold' : 32,
        'min_threshold' : 4,
        'min_sstable_size' : 50
        };
    
    CREATE TABLE IF NOT EXISTS delta.tag_write_progress(
      persistence_id text,
      tag text,
      sequence_nr bigint,
      tag_pid_sequence_nr bigint,
      offset timeuuid,
      PRIMARY KEY (persistence_id, tag));
    
    CREATE TABLE IF NOT EXISTS delta.tag_scanning(
      persistence_id text,
      sequence_nr bigint,
      PRIMARY KEY (persistence_id));
    
    CREATE TABLE IF NOT EXISTS delta.metadata(
      persistence_id text PRIMARY KEY,
      deleted_to bigint,
      properties map<text,text>);
    
    CREATE TABLE IF NOT EXISTS delta.all_persistence_ids(
      persistence_id text PRIMARY KEY);
    
    CREATE TABLE IF NOT EXISTS delta.projections_progress (
        projection_id text PRIMARY KEY,
        progress text
    ) WITH bloom_filter_fp_chance = 0.01
        AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
        AND comment = ''
        AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
        AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND crc_check_chance = 1.0
        AND dclocal_read_repair_chance = 0.1
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair_chance = 0.0
        AND speculative_retry = '99PERCENTILE';
    
    CREATE TABLE IF NOT EXISTS delta.projections_failures (
        projection_id text,
        offset text,
        persistence_id text,
        sequence_nr bigint,
        value text,
        PRIMARY KEY (projection_id, offset, persistence_id, sequence_nr)
    ) WITH CLUSTERING ORDER BY (offset ASC, persistence_id ASC, sequence_nr ASC)
        AND bloom_filter_fp_chance = 0.01
        AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
        AND comment = ''
        AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
        AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND crc_check_chance = 1.0
        AND dclocal_read_repair_chance = 0.1
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair_chance = 0.0
        AND speculative_retry = '99PERCENTILE';
    
    CREATE TABLE IF NOT EXISTS delta_snapshot.snapshots (
      persistence_id text,
      sequence_nr bigint,
      timestamp bigint,
      ser_id int,
      ser_manifest text,
      snapshot_data blob,
      snapshot blob,
      meta_ser_id int,
      meta_ser_manifest text,
      meta blob,
      PRIMARY KEY (persistence_id, sequence_nr))
      WITH CLUSTERING ORDER BY (sequence_nr DESC) AND gc_grace_seconds =864000
      AND compaction = {
        'class' : 'SizeTieredCompactionStrategy',
        'enabled' : true,
        'tombstone_compaction_interval' : 86400,
        'tombstone_threshold' : 0.2,
        'unchecked_tombstone_compaction' : false,
        'bucket_high' : 1.5,
        'bucket_low' : 0.5,
        'max_threshold' : 32,
        'min_threshold' : 4,
        'min_sstable_size' : 50
        };
    
  8. Check and update the necessary configuration environment variables/JVM properties according to the configuration migration guide (see the Configuration section below).
  9. Deploy the new service image (tag 1.4) for delta and run it with the MIGRATE_V13_TO_V14=true and REPAIR_FROM_MESSAGES=true environment variables. This instructs the service to copy the messages tables from the iam, admin and kg keyspaces into the delta keyspace. Afterwards, the tag_views table and its related tables are properly initialized.

  10. Once the service is up and running, and you have verified the resources are accessible, you can proceed to delete the previous keyspaces:

    DROP KEYSPACE iam;
    DROP KEYSPACE admin;
    DROP KEYSPACE kg;
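
The namespace deletion in step 4 is destructive, so a dry run may be useful first. The following is a minimal sketch that reuses the filter pipeline from step 4 but only prints the namespace URLs instead of deleting them. The function name list_namespaces is hypothetical and not part of Blazegraph or any other tool.

```shell
# Reads the Blazegraph namespace description from stdin and prints one
# namespace URL per line, applying the same filters as step 4.
# list_namespaces is a hypothetical helper name.
list_namespaces() {
  grep sparqlEndpoint \
    | grep -o 'rdf:resource="[^"]*' \
    | sed 's/rdf:resource="//' \
    | sed 's#/sparql$##' \
    | grep -v kb \
    | grep -v LBS
}

# Dry run: print the namespaces without deleting anything.
# curl -s 'http://{blazegraph_host}/blazegraph/namespace?describe-each-named-graph=false' | list_namespaces
```

Note that the kb and LBS filters match anywhere in the URL, so a namespace whose URL merely contains one of those strings is also skipped.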
    
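The space requirement in step 6 can be estimated before starting. The sketch below only does the arithmetic: it sums the per-keyspace sizes it is given (in KB) and doubles the result. The helper name required_kb is hypothetical, and the data directory in the commented usage line (/var/lib/cassandra/data) is an assumption that depends on how Cassandra was installed.

```shell
# Prints the minimum space (in KB) to allocate: twice the sum of the given
# per-keyspace sizes, following the rule of thumb in step 6.
# required_kb is a hypothetical helper name.
required_kb() {
  total=0
  for size in "$@"; do
    total=$((total + size))
  done
  echo $((total * 2))
}

# Example, assuming the data directory is /var/lib/cassandra/data
# (adjust to your installation):
# required_kb $(for ks in iam admin kg; do du -sk "/var/lib/cassandra/data/$ks" | cut -f1; done)
```

The sizes reported by du are per node, so on a multi-node cluster the check would need to be repeated on each node.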

Configuration

This section provides a table mapping the configuration environment variables from 1.3 (and older versions) to their 1.4 equivalents. Only the most relevant configuration properties are described.

There are 2 ways to inject your own configuration:

  - Using JVM properties as arguments when running the service: -D{property}. For example: -Dapp.instance.interface="127.0.0.1".
  - Setting CONFIG_FORCE_ style environment variables. To enable this style of configuration, the JVM property -Dconfig.override_with_env_vars=true must be set. For example: CONFIG_FORCE_app_instance_interface="127.0.0.1".
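
The CONFIG_FORCE_ names follow a mechanical rule, so they can be derived for properties that are not listed in this guide. As documented for the Typesafe Config library, the prefix is stripped and then a single underscore is read as a dot, a double underscore as a dash, and a triple underscore as a literal underscore. The sketch below applies that rule in reverse; the function name to_config_force is hypothetical.

```shell
# Converts a JVM property name into its CONFIG_FORCE_ environment variable
# name. The order of substitutions matters: literal underscores are escaped
# first, then dashes, then dots.
# to_config_force is a hypothetical helper name.
to_config_force() {
  printf 'CONFIG_FORCE_%s\n' \
    "$(printf '%s' "$1" | sed -e 's/_/___/g' -e 's/-/__/g' -e 's/\./_/g')"
}

# Example:
# to_config_force app.http.public-uri
```

For app.http.public-uri this yields CONFIG_FORCE_app_http_public__uri, which matches the corresponding entry in the configuration table.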

Note that there has been one important change which cannot be mapped using the following table: the service account token.

In the 1.4 release, IAM_SA_TOKEN is no longer used. Instead, the JVM properties app.service-account-caller.realm, app.service-account-caller.subject and app.service-account-caller.groups should be set to match the identity of the service account. If these properties are not present, the service account identity is set to anonymous.
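
As an illustration, the service account properties could be assembled into JVM flags as sketched below. The helper name service_account_opts and the example values are hypothetical. Passing the groups as indexed properties (groups.0, groups.1, ...) is an assumption based on how HOCON converts numerically indexed keys into arrays; verify it against your deployment.

```shell
# Prints the JVM flags that set the service account identity.
# Usage: service_account_opts REALM SUBJECT [GROUP...]
# service_account_opts is a hypothetical helper name; the indexed form of
# the groups property is an assumption (see the text above).
service_account_opts() {
  realm=$1
  subject=$2
  shift 2
  opts="-Dapp.service-account-caller.realm=$realm -Dapp.service-account-caller.subject=$subject"
  i=0
  for group in "$@"; do
    opts="$opts -Dapp.service-account-caller.groups.$i=$group"
    i=$((i + 1))
  done
  echo "$opts"
}

# Example with made-up values:
# service_account_opts myrealm service-account admins
```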

| Description | Env. variable 1.3 | Property 1.4 | Env. variable override 1.4 |
| --- | --- | --- | --- |
| Default timeout for Actors interaction | DEFAULT_ASK_TIMEOUT | app.default-ask-timeout | CONFIG_FORCE_app_default__ask__timeout |
| The service binding interface | BIND_INTERFACE | app.instance.interface | CONFIG_FORCE_app_instance_interface |
| The service binding port | BIND_PORT | app.http.port | CONFIG_FORCE_app_http_port |
| The service Uri Path prefix | HTTP_PREFIX | app.http.prefix | CONFIG_FORCE_app_http_prefix |
| The service publicly exposed Uri | PUBLIC_URI | app.http.public-uri | CONFIG_FORCE_app_http_public__uri |
| The service account realm (when different than anonymous) |  | app.service-account-caller.realm | CONFIG_FORCE_app_service__account__caller_realm |
| The service account subject (when different than anonymous) |  | app.service-account-caller.subject | CONFIG_FORCE_app_service__account__caller_subject |
| The service account groups (when desired). Array field |  | app.service-account-caller.groups | CONFIG_FORCE_app_service__account__caller_groups |
| The cassandra contact point (1) | CASSANDRA_CONTACT_POINT1 | datastax-java-driver.basic.contact-points.1 |  |
| The cassandra contact point (2) | CASSANDRA_CONTACT_POINT2 | datastax-java-driver.basic.contact-points.2 |  |
| The cassandra contact point (3) | CASSANDRA_CONTACT_POINT3 | datastax-java-driver.basic.contact-points.3 |  |
| The cassandra username | CASSANDRA_USERNAME | datastax-java-driver.advanced.auth-provider.username | CONFIG_FORCE_datastax__java__driver_advanced_auth__provider_username |
| The cassandra password | CASSANDRA_PASSWORD | datastax-java-driver.advanced.auth-provider.password | CONFIG_FORCE_datastax__java__driver_advanced_auth__provider_password |
| The time after which actors without interaction are shut down | PASSIVATION_TIMEOUT | app.cluster.passivation-timeout | CONFIG_FORCE_app_cluster_passivation__timeout |
| The Caches (Akka Distributed Data) request timeout | REPLICATION_TIMEOUT | app.cluster.replication-timeout | CONFIG_FORCE_app_cluster_replication__timeout |
| The number of cluster shards | SHARDS | app.cluster.shards | CONFIG_FORCE_app_cluster_shards |
| The seeds to use for joining a cluster | SEED_NODES | app.cluster.seeds | CONFIG_FORCE_app_cluster_seeds |
| The default page size when paginating on listings | PAGINATION_DEFAULT_SIZE | app.pagination.default-size | CONFIG_FORCE_app_pagination_default__size |
| The maximum page size allowed when paginating on listings | PAGINATION_MAX_SIZE | app.pagination.size-limit | CONFIG_FORCE_app_pagination_size__limit |
| The default indexing number of events taken per batch | INDEXING_BATCH | app.indexing.batch | CONFIG_FORCE_app_indexing_batch |
| The default indexing amount of time to wait for the batch events | INDEXING_BATCH_TIMEOUT | app.indexing.batch-timeout | CONFIG_FORCE_app_indexing_batch__timeout |
| The default indexing retry strategy | INDEXING_RETRY_STRATEGY | app.indexing.retry.strategy | CONFIG_FORCE_app_indexing_retry_strategy |
| The default indexing first retry delay | INDEXING_RETRY_INITIAL_DELAY | app.indexing.retry.initial-delay | CONFIG_FORCE_app_indexing_retry_initial__delay |
| The default indexing maximum retry delay | INDEXING_RETRY_MAX_DELAY | app.indexing.retry.max-delay | CONFIG_FORCE_app_indexing_retry_max__delay |
| The default indexing maximum number of retries | INDEXING_RETRY_MAX_RETRIES | app.indexing.retry.max-retries | CONFIG_FORCE_app_indexing_retry_max__retries |
| The default number of events after which the indexing progress is persisted | INDEXING_PROGRESS_EVENTS | app.indexing.progress.persist-after-processed | CONFIG_FORCE_app_indexing_progress_persist__after__processed |
| The default amount of time after which the indexing progress is persisted | INDEXING_PROGRESS_TIME | app.indexing.progress.max-time-window | CONFIG_FORCE_app_indexing_progress_max__time__window |
| Default number of shards for aggregates | AGGREGATE_SHARDS | app.aggregate.shards | CONFIG_FORCE_app_aggregate_shards |
| The amount of time ACLs are kept in memory after being accessed | ACLS_AGGREGATE_LAST_INTERACTION_PASSIVATION_TIMEOUT | app.acls.aggregate.passivation.lapsed-since-last-interaction | CONFIG_FORCE_app_acls_aggregate_passivation_lapsed__since__last__interaction |
| The amount of time permissions are kept in memory after being accessed | PERMISSIONS_AGGREGATE_LAST_INTERACTION_PASSIVATION_TIMEOUT | app.permissions.aggregate.passivation.lapsed-since-last-interaction | CONFIG_FORCE_app_permissions_aggregate_passivation_lapsed__since__last__interaction |
| The amount of time Realms are kept in memory after being accessed | REALMS_AGGREGATE_LAST_INTERACTION_PASSIVATION_TIMEOUT | app.realms.aggregate.passivation.lapsed-since-last-interaction | CONFIG_FORCE_app_realms_aggregate_passivation_lapsed__since__last__interaction |
| The amount of time Groups are kept in memory after being accessed | GROUPS_CACHE_LAST_INTERACTION_STOP_TIMEOUT | app.groups.invalidation.lapsed-since-last-interaction | CONFIG_FORCE_app_groups_invalidation_lapsed__since__last__interaction |
| The amount of time Organizations are kept in memory after being accessed | AGGREGATE_LAST_INTERACTION_PASSIVATION_TIMEOUT | app.organizations.aggregate.passivation.lapsed-since-last-interaction | CONFIG_FORCE_app_organizations_aggregate_passivation_lapsed__since__last__interaction |
| The amount of time Projects are kept in memory after being accessed | AGGREGATE_LAST_INTERACTION_PASSIVATION_TIMEOUT | app.projects.aggregate.passivation.lapsed-since-last-interaction | CONFIG_FORCE_app_projects_aggregate_passivation_lapsed__since__last__interaction |
| The path where the DiskStorage reads/writes files | VOLUME_PATH | app.storage.disk.volume | CONFIG_FORCE_app_storage_disk_volume |
| The maximum allowed upload size (in bytes) for DiskStorage | DISK_MAX_FILE_SIZE | app.storage.disk.max-file-size | CONFIG_FORCE_app_storage_disk_max__file__size |
| The default RemoteDiskStorage endpoint | REMOTE_DISK_DEFAULT_ENDPOINT | app.storage.remote-disk.default-endpoint | CONFIG_FORCE_app_storage_remote__disk_default__endpoint |
| The default RemoteDiskStorage Uri Path prefix | REMOTE_DISK_DEFAULT_ENDPOINT_PREFIX | app.storage.remote-disk.default-endpoint-prefix | CONFIG_FORCE_app_storage_remote__disk_default__endpoint__prefix |
| The default RemoteDiskStorage Bearer Token | REMOTE_DISK_DEFAULT_CREDENTIALS | app.storage.remote-disk.default-credentials | CONFIG_FORCE_app_storage_remote__disk_default__credentials |
| The maximum allowed upload size (in bytes) for RemoteDiskStorage | REMOTE_DISK_MAX_FILE_SIZE | app.storage.remote-disk.max-file-size | CONFIG_FORCE_app_storage_remote__disk_max__file__size |
| The maximum allowed upload size (in bytes) for S3Storage | S3_MAX_FILE_SIZE | app.storage.amazon.max-file-size | CONFIG_FORCE_app_storage_amazon_max__file__size |
| The Blazegraph endpoint | SPARQL_BASE_URI | app.sparql.base | CONFIG_FORCE_app_sparql_base |
| The Blazegraph query retry strategy | QUERYING_SPARQL_RETRY_STRATEGY | app.sparql.query.retry.strategy | CONFIG_FORCE_app_sparql_query_retry_strategy |
| The Blazegraph query first retry delay | QUERYING_SPARQL_RETRY_INITIAL_DELAY | app.sparql.query.retry.initial-delay | CONFIG_FORCE_app_sparql_query_retry_initial__delay |
| The Blazegraph query maximum retry delay | QUERYING_SPARQL_RETRY_MAX_DELAY | app.sparql.query.retry.max-delay | CONFIG_FORCE_app_sparql_query_retry_max__delay |
| The Blazegraph query maximum number of retries | QUERYING_SPARQL_RETRY_MAX_RETRIES | app.sparql.query.retry.max-retries | CONFIG_FORCE_app_sparql_query_retry_max__retries |
| The ElasticSearch endpoint | ELASTIC_SEARCH_BASE_URI | app.elastic-search.base | CONFIG_FORCE_app_elastic__search_base |
| The ElasticSearch query retry strategy | QUERYING_ELASTIC_SEARCH_RETRY_STRATEGY | app.elastic-search.query.retry.strategy | CONFIG_FORCE_app_elastic__search_query_retry_strategy |
| The ElasticSearch query first retry delay | QUERYING_ELASTIC_SEARCH_RETRY_INITIAL_DELAY | app.elastic-search.query.retry.initial-delay | CONFIG_FORCE_app_elastic__search_query_retry_initial__delay |
| The ElasticSearch query maximum retry delay | QUERYING_ELASTIC_SEARCH_RETRY_MAX_DELAY | app.elastic-search.query.retry.max-delay | CONFIG_FORCE_app_elastic__search_query_retry_max__delay |
| The ElasticSearch query maximum number of retries | QUERYING_ELASTIC_SEARCH_RETRY_MAX_RETRIES | app.elastic-search.query.retry.max-retries | CONFIG_FORCE_app_elastic__search_query_retry_max__retries |
| The CompositeView allowed maximum number of sources | COMPOSITE_MAX_SOURCES | app.composite.max-sources | CONFIG_FORCE_app_composite_max__sources |
| The CompositeView allowed maximum number of projections | COMPOSITE_MAX_PROJECTIONS | app.composite.max-projections | CONFIG_FORCE_app_composite_max__projections |
| The CompositeView allowed minimum rebuild interval time | COMPOSITE_MIN_REBUILD_INTERVAL | app.composite.min-interval-rebuild | CONFIG_FORCE_app_composite_min__interval__rebuild |
| The CompositeView password used to encrypt token | COMPOSITE_TOKEN_PASSWORD | app.composite.password | CONFIG_FORCE_app_composite_password |
| The CompositeView salt used to encrypt token | COMPOSITE_TOKEN_SALT | app.composite.salt | CONFIG_FORCE_app_composite_salt |
| The amount of time Archives are kept in memory after being accessed | ARCHIVES_CACHE_INVALIDATE_AFTER | app.archives.cache-invalidate-after | CONFIG_FORCE_app_archives_cache__invalidate__after |
| The maximum number of resources allowed on an Archive | ARCHIVES_MAX_RESOURCES | app.archives.max-resources | CONFIG_FORCE_app_archives_max__resources |
| The maximum allowed service payload size (except for files) | AKKA_HTTP_SERVER_MAX_CONTENT_LENGTH | akka.http.server.parsing.max-content-length | CONFIG_FORCE_akka_http_server_parsing_max__content__length |
| The maximum allowed client payload size (used for storages) | AKKA_HTTP_CLIENT_MAX_CONTENT_LENGTH | akka.http.client.parsing.max-content-length | CONFIG_FORCE_akka_http_client_parsing_max__content__length |
| The maximum allowed number of connections | AKKA_HTTP_MAX_CONNECTIONS | akka.http.host-connection-pool.max-connections | CONFIG_FORCE_akka_http_host__connection__pool_max__connections |
| The maximum allowed concurrently opened requests | AKKA_HTTP_MAX_OPEN_REQUESTS | akka.http.host-connection-pool.max-open-requests | CONFIG_FORCE_akka_http_host__connection__pool_max__open__requests |
| The log level for Akka logs | AKKA_LOG_LEVEL | akka.loglevel | CONFIG_FORCE_akka_loglevel |
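
When updating a deployment, it may help to check which 1.3 environment variables are still set in the service environment. The sketch below reports each one it finds together with the 1.4 property that replaces it. Only a handful of rows from the table are included, and the function name report_legacy_vars is hypothetical.

```shell
# Reports 1.3 environment variables that are still set (exported), together
# with the 1.4 property that replaces them. The mapping covers only a few
# rows of the table; extend it as needed.
# report_legacy_vars is a hypothetical helper name.
report_legacy_vars() {
  while read -r old new; do
    # printenv exits non-zero when the variable is not set.
    if printenv "$old" >/dev/null; then
      echo "$old -> $new"
    fi
  done <<'EOF'
BIND_INTERFACE app.instance.interface
BIND_PORT app.http.port
HTTP_PREFIX app.http.prefix
PUBLIC_URI app.http.public-uri
SPARQL_BASE_URI app.sparql.base
ELASTIC_SEARCH_BASE_URI app.elastic-search.base
EOF
}

# Example: run in the same environment the service is started from.
# report_legacy_vars
```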