CDP CLI for Replication Manager

Ashwin Harish
Engineering@Cloudera
Aug 16, 2023

Replication Manager is a service in CDP Public Cloud that can be used to create replication policies to copy and migrate data from CDH clusters (HDFS, Hive, and HBase data) and CDP Private Cloud Base clusters (HDFS, Hive external tables, and HBase data) to CDP Public Cloud clusters. It also supports replicating HDFS data from cloud storage to classic clusters (CDH or CDP Private Cloud Base clusters), and Hive external tables to Data Hubs. The supported cloud storage services include Amazon S3 and Microsoft Azure ADLS Gen2 (ABFS).

CDP CLI can be used to create and manage HDFS and Hive replication policies in Replication Manager. You can download CDP CLI from the ‘Help’ tab of the Management Console in your Public Cloud account.

Creating HDFS and Hive policy

To create an HDFS or Hive policy in Replication Manager using CDP CLI:

  1. Generate the policy definition JSON template using the command cdp --profile <profile_name> replicationmanager create-policy --generate-cli-skeleton
  2. Populate the parameters in the generated JSON to create the required policy definition.
  3. Run the command cdp --profile <profile_name> replicationmanager create-policy --cli-input-json <policy definition> to create the replication policy.
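
The three steps above can be sketched as a small shell workflow. This is a minimal sketch, not an official script: the variables CDP_BIN, PROFILE, and POLICY_FILE and the two function names are hypothetical, introduced only for this example, and it assumes the policy definition is passed to --cli-input-json as a JSON string, as step 3 describes.

```shell
# Hypothetical settings for this sketch; adjust to your environment.
CDP_BIN="${CDP_BIN:-cdp}"                  # CDP CLI executable
PROFILE="${PROFILE:-default}"              # CDP CLI profile name
POLICY_FILE="${POLICY_FILE:-policy.json}"  # where the policy definition is kept

# Step 1: write the policy definition JSON template to a file.
generate_skeleton() {
  "$CDP_BIN" --profile "$PROFILE" replicationmanager create-policy \
    --generate-cli-skeleton > "$POLICY_FILE"
}

# Step 3: submit the edited file contents as the JSON string.
create_policy() {
  "$CDP_BIN" --profile "$PROFILE" replicationmanager create-policy \
    --cli-input-json "$(cat "$POLICY_FILE")"
}

# Usage (step 2, editing the file, happens between the two calls):
#   generate_skeleton
#   "${EDITOR:-vi}" "$POLICY_FILE"
#   create_policy
```

Keeping the definition in a file makes it easy to review or version the policy before it is submitted.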

Template JSON for HDFS and Hive policy

{
  "name": "string",
  "type": "FS"|"HIVE",
  "sourceDataset": {
    "hdfsArguments": {
      "path": "string",
      "replicationStrategy": "DYNAMIC"|"STATIC",
      "errorHandling": {
        "skipChecksumChecks": true|false,
        "skipListingChecksumChecks": true|false,
        "abortOnError": true|false,
        "abortOnSnapshotDiffFailures": true|false
      },
      "preserve": {
        "blockSize": true|false,
        "replicationCount": true|false,
        "permissions": true|false,
        "extendedAttributes": true|false
      },
      "deletePolicy": "KEEP_DELETED_FILES"|"DELETE_TO_TRASH"|"DELETE_PERMANENTLY",
      "alert": {
        "onFailure": true|false,
        "onStart": true|false,
        "onSuccess": true|false,
        "onAbort": true|false
      },
      "exclusionFilters": [
        "string", ...
      ]
    },
    "hiveArguments": {
      "databasesAndTables": [
        {
          "database": "string",
          "tablesIncludeRegex": "string",
          "tablesExcludeRegex": "string"
        },
        ...
      ],
      "sentryPermissions": "INCLUDE"|"EXCLUDE",
      "skipUrlPermissions": true|false,
      "numThreads": integer
    }
  },
  "frequencyInSec": integer,
  "targetDataset": "string",
  "cloudCredential": "string",
  "sourceCluster": "string",
  "targetCluster": "string",
  "startTime": "string",
  "endTime": "string",
  "distcpMaxMaps": integer,
  "distcpMapBandwidth": integer,
  "queueName": "string",
  "tdeSameKey": true|false,
  "description": "string",
  "enableSnapshotBasedReplication": true|false,
  "cloudEncryptionAlgorithm": "string",
  "cloudEncryptionKey": "string",
  "plugins": [
    "string", ...
  ],
  "hiveExternalTableBaseDirectory": "string",
  "cmPolicySubmitUser": {
    "userName": "string",
    "sourceUser": "string"
  }
}

Sample command for creating HDFS policy

> cdp --profile hrt_qa_int replicationmanager create-policy \
  --cluster-crn crn:cdp:classicclusters:us-west-0:x0xxxx0x-xxxx-0x0x-x000-000000xx0000:cluster:00xxx0x0-xx00-000x-xxx0-000000xx00xx \
  --policy-name basic_repl_CDPCLI_1662885927_pol_1662885927 \
  --policy-definition '{
    "type": "FS",
    "name": "basic_repl_CDPCLI_1662885927_pol_1662885927",
    "sourceCluster": "hatuitnxmn$Cluster 1",
    "sourceDataset": {
      "hdfsArguments": {
        "path": "/user/hive/<path>/basic_repl_CDPCLI_1662885927"
      }
    },
    "targetDataset": "abfs://{filesystem}@{abfsaccount}.dfs.core.windows.net/{path}",
    "frequencyInSec": 300,
    "queueName": "default",
    "cloudCredential": "<>_***_<>",
    "enableSnapshotBasedReplication": false,
    "cmPolicySubmitUser": {
      "sourceUser": "<user>",
      "userName": "<user>"
    }
  }'

Sample command for creating Hive policy

> cdp --profile hrt_qa_int replicationmanager create-policy \
  --cluster-crn crn:cdp:datalake:x0xxxx0x-xxxx-0x0x-x000-000000xx0000:cluster:00xxx0x0-xx00-000x-xxx0-000000xx00xx \
  --policy-name basic_hive_replication_cdpcli_1662885927_hive \
  --policy-definition '{
    "type": "HIVE",
    "name": "basic_hive_replication_cdpcli_1662885927_hive",
    "sourceCluster": "hatuitnxmn$Cluster 1",
    "sourceDataset": {
      "hiveArguments": {
        "databasesAndTables": [
          {
            "database": "basic_hive_replication_cdpcli_1662885927",
            "tablesIncludeRegex": ".*",
            "tablesExcludeRegex": ""
          }
        ],
        "sentryPermissions": "EXCLUDE"
      },
      "hdfsArguments": {
        "path": "abfs://{filesystem}@{abfsaccount}.dfs.core.windows.net/{path}",
        "replicationStrategy": "DYNAMIC",
        "errorHandling": {
          "skipChecksumChecks": false,
          "skipListingChecksumChecks": false,
          "abortOnError": false,
          "abortOnSnapshotDiffFailures": false
        },
        "preserve": {
          "blockSize": true,
          "replicationCount": true,
          "permissions": true,
          "extendedAttributes": false
        },
        "deletePolicy": "DELETE_PERMANENTLY",
        "alert": {
          "onFailure": false,
          "onStart": false,
          "onSuccess": false,
          "onAbort": false
        }
      }
    },
    "targetCluster": "dmx-i8gjmb$dmx-i8gjmb",
    "targetDataset": "basic_hive_replication_cdpcli_1662885927",
    "frequencyInSec": 300,
    "queueName": "<queue_name>",
    "cloudCredential": "<>_***_<>",
    "cmPolicySubmitUser": {
      "sourceUser": "<user>",
      "userName": "<user>"
    },
    "hiveExternalTableBaseDirectory": "abfs://{filesystem}@{abfsaccount}.dfs.core.windows.net/{path}"
  }'

Managing HDFS and Hive policy

The following operations can be performed on an existing policy:

  • To suspend a running policy job, run the following command:
    cdp --profile <profile_name> replicationmanager suspend-policy --cluster-crn <target_cluster_crn> --policy-name <policy_name>
  • To activate a suspended policy job, run the following command:
    cdp --profile <profile_name> replicationmanager activate-policy --cluster-crn <target_cluster_crn> --policy-name <policy_name>
  • To delete a replication policy, run the following command:
    cdp --profile <profile_name> replicationmanager delete-policy --cluster-crn <target_cluster_crn> --policy-name <policy_name>
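
Because the three management commands above differ only in the operation name, they can be wrapped in one helper. The following is a minimal sketch under that assumption; the function name policy_op and the CDP_BIN variable are hypothetical and not part of CDP CLI.

```shell
# Hypothetical: CDP CLI executable, overridable for dry runs.
CDP_BIN="${CDP_BIN:-cdp}"

# policy_op <suspend|activate|delete> <profile_name> <target_cluster_crn> <policy_name>
policy_op() {
  # Accept only the three operations described above.
  case "$1" in
    suspend|activate|delete) ;;
    *) echo "unknown operation: $1" >&2; return 2 ;;
  esac
  # Build and run the matching <operation>-policy subcommand.
  "$CDP_BIN" --profile "$2" replicationmanager "${1}-policy" \
    --cluster-crn "$3" --policy-name "$4"
}

# Usage:
#   policy_op suspend <profile_name> <target_cluster_crn> <policy_name>
# Setting CDP_BIN=echo prints the arguments instead of calling the CLI.
```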

Additional CDP CLI commands

For the list of additional CDP CLI commands for the replicationmanager service, see the CDPCLI for Replication Manager documentation listed in the references.

References

  1. Installing CDP Client — https://docs.cloudera.com/cdp-public-cloud/cloud/cli/topics/mc-installing-cdp-client.html
  2. CDPCLI for Replication Manager — https://docs.cloudera.com/replication-manager/cloud/reference/topics/rm-pc-cdpcli-overview.html
