Replicating Kafka Instance Data

Create a Smart Connect task to copy data unidirectionally or bidirectionally between two Kafka instances.

Note

Data in the source Kafka instance is synchronized to the target Kafka instance in real time.

Notes and Constraints

  • This function is unavailable for single-node Kafka 3.x instances.

  • A maximum of 18 Smart Connect tasks can be created for an instance.

  • When you copy Kafka data, the two Kafka instances must be connected through the intranet. If they are in different VPCs, connect the network by referring to Accessing Kafka Using a VPC Endpoint Across VPCs or VPC Peering Connection.

  • After a Smart Connect task is created, task parameters cannot be modified.

Prerequisites

Procedure

  1. Log in to the console.

  2. Click the region name in the upper left corner and select a region.

    Note

    Select the region where your Kafka instance is located.

  3. Click Service List and choose Application > Distributed Message Service. The Kafka instance list is displayed.

  4. Click the desired Kafka instance to view its details.

  5. In the navigation pane, choose Smart Connect.

  6. On the displayed page, click Create Task.

  7. For Task Name, enter a unique Smart Connect task name. The name must contain 4 to 64 characters, including only letters, digits, hyphens (-), and underscores (_).

  8. For Task Type, select Copy Kafka data.

  9. For Start Immediately, specify whether to execute the task immediately after it is created. By default, the task is executed immediately. If you disable this option, you can start the task later from the task list.

  10. In the Current Kafka area, set the instance alias. The alias must contain 1 to 20 characters, including only letters, digits, hyphens (-), and underscores (_).

    The instance alias is used in the following scenarios:

    • If you enable Rename Topics and select Push or Both for Sync Direction, the alias of the current Kafka instance is added as a prefix to the topic names on the peer Kafka instance. For example, if the alias of the current Kafka instance is A and the topic on the peer Kafka instance is named test, the renamed topic will be A.test.

    • After a Smart Connect task of Kafka data replication is created, a topic named mm2-offset-syncs.<peer instance alias>.internal is generated on the current Kafka instance. If the task has Sync Consumer Offset enabled and uses Pull or Both for Sync Direction, a topic named <peer instance alias>.checkpoints.internal is also created on the current Kafka instance. These two topics store internal data. If they are deleted, data replication will fail.

  11. In the Peer Kafka area, configure the following parameters.

    Table 1 Peer Kafka parameters

    Parameter

    Description

    Instance Alias

    The alias must contain 1 to 20 characters, including only letters, digits, hyphens (-), and underscores (_).

    The instance alias is used in the following scenarios:

    • If you enable Rename Topics and select Pull or Both for Sync Direction, the alias of the peer Kafka instance is added as a prefix to the topic names on the current Kafka instance. For example, if the alias of the peer Kafka instance is B and the topic on the current Kafka instance is named test01, the renamed topic will be B.test01.

    • After a Smart Connect task of Kafka data replication is created, if the task has Sync Consumer Offset enabled and uses Push or Both for Sync Direction, a topic named <current instance alias>.checkpoints.internal is also created on the peer Kafka instance. This topic stores internal data. If it is deleted, data replication will fail.

    Config Type

    Options:

    • Kafka address: Enter Kafka instance addresses.

    • Instance name: Select an existing Kafka instance.

    Instance Name

    Set this parameter when Config Type is set to Instance name.

    Select an existing Kafka instance from the drop-down list.

    The peer Kafka instance and the current Kafka instance must be in the same VPC. Otherwise, the peer instance cannot be identified and selected here.

    Kafka Address

    Set this parameter when Config Type is set to Kafka address.

    Enter the IP addresses and port numbers for connecting to the Kafka instance.

    When you copy Kafka data, the two Kafka instances must be connected through the intranet. If they are in different VPCs, connect the network by referring to Accessing Kafka Using a VPC Endpoint Across VPCs or VPC Peering Connection.

    Authentication

    Options:

    • SASL_SSL: SASL_SSL is enabled on the Kafka instance. Clients connect using SASL authentication, and data is encrypted with the SSL certificate.

    • SASL_PLAINTEXT: SASL_PLAINTEXT is enabled on the Kafka instance. Clients connect using SASL authentication, and data is transmitted in plaintext.

    • PLAINTEXT: The Kafka instance does not use authentication.

    These options correspond to standard Kafka client security settings; a client configuration sketch is provided after this table.

    Authentication Mechanism

    Set this parameter when Authentication is set to SASL_SSL/SASL_PLAINTEXT.

    • SCRAM-SHA-512: uses hashed credentials to verify usernames and passwords. SCRAM-SHA-512 is more secure than PLAIN.

    • PLAIN: a simple username and password verification mechanism.

    Username

    Set this parameter when Authentication is set to SASL_SSL/SASL_PLAINTEXT.

    The username set when the instance or the user was created.

    Password

    Set this parameter when Authentication is set to SASL_SSL/SASL_PLAINTEXT.

    The password set when the instance or the user was created.

    Important

    After a Smart Connect task is created, changing the authentication method, authentication mechanism, or password of the peer instance causes the synchronization task to fail. In this case, delete the current task and create a new one.
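    The Authentication, Authentication Mechanism, Username, and Password parameters above correspond to the standard security settings of a Kafka client. The following is a minimal sketch of how a client on the same network could be configured to reach a SASL-enabled peer instance; the address, username, password, and truststore path are placeholders, not values from this document.

```java
import java.util.Properties;

// Minimal sketch: standard Kafka client security settings that correspond to the
// Authentication (SASL_SSL), Authentication Mechanism (SCRAM-SHA-512), Username,
// and Password parameters of the peer instance. All values are placeholders.
public class PeerSecurityProps {
    public static Properties saslScramProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.0.20:9095");   // placeholder peer Kafka address
        props.put("security.protocol", "SASL_SSL");            // or SASL_PLAINTEXT
        props.put("sasl.mechanism", "SCRAM-SHA-512");          // or PLAIN
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"<username>\" password=\"<password>\";");
        // For SASL_SSL, the instance's SSL certificate usually also has to be trusted:
        // props.put("ssl.truststore.location", "/path/to/client.truststore.jks");
        // props.put("ssl.truststore.password", "<truststore password>");
        return props;
    }
}
```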

  12. In the Rules area, configure the following parameters.

    Table 2 Parameters for configuring data replication rules

    Parameter

    Description

    Sync Direction

    There are three synchronization directions:

    • Pull: Replicates data from the peer Kafka instance to the current Kafka instance.

    • Push: Replicates data from the current Kafka instance to the peer Kafka instance.

    • Both: Replicates data between the two Kafka instances in both directions.

    Topics

    Specify the topics whose data is to be replicated.

    • Regular expression: Use a regular expression to match topic names. For example, the expression log-.* matches topic names starting with log-.

    • Enter/Select: Enter topic names. To enter multiple topic names, press Enter after entering each topic name. You can also select topics from the drop-down list. A maximum of 20 topics can be entered or selected.

    Note

    Data of topics whose names end with "internal" (for example, topic.internal) will not be synchronized.

    Tasks

    Number of data replication tasks. The default value is 2. You are advised to use the default value.

    If Sync Direction is set to Both, the actual number of tasks will be twice the number of tasks you configure here.

    Rename Topics

    The alias of the source Kafka instance is added as a prefix to the target topic name to form the new target topic name. For example, if the alias of the source instance is A and the target topic name is test, the renamed target topic will be A.test.

    If you select Both for Sync Direction, enable Rename Topics to prevent infinite replication. A consumer sketch that handles renamed topics is provided after this table.

    Add Source Header

    The messages replicated to the target topic carry a header that records the message source.

    If you select Both for Sync Direction, Add Source Header is enabled by default to prevent infinite replication.

    Sync Consumer Offset

    Enable this option to synchronize the consumer offset to the target Kafka instance.

    Important

    After enabling Sync Consumer Offset, pay attention to the following:

    • The source and target Kafka instances cannot consume messages at the same time. Otherwise, the synchronized consumer offset will be abnormal.

    • The consumer offset is synchronized every minute. As a result, the consumer offset on the target end may be slightly smaller than that on the source end, and some messages are repeatedly consumed. The service logic of the consumer client must be able to handle repeated consumption.

    • The offset synchronized from the source end is not identical to the offset on the target end; instead, there is a mapping between the two. If the consumer offset is maintained by the consumer client itself, the client does not obtain the consumer offset from the target Kafka instance after consumption is switched from the source Kafka instance to the target Kafka instance. As a result, the offset may be incorrect or the consumer offset may be reset. A sketch of translating offsets through this mapping is provided at the end of this section.

    Replicas

    Number of topic replicas when a topic is automatically created in the peer instance. The value of this parameter cannot exceed the number of brokers in the peer instance.

    This parameter takes precedence over the default.replication.factor parameter set in the peer instance.

    Start Offset

    Options:

    • Minimum offset: Replication starts from the earliest data in the topic.

    • Maximum offset: Replication starts from the latest data; only messages produced after the task starts are replicated.

    Compression

    Compression algorithm to use for copying messages.

    Topic Mapping

    Customize the target topic name.

    A maximum of 20 mappings can be configured. Rename Topics and Topic Mapping cannot be configured at the same time.

    Important

    • When creating a bidirectional replication task, enable Rename Topics or Add Source Header to prevent infinite replication. If the same topic is configured for both a pull task and a push task between two instances (forming bidirectional replication) and neither Rename Topics nor Add Source Header is enabled for the two tasks, data will be replicated infinitely.

    • If you create two or more tasks with the same configuration and enable Sync Consumer Offset for them, data will be repeatedly replicated and the consumer offset of the target topic will be abnormal.
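    Because Rename Topics prefixes replicated topics with the source instance alias (for example, A.test), a consumer that needs to read both the local topic and its replicated copy can subscribe by pattern. The following is a minimal sketch assuming a topic named test, an alias prefix such as A, and placeholder broker address and consumer group; adjust these to your own setup.

```java
import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RenamedTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.0.30:9092"); // placeholder address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");                 // placeholder group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Matches the local topic "test" and any renamed replica such as "A.test",
            // where "A" is the source instance alias added by Rename Topics.
            consumer.subscribe(Pattern.compile("([A-Za-z0-9_-]+\\.)?test"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("topic=%s partition=%d offset=%d%n",
                        record.topic(), record.partition(), record.offset());
            }
        }
    }
}
```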

  13. (Optional) In the lower right corner of the page, click Check to test the connectivity between the Kafka instances.

    If "Connectivity check passed." is displayed, the Kafka instances are connected.

  14. Click Create. The Smart Connect task list page is displayed. The message "Task xxx was created successfully." is displayed in the upper right corner of the page.

    Important

    • After a Smart Connect task of Kafka data replication is created, a topic named mm2-offset-syncs.<peer instance alias>.internal is generated on the current Kafka instance. If the task has Sync Consumer Offset enabled and uses Pull or Both for Sync Direction, a topic named <peer instance alias>.checkpoints.internal is also created on the current Kafka instance. These two topics store internal data. If they are deleted, data replication will fail.

    • After a Smart Connect task of Kafka data replication is created, if the task has Sync Consumer Offset enabled and uses Push or Both for Sync Direction, a topic named <current instance alias>.checkpoints.internal is also created on the peer Kafka instance. This topic stores internal data. If it is deleted, data replication will fail.
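    The checkpoints.internal topics described above record the mapping between source and target consumer offsets. Assuming the replication task follows the MirrorMaker 2 checkpoint convention that these topic names suggest, a consumer client that maintains its own offsets could translate them with Kafka's connect-mirror-client library, as in the minimal sketch below. The target address, source instance alias (A), and consumer group name are placeholders.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

public class TranslateOffsetsSketch {
    public static void main(String[] args) throws Exception {
        // Connection settings for the TARGET instance, which holds the
        // "<source instance alias>.checkpoints.internal" topic. Placeholder address.
        Map<String, Object> targetProps = new HashMap<>();
        targetProps.put("bootstrap.servers", "192.168.0.30:9092");

        // "A" is the alias of the source instance configured in the Smart Connect task;
        // "demo-group" is a placeholder consumer group.
        Map<TopicPartition, OffsetAndMetadata> translated = RemoteClusterUtils.translateOffsets(
                targetProps, "A", "demo-group", Duration.ofSeconds(30));

        // For each replicated partition, the result is the target-side offset that corresponds
        // to the group's last checkpointed offset on the source side.
        translated.forEach((tp, offset) ->
                System.out.printf("%s -> %d%n", tp, offset.offset()));
    }
}
```

    A consumer switching from the source instance to the target instance could then seek to or commit these translated offsets instead of reusing its source-side offsets.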