Data Lake Insight

Writing Data to OBS Using Flink Jar

Overview

DLI allows you to use a custom JAR package to run Flink jobs and write data to OBS. This section describes how to write processed Kafka data to OBS. Modify the parameters in the example Java code to match your environment.

Environment Preparations

Development tools such as IntelliJ IDEA, as well as the JDK and Maven, have been installed and configured.

Note

  • For details about how to configure the pom.xml file of the Maven project, see "POM file configurations" in Java Example Code.

  • Ensure that the local build environment can access the public network (required to download the Maven dependencies).

Constraints

  • In the left navigation pane of the DLI console, choose Global Configuration > Service Authorization. On the displayed page, select Tenant Administrator (Global service) and click Update.

  • The bucket to which data is written must be an OBS bucket created by a main account.

Java Example Code

  • POM file configurations

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <parent>
            <!-- Group ID assumed here so the POM validates; adjust it to match your parent project. -->
            <groupId>com.dli</groupId>
            <artifactId>Flink-demo</artifactId>
            <version>1.0-SNAPSHOT</version>
        </parent>
        <modelVersion>4.0.0</modelVersion>
    
        <artifactId>flink-kafka-to-obs</artifactId>
    
        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
            <!--Flink version-->
            <flink.version>1.12.2</flink.version>
            <!--JDK version -->
            <java.version>1.8</java.version>
            <!--Scala 2.11 -->
            <scala.binary.version>2.11</scala.binary.version>
            <slf4j.version>2.13.3</slf4j.version>
            <log4j.version>2.10.0</log4j.version>
            <maven.compiler.source>8</maven.compiler.source>
            <maven.compiler.target>8</maven.compiler.target>
        </properties>
    
        <dependencies>
            <!-- flink -->
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-java</artifactId>
                <version>${flink.version}</version>
                <scope>provided</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
                <version>${flink.version}</version>
                <scope>provided</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
                <version>${flink.version}</version>
                <scope>provided</scope>
            </dependency>
    
            <!--  kafka  -->
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-connector-kafka_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
    
            <!--  logging  -->
            <dependency>
                <groupId>org.apache.logging.log4j</groupId>
                <artifactId>log4j-slf4j-impl</artifactId>
                <version>${slf4j.version}</version>
                <scope>provided</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.logging.log4j</groupId>
                <artifactId>log4j-api</artifactId>
                <version>${log4j.version}</version>
                <scope>provided</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.logging.log4j</groupId>
                <artifactId>log4j-core</artifactId>
                <version>${log4j.version}</version>
                <scope>provided</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.logging.log4j</groupId>
                <artifactId>log4j-jcl</artifactId>
                <version>${log4j.version}</version>
                <scope>provided</scope>
            </dependency>
        </dependencies>
    
        <build>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <version>3.3.0</version>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>single</goal>
                            </goals>
                        </execution>
                    </executions>
                    <configuration>
                        <archive>
                            <manifest>
                                <mainClass>com.dli.FlinkKafkaToObsExample</mainClass>
                            </manifest>
                        </archive>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                    </configuration>
                </plugin>
            </plugins>
            <resources>
                <resource>
                    <directory>../main/config</directory>
                    <filtering>true</filtering>
                    <includes>
                        <include>**/*.*</include>
                    </includes>
                </resource>
            </resources>
        </build>
    </project>
    
  • Example code

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.java.utils.ParameterTool;
    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.CheckpointConfig;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;
    import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import java.util.Properties;
    
    /**
     * @author xxx
     * @date 6/26/21
     */
    public class FlinkKafkaToObsExample {
        private static final Logger LOG = LoggerFactory.getLogger(FlinkKafkaToObsExample.class);
    
        public static void main(String[] args) throws Exception {
            LOG.info("Start Kafka2OBS Flink Streaming Source Java Demo.");
            ParameterTool params = ParameterTool.fromArgs(args);
            LOG.info("Params: " + params.toString());
    
            // Kafka connection address
            String bootstrapServers;
            // Kafka consumer group
            String kafkaGroup;
            // Kafka topic
            String kafkaTopic;
            // Consumption policy. This policy is used only when the partition does not have a checkpoint or the checkpoint expires.
            // If a valid checkpoint exists, consumption continues from this checkpoint.
            // When the policy is set to LATEST, the consumption starts from the latest data. This policy will ignore the existing data in the stream.
            // When the policy is set to EARLIEST, the consumption starts from the earliest data. This policy will obtain all valid data in the stream.
            String offsetPolicy;
            // OBS file output path, in the format of obs://bucket/path.
            String outputPath;
            // Checkpoint output path, in the format of obs://bucket/path.
            String checkpointPath;
    
            bootstrapServers = params.get("bootstrap.servers", "xxxx:9092,xxxx:9092,xxxx:9092");
            kafkaGroup = params.get("group.id", "test-group");
            kafkaTopic = params.get("topic", "test-topic");
            offsetPolicy = params.get("offset.policy", "earliest");
            outputPath = params.get("output.path", "obs://bucket/output");
            checkpointPath = params.get("checkpoint.path", "obs://bucket/checkpoint");
    
            try {
                //Create an execution environment.
                StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
                streamEnv.setParallelism(4);
                RocksDBStateBackend rocksDbBackend = new RocksDBStateBackend(checkpointPath, true);
                // Alternative: wrap an explicit FsStateBackend for the checkpoint storage.
                // RocksDBStateBackend rocksDbBackend = new RocksDBStateBackend(new FsStateBackend(checkpointPath), true);
                streamEnv.setStateBackend(rocksDbBackend);
                // Enable Flink checkpointing mechanism. If enabled, the offset information will be synchronized to Kafka.
                streamEnv.enableCheckpointing(300000);
                // Set the minimum interval between two checkpoints.
                streamEnv.getCheckpointConfig().setMinPauseBetweenCheckpoints(60000);
                // Set the checkpoint timeout duration.
                streamEnv.getCheckpointConfig().setCheckpointTimeout(60000);
                // Set the maximum number of concurrent checkpoints.
                streamEnv.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
                // Retain checkpoints when a job is canceled.
                streamEnv.getCheckpointConfig().enableExternalizedCheckpoints(
                        CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
    
                // Source: Connect to the Kafka data source.
                Properties properties = new Properties();
                properties.setProperty("bootstrap.servers", bootstrapServers);
                properties.setProperty("group.id", kafkaGroup);
                properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, offsetPolicy);
                String topic = kafkaTopic;
    
                // Create a Kafka consumer.
                FlinkKafkaConsumer<String> kafkaConsumer =
                        new FlinkKafkaConsumer<>(topic, new SimpleStringSchema(), properties);
                /**
                 * Start reading each partition from the offset committed by the consumer group (group.id in the consumer properties) in the Kafka brokers.
                 * If no committed offset exists for a partition, the auto.offset.reset property determines the starting position.
                 * For details, see https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/connectors/datastream/kafka/.
                 */
                kafkaConsumer.setStartFromGroupOffsets();
    
                // Add the Kafka consumer as the data source.
                DataStream<String> stream = streamEnv.addSource(kafkaConsumer).setParallelism(3).disableChaining();
    
                // Create a file output stream.
                final StreamingFileSink<String> sink = StreamingFileSink
                        // Specify the file output path and row encoding format.
                        .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
                        // Specify the file output path and bulk encoding format. Files are output in parquet format.
                        //.forBulkFormat(new Path(outputPath), ParquetAvroWriters.forGenericRecord(schema))
                        // Specify a custom bucket assigner.
                        .withBucketAssigner(new DateTimeBucketAssigner<>())
                        // Specify the rolling policy.
                        .withRollingPolicy(OnCheckpointRollingPolicy.build())
                        .build();
    
                // Sink: write the Kafka stream to OBS.
                stream.addSink(sink).disableChaining().name("obs");
    
                // stream.print();
                streamEnv.execute();
            } catch (Exception e) {
                LOG.error(e.getMessage(), e);
            }
        }
    }
    
    Table 1 Parameter description

    Parameter          Description                          Example
    bootstrap.servers  Kafka connection address             IP address of Kafka service 1:9092, IP address of Kafka service 2:9092, IP address of Kafka service 3:9092
    group.id           Kafka consumer group                 test-group
    topic              Kafka topic to consume               test-topic
    offset.policy      Kafka offset policy                  earliest
    output.path        OBS path to which data is written    obs://bucket/output
    checkpoint.path    OBS path for checkpoint data         obs://bucket/checkpoint

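    The example code reads these parameters with ParameterTool.fromArgs, which parses --key value pairs. Assuming the sample values above, the argument line passed to the Flink Jar job would look roughly as follows (every value is a placeholder to replace with your own):

        --bootstrap.servers xxxx:9092,xxxx:9092,xxxx:9092 --group.id test-group --topic test-topic --offset.policy earliest --output.path obs://bucket/output --checkpoint.path obs://bucket/checkpoint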
Compiling and Running the Application

After the application is developed, upload the JAR package to DLI by referring to Flink Jar Job Examples and check whether the output data appears in the OBS path.
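With the maven-assembly-plugin configuration in the POM above, a standard mvn clean package build should produce a *-jar-with-dependencies.jar file under target/; that file is the JAR package to upload.

To verify the end-to-end flow, you can push a few test records into the Kafka topic that the job consumes. The sketch below uses the standard kafka-clients producer API; the TestDataProducer class name, broker address, and topic value are illustrative assumptions rather than part of the DLI example.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class TestDataProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Hypothetical broker address; use the same value as the job's bootstrap.servers parameter.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "xxxx:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            // Send a handful of plain-text records to the topic consumed by the Flink job.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 10; i++) {
                    producer.send(new ProducerRecord<>("test-topic", "test message " + i));
                }
                producer.flush();
            }
        }
    }

Because the sink uses OnCheckpointRollingPolicy and checkpointing is triggered every 300 seconds, finished files typically appear in the OBS output path only after the first checkpoint completes.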
