Introduction

Process

The process for collecting logs using Flume is as follows:

  1. Install the Flume client.
  2. Configure the Flume server and client parameters.
  3. Collect and query logs using the Flume client.
  4. Stop and uninstall the Flume client.

Flume Client

A Flume client consists of sources, channels, and sinks. A source sends data to a channel, and a sink then transmits the data from the channel to the next hop or an external destination.

Table 1 Module description

Module

Description

Source

A source receives or generates data and sends the data to one or more channels. Sources can work in either data-driven or polling mode.

Typical sources include:

  • Syslog and Netcat, which are integrated in the system to receive data
  • Exec and SEQ, which generate event data automatically
  • Avro, which is used for communication between agents

A source must be associated with at least one channel.

Channel

A channel buffers data between a source and a sink. After the sink delivers the data to the next hop or the destination, the buffered data is deleted from the channel automatically.

The persistency of the channels varies with the channel types:

  • Memory channel: no persistency
  • File channel: persistency implemented based on write-ahead logging (WAL)
  • JDBC channel: persistency implemented based on an embedded database

Channels support transactions, which ensure simple sequential operations. A channel can work with any number of sources and sinks.

Sink

A sink transmits data to the next hop or destination. After the transmission is complete, it deletes the data from the channel.

Typical sinks include:

  • HDFS and Kafka, which store data to the destination
  • Null sink, which consumes the data and discards it
  • Avro, which is used for communication between agents

A sink must be associated with at least one channel.

A Flume client can have multiple sources, channels, and sinks. A source can send data to multiple channels, and then multiple sinks send the data out of the client.
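
The multi-channel layout described above can be sketched as a properties file. This is a minimal illustration, not a recommended production setup: the agent name client, the component names (r1, c1, c2, k1, k2), the port, and the directories are all assumptions chosen for the example.

```properties
# Assumed agent name "client"; r1, c1, c2, k1, k2 are illustrative names.
client.sources = r1
client.channels = c1 c2
client.sinks = k1 k2

# One Netcat source that copies every event to both channels.
# The bind address and port are assumed values.
client.sources.r1.type = netcat
client.sources.r1.bind = 0.0.0.0
client.sources.r1.port = 44444
client.sources.r1.channels = c1 c2
client.sources.r1.selector.type = replicating

# c1 keeps events in memory only (no persistency).
client.channels.c1.type = memory

# c2 persists events on disk; both paths are hypothetical.
client.channels.c2.type = file
client.channels.c2.checkpointDir = /tmp/flume/checkpoint
client.channels.c2.dataDirs = /tmp/flume/data

# Each sink drains exactly one channel.
client.sinks.k1.type = logger
client.sinks.k1.channel = c1
client.sinks.k2.type = null
client.sinks.k2.channel = c2
```

Note that a source lists its channels with the plural key channels, whereas a sink names its single channel with the singular key channel.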

Multiple Flume clients can be cascaded. That is, a sink can send data to the source of another client.
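
As a rough sketch of such a cascade, the fragments below pair an Avro sink on the sending client with an Avro source on the receiving end. The two fragments belong to two separate properties files. The agent names (client, server), component names, IP address, and port are assumptions made for illustration only.

```properties
# --- Sending side (assumed agent name "client") ---
# The Avro sink forwards events from channel c1 to the next hop.
# 192.168.0.10 and 21154 are hypothetical values.
client.sinks.k1.type = avro
client.sinks.k1.hostname = 192.168.0.10
client.sinks.k1.port = 21154
client.sinks.k1.channel = c1

# --- Receiving side (assumed agent name "server") ---
# The Avro source must listen on the port that the upstream sink targets.
server.sources.r1.type = avro
server.sources.r1.bind = 0.0.0.0
server.sources.r1.port = 21154
server.sources.r1.channels = c1
```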

Supplementary Information

  1. What are the reliability measures of Flume?
    • The transaction mechanism is implemented between sources and channels, and between channels and sinks.
    • The sink processor supports failover and load balancing.
      The following is an example of the load balancing configuration:
      server.sinkgroups=g1
      server.sinkgroups.g1.sinks=k1 k2
      server.sinkgroups.g1.processor.type=load_balance
      server.sinkgroups.g1.processor.backoff=true
      server.sinkgroups.g1.processor.selector=random
  2. What are the precautions for the aggregation and cascading of multiple Flume clients?
    • Use the Avro or Thrift protocol for cascading.
    • When the aggregation end contains multiple nodes, evenly distribute the clients to these nodes. Do not connect all the clients to a single node.
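
The sink processor's failover mode, mentioned under the reliability measures, can be sketched in the same way as the load balancing example. The snippet reuses the agent name server and the sink names k1 and k2 from that example. The priority and penalty values are assumptions for illustration, not recommended settings.

```properties
server.sinkgroups = g1
server.sinkgroups.g1.sinks = k1 k2
server.sinkgroups.g1.processor.type = failover

# The sink with the higher priority is used first; k2 takes over if k1 fails.
# The numbers are illustrative.
server.sinkgroups.g1.processor.priority.k1 = 10
server.sinkgroups.g1.processor.priority.k2 = 5

# Upper limit, in milliseconds, of the backoff applied to a failed sink (assumed value).
server.sinkgroups.g1.processor.maxpenalty = 10000
```

If k1 and k2 are Avro sinks that point to different aggregation nodes, a sink group like this lets a client keep sending data when one node becomes unavailable.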