Using Spark from Scratch

This section describes how to submit a SparkPi job in Spark. SparkPi, a typical Spark job, calculates the value of pi (π).

Procedure

  1. Prepare the SparkPi program.

    The open-source Spark example package contains the SparkPi program. You can download the package from https://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz.

    Decompress the package to obtain the spark-examples_2.11-2.1.0.jar file in the spark-2.1.0-bin-hadoop2.7/examples/jars directory. This JAR file contains the SparkPi program.
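
    SparkPi estimates π with a Monte Carlo method: it scatters random points over a square and counts how many land inside the inscribed circle, so the ratio of hits to samples approaches π/4. The following Scala sketch illustrates this approach in simplified form; it is not the exact source shipped in spark-examples_2.11-2.1.0.jar, and the object name and default slice count are illustrative. Its single command-line argument sets the number of parallel slices, which is the value passed as a job parameter later in this section.

      import scala.math.random
      import org.apache.spark.sql.SparkSession

      // Simplified Monte Carlo estimate of pi, in the style of the SparkPi example.
      object PiSketch {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder.appName("PiSketch").getOrCreate()
          // The first argument (for example, 10) sets the number of parallel slices.
          val slices = if (args.length > 0) args(0).toInt else 2
          val samples = math.min(100000L * slices, Int.MaxValue).toInt
          val hits = spark.sparkContext.parallelize(1 until samples, slices).map { _ =>
            val x = random * 2 - 1           // random point in the square [-1, 1] x [-1, 1]
            val y = random * 2 - 1
            if (x * x + y * y <= 1) 1 else 0 // 1 if the point falls inside the unit circle
          }.reduce(_ + _)
          println(s"Pi is roughly ${4.0 * hits / (samples - 1)}")
          spark.stop()
        }
      }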

  2. Upload data to OBS.

    1. Log in to the OBS console.
    2. Click Create Bucket to create a bucket. The bucket name must be unique; otherwise, the bucket cannot be created. The name sparkpi is used as an example in this section.
    3. In the sparkpi bucket, click Create Folder to create the program, output, and log folders, as shown in Figure 1.
      Figure 1 Folder list
    4. Go to the program folder, select the program package (spark-examples_2.11-2.1.0.jar) obtained in Step 1, and click Upload, as shown in Figure 2.
      Figure 2 Program list

  3. Log in to the MRS management console. In the navigation tree on the left, choose Clusters > Active Clusters and click the cluster named mrs_20160907. The mrs_20160907 cluster was created in section Creating a Cluster.
  4. Submit a sparkPi job.

    1. Select Job Management. On the Job tab page, click Create to go to the Create Job page, as shown in Figure 3.

      Jobs can be submitted only when the mrs_20160907 cluster is in the running state.

      Figure 3 Creating a Spark job

      Table 1 describes parameters for job configuration. The following is a job configuration example:

      • Type: Select Spark.
      • Name: For example, job_spark.
      • Program Path:

        Set this to the OBS address where the program is stored. Replace the bucket name and program name with the names of the bucket and program that you created in 2.c. For example, s3a://sparkpi/program/spark-examples_2.11-2.1.0.jar.

      • Parameter:

        Specifies the main class of the program to be executed, followed by its arguments, for example, org.apache.spark.examples.SparkPi 10 (see the sketch after this example).

      • Export to:

        Set this to the OBS address where the job output files are stored. Replace the bucket name and folder name with the names of the bucket and folder that you created in 2.c. For example, s3a://sparkpi/output.

      • Log path:

        Set this to the OBS address where the job log files are stored. Replace the bucket name and folder name with the names of the bucket and folder that you created in 2.c. For example, s3a://sparkpi/log.

      A job will be executed immediately after being created successfully.
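
      MRS performs the actual submission when you create the job; the console fields describe a standard Spark application submission. Purely as an illustration, the following hypothetical Scala sketch shows how the example values above would map onto a programmatic spark-submit using Spark's SparkLauncher. The YARN master and cluster deploy mode are assumptions, since MRS manages these settings itself.

        import org.apache.spark.launcher.SparkLauncher

        // Hypothetical illustration only: how the console fields correspond to a
        // spark-submit invocation. MRS handles the real submission internally.
        object SubmitSketch {
          def main(args: Array[String]): Unit = {
            val process = new SparkLauncher()
              .setAppResource("s3a://sparkpi/program/spark-examples_2.11-2.1.0.jar") // Program Path
              .setMainClass("org.apache.spark.examples.SparkPi")                     // first token of Parameter
              .addAppArgs("10")                                                      // remaining tokens of Parameter
              .setMaster("yarn")        // assumption: MRS runs Spark on YARN
              .setDeployMode("cluster") // assumption
              .launch()                 // starts spark-submit as a child process
            process.waitFor()           // wait for the submission process to exit
          }
        }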

      Table 1 Job configuration information

      • Type

        Job type. Possible types include:
        • MapReduce
        • Spark
        • Spark Script
        • Hive Script

        NOTE: To add jobs of the Spark and Hive types, you must select the Spark and Hive components when creating the cluster, and the cluster must be in the running state. Spark Script jobs support Spark SQL only, whereas Spark jobs support both Spark Core and Spark SQL.

      • Name

        Job name. The name consists of 1 to 64 characters, including letters, digits, hyphens (-), and underscores (_), and cannot be empty.

        NOTE: Identical job names are allowed but not recommended.

      • Program Path

        Address of the JAR file of the program that executes the job.

        NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.

        This parameter cannot be empty and must meet the following requirements:
        • A maximum of 1023 characters are allowed, and special characters (*?<">|\) are not allowed. The address cannot be empty or consist only of spaces.
        • The path varies depending on the file system:
          • OBS: The path must start with s3a://, for example, s3a://wordcount/program/hadoop-mapreduce-examples-2.7.x.jar.
          • HDFS: The path must start with /user.
        • For Spark Script jobs, the path must end with .sql; for MapReduce and Spark jobs, it must end with .jar. The .sql and .jar suffixes are case-insensitive.

      • Parameter

        Key parameters for executing the job. The parameters are consumed by the program's internal functions; MRS is only responsible for passing them in. Separate multiple parameters with spaces.

        Format: package name.class name

        A maximum of 2047 characters are allowed, and special characters (;|&>',<$) are not allowed. This parameter can be empty.

        NOTE: When a parameter contains sensitive information, for example, a login password, you can add an at sign (@) before it to encrypt its value and prevent the sensitive information from being persisted in plaintext. When you view job information on the MRS management console, the sensitive information is displayed as asterisks (*).

        Example: username=admin @password=admin_123

      • Import from

        Address of the input data.

        NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.

        The path varies depending on the file system:
        • OBS: The path must start with s3a://.
        • HDFS: The path must start with /user.

        A maximum of 1023 characters are allowed, and special characters (*?<">|\) are not allowed. This parameter can be empty.

      • Export to

        Address of the output data.

        NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.

        The path varies depending on the file system:
        • OBS: The path must start with s3a://.
        • HDFS: The path must start with /user.

        A maximum of 1023 characters are allowed, and special characters (*?<">|\) are not allowed. This parameter can be empty.

      • Log path

        Address for storing job logs that record the job running status.

        NOTE: When configuring this parameter, click OBS or HDFS, specify the file path, and click OK.

        The path varies depending on the file system:
        • OBS: The path must start with s3a://.
        • HDFS: The path must start with /user.

        A maximum of 1023 characters are allowed, and special characters (*?<">|\) are not allowed. This parameter can be empty.

  5. View the job execution results.

    1. Go to the Job Management page. On the Job tab, check whether the job is complete.

      The job takes a while to run. After it is complete, refresh the job list, as shown in Figure 4.

      Figure 4 Job list

      A job that has succeeded or failed cannot be executed again. However, you can add or copy the job, set its parameters, and submit it again.

    2. Go to the OBS directory and query job output information.

      In the sparkpi > output directory of OBS, you can query and download the job output files.

    3. Go to the OBS directory and check the detailed job execution results.

      In the sparkpi > log directory of OBS, you can query and download the job execution logs by job ID, as shown in Figure 5.

      Figure 5 Log list

  6. Terminate a cluster.

    For details, see Terminating a Cluster in the User Guide.