How Do I Configure atop and kdump on Linux ECSs for Performance Analysis?

Scenarios

This section describes how you can configure atop and kdump on Linux ECSs for performance analysis.

The method for configuring atop varies with the OS version.

atop

kdump

Introduction to atop

atop is a performance monitor for Linux that reports, at regular intervals, the activity and resource consumption of all processes. It shows system-level and per-process activity for the CPU, memory, disks, and network. It also logs system and process activity daily and saves the logs to disk for long-term analysis.

Configuring atop for CentOS 8 or RHEL 8

  1. Run the following command to download the atop package:

    # wget https://www.atoptool.nl/download/atop-2.6.0-1.el8.x86_64.rpm

  2. Run the following command to install the package:

    # rpm -ivh atop-2.6.0-1.el8.x86_64.rpm

  3. Run the following command to modify the configuration file of atop:

    # vi /etc/default/atop

    Modify the following parameters, save the modification, and exit:

    • Set LOGINTERVAL, the sampling interval in seconds, to a smaller value such as 15. The default value is 600.

    • Set LOGGENERATIONS, the number of days that atop logs are retained, to a smaller value such as 3. The default value is 28.

      LOGINTERVAL=15
      LOGGENERATIONS=3
      
  4. Run the following command to restart atop:

    # systemctl restart atop

  5. Run the following command to check the status of atop. If active (running) is displayed in the output, atop is running properly.

    # systemctl status atop

    atop.service - Atop advanced performance monitor
    Loaded: loaded (/usr/lib/systemd/system/atop.service; enabled; vendor preset: disabled)
    Active: active (running) since Sat 2021-06-19 14:46:10 CST; 8s ago
    Docs: man:atop(1)
    Process: 6391 ExecStartPost=/usr/bin/find ${LOGPATH} -name atop_* -mtime +${LOGGENERATIONS} -exec rm -v {} ; (code=exited, status=0/SUCCESS)
    Process: 6388 ExecStartPre=/bin/sh -c test -n "$LOGGENERATIONS" -a "$LOGGENERATIONS" -eq "$LOGGENERATIONS" (code=exited, status=0/SUCCESS)
    Process: 6387 ExecStartPre=/bin/sh -c test -n "$LOGINTERVAL" -a "$LOGINTERVAL" -eq "$LOGINTERVAL" (code=exited, status=0/SUCCESS)
    Main PID: 6390 (atop)
    Tasks: 1 (limit: 23716)
    Memory: 4.1M
    CGroup: /system.slice/atop.service
             └─6390 /usr/bin/atop -w /var/log/atop/atop_20210619 15
    
    Jun 19 14:46:10 ecs-centos8 systemd[1]: atop.service: Succeeded.
    Jun 19 14:46:10 ecs-centos8 systemd[1]: Stopped Atop advanced performance monitor.
    Jun 19 14:46:10 ecs-centos8 systemd[1]: Starting Atop advanced performance monitor...
    Jun 19 14:46:10 ecs-centos8 systemd[1]: Started Atop advanced performance monitor.
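
The configuration change in step 3 can also be made without an interactive editor. The following is a minimal sketch, assuming GNU sed and a file of KEY=value lines such as /etc/default/atop. The helper name set_option is hypothetical and is not part of atop.

```shell
#!/bin/sh
# set_option FILE KEY VALUE
# Replace an existing KEY=... line in FILE, or append KEY=VALUE if absent.
# (set_option is a hypothetical helper name, not an atop command.)
set_option() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        # Rewrite the existing assignment in place (GNU sed -i).
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # The key is not present yet, so add it at the end of the file.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example usage on the real configuration file (run as root):
#   set_option /etc/default/atop LOGINTERVAL 15
#   set_option /etc/default/atop LOGGENERATIONS 3
#   systemctl restart atop
```

Passing the file as an argument makes it possible to try the helper on a copy of the configuration file before touching the real one.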
    

Configuring atop for CentOS 7, RHEL 7, or EulerOS

  1. Run the following command to download the atop package:

    # wget https://www.atoptool.nl/download/atop-2.6.0-1.el7.x86_64.rpm

    If the package was downloaded on another computer, upload the atop-2.6.0-1.el7.x86_64.rpm package to the target ECS.

  2. Run the following command to install atop:

    # rpm -ivh atop-2.6.0-1.el7.x86_64.rpm --nodeps

  3. Run the following command to modify the configuration file of atop:

    # vi /etc/default/atop

    Modify the following parameters, save the modification, and exit:

    • Set LOGINTERVAL, the sampling interval in seconds, to a smaller value such as 15. The default value is 600.

    • Set LOGGENERATIONS, the number of days that atop logs are retained, to a smaller value such as 3. The default value is 28.

    LOGINTERVAL=15
    LOGGENERATIONS=3
    
  4. Run the following command to start atop:

    # systemctl start atop

  5. Run the following command to check the status of atop. If active (running) is displayed in the output, atop is running properly. atop then samples system performance data at the specified interval and saves the data to the /var/log/atop/ directory.

    # systemctl status atop

    atop.service - Atop advanced performance monitor
    Loaded: loaded (/usr/lib/systemd/system/atop.service; enabled; vendor preset: disabled)
    Active: active (running) since Sat 2021-06-19 11:49:47 CST; 2h 27min ago
    Docs: man:atop(1)
    Process: 8231 ExecStartPost=/usr/bin/find ${LOGPATH} -name atop_* -mtime +${LOGGENERATIONS} -exec rm -v {} ; (code=exited, status=0/SUCCESS)
    Process: 8225 ExecStartPre=/bin/sh -c test -n "$LOGGENERATIONS" -a "$LOGGENERATIONS" -eq "$LOGGENERATIONS" (code=exited, status=0/SUCCESS)
    Process: 8223 ExecStartPre=/bin/sh -c test -n "$LOGINTERVAL" -a "$LOGINTERVAL" -eq "$LOGINTERVAL" (code=exited, status=0/SUCCESS)
    Main PID: 8229 (atop)
    CGroup: /system.slice/atop.service
             └─8229 /usr/bin/atop -w /var/log/atop/atop_20210619 15
    
    Jun 19 11:49:47 ecs-centos7 systemd[1]: Stopped Atop advanced performance monitor.
    Jun 19 11:49:47 ecs-centos7 systemd[1]: Starting Atop advanced performance monitor...
    Jun 19 11:49:47 ecs-centos7 systemd[1]: Started Atop advanced performance monitor.
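
The status output shows atop writing to a file named atop_YYYYMMDD under /var/log/atop/. The sketch below builds that path for a given date so that a day's samples can be replayed with atop -r. The helper name atop_log_for is hypothetical, and LOGPATH is assumed to match the value in /etc/default/atop.

```shell
#!/bin/sh
# Directory that atop writes its daily raw logs to (assumed default).
LOGPATH=${LOGPATH:-/var/log/atop}

# atop_log_for [YYYYMMDD]
# Print the raw log path for the given date, or for today if no date is given.
# (atop_log_for is a hypothetical helper name, not an atop command.)
atop_log_for() {
    printf '%s/atop_%s\n' "$LOGPATH" "${1:-$(date +%Y%m%d)}"
}

# Example usage (run manually on the ECS):
#   atop -r "$(atop_log_for 20210619)"                    # replay the whole day
#   atop -r "$(atop_log_for 20210619)" -b 14:00 -e 15:00  # limit to a time window
```

In replay mode, the -b and -e options restrict the output to a begin and end time, which helps when the time of a performance problem is roughly known.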
    

Configuring atop for SUSE 12 or SUSE 15

  1. Run the following command to download the atop source package:

    # wget https://www.atoptool.nl/download/atop-2.6.0-1.src.rpm

  2. Run the following command to install the package:

    # rpm -ivh atop-2.6.0-1.src.rpm

  3. Run the following command to install atop dependencies:

    # zypper -n install rpm-build ncurses-devel zlib-devel

  4. Run the following commands to compile atop:

    # cd /usr/src/packages/SPECS

    # rpmbuild -bb atop-2.6.0.spec

  5. Run the following commands to install atop:

    # cd /usr/src/packages/RPMS/x86_64

    # rpm -ivh atop-2.6.0-1.x86_64.rpm

  6. Run the following command to modify the configuration file of atop:

    # vi /etc/default/atop

    Modify the following parameters, save the modification, and exit:

    • Set LOGINTERVAL, the sampling interval in seconds, to a smaller value such as 15. The default value is 600.

    • Set LOGGENERATIONS, the number of days that atop logs are retained, to a smaller value such as 3. The default value is 28.

    LOGINTERVAL=15
    LOGGENERATIONS=3
    
  7. Run the following command to restart atop:

    # systemctl restart atop

  8. Run the following command to check the status of atop. If active (running) is displayed in the output, atop is running properly.

    # systemctl status atop

    atop.service - Atop advanced performance monitor
    Loaded: loaded (/usr/lib/systemd/system/atop.service; enabled; vendor preset: disabled)
    Active: active (running) since Sat 2021-06-19 16:50:01 CST; 6s ago
    Docs: man:atop(1)
    Process: 2242 ExecStartPost=/usr/bin/find ${LOGPATH} -name atop_* -mtime +${LOGGENERATIONS} -exec rm -v {} ; (code=exited, status=0/SUCCESS)
    Process: 2240 ExecStartPre=/bin/sh -c test -n "$LOGGENERATIONS" -a "$LOGGENERATIONS" -eq "$LOGGENERATIONS" (code=exited, status=0/SUCCESS)
    Process: 2239 ExecStartPre=/bin/sh -c test -n "$LOGINTERVAL" -a "$LOGINTERVAL" -eq "$LOGINTERVAL" (code=exited, status=0/SUCCESS)
    Main PID: 2241 (atop)
    Tasks: 1 (limit: 4915)
    CGroup: /system.slice/atop.service
             └─2241 /usr/bin/atop -w /var/log/atop/atop_20210619 15
    
    Jun 19 16:50:01 ecs-suse15 systemd[1]: Starting Atop advanced performance monitor...
    Jun 19 16:50:01 ecs-suse15 systemd[1]: Started Atop advanced performance monitor.
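
A shorter LOGINTERVAL means more samples are written each day, which is one reason to lower LOGGENERATIONS at the same time. The following sketch only does the arithmetic (86400 seconds per day divided by the interval). The helper name samples_per_day is hypothetical.

```shell
#!/bin/sh
# samples_per_day INTERVAL_SECONDS
# Print how many samples atop records per day at the given interval.
# (samples_per_day is a hypothetical helper name, not an atop command.)
samples_per_day() {
    echo $(( 86400 / $1 ))
}

samples_per_day 600   # default interval: prints 144
samples_per_day 15    # example interval: prints 5760
```

With the example values, atop records 40 times as many samples per day as with the default interval, so check the free space under /var/log/atop/ after the change.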
    

Configuring atop Using Its Source Package for SUSE, Fedora, Debian, or Ubuntu

  1. Download the atop source package.

    # wget https://www.atoptool.nl/download/atop-2.6.0.tar.gz

  2. Decompress the source package.

    # tar -zxvf atop-2.6.0.tar.gz

  3. Query the systemd version.

    # systemctl --version

    If the version is 220 or later, go to the next step.

    Otherwise, open the Makefile of atop:

    # vi atop-2.6.0/Makefile

    Delete the --now parameter that follows each systemctl command so that the commands read as follows:

    then   /bin/systemctl disable  atop     2> /dev/null; \
            /bin/systemctl disable  atopacct 2> /dev/null; \
            /bin/systemctl daemon-reload;                   \
            /bin/systemctl enable   atopacct;          \
            /bin/systemctl enable   atop;              \
            /bin/systemctl enable   atop-rotate.timer; \
    
  4. Install atop dependencies.

    • SUSE 12 or SUSE 15

      # zypper -n install make gcc zlib-devel ncurses-devel

    • Fedora

      # yum install make gcc zlib-devel ncurses-devel -y

    • Debian 9, Debian 10, or Ubuntu

      # apt install make gcc zlib1g-dev libncurses5-dev libncursesw5-dev -y

  5. Compile and install atop.

    # cd atop-2.6.0

    # make systemdinstall

  6. Modify the configuration file of atop.

    # vi /etc/default/atop

    Make the following modifications, save the file, and exit.

    • Set LOGINTERVAL, the sampling interval in seconds, to a smaller value such as 15. The default value is 600.

    • Set LOGGENERATIONS, the number of days that atop logs are retained, to a smaller value such as 3. The default value is 28.

      LOGOPTS=""
      LOGINTERVAL=15
      LOGGENERATIONS=3
      LOGPATH=/var/log/atop
      
  7. Restart atop.

    # systemctl restart atop

  8. Run the following command to check the status of atop. If active (running) is displayed in the output, atop is running properly.

    # systemctl status atop

    atop.service - Atop advanced performance monitor
       Loaded: loaded (/lib/systemd/system/atop.service; enabled)
       Active: active (running) since Sun 2021-07-25 19:29:40 CST; 4s ago
         Docs: man:atop(1)
      Process: 5192 ExecStartPost=/usr/bin/find ${LOGPATH} -name atop_* -mtime +${LOGGENERATIONS} -exec rm -v {} ; (code=exited, status=0/SUCCESS)
      Process: 5189 ExecStartPre=/bin/sh -c test -n "$LOGGENERATIONS" -a "$LOGGENERATIONS" -eq "$LOGGENERATIONS" (code=exited, status=0/SUCCESS)
      Process: 5188 ExecStartPre=/bin/sh -c test -n "$LOGINTERVAL" -a "$LOGINTERVAL" -eq "$LOGINTERVAL" (code=exited, status=0/SUCCESS)
     Main PID: 5191 (atop)
       CGroup: /system.slice/atop.service
               └─5191 /usr/bin/atop -w /var/log/atop/atop_20210725 15
    
    Jul 25 19:29:40 atop systemd[1]: Starting Atop advanced performance monitor...
    Jul 25 19:29:40 atop systemd[1]: Started Atop advanced performance monitor.
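
The version check that decides whether --now must be removed from the Makefile can be scripted. The following is a minimal sketch, assuming the first line of the systemctl --version output has the form "systemd <number>". The helper names systemd_version and supports_now are hypothetical.

```shell
#!/bin/sh
# systemd_version
# Read `systemctl --version` output on stdin and print the version number
# taken from the first line (for example, "systemd 219" gives 219).
systemd_version() {
    awk 'NR == 1 { print $2; exit }'
}

# supports_now VERSION
# Succeed if VERSION is 220 or later, the threshold used in this section
# for keeping the --now parameter in the Makefile.
supports_now() {
    [ "$1" -ge 220 ]
}

# Example usage (run manually on the ECS):
#   v=$(systemctl --version | systemd_version)
#   supports_now "$v" || echo "systemd $v: delete --now from atop-2.6.0/Makefile"
```

The sketch only reports the result. Editing the Makefile is left as a manual step, as described in the procedure.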
    

Precautions for Configuring kdump

The method for configuring kdump described in this section applies to KVM ECSs running EulerOS or CentOS 7.x. For details, see Documentation for kdump.

Introduction to kdump

kdump is a feature of the Linux kernel that creates crash dumps when the kernel crashes. After a crash, kdump boots a second Linux kernel (the capture kernel) and uses it to export an image of the RAM. This image, known as vmcore, can be used to debug the crash and determine its cause.

Configuring kdump

  1. Run the following command to check whether kexec-tools is installed:

    # rpm -q kexec-tools

    If it is not installed, run the following command to install it:

    # yum install -y kexec-tools

  2. Run the following command to enable kdump to run at system startup:

    # systemctl enable kdump

  3. Configure the parameters for the crash kernel to reserve the memory for the capture kernel.

    Check whether the parameter is configured:

    # grep crashkernel /proc/cmdline

    If the command returns output, the parameter has already been configured. Otherwise, edit the /etc/default/grub file and add crashkernel=auto to the value of GRUB_CMDLINE_LINUX:

    GRUB_TIMEOUT=5
    GRUB_DEFAULT=saved
    GRUB_DISABLE_SUBMENU=true
    GRUB_TERMINAL_OUTPUT="console"
    GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=rhel00/root rd.lvm.lv=rhel00/swap rhgb quiet"
    GRUB_DISABLE_RECOVERY="true"

  4. Run the following command to regenerate the GRUB configuration file so that the change is applied at the next startup:

    # grub2-mkconfig -o /boot/grub2/grub.cfg

  5. Open the /etc/kdump.conf file, locate parameter path, and add /var/crash after it.

    path  /var/crash
    

    By default, the vmcore file is saved in the /var/crash directory.

    To save the vmcore file to another directory, for example, /home/kdump, set path to that directory instead:

    path  /home/kdump
    

    Note

    There must be enough space in the specified path for storing the vmcore file. It is recommended that the available space be greater than or equal to the RAM size. You can also store the vmcore file on a shared device such as SAN or NFS.

  6. Set the vmcore dump level.

    Add the following content to file /etc/kdump.conf. If the content already exists, skip this step.

    core_collector makedumpfile -d 31 -c
    

    where

    -c indicates compressing the vmcore file.

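    -d specifies the dump level, that is, which page types are left out of the vmcore file. The value is the sum of the values of the page types to exclude. The typical value 31 excludes all of the following types (1 + 2 + 4 + 8 + 16 = 31). You can adjust the value if needed.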

    zero pages   = 1
    cache pages   = 2
    cache private = 4
    user  pages   = 8
    free  pages   = 16
    
  7. Run the following command to restart the system for the configurations to take effect:

    # reboot
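
The dump level in step 6 is a bit mask, so other values can be derived by combining only the page types that should be excluded. The following sketch does this with the values listed in step 6. The helper name dump_level and the short type names are hypothetical and are not makedumpfile options.

```shell
#!/bin/sh
# dump_level TYPE...
# Combine the makedumpfile -d bits of the page types to exclude.
# (dump_level and the type names below are hypothetical, for illustration.)
dump_level() {
    level=0
    for name in "$@"; do
        case $name in
            zero)          bit=1 ;;   # zero pages
            cache)         bit=2 ;;   # cache pages
            cache_private) bit=4 ;;   # cache private
            user)          bit=8 ;;   # user pages
            free)          bit=16 ;;  # free pages
            *) echo "unknown page type: $name" >&2; return 1 ;;
        esac
        level=$(( level | bit ))
    done
    echo "$level"
}

dump_level zero cache cache_private user free   # prints 31
dump_level zero free                            # prints 17
```

A lower dump level keeps more pages in the vmcore file, which makes the file larger, so check the free space in the directory specified by path before lowering the value.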

Checking Whether kdump Configurations Have Taken Effect

  1. Run the following command and check whether crashkernel=auto is displayed:

    # cat /proc/cmdline |grep crashkernel

    BOOT_IMAGE=/boot/vmlinuz-3.10.0-514.44.5.10.h142.x86_64 root=UUID=6407d6ac-c761-43cc-a9dd-1383de3fc995 ro crash_kexec_post_notifiers softlockup_panic=1 panic=3 reserve_kbox_mem=16M nmi_watchdog=1 rd.shell=0 fsck.mode=auto fsck.repair=yes net.ifnames=0 spectre_v2=off nopti noibrs noibpb crashkernel=auto LANG=en_US.UTF-8
    
  2. Run the following command and check whether the configuration in the output is correct:

    # grep core_collector /etc/kdump.conf |grep -v ^"#"

    core_collector makedumpfile -l --message-level 1 -d 31
    
  3. Run the following command and check whether the path configuration in the output is correct:

    # grep path /etc/kdump.conf |grep -v ^"#"

    path /var/crash
    
  4. Run the following command and check whether the value of Active in the output is active (exited):

    # systemctl status kdump

    ● kdump.service - Crash recovery kernel arming
    Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
    Active: active (exited) since Tue 2019-04-09 19:30:24 CST; 8min ago
    Process: 495 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
    Main PID: 495 (code=exited, status=0/SUCCESS)
    CGroup: /system.slice/system-hostos.slice/kdump.service
    
  5. Run the following test command:

    # echo c > /proc/sysrq-trigger

    This command crashes the kernel immediately, so run it only on an ECS where an outage is acceptable. After the crash, kdump is triggered, the system restarts, and the generated vmcore file is saved to the directory specified by path.

  6. Run the following command to check whether the vmcore file has been generated in the specified path, for example, /var/crash/:

    # ll /var/crash/
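
The first three checks above can be combined into one pre-flight check that is run before the crash test. The following is a minimal sketch that takes the kernel command line file and the kdump configuration file as arguments. The helper name check_kdump_config is hypothetical.

```shell
#!/bin/sh
# check_kdump_config CMDLINE_FILE KDUMP_CONF
# Report whether crashkernel= is on the kernel command line, then print the
# active (uncommented) path and core_collector lines of the kdump config.
# (check_kdump_config is a hypothetical helper name.)
check_kdump_config() {
    cmdline_file=$1
    conf_file=$2
    if grep -q 'crashkernel=' "$cmdline_file"; then
        echo "crashkernel: set"
    else
        echo "crashkernel: missing"
    fi
    # Lines starting with '#' do not match, so commented-out settings are skipped.
    grep -E '^(path|core_collector)[[:space:]]' "$conf_file" \
        || echo "path/core_collector: not set"
}

# Example usage (run manually on the ECS):
#   check_kdump_config /proc/cmdline /etc/kdump.conf
```

The sketch does not replace the systemctl status kdump check, which confirms that the capture kernel has actually been loaded.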