Overview of HDFS File System Directories
This section describes the directory structure in HDFS, as shown in the following tables. Which table applies depends on whether the cluster runs Spark or Spark2x.
Path | Type | Function | Can Be Deleted | Consequence of Deletion |
---|---|---|---|---|
/tmp/spark/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions in Spark JDBCServer. | No | Tasks fail to run. |
/tmp/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions run through the Spark CLI. | No | Tasks fail to run. |
/tmp/carbon/ | Fixed directory | Stores abnormal CarbonData data detected during data import. | Yes | The abnormal data is lost. |
/tmp/Loader-${Job name}_${MR job ID} | Temporary directory | Stores region information for Loader HBase bulkload jobs. The data is deleted automatically after the job completes. | No | The Loader HBase bulkload job fails. |
/tmp/logs | Fixed directory | Stores the collected MR task logs. | Yes | MR task logs are lost. |
/tmp/archived | Fixed directory | Archives the MR task logs on HDFS. | Yes | MR task logs are lost. |
/tmp/hadoop-yarn/staging | Fixed directory | Stores the run logs, summary information, and configuration attributes of jobs run by the ApplicationMaster. | No | Services do not run properly. |
/tmp/hadoop-yarn/staging/history/done_intermediate | Fixed directory | Stores the temporary files from the /tmp/hadoop-yarn/staging directory after all tasks of a job are executed. | No | MR task logs are lost. |
/tmp/hadoop-yarn/staging/history/done | Fixed directory | Stores the log files that a periodic scanning thread moves here from the done_intermediate directory. | No | MR task logs are lost. |
/tmp/mr-history | Fixed directory | Stores preloaded historical record files. | No | Historical MR task log data is lost. |
/tmp/hive | Fixed directory | Stores Hive temporary files. | No | Hive tasks fail to run. |
/tmp/hive-scratch | Fixed directory | Stores temporary data (such as session information) generated while Hive runs. | No | The current task fails to run. |
/user/{user}/.sparkStaging | Fixed directory | Stores temporary files of the SparkJDBCServer application. | No | Executors fail to start. |
/user/spark/jars | Fixed directory | Stores the dependency packages that Spark executors need to run. | No | Executors fail to start. |
/user/loader (including /user/loader/etl_dirty_data_dir, /user/loader/etl_hbase_putlist_tmp, and /user/loader/etl_hbase_tmp) | Fixed directory | Stores dirty data of Loader jobs and data of HBase jobs. | No | HBase jobs fail, or dirty data is lost. |
/user/mapred | Fixed directory | Stores Hadoop-related files. | No | YARN fails to start. |
/user/hive | Fixed directory | Stores Hive-related data by default, including the Spark lib package it depends on and the default storage path for table data. | No | User data is lost. |
/user/omm-bulkload | Temporary directory | Temporarily stores the HBase batch import tools. | No | HBase batch import tasks fail. |
/user/hbase | Temporary directory | Temporarily stores the HBase batch import tools. | No | HBase batch import tasks fail. |
/sparkJobHistory | Fixed directory | Stores Spark event log data. | No | The History Server is unavailable, and tasks fail to run. |
/flume | Fixed directory | Stores data collected by Flume in HDFS. | No | Flume does not run properly. |
/mr-history/tmp | Fixed directory | Stores logs generated by MapReduce jobs. | Yes | Log information is lost. |
/mr-history/done | Fixed directory | Stores logs managed by MR JobHistory Server. | Yes | Log information is lost. |
/tenant | Created when a tenant is added | Stores tenant directories in HDFS. By default, the system creates a folder named after each tenant under /tenant; for example, the default HDFS storage directory for tenant ta1 is /tenant/ta1. The /tenant directory itself is created in the HDFS root directory when the first tenant is created. You can customize the storage path. | No | The tenant account becomes unavailable. |
/apps{1~5}/ | Fixed directory | Stores the Hive package used by WebHCat. | No | WebHCat tasks fail to run. |
/hbase | Fixed directory | Stores HBase data. | No | HBase user data is lost. |
/hbaseFileStream | Fixed directory | Stores HFS files. | No | HFS files are lost and cannot be restored. |
/ats/active | Fixed directory | Stores timeline data of running applications. | No | Tez tasks fail to run. |
/ats/done | Fixed directory | Stores timeline data of completed applications. | No | The directory is automatically re-created. |
/flink | Fixed directory | Stores checkpoint data of tasks. | No | Tasks fail to run. |
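
The Can Be Deleted column can serve as a pre-deletion guard for cleanup scripts. The following Python sketch is hypothetical: the names DELETABLE, lookup, and may_delete are illustrative and not part of any product API. It encodes the Spark table above, resolves a path to its closest listed ancestor, and refuses deletion when the path is unlisted or contains a directory that the table marks as not deletable. It also assumes that /apps{1~5}/ means /apps1 through /apps5. Patterned rows such as /tmp/Loader-${Job name}_${MR job ID} are not encoded, so such paths are refused.

```python
import posixpath
from typing import Dict, Optional

# Hypothetical encoding of the "Can Be Deleted" column of the Spark table above.
# True = listed as deletable; False = listed as not deletable.
DELETABLE: Dict[str, bool] = {
    "/tmp/spark/sparkhive-scratch": False,
    "/tmp/sparkhive-scratch": False,
    "/tmp/carbon": True,
    "/tmp/logs": True,
    "/tmp/archived": True,
    "/tmp/hadoop-yarn/staging": False,
    "/tmp/mr-history": False,
    "/tmp/hive": False,
    "/tmp/hive-scratch": False,
    "/user/spark/jars": False,
    "/user/loader": False,
    "/user/mapred": False,
    "/user/hive": False,
    "/user/omm-bulkload": False,
    "/user/hbase": False,
    "/sparkJobHistory": False,
    "/flume": False,
    "/mr-history/tmp": True,
    "/mr-history/done": True,
    "/tenant": False,
    "/hbase": False,
    "/hbaseFileStream": False,
    "/ats/active": False,
    "/ats/done": False,
    "/flink": False,
}
# Assumption: /apps{1~5}/ stands for /apps1 through /apps5.
DELETABLE.update({f"/apps{i}": False for i in range(1, 6)})


def _is_under(path: str, prefix: str) -> bool:
    # Component-wise test, so "/tmp/hive-scratch" is not treated as under "/tmp/hive".
    return path == prefix or path.startswith(prefix + "/")


def lookup(path: str) -> Optional[bool]:
    """Return the verdict of the closest listed ancestor, or None if unlisted."""
    path = posixpath.normpath(path)  # e.g. "/tmp/logs/" -> "/tmp/logs"
    matches = [p for p in DELETABLE if _is_under(path, p)]
    if not matches:
        return None
    return DELETABLE[max(matches, key=len)]


def may_delete(path: str) -> bool:
    """Conservative guard: True only if the path is listed as deletable and
    contains no directory that the table marks as not deletable."""
    path = posixpath.normpath(path)
    if lookup(path) is not True:
        return False  # unlisted or protected
    return not any(
        _is_under(p, path) and not ok
        for p, ok in DELETABLE.items()
        if p != path
    )
```

A cleanup script would call may_delete(path) before running hdfs dfs -rm -r on that path and skip the path when it returns False. Unlisted paths such as /tmp are refused on purpose, because they may contain protected directories. The Spark2x table differs (for example, /tmp/hadoop-omm/yarn/system/rmstore is deletable there), so the mapping would need adapting for such clusters.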

The following table applies to clusters that run Spark2x.

Path | Type | Function | Can Be Deleted | Consequence of Deletion |
---|---|---|---|---|
/tmp/spark2x/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions in Spark2x JDBCServer. | No | Tasks fail to run. |
/tmp/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions run in CLI mode through the Spark2x CLI. | No | Tasks fail to run. |
/tmp/logs/ | Fixed directory | Stores container log files. | Yes | Container log files cannot be viewed. |
/tmp/carbon/ | Fixed directory | Stores abnormal CarbonData data detected during data import. | Yes | The abnormal data is lost. |
/tmp/Loader-${Job name}_${MR job ID} | Temporary directory | Stores region information for Loader HBase bulkload jobs. The data is deleted automatically after the job completes. | No | The Loader HBase bulkload job fails. |
/tmp/hadoop-omm/yarn/system/rmstore | Fixed directory | Stores ResourceManager state information. | Yes | State information is lost after ResourceManager restarts. |
/tmp/archived | Fixed directory | Archives the MR task logs on HDFS. | Yes | MR task logs are lost. |
/tmp/hadoop-yarn/staging | Fixed directory | Stores the run logs, summary information, and configuration attributes of jobs run by the ApplicationMaster. | No | Services do not run properly. |
/tmp/hadoop-yarn/staging/history/done_intermediate | Fixed directory | Stores the temporary files from the /tmp/hadoop-yarn/staging directory after all tasks of a job are executed. | No | MR task logs are lost. |
/tmp/hadoop-yarn/staging/history/done | Fixed directory | Stores the log files that a periodic scanning thread moves here from the done_intermediate directory. | No | MR task logs are lost. |
/tmp/mr-history | Fixed directory | Stores preloaded historical record files. | No | Historical MR task log data is lost. |
/tmp/hive-scratch | Fixed directory | Stores temporary data (such as session information) generated during Hive running. | No | Failed to run the current task. |
/user/{user}/.sparkStaging | Fixed directory | Stores temporary files of the SparkJDBCServer application. | No | Executors fail to start. |
/user/spark2x/jars | Fixed directory | Stores the dependency packages that Spark2x executors need to run. | No | Executors fail to start. |
/user/loader (including /user/loader/etl_dirty_data_dir, /user/loader/etl_hbase_putlist_tmp, and /user/loader/etl_hbase_tmp) | Fixed directory | Stores dirty data of Loader jobs and data of HBase jobs. | No | HBase jobs fail, or dirty data is lost. |
/user/oozie | Fixed directory | Stores the dependency libraries that Oozie needs to run. These libraries must be uploaded manually. | No | Oozie scheduling fails. |
/user/mapred/hadoop-mapreduce-3.1.1.tar.gz | Fixed file | Contains the JAR files used by the MR distributed cache. | No | The MR distributed cache function is unavailable. |
/user/hive | Fixed directory | Stores Hive-related data by default, including the Spark lib package it depends on and the default storage path for table data. | No | User data is lost. |
/user/omm-bulkload | Temporary directory | Temporarily stores the HBase batch import tools. | No | HBase batch import tasks fail. |
/user/hbase | Temporary directory | Temporarily stores the HBase batch import tools. | No | HBase batch import tasks fail. |
/spark2xJobHistory2x | Fixed directory | Stores Spark2x event log data. | No | The History Server is unavailable, and tasks fail to run. |
/flume | Fixed directory | Stores data collected by Flume in HDFS. | No | Flume does not run properly. |
/mr-history/tmp | Fixed directory | Stores logs generated by MapReduce jobs. | Yes | Log information is lost. |
/mr-history/done | Fixed directory | Stores logs managed by MR JobHistory Server. | Yes | Log information is lost. |
/tenant | Created when a tenant is added | Stores tenant directories in HDFS. By default, the system creates a folder named after each tenant under /tenant; for example, the default HDFS storage directory for tenant ta1 is /tenant/ta1. The /tenant directory itself is created in the HDFS root directory when the first tenant is created. You can customize the storage path. | No | The tenant account becomes unavailable. |
/apps{1~5}/ | Fixed directory | Stores the Hive package used by WebHCat. | No | WebHCat tasks fail to run. |
/hbase | Fixed directory | Stores HBase data. | No | HBase user data is lost. |
/hbaseFileStream | Fixed directory | Stores HFS files. | No | HFS files are lost and cannot be restored. |
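
Several rows in the tables above use placeholders in their paths. The Python sketch below shows how those placeholders might expand into concrete HDFS paths. It assumes that ${Job name} is the Loader job name, ${MR job ID} is the Hadoop job ID (for example, job_1500000000000_0001), {user} is the name of the submitting user, and /apps{1~5}/ stands for /apps1/ through /apps5/. The function names and sample values are hypothetical.

```python
import posixpath
from typing import List


def loader_bulkload_dir(job_name: str, mr_job_id: str) -> str:
    # Pattern from the table: /tmp/Loader-${Job name}_${MR job ID}
    return f"/tmp/Loader-{job_name}_{mr_job_id}"


def spark_staging_dir(user: str) -> str:
    # Pattern from the table: /user/{user}/.sparkStaging
    return posixpath.join("/user", user, ".sparkStaging")


def default_tenant_dir(tenant: str) -> str:
    # Default layout: one folder per tenant name under /tenant (e.g. /tenant/ta1).
    return posixpath.join("/tenant", tenant)


def webhcat_app_dirs() -> List[str]:
    # Assumption: /apps{1~5}/ denotes the five directories /apps1/ to /apps5/.
    return [f"/apps{i}/" for i in range(1, 6)]


if __name__ == "__main__":
    # Hypothetical sample values for illustration only.
    print(loader_bulkload_dir("import_orders", "job_1500000000000_0001"))
    print(spark_staging_dir("alice"))
    print(default_tenant_dir("ta1"))
    print(webhcat_app_dirs())
```

The tenant helper reflects only the default layout; if a custom storage path was configured for a tenant, that path applies instead.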