Notes and Constraints

Browser Constraints

The following table lists the recommended browser for logging in to DataArts Studio.

Table 1 Browser compatibility

Browser: Google Chrome
Recommended version: 115, 114, or 113
Recommended OS: Windows 10
Remarks: The resolution ranges from 1366x768 px to 1920x1080 px. 1920x1080 px is the optimal resolution for the best display of the console.

Use Constraints

Before using DataArts Studio, you must read and understand the following restrictions:

Table 2 Restrictions for using DataArts Studio

Public

  1. DataArts Studio is a one-stop platform that provides data integration, development, and governance capabilities. It has no storage or computing capabilities of its own and relies on the underlying data lake.

  2. Only one DataArts Studio instance can be bound to an enterprise project. If an enterprise project already has an instance, no additional instance can be bound to it.

  3. Different components of DataArts Studio support different data sources. You need to select a data lake foundation based on your service requirements. For details about the data lakes supported by DataArts Studio, see "Management Center" > "Data Sources Supported by DataArts Studio" in DataArts Studio User Guide.

Management Center

  1. Due to the constraints of Management Center, other components (such as DataArts Architecture, DataArts Quality, and DataArts Catalog) do not support databases or tables whose names contain Chinese characters or periods (.).

  2. You are advised to use different CDM clusters for a data connection agent in Management Center and a CDM migration job. If an agent and CDM job use the same cluster, they may contend for resources during peak hours, resulting in service unavailability.

  3. If you use the same CDM cluster as the agent for multiple connections to MRS clusters with Kerberos authentication enabled, jobs will fail. You are advised to plan multiple CDM clusters based on service requirements.

  4. If a CDM cluster functions as the agent for a data connection in Management Center, the cluster supports a maximum of 200 concurrent active threads. If multiple data connections share an agent, a maximum of 200 SQL, Shell, and Python scripts submitted through the connections can run concurrently. Excess tasks will be queued. You are advised to plan multiple agents based on the workload.

  5. A maximum of 200 data connections can be created in a workspace.

  6. APIs in Management Center are limited to 100 queries per second (QPS).
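Because Management Center APIs are capped at 100 QPS, a client that calls them in bulk should throttle itself rather than rely on server-side rejection. The sketch below is a minimal client-side token-bucket limiter; the class and its parameters are our own illustration, not part of any DataArts Studio SDK.

```python
import time

class TokenBucket:
    """Client-side rate limiter to stay under a QPS cap (the 100 QPS
    figure comes from the constraints above; this client design is a
    hypothetical sketch, not a DataArts Studio API)."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate            # tokens refilled per second
        self.capacity = burst       # maximum burst size
        self.tokens = float(burst)  # start with a full bucket
        self.last = time.monotonic()

    def acquire(self) -> bool:
        """Take one token if available; return False so the caller
        can back off instead of triggering server-side throttling."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Keep sustained calls at or below 100 per second.
limiter = TokenBucket(rate=100, burst=100)
```

Callers check `limiter.acquire()` before each request and sleep briefly when it returns False.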

DataArts Migration

  1. You can enable automatic backup and restoration of CDM jobs. Backups of CDM jobs are stored in OBS buckets. For details, see DataArts Migration > Job Management > Job Configuration Management in DataArts Studio User Guide.

  2. There is no quota limit for CDM jobs. However, it is recommended that the number of jobs be less than or equal to twice the number of vCPUs in the CDM cluster. Otherwise, job performance may be affected.

  3. The DataArts Migration cluster is deployed in standalone mode, so a cluster fault may cause service interruption and data loss. You are advised to use the CDM Job node of DataArts Factory to invoke CDM jobs and to select two CDM clusters for higher reliability. For details, see DataArts Factory > Nodes > CDM Job in DataArts Studio User Guide.

  4. If changes occur in the connected data source (for example, the MRS cluster capacity is expanded), you need to edit and save the connection.

  5. If you have uploaded an updated version of a driver, you must restart the CDM cluster for the new driver to take effect.

  6. The number of concurrent extraction tasks ranges from 1 to 300 for a job and from 1 to 1,000 for a cluster. The maximum for a cluster depends on the CDM cluster specifications, and you are advised to keep it at no more than twice the number of vCPUs. The concurrency of a job must not exceed that of its cluster. An excessively high concurrency may cause memory overflow, so exercise caution when changing these limits.

For more constraints on DataArts Migration, see "DataArts Migration" > "Constraints" in DataArts Studio User Guide.
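The sizing rules above (job concurrency 1-300, cluster concurrency 1-1,000, cluster maximum at most twice the vCPU count, and job concurrency not exceeding the cluster's) can be combined into a small helper. The function below is a hypothetical illustration of that arithmetic, not part of any CDM API.

```python
def recommended_concurrency(vcpus: int, requested_job_concurrency: int):
    """Return (cluster_max, job_max) per the documented sizing rules.

    Assumptions (labeled): 'vcpus' is the CDM cluster's vCPU count and
    'requested_job_concurrency' is what the user wants for one job.
    """
    # Cluster: at most 2 x vCPUs, and within the 1-1,000 range.
    cluster_max = max(1, min(2 * vcpus, 1000))
    # Job: within 1-300, and never above the cluster's maximum.
    job_max = max(1, min(requested_job_concurrency, 300, cluster_max))
    return cluster_max, job_max
```

For example, an 8-vCPU cluster should be capped at 16 concurrent extraction tasks, so a job requesting 50 would also be limited to 16.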

DataArts Factory

  1. You can enable backup of assets such as scripts and jobs to OBS buckets. For details, see DataArts Factory > O&M and Scheduling > Managing Backups in DataArts Studio User Guide.

  2. The execution history of scripts, jobs, and nodes is stored in OBS buckets. If no OBS bucket is available, you cannot view the execution history.

  3. Resources stored in HDFS can be used only by MRS Spark, MRS Flink Job, and MRS MapReduce nodes.

  4. A workspace can contain a maximum of 10,000 scripts, a maximum of 5,000 script directories, and a maximum of 10 directory levels.

  5. A workspace can contain a maximum of 10,000 jobs, a maximum of 5,000 job directories, and a maximum of 10 directory levels.

  6. A maximum of 1,000 execution results, with a total data volume of less than 3 MB, can be displayed for RDS SQL, DWS SQL, Hive SQL, DLI SQL, and Spark SQL scripts. If there are more than 1,000 execution results, you can dump them. A maximum of 10,000 execution results can be dumped.

  7. Only data of the last six months can be displayed on the Monitor Instance and Monitor PatchData pages.

  8. Only notification records of the last 30 days can be displayed.

  9. Download records age out after seven days. When a record ages out, both the record and the data dumped to OBS are deleted.

DataArts Architecture

  1. DataArts Architecture supports ER modeling and dimensional modeling (star schema only).

  2. The maximum size of a file to be imported is 4 MB. A maximum of 3,000 metrics can be imported. A maximum of 500 tables can be exported at a time.

  3. Lookup tables and data standards cannot be created in their root directories.

  4. The quotas for the objects in a workspace are as follows:

    • Subjects: 5,000

    • Data standard directories: 500; data standards: 20,000

    • Business metrics: 100,000

    • Atomic, derivative, and compound metrics: 5,000 for each

  5. The quotas for different custom objects are as follows:

    • Custom subjects: 10

    • Custom tables: 30

    • Custom attributes: 10

    • Custom business metrics: 50

DataArts Quality

  1. The execution duration of data quality jobs depends on the data engine. If the data engine does not have sufficient resources, the execution of data quality jobs may be slow.

  2. A maximum of 50 rules can be configured for a data quality job. If necessary, you can create multiple quality jobs.

  3. By default, a maximum of 1,000 SQL statements associated with the quality jobs of a data connection can run concurrently; excess SQL statements are queued. This limit can be set to a value from 10 to 1,000.

  4. By default, a maximum of 10,000 SQL statements associated with a quality job in a region can be executed concurrently. Excess SQL statements will be queued.

  5. In the Instance Running Status and Instance Alarm Status areas on the Dashboard page of Metric Monitoring, data of the last seven days is displayed. On the Alarms, Scenarios, and Metrics pages, data of the last seven, 15, or 30 days can be displayed.

  6. In the Quantity Changes area on the Dashboard page of Quality Monitoring, data of the last 30 days can be displayed. In the Alarm Trend by Severity and Rule Quantity Trend areas, data of the last seven days can be displayed.

  7. Quality reports are generated in batches on a T+1 basis (the day after the data is produced) and are retained for 90 days.

  8. If you export a quality report to OBS, the report is exported to the OBS path for storing job logs configured for the workspace. The exported record is retained for three months.

DataArts Catalog

  1. A maximum of 100 metadata collection tasks can be created in a workspace.

  2. Metadata collection tasks obtain metadata by running DDL SQL statements on the engine. You are advised not to collect more than 1,000 tables in a single task; if necessary, create multiple collection tasks. In addition, set the scheduling time and frequency properly based on your requirements to avoid heavy access and connection pressure on the engine. The recommended settings are as follows:

    • If your service requires a metadata validity period of one day, set the scheduling period to max(one day, duration of a single collection run). The same rule applies to other validity periods.

    • If your service mainly runs in the daytime, schedule collection at night, when it has the least impact on the data source. The same applies to other service patterns.

  3. Only the jobs that are scheduled and executed in DataArts Factory generate data lineages. Tested jobs do not generate data lineages.

  4. On the Dashboard page of Metadata Collection, historical data connections of the last seven, 15, or 30 days can be displayed.
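The scheduling-period recommendation in item 2 above reduces to a simple max() over two durations. The helper below is a hypothetical illustration of that rule; the parameter names are our own.

```python
def collection_schedule_days(validity_days: float, run_duration_days: float) -> float:
    """Recommended scheduling period for a metadata collection task.

    Rule from the constraints above: schedule no more often than the
    required metadata validity period, and no more often than a single
    collection run takes to finish.
    """
    return max(validity_days, run_duration_days)
```

So with a one-day validity requirement and a collection run that finishes in half a day, the task should be scheduled daily; if one run takes three days, schedule it every three days.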

DataArts DataService

  1. The shared edition is designed only for development and testing. For production workloads, you are advised to use the exclusive edition, which offers superior capabilities.

  2. A maximum of five DataArts DataService Exclusive clusters can be created in a DataArts Studio instance. Each cluster must be associated with a workspace and cannot belong to multiple workspaces.

  3. After a DataArts DataService Exclusive cluster is created, its specifications cannot be modified, and its version cannot be upgraded.

  4. The maximum number of DataArts DataService Exclusive APIs that can be created in a DataArts Studio instance is the quota of DataArts DataService Exclusive APIs (5,000 by default) or the total API quotas of the clusters in the instance, whichever is smaller. For example, if the quota of DataArts DataService Exclusive APIs for a DataArts Studio instance is 5,000, and two clusters whose API quotas are 500 and 2,000 respectively have been created in the instance, a maximum of 2,500 DataArts DataService Exclusive APIs can be created in the instance.

  5. The maximum number of DataArts DataService Exclusive APIs that can be created in a workspace is the quota of DataArts DataService Exclusive APIs (configured in the workspace information) or the total API quotas of the clusters in the instance, whichever is smaller. For example, if the quota of DataArts DataService Exclusive APIs for a workspace is 800, and two clusters whose API quotas are both 500 have been created in the workspace, a maximum of 800 DataArts DataService Exclusive APIs can be created in the workspace.

  6. A maximum of 1,000 applications can be created in a workspace.

  7. A maximum of 500 throttling policies can be created in a workspace.

  8. DataArts DataService allows you to trace and save events. For each event, DataArts DataService records information such as the date, description, and source (a cluster). Events are retained for 30 days.

  9. From the log of a DataArts DataService Exclusive cluster, you can only obtain the last 100 access records of the cluster, evenly from all nodes of the cluster.

  10. In the APIs Called, APIs Published, Top 5 APIs by Call Rate, Top 5 APIs by Call Duration, and Top 5 APIs by Call Quantity areas on the Overview page, data of the last 12 hours, one day, seven days, or 30 days can be displayed. The total number of API calls is the sum of the API calls made in the last seven days (excluding the current day).
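The API quota rules in items 4 and 5 above are the same computation at two scopes: the effective cap is the smaller of the configured quota and the sum of the cluster API quotas. The helper below is a hypothetical illustration of that arithmetic, not a DataArts DataService API.

```python
def effective_api_quota(scope_quota: int, cluster_api_quotas: list) -> int:
    """Maximum number of DataArts DataService Exclusive APIs creatable
    in a scope (a DataArts Studio instance or a workspace).

    Rule from the constraints above: the smaller of the scope's quota
    and the total API quotas of its clusters.
    """
    return min(scope_quota, sum(cluster_api_quotas))
```

This reproduces both documented examples: an instance quota of 5,000 with clusters of 500 and 2,000 yields 2,500; a workspace quota of 800 with two clusters of 500 each yields 800.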