Glossary

Term

Description

A – E

ACID

Atomicity, Consistency, Isolation, and Durability (ACID). These are a set of properties of database transactions in a DBMS.

cluster ring

A cluster ring consists of several physical servers. The primary-standby-secondary relationships among its DNs do not involve external DNs. That is, none of the primary, standby, or secondary counterparts of DNs belonging to the ring are deployed in other rings. A ring is the smallest unit used for scaling.

Bgwriter

A background write thread created when the database starts. The thread pushes dirty pages in the database to a permanent device (such as a disk).

bit

The smallest unit of information handled by a computer. One bit is expressed as a 1 or a 0 in a binary numeral, or as a true or a false logical condition. A bit is physically represented by an element such as high or low voltage at one point in a circuit, or a small spot on a disk that is magnetized in one way or the other. A single bit conveys little information a human would consider meaningful. A group of eight bits, however, makes up a byte, which can be used to represent many types of information, such as a letter of the alphabet, a decimal digit, or other character.

Bloom filter

A Bloom filter is a space-efficient data structure based on a bit vector, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not. In other words, a query returns either "possibly in set" (which may be wrong) or "definitely not in set". A Bloom filter thus trades some accuracy for savings in time and space.
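
The membership test can be illustrated with a minimal sketch. It is not how DWS implements Bloom filters; the class name, the bit-vector size, the number of hash functions, and the use of salted SHA-256 are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit vector plus k hash functions (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not in set"; True means "possibly in set".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for name in ("alice", "bob"):
    bf.add(name)

print(bf.might_contain("alice"))  # an added element is never reported as absent
```

An element that was added always tests as "possibly in set", which is why false negatives cannot occur. An element that was never added usually tests as "definitely not in set", but may collide with bits set by other elements.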

CCN

The Control Coordinator Node (CCN) centrally and dynamically manages load in DWS. It is responsible for determining, queuing, and scheduling complex operations in each CN for dynamic load management.

CIDR

Classless Inter-Domain Routing (CIDR). Whereas classful network design for IPv4 sized the network prefix as one or more 8-bit groups, resulting in the blocks of Class A (8-bit), B (16-bit), or C (24-bit) addresses, CIDR allocates address space on any address bit boundary. A CIDR address is in the format of IP address/Number of bits in a network ID. For example, in 192.168.23.35/21, 21 indicates that the first 21 bits are the network prefix and others are the host ID.
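
The 192.168.23.35/21 example can be checked with Python's standard `ipaddress` module. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
import ipaddress

# Parse the address together with its prefix length.
iface = ipaddress.ip_interface("192.168.23.35/21")

network = iface.network            # the first 21 bits, with host bits zeroed
prefix_len = network.prefixlen     # number of bits in the network ID
host_bits = 32 - prefix_len        # remaining bits identify the host

# Mask off the network prefix to obtain the host ID.
host_id = int(iface.ip) & ((1 << host_bits) - 1)

print(network)                     # 192.168.16.0/21
print(network.netmask)             # 255.255.248.0
print(host_bits, host_id)          # 11 1827
```

Because the prefix ends inside the third octet, the network address is 192.168.16.0 rather than 192.168.23.0, which is exactly what classful addressing could not express.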

Cgroup

A control group (Cgroup), also called a priority group (PG) in DWS. The Cgroup is a Linux kernel feature, available in SUSE Linux and Red Hat, that can limit, account for, and isolate the resource usage of a collection of processes.

CLI

Command-line interface (CLI). Users use the CLI to interact with applications. Its input and output are text based. Commands are entered through a keyboard or similar device and are interpreted and executed by applications. The results are displayed in text or graphic form on the terminal interface.

CM

Cluster Manager (CM) manages and monitors the running status of functional units and physical resources in the distributed system, ensuring stable running of the entire system.

CMS

The Cluster Management Service (CMS) component manages the cluster status.

CN

The Coordinator Node (CN) stores database metadata, splits query tasks and supports their execution, then compiles the query results returned from DNs.

CU

Compression Unit (CU) is the smallest storage unit in a column-storage table.

core file

A file that is created when memory overwriting, assertion failures, or access to invalid memory occurs in a process, causing it to fail. This file is then used for further analysis.

A core file stores memory dump data, and supports binary mode and specified ports. The name of a core file consists of the word "core" and the OS process ID.

The core file is available regardless of the type of platform.

core dump

When a program stops abnormally, the core dump, memory dump, or system dump records the state of the working memory of the program at that point in time. Other key pieces of program state are often dumped at the same time, for example, processor registers (including the program counter and stack pointer), memory management information, and other processor and OS flags. A core dump is often used to assist in diagnosing and debugging computer programs.

DBA

A database administrator (DBA) instructs or executes database maintenance operations.

DBLINK

An object of the path from one database to another. A remote database object can be queried with dblink.

DBMS

Database Management System (DBMS) is system software, made up of a collection of programs, that allows users to access, manage, and query data in a database. A DBMS can be classified as memory DBMS or disk DBMS based on the location of the data.

DCL

Data control language (DCL)

DDL

Data definition language (DDL)

DML

Data manipulation language (DML)

DN

The Data Node (DN) performs table data storage and query operations.

ETCD

ETCD (etcd, a name derived from the Unix /etc directory and "d" for distributed) is a distributed key-value storage system used for configuration sharing and service discovery (registration and search).

ETL

Extract-Transform-Load (ETL) refers to the process of extracting data from a source, transforming it, and loading it into the target database.

backup

A backup, or the process of backing up, refers to the copying and archiving of computer data in case of data loss.

backup and restoration

A collection of concepts, procedures, and strategies to protect against data loss caused by invalid media or misoperations.

standby server

A node in the DWS HA solution. It functions as a backup of the primary server. If the primary server is behaving abnormally, the standby server is promoted to primary, ensuring data service continuity.

crash

A crash (or system crash) is an event in which a computer or a program (such as a software application or an OS) ceases to function properly. Often the program will exit after encountering this type of error. The program experiencing the crash can hang or freeze until a crash reporting service reports the crash and any details relating to it. If the program is a critical part of the OS kernel, the entire computer may crash (possibly resulting in a fatal system error).

encoding

Encoding is representing data and information using code so that it can be processed and analyzed by a computer. Characters, digits, and other objects can be converted into digital code, or information and data can be converted into the required electrical pulse signals based on predefined rules.

encoding technology

A technology that presents data using a specific set of characters, which can be identified by computer hardware and software.

table

A set of columns and rows. Each column is referred to as a field. The value in each field represents a data type. For example, if a table contains people's names, cities, and states, it has three columns: Name, City, and State. In every row in the table, the Name column contains a name, the City column contains a city, and the State column contains a state.

tablespace

A tablespace is a logical storage structure that contains tables, indexes, large objects, and long data. A tablespace provides an abstract layer between physical data and logical data, and provides storage space for all database objects. When you create a table, you can specify which tablespace it belongs to.

concurrency control

A DBMS service that ensures data integrity when multiple transactions are concurrently executed in a multi-user environment. In a multi-threaded environment, DWS concurrency control ensures that database operations are safe and all database transactions remain consistent at any given time.

query

A request sent to the database with the purpose of updating, modifying, querying, or deleting information.

query operator

An iterator or a query tree node, which is a basic unit for the execution of a query. Execution of a query can be split into one or more query operators. Common query operators include scan, join, and aggregation.

query fragment

Each query task can be split into one or more query fragments. Each query fragment consists of one or more query operators and can independently run on a node. Query fragments exchange data through data flow operators.

durability

One of the ACID features of database transactions. Durability indicates that transactions that have been committed will permanently survive and not be rolled back.

stored procedure

A group of SQL statements compiled into a single execution plan and stored in a large database system. Users can specify a name and parameters (if any) for a stored procedure to execute the procedure.

OS

An operating system (OS) is software that is loaded onto a computer or similar device by a bootstrap program and then manages the other programs and applications running on it.

secondary server

To ensure high cluster availability, the primary server synchronizes logs to the secondary server if data synchronization between the primary and standby servers fails. If the primary server suddenly breaks down, the standby server is promoted to primary and synchronizes from the secondary server the logs generated during the breakdown.

BLOB

Binary large object (BLOB) is a collection of binary data stored in a database, such as videos, audio, and images.

segment

A segment in the database indicates a part containing one or more regions. Region is the smallest range of a database and consists of data blocks. One or more segments comprise a tablespace.

F – J

failover

Automatic switchover from a faulty node to its standby node. Conversely, automatic switchback from the standby node to the primary node is called failback.

FDW

A foreign data wrapper (FDW) is a SQL interface provided by Postgres. It is used to access big data objects stored in remote data stores so that DBAs can integrate data from unrelated data sources and store it in the public schema of the database.

freeze

An operation automatically performed by the AutoVacuum Worker process when transaction IDs are close to being exhausted. DWS records transaction IDs in row headers. When a transaction reads a row, the transaction ID in the row header is compared with the ID of the reading transaction to determine whether the row is visible. Transaction IDs are unsigned integers. If they are exhausted, new transaction IDs wrap around past the end of the integer range, causing visible rows to become invisible. To prevent this, the freeze operation replaces the transaction ID of a row with a special ID. Rows marked with this special transaction ID are visible to all transactions.

GDB

The GNU Debugger (GDB) allows you to see what is going on inside another program while it executes, or what another program was doing the moment that it crashed. GDB can do four main kinds of things to help you catch bugs in the act:

  • Starts your program, specifying anything that might affect its behavior.
  • Stops a program in a specific condition.
  • Checks what happens when a program stops.
  • Changes things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.

GDS

Gauss Data Service (GDS), also called General Data Service. To import data to DWS, you need to deploy the tool on the server where the source data is stored so that DNs can use this tool to obtain data.

GNU

The GNU Project was publicly announced on September 27, 1983 by Richard Stallman, with the aim of building an OS composed wholly of free software. GNU is a recursive acronym for "GNU's Not Unix!". Stallman announced that GNU should be pronounced as Guh-NOO. Technically, GNU is similar in design to Unix, a widely used commercial OS. However, GNU is free software and contains no Unix code.

gsql

DWS interactive terminal. This enables you to interactively type in queries, issue them to DWS, and view the query results. Queries can also be entered from files. gsql supports many meta commands and shell-like commands, allowing you to conveniently compile scripts and automate tasks.

GTM

Global Transaction Manager (GTM) manages the status of transactions.

GUC

Grand unified configuration (GUC) includes parameters for running databases, the values of which determine database system behavior.

HA

High availability (HA) is a solution in which two modules operate in primary/standby mode. This solution helps to minimize the duration of service interruptions caused by routine maintenance (planned) or sudden system breakdowns (unplanned), improving system and application availability.

HBA

Host-based authentication (HBA) allows the host to authenticate a certain number of the total system users. It can apply to all users on a system or a subset using the Match directive. This type of authentication can be useful for managing computing clusters and other fairly homogenous pools of machines. In all, three files on the server and one on the client must be modified to prepare for host-based authentication.

HDFS

Hadoop Distributed File System (HDFS) is a subproject of Apache Hadoop. HDFS is highly fault tolerant and is designed to run on low-end hardware. The HDFS provides high-throughput access to large data sets and is ideal for applications having large data sets.

server

A combination of hardware and software designed for providing clients with services. This word alone refers to the computer running the server OS, or the software or dedicated hardware providing services.

advanced package

Logical and functional stored procedures and functions provided by DWS.

isolation

One of the ACID features of database transactions. Isolation means that the operations inside a transaction and data used are isolated from other concurrent transactions. The concurrent transactions do not affect each other.

relational database

A database created using a relational model. It processes data using methods of set algebra.

archive thread

A thread started when the archive function is enabled on a database. The thread archives database logs to a specified path.

failover

The automatic substitution of a functionally equivalent system component for a failed one. The system component can be a processor, server, network, or database.

environment variable

An environment variable defines the part of the environment in which a process runs. For example, it can define the part of the environment as the main directory, command search path, terminal that is in use, or the current time zone.

checkpoint

A mechanism that stores data in the database memory to disks at a certain time. DWS periodically stores the data of committed transactions and data of uncommitted transactions to disks. The data and redo logs can be used for database restoration if a database restarts or breaks down.

encryption

A function hiding information content during data transmission to prevent the unauthorized use of the information.

node

Cluster nodes (or nodes) are physical and virtual servers that make up the DWS cluster environment.

error correction

A technique that automatically detects and corrects errors in software and data streams to improve system stability and reliability.

process

An instance of a computer program that is being executed. A process may be made up of multiple threads of execution. Other processes cannot use a thread occupied by the process.

PITR

Point-In-Time Recovery (PITR) is a backup and restoration feature of DWS. Data can be restored to a specified point in time if backup data and WAL logs are normal.

record

In a relational database, a record corresponds to data in each row of a table.

cluster

A cluster is an independent system consisting of servers and other resources, ensuring high availability. In certain conditions, clusters can implement load balancing and concurrent processing of transactions.

K – O

LLVM

Low Level Virtual Machine (LLVM) is a compiler framework written in C++ and is designed to optimize the compile-time, link-time, run-time, and idle-time of programs that are written in arbitrary programming languages. It is open to developers and compatible with existing scripts.

LVS

Linux Virtual Server (LVS), a virtual server cluster system, is used for balancing the load of a cluster.

MPP

Massively Parallel Processing (MPP) refers to cluster architecture that consists of multiple machines. The architecture is also called a cluster system.

MVCC

Multi-Version Concurrency Control (MVCC) is a protocol that allows a tuple to have multiple versions, on which different query operations can be performed. A basic advantage is that read and write operations do not conflict.
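
The idea that readers and writers work on different versions of a tuple can be sketched as follows. This is a deliberately simplified model, not the DWS implementation: the field names `xmin` and `xmax` follow PostgreSQL convention, and the sketch assumes that every transaction with an ID at or below the reader's snapshot has already committed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TupleVersion:
    value: str
    xmin: int                    # ID of the transaction that created this version
    xmax: Optional[int] = None   # ID of the transaction that deleted or replaced it


def visible(version, snapshot_xid):
    # Simplifying assumption: all transactions with ID <= snapshot_xid committed.
    created = version.xmin <= snapshot_xid
    deleted = version.xmax is not None and version.xmax <= snapshot_xid
    return created and not deleted


def read(versions, snapshot_xid):
    # Return the value of the version that this snapshot is allowed to see.
    for version in versions:
        if visible(version, snapshot_xid):
            return version.value
    return None


# Transaction 10 inserts the row; transaction 20 updates it.
versions = [TupleVersion("old", xmin=10, xmax=20), TupleVersion("new", xmin=20)]

print(read(versions, 15))  # old
print(read(versions, 25))  # new
```

A reader whose snapshot predates the update keeps seeing the old version while the writer creates the new one, so neither has to wait for the other.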

NameNode

The NameNode is the centerpiece of a Hadoop file system, managing the namespace of the file system and client access to files.

OBS

Object Storage Service (OBS), an object-based cloud storage service.

OLAP

Online analytical processing (OLAP) is the most important application in the data warehouse system. It is dedicated to complex analytical operations, helps decision makers and executives to make decisions, and rapidly and flexibly processes complex queries involving a great amount of data based on analysts' requirements. In addition, OLAP provides decision makers with query results that are easy to understand, allowing them to learn the operating status of the enterprise. These decision makers can then produce informed and accurate solutions based on the query results.

OM

Operations Management (OM) provides management interfaces and tools for routine maintenance and configuration management of the cluster.

ORC

Optimized Row Columnar (ORC) is a widely used file format for structured data in a Hadoop system. It originated in the Apache Hive project.

client

A computer or program that connects to or requests the services of another computer or program.

free space management

A mechanism for managing free space in a table. This mechanism enables the database system to record free space in each table and establish an easy-to-search data structure, accelerating operations (such as INSERT) performed on the free space.

junk tuple

A tuple that has been deleted by a DELETE or UPDATE statement. When deleting a tuple, DWS only marks the tuple as to be cleared. The Vacuum thread then periodically clears these junk tuples.
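
The mark-then-clear behavior can be sketched with a toy in-memory table. The `Table` class and its method names are hypothetical and only illustrate the principle; they do not correspond to DWS internals.

```python
class Table:
    """Toy table in which deletes only mark tuples and vacuum reclaims them."""

    def __init__(self):
        self.tuples = []  # each entry: {"value": ..., "dead": bool}

    def insert(self, value):
        self.tuples.append({"value": value, "dead": False})

    def delete(self, value):
        # DELETE does not remove anything; it only marks matching tuples.
        for t in self.tuples:
            if t["value"] == value and not t["dead"]:
                t["dead"] = True

    def update(self, old, new):
        # UPDATE marks the old version as junk and inserts a new version.
        self.delete(old)
        self.insert(new)

    def scan(self):
        # Queries skip junk tuples.
        return [t["value"] for t in self.tuples if not t["dead"]]

    def vacuum(self):
        # Physically remove junk tuples and report how many were cleared.
        before = len(self.tuples)
        self.tuples = [t for t in self.tuples if not t["dead"]]
        return before - len(self.tuples)


table = Table()
table.insert(1)
table.insert(2)
table.update(1, 3)
table.delete(2)

print(table.scan())         # [3]
print(len(table.tuples))    # 3 (two junk tuples still occupy space)
print(table.vacuum())       # 2
```

Until `vacuum` runs, the junk tuples remain stored even though no query returns them, which is why periodic clearing is needed.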

column

An equivalent concept of "field". A database table consists of one or more columns. Together they describe all attributes of a record in the table.

logical node

Multiple logical nodes can be installed on the same physical node. A logical node is a database instance.

schema

Collection of database objects, including logical structures, such as tables, views, sequences, stored procedures, synonyms, indexes, clusters, and database links.

schema file

A SQL file that determines the database structure.

P – T

page

Minimum memory unit for row storage in the DWS relational object structure. The default size of a page is 8 KB.

PARQUET

A widely used file format for structured data in a Hadoop system. It was introduced from the Cloudera Impala project.

PostgreSQL

An open-source DBMS developed by volunteers all over the world. PostgreSQL is not controlled by any companies or individuals. Its source code can be used for free.

Postgres-XC

Postgres-XC is an open-source project that provides a write-scalable, synchronous, multi-master PostgreSQL cluster solution.

Postmaster

A thread started when the database service is started. It listens to connection requests from other nodes in the cluster or from clients.

After receiving and accepting a connection request from the standby server, the primary server creates a WAL Sender thread to interact with the standby server.

RHEL

Red Hat Enterprise Linux (RHEL)

redo log

A log that contains information required for performing an operation again in a database. If a database is faulty, redo logs can be used to restore the database to its original state.

SCTP

The Stream Control Transmission Protocol (SCTP) is a transport-layer protocol defined by the Internet Engineering Task Force (IETF) in 2000. The protocol ensures reliable datagram transport on top of unreliable transmission services and was designed to carry SCN narrowband signaling over IP networks.

session

A task created by a database for a connection when an application attempts to connect to the database. Sessions are managed by the session manager. They execute initial tasks to perform all user operations.

shared-nothing architecture

A distributed computing architecture, in which none of the nodes share CPUs or storage resources. This architecture has good scalability.

SLES

SUSE Linux Enterprise Server (SLES) is an enterprise Linux OS provided by SUSE.

SMP

Symmetric multiprocessing (SMP) lets multiple CPUs run on a computer and share the same memory and bus. To ensure an SMP system achieves high performance, an OS must support multi-tasking and multi-thread processing. In databases, SMP means to concurrently execute queries using the multi-thread technology, efficiently using all CPU resources and improving query performance.

SQL

Structured Query Language (SQL) is a standard database query language. It consists of DDL, DML, and DCL.

SSL

Secure Sockets Layer (SSL) is a network security protocol introduced by Netscape. SSL runs on top of the TCP/IP communications protocols and uses public key technology. SSL supports a wide range of networks and provides three basic security services, all of which use public key technology. SSL secures service communication over the network by establishing a secure connection between the client and server and then sending data through this connection.

convergence ratio

Downlink to uplink bandwidth ratio of a switch. A high convergence ratio indicates a highly converged traffic environment and severe packet loss.

trace

A way of logging to record information about the way a program is executed. This information is typically used by programmers for debugging purposes. System administrators and technical support staff can also use this information, together with software monitoring tools, to diagnose common problems.

full backup

Backup of the entire database cluster.

full synchronization

A data synchronization mechanism specified in the DWS HA solution. Used to synchronize all data from the primary server to a standby server.

log file

A file to which a computer system writes a record of its activities.

transaction

A logical unit of work performed within a DBMS against a database. A transaction consists of a limited database operation sequence, and must have ACID features.

data

A representation of facts or directives for manual or automatic communication, explanation, or processing. Data includes constants, variables, arrays, and strings.

data distribution

A mode in which table data is split and stored on each database instance in a distributed system. Table data can be distributed in hash, replication, or random mode. In hash mode, a hash value is calculated based on the value of a specified column in a tuple, and then the target storage location of the tuple is determined based on the mapping between nodes and hash values. In replication mode, tuples are replicated to all nodes. In random mode, data is randomly distributed to the nodes.
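
The three modes can be compared in a short sketch. The node names, the CRC32 hash, and the modulo mapping are assumptions made for illustration; the actual mapping between hash values and DWS nodes is not described here.

```python
import random
import zlib

NODES = ["dn1", "dn2", "dn3"]  # hypothetical node names


def hash_node(key):
    # Stable hash of the distribution column value, mapped to a node by modulo.
    return NODES[zlib.crc32(str(key).encode("utf-8")) % len(NODES)]


def distribute(rows, mode, key=None, rng=None):
    """Return {node: [rows]} for the given distribution mode."""
    rng = rng or random.Random(0)
    placement = {node: [] for node in NODES}
    for row in rows:
        if mode == "hash":
            placement[hash_node(row[key])].append(row)
        elif mode == "replication":
            for node in NODES:          # every node stores every tuple
                placement[node].append(row)
        elif mode == "random":
            placement[rng.choice(NODES)].append(row)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return placement


rows = [{"id": i} for i in range(10)]
by_hash = distribute(rows, "hash", key="id")
by_replication = distribute(rows, "replication")
by_random = distribute(rows, "random")

print({node: len(r) for node, r in by_replication.items()})
# {'dn1': 10, 'dn2': 10, 'dn3': 10}
```

In hash mode a given column value always lands on the same node, which lets joins on that column run locally. Replication trades storage for the same benefit, and random mode gives neither guarantee.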

data partitioning

A division of a logical database or its constituent elements into multiple parts (partitions) whose data does not overlap based on specified ranges. Data is mapped to storage locations based on the value ranges of specific columns in a tuple.
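
Mapping a column value to a partition by range can be sketched with a binary search over the partition boundaries. The boundary values and partition names below are hypothetical.

```python
import bisect

# Hypothetical exclusive upper bounds for partitions p0..p2;
# p3 holds every value greater than or equal to 300.
UPPER_BOUNDS = [100, 200, 300]
PARTITIONS = ["p0", "p1", "p2", "p3"]


def partition_for(value):
    # bisect_right counts how many bounds are <= value, which is the index
    # of the first partition whose upper bound is greater than the value.
    return PARTITIONS[bisect.bisect_right(UPPER_BOUNDS, value)]


print(partition_for(99))   # p0
print(partition_for(100))  # p1
print(partition_for(300))  # p3
```

Because the ranges do not overlap, every value maps to exactly one partition, and a query that filters on the partitioning column only needs to read the partitions whose ranges match.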

database

A collection of data that is stored together and can be accessed, managed, and updated. By content type, data in a database can be classified as numeric, full text, or images.

DB instance

A DWS process and the database files that it controls. DWS installs multiple database instances on one physical node. The GTM, CM, CN, and DN installed on cluster nodes are all database instances. A database instance is also called a logical node.

database HA

DWS provides a highly reliable HA solution. Every logical node in DWS is identified as a primary or standby node. Only one node is identified as primary at a time. When the HA system is deployed for the first time, each standby server synchronizes all data from the primary server (full synchronization). After that, each standby server synchronizes only data that is new or has been modified (incremental synchronization). When the HA system is running, the primary server receives data read and write requests, and the standby servers only synchronize logs.

database file

A binary file that stores user data and the data inside the database system.

data flow operator

An operator that exchanges data among query fragments. By their input/output relationships, data flows can be categorized into Gather flows, Broadcast flows, and Redistribution flows. Gather combines multiple query fragments of data into one. Broadcast forwards the data of one query fragment to multiple query fragments. Redistribution reorganizes the data of multiple query fragments and then redistributes the reorganized data to multiple query fragments.
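
The three flow types can be sketched as functions over lists of rows, where each inner list stands for the output of one query fragment. The function names and the CRC32-based redistribution are illustrative assumptions, not DWS operators.

```python
import zlib


def gather(fragments):
    # Combine the data of multiple fragments into one.
    return [row for fragment in fragments for row in fragment]


def broadcast(fragment, n_targets):
    # Forward a full copy of one fragment's data to every target fragment.
    return [list(fragment) for _ in range(n_targets)]


def redistribute(fragments, key, n_targets):
    # Regroup rows from all fragments by a hash of the key column.
    targets = [[] for _ in range(n_targets)]
    for fragment in fragments:
        for row in fragment:
            index = zlib.crc32(str(row[key]).encode("utf-8")) % n_targets
            targets[index].append(row)
    return targets


fragments = [[{"k": 1}, {"k": 2}], [{"k": 1}, {"k": 3}]]

print(len(gather(fragments)))               # 4
print(len(broadcast(fragments[0], 3)))      # 3
regrouped = redistribute(fragments, "k", 2)
print(sum(len(t) for t in regrouped))       # 4
```

After redistribution, rows with the same key value sit in the same target fragment regardless of which source fragment produced them, which is what a distributed join or aggregation on that key requires.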

data dictionary

A reserved table within a database which is used to store information about the database itself. The information includes database design information, stored procedure information, user rights, user statistics, database process information, database growth statistics, and database performance statistics.

deadlock

Unresolved contention for the use of resources.

index

An ordered data structure in the database management system. An index accelerates querying and the updating of data in database tables.

statistics

Information that is automatically collected by databases, including table-level information (number of tuples and number of pages) and column-level information (column value range distribution histogram). Statistics in databases are used to estimate the cost of execution plans to find the plan with the lowest cost.

projection

Projection is a unary operation on a table. It uses the values of specified attributes from a table to form a new table and is represented by ΠA(R), where Π is the projection operator, A indicates the list of attribute names (column names), and R indicates the table.
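
Projection can be sketched over a table held as a list of dictionaries. The `project` function is hypothetical, and it removes duplicate result rows, as relational algebra treats tables as sets (a SQL SELECT without DISTINCT would keep them).

```python
def project(rows, attributes):
    """Keep only the named attributes of each row, dropping duplicate rows."""
    seen = set()
    result = []
    for row in rows:
        projected = tuple((name, row[name]) for name in attributes)
        if projected not in seen:   # set semantics: each result row appears once
            seen.add(projected)
            result.append(dict(projected))
    return result


people = [
    {"Name": "Ann", "City": "Austin", "State": "TX"},
    {"Name": "Bob", "City": "Dallas", "State": "TX"},
]

print(project(people, ["State"]))
# [{'State': 'TX'}]
print(project(people, ["Name", "City"]))
# [{'Name': 'Ann', 'City': 'Austin'}, {'Name': 'Bob', 'City': 'Dallas'}]
```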

stop word

In computing, stop words are words which are filtered out before or after processing of natural language data (text), saving storage space and improving search efficiency.

U – Z

vacuum

A thread that is periodically started up by a database to clear junk tuples. Multiple VACUUM threads can be started concurrently by setting a parameter.

verbose

VERBOSE is an option that makes a command display more detailed information about what it is doing.

WAL

Write-ahead logging (WAL) is a standard method for logging a transaction. Corresponding logs must be written into a permanent device before a data file (carrier for a table and index) is modified.
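
The ordering rule (log first, data file second) can be sketched with two ordinary files. The file names, the JSON record format, and the helper functions are assumptions for illustration and bear no relation to the real WAL format.

```python
import json
import os
import tempfile


def apply_update(log_path, data_path, key, value):
    # 1. Append the log record and force it to the permanent device.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())
    # 2. Only then modify the data file.
    data = {}
    if os.path.exists(data_path):
        with open(data_path, encoding="utf-8") as f:
            data = json.load(f)
    data[key] = value
    with open(data_path, "w", encoding="utf-8") as f:
        json.dump(data, f)


def replay(log_path):
    # Rebuild the state from the log alone, as after losing the data file.
    state = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            state[record["key"]] = record["value"]
    return state


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "wal.log")
data_path = os.path.join(workdir, "table.json")

apply_update(log_path, data_path, "a", 1)
apply_update(log_path, data_path, "a", 2)

print(replay(log_path))  # {'a': 2}
```

Because every change reaches the log before the data file, a crash between the two steps loses nothing: replaying the log reproduces the intended state.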

WAL Receiver

A thread created by the standby server during database replication. The thread is used to receive data and commands from the primary server and to tell the primary server that the data and commands have been acknowledged. Only one WAL Receiver thread can run on one standby server.

WAL Sender

A thread created on the primary server when the primary server has received a connection request from a standby server during database replication. This thread is used to send data and commands to standby servers and to receive responses from the standby servers. Multiple WAL Sender threads may run on one primary server. Each WAL Sender thread corresponds to a connection request initiated by a standby server.

WAL Writer

A thread for writing redo logs that is created when a database is started. This thread is used to write logs in the memory to a permanent device, such as a disk.

WLM

The WorkLoad Manager (WLM) controls and allocates system resources in DWS.

Xlog

A transaction log. A logical node can have only one Xlog file.

xDR

X detailed record. It refers to detailed records on the user and signaling planes and can be categorized into charging data records (CDRs), user flow data records (UFDRs), transaction detail records (TDRs), and service detail records (SDRs).

network backup

Network backup provides a comprehensive and flexible data protection solution to Microsoft Windows, UNIX, and Linux platforms. Network backup can back up, archive, and restore files, folders, directories, volumes, and partitions on a computer.

physical node

A physical machine or device.

system catalog

A table storing meta information about the database. The meta information includes user tables, indexes, columns, functions, and the data types in a database.

selection

Selection is a unary operation on a table. It uses rows selected from a table based on given conditions to form a new table and is represented by σF(R), where σ is the selection operator, F indicates a conditional expression, and R indicates a table.
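
Selection can be sketched over a table held as a list of dictionaries, with the conditional expression F passed as a function. The `select` helper and the sample data are hypothetical.

```python
def select(rows, predicate):
    """Return the rows of the table for which the condition holds."""
    return [row for row in rows if predicate(row)]


people = [
    {"Name": "Ann", "City": "Austin", "State": "TX"},
    {"Name": "Bob", "City": "Boston", "State": "MA"},
    {"Name": "Cy", "City": "Dallas", "State": "TX"},
]

# F is the condition State = 'TX'.
texans = select(people, lambda row: row["State"] == "TX")

print([row["Name"] for row in texans])  # ['Ann', 'Cy']
```

Selection keeps whole rows and filters which ones appear, so the result has the same columns as the input table.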

compression

Data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation. Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by identifying and removing unnecessary or unimportant information. The process of reducing the size of a data file is commonly referred to as data compression, although its formal name is source coding (coding done at the source of the data, before it is stored or transmitted).
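
Lossless compression can be demonstrated with Python's standard `zlib` module. The sample data is arbitrary and highly repetitive, so it compresses well; this says nothing about the algorithms DWS itself uses.

```python
import zlib

# Highly redundant input: the same four bytes repeated 1000 times.
original = b"DWS " * 1000

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original))              # 4000
print(len(compressed) < len(original))  # True
print(restored == original)       # True
```

The round trip returns exactly the original bytes, which is the defining property of lossless compression. A lossy scheme would not pass the final equality check.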

consistency

One of the ACID features of database transactions. Consistency is a database status. In such a status, data in the database must comply with integrity constraints.

metadata

Data that provides information about other data. Metadata describes the source, size, format, or other characteristics of data. In the database field, metadata describes the content of a data warehouse.

atomicity

One of the ACID features of database transactions. Atomicity means that a transaction is an indivisible unit of work. Either all operations performed in a transaction are committed, or none of them are. If an error occurs during transaction execution, the transaction is rolled back to the state before it started.
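
All-or-nothing behavior can be observed with any transactional database. The sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database rather than DWS; the table, the CHECK constraint, and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "a", "b", 30)       # both updates are committed
failed = transfer(conn, "a", "b", 500)  # debit violates the CHECK constraint

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)  # True False {'a': 70, 'b': 30}
```

In the failed transfer, the credit to `b` had already been applied when the debit failed, yet the rollback undoes it, so the balances are the same as before the transaction started.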

dirty page

A page that has been modified but has not yet been written to a permanent device.

incremental backup

Incremental backup stores all files changed since the last valid backup.

incremental synchronization

A data synchronization mechanism in the DWS HA solution. Only data modified since the last synchronization is synchronized to the standby server.

primary server

A node that receives data read and write operations in the DWS HA system and works with all standby servers. At any time, only one node in the HA system is identified as the primary server.

thesaurus

Standardized words or phrases that express document themes and are used for indexing and retrieval.

dump file

A specific type of the trace file. A dump file contains diagnostic data during the event response, whereas a trace file contains continuously generated diagnostic data.

resource pool

A resource pool is a resource allocation mechanism provided by DWS. By binding a user to a resource pool, you can limit the priority of the jobs executed by the user and resources available to the jobs.

minimum restoration point

A data consistency assurance method provided by DWS. During startup, DWS checks consistency between the latest WAL logs and the minimum restoration point. If the record location of the minimum restoration point is greater than that of the latest WAL logs, the database fails to start.