Using the Python Library PyGreSQL to Connect to a Cluster

After creating a data warehouse cluster and using the third-party function library PyGreSQL to connect to the cluster, you can use Python to access GaussDB(DWS) and perform various operations on data tables.

Preparations Before Connecting to a Cluster

  • An EIP has been bound to the data warehouse cluster.

  • You have obtained the administrator username and password for logging in to the database in the data warehouse cluster.

    MD5 algorithms may by vulnerable to collision attacks and cannot be used for password verification. Currently, GaussDB(DWS) uses the default security design. By default, MD5 password verification is disabled, and this may cause failures of connections from open source clients. You are advised to set password_encryption_type to 1. For details, see "Modifying Database Parameters" in User Guide.

    Note

    • For security purposes, GaussDB(DWS) no longer uses MD5 to store password digests by default. As a result, the open-source drives and clients may fail to connect to the database. To use the MD5 algorithm used in an open-source protocol, you must modify your password policy and create a new user, or change the password of an existing user.

    • The database stores the hash digest of passwords instead of password text. During password verification, the system compares the hash digest with the password digest sent from the client (salt operations are involved). If you change your cryptographic algorithm policy, the database cannot generate a new hash digest for your existing password. For connectivity purposes, you must manually change your password or create a new user. The new password will be encrypted using the hash algorithm and stored for authentication in the next connection.

  • You have obtained the public network address, including the IP address and port number in the data warehouse cluster. For details, see Obtaining the Cluster Connection Address.

  • You have installed the third-party function library PyGreSQL.

    Download address: http://www.pygresql.org/download/index.html

  • For details about the installation and deployment operations, see http://www.pygresql.org/contents/install.html

    Note

    • In CentOS and Red Hat OS, run the following yum command:

      yum install PyGreSQL
      
    • PyGreSQL depends on the libpq dynamic library of PostgreSQL (32-bit or 64-bit version, whichever matches the PyGreSQL bit version). In Linux, you can run the yum command and do not need to install the library. Before using PyGreSQL in Windows, you need to install libpq in either of the following ways:

      • Install PostgreSQL and configure the libpq, ssl, and crypto dynamic libraries in the environment variable PATH.

      • Install psqlodbc and use the libpq, ssl, and crypto dynamic libraries carried by the PostgreSQL ODBC driver.

Constraints

PyGreSQL is a PostgreSQL-based client interface, and its functions are not fully supported by GaussDB(DWS). For details, see Table 1.

Note

The following APIs are supported based on Python 3.8.5 and PyGreSQL 5.2.4.

Table 1 PyGreSQL APIs supported by DWS

PyGreSQL

Yes

Remarks

Module functions and constants

connect - Open a PostgreSQL connection

Y

-

get_pqlib_version - get the version of libpq

Y

-

get/set_defhost - default server host [DV]

Y

-

get/set_defport - default server port [DV]

Y

-

get/set_defopt - default connection options [DV]

Y

-

get/set_defbase - default database name [DV]

Y

-

get/set_defuser - default database user [DV]

Y

-

get/set_defpasswd - default database password [DV]

Y

-

escape_string - escape a string for use within SQL

Y

-

escape_bytea - escape binary data for use within SQL

Y

-

unescape_bytea - unescape data that has been retrieved as text

Y

-

get/set_namedresult - conversion to named tuples

Y

-

get/set_decimal - decimal type to be used for numeric values

Y

-

get/set_decimal_point - decimal mark used for monetary values

Y

-

get/set_bool - whether boolean values are returned as bool objects

Y

-

get/set_array - whether arrays are returned as list objects

Y

-

get/set_bytea_escaped - whether bytea data is returned escaped

Y

-

get/set_jsondecode - decoding JSON format

Y

-

get/set_cast_hook - fallback typecast function

Y

-

get/set_datestyle - assume a fixed date style

Y

-

get/set_typecast - custom typecasting

Y

-

cast_array/record - fast parsers for arrays and records

Y

-

Type helpers

Y

-

Module constants

Y

-

Connection - The connection object

query - execute a SQL command string

Y

-

send_query - executes a SQL command string asynchronously

Y

-

query_prepared - execute a prepared statement

Y

-

prepare - create a prepared statement

Y

-

describe_prepared - describe a prepared statement

Y

-

reset - reset the connection

Y

-

poll - completes an asynchronous connection

Y

-

cancel - abandon processing of current SQL command

Y

-

close - close the database connection

Y

-

transaction - get the current transaction state

Y

-

parameter - get a current server parameter setting

Y

-

date_format - get the currently used date format

Y

-

fileno - get the socket used to connect to the database

Y

-

set_non_blocking - set the non-blocking status of the connection

Y

-

is_non_blocking - report the blocking status of the connection

Y

-

getnotify - get the last notify from the server

N

The database does not support listen/notify.

inserttable - insert a list into a table

Y

Use double quotation marks ("") to quote \n in the copy command.

get/set_notice_receiver - custom notice receiver

Y

-

putline - write a line to the server socket [DA]

Y

-

getline - get a line from server socket [DA]

Y

-

endcopy - synchronize client and server [DA]

Y

-

locreate - create a large object in the database [LO]

N

Operations related to large objects

getlo - build a large object from given oid [LO]

N

Operations related to large objects

loimport - import a file to a large object [LO]

N

Operations related to large objects

Object attributes

Y

-

The DB wrapper class

Initialization

Y

-

pkey - return the primary key of a table

Y

-

get_databases - get list of databases in the system

Y

-

get_relations - get list of relations in connected database

Y

-

get_tables - get list of tables in connected database

Y

-

get_attnames - get the attribute names of a table

Y

-

has_table_privilege - check table privilege

Y

-

get/set_parameter - get or set run-time parameters

Y

-

begin/commit/rollback/savepoint/release - transaction handling

Y

-

get - get a row from a database table or view

Y

-

insert - insert a row into a database table

Y

-

update - update a row in a database table

Y

-

upsert - insert a row with conflict resolution

Y

-

query - execute a SQL command string

Y

-

query_formatted - execute a formatted SQL command string

Y

-

query_prepared - execute a prepared statement

Y

-

prepare - create a prepared statement

Y

-

describe_prepared - describe a prepared statement

Y

-

delete_prepared - delete a prepared statement

Y

-

clear - clear row values in memory

Y

-

delete - delete a row from a database table

Y

A tuple must have unique key or primary key.

truncate - quickly empty database tables

Y

-

get_as_list/dict - read a table as a list or dictionary

Y

-

escape_literal/identifier/string/bytea - escape for SQL

Y

-

unescape_bytea - unescape data retrieved from the database

Y

-

encode/decode_json - encode and decode JSON data

Y

-

use_regtypes - determine use of regular type names

Y

-

notification_handler - create a notification handler

N

The database does not support listen/notify.

Attributes of the DB wrapper class

Y

-

Query methods

getresult - get query values as list of tuples

Y

-

dictresult/dictiter - get query values as dictionaries

Y

-

namedresult/namediter - get query values as named tuples

Y

-

scalarresult/scalariter - get query values as scalars

Y

-

one/onedict/onenamed/onescalar - get one result of a query

Y

-

single/singledict/singlenamed/singlescalar - get single result of a query

Y

-

listfields - list fields names of previous query result

Y

-

fieldname, fieldnum - field name/number conversion

Y

-

fieldinfo - detailed info about query result fields

Y

-

ntuples - return number of tuples in query object

Y

-

memsize - return number of bytes allocated by query result

Y

-

LargeObject - Large Objects

open - open a large object

N

Operations related to large objects

close - close a large object

N

Operations related to large objects

read, write, tell, seek, unlink - file-like large object handling

N

Operations related to large objects

size - get the large object size

N

Operations related to large objects

export - save a large object to a file

N

Operations related to large objects

Object attributes

N

Operations related to large objects

The Notification Handler

Instantiating the notification handler

N

The database does not support listen/notify.

Invoking the notification handler

N

The database does not support listen/notify.

Sending notifications

N

The database does not support listen/notify.

Auxiliary methods

N

The database does not support listen/notify.

pgdb

Module functions and constants

connect - Open a PostgreSQL connection

Y

-

get/set/reset_typecast - Control the global typecast functions

Y

-

Module constants

Y

-

Errors raised by this module

Y

-

Connection - The connection object

close - close the connection

Y

-

commit - commit the connection

Y

-

rollback - roll back the connection

Y

-

cursor - return a new cursor object

Y

-

Attributes that are not part of the standard

Y

-

Cursor - The cursor object

description - details regarding the result columns

Y

-

rowcount - number of rows of the result

Y

-

close - close the cursor

Y

-

execute - execute a database operation

Y

-

executemany - execute many similar database operations

Y

-

callproc - Call a stored procedure

Y

-

fetchone - fetch next row of the query result

Y

-

fetchmany - fetch next set of rows of the query result

Y

-

fetchall - fetch all rows of the query result

Y

-

arraysize - the number of rows to fetch at a time

Y

-

Methods and attributes that are not part of the standard

Y

-

Type - Type objects and constructors

Type constructors

Y

-

Type objects

Y

-

Using the Third-Party Function Library PyGreSQL to Connect to a Cluster (Linux)

  1. Log in to the Linux environment as user root.

  2. Run the following command to create the python_dws.py file:

    vi python_dws.py
    

    Copy and paste the following content to the python_dws.py file:

    #!/usr/bin/env python3
    # _*_ encoding:utf-8 _*_
    
    from __future__ import print_function
    
    import pg
    
    
    def create_table(connection):
        print("Begin to create table")
        try:
            connection.query("drop table if exists test;"
                             "create table test(id int, name text);")
        except pg.InternalError as e:
            print(e)
        else:
            print("Table created successfully")
    
    
    def insert_data(connection):
        print("Begin to insert data")
        try:
            connection.query("insert into test values(1,'number1');")
            connection.query("insert into test values(2,'number2');")
            connection.query("insert into test values(3,'number3');")
        except pg.InternalError as e:
            print(e)
        else:
            print("Insert data successfully")
    
    
    def update_data(connection):
        print("Begin to update data")
        try:
            result = connection.query("update test set name = 'numberupdated' where id=1;")
            print("Total number of rows updated :", result)
            result = connection.query("select * from test order by 1;")
            rows = result.getresult()
            for row in rows:
                print("id = ", row[0])
                print("name = ", row[1], "\n")
        except pg.InternalError as e:
            print(e)
        else:
            print("After Update, Operation done successfully")
    
    
    def delete_data(connection):
        print("Begin to delete data")
        try:
            result = connection.query("delete from test where id=3;")
            print("Total number of rows deleted :", result)
            result = connection.query("select * from test order by 1;")
            rows = result.getresult()
            for row in rows:
                print("id = ", row[0])
                print("name = ", row[1], "\n")
        except pg.InternalError as e:
            print(e)
        else:
            print("After Delete,Operation done successfully")
    
    
    def select_data(connection):
        print("Begin to select data")
        try:
            result = connection.query("select * from test order by 1;")
            rows = result.getresult()
            for row in rows:
                print("id = ", row[0])
                print("name = ", row[1])
        except pg.InternalError as e:
            print(e)
            print("select failed")
        else:
            print("Operation done successfully")
    
    
    if __name__ == '__main__':
        try:
            conn = pg.DB(host='10.154.70.231',
                         port=8000,
                         dbname='gaussdb', # Database to be connected
                         user='dbadmin',
                         passwd='password')  # Database user password
        except pg.InternalError as ex:
            print(ex)
            print("Connect database failed")
        else:
            print("Opened database successfully")
            create_table(conn)
            insert_data(conn)
            select_data(conn)
            update_data(conn)
            delete_data(conn)
            conn.close()
    

    Alternatively, use the dbapi interface.

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    
    from __future__ import print_function
    
    import pg
    import pgdb
    
    
    def create_table(connection):
        print("Begin to create table")
        try:
            cursor = connection.cursor()
            cursor.execute("drop table if exists test;"
                           "create table test(id int, name text);")
            connection.commit()
        except pg.InternalError as e:
            print(e)
        else:
            print("Table created successfully")
            cursor.close()
    
    
    def insert_data(connection):
        print("Begin to insert data")
        try:
            cursor = connection.cursor()
            cursor.execute("insert into test values(1,'number1');")
            cursor.execute("insert into test values(2,'number2');")
            cursor.execute("insert into test values(3,'number3');")
            connection.commit()
        except pg.InternalError as e:
            print(e)
        else:
            print("Insert data successfully")
            cursor.close()
    
    
    def update_data(connection):
        print("Begin to update data")
        try:
            cursor = connection.cursor()
            cursor.execute("update test set name = 'numberupdated' where id=1;")
            connection.commit()
            print("Total number of rows updated :", cursor.rowcount)
            cursor.execute("select * from test;")
            rows = cursor.fetchall()
            for row in rows:
                print("id = ", row[0])
                print("name = ", row[1], "\n")
        except pg.InternalError as e:
            print(e)
        else:
            print("After Update, Operation done successfully")
    
    
    def delete_data(connection):
        print("Begin to delete data")
        try:
            cursor = connection.cursor()
            cursor.execute("delete from test where id=3;")
            connection.commit()
            print("Total number of rows deleted :", cursor.rowcount)
            cursor.execute("select * from test;")
            rows = cursor.fetchall()
            for row in rows:
                print("id = ", row[0])
                print("name = ", row[1], "\n")
        except pg.InternalError as e:
            print(e)
        else:
            print("After Delete,Operation done successfully")
    
    
    def select_data(connection):
        print("Begin to select data")
        try:
            cursor = connection.cursor()
            cursor.execute("select * from test;")
            rows = cursor.fetchall()
            for row in rows:
                print("id = ", row[0])
                print("name = ", row[1], "\n")
        except pg.InternalError as e:
            print(e)
            print("select failed")
        else:
            print("Operation done successfully")
            cursor.close()
    
    
    if __name__ == '__main__':
        try:
            conn = pgdb.connect(host='10.154.70.231',
                                          port='8000',
                                          database='gaussdb', # Database to be connected
                                          user='dbadmin',
                                          password='password') # Database user password
        except pg.InternalError as ex:
            print(ex)
            print("Connect database failed")
        else:
            print("Opened database successfully")
            create_table(conn)
            insert_data(conn)
            select_data(conn)
            update_data(conn)
            delete_data(conn)
            conn.close()
    
  3. Change the public network address, cluster port number, database name, database username, and database password in the python_dws.py file based on the actual cluster information.

    Note

    The PyGreSQL API does not provide the connection retry capability. You need to implement the retry processing in the service code.

    conn = pgdb.connect(host='10.154.70.231',
                                  port='8000',
                                  database='gaussdb', # Database to be connected
                                  user='dbadmin',
                                  password='password') # Database user password
    
  4. Run the following command to connect to the cluster using the third-party function library PyGreSQL:

    python python_dws.py
    

Using the Third-Party Function Library PyGreSQL to Connect to a Cluster (Windows)

  1. In the Windows operating system, click the Start button, enter cmd in the search box, and click cmd.exe in the result list to open the command-line interface (CLI).

  2. In the CLI, run the following command to create the python_dws.py file:

    type nul> python_dws.py
    

    Copy and paste the following content to the python_dws.py file:

    #!/usr/bin/env python3
    # _*_ encoding:utf-8 _*_
    
    from __future__ import print_function
    
    import pg
    
    
    def create_table(connection):
        print("Begin to create table")
        try:
            connection.query("drop table if exists test;"
                             "create table test(id int, name text);")
        except pg.InternalError as e:
            print(e)
        else:
            print("Table created successfully")
    
    
    def insert_data(connection):
        print("Begin to insert data")
        try:
            connection.query("insert into test values(1,'number1');")
            connection.query("insert into test values(2,'number2');")
            connection.query("insert into test values(3,'number3');")
        except pg.InternalError as e:
            print(e)
        else:
            print("Insert data successfully")
    
    
    def update_data(connection):
        print("Begin to update data")
        try:
            result = connection.query("update test set name = 'numberupdated' where id=1;")
            print("Total number of rows updated :", result)
            result = connection.query("select * from test order by 1;")
            rows = result.getresult()
            for row in rows:
                print("id = ", row[0])
                print("name = ", row[1], "\n")
        except pg.InternalError as e:
            print(e)
        else:
            print("After Update, Operation done successfully")
    
    
    def delete_data(connection):
        print("Begin to delete data")
        try:
            result = connection.query("delete from test where id=3;")
            print("Total number of rows deleted :", result)
            result = connection.query("select * from test order by 1;")
            rows = result.getresult()
            for row in rows:
                print("id = ", row[0])
                print("name = ", row[1], "\n")
        except pg.InternalError as e:
            print(e)
        else:
            print("After Delete,Operation done successfully")
    
    
    def select_data(connection):
        print("Begin to select data")
        try:
            result = connection.query("select * from test order by 1;")
            rows = result.getresult()
            for row in rows:
                print("id = ", row[0])
                print("name = ", row[1])
        except pg.InternalError as e:
            print(e)
            print("select failed")
        else:
            print("Operation done successfully")
    
    
    if __name__ == '__main__':
        try:
            conn = pg.DB(host='10.154.70.231',
                         port=8000,
                         dbname='gaussdb', # Database to be connected
                         user='dbadmin',
                         passwd='password')  # Database user password
        except pg.InternalError as ex:
            print(ex)
            print("Connect database failed")
        else:
            print("Opened database successfully")
            create_table(conn)
            insert_data(conn)
            select_data(conn)
            update_data(conn)
            delete_data(conn)
            conn.close()
    

    Alternatively, use the dbapi interface.

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    
    from __future__ import print_function
    
    import pg
    import pgdb
    
    
    def create_table(connection):
        print("Begin to create table")
        try:
            cursor = connection.cursor()
            cursor.execute("drop table if exists test;"
                           "create table test(id int, name text);")
            connection.commit()
        except pg.InternalError as e:
            print(e)
        else:
            print("Table created successfully")
            cursor.close()
    
    
    def insert_data(connection):
        print("Begin to insert data")
        try:
            cursor = connection.cursor()
            cursor.execute("insert into test values(1,'number1');")
            cursor.execute("insert into test values(2,'number2');")
            cursor.execute("insert into test values(3,'number3');")
            connection.commit()
        except pg.InternalError as e:
            print(e)
        else:
            print("Insert data successfully")
            cursor.close()
    
    
    def update_data(connection):
        print("Begin to update data")
        try:
            cursor = connection.cursor()
            cursor.execute("update test set name = 'numberupdated' where id=1;")
            connection.commit()
            print("Total number of rows updated :", cursor.rowcount)
            cursor.execute("select * from test;")
            rows = cursor.fetchall()
            for row in rows:
                print("id = ", row[0])
                print("name = ", row[1], "\n")
        except pg.InternalError as e:
            print(e)
        else:
            print("After Update, Operation done successfully")
    
    
    def delete_data(connection):
        print("Begin to delete data")
        try:
            cursor = connection.cursor()
            cursor.execute("delete from test where id=3;")
            connection.commit()
            print("Total number of rows deleted :", cursor.rowcount)
            cursor.execute("select * from test;")
            rows = cursor.fetchall()
            for row in rows:
                print("id = ", row[0])
                print("name = ", row[1], "\n")
        except pg.InternalError as e:
            print(e)
        else:
            print("After Delete,Operation done successfully")
    
    
    def select_data(connection):
        print("Begin to select data")
        try:
            cursor = connection.cursor()
            cursor.execute("select * from test;")
            rows = cursor.fetchall()
            for row in rows:
                print("id = ", row[0])
                print("name = ", row[1], "\n")
        except pg.InternalError as e:
            print(e)
            print("select failed")
        else:
            print("Operation done successfully")
            cursor.close()
    
    
    if __name__ == '__main__':
        try:
            conn = pgdb.connect(host='10.154.70.231',
                                          port='8000',
                                          database='gaussdb', # Database to be connected
                                          user='dbadmin',
                                          password='password') # Database user password
        except pg.InternalError as ex:
            print(ex)
            print("Connect database failed")
        else:
            print("Opened database successfully")
            create_table(conn)
            insert_data(conn)
            select_data(conn)
            update_data(conn)
            delete_data(conn)
            conn.close()
    
  3. Change the public network address, cluster port number, database name, database username, and database password in the python_dws.py file based on the actual cluster information.

    The PyGreSQL API does not provide the connection retry capability. You need to implement the retry processing in the service code.

    conn = pgdb.connect(host='10.154.70.231',
                                  port='8000',
                                  database='gaussdb', # Database to be connected
                                  user='dbadmin',
                                  password='password') # Database user password
    
  4. Run the following command to connect to the cluster using the third-party function library PyGreSQL:

    python python_dws.py