Configuration Examples¶
Text search configuration specifies the following components required for converting a document into a tsvector:
A parser, decomposes a text into tokens.
Dictionary list, converts each token into a lexeme.
Each time when the to_tsvector or to_tsquery function is invoked, a text search configuration is required to specify a processing procedure. The GUC parameter default_text_search_config specifies the default text search configuration, which will be used if the text search function does not explicitly specify a text search configuration.
GaussDB(DWS) provides some predefined text search configurations. You can also create user-defined text search configurations. In addition, to facilitate the management of text search objects, multiple gsql meta-commands are provided to display information about text search objects.
Procedure¶
Create a text search configuration ts_conf by copying the predefined text search configuration english.
CREATE TEXT SEARCH CONFIGURATION ts_conf ( COPY = pg_catalog.english ); CREATE TEXT SEARCH CONFIGURATION
Create a Synonym dictionary.
Assume that the definition file pg_dict.syn of the Synonym dictionary contains the following contents:
postgres pg pgsql pg postgresql pg
Run the following statement to create the Synonym dictionary:
CREATE TEXT SEARCH DICTIONARY pg_dict ( TEMPLATE = synonym, SYNONYMS = pg_dict, FILEPATH = 'obs://bucket01/obs.example.com accesskey=xxxxx secretkey=xxxxx region=xx-xx-xx' );
Create an Ispell dictionary english_ispell (the dictionary definition file is from the open source dictionary).
CREATE TEXT SEARCH DICTIONARY english_ispell ( TEMPLATE = ispell, DictFile = english, AffFile = english, StopWords = english, FILEPATH = 'obs://bucket01/obs.example.com accesskey=xxxxx secretkey=xxxxx region=xx-xx-xx' );
Modify the text search configuration ts_conf and change the dictionary list for tokens of certain types. For details about token types, see Parsers.
ALTER TEXT SEARCH CONFIGURATION ts_conf ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part WITH pg_dict, english_ispell, english_stem;
In the text search configuration, set non-index or set the search for tokens of certain types.
ALTER TEXT SEARCH CONFIGURATION ts_conf DROP MAPPING FOR email, url, url_path, sfloat, float;
Use the text retrieval commissioning function ts_debug() to test the text search configuration ts_conf.
SELECT * FROM ts_debug('ts_conf', ' PostgreSQL, the highly scalable, SQL compliant, open source object-relational database management system, is now undergoing beta testing of the next version of our software. ');
You can set the default text search configuration of the current session to ts_conf. This setting is valid only for the current session.
\dF+ ts_conf Text search configuration "public.ts_conf" Parser: "pg_catalog.default" Token | Dictionaries -----------------+------------------------------------- asciihword | pg_dict,english_ispell,english_stem asciiword | pg_dict,english_ispell,english_stem file | simple host | simple hword | pg_dict,english_ispell,english_stem hword_asciipart | pg_dict,english_ispell,english_stem hword_numpart | simple hword_part | pg_dict,english_ispell,english_stem int | simple numhword | simple numword | simple uint | simple version | simple word | pg_dict,english_ispell,english_stem SET default_text_search_config = 'public.ts_conf'; SET SHOW default_text_search_config; default_text_search_config ---------------------------- public.ts_conf (1 row)