String Functions
Function | Return Type | Description |
---|---|---|
string1 || string2 | STRING | Returns the concatenation of string1 and string2. |
CHAR_LENGTH(string) CHARACTER_LENGTH(string) | INT | Returns the number of characters in the string. |
UPPER(string) | STRING | Returns the string in uppercase. |
LOWER(string) | STRING | Returns the string in lowercase. |
POSITION(string1 IN string2) | INT | Returns the position (starting from 1) of the first occurrence of string1 in string2. Returns 0 if string1 cannot be found in string2. |
TRIM([ BOTH \| LEADING \| TRAILING ] string1 FROM string2) | STRING | Returns string2 with the leading and/or trailing characters string1 removed. |
LTRIM(string) | STRING | Returns the specified string with leading whitespace removed. For example, LTRIM(' This is a test String.') returns "This is a test String.". |
RTRIM(string) | STRING | Returns the specified string with trailing whitespace removed. For example, RTRIM('This is a test String. ') returns "This is a test String.". |
REPEAT(string, integer) | STRING | Returns a string that repeats the base string integer times. For example, REPEAT('This is a test String.', 2) returns "This is a test String.This is a test String.". |
REGEXP_REPLACE(string1, string2, string3) | STRING | Returns string1 with every substring that matches the regular expression string2 replaced with string3. For example, REGEXP_REPLACE('foobar', 'oo|ar', '') returns "fb". REGEXP_REPLACE('ab\ab', '\\', 'e') returns "abeab". |
OVERLAY(string1 PLACING string2 FROM integer1 [ FOR integer2 ]) | STRING | Returns a string that replaces integer2 characters of string1 with string2, starting from position integer1. The default value of integer2 is the length of string2. For example, OVERLAY('This is an old string' PLACING ' new' FROM 10 FOR 5) returns "This is a new string". |
SUBSTRING(string FROM integer1 [ FOR integer2 ]) | STRING | Returns a substring of the specified string starting from position integer1 with length integer2. If integer2 is not specified, the substring from integer1 to the end of the string is returned. |
REPLACE(string1, string2, string3) | STRING | Returns a new string in which all non-overlapping occurrences of string2 in string1 are replaced with string3. For example, REPLACE('hello world', 'world', 'flink') returns "hello flink". REPLACE('ababab', 'abab', 'z') returns "zab". REPLACE('ab\\ab', '\\', 'e') returns "abeab". |
REGEXP_EXTRACT(string1, string2[, integer]) | STRING | Returns the string extracted from string1 using the regular expression string2 and the regex match group index integer. Returns NULL if any argument is NULL or the regular expression is invalid. For example, REGEXP_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns "bar". |
INITCAP(string) | STRING | Returns a new form of string with the first character of each word converted to uppercase and the remaining characters converted to lowercase. |
CONCAT(string1, string2,...) | STRING | Returns a string that concatenates string1, string2, …. For example, CONCAT('AA', 'BB', 'CC') returns "AABBCC". |
CONCAT_WS(string1, string2, string3,...) | STRING | Returns a string that concatenates string2, string3, … with a separator string1. The separator is added between the strings to be concatenated. Returns NULL if string1 is NULL. If other arguments are NULL, this function automatically skips NULL arguments. For example, CONCAT_WS('~', 'AA', NULL, 'BB', '', 'CC') returns "AA~BB~~CC". |
LPAD(string1, integer, string2) | STRING | Returns a new string from string1 left-padded with string2 to a length of integer characters. If any argument is NULL, NULL is returned. If integer is negative, NULL is returned. If string1 is longer than integer characters, returns string1 shortened to integer characters. For example, LPAD('hi',4,'??') returns "??hi". LPAD('hi',1,'??') returns "h". |
RPAD(string1, integer, string2) | STRING | Returns a new string from string1 right-padded with string2 to a length of integer characters. If any argument is NULL, NULL is returned. If integer is negative, NULL is returned. If string1 is longer than integer characters, returns string1 shortened to integer characters. For example, RPAD('hi',4,'??') returns "hi??". RPAD('hi',1,'??') returns "h". |
FROM_BASE64(string) | STRING | Returns the base64-decoded result from string. Returns NULL if string is NULL. For example, FROM_BASE64('aGVsbG8gd29ybGQ=') returns "hello world". |
TO_BASE64(string) | STRING | Returns the base64-encoded result from string. Returns NULL if string is NULL. For example, TO_BASE64('hello world') returns "aGVsbG8gd29ybGQ=". |
ASCII(string) | INT | Returns the numeric value of the first character of string. Returns NULL if string is NULL. For example, ascii('abc') returns 97. ascii(CAST(NULL AS VARCHAR)) returns NULL. |
CHR(integer) | STRING | Returns the ASCII character whose code is integer. If integer is larger than 255, the modulus of integer divided by 256 is computed first, and the CHR of that modulus is returned. Returns NULL if integer is NULL. For example, chr(97) returns "a". chr(353) returns "a". |
DECODE(binary, string) | STRING | Decodes the first argument into a String using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is NULL, the result will also be NULL. |
ENCODE(string1, string2) | STRING | Encodes string1 into a BINARY using the character set string2 (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is NULL, the result will also be NULL. |
INSTR(string1, string2) | INT | Returns the position of the first occurrence of string2 in string1. Returns NULL if any argument is NULL. |
LEFT(string, integer) | STRING | Returns the leftmost integer characters from the string. Returns an empty string if integer is negative. Returns NULL if any argument is NULL. |
RIGHT(string, integer) | STRING | Returns the rightmost integer characters from the string. Returns an empty string if integer is negative. Returns NULL if any argument is NULL. |
LOCATE(string1, string2[, integer]) | INT | Returns the position of the first occurrence of string1 in string2 after position integer. Returns 0 if not found. The value of integer defaults to 0. Returns NULL if any argument is NULL. |
PARSE_URL(string1, string2[, string3]) | STRING | Returns the specified part of the URL string1. Valid values for string2 include 'HOST', 'PATH', 'QUERY', 'REF', 'PROTOCOL', 'AUTHORITY', 'FILE', and 'USERINFO'. Returns NULL if any argument is NULL. If string2 is 'QUERY', a key in the query string can be specified as string3. For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'. parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'. |
REGEXP(string1, string2) | BOOLEAN | Performs a regular expression search on the specified string and returns a BOOLEAN value indicating whether the specified match pattern is found. If it is found, TRUE is returned. string1 indicates the specified string, and string2 indicates the regular expression. Returns NULL if any argument is NULL. |
REVERSE(string) | STRING | Returns the reversed string. Returns NULL if any argument is NULL. Note: The function name must be enclosed in backquotes, for example, `REVERSE`. |
SPLIT_INDEX(string1, string2, integer1) | STRING | Splits string1 by the delimiter string2 and returns the string at position integer1 (zero-based) of the split strings. Returns NULL if integer1 is negative. Returns NULL if any argument is NULL. |
STR_TO_MAP(string1[, string2, string3]) | MAP | Returns a map after splitting string1 into key/value pairs using delimiters. string2 is the delimiter between pairs and defaults to ','. string3 is the delimiter between a key and its value and defaults to '='. |
SUBSTR(string[, integer1[, integer2]]) | STRING | Returns a substring of string starting from position integer1 with length integer2. If integer2 is not specified, the string is truncated to the end. |
JSON_VAL(STRING json_string, STRING json_path) | STRING | Returns the value of the specified json_path from the json_string. For details about how to use this function, see JSON_VAL Function. Note: The following rules are listed in descending order of priority. |
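
The NULL and truncation rules for LPAD, RPAD, CONCAT_WS, and SPLIT_INDEX are easy to misread in prose. The following Python sketch emulates the behaviour described in the table and its examples. It is not the engine's implementation: the helper names `lpad`, `rpad`, `concat_ws`, and `split_index` are hypothetical, Python `None` stands in for SQL NULL, and returning NULL for an out-of-range SPLIT_INDEX position is an assumption that the table does not state.

```python
from typing import Optional


def lpad(s: Optional[str], length: Optional[int], pad: Optional[str]) -> Optional[str]:
    """Emulates LPAD: NULL argument or negative length gives NULL."""
    if s is None or length is None or pad is None or length < 0:
        return None
    if len(s) >= length:
        return s[:length]  # input longer than the target is shortened
    missing = length - len(s)
    return (pad * missing)[:missing] + s  # pad on the left


def rpad(s: Optional[str], length: Optional[int], pad: Optional[str]) -> Optional[str]:
    """Emulates RPAD: same rules as LPAD, padding on the right."""
    if s is None or length is None or pad is None or length < 0:
        return None
    if len(s) >= length:
        return s[:length]
    missing = length - len(s)
    return s + (pad * missing)[:missing]


def concat_ws(sep: Optional[str], *parts: Optional[str]) -> Optional[str]:
    """Emulates CONCAT_WS: NULL separator gives NULL, NULL parts are skipped."""
    if sep is None:
        return None
    return sep.join(p for p in parts if p is not None)


def split_index(s: Optional[str], delim: Optional[str], index: Optional[int]) -> Optional[str]:
    """Emulates SPLIT_INDEX with a zero-based index and a non-empty delimiter."""
    if s is None or delim is None or index is None or index < 0:
        return None
    pieces = s.split(delim)
    # Assumption: an index past the last piece yields NULL.
    return pieces[index] if index < len(pieces) else None


print(lpad("hi", 4, "??"))                              # ??hi
print(rpad("hi", 1, "??"))                              # h
print(concat_ws("~", "AA", None, "BB", "", "CC"))       # AA~BB~~CC
print(split_index("a,b,c", ",", 1))                     # b
```

Note that CONCAT_WS skips NULL arguments but keeps empty strings, which is why the example output contains two consecutive separators.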
JSON_VAL Function
Syntax
STRING JSON_VAL(STRING json_string, STRING json_path)
Parameter | Data Types | Description |
---|---|---|
json_string | STRING | JSON object to be parsed |
json_path | STRING | Path expression for parsing the JSON string. For the supported expressions, see Table 3. |
Expression | Description |
---|---|
$ | Root node in the path |
[] | Access array elements |
* | Array wildcard |
. | Access child elements |
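
To make the four path expressions concrete, here is a minimal Python sketch of a path evaluator. It is an illustration under assumptions, not the JSON_VAL implementation: the function name `json_val` and its regular expression are hypothetical, it uses Python's strict `json` parser (so keys and string values must be quoted), and it only handles `[*]` by keeping the whole array. It returns `None` for an empty path, a missing key, or malformed input, mirroring the null results in the example on this page.

```python
import json
import re
from typing import Optional

# One path step: ".key", "[index]", or "[*]".
_STEP = re.compile(r"\.([^.\[\]]+)|\[(\*|\d+)\]")


def json_val(json_string: Optional[str], json_path: Optional[str]) -> Optional[str]:
    """Hypothetical emulation of a JSON path lookup that returns a string or None."""
    if json_string is None or not json_path or not json_path.startswith("$"):
        return None
    try:
        node = json.loads(json_string)
    except ValueError:  # malformed JSON
        return None

    pos = 1  # skip the root marker "$"
    while pos < len(json_path):
        step = _STEP.match(json_path, pos)
        if step is None:
            return None  # syntax outside the four supported expressions
        key, index = step.group(1), step.group(2)
        if key is not None:  # "." child access
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        elif index == "*":  # array wildcard: keep every element
            if not isinstance(node, list):
                return None
        else:  # "[]" array element access
            if not isinstance(node, list) or int(index) >= len(node):
                return None
            node = node[int(index)]
        pos = step.end()

    # Strings are returned as-is; other values are serialized compactly.
    return node if isinstance(node, str) else json.dumps(node, separators=(",", ":"))


doc = '{"name":"James","grade":{"math":95,"science":[80,85]}}'
print(json_val(doc, "$.name"))                # James
print(json_val(doc, "$.grade.science[*]"))    # [80,85]
print(json_val(doc, "$.grade.science[1]"))    # 85
print(json_val(doc, "$.grade.dddd"))          # None
```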
Example
Test input data.
The test data source is Kafka. The message content is as follows:
```
{name:James,age:24,sex:male,grade:{math:95,science:[80,85],english:100}}
{name:James,age:24,sex:male,grade:{math:95,science:[80,85],english:100}]
```
Use JSON_VAL in SQL statements.
```sql
CREATE TABLE kafkaSource (
  `message` string
) WITH (
  'connector' = 'kafka',
  'topic' = '<yourSourceTopic>',
  'properties.bootstrap.servers' = '<yourKafkaAddress1>:<yourKafkaPort>,<yourKafkaAddress2>:<yourKafkaPort>',
  'properties.group.id' = '<yourGroupId>',
  'scan.startup.mode' = 'latest-offset',
  "format" = "csv",
  "csv.field-delimiter" = "\u0001",
  "csv.quote-character" = "''"
);

CREATE TABLE kafkaSink(
  message1 STRING,
  message2 STRING,
  message3 STRING,
  message4 STRING,
  message5 STRING,
  message6 STRING
) WITH (
  'connector' = 'kafka',
  'topic' = '<yourSinkTopic>',
  'properties.bootstrap.servers' = '<yourKafkaAddress1>:<yourKafkaPort>,<yourKafkaAddress2>:<yourKafkaPort>',
  "format" = "json"
);

insert into kafkaSink
select
  JSON_VAL(message,""),
  JSON_VAL(message,"$.name"),
  JSON_VAL(message,"$.grade.science"),
  JSON_VAL(message,"$.grade.science[*]"),
  JSON_VAL(message,"$.grade.science[1]"),
  JSON_VAL(message,"$.grade.dddd")
from kafkaSource;
```
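Check the output in the sink Kafka topic. The second input message is malformed (it ends with ] instead of }), so every field in its output record is null.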
```
{"message1":null,"message2":"James","message3":"[80,85]","message4":"[80,85]","message5":"85","message6":null}
{"message1":null,"message2":null,"message3":null,"message4":null,"message5":null,"message6":null}
```