Overview

Table 1 lists the string functions supported by DLI.

Table 1 String functions

Syntax

Value Type

Description

ascii(string <str>)

BIGINT

Returns the numeric value of the first character in a string.

concat(array<T> <a>, array<T> <b>[,...]), concat(string <str1>, string <str2>[,...])

ARRAY or STRING

Returns the array or string produced by concatenating all input arrays or strings. This function can take any number of inputs.

concat_ws(string <separator>, string <str1>, string <str2>[,...]), concat_ws(string <separator>, array<string> <a>)

STRING

Returns a string concatenated from multiple input strings, or from the elements of a string array, joined by the specified separator.

char_matchcount(string <str1>, string <str2>)

BIGINT

Returns the number of characters in str1 that appear in str2.

encode(string <str>, string <charset>)

BINARY

Returns str encoded in the charset format.

find_in_set(string <str1>, string <str2>)

BIGINT

Returns the position (starting from 1) of str1 in str2, where str2 is a list of strings separated by commas (,).

get_json_object(string <json>, string <path>)

STRING

Parses the JSON string and returns the object at the specified JSON path. Returns NULL if the JSON string is invalid.

instr(string <str>, string <substr>)

INT

Returns the index of the first occurrence of substr in str. Returns NULL if either argument is NULL, and returns 0 if substr does not exist in str. The first character in str has index 1.

instr1(string <str1>, string <str2>[, bigint <start_position>[, bigint <nth_appearance>]])

BIGINT

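Returns the position of str2 in str1. If specified, the search starts from start_position and returns the position of the nth_appearance-th match.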

initcap(string A)

STRING

Converts the first letter of each word of a string to upper case and all other letters to lower case.

keyvalue(string <str>,[string <split1>,string <split2>,] string <key>)

STRING

Splits str by split1, converts each group into a key-value pair by split2, and returns the value corresponding to the key.

length(string <str>)

BIGINT

Returns the length of a string.

lengthb(string <str>)

BIGINT

Returns the length of a specified string in bytes.

levenshtein(string A, string B)

INT

Returns the Levenshtein distance between two strings, for example, levenshtein('kitten','sitting') = 3.

locate(string <substr>, string <str>[, bigint <start_pos>])

BIGINT

Returns the position of the first occurrence of substr in str. If start_pos is specified, the search starts from that position.

lower(string A), lcase(string A)

STRING

Converts all characters of a string to lowercase.

lpad(string <str1>, int <length>, string <str2>)

STRING

Returns a string of the specified length. If str1 is shorter than length, it is left-padded with str2 until it reaches that length.

ltrim([<trimChars>,] string <str>)

STRING

Trims spaces from the left end of a string. If trimChars is specified, those characters are trimmed instead of spaces.

parse_url(string urlString, string partToExtract [, string keyToExtract])

STRING

Returns the specified part of a given URL. Valid values of partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.

For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'.

When the second parameter is set to QUERY, the third parameter can be used to extract the value of a specific parameter. For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'.

printf(String format, Obj... args)

STRING

Returns the input arguments formatted according to the specified printf-style format string.

regexp_count(string <source>, string <pattern>[, bigint <start_position>])

BIGINT

Returns the number of substrings in source that match the specified pattern, counting from start_position.

regexp_extract(string <source>, string <pattern>[, bigint <groupid>])

STRING

Matches the string source based on the pattern grouping rule and returns the string content that matches groupid.

replace(string <str>, string <old>, string <new>)

STRING

Replaces the substring that matches a specified string in a string with another string.

  • For Spark 2.4.5: regexp_replace(string <source>, string <pattern>, string <replace_string>)

  • For Spark 3.3.1: regexp_replace(string <source>, string <pattern>, string <replace_string>[, bigint <occurrence>])

STRING

  • For Spark 2.4.5: Replaces the substrings in the source string that match the pattern with the specified string replace_string and returns the result string.

  • For Spark 3.3.1: Replaces the occurrence-th substring in the source string that matches the pattern, and every later match, with the specified string replace_string and returns the result string.

regexp_replace1(string <source>, string <pattern>, string <replace_string>[, bigint <occurrence>])

STRING

Replaces the occurrence-th substring in the source string that matches pattern with the specified string replace_string and returns the result string.

regexp_instr(string <source>, string <pattern>[,bigint <start_position>[, bigint <occurrence>[, bigint <return_option>]]])

BIGINT

Returns the start or end position of the occurrence-th substring that matches the specified pattern, searching from start_position in the source string.

regexp_substr(string <source>, string <pattern>[, bigint <start_position>[, bigint <occurrence>]])

STRING

Returns the occurrence-th substring that matches the specified pattern, searching from start_position in the source string.

repeat(string <str>, bigint <n>)

STRING

Returns str repeated n times.

reverse(string <str>)

STRING

Returns a string in reverse order.

rpad(string <str1>, int <length>, string <str2>)

STRING

Right-pads str1 with str2 to the specified length.

rtrim([<trimChars>, ]string <str>),

rtrim(trailing [<trimChars>] from <str>)

STRING

Trims spaces from the right end of a string. If trimChars is specified, those characters are trimmed instead of spaces.

soundex(string <str>)

STRING

Returns the soundex string from str, for example, soundex('Miller') = M460.

space(bigint <n>)

STRING

Returns a specified number of spaces.

substr(string <str>, bigint <start_position>[, bigint <length>]), substring(string <str>, bigint <start_position>[, bigint <length>])

STRING

Returns the substring of str that starts at start_position. If length is specified, the substring contains at most length characters.

substring_index(string <str>, string <separator>, int <count>)

STRING

Truncates str at the count-th occurrence of separator. If count is positive, separators are counted from the left and the text to the left of that separator is returned. If count is negative, separators are counted from the right and the text to the right of that separator is returned.

split_part(string <str>, string <separator>, bigint <start>[, bigint <end>])

STRING

Splits a specified string based on a specified separator and returns a substring from the start to end position.

translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)

STRING

Translates the input string by replacing the characters or string specified by from with the characters or string specified by to. For example, translate("abcde", "bcd", "BCD") replaces bcd in abcde with BCD.

trim([<trimChars>,]string <str>),

trim([BOTH] [<trimChars>] from <str>)

STRING

Trims spaces from both ends of a string. If trimChars is specified, those characters are trimmed instead of spaces.

upper(string A), ucase(string A)

STRING

Converts all characters of a string to uppercase.
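
The following queries are a minimal sketch of how several functions in Table 1 might be called. The input literals are made up for illustration, and the results in the comments assume that DLI follows standard Apache Spark SQL behavior for these functions; verify them against your own queue and Spark version.

```sql
-- Illustrative queries only. Expected results in the comments assume
-- standard Apache Spark SQL semantics, not DLI-specific behavior.

-- Position lookups are 1-based; 0 means the substring was not found.
SELECT instr('SparkSQL', 'SQL');                           -- 6
SELECT instr('SparkSQL', 'x');                             -- 0
SELECT find_in_set('ab', 'abc,b,ab,c,def');                -- 3

-- Padding a string to a fixed length.
SELECT lpad('hi', 5, '??');                                -- ???hi
SELECT rpad('hi', 5, '??');                                -- hi???

-- Concatenation with a separator.
SELECT concat_ws(',', 'a', 'b', 'c');                      -- a,b,c

-- Truncating at the count-th separator, counted from the left or the right.
SELECT substring_index('www.apache.org', '.', 2);          -- www.apache
SELECT substring_index('www.apache.org', '.', -2);         -- apache.org

-- Extracting the first capture group of a regular expression.
SELECT regexp_extract('100-200', '([0-9]+)-([0-9]+)', 1);  -- 100

-- Reading a value from a JSON string by path.
SELECT get_json_object('{"a":"b"}', '$.a');                -- b

-- Character replacement with translate.
SELECT translate('abcde', 'bcd', 'BCD');                   -- aBCDe
```

Each statement can be run on its own in a Spark SQL session. The example omits functions such as keyvalue, split_part, and char_matchcount, whose exact results are not stated in this section.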