Overview

Table 1 lists the string functions supported by DLI.

**Table 1** String functions
Syntax	Value Type	Description
ascii(string <str>)	BIGINT	Returns the numeric value of the first character in a string.
concat(array<T> <a>, array<T> <b>[,...]), concat(string <str1>, string <str2>[,...])	ARRAY or STRING	Returns the array or string produced by concatenating the inputs in order. This function accepts any number of input arrays or strings.
concat_ws(string <separator>, string <str1>, string <str2>[,...]), concat_ws(string <separator>, array<string> <a>)	STRING	Returns a string concatenated from multiple input strings, or from the elements of an array, joined by the specified separator.
char_matchcount(string <str1>, string <str2>)	BIGINT	Returns the number of characters in str1 that appear in str2.
encode(string <str>, string <charset>)	BINARY	Returns str encoded in the charset format.
find_in_set(string <str1>, string <str2>)	BIGINT	Returns the position (starting from 1) of str1 in str2, where str2 is a list of strings separated by commas (,).
get_json_object(string <json>, string <path>)	STRING	Parses the JSON object in a specified JSON path. The function will return NULL if the JSON object is invalid.
instr(string <str>, string <substr>)	INT	Returns the index of the first occurrence of substr in str. Returns NULL if either argument is NULL and returns 0 if substr does not exist in str. Note that the first character in str has index 1.
instr1(string <str1>, string <str2>[, bigint <start_position>[, bigint <nth_appearance>]])	BIGINT	Returns the position of str2 in str1.
initcap(string A)	STRING	Converts the first letter of each word of a string to upper case and all other letters to lower case.
keyvalue(string <str>,[string <split1>,string <split2>,] string <key>)	STRING	Splits str by split1, converts each group into a key-value pair by split2, and returns the value corresponding to the key.
length(string <str>)	BIGINT	Returns the length of a string.
lengthb(string <str>)	BIGINT	Returns the length of a specified string in bytes.
levenshtein(string A, string B)	INT	Returns the Levenshtein distance between two strings, for example, levenshtein('kitten','sitting') = 3.
locate(string <substr>, string <str>[, bigint <start_pos>])	BIGINT	Returns the position of substr in str.
lower(string A), lcase(string A)	STRING	Converts all characters of a string to lowercase.
lpad(string <str1>, int <length>, string <str2>)	STRING	Returns a string of a specified length. If str1 is shorter than length, str1 is left-padded with str2 to the specified length.
ltrim([<trimChars>,] string <str>)	STRING	Trims spaces, or the characters in trimChars if specified, from the left end of a string.
parse_url(string urlString, string partToExtract [, string keyToExtract])	STRING	Returns the specified part of a given URL. Valid values of partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'. When the second parameter is set to QUERY, the third parameter can be used to extract the value of a specific parameter. For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'.
printf(String format, Obj... args)	STRING	Prints the input in a specific format.
regexp_count(string <source>, string <pattern>[, bigint <start_position>])	BIGINT	Returns the number of substrings in source that match a specified pattern, starting from start_position.
regexp_extract(string <source>, string <pattern>[, bigint <groupid>])	STRING	Matches source against pattern and returns the content of the capture group numbered groupid.
replace(string <str>, string <old>, string <new>)	STRING	Replaces the substring that matches a specified string in a string with another string.
For Spark 2.4.5: regexp_replace(string <source>, string <pattern>, string <replace_string>) For Spark 3.3.1: regexp_replace(string <source>, string <pattern>, string <replace_string>[, bigint <occurrence>])	STRING	For Spark 2.4.5: Replaces the substrings of source that match pattern with replace_string and returns the result string. For Spark 3.3.1: Replaces the occurrence-th substring of source that matches pattern, and every later match, with replace_string and returns the result string.
regexp_replace1(string <source>, string <pattern>, string <replace_string>[, bigint <occurrence>])	STRING	Replaces the occurrence-th substring of source that matches pattern with replace_string and returns the result string.
regexp_instr(string <source>, string <pattern>[,bigint <start_position>[, bigint <occurrence>[, bigint <return_option>]]])	BIGINT	Returns the start or end position of the occurrence-th substring that matches a specified pattern, searching from start_position in the source string.
regexp_substr(string <source>, string <pattern>[, bigint <start_position>[, bigint <occurrence>]])	STRING	Returns the occurrence-th substring that matches a specified pattern, searching from start_position in the source string.
repeat(string <str>, bigint <n>)	STRING	Returns str repeated n times.
reverse(string <str>)	STRING	Returns a string in reverse order.
rpad(string <str1>, int <length>, string <str2>)	STRING	Right-pads str1 with str2 to the specified length.
rtrim([<trimChars>, ]string <str>), rtrim(trailing [<trimChars>] from <str>)	STRING	Trims spaces, or the characters in trimChars if specified, from the right end of a string.
soundex(string <str>)	STRING	Returns the soundex string from str, for example, soundex('Miller') = M460.
space(bigint <n>)	STRING	Returns a specified number of spaces.
substr(string <str>, bigint <start_position>[, bigint <length>]), substring(string <str>, bigint <start_position>[, bigint <length>])	STRING	Returns the substring of str, starting from start_position and with a length of length.
substring_index(string <str>, string <separator>, int <count>)	STRING	Returns the part of str before the count-th occurrence of separator. If count is positive, occurrences are counted from the left and everything to the left of that separator is returned. If count is negative, occurrences are counted from the right and everything to the right of that separator is returned.
split_part(string <str>, string <separator>, bigint <start>[, bigint <end>])	STRING	Splits a specified string based on a specified separator and returns a substring from the start to end position.
translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)	STRING	Translates the input string by replacing each character that appears in from with the character at the same position in to. For example, translate("abcde", "bcd", "BCD") replaces b, c, and d in abcde with B, C, and D.
trim([<trimChars>,]string <str>), trim([BOTH] [<trimChars>] from <str>)	STRING	Trims spaces, or the characters in trimChars if specified, from both ends of a string.
upper(string A), ucase(string A)	STRING	Converts all characters of a string to uppercase.
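
The following queries sketch how a few of the functions in Table 1 are typically called. The literal inputs are illustrative. The expected results in the comments assume that DLI follows open-source Spark SQL behavior for these functions, which this table does not itself confirm, so verify them in your own environment before relying on them.

```sql
-- Illustrative calls only; expected results assume open-source Spark SQL semantics.

-- Concatenation
SELECT concat('ab', 'cd');                          -- abcd
SELECT concat_ws(',', 'a', 'b', 'c');               -- a,b,c

-- Position lookups (positions start from 1)
SELECT find_in_set('ab', 'abc,b,ab,c,def');         -- 3
SELECT instr('SparkSQL', 'SQL');                    -- 6
SELECT locate('bar', 'foobarbar', 5);               -- 7

-- Padding to a fixed length
SELECT lpad('hi', 5, '??');                         -- ???hi
SELECT rpad('hi', 5, '??');                         -- hi???

-- Positive count: counted from the left; negative count: counted from the right
SELECT substring_index('www.apache.org', '.', 2);   -- www.apache
SELECT substring_index('www.apache.org', '.', -1);  -- org

-- Character-by-character replacement
SELECT translate('abcde', 'bcd', 'BCD');            -- aBCDe

-- Examples taken from the descriptions in Table 1
SELECT levenshtein('kitten', 'sitting');            -- 3
SELECT parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1');  -- v1
```

Functions whose signatures differ between Spark 2.4.5 and Spark 3.3.1, such as regexp_replace, are left out of this sketch because their results depend on the engine version.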

last updated: 2025-02-13 16:25 UTC - commit: 17367e42c9c12af57bae86cbe2bad77e571dc82c