本文共 15472 字,大约阅读时间需要 51 分钟。
为了方便开发人员使用函数,Hive提供了大量的内置函数,包括数学函数,集合函数,类型转换函数,条件函数,字符函数,聚合函数,表生成函数,日期函数
返回值类型 | 方法名 | 描述 |
---|---|---|
DOUBLE | round(DOUBLE a) | 返回对a四舍五入的最大整数值,类型为double |
DOUBLE | round(DOUBLE a, INT n) | 对a取小数位,返回保留n位小数, |
BIGINT | floor(DOUBLE a) | 向下取整 |
BIGINT | ceil(DOUBLE a)/ceiling(double a) | 向上取整 |
DOUBLE | rand(), rand(INT seed) | 返回一个随机数 seed是种子,种子一样随机数一样 |
DOUBLE | exp(DOUBLE a), exp(DECIMAL a) | 返回e的a幂次方, a可为小数 |
DOUBLE | pow(double a,double b) | a的b次方 |
DOUBLE | sqrt(double a) | 开根号,若a为负数返回null |
DOUBLE | abs(double a) | 取绝对值 |
int | pmod(int a,int b) | a对b取余 |
演示如下:
hive> select round(10.234);OK10.0Time taken: 0.07 seconds, Fetched: 1 row(s)hive> select round(10.234,2);OK10.23Time taken: 0.043 seconds, Fetched: 1 row(s)hive> select floor(10.3);OK10Time taken: 0.088 seconds, Fetched: 1 row(s)hive> select floor(-10.3);OK-11Time taken: 0.042 seconds, Fetched: 1 row(s)hive> select ceil(10.3);OK11Time taken: 0.048 seconds, Fetched: 1 row(s)hive> select ceiling(10.3);OK11Time taken: 0.05 seconds, Fetched: 1 row(s)hive> select rand();OK0.4474707641445139Time taken: 0.066 seconds, Fetched: 1 row(s)hive> select rand(10);OK0.7304302967434272Time taken: 0.045 seconds, Fetched: 1 row(s)hive> select exp(3);OK20.085536923187668Time taken: 0.054 seconds, Fetched: 1 row(s)hive> select pow(3,2);OK9.0Time taken: 0.072 seconds, Fetched: 1 row(s)hive> select sqrt(10);OK3.1622776601683795Time taken: 0.043 seconds, Fetched: 1 row(s)hive> select abs(-10.1);OK10.1Time taken: 0.039 seconds, Fetched: 1 row(s)hive> select pmod(16,3);OK1Time taken: 0.036 seconds, Fetched: 1 row(s)
返回值类型 | 方法名 | 描述 |
---|---|---|
int | size(Map<K.V> | 求map的长度 |
int | size(Array) | 求数组的长度 |
array | map_keys(Map<K.V>) | 返回map中的所有key |
array | map_values(Map<K.V>) | 返回map中的所有value |
boolean | array_contains(Array, value) | 若该数组Array包含value返回true。,否则返回false |
array | sort_array(Array) | 按自然顺序对数组进行排序并返回,只能升序 |
如下演示:
hive> select size(map(1,'a',2,'b'));OK2Time taken: 0.052 seconds, Fetched: 1 row(s)hive> select size(array(1,2,3,4,5,6));OK6Time taken: 0.057 seconds, Fetched: 1 row(s)hive> select map_keys(map(1,'a',2,'b'));OK[1,2]Time taken: 0.059 seconds, Fetched: 1 row(s)hive> select map_values(map(1,'a',2,'b'));OK["a","b"]Time taken: 0.058 seconds, Fetched: 1 row(s)hive> select array_contains(array(1,2,3,4,5,6),6);OKtrueTime taken: 0.032 seconds, Fetched: 1 row(s)hive> select sort_array(array(1,2,3,6,5,7,8,0));OK[0,1,2,3,5,6,7,8]Time taken: 0.041 seconds, Fetched: 1 row(s)hive>
返回值类型 | 方法名 | 描述 |
---|---|---|
binary | binary(string|binary) | 将输入的值转换成二进制,一般不用 |
NULL或type | cast(expr as<type>) | 将expr转换成type类型 如:cast(“1” as NT) 将字符串1转换成了INT类型,如果转换失败将返回NULL |
如下演示:
hive> select cast('1' as int)*cast('10' as int);OK10Time taken: 0.043 seconds, Fetched: 1 row(s)hive> select binary('123456');OK123456Time taken: 0.121 seconds, Fetched: 1 row(s)
返回值类型 | 方法名 | 描述 |
---|---|---|
string | from_unixtime(bigint unixtime[, string format]) | 将时间的秒值转换成format格式 |
bigint | unix_timestamp() | 获取本地时区下的时间戳 |
bigint | unix_timestamp(string date) | 将格式为yyyy-MM-dd HH:mm:ss的时间字符串转换成时间戳 |
bigint | unix_timestamp(string date, string pattern) | 将指定时间字符串格式字符串转换成Unix时间戳,如果格式不对返回0 |
string | to_date(string timestamp) | 返回时间字符串的日期部分 |
int | year(string date) | 返回时间字符串的年份部分 |
int | month(string date) | 返回时间字符串的月份部分 |
int | day(string date) | 返回时间字符串的天(月份) |
int | hour(string date) | 返回时间字符串的小时 |
int | minute(string date) | 返回时间字符串的分钟 |
int | second(string date) | 返回时间字符串的秒 |
int | weekofyear(string date) | 返回时间字符串位于一年中的第几个周内 |
int | dayofweek(string date) | 返回时间字符串位于一周中的第几天,星期天开始位第一天 |
int | datediff(string enddate, string startdate) | 计算开始时间startdate到结束时间enddate相差的天数 |
string | date_add(string startdate, int days) | 从开始时间startdate加上days天数 |
string | date_sub(string startdate, int days) | 从开始时间startdate减去days天数 |
date | current_date | 返回当前时间日期 |
timestamp | current_timestamp | 返回当前时间戳,返回的是字符串格式 |
string | add_months(string start_date, int num_months) | 返回当前时间下再增加num_months个月的日期 |
string | last_day(string date) | 返回这个月的最后一天的日期,忽略时分秒部分(HH:mm:ss) |
string | next_day(string start_date, string day_of_week) | 返回当前时间的下一个星期X所对应的日期 |
string | trunc(string date, string format) | 返回时间的最开始年份或月份 ,注意所支持的格式为MONTH/MON/MM, YEAR/YYYY/YY |
double | months_between(date1, date2) | 返回date1与date2之间相差的月份,如date1>date2,则返回正,如果date1<date2,则返回负,否则返回0.0 |
string | date_format(date/timestamp/string ts, string fmt) | fmt格式 |
演示如下:
hive> select from_unixtime(1594604100);OK2020-07-13 09:35:00Time taken: 0.045 seconds, Fetched: 1 row(s)hive> select from_unixtime(1594604100,'yyyy-MM-dd');OK2020-07-13Time taken: 0.051 seconds, Fetched: 1 row(s)hive> select unix_timestamp();unix_timestamp(void) is deprecated. Use current_timestamp instead.OK1594616493Time taken: 0.058 seconds, Fetched: 1 row(s)hive> select unix_timestamp('2020-07-13 09:35:00');OK1594604100Time taken: 0.059 seconds, Fetched: 1 row(s)hive> select to_date('2020-07-13 09:35:00');OK2020-07-13Time taken: 0.096 seconds, Fetched: 1 row(s)hive> select year('2020-07-13 09:35:00');OK2020Time taken: 0.051 seconds, Fetched: 1 row(s)hive> select month('2020-07-13 09:35:00');OK7Time taken: 0.051 seconds, Fetched: 1 row(s)OK13Time taken: 0.196 seconds, Fetched: 1 row(s)hive> select hour('2020-07-13 09:35:00');OK9Time taken: 0.09 seconds, Fetched: 1 row(s)hive> select minute('2020-07-13 09:35:00');OK35Time taken: 0.058 seconds, Fetched: 1 row(s)hive> select second('2020-07-13 09:35:00');OK0hive> select weekofyear('2020-07-13 09:35:00');OK29Time taken: 0.047 seconds, Fetched: 1 row(s)select case dayofweek(current_date())-1 when 1 then '星期一' when 2 then '星期二' when 3 then '星期三' when 4 then '星期四' when 5 then '星期五' when 6 then '星期六' else '星期天' end星期一hive> select datediff('2020-07-13 00:00:00','2020-07-11 09:35:00');OK2Time taken: 0.11 seconds, Fetched: 1 row(s)hive> select date_add('2020-07-13 09:35:00',30);OK2020-08-12Time taken: 0.057 seconds, Fetched: 1 row(s)hive> select date_add('2020-07-13 09:35:00',-30);OK2020-06-13Time taken: 0.057 seconds, Fetched: 1 row(s)hive> select date_sub('2020-07-13 09:35:00',30);OK2020-06-13Time taken: 0.045 seconds, Fetched: 1 row(s)hive> select current_date();OK2020-07-13Time taken: 0.039 seconds, Fetched: 1 row(s)hive> select current_timestamp();OK2020-07-13 13:47:37.025Time taken: 0.038 seconds, Fetched: 1 row(s)hive> select add_months('2020-07-13 09:35:00', 4);OK2020-11-13Time taken: 0.074 seconds, Fetched: 1 row(s)hive> select last_day(current_date());OK2020-07-31Time taken: 0.091 seconds, Fetched: 1 row(s)hive> select next_day(current_date(),'TU');OK2020-07-14Time taken: 0.052 seconds, Fetched: 1 row(s)hive> select trunc(current_date(),'MM');OK2020-07-01Time taken: 0.071 seconds, Fetched: 1 row(s)hive> select months_between(current_date(),'2020-05-18');OK1.83870968Time taken: 0.055 seconds, Fetched: 1 row(s)hive> select date_format('2020-07-13','yyyy-MM-dd HH:mm:ss');OK2020-07-13 00:00:00Time taken: 0.091 seconds, Fetched: 1 row(s)hive> select date_format('2020-07-13','yyyy-dd');OK2020-13Time taken: 0.048 seconds, Fetched: 1 row(s)
返回值类型 | 方法名 | 描述 |
---|---|---|
T | if(boolean testCondition, T valueTrue, T valueFalseOrNull) | 如果testCondition 为true就返回valueTrue,否则返回valueFalseOrNull |
T | nvl(T value, T default_value) | 如果value值为NULL就返回default_value,否则返回value |
T | COALESCE(T v1, T v2, …) | 返回第一非null的值,如果全部都为NULL就返回NULL |
T | CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END | 如果a=b就返回c,a=d就返回e,否则返回f |
T | CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END | 如果a=ture就返回b,c= ture就返回d,否则返回e |
boolean | isnull( a ) | 如果a为null就返回true,否则返回false |
boolean | isnotnull ( a ) | 如果a为非null就返回true,否则返回false |
演示如下:
hive> select if(3>4,1,2);OK2Time taken: 0.097 seconds, Fetched: 1 row(s)hive> select if(3>2,'hello',2);OKhelloTime taken: 0.041 seconds, Fetched: 1 row(s)hive> select nvl(null,'a');OKaTime taken: 0.04 seconds, Fetched: 1 row(s)hive> select nvl('a','b');OKaTime taken: 0.048 seconds, Fetched: 1 row(s)hive> select coalesce(null,'hello');OKhelloTime taken: 0.048 seconds, Fetched: 1 row(s)hive> select coalesce(null,null,null,'hello');OKhelloTime taken: 0.032 seconds, Fetched: 1 row(s)hive> select case when 3>4 then 'hello' when 3<4 then 'world' end;OKworldTime taken: 0.088 seconds, Fetched: 1 row(s)hive> select case 3 when 3 then 'hello' when 4 then 'world' end;OKhelloTime taken: 0.028 seconds, Fetched: 1 row(s)hive> select isnull(null);OKtrueTime taken: 0.16 seconds, Fetched: 1 row(s)hive> select isnull('a');OKfalseTime taken: 0.047 seconds, Fetched: 1 row(s)hive> select isnotnull(null);OKfalseTime taken: 0.034 seconds, Fetched: 1 row(s)hive> select isnotnull('a');OKtrueTime taken: 0.099 seconds, Fetched: 1 row(s)
返回值类型 | 方法名 | 描述 |
---|---|---|
BIGINT | count(*), count(expr), count(DISTINCT expr[, expr…]) | count(*)统计总行数包括含有NULL值的行/count(列)统计列中非null行数,可去重 |
DOUBLE | sum(col), sum(DISTINCT col) | 求指定列的和,列可去重 |
DOUBLE | avg(col), avg(DISTINCT col) | 表示求指定列的平均值,列可去重 |
DOUBLE | min(col) | 求指定列的最小值 |
DOUBLE | max(col) | 求指定列的最大值 |
array | collect_list() | 返回每个组列中的对象集,不去重 |
array | collect_set() | 返回每个组列中的对象集,并且去重 |
DOUBLE | variance(col), var_pop(col) | 求指定列数值的方差 |
DOUBLE | var_samp(col) | 求指定列数值的样本方差 |
DOUBLE | stddev_pop(col) | 求指定列数值的标准偏差 |
DOUBLE | stddev_samp(col) | 求指定列数值的样本标准偏差 |
DOUBLE | covar_pop(col1, col2) | 求指定列数值的协方差 |
DOUBLE | covar_samp(col1, col2) | 求指定列数值的样本协方差 |
DOUBLE | corr(col1, col2) | 返回两列数值的相关系数 |
DOUBLE | percentile(BIGINT col, p) | 返回col的p%分位数 |
返回值类型 | 方法名 | 描述 |
---|---|---|
N rows | explode(ARRAY) | 每行对应数组中的一个元素 |
N rows | explode(MAP) | 每行对应每个map键-值,其中一个字段是map的键,另一个字段是map的值 |
N rows | posexplode(ARRAY) | 与explode类似,不同的是还返回各元素在数组中的位置 |
N rows | stack(INT n, v_1, v_2, …, v_k) | 把M列转换成N行,每行有M/N个字段,其中n必须是个常数 |
tuple | json_tuple(jsonStr, k1, k2, …) | 从一个JSON字符串中获取多个键并作为一个元组返回 |
演示如下:
hive> select explode(array(1,2,3,4));OK1234Time taken: 0.048 seconds, Fetched: 4 row(s)hive> select explode(map('A','apple','o','orage'));OKA appleo orageTime taken: 0.056 seconds, Fetched: 2 row(s)hive> select posexplode(array('a','b','c'));OK0 a1 b2 cTime taken: 0.044 seconds, Fetched: 3 row(s)hive> select stack(2,'a','b','c','d');OKa bc dTime taken: 0.032 seconds, Fetched: 2 row(s)hive> select json_tuple('{"name":"zs","age":"20"}','name','age');OKzs 20Time taken: 0.057 seconds, Fetched: 1 row(s)
返回值类型 | 方法名 | 描述 |
---|---|---|
int | length(string A) | 返回字符串的长度 |
string | concat(string|binary A, string|binary B…) | 对二进制字节码或字符串按次序进行拼接 |
string | concat_ws(string SEP, string A, string B…) | 与concat()类似,但使用指定的分隔符进行分隔,比较方便 |
int | find_in_set(string str, string strList) | 返回以逗号分隔的字符串中str出现的位置,如果参数str为逗号或查找失败将返回0,如果任一参数为NULL将返回NULL |
string | format_number(number x, int d) | 将数值X转换成"x,xxx,xxx.xx"格式字符串,并保留d位小数,如果d为0,将进行四舍五入且不保留小数 |
string | get_json_object(string json_string, string path) | 从指定路径上的JSON字符串抽取出JSON对象,并返回这个对象的JSON格式,如果输入的JSON是非法的将返回NULL,注意此路径上JSON字符串只能由数字 字母 下划线组成且不能有大写字母和特殊字符,且key不能由数字开头,这是由于Hive对列名的限制 |
int | instr(string str, string substr) | 查找字符串str中子字符串substr出现的位置,如果查找失败将返回0,如果任一参数为Null将返回null,注意位置为从1开始的 |
int | locate(string substr, string str[, int pos]) | 查找字符串str中的pos位置后字符串substr第一次出现的位置 |
string | lower(string A) lcase(string A) | 将字符串A的所有字母转换成小写字母 |
string | upper(string A) | 将字符串A的所有字母转换成大写字母 |
string | lpad(string str, int len, string pad) | 从左边开始对字符串str使用字符串pad填充,最终len长度为止,如果字符串str本身长度比len大的话,将去掉多余的部分 |
string | ltrim(string A) | 去掉字符串A前面的空格 |
string | rtrim(string A) | 去掉字符串A后面的空格 |
string | trim(string A) | 去掉字符串A两端空格 |
string | parse_url(string urlString, string partToExtract [, string keyToExtract]) | 返回从URL中抽取指定部分的内容,参数url是URL字符串,而参数partToExtract是要抽取的部分,这个参数包含(HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO) |
string | regexp_extract(string subject, string pattern, int index) | 正则截取 |
string | regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT) | 正则替换 |
string | repeat(string str, int n) | 重复输出n次字符串str |
string | reverse(string A) | 反转字符串 |
array | split(string str, string pat) | 按照正则表达式pat来分割字符串str,并将分割后的数组字符串的形式返回 |
string | substr(string A, [int start],int length) substring(string, [int start],int length) | 对于字符串A,从start位置开始截取length长度字符串并返回 |
string | translate(string input, string from, string to) | 将input出现在from中的字符串替换成to中的字符串 如:translate(“MOBIN”,“BIN”,“M”)=“MOM” |
string | initcap(string A) | 将字符串A转换第一个字母大写其余字母的字符串 |
演示如下:
hive> select length('abc');OK3Time taken: 0.033 seconds, Fetched: 1 row(s)hive> select concat('userid',',','username',',','birthday');OKuserid,username,birthdayTime taken: 0.053 seconds, Fetched: 1 row(s)hive> select concat_ws(',','userid','username');OKuserid,usernameTime taken: 0.031 seconds, Fetched: 1 row(s)hive> select find_in_set('ab','ab,abc,ede,acd,abe');OK1Time taken: 0.058 seconds, Fetched: 1 row(s)hive> select format_number(1234456.565,2);OK1,234,456.56Time taken: 0.08 seconds, Fetched: 1 row(s)hive> select get_json_object('{"userid":"1","username":"zs"}','$.username');OKzsTime taken: 0.046 seconds, Fetched: 1 row(s)hive> select instr('abcdfe','fe');OK5Time taken: 0.122 seconds, Fetched: 1 row(s)hive> select locate('ef','abefcdefd',1);OK3Time taken: 0.036 seconds, Fetched: 1 row(s)hive> select locate('ef','abefcdefd',5);OK7Time taken: 0.031 seconds, Fetched: 1 row(s)hive> select lower('ABCde');OKabcdeTime taken: 0.038 seconds, Fetched: 1 row(s)hive> select lcase('ABCde');OKabcdeTime taken: 0.044 seconds, Fetched: 1 row(s)hive> select upper('addA');OKADDATime taken: 0.047 seconds, Fetched: 1 row(s)hive> select lpad('123',2,'0');OK12Time taken: 0.025 seconds, Fetched: 1 row(s)hive> select lpad('123',8,'0');OK00000123Time taken: 0.041 seconds, Fetched: 1 row(s)hive> select lpad('123',length('123')+1,'$');OK$123Time taken: 0.076 seconds, Fetched: 1 row(s)hive> select trim(' abc ');OKabcTime taken: 0.066 seconds, Fetched: 1 row(s)hive> select ltrim(' abc ');OKabcTime taken: 0.059 seconds, Fetched: 1 row(s)hive> select rtrim(' abc ');OK abcTime taken: 0.025 seconds, Fetched: 1 row(s)hive> select length(ltrim(' abc '));OK4Time taken: 0.034 seconds, Fetched: 1 row(s)hive> select length(rtrim(' abc '));OK4Time taken: 0.075 seconds, Fetched: 1 row(s)hive> select length(trim(' abc '));OK3Time taken: 0.031 seconds, Fetched: 1 row(s)hive> select parse_url('https://facebook.com/path/p1.php?name=1&pwd=123456','QUERY');OKname=1&pwd=123456Time taken: 0.06 seconds, Fetched: 1 row(s)hive> select parse_url('https://facebook.com/path/p1.php?name=1&pwd=123456','QUERY','pwd');OK123456Time taken: 0.022 seconds, Fetched: 1 row(s)hive> select parse_url('https://facebook.com/path/p1.php?name=1&pwd=123456','HOST');OKfacebook.comTime taken: 0.036 seconds, Fetched: 1 row(s)hive> select parse_url('https://facebook.com/path/p1.php?name=1&pwd=123456','PATH');OK/path/p1.phpTime taken: 0.037 seconds, Fetched: 1 row(s)hive> select parse_url('https://facebook.com/path/p1.php?name=1&pwd=123456','PROTOCOL');OKhttpsTime taken: 0.035 seconds, Fetched: 1 row(s)hive> select regexp_extract("hello,world","^((\\w+),(\\w+))$",2); -- 正则提取;OKhelloTime taken: 0.039 seconds, Fetched: 1 row(s)hive> select regexp_replace("hello,world","o|l","e"); -- 正则替换;OKheeee,weredTime taken: 0.042 seconds, Fetched: 1 row(s)hive> select repeat("abc",5); -- 重复输出;OKabcabcabcabcabcTime taken: 0.062 seconds, Fetched: 1 row(s)hive> select reverse("abc"); -- 反转字符;OKcbaTime taken: 0.04 seconds, Fetched: 1 row(s)hive> select split('hello|world,abc|edf kyy',"[\\|, ]"); -- 正则分割;OK["hello","world","abc","edf","kyy"]Time taken: 0.065 seconds, Fetched: 1 row(s)hive> select substr('hello,world',1,5); -- 字符串截取 开始位置 长度;OKhelloTime taken: 0.041 seconds, Fetched: 1 row(s)hive> select translate('hello,world','world','func'); -- 密码常用 一一对应 没有则截掉;OKheccu,funcTime taken: 0.047 seconds, Fetched: 1 row(s)hive> select translate('hello,world','world','cc');OKhec,ccTime taken: 0.048 seconds, Fetched: 1 row(s)hive> select translate('hello,world','world','c'); -- 一个对一个 > ;OKhe,cTime taken: 0.048 seconds, Fetched: 1 row(s)hive> select initcap('hello world'); -- 首字母大写 空格分隔;OKHello WorldTime taken: 0.031 seconds, Fetched: 1 row(s)
转载地址:http://vcjxi.baihongyu.com/