Hive Study Notes, Part 7: Built-in Functions

Welcome to my GitHub: https://github.com/zq2599/blog_demos
Contents: a categorized index of all my original articles with companion source code, covering Java, Docker, Kubernetes, DevOps, and more.
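"Hive Study Notes" series navigation

Basic data types
Complex data types
Managed (internal) and external tables
Partitioned tables
Bucketing
HiveQL basics
Built-in functions
Sqoop
Basic UDFs
User-defined aggregate functions (UDAF)
UDTF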

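Overview of this article

This is the seventh article in the "Hive Study Notes" series. The previous article covered the common HiveQL statements; this one walks briefly through the commonly used built-in functions, in the following groups:

Math
String
JSON processing
Conversion
Date
Conditional
Aggregate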

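Preparing the data

This walkthrough needs two tables, a student table and an address table. Both have very simple fields, as shown in the figure below; the student table has an address ID field that holds the unique ID of a record in the address table:

[Figure: fields of the student table and the address table]

First, create the address table: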
create table address (addressid int, province string, city string) row format delimited fields terminated by ',';

Create a file named address.txt with the following content:

1,guangdong,guangzhou
2,guangdong,shenzhen
3,shanxi,xian
4,shanxi,hanzhong
6,jiangshu,nanjing

Load the data into the address table:

load data local inpath '/home/hadoop/temp/202010/25/address.txt' into table address;

Create the student table, whose addressid field refers to the addressid field of the address table:

create table student (name string, age int, addressid int) row format delimited fields terminated by ',';

Create a file named student.txt with the following content:

tom,11,1
jerry,12,2
mike,13,3
john,14,4
mary,15,5

Load the data into the student table:

load data local inpath '/home/hadoop/temp/202010/25/student.txt' into table student;

The data needed for this walkthrough is now ready, as shown below:

hive> select * from address;
OK
1	guangdong	guangzhou
2	guangdong	shenzhen
3	shanxi	xian
4	shanxi	hanzhong
6	jiangshu	nanjing
Time taken: 0.043 seconds, Fetched: 5 row(s)
hive> select * from student;
OK
tom	11	1
jerry	12	2
mike	13	3
john	14	4
mary	15	5
Time taken: 0.068 seconds, Fetched: 5 row(s)

Now let's try out the built-in functions.

Overview

Open the Hive console.
Run show functions; to list the built-in functions:

hive> show functions;
OK
!
!=
%
&
*
+
-
/
<
<=
<=>
<>
=
==
>
>=
^
abs
acos
add_months
and
array
array_contains
ascii
asin
assert_true
atan
avg
base64
between
bin
case
cbrt
ceil
ceiling
coalesce
collect_list
collect_set
compute_stats
concat
concat_ws
context_ngrams
conv
corr
cos
count
covar_pop
covar_samp
create_union
cume_dist
current_database
current_date
current_timestamp
current_user
date_add
date_format
date_sub
datediff
day
dayofmonth
decode
degrees
dense_rank
div
e
elt
encode
ewah_bitmap
ewah_bitmap_and
ewah_bitmap_empty
ewah_bitmap_or
exp
explode
factorial
field
find_in_set
first_value
floor
format_number
from_unixtime
from_utc_timestamp
get_json_object
greatest
hash
hex
histogram_numeric
hour
if
in
in_file
index
initcap
inline
instr
isnotnull
isnull
java_method
json_tuple
lag
last_day
last_value
lcase
lead
least
length
levenshtein
like
ln
locate
log
log10
log2
lower
lpad
ltrim
map
map_keys
map_values
matchpath
max
min
minute
month
months_between
named_struct
negative
next_day
ngrams
noop
noopstreaming
noopwithmap
noopwithmapstreaming
not
ntile
nvl
or
parse_url
parse_url_tuple
percent_rank
percentile
percentile_approx
pi
pmod
posexplode
positive
pow
power
printf
radians
rand
rank
reflect
reflect2
regexp
regexp_extract
regexp_replace
repeat
reverse
rlike
round
row_number
rpad
rtrim
second
sentences
shiftleft
shiftright
shiftrightunsigned
sign
sin
size
sort_array
soundex
space
split
sqrt
stack
std
stddev
stddev_pop
stddev_samp
str_to_map
struct
substr
substring
sum
tan
to_date
to_unix_timestamp
to_utc_timestamp
translate
trim
trunc
ucase
unbase64
unhex
unix_timestamp
upper
var_pop
var_samp
variance
weekofyear
when
windowingtablefunction
xpath
xpath_boolean
xpath_double
xpath_float
xpath_int
xpath_long
xpath_number
xpath_short
xpath_string
year
|
~
Time taken: 0.003 seconds, Fetched: 216 row(s)

Taking the lower function as an example, run describe function lower; to see its description:

hive> describe function lower;
OK
lower(str) - Returns str with all characters changed to lowercase
Time taken: 0.005 seconds, Fetched: 1 row(s)
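When the one-line summary is not enough, Hive also accepts the extended keyword on the same command. A minimal sketch, assuming the same console session; the text printed differs between Hive versions, so no output is shown here:

```sql
-- Print the summary plus extended usage notes for lower
describe function extended lower;
```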

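Next, let's try the commonly used functions, starting with the math functions.
First run the following command so that query results include column names:

set hive.cli.print.header=true;

Math functions

Addition (+):

hive> select name, age, age+1 as add_value from student;
OK
name	age	add_value
tom	11	12
jerry	12	13
mike	13	14
john	14	15
mary	15	16
Time taken: 0.098 seconds, Fetched: 5 row(s)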

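Subtraction (-), multiplication (*), and division (/) work the same way as addition, so they are not repeated here.
Rounding to the nearest integer with round:

hive> select round(1.1), round(1.6);
OK
_c0	_c1
1.0	2.0
Time taken: 0.028 seconds, Fetched: 1 row(s)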
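round also takes an optional second argument giving the number of decimal places to keep. A small sketch with illustrative literal values (not from the original article):

```sql
-- round(x, d) keeps d decimal places
-- expected values: 3.14 and 3.1416
select round(3.14159, 2), round(3.14159, 4);
```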

Rounding up with ceil:

hive> select ceil(1.1);
OK
_c0
2
Time taken: 0.024 seconds, Fetched: 1 row(s)

Rounding down with floor:

hive> select floor(1.1);
OK
_c0
1
Time taken: 0.024 seconds, Fetched: 1 row(s)

Exponentiation with pow: for example, pow(2,3) is 2 raised to the third power, which equals 8:

hive> select pow(2,3);
OK
_c0
8.0
Time taken: 0.027 seconds, Fetched: 1 row(s)
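Two related functions appear in the show functions list above: power, which is a synonym for pow, and sqrt. A short sketch with illustrative values; the first two columns should both evaluate to 1024 and the third to 4:

```sql
-- power is a synonym for pow, sqrt returns the square root
select pow(2, 10), power(2, 10), sqrt(16);
```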

Modulo with pmod:

hive> select pmod(10,3);
OK
_c0
1
Time taken: 0.059 seconds, Fetched: 1 row(s)

String functions

Lowercase with lower, uppercase with upper:

hive> select lower(name), upper(name) from student;
OK
_c0	_c1
tom	TOM
jerry	JERRY
mike	MIKE
john	JOHN
mary	MARY
Time taken: 0.051 seconds, Fetched: 5 row(s)

String length with length:

hive> select name, length(name) from student;
OK
tom	3
jerry	5
mike	4
john	4
mary	4
Time taken: 0.322 seconds, Fetched: 5 row(s)

String concatenation with concat:

hive> select concat("prefix_", name) from student;
OK
prefix_tom
prefix_jerry
prefix_mike
prefix_john
prefix_mary
Time taken: 0.106 seconds, Fetched: 5 row(s)
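When several values need a common separator, concat_ws (also in the function list) is usually more convenient than nesting concat calls. A sketch against the student table; the alias name_age is only illustrative. Because concat_ws expects strings, the int column age is converted with cast first, which also serves as a small example of type conversion:

```sql
-- concat_ws(separator, str1, str2, ...) joins strings with the separator
-- age is an int, so it is cast to string before joining
-- expected rows look like tom-11, jerry-12, ...
select concat_ws('-', name, cast(age as string)) as name_age from student;
```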

Substring with substr: substr(xxx,2) returns everything from the second character to the end, and substr(xxx,2,3) returns three characters starting at the second character:

hive> select substr("0123456",2);
OK
123456
Time taken: 0.067 seconds, Fetched: 1 row(s)
hive> select substr("0123456",2,3);
OK
123
Time taken: 0.08 seconds, Fetched: 1 row(s)
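As the output above shows, positions in substr are 1-based: position 2 is the character "1" in "0123456". A negative start position counts from the end of the string instead. A small sketch reusing the same literal:

```sql
-- first column: three characters starting at position 1, i.e. 012
-- second column: the last three characters, i.e. 456
select substr("0123456", 1, 3), substr("0123456", -3);
```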

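Strip leading and trailing spaces with trim:

hive> select trim("123");
OK
123
Time taken: 0.065 seconds, Fetched: 1 row(s)

JSON processing (get_json_object)

To try the JSON functions, first prepare some data.

Create table t15, which has a single field for storing a string: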

create table t15(json_raw string) row format delimited;

Create a file named t15.txt with the following content:

{"name":"tom","age":"10"}
{"name":"jerry","age":"11"}

Load the data into table t15:

load data local inpath '/home/hadoop/temp/202010/25/t15.txt' into table t15;

Use get_json_object to parse the json_raw field and extract the name and age attributes:

select get_json_object(json_raw, "$.name"), get_json_object(json_raw, "$.age") from t15;

The result:

hive> select
    > get_json_object(json_raw, "$.name"),
    > get_json_object(json_raw, "$.age")
    > from t15;
OK
tom	10
jerry	11
Time taken: 0.081 seconds, Fetched: 2 row(s)
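The path argument of get_json_object can also reach into nested objects and array elements, and json_tuple (listed by show functions) can pull several top-level keys out of a row in one call. A sketch under two assumptions: the JSON literal in the first query is made up for illustration, and the alias t in the second query is arbitrary:

```sql
-- Nested fields use dots and array elements use [index], so this returns pear
select get_json_object('{"store":{"fruit":[{"type":"apple"},{"type":"pear"}]}}', '$.store.fruit[1].type');

-- json_tuple is a table function, so it is used with lateral view
-- it should return the same name and age pairs as the query above
select t.name, t.age
from t15
lateral view json_tuple(json_raw, 'name', 'age') t as name, age;
```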

Date

Get the current date with current_date:

hive> select current_date();
OK
2020-11-02
Time taken: 0.052 seconds, Fetched: 1 row(s)

Get the current timestamp with current_timestamp:

hive> select current_timestamp();
OK
2020-11-02 10:07:58.967
Time taken: 0.049 seconds, Fetched: 1 row(s)

Get the year, month, and day of the month with year, month, and day:

hive> select year(current_date()), month(current_date()), day(current_date());
OK
2020	11	2
Time taken: 0.054 seconds, Fetched: 1 row(s)

year, month, and day also accept current_timestamp:

hive> select year(current_timestamp()), month(current_timestamp()), day(current_timestamp());
OK
2020	11	2
Time taken: 0.042 seconds, Fetched: 1 row(s)
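Besides extracting parts of a date, the function list also includes date arithmetic and formatting functions. A sketch using fixed date strings (chosen to match the date in the outputs above) so the results do not depend on when the query runs:

```sql
-- date_add and date_sub shift a date by a number of days
-- datediff returns the number of days between two dates
-- date_format renders a date with a Java-style pattern
-- expected values: 2020-11-09, 2020-10-31, 7, 2020-11
select date_add('2020-11-02', 7),
       date_sub('2020-11-02', 2),
       datediff('2020-11-09', '2020-11-02'),
       date_format('2020-11-02', 'yyyy-MM');
```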

Extract the date part with to_date:

hive> select to_date(current_timestamp());
OK
2020-11-02
Time taken: 0.051 seconds, Fetched: 1 row(s)

Conditional functions

Conditional functions work much like switch in Java. The syntax is case X when XX then XXX else XXXX end.
The example below checks the name field: if it equals tom it returns tom_case, if it equals jerry it returns jerry_case, and in every other case it returns other_case:

select name,
case name when 'tom' then 'tom_case'
when 'jerry' then 'jerry_case'
else 'other_case'
end
from student;

The result:

hive> select name,
    > case name when 'tom' then 'tom_case'
    > when 'jerry' then 'jerry_case'
    > else 'other_case'
    > end
    > from student;
OK
tom	tom_case
jerry	jerry_case
mike	other_case
john	other_case
mary	other_case
Time taken: 0.08 seconds, Fetched: 5 row(s)
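The case expression also has a "searched" form, in which each when carries its own boolean condition, and for a simple two-way choice the if function is a shorter alternative. A sketch against the student table; the threshold 13 and the labels are made up for illustration:

```sql
-- Both computed columns should agree:
-- tom and jerry get under_13, the other three get 13_or_over
select name,
       case when age < 13 then 'under_13' else '13_or_over' end as age_case,
       if(age < 13, 'under_13', '13_or_over') as age_if
from student;
```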

Aggregate functions

Row count with count:

select count(*) from student;

This triggers a MapReduce job. The result:

Total MapReduce CPU Time Spent: 2 seconds 170 msec
OK
5
Time taken: 20.823 seconds, Fetched: 1 row(s)
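count can also count distinct values instead of rows. A sketch against the address table prepared earlier; with that sample data the result should be 3, since the five rows contain three different provinces:

```sql
-- count(distinct col) counts unique non-null values of col
select count(distinct province) from address;
```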

Sum within each group with sum:

select province, sum(1) from address group by province;

This triggers a MapReduce job. The result:

Total MapReduce CPU Time Spent: 1 seconds 870 msec
OK
guangdong	2
jiangshu	1
shanxi	2
Time taken: 19.524 seconds, Fetched: 3 row(s)
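Since sum(1) adds 1 for every row, it gives the same per-group totals as count(*). The sketch below uses count(*) and adds a having clause to filter on the aggregated value; the alias cnt is illustrative. With the sample data only guangdong and shanxi, each with 2 rows, should remain:

```sql
-- Keep only the provinces that have more than one address row
select province, count(*) as cnt
from address
group by province
having count(*) > 1;
```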

Minimum, maximum, and average within each group with min, max, and avg:

select province, min(addressid), max(addressid), avg(addressid) from address group by province;

This triggers a MapReduce job. The result:

Total MapReduce CPU Time Spent: 1 seconds 650 msec
OK
guangdong	1	2	1.5
jiangshu	6	6	6.0
shanxi	3	4	3.5
Time taken: 20.106 seconds, Fetched: 3 row(s)
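Aggregates combine naturally with a join across the two sample tables. A sketch, assuming the student and address tables loaded earlier; the aliases s, a, students, and avg_age are illustrative. With that data the inner join drops mary (addressid 5 has no address row) and nanjing (no student refers to addressid 6), so the result should be guangdong with 2 students and an average age of 11.5, and shanxi with 2 students and an average age of 13.5:

```sql
-- Number of students and their average age per province
select a.province, count(*) as students, avg(s.age) as avg_age
from student s
join address a on s.addressid = a.addressid
group by a.province;
```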

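That completes our tour of Hive's commonly used built-in functions; I hope it serves as a useful reference. The next article tries out a commonly used tool: Sqoop.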
