Welcome to my GitHub: https://github.com/zq2599/blog_demos
Contents: a categorized index of all my original articles, with companion source code, covering Java, Docker, Kubernetes, DevOps, and more;
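Navigation for the "Hive Study Notes" series
- Basic data types
- Complex data types
- Internal (managed) and external tables
- Partitioned tables
- Bucketing
- HiveQL basics
- Built-in functions
- Sqoop
- Basic UDFs
- User-defined aggregate functions (UDAF)
- UDTF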
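This is the seventh article in the "Hive Study Notes" series. The previous article covered the commonly used HiveQL statements; this one takes a quick tour of the commonly used built-in functions, in the following parts:
- Math
- String
- JSON processing
- Conversion
- Date
- Conditional
- Aggregate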
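- This hands-on session needs two tables, a student table and an address table, both with very simple fields, as shown in the figure below. The student table has an address ID field whose value is the unique ID of a record in the address table: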

(Figure: the fields of the student and address tables)
- First, create the address table:
create table address (addressid int, province string, city string) row format delimited fields terminated by ',';
- Create the file address.txt with the following content:
1,guangdong,guangzhou
2,guangdong,shenzhen
3,shanxi,xian
4,shanxi,hanzhong
6,jiangshu,nanjing
- Load the data into the address table:
load data local inpath '/home/hadoop/temp/202010/25/address.txt' into table address;
- Create the student table; its addressid field refers to the addressid field of the address table:
create table student (name string, age int, addressid int) row format delimited fields terminated by ',';
- Create the file student.txt with the following content:
tom,11,1
jerry,12,2
mike,13,3
john,14,4
mary,15,5
- Load the data into the student table:
load data local inpath '/home/hadoop/temp/202010/25/student.txt' into table student;
- The data for this session is now ready:
hive> select * from address;
OK
1 guangdong guangzhou
2 guangdong shenzhen
3 shanxi xian
4 shanxi hanzhong
6 jiangshu nanjing
Time taken: 0.043 seconds, Fetched: 5 row(s)
hive> select * from student;
OK
tom 11 1
jerry 12 2
mike 13 3
john 14 4
mary 15 5
Time taken: 0.068 seconds, Fetched: 5 row(s)
- Now let's try out the built-in functions.
- Enter the Hive console.
- Run show functions; to list the built-in functions:
hive> show functions;
OK
! != % & * + - / < <= <=> <> = == > >= ^
abs acos add_months and array array_contains ascii asin assert_true atan avg
base64 between bin
case cbrt ceil ceiling coalesce collect_list collect_set compute_stats concat concat_ws context_ngrams conv corr cos count
covar_pop covar_samp create_union cume_dist current_database current_date current_timestamp current_user
date_add date_format date_sub datediff day dayofmonth decode degrees dense_rank div
e elt encode ewah_bitmap ewah_bitmap_and ewah_bitmap_empty ewah_bitmap_or exp explode
factorial field find_in_set first_value floor format_number from_unixtime from_utc_timestamp
get_json_object greatest
hash hex histogram_numeric hour
if in in_file index initcap inline instr isnotnull isnull
java_method json_tuple
lag last_day last_value lcase lead least length levenshtein like ln locate log log10 log2 lower lpad ltrim
map map_keys map_values matchpath max min minute month months_between
named_struct negative next_day ngrams noop noopstreaming noopwithmap noopwithmapstreaming not ntile nvl
or
parse_url parse_url_tuple percent_rank percentile percentile_approx pi pmod posexplode positive pow power printf
radians rand rank reflect reflect2 regexp regexp_extract regexp_replace repeat reverse rlike round row_number rpad rtrim
second sentences shiftleft shiftright shiftrightunsigned sign sin size sort_array soundex space split sqrt stack
std stddev stddev_pop stddev_samp str_to_map struct substr substring sum
tan to_date to_unix_timestamp to_utc_timestamp translate trim trunc
ucase unbase64 unhex unix_timestamp upper
var_pop var_samp variance
weekofyear when windowingtablefunction
xpath xpath_boolean xpath_double xpath_float xpath_int xpath_long xpath_number xpath_short xpath_string
year | ~
Time taken: 0.003 seconds, Fetched: 216 row(s)
- Taking lower as an example, run describe function lower; to see its description:
hive> describe function lower;
OK
lower(str) - Returns str with all characters changed to lowercase
Time taken: 0.005 seconds, Fetched: 1 row(s)
- Next, let's work through the commonly used functions, starting with the arithmetic ones.
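- First run the following command so that query results include column names:
set hive.cli.print.header=true;
Arithmetic functions
- Addition (+):
hive> select name, age, age+1 as add_value from student;
OK
name age add_value
tom 11 12
jerry 12 13
mike 13 14
john 14 15
mary 15 16
Time taken: 0.098 seconds, Fetched: 5 row(s)
- Subtraction (-), multiplication (*) and division (/) are written the same way as addition, so they are not covered separately.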
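The remark above is about syntax; the result types are where Hive differs from many programming languages. As I understand Hive's operator semantics, / produces a DOUBLE even when both operands are INT (so 5/2 gives 2.5), while whole-number division uses the separate DIV operator, which truncates toward zero the way Java does. The Python sketch below only models those rules: the function names are made up for illustration, and returning None (NULL) for division by zero is an assumption about Hive's behaviour, not something this article demonstrates.

```python
from typing import Optional


def hive_divide(a: float, b: float) -> Optional[float]:
    """Model of Hive's '/': the result is floating point even for two ints.

    Assumption: division by zero yields NULL (None here) instead of raising.
    """
    if b == 0:
        return None
    return float(a) / float(b)


def hive_div(a: int, b: int) -> int:
    """Model of Hive's 'a DIV b': keep the integer part, truncating toward zero (Java style)."""
    if b == 0:
        raise ZeroDivisionError("this sketch does not model DIV by zero")
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


if __name__ == "__main__":
    ages = [11, 12, 13, 14, 15]              # the age column of the student table
    print([age + 1 for age in ages])          # [12, 13, 14, 15, 16], same as add_value above
    print(hive_divide(5, 2), hive_div(5, 2))  # 2.5 2
    print(hive_div(-17, 3), -17 // 3)         # -5 -6: Python's // floors instead of truncating
```

The last line shows why a hand-written port can drift: Python's // rounds toward negative infinity, so negative operands give a different answer than a truncating DIV.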
- Round to the nearest integer with round:
hive> select round(1.1), round(1.6);
OK
_c0 _c1
1.0 2.0
Time taken: 0.028 seconds, Fetched: 1 row(s)
- Round up with ceil:
hive> select ceil(1.1);
OK
_c0
2
Time taken: 0.024 seconds, Fetched: 1 row(s)
- Round down with floor:
hive> select floor(1.1);
OK
_c0
1
Time taken: 0.024 seconds, Fetched: 1 row(s)
- Exponentiation with pow; for example, pow(2,3) is 2 to the third power, which is 8:
hive> select pow(2,3);
OK
_c0
8.0
Time taken: 0.027 seconds, Fetched: 1 row(s)
- Positive modulo with pmod:
hive> select pmod(10,3);
OK
_c0
1
Time taken: 0.059 seconds, Fetched: 1 row(s)
String functions
- Convert to lower case with lower and to upper case with upper:
hive> select lower(name), upper(name) from student;
OK
_c0 _c1
tom TOM
jerry JERRY
mike MIKE
john JOHN
mary MARY
Time taken: 0.051 seconds, Fetched: 5 row(s)
- String length with length:
hive> select name, length(name) from student;
OK
tom 3
jerry 5
mike 4
john 4
mary 4
Time taken: 0.322 seconds, Fetched: 5 row(s)
- String concatenation with concat:
hive> select concat("prefix_", name) from student;
OK
prefix_tom
prefix_jerry
prefix_mike
prefix_john
prefix_mary
Time taken: 0.106 seconds, Fetched: 5 row(s)
- Substrings with substr (positions start at 1): substr(xxx,2) returns everything from the 2nd character to the end, and substr(xxx,2,3) returns 3 characters starting at the 2nd:
hive> select substr("0123456",2);
OK
123456
Time taken: 0.067 seconds, Fetched: 1 row(s)
hive> select substr("0123456",2,3);
OK
123
Time taken: 0.08 seconds, Fetched: 1 row(s)
- Strip leading and trailing spaces with trim:
hive> select trim("123");
OK
123
Time taken: 0.065 seconds, Fetched: 1 row(s)
JSON processing (get_json_object)
To try the JSON functions, first prepare some data:
- Create table t15, which has a single column for holding the raw string:
create table t15(json_raw string) row format delimited;
- Create the file t15.txt with the following content:
{"name":"tom","age":"10"}
{"name":"jerry","age":"11"}
- Load the data into t15:
load data local inpath '/home/hadoop/temp/202010/25/t15.txt' into table t15;
- Use get_json_object to parse the json_raw field and extract the name and age properties:
select get_json_object(json_raw, "$.name"), get_json_object(json_raw, "$.age") from t15;
The result:
hive> select
    > get_json_object(json_raw, "$.name"),
    > get_json_object(json_raw, "$.age")
    > from t15;
OK
tom 10
jerry 11
Time taken: 0.081 seconds, Fetched: 2 row(s)
Date functions
- Get the current date with current_date:
hive> select current_date();
OK
2020-11-02
Time taken: 0.052 seconds, Fetched: 1 row(s)
- Get the current timestamp with current_timestamp:
hive> select current_timestamp();
OK
2020-11-02 10:07:58.967
Time taken: 0.049 seconds, Fetched: 1 row(s)
- Get the year, month and day with year, month and day:
hive> select year(current_date()), month(current_date()), day(current_date());
OK
2020 11 2
Time taken: 0.054 seconds, Fetched: 1 row(s)
- year, month and day also accept current_timestamp:
hive> select year(current_timestamp()), month(current_timestamp()), day(current_timestamp());
OK
2020 11 2
Time taken: 0.042 seconds, Fetched: 1 row(s)
- Extract the date part with to_date:
hive> select to_date(current_timestamp());
OK
2020-11-02
Time taken: 0.051 seconds, Fetched: 1 row(s)
Conditional functions
- The conditional function works much like switch in Java; its syntax is case X when XX then XXX else XXXX end.
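Read as ordinary code, the simple CASE form above is an ordered lookup with a fallback: each WHEN value is compared with X in turn, the first match supplies the result, and without an ELSE the result is NULL. The Python function below is a minimal model of that reading (simple_case is a hypothetical name, not a Hive API); it also assumes the usual SQL rule that a NULL X never matches any WHEN branch.

```python
from typing import Any, Optional, Sequence, Tuple


def simple_case(subject: Any, branches: Sequence[Tuple[Any, Any]], default: Optional[Any] = None) -> Any:
    """Model of: case subject when k1 then v1 when k2 then v2 ... else default end.

    Branches are checked in order and the first match wins; with no ELSE the
    result is NULL (None). A NULL subject or WHEN value is assumed never to match, as in SQL.
    """
    if subject is None:
        return default
    for when_value, then_value in branches:
        if when_value is not None and subject == when_value:
            return then_value
    return default


if __name__ == "__main__":
    # The same branches as the example query on the student table that follows
    branches = [("tom", "tom_case"), ("jerry", "jerry_case")]
    for name in ["tom", "jerry", "mike", "john", "mary"]:
        print(name, simple_case(name, branches, "other_case"))
```

Hive also accepts the searched form, case when <condition> then ... else ... end, where each branch tests its own boolean condition instead of comparing against a single value.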
- Example: check the name field and return tom_case if it equals tom, jerry_case if it equals jerry, and other_case otherwise:
select name,
case name when 'tom' then 'tom_case'
when 'jerry' then 'jerry_case'
else 'other_case'
end
from student;
The result:
hive> select name,
    > case name when 'tom' then 'tom_case'
    > when 'jerry' then 'jerry_case'
    > else 'other_case'
    > end
    > from student;
OK
tom tom_case
jerry jerry_case
mike other_case
john other_case
mary other_case
Time taken: 0.08 seconds, Fetched: 5 row(s)
Aggregate functions
- Count rows with count:
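select count(*) from student;
This triggers a MapReduce job; the result:
Total MapReduce CPU Time Spent: 2 seconds 170 msec
OK
5
Time taken: 20.823 seconds, Fetched: 1 row(s)
- Sum within each group with sum (sum(1) adds 1 for every row, so here it gives the number of rows per province):
select province, sum(1) from address group by province;
This also triggers MapReduce; the result:
Total MapReduce CPU Time Spent: 1 seconds 870 msec
OK
guangdong 2
jiangshu 1
shanxi 2
Time taken: 19.524 seconds, Fetched: 3 row(s)
- Minimum (min), maximum (max) and average (avg) within each group:
select province, min(addressid), max(addressid), avg(addressid) from address group by province;
This triggers MapReduce as well; the result:
Total MapReduce CPU Time Spent: 1 seconds 650 msec
OK
guangdong 1 2 1.5
jiangshu 6 6 6.0
shanxi 3 4 3.5
Time taken: 20.106 seconds, Fetched: 3 row(s)
- That wraps up our tour of the commonly used Hive built-in functions; I hope it serves as a useful reference. The next article will try out a commonly used tool: Sqoop.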
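A few of the functions covered above behave in ways that are easy to get wrong when the same logic is reimplemented in application code. The Python sketch below models them on the article's sample data: round with ties going away from zero (assumed HALF_UP; Python's built-in round(2.5) returns 2), pmod as a modulo that stays non-negative for a positive divisor, substr with 1-based positions, a simplified get_json_object that only understands $.key paths and returns None (NULL) for invalid JSON, and the grouped sum(1)/min/max/avg query over the address table. All function names are hypothetical, and treating a negative substr start as counting from the end of the string is my assumption about Hive's behaviour.

```python
import json
from collections import defaultdict
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict, List, Optional, Tuple


def hive_round(x: float) -> float:
    """round(x) with ties going away from zero (assumed HALF_UP, as in round(1.6) -> 2.0)."""
    return float(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def java_rem(a: int, b: int) -> int:
    """Java-style a % b: the remainder takes the sign of a (b must be non-zero)."""
    q = abs(a) // abs(b)
    return a - b * (q if (a >= 0) == (b > 0) else -q)


def pmod(a: int, b: int) -> int:
    """pmod(a, b) modeled as ((a % b) + b) % b using Java-style remainders."""
    return java_rem(java_rem(a, b) + b, b)


def hive_substr(s: str, start: int, length: Optional[int] = None) -> str:
    """1-based substr(s, start[, length]); a negative start counts from the end (assumed)."""
    if abs(start) > len(s) or (length is not None and length <= 0):
        return ""
    begin = start - 1 if start > 0 else (len(s) + start if start < 0 else 0)
    end = len(s) if length is None else begin + length
    return s[begin:end]


def get_json_object(json_raw: Optional[str], path: str) -> Optional[str]:
    """Simplified get_json_object: supports only '$.key' / '$.key.key' paths; None means NULL."""
    if not path.startswith("$"):
        return None
    try:
        node = json.loads(json_raw)
    except (TypeError, ValueError):  # NULL or malformed JSON -> NULL
        return None
    for key in filter(None, path[1:].split(".")):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    if node is None:
        return None
    # Strings come back as-is; numbers, booleans, objects and arrays as JSON text.
    return node if isinstance(node, str) else json.dumps(node, separators=(",", ":"))


# Rows of the address table: (addressid, province, city)
ADDRESS: List[Tuple[int, str, str]] = [
    (1, "guangdong", "guangzhou"),
    (2, "guangdong", "shenzhen"),
    (3, "shanxi", "xian"),
    (4, "shanxi", "hanzhong"),
    (6, "jiangshu", "nanjing"),
]


def province_stats(rows: List[Tuple[int, str, str]]) -> Dict[str, Tuple[int, int, int, float]]:
    """Like: select province, sum(1), min(addressid), max(addressid), avg(addressid) ... group by province."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for addressid, province, _city in rows:
        groups[province].append(addressid)
    return {p: (len(ids), min(ids), max(ids), sum(ids) / len(ids)) for p, ids in sorted(groups.items())}


if __name__ == "__main__":
    print(hive_round(1.1), hive_round(1.6), hive_round(2.5), round(2.5))  # 1.0 2.0 3.0 2
    print(pmod(10, 3), pmod(-10, 3), java_rem(-10, 3))                    # 1 2 -1
    print(hive_substr("0123456", 2), hive_substr("0123456", 2, 3))        # 123456 123
    for raw in ['{"name":"tom","age":"10"}', '{"name":"jerry","age":"11"}']:
        print(get_json_object(raw, "$.name"), get_json_object(raw, "$.age"))
    for province, stats in province_stats(ADDRESS).items():
        print(province, *stats)  # e.g. guangdong 2 1 2 1.5
```

For the article's own inputs the sketch reproduces the Hive results shown above; the extra cases (round(2.5), pmod(-10, 3) next to the Java-style remainder) only mark where a naive port would diverge.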
- Java series
- Spring series
- Docker series
- Kubernetes series
- Database + middleware series
- DevOps series
