| Forum Home > Bigdata Learnings( Hadoop, HBase,Hive and other bigdata technologies) > Hive: Accessing Hive Array: Custom UDF | ||
|---|---|---|
|
Site Owner Posts: 83 |
Following is an example Hive Table: hive> select * from test_jsontable; OK id name contact depts sal 1 emp1 9999999999 ["IT","Electronics"] 10000 2 emp2 9999999929 ["IT","Mechanical"] 10000 3 emp4 9999999989 ["IT","Electronics"] 10000 4 emp4 9999999979 ["IT","Electronics"] 10000 5 emp5 9999999969 ["IT","Mechanical"] 10000 6 emp6 9999999949 ["IT","Mechanical","Production"] 10000 Time taken: 0.086 seconds
If you want to get all records where array "depts" contains "Electronics" , it can be done as follows: hive> select * from test_jsontable where depts[0]="Electronics" or depts[1]="Electronics" or depts[2]="Electronics"; However, what if your array contains 100 elements or more , you have to hard code every element of array.So better way to use a custom UDF as follows: package com.impetus.hive; import java.util.ArrayList; import org.apache.hadoop.hive.ql.exec.Description; import org.apache.hadoop.hive.ql.exec.UDF; @Description(name = "arraycontains", value = "arraycontains(ArrayList list,String value) - returns true if list contains value ", extended = "Example:\n" + " > SELECT * FROM src where arraycontains(arrayname,stringvalue;\n") public class ArrayContains extends UDF { public boolean evaluate(ArrayList list, String value) { if (list == null || list.size() < 1) { return false; } if (list.contains(value)) { return true; } else { return false; } } } Create jar file of the project and add it in hive as follows: hive>add jar /pathto/jar; Then you create your custom function as follows: hive>create temporary function arraycontains as 'com.test.hive.ArrayContains';
Now , your UDF is available to use hive> select * from test_jsontable where arraycontains(depts,"Production"); id name contact depts sal 6 emp6 9999999949 ["IT","Mechanical","Production"] 10000 Time taken: 11.807 seconds
| |
--
| ||