TECH SOLUTIONS

Click here to edit subtitle

Forums

Post Reply
Forum Home > Bigdata Learnings( Hadoop, HBase,Hive and other bigdata technologies) > Hive: Accessing Hive Array: Custom UDF

Sourav Gulati
Site Owner
Posts: 83

Following is an example Hive Table:


hive> select * from test_jsontable;

OK

id    name    contact    depts    sal

1    emp1    9999999999    ["IT","Electronics"]    10000

2    emp2    9999999929    ["IT","Mechanical"]    10000

3    emp4    9999999989    ["IT","Electronics"]    10000

4    emp4    9999999979    ["IT","Electronics"]    10000

5    emp5    9999999969    ["IT","Mechanical"]    10000

6    emp6    9999999949    ["IT","Mechanical","Production"]    10000

Time taken: 0.086 seconds

 

 


If you want to get all records where array "depts" contains  "Electronics" , it can be done as follows:

hive> select * from test_jsontable where depts[0]="Electronics" or depts[1]="Electronics" or depts[2]="Electronics";


However, what if your array contains 100 elements or more , you have to hard code every element of array.So better way to use a custom UDF as follows:


package com.impetus.hive;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.Description;

import org.apache.hadoop.hive.ql.exec.UDF;

@Description(name = "arraycontains", value = "arraycontains(ArrayList list,String value) - returns true if list contains value ", extended = "Example:\n"

        + " > SELECT * FROM src where arraycontains(arrayname,stringvalue;\n")

public class ArrayContains extends UDF {

    public boolean evaluate(ArrayList list, String value) {

        if (list == null || list.size() < 1) {

            return false;

        }

        if (list.contains(value)) {

            return true;

        } else {

            return false;

        }

    }

}


Create jar file of the project and add it in hive as follows:

hive>add jar /pathto/jar;

Then you create your custom function as follows:

hive>create temporary function arraycontains as 'com.test.hive.ArrayContains';

 

Now , your UDF is available to use

hive> select * from test_jsontable where arraycontains(depts,"Production");

id    name    contact    depts    sal

6    emp6    9999999949    ["IT","Mechanical","Production"]    10000

Time taken: 11.807 seconds

 

 


Click here for Other topics of BigData Technologies

 

 

--


April 12, 2013 at 7:30 AM Flag Quote & Reply

You must login to post.