TECH SOLUTIONS

Click here to edit subtitle

Forums

Post Reply
Forum Home > Bigdata Learnings( Hadoop, HBase,Hive and other bigdata technologies) > Cassandra: Alternative of IN clause in CQL

Sourav Gulati
Site Owner
Posts: 83

In CQL, IN clause only works if a column is indexed.

For example, here is a  table in cassandra with "rowkey" as primary key


 

 

cqlsh:mykeyspace> select * from hbtable;

 rowkey | cf      | quantifier | value

--------+---------+------------+-------

  row-2 | colfam1 |      quan2 |  val2

  row-8 | colfam1 |      quan4 |  val8

  row-6 | colfam1 |      quan2 |  val6

  row-3 | colfam1 |      quan3 |  val3

  row-7 | colfam1 |      quan3 |  val7

  row-1 | colfam1 |      quan1 |  val1

  row-5 | colfam1 |      quan4 |  val5

  row-4 | colfam1 |      quan1 |  val4

 

 

 

 

Now, if you execute IN clause on "rowkey", it will work


 

 

cqlsh:mykeyspace> select * from hbtable where rowkey in ('row-2','row-3');

 rowkey | cf      | quantifier | value

--------+---------+------------+-------

  row-2 | colfam1 |      quan2 |  val2

  row-3 | colfam1 |      quan3 |  val3

 

 

 

 

However, if you try it on some other column where index is not created it will not work


 

 

 

cqlsh:mykeyspace> select * from hbtable where quantifier in ('quan1');

Bad Request: No indexed columns present in by-columns clause with Equal operator

 

Since you cannot create index on every table, so I have written a Java code that will help to run IN clause on Cassandra table. This code runs in sets of rows, one set contains 50000 so that process does not go out of memory


 

import me.prettyprint.cassandra.serializers.StringSerializer;

import me.prettyprint.hector.api.Cluster;

import me.prettyprint.hector.api.Keyspace;

import me.prettyprint.hector.api.Serializer;

import me.prettyprint.hector.api.beans.OrderedRows;

import me.prettyprint.hector.api.beans.Row;

import me.prettyprint.hector.api.factory.HFactory;

import me.prettyprint.hector.api.query.QueryResult;

import me.prettyprint.hector.api.query.RangeSlicesQuery;

/*

 *

 * arguments list should be cluster_name, keyspace_name, table_name,column_name,value1,value2  and so on

 *

 *

 *

 * */

public class InFunction {

    public static void main(String[] args) {

        if (args.length < 5) {

            System.out.println("usage <args>");

            System.out

                    .println("args list should be cluster_name,keyspace_name,table_name,column_name,value1,value2,...");

            System.exit(1);

        }

        Cluster cluster = HFactory.getOrCreateCluster(args[0],"192.168.213.11:9160,192.168.213.24:9160,");

        Keyspace keyspace = HFactory.createKeyspace(args[1], cluster);

        Serializer<String> se = StringSerializer.get();

        String r1 = "";

        int rowcount = 50000;// default row count

        RangeSlicesQuery<String, String, String> rangeSlicesQuery = HFactory.createRangeSlicesQuery(keyspace, se, se, se);

        while (rowcount == 50000) {

            rangeSlicesQuery.setColumnFamily(args[2]);

            rangeSlicesQuery.setKeys(r1, "");

            rangeSlicesQuery.setRange("", "", false, 2000000000); // Maximum

                                                                    // number of

                                                                    // columns

                                                                    // that

                                                                    // Cassandra

                                                                    // permits

                                                                    // in a line

            rangeSlicesQuery.setRowCount(50000);

            QueryResult<OrderedRows<String, String, String>> result = rangeSlicesQuery

                    .execute();

            OrderedRows<String, String, String> orderedRows = result.get();

            for (Row<String, String, String> r : orderedRows) {

                if (r.getColumnSlice().getColumnByName(args[3]) != null) {

                    String value = (r.getColumnSlice().getColumnByName(args[3])

                            .getValue());

                    for (int i = 4; i < args.length; i++) {

                        if (value.compareTo(args[i]) == 0) {

                            System.out.println(" " + r);

                        }

                    }

                }

            }

            rowcount = orderedRows.getCount();

            r1 = orderedRows.peekLast().getKey();

        }

    }

}


 

 

 

 

 

 

Note: you will need to include Cassandra Hector jar for it

 

 


Click here for Other topics of BigData Technologies

 

 

March 13, 2013 at 8:28 AM Flag Quote & Reply

You must login to post.