| Forum Home > Bigdata Learnings( Hadoop, HBase,Hive and other bigdata technologies) > Cassandra: Alternative of IN clause in CQL | ||
|---|---|---|
|
Site Owner Posts: 83 |
In CQL, IN clause only works if a column is indexed. For example, here is a table in cassandra with "rowkey" as primary key
cqlsh:mykeyspace> select * from hbtable; rowkey | cf | quantifier | value --------+---------+------------+------- row-2 | colfam1 | quan2 | val2 row-8 | colfam1 | quan4 | val8 row-6 | colfam1 | quan2 | val6 row-3 | colfam1 | quan3 | val3 row-7 | colfam1 | quan3 | val7 row-1 | colfam1 | quan1 | val1 row-5 | colfam1 | quan4 | val5 row-4 | colfam1 | quan1 | val4
Now, if you execute IN clause on "rowkey", it will work
cqlsh:mykeyspace> select * from hbtable where rowkey in ('row-2','row-3'); rowkey | cf | quantifier | value --------+---------+------------+------- row-2 | colfam1 | quan2 | val2 row-3 | colfam1 | quan3 | val3
However, if you try it on some other column where index is not created it will not work
cqlsh:mykeyspace> select * from hbtable where quantifier in ('quan1'); Bad Request: No indexed columns present in by-columns clause with Equal operator
Since you cannot create index on every table, so I have written a Java code that will help to run IN clause on Cassandra table. This code runs in sets of rows, one set contains 50000 so that process does not go out of memory
import me.prettyprint.cassandra.serializers.StringSerializer; import me.prettyprint.hector.api.Cluster; import me.prettyprint.hector.api.Keyspace; import me.prettyprint.hector.api.Serializer; import me.prettyprint.hector.api.beans.OrderedRows; import me.prettyprint.hector.api.beans.Row; import me.prettyprint.hector.api.factory.HFactory; import me.prettyprint.hector.api.query.QueryResult; import me.prettyprint.hector.api.query.RangeSlicesQuery; /* * * arguments list should be cluster_name, keyspace_name, table_name,column_name,value1,value2 and so on * * * * */ public class InFunction { public static void main(String[] args) { if (args.length < 5) { System.out.println("usage <args>"); System.out .println("args list should be cluster_name,keyspace_name,table_name,column_name,value1,value2,..."); System.exit(1); } Cluster cluster = HFactory.getOrCreateCluster(args[0],"192.168.213.11:9160,192.168.213.24:9160,"); Keyspace keyspace = HFactory.createKeyspace(args[1], cluster); Serializer<String> se = StringSerializer.get(); String r1 = ""; int rowcount = 50000;// default row count RangeSlicesQuery<String, String, String> rangeSlicesQuery = HFactory.createRangeSlicesQuery(keyspace, se, se, se); while (rowcount == 50000) { rangeSlicesQuery.setColumnFamily(args[2]); rangeSlicesQuery.setKeys(r1, ""); rangeSlicesQuery.setRange("", "", false, 2000000000); // Maximum // number of // columns // that // Cassandra // permits // in a line rangeSlicesQuery.setRowCount(50000); QueryResult<OrderedRows<String, String, String>> result = rangeSlicesQuery .execute(); OrderedRows<String, String, String> orderedRows = result.get(); for (Row<String, String, String> r : orderedRows) { if (r.getColumnSlice().getColumnByName(args[3]) != null) { String value = (r.getColumnSlice().getColumnByName(args[3]) .getValue()); for (int i = 4; i < args.length; i++) { if (value.compareTo(args[i]) == 0) { System.out.println(" " + r); } } } } rowcount = orderedRows.getCount(); r1 = orderedRows.peekLast().getKey(); } } }
Note: you will need to include Cassandra Hector jar for it
| |
| ||