Today I found an issue in DataStax 4.0: if we delete a document by ID from Solr, DataStax removes all of its fields in Cassandra but leaves the ID field, so the row itself stays.
For example, we have ID 1 in both Solr and the Cassandra table, and then run this request:
http://localhost:8983/solr/ckeyspace.tablename/update?commit=true&stream.body=<delete><id>1</id></delete>
DataStax deletes the document from Solr, but in Cassandra it only clears the fields: the row with ID 1 is still there, and all its other fields come back with default values (null or false).
But if we use deleteByQuery in Solr with <delete><query>id:1</query></delete>, DataStax deletes the whole row from Cassandra.
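When these requests are sent from code rather than typed into a browser, the XML message in stream.body has to be URL-encoded. A minimal JDK-only Java sketch; the core URL is the one from the example, and the id is assumed to be a plain value that needs no XML escaping:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrDeleteUrls {
    // Update handler of the example core "ckeyspace.tablename".
    static final String CORE_URL = "http://localhost:8983/solr/ckeyspace.tablename/update";

    // Build a GET update URL that carries the XML update message in stream.body.
    static String updateUrl(String xmlBody) {
        try {
            return CORE_URL + "?commit=true&stream.body=" + URLEncoder.encode(xmlBody, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // deleteById message (assumes the id needs no XML escaping).
    static String deleteByIdBody(String id) {
        return "<delete><id>" + id + "</id></delete>";
    }

    // deleteByQuery message (assumes the query needs no XML escaping).
    static String deleteByQueryBody(String query) {
        return "<delete><query>" + query + "</query></delete>";
    }

    public static void main(String[] args) {
        System.out.println(updateUrl(deleteByIdBody("1")));
        System.out.println(updateUrl(deleteByQueryBody("id:1")));
    }
}
```

Either URL can then be sent as a plain GET request with any HTTP client.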
Hacking
To figure out why this happens, I used the Java Decompiler JD-GUI to decompile the DataStax code, created an Eclipse project from it, and then changed the cassandra script in dse-4.0.5/resources/cassandra/bin:
JVM_OPTS="$JVM_OPTS -agentlib:jdwp=transport=dt_socket,address=7777,server=y,suspend=n"
exec $NUMACTL "$JAVA" $JVM_OPTS $cassandra_parms -cp "$CLASSPATH" $props "$class"
Then restart DataStax: ./dse cassandra -s -f
This will start DataStax in remote debug mode.
Next, attach Eclipse's remote debugger to port 7777, set breakpoints in DirectUpdateHandler2 and CassandraDirectUpdateHandler, and run the Solr deleteById command.
Stepping through the code, I found that in the deleteById case DataStax issues a statement like: delete field1, ..., fieldn from table where id=1. This deletes only the listed fields, i.e. everything except the id field.
In the deleteByQuery case, DataStax issues: delete from table where id=1, which deletes the whole row.
The root cause is in Cql3CassandraRowWriter.buildDeleteStatement: it does something unnecessary, collecting all regular (non-key) columns from the table metadata and building a delete field1, ..., fieldn statement instead of simply deleting the row.
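The difference between the two code paths comes down to whether the DELETE statement names columns. A minimal JDK-only Java sketch that builds both shapes; the column names field1/field2, the key clause "id" = '1', and the ", " separator are hypothetical stand-ins for what Cql3Utils.createKeyClauseFromSolrKey and commaJoiner actually produce:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DeleteStatementShapes {
    // Wrap an identifier in double quotes, as the decompiled code does.
    static String quote(String name) {
        return "\"" + name + "\"";
    }

    // Shape of the 4.0.5 deleteById path: it names every regular column,
    // so only those cells are deleted and the key is left behind.
    static String columnDelete(String ks, String table, List<String> regularColumns, String keyClause) {
        String cols = regularColumns.stream()
                .map(DeleteStatementShapes::quote)
                .collect(Collectors.joining(", "));
        return String.format("DELETE %s FROM %s.%s WHERE %s", cols, quote(ks), quote(table), keyClause);
    }

    // Shape of the deleteByQuery path (and of deleteById in 4.7):
    // no column list, so the whole row is deleted.
    static String rowDelete(String ks, String table, String keyClause) {
        return String.format("DELETE FROM %s.%s WHERE %s", quote(ks), quote(table), keyClause);
    }

    public static void main(String[] args) {
        // Hypothetical columns and key clause, for illustration only.
        List<String> cols = Arrays.asList("field1", "field2");
        System.out.println(columnDelete("ckeyspace", "tablename", cols, "\"id\" = '1'"));
        System.out.println(rowDelete("ckeyspace", "tablename", "\"id\" = '1'"));
    }
}
```

Running main prints one statement of each shape; only the second one removes the row together with its key.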
DataStax fixed this problem in 4.7. Here is the 4.0.5 code from com.datastax.bdp.search.solr.Cql3CassandraRowWriter:

    public void deleteById(SolrQueryRequest request, String key) throws IOException {
        String cqlDeleteStatement = buildDeleteStatement(key);
        doDeletes(request, Arrays.asList(new String[] { cqlDeleteStatement }));
    }

    private String buildDeleteStatement(String key) throws IOException {
        CFMetaData cfMetaData = this.columnFamilyStore.metadata;
        String compositeKeyClause = Cql3Utils.createKeyClauseFromSolrKey(cfMetaData.getKeyValidator(), cfMetaData.getCfDef(), key);
        // Collect every regular (non-primary-key) column of the table...
        List<String> columnNameArray = new ArrayList();
        for (CFDefinition.Name name : cfMetaData.getCfDef().regularColumns()) {
            columnNameArray.add("\"" + name.toString() + "\"");
        }
        // ...and delete only those columns, which leaves the row's key behind.
        String delete = "DELETE %s FROM \"%s\".\"%s\" WHERE %s";
        return String.format(delete, new Object[] { commaJoiner.join(columnNameArray), this.coreInfo.keySpace, this.coreInfo.columnFamily, compositeKeyClause });
    }

And the 4.7 code from the same class:

    private String buildDeleteStatement(String key) throws IOException {
        CFMetaData cfMetaData = this.columnFamilyStore.metadata;
        String compositeKeyClause = Cql3Utils.createKeyClauseFromSolrKey(cfMetaData, key);
        // No column list: delete the whole row.
        String delete = "DELETE FROM \"%s\".\"%s\" WHERE %s";
        return String.format(delete, new Object[] { this.coreInfo.keySpace, this.coreInfo.columnFamily, compositeKeyClause });
    }

So for now, we can either upgrade to DataStax 4.7 or change our code to use deleteByQuery instead of deleteById.
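Switching from deleteById to deleteByQuery means turning each key into a query on the id field, and a raw key containing characters such as ':' or spaces would be misread by the query parser. A minimal JDK-only Java sketch, assuming the Solr uniqueKey field is named id and maps to a single-column Cassandra key; the escaped character set is modeled on the one SolrJ's ClientUtils.escapeQueryChars handles, and the class and method names are hypothetical:

```java
public class DeleteByQueryWorkaround {
    // Characters with special meaning in the Lucene/Solr query parser.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Backslash-escape query-parser metacharacters and whitespace in a raw key.
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Escape characters that cannot appear verbatim in XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build the deleteByQuery message that replaces <delete><id>key</id></delete>.
    // Assumes the uniqueKey field is called "id" (single-column key).
    static String deleteByQueryFor(String key) {
        String query = "id:" + escapeQueryChars(key);
        return "<delete><query>" + escapeXml(query) + "</query></delete>";
    }

    public static void main(String[] args) {
        System.out.println(deleteByQueryFor("1"));
        System.out.println(deleteByQueryFor("a:b"));
    }
}
```

With SolrJ, the query string alone (without the XML wrapper) can be passed to deleteByQuery(String) in place of a deleteById call.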
Happy hacking...