DataStax 4.0.5 bug: deleteById from Solr does not delete the row from Cassandra

I found an issue in DataStax 4.0.5 today: if we delete data by ID from Solr, DataStax clears all the other fields of the row in Cassandra but leaves its ID field behind.

For example, suppose a row with ID 1 exists in both Solr and the Cassandra table, and we run this request:
http://localhost:8983/solr/ckeyspace.tablename/update?commit=true&stream.body=<delete><id>1</id></delete>

DataStax deletes the document from Solr, but in Cassandra it only clears the non-key fields (they read back with default values, null or false). The row itself, with its ID, is still there.

But if we use deleteByQuery in Solr with the query id:1, DataStax deletes the whole row from Cassandra.
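To make the two requests easy to reproduce, here is a minimal sketch that builds both update URLs. It assumes the request bodies are the standard Solr XML delete messages (`<delete><id>...</id></delete>` and `<delete><query>...</query></delete>`); the class and method names are made up for illustration, and the ID is assumed to contain no XML special characters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrDeleteUrls {
    // Update endpoint of the Solr core, taken from the example request above.
    static final String UPDATE_URL =
        "http://localhost:8983/solr/ckeyspace.tablename/update";

    // Appends the XML message as a URL-encoded stream.body parameter.
    static String updateUrl(String xmlBody) {
        try {
            return UPDATE_URL + "?commit=true&stream.body="
                + URLEncoder.encode(xmlBody, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // The request that triggers the bug in 4.0.5.
    static String deleteByIdUrl(String id) {
        return updateUrl("<delete><id>" + id + "</id></delete>");
    }

    // The request that removes the whole row.
    static String deleteByQueryUrl(String query) {
        return updateUrl("<delete><query>" + query + "</query></delete>");
    }

    public static void main(String[] args) {
        System.out.println(deleteByIdUrl("1"));
        System.out.println(deleteByQueryUrl("id:1"));
    }
}
```

Sending the first URL and then running a select on the Cassandra table should show the leftover row; sending the second should remove it.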
Hacking
To figure out why this happens, I decompiled the DataStax code with the Java decompiler JD-GUI, created an Eclipse project from it, and then changed the cassandra script in dse-4.0.5/resources/cassandra/bin to add the remote debug options:

JVM_OPTS="$JVM_OPTS -agentlib:jdwp=transport=dt_socket,address=7777,server=y,suspend=n"
exec $NUMACTL "$JAVA" $JVM_OPTS $cassandra_parms -cp "$CLASSPATH" $props "$class"

Then restart DataStax: ./dse cassandra -s -f
This starts DataStax in remote debug mode, listening on port 7777, so Eclipse can attach to it as a remote Java application.

Then add breakpoints in DirectUpdateHandler2 and CassandraDirectUpdateHandler, and run the Solr deleteById command.

Following the code, I found that in the deleteById case DataStax issues this delete command: delete field1, fieldn from table where id=1. This deletes only the listed fields, which is every field except id.

But in the deleteByQuery case, DataStax issues: delete from table where id=1, which deletes the whole row.
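The difference between the two statements is just whether a column list follows DELETE. Here is a small sketch that builds both shapes side by side; the class and method names are hypothetical and this is not the DataStax code, only an illustration of the two CQL strings described above.

```java
import java.util.Arrays;
import java.util.List;

public class DeleteShapes {
    // Column-level delete: names every regular column, so only those
    // columns are cleared. As observed above, the id is left behind.
    static String columnDelete(String table, List<String> columns, String keyClause) {
        return "DELETE " + String.join(", ", columns)
            + " FROM " + table + " WHERE " + keyClause;
    }

    // Row-level delete: no column list, so the whole row is removed.
    static String rowDelete(String table, String keyClause) {
        return "DELETE FROM " + table + " WHERE " + keyClause;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("field1", "fieldn");
        // prints: DELETE field1, fieldn FROM ckeyspace.tablename WHERE id=1
        System.out.println(columnDelete("ckeyspace.tablename", columns, "id=1"));
        // prints: DELETE FROM ckeyspace.tablename WHERE id=1
        System.out.println(rowDelete("ckeyspace.tablename", "id=1"));
    }
}
```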

The root cause of the issue is in Cql3CassandraRowWriter.buildDeleteStatement: it does something unnecessary. It gets all the regular columns from Cassandra and then builds the delete field1, ... fieldn command from them.

DataStax fixed this problem in 4.7. Compare the two versions:


4.0.5 code from com.datastax.bdp.search.solr.Cql3CassandraRowWriter:
  public void deleteById(SolrQueryRequest request, String key)
    throws IOException
  {
    String cqlDeleteStatement = buildDeleteStatement(key);
    doDeletes(request, Arrays.asList(new String[] { cqlDeleteStatement }));
  }

  private String buildDeleteStatement(String key)
    throws IOException
  {
    CFMetaData cfMetaData = this.columnFamilyStore.metadata;
    String compositeKeyClause = Cql3Utils.createKeyClauseFromSolrKey(cfMetaData.getKeyValidator(), cfMetaData.getCfDef(), key);
    
    List columnNameArray = new ArrayList();
    for (CFDefinition.Name name : cfMetaData.getCfDef().regularColumns()) {
      columnNameArray.add("\"" + name.toString() + "\"");
    }
    String delete = "DELETE %s FROM \"%s\".\"%s\" WHERE %s";
    return String.format(delete, new Object[] { commaJoiner.join(columnNameArray), this.coreInfo.keySpace, this.coreInfo.columnFamily, compositeKeyClause });
  }

4.7 code from com.datastax.bdp.search.solr.Cql3CassandraRowWriter:
  private String buildDeleteStatement(String key)
    throws IOException
  {
    CFMetaData cfMetaData = this.columnFamilyStore.metadata;
    String compositeKeyClause = Cql3Utils.createKeyClauseFromSolrKey(cfMetaData, key);
    
    String delete = "DELETE FROM \"%s\".\"%s\" WHERE %s";
    return String.format(delete, new Object[] { this.coreInfo.keySpace, this.coreInfo.columnFamily, compositeKeyClause });
  }
So now we can either upgrade to DataStax 4.7, or change our code to use deleteByQuery instead of deleteById.
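If upgrading is not an option, the client-side change could look roughly like the sketch below: turn the ID into a quoted id:"..." query and wrap it in a delete-by-query message. This is only an illustration under assumptions: the unique key field is a single field named id (a composite key would need a different query), the helper names are invented, and the escaping covers only backslashes and double quotes inside the quoted value plus the XML special characters.

```java
public class DeleteByQueryWorkaround {
    // Builds a Solr query matching one id, quoting the value so that
    // characters such as ':' or spaces are not parsed as query syntax.
    static String idQuery(String id) {
        String escaped = id.replace("\\", "\\\\").replace("\"", "\\\"");
        return "id:\"" + escaped + "\"";
    }

    // Escapes the characters that are not allowed in XML text content.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // XML update message that deletes by query instead of by id.
    static String deleteMessage(String id) {
        return "<delete><query>" + xmlEscape(idQuery(id)) + "</query></delete>";
    }

    public static void main(String[] args) {
        // prints: <delete><query>id:"1"</query></delete>
        System.out.println(deleteMessage("1"));
    }
}
```

The resulting message is sent to the same update handler as before; only the body changes.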

Happy hacking...
