Solr RssResponseWriter by Extending XMLWriter

The problem
Customers want to view Solr search results in an RSS reader, so we need to customize the Solr response to produce RSS.
There are several ways to do this in Solr:
1. Use the XSLT response writer: write an XSLT stylesheet that transforms the Solr XML response into RSS. See XsltResponseWriter:
wt=xslt&tr=example_rss.xsl
We can adapt the example_rss.xsl or example_atom.xsl that ships with Solr to match our needs.
2. Write our own Solr ResponseWriter class that writes the response in RSS format, as described in this post.
Solr ResponseWriter
Solr defines several Response Writers, such as XMLResponseWriter, XsltResponseWriter, CSVResponseWriter, etc.
TextResponseWriter is the base class for text-oriented response writers. Solr also allows us to define our own new Response Writers.
The Solution
As the RSS format is similar to the Solr XML response, we can extend XMLWriter and reuse as much of the existing code as possible.

Differences between the Solr XML and the expected RSS format
1. The overall structural difference
In Solr, the structure is: response->result->doc.
In RSS, the structure looks like this:
<rss version="2.0">
  <channel>
    <title>title here</title> <!-- channel metadata -->
    <link>link here</link> <!-- channel metadata -->
    <description>description here</description> <!-- channel metadata -->
    <item>
       <title>item1 title</title> <!-- item metadata -->
       <link>item1 link</link> <!-- item metadata -->
       <description>item1 description</description> <!-- item metadata -->
    </item>
  </channel>
</rss>
For this, we need to override the writeResponse method to change the overall structure.
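To make the target structure concrete, here is a minimal standalone sketch that wraps already-rendered item elements in an RSS 2.0 channel. It uses only the JDK; the class RssSkeleton and its methods are hypothetical names for illustration and are not part of Solr or of the writer shown later in this post.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: wraps pre-rendered <item> elements in an RSS 2.0 channel. */
final class RssSkeleton {

  /** Builds rss -> channel -> (title, link, description, item*). */
  static String wrap(String title, String link, String description, List<String> items) {
    StringBuilder sb = new StringBuilder();
    sb.append("<rss version=\"2.0\">\n");
    sb.append("  <channel>\n");
    sb.append("    <title>").append(escape(title)).append("</title>\n");
    sb.append("    <link>").append(escape(link)).append("</link>\n");
    sb.append("    <description>").append(escape(description)).append("</description>\n");
    for (String item : items) {
      // Items are assumed to be well-formed XML fragments already.
      sb.append("    ").append(item).append('\n');
    }
    sb.append("  </channel>\n");
    sb.append("</rss>\n");
    return sb.toString();
  }

  /** Escapes the characters that are not allowed in XML character data. */
  static String escape(String text) {
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  public static void main(String[] args) {
    System.out.print(wrap("solr", "http://localhost:8983/solr/rss?q=solr", "Search results",
        Arrays.asList("<item><title>doc 1</title></item>")));
  }
}
```

In the real writer the channel metadata comes from the request (for example the q parameter), and the items come from the matching documents.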
2. The element structural difference
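In Solr, each element is named after its type, and the field name goes into the name attribute:
<element-type name="id"> <!-- element-type is str, int, arr, etc. -->
</element-type>
In RSS, each element is named after the field itself:
<element-name> <!-- element-name is title, link, etc. -->
</element-name>
For this, we need to override the writeStr/writeInt/writeLong implementations.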
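The contrast between the two element styles can be sketched with two small functions. This is an illustration only, written against the JDK; ElementStyles, solrStyle and rssStyle are hypothetical names and do not exist in Solr.

```java
/** Hypothetical illustration of the two element naming styles. */
final class ElementStyles {

  /** Solr XML style: the tag is the value type, the field name is an attribute. */
  static String solrStyle(String type, String field, String value) {
    return "<" + type + " name=\"" + field + "\">" + escape(value) + "</" + type + ">";
  }

  /** RSS style: the tag is the field name itself. */
  static String rssStyle(String field, String value) {
    return "<" + field + ">" + escape(value) + "</" + field + ">";
  }

  /** Escapes the characters that are not allowed in XML character data. */
  static String escape(String text) {
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  public static void main(String[] args) {
    System.out.println(solrStyle("str", "title", "Solr & RSS"));
    System.out.println(rssStyle("title", "Solr & RSS"));
  }
}
```

The type-specific write methods of the writer therefore only need to stop emitting the type as the tag name and emit the field name instead.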
3. Field name mapping
The field names in Solr may not be the ones RSS expects; for example, we may want to map the field "url" to "link". We can define a new parameter, flmap, whose comma-separated entries line up one-to-one with the entries in fl, and set both in solrconfig.xml:
<str name="fl">title,url,id,score,physicalpath</str>

<str name="flmap">title,link,,,physicalpath</str>
In the above example, url is renamed to link, the fields id and score are ignored (their flmap entries are empty), and title and physicalpath keep their names.
Alternatively, we can pass fl and flmap as request parameters.
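The pairing of fl and flmap entries can be sketched without Solr or Guava. The class FieldNameMapping below is a hypothetical, JDK-only illustration of the idea: both lists must have the same number of entries, and an empty flmap entry means the field gets no mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: pairs each fl entry with the flmap entry at the same position. */
final class FieldNameMapping {

  static Map<String, String> parse(String fl, String flmap) {
    // A limit of -1 keeps empty entries, including trailing ones.
    String[] oldNames = fl.split(",", -1);
    String[] newNames = flmap.split(",", -1);
    if (oldNames.length != newNames.length) {
      throw new IllegalArgumentException("fl has " + oldNames.length
          + " entries but flmap has " + newNames.length);
    }
    Map<String, String> mapping = new LinkedHashMap<String, String>();
    for (int i = 0; i < oldNames.length; i++) {
      String newName = newNames[i].trim();
      if (!newName.isEmpty()) {
        // Blank entries are skipped, so those fields are left out of the mapping.
        mapping.put(oldNames[i].trim(), newName);
      }
    }
    return mapping;
  }

  public static void main(String[] args) {
    System.out.println(parse("title,url,id,score,physicalpath", "title,link,,,physicalpath"));
  }
}
```

Note that five fields in fl need exactly four commas in flmap; one extra comma adds a sixth, empty entry and the size check rejects the request.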

RssResponseWriter Implementation
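import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// The Solr, Lucene and commons-lang package names below assume a Solr 4.x classpath;
// adjust them for your version.
import org.apache.commons.lang.StringUtils;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.XML;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.ResultContext;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.response.XMLWriter;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.ReturnFields;
import org.apache.solr.search.SolrIndexSearcher;

import com.google.common.base.CharMatcher;
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;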

public class RssWriter extends XMLWriter {
  private static final Splitter split = Splitter.on(CharMatcher.anyOf(","))
      .trimResults();
  private static final char[] XML_START1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      .toCharArray();
  private Map<String,String> oldToNewFLMapping = new HashMap<String,String>();
  private String baseURL;
  
  public RssWriter(Writer writer, SolrQueryRequest req, SolrQueryResponse rsp)
      throws IOException {
    super(writer, req, rsp);
    SolrParams solrParams = req.getParams();
    String fl = solrParams.get("fl");
    
    
    String flmap = solrParams.get("flmap");
    if (fl == null || flmap == null) {
      throw new IOException("missing required parameter: fl or flmap");
    }
    
    ArrayList<String> oldFLs = Lists.newArrayList(split.split(fl));
    ArrayList<String> newFLs = Lists.newArrayList(split.split(flmap));
    if (oldFLs.size() != newFLs.size()) {
      throw new IOException("fl and flmap must list the same number of fields");
    }
    
    Iterator<String> oldIt = oldFLs.iterator(), newIt = newFLs.iterator();
    while (newIt.hasNext()) {
      String oldFl = oldIt.next();
      String newFl = newIt.next();
      if (!StringUtils.isBlank(newFl)) {
        oldToNewFLMapping.put(oldFl, newFl);
      }
    }
    getBaseUrl(req);
    
  }
  @Override
  public void writeResponse() throws IOException {
    writer.write(XML_START1);
    writer.write("<rss version=\"2.0\">");
    writer.write("<channel>");
    String qstr = req.getParams().get(CommonParams.Q);
    writeVal("title", qstr);
    String fullUrl = req.getContext().get("fullUrl").toString();
    writeCdata("link", fullUrl);
    writeVal("copyright", "Copyright ......");
    
    NamedList<?> lst = rsp.getValues();
    Object obj = lst.get("response");
    DocList docList = null;
    if (obj instanceof ResultContext) {
      ResultContext context = (ResultContext) obj;
      docList = context.docs;
    } else if (obj instanceof DocList) {
      docList = (DocList) obj;
    } else {
      throw new RuntimeException("Unknown type: " + (obj == null ? "null" : obj.getClass().getName()));
    }
    writeVal("numFound", Integer.toString(docList.matches()));
    writeVal("start", Integer.toString(docList.offset()));
    writeVal("maxScore", Float.toString(docList.maxScore()));
    
    Set<String> fields = new HashSet<String>(oldToNewFLMapping.keySet());
    SolrIndexSearcher searcher = req.getSearcher();
    DocIterator iterator = docList.iterator();
    int sz = docList.size();
    for (int i = 0; i < sz; i++) {
      int id = iterator.nextDoc();
      Document doc = searcher.doc(id, fields);
      writeVal("item", doc);
    }
    writer.write("\n</channel>");
    writer.write("\n</rss>");
  } 
  @Override
  public void writeSolrDocument(String name, SolrDocument doc,
      ReturnFields returnFields, int idx) throws IOException {
    startTag("item", false);
    incLevel();
    boolean hasLink = false;
    
    Set<String> oldFLs = oldToNewFLMapping.keySet();
    for (String oldFL : returnFields.getLuceneFieldNames()) {
      String newName = oldFL;
      if (oldFLs.contains(oldFL)) {
        newName = oldToNewFLMapping.get(oldFL);
      }
      Object val = doc.getFieldValue(oldFL);
      writeVal(newName, val);
      if ("link".equalsIgnoreCase(newName)) {
        hasLink = true;
      }
    }
    if (!hasLink) {
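      // No field was mapped to "link": fall back to a link built from the unique key.
      // getUniqueKeyField() returns null when the schema defines no unique key.
      String uniqueKey = schema.getUniqueKeyField() == null
          ? null : schema.getUniqueKeyField().getName();
      String uniqueKeyValue = "";
      if (uniqueKey != null) {
        Object obj = doc.getFieldValue(uniqueKey);
        if (obj instanceof Field) {
          Field field = (Field) obj;
          uniqueKeyValue = field.stringValue();
        } else if (obj != null) {
          uniqueKeyValue = obj.toString();
        }
      }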
      writeCdata("link", baseURL + "viewsourceservlet?docid=" + uniqueKeyValue);
    }
    decLevel();
    if (doIndent) indent();
    writer.write("</item>");
  }
  @Override
  public void writeArray(String name, Iterator iter) throws IOException {
    if (iter.hasNext()) {
      incLevel();
      while (iter.hasNext()) {
        writeVal(name, iter.next());
      }
      decLevel();
    } else {
      startTag(name, true);
    }
  }
  @Override
  public void writeStr(String name, String val, boolean escape)
      throws IOException {
    writePrim(name, val, escape);
  }
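  public void writeCdata(String tag, String val) throws IOException {
    writer.write("<" + tag + ">");
    // A literal "]]>" in the value would end the CDATA section early,
    // so split it across two adjacent CDATA sections.
    writer.write("<![CDATA[" + val.replace("]]>", "]]]]><![CDATA[>") + "]]>");
    writer.write("</" + tag + ">");
  }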
  private void writePrim(String name, String val, boolean escape)
      throws IOException {
    int contentLen = val == null ? 0 : val.length();
    
    startTag(name, contentLen == 0);
    if (contentLen == 0) return;
    
    if (escape) {
      XML.escapeCharData(val, writer);
    } else {
      writer.write(val, 0, contentLen);
    }
    writer.write('<');
    writer.write('/');
    writer.write(name);
    writer.write('>');
  }  
  void startTag(String name, boolean closeTag) throws IOException {
    if (doIndent) indent();
    
    writer.write('<');
    writer.write(name);
    if (closeTag) {
      writer.write("/>");
    } else {
      writer.write('>');
    }
  }
  public void getBaseUrl(SolrQueryRequest req) {
    String url = req.getContext().get("url").toString();
    int i = 0;
    int j = 0;
    for (j = 0; j < url.length() && i < 3; ++j) {
      if (url.charAt(j) == '/') {
        ++i;
      }
    }
    baseURL = url.substring(0, j);
  }
  
  @Override
  public void writeNull(String name) throws IOException {
    writePrim(name, "", false);
  }
  
  @Override
  public void writeInt(String name, String val) throws IOException {
    writePrim(name, val, false);
  }
  
  @Override
  public void writeLong(String name, String val) throws IOException {
    writePrim(name, val, false);
  }
  
  @Override
  public void writeBool(String name, String val) throws IOException {
    writePrim(name, val, false);
  }
  
  @Override
  public void writeFloat(String name, String val) throws IOException {
    writePrim(name, val, false);
  }
  
  @Override
  public void writeDouble(String name, String val) throws IOException {
    writePrim(name, val, false);
  }
  
  @Override
  public void writeDate(String name, String val) throws IOException {
    writePrim(name, val, false);
  }
}
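The base URL logic in getBaseUrl keeps everything up to and including the third slash of the request URL, that is, the scheme, host and port. The standalone sketch below isolates that loop so it can be tried on its own; BaseUrlSketch is a hypothetical name, and the sample URLs are assumptions for illustration.

```java
/** Hypothetical standalone version of the third-slash scan used by getBaseUrl. */
final class BaseUrlSketch {

  /** Returns the prefix of url up to and including its third '/', or url itself if it has fewer. */
  static String baseOf(String url) {
    int slashes = 0;
    int end = 0;
    for (end = 0; end < url.length() && slashes < 3; end++) {
      if (url.charAt(end) == '/') {
        slashes++;
      }
    }
    // end now points just past the third slash, or at url.length() if none was found.
    return url.substring(0, end);
  }

  public static void main(String[] args) {
    System.out.println(baseOf("http://localhost:8983/solr/rss"));
  }
}
```

When the URL has fewer than three slashes, the result has no trailing slash, so a link built by plain concatenation would be malformed; callers may want to handle that case explicitly.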
RSSResponseWriter
public class RSSResponseWriter implements QueryResponseWriter {
  public void write(Writer writer, SolrQueryRequest req, SolrQueryResponse rsp)
      throws IOException {
    RssWriter rssWriter = new RssWriter(writer, req, rsp);
    try {
      rssWriter.writeResponse();
    } finally {
      rssWriter.close();
    }
  }
  public String getContentType(SolrQueryRequest request,
      SolrQueryResponse response) {
    return CONTENT_TYPE_XML_UTF8;
  }
  public void init(NamedList args) {}
}
Configuration
<requestHandler name="/rss" class="solr.SearchHandler">
 <lst name="defaults">
  <str name="rows">10</str>
  <str name="wt">rss</str>
  <!--default mapping-->
  <str name="fl">title,url,id,score,physicalpath</str>
  <str name="flmap">title,link,,,physicalpath</str>
 </lst>
</requestHandler>
<queryResponseWriter name="rss" class="org.apache.solr.response.RSSResponseWriter"/>
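With this configuration in place, a feed reader only needs the handler URL plus a query, and fl and flmap can override the defaults per request. The sketch below builds such a URL with the JDK's URLEncoder. The base address http://localhost:8983/solr and the class RssRequestUrl are assumptions for illustration; the actual address depends on your deployment and core name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds a request URL for the /rss handler. */
final class RssRequestUrl {

  /** solrBase is the core's base URL without a trailing slash (an assumption of this sketch). */
  static String build(String solrBase, String query, String fl, String flmap) {
    return solrBase + "/rss?q=" + encode(query)
        + "&fl=" + encode(fl)
        + "&flmap=" + encode(flmap);
  }

  /** URL-encodes one parameter value as UTF-8. */
  static String encode(String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is required to be supported by every Java platform.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(build("http://localhost:8983/solr", "solr search",
        "title,url,id,score,physicalpath", "title,link,,,physicalpath"));
  }
}
```

Because wt=rss is a default of the /rss handler, the wt parameter does not need to be passed in the URL.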
Resources
QueryResponseWriter
Solr Search Result (attribute-to-tag) customization using XsltResponseWriter
RSS 2.0 Specification
RSS Tutorial