Solr: Use DocTransformer to dynamically Generate groupCount and time value for group doc


Summary
Use DocTransformer to dynamically generate the groupCount and time values for the group doc (type:1) efficiently: no need to run a Solr query for each group doc (almost).

The Use Case
There are two types of docs in Solr: one is the child doc, with fields such as type (value 0), groupId, time, etc.;
the other is the group doc: type (value 1). Group docs are actually just fake placeholder docs.

We use a join query with includeParent=true and grouping (group.main=true&group.sort=map(type,1,1,-1) asc) to make sure groups are sorted by time (the max value in the group) and the group doc is always in front of all its child docs.

But Solr doesn't return groupCount in flat mode: in grouped mode, Solr can return groupCount in the group header, but there is no such thing in flat mode.
So we have to dynamically generate the groupCount and time values for each group (type=1) doc.


The last step is to actually generate the groupCount and time values dynamically for the group doc (type:1).

The Solution
After we bump into a group doc, all we need to do is count how many child docs follow it (++lastGroupCount) until we bump into the next group doc:
we update groupCount when we iterate over the last doc in the group,
and we update the time field of the group doc when we iterate over the first doc in the group.

If we don't bump into another group doc before the end of the result list, we need to run a query to get the group count, as the accumulated lastGroupCount would be incomplete.

Updating the time value of the group doc is easy: when we hit its first child doc, we copy that child's time onto the group doc. Note the boundary condition: for the last group doc we have to run a query instead. A sketch of the transformer (the SolrUtil helper and the runQueryTo... methods are assumed to be implemented elsewhere):

import java.io.IOException;
import java.util.Date;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.DateUtil;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformContext;
import org.apache.solr.response.transform.TransformerFactory;

public class UpdateGroupDocTransformerFactory extends TransformerFactory {
  public DocTransformer create(String field, SolrParams params,
      SolrQueryRequest req) {
    return new UpdateGroupDocTransformer(req, params);
  }

  /**
   * See org.apache.solr.search.SolrReturnFields.parseFieldList(String[],
   * SolrQueryRequest): DocTransformers augmenters = new DocTransformers();
   * a new DocTransformer is created per request, so this is thread safe.
   */
  private static class UpdateGroupDocTransformer extends DocTransformer {
    private SolrQueryRequest req;
    private SolrDocument lastGroupDoc = null;
    private int lastGroupCount = 0;
    private TransformContext transContext;

    public UpdateGroupDocTransformer(SolrQueryRequest req, SolrParams params) {
      this.req = req;
    }

    @Override
    public void transform(SolrDocument doc, int docid) throws IOException {
      String type = SolrUtil.getFieldValue(doc, "type");
      if ("1".equals(type)) {
        // a new group starts: flush the accumulated count into the previous group doc
        if (lastGroupDoc != null) {
          lastGroupDoc.setField("[groupCount]", lastGroupCount);
        }
        lastGroupDoc = doc;
        lastGroupCount = 0;

        if (!transContext.iterator.hasNext()) {
          // this group doc is the last doc: run a query to get its values
          runQueryToGetGroupCountAndTimeField(doc);
        }
      } else if (lastGroupDoc != null) {
        if (lastGroupCount == 0) {
          // the first child doc in this group: copy its time onto the group doc
          lastGroupDoc.setField(
              "time",
              DateUtil.getThreadLocalDateFormat().format(
                  new Date(Long.parseLong(SolrUtil.getFieldValue(doc, "time")))));
        }
        if (!transContext.iterator.hasNext()) {
          // this is the last doc: lastGroupCount may be incomplete for
          // lastGroupDoc, so run a query to get the group count
          runQueryToGetGroupCount(lastGroupDoc);
        } else {
          ++lastGroupCount;
        }
      }
      // else lastGroupDoc == null and this is a normal doc: nothing to do
    }

    @Override
    public void setContext(TransformContext context) {
      this.transContext = context;
    }

    @Override
    public String getName() {
      return "UpdateGroupDocTransformer";
    }
  }
}
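To wire the transformer up, register the factory in solrconfig.xml and request the pseudo field in fl. A minimal sketch (the package name is an assumption):
<transformer name="groupCount" class="com.example.UpdateGroupDocTransformerFactory"/>
Then request it with fl=*,[groupCount].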
Resources
Solr Join: Return Parent and Child Documents
Use Solr map function query(group.sort=map(type,1,1,-1) ) in group flat mode
Solr: Update other Document in DocTransformer by Writing custom SolrWriter

Solr: Update other Document in DocTransformer by Writing custom SolrWriter


Summary
Write our own XMLWriter so we can update another SolrDocument, or even delete the current document, from a DocTransformer.

The Use Case
There are two types of docs in Solr: one is the child doc, with fields such as type (value 0), groupId, time, etc.;
the other is the group doc: type (value 1). Group docs are actually just fake placeholder docs.

We use a join query with includeParent=true and make sure groups are sorted by time (the max value in the group) and the group doc is always in front of all its child docs.

But Solr doesn't return groupCount in flat mode: in grouped mode, Solr can return groupCount in the group header, but there is no such thing in flat mode.

So we have to dynamically generate the groupCount and time values for each group (type=1) doc.

I tried several solutions:
In the DocTransformer, when the current doc is a group doc (type=1), run a query to get the number of docs in this group (pseudocode: the base query combined with the groupId term query):
SolrPluginUtils.numDocs(req.getSearcher(), baseQuery + new TermQuery(new Term("groupId", groupId)), null);

Later I optimized it by pre-computing a baseDocSet that matches q and fq:
DocSet baseDocSet = req.getSearcher().getDocSet(baseQuery);
int groupCount = req.getSearcher().getDocSet(new TermQuery(new Term("groupId", groupId)), baseDocSet).size();

The Solution
But none of these seemed good to me: as all child (type=0) docs follow their group doc (type=1), there should be no need to run a Solr query at all: we can easily calculate the groupCount and the time value while iterating.

But the problem here is that we can only change the current SolrDocument in a DocTransformer; see org.apache.solr.response.TextResponseWriter.writeDocuments(String, ResultContext, ReturnFields):
for (int i = 0; i < sz; i++) {
  if (transformer != null) {
    transformer.transform(sdoc, id);
  }
  // the writer immediately writes the doc to the output stream
  writeSolrDocument(null, sdoc, returnFields, i);
}
writeEndDocumentList();

One way is to change Solr's code directly to support this. The code above could be changed like below (cacheMode is a new request parameter we introduce):
boolean cacheMode = req.getParams().getBool("cacheMode", false);
SolrDocument[] cachedDocs = new SolrDocument[sz];
for (int i = 0; i < sz; i++) {
  SolrDocument sdoc = toSolrDocument(doc);
  if (transformer != null) {
    transformer.transform(sdoc, id);
  }
  if (cacheMode) {
    // defer writing: later transform() calls may still update this doc
    cachedDocs[i] = sdoc;
  } else {
    writeSolrDocument(null, sdoc, returnFields, i);
  }
}
if (transformer != null) {
  transformer.setContext(null);
}
if (cacheMode) {
  for (int i = 0; i < sz; i++) {
    writeSolrDocument(null, cachedDocs[i], returnFields, i);
  }
}
writeEndDocumentList();


Or we can write our own Writer, so we don't have to change Solr's code at all.

Custom Solr Writer: CachedXMLWriter
The implementation is simple: we just cache each SolrDocument in writeSolrDocument and write them all out in writeEndDocumentList.
We can also allow a DocTransformer to delete a doc: we add one special field, "_del_"; if this field is set, we do not write the doc to the output stream.
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.response.XMLWriter;
import org.apache.solr.search.ReturnFields;

public class CachedXMLWriter extends XMLWriter {
  static class SolrDocumentHolder {
    SolrDocument doc;
    String name;
    int idx;
  }

  private List<SolrDocumentHolder> holders = new ArrayList<SolrDocumentHolder>();

  public CachedXMLWriter(Writer writer, SolrQueryRequest req, SolrQueryResponse rsp) {
    super(writer, req, rsp);
  }

  @Override
  public void writeSolrDocument(String name, SolrDocument doc,
      ReturnFields returnFields, int idx) throws IOException {
    // cache the doc instead of writing it; skip docs marked with _del_
    Object del = doc.getFieldValue("_del_");
    if (del == null) {
      SolrDocumentHolder holder = new SolrDocumentHolder();
      holder.doc = doc;
      holder.name = name;
      holder.idx = idx;
      holders.add(holder);
    }
  }

  @Override
  public void writeEndDocumentList() throws IOException {
    // all transform() calls have run by now, so the docs are final
    for (SolrDocumentHolder holder : holders) {
      super.writeSolrDocument(holder.name, holder.doc, returnFields, holder.idx);
    }
    super.writeEndDocumentList();
  }
}
CachedXMLResponseWriter
Here is the companion class:
import java.io.IOException;
import java.io.Writer;

import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.QueryResponseWriter;
import org.apache.solr.response.SolrQueryResponse;

public class CachedXMLResponseWriter implements QueryResponseWriter {
  @Override
  public void init(NamedList args) {}

  @Override
  public void write(Writer writer, SolrQueryRequest req, SolrQueryResponse rsp)
      throws IOException {
    CachedXMLWriter w = new CachedXMLWriter(writer, req, rsp);
    try {
      w.writeResponse();
    } finally {
      w.close();
    }
  }

  @Override
  public String getContentType(SolrQueryRequest request,
      SolrQueryResponse response) {
    return CONTENT_TYPE_XML_UTF8;
  }
}
Finally, declare the writer in solrconfig.xml:
<queryResponseWriter name="cachexml" class="solr.CachedXMLResponseWriter" startup="lazy"/>
Now we can use it: wt=cachexml&fl=f1,[groupCount] 

To hide the implementation from the client side, we can encapsulate the logic in our request handler: set wt=cachexml whenever the [groupCount] transformer is requested.
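With CachedXMLWriter in place, a DocTransformer can drop the current document just by setting the marker field; a minimal sketch (shouldHide is a hypothetical predicate):
public void transform(SolrDocument doc, int docid) {
  if (shouldHide(doc)) {          // hypothetical business rule
    doc.setField("_del_", true);  // CachedXMLWriter will skip this doc
  }
}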

Miscs
Transforming Result Documents:
[value] - ValueAugmenterFactory, e.g. greeting:[value v='hello'] or fl=id,my_number:[value v=42 t=int],my_string:[value v=42]
newname:oldname - RenameFieldTransformer
[child] - ChildDocTransformerFactory
[shard] - ShardAugmenterFactory
public abstract class TransformerWithContext extends DocTransformer

[explain] doesn't work with grouping: when the grouped result is written, the ResultContext's query is left null, so there is nothing for [explain] to explain:
if (grouping.mainResult != null) {
  ResultContext ctx = new ResultContext();
  ctx.query = null; // TODO? add the query?
}

Resources
Solr Join: Return Parent and Child Documents
Use Solr map function query(group.sort=map(type,1,1,-1) ) in group flat mode
Solr: Use DocTransformer to dynamically Generate groupCount and time value for group doc
SOLR-7097: Update other Document in DocTransformer

Use Solr map function query(group.sort=map(type,1,1,-1) ) in group flat mode


Summary
How to use the Solr map function to put the fake group doc in front of all child docs in group flat mode: group.sort=map(type,1,1,-1) asc,time desc
Updated
Actually the solution is much simpler than we thought:

sort=type asc,time desc&group.sort=type desc,time desc
sort=type asc,time asc&group.sort=type desc,time asc

But it's still good to know how function queries work in Solr, and how Solr grouping and function queries work together.
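For example, a full request using the simpler approach might look like this (a sketch: the query string is a placeholder, and the grouping parameters follow the use case described below):
http://localhost:8983/solr/select?q={!join from=groupId to=id includeParent=true}some query here&group=true&group.field=groupId&group.main=true&group.limit=100&sort=type asc,time desc&group.sort=type desc,time desc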

The Use Case
There are two types of docs in Solr: one is the child doc, with fields such as type (value 0), groupId, time, etc.;
the other is the group doc: type (value 1). Group docs are actually just fake placeholder docs.

We extend the join query to make Solr return both parent and child docs; see Solr Join: Return Parent and Child Documents for how to implement it.

Then we use Solr grouping, group.main=true&group.limit=100, and we want Solr to return a response like below:
<doc>
  <str name="id">group1</str> <!-- group doc -->
  <int name="type">1</int>
  <str name="subject">subject1</str>
  <!-- this field should be dynamically generated, as the child docs that match q and fq may vary;
  the value should be the same as the first child's -->
  <str name="time">2015-01-06T14:45:00.000Z</str> 
  <!-- how many child docs that match q and fq, dynamically generated -->
  <int name="[groupCount]">3</int>
</doc>
<doc>
  <date name="time">2015-01-06T14:45:00Z</date>
  <str name="subject">subject1</str>
  <str name="id">child1</str>
  <int name="type">0</int>
  <str name="groupId">group1</str>
</doc>
<!-- .... other child docs in this group -->
<doc>
  <str name="id">group2</str> <!-- another group -->
  <int name="type">1</int>
  <str name="subject">subject2</str>
  <str name="time">2015-01-05T14:45:00.000Z</str>
  <int name="[groupCount]">7</int>
</doc>
Then we can use start and rows to do pagination.

We will talk about how to dynamically generate the groupCount and time values for the group doc in a later post; this post focuses on one issue:
how to make sure groups are sorted by time (the max value in the group) and the group doc is always in front of all its child docs.

The Solution
We tried several solutions, but at last we found out the answer is actually quite easy.
As there is no time value in the group doc, it is not taken into account when calculating the group's sort value; Solr uses the max value among the child docs.
All we need to do is make the group doc come before all its child docs.

We can use the Solr map function in group.sort:
group.sort=map(type,1,1,-1) asc,time desc
If type is 1 (a group doc), its sort value is mapped to -1; sorting ascending on that value means the group (type=1) doc is always the first one in the group.

http://localhost:8983/solr/select?defType=edismax&q={!join from=groupId to=id includeParent=true}some query here&group.main=true&group.limit=100&group.sort=map(type,1,1,-1) asc,time desc&sort=time desc,score 

To hide the implementation from the client side, or to avoid client-side changes, we can encapsulate the logic in our request handler.

Resources
Solr function Queries
Solr Join: Return Parent and Child Documents
Use Solr map function query(group.sort=map(type,1,1,-1) ) in group flat mode
Solr: Update other Document in DocTransformer by Writing custom SolrWriter
Solr: Use DocTransformer to dynamically Generate groupCount and time value for group doc

Learning from Solr Jira issues - Part 1


One good way to improve our coding/design skills and knowledge is to learn from open source code.
I use Lucene/Solr and extend them with new features, so it's my task and duty to learn deeply how they work and how to write efficient code.

It's also important to learn how Lucene/Solr evolves: why these features are needed, how they are implemented, and also how some bugs are introduced, found, and fixed.

This post focuses on bugs and minor features in Solr; I will update it once in a while.

Coding
Performance
Add Long/FixedBitSet and replace usage of OpenBitSet
https://issues.apache.org/jira/browse/LUCENE-5440
http://lucene.markmail.org/thread/35gw3amo53dsqsqj
==> With lots of data and computation, squeeze out every unneeded operation.
FixedBitSet is clearly faster than OpenBitSet (perhaps unless you use fastSet/fastGet), since it doesn't need to do bounds checking.
Also, FixedBitSet lets you grow the set by offering a convenient copy constructor to expand/shrink it.

SOLR-7050: realtime get should internally load only fields specified in fl [Performance]
https://issues.apache.org/jira/browse/SOLR-7050
==> Only load the needed fields when calling searcher.doc:
StoredDocument luceneDocument = searcher.doc(docid);
changed to:
StoredDocument luceneDocument = searcher.doc(docid, rsp.getReturnFields().getLuceneFieldNames());

SOLR-6845: Suggester tests start new cores instead of reloading
https://issues.apache.org/jira/browse/SOLR-6845
LOG.info("reload(" + name + ")"); // better logging
Adds a buildOnStartup option ==> don't let unnecessary, time-consuming work (building the suggester) block startup.
// In Solr we can lazy-load cores; in a web application, don't put time-consuming operations in a listener: make them asynchronous, and the servlet can return an "INIT not finished" error while initialization is still running.
In init:
else if (getStoreFile().exists()) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("attempt reload of the stored lookup from file " + getStoreFile());
  }
}

Resource Tracking
SOLR-6950: Ensure TransactionLogs are closed with test ObjectReleaseTracker.
assert ObjectReleaseTracker.track(this);
assert ObjectReleaseTracker.release(this);
// run integration tests with assertions enabled
// use ObjectReleaseTracker to make sure resources are closed and released

Http Client
SOLR-6931: We should do a limited retry when using HttpClient.
// always call setUseRetry, whether it is in config or not
 HttpClientUtil.setUseRetry(httpClient, config.getBool(HttpClientUtil.PROP_USE_RETRY, true));

SOLR-6932: All HttpClient ConnectionManagers and SolrJ clients should always be shut down in tests and regular code.
Change HttpClient to CloseableHttpClient.
All of these types of things should be made Closeable - including SolrJ clients for 5.0 (rather than shutdown).

SOLR-6324: Set finite default timeouts for select and update.
Previously, HttpShardHandlerFactory and UpdateShardHandler defaulted to infinite timeouts for socket connection and read. This can lead to undesirable behaviour: for example, if a machine crashes, searches in progress will wait forever for a result to come back, tying up threads that only get terminated at shutdown.
clientParams.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, connectionTimeout);
this.defaultClient = HttpClientUtil.createClient(clientParams);
Set socketTimeout and connTimeout for shardHandlerFactory in solr.xml.
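For example, a sketch of the corresponding solr.xml entry (the timeout values are illustrative, in milliseconds):
<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
  <int name="socketTimeout">600000</int>
  <int name="connTimeout">60000</int>
</shardHandlerFactory>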

Miscs
SOLR-6909: Extract atomic update handling logic into AtomicUpdateDocumentMerger
Allow pluggable atomic update merging logic
util method: int docid = searcher.getFirstMatch(new Term(idField.getName(), idBytes));

SOLR-6643: Fix error reporting & logging of low level JVM Errors that occur when loading/reloading a SolrCore
A great example of how to reproduce a problem and add test cases for it.
CoreContainerCoreInitFailuresTest.testJavaLangErrorFromHandlerOnStartup

SOLR-4839: Upgrade to Jetty 9
set persistTempDirectory to true
Jetty 9 has builtin support for disabling protocols (POODLE)
excludeProtocols: SSLv3

https://issues.apache.org/jira/browse/SOLR-7059
Using paramset with multi-valued keys leads to a 500
Incomplete change: the map in MapSolrParams actually changed from Map<String,String> to Map<String,Object>.

https://issues.apache.org/jira/browse/SOLR-7046
NullPointerException when group.function uses query() function
Map context = ValueSource.newContext(searcher);
The bug: this declares a new local variable, so the context field used elsewhere stays null, yet it gets passed on to another function later.


API Design
https://issues.apache.org/jira/browse/SOLR-6954
Consider changing SolrClient#shutdown to SolrClient#close.
SolrClient implements Serializable, Closeable
==> so clients can use try-with-resources to avoid resource leaks.

https://issues.apache.org/jira/browse/SOLR-6449
Add first class support for Real Time Get in Solrj
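A sketch combining the two features (the URL and id are illustrative; getById is the SolrJ real-time-get helper added by SOLR-6449):
public static void main(String[] args) throws Exception {
  // try-with-resources: close() releases the underlying connection manager
  try (SolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1")) {
    SolrDocument doc = client.getById("doc1"); // real time get by unique key
    System.out.println(doc);
  }
}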

Bash and Bat
Learn from solr.cmd and solr (.sh)
SOLR-6928: solr.cmd stop works only in English
Consider different locales when writing bash/bat scripts:
change
  For /f "tokens=5" %%j in ('netstat -aon ^| find /i "listening" ^| find ":%SOLR_PORT%"') do (
to
  For /f "tokens=5" %%j in ('netstat -aon ^| find "TCP " ^| find ":%SOLR_PORT%"') do (
One related edit is that the find command should look for ":8983 " (with a space after the port number) to avoid matching other ports; e.g. the following stop command would select two lines in the netstat output, since :1234 also matches :12345:
solr start -p 1234
solr start -p 12345
solr stop -p 1234

SOLR-7016: Fix bin\solr.cmd to work in a directory with spaces in the name.
Add quotes: "%SOLR_TIP%\bin"

SOLR-7013: use unzip if jar is not available (merged from r1653943)
Solr only requires a JRE, not a JDK, and the JRE doesn't include jar.
Try your best to make the app work anyway.

SOLR-7024: improve java detection and error message
Unclear error message from the solr script when the jar executable is missing
https://issues.apache.org/jira/browse/SOLR-7013

Eclipse tricks


The Javadoc View font size is too small and should be bigger.
The default font size for the Javadoc View in Eclipse is 8, which is too small on a high-resolution screen.
To change it, navigate to Window -> Preferences -> General -> Appearance -> Colors and Fonts -> Java -> Javadoc display font, and click Edit. We can change the default size to 12, or change to any font you prefer.


Solr: Using docid within the same Searcher to boost performance


We all know that the docid in Lucene/Solr is volatile; it may change when we remove some docs and Solr merges segments.

For example:
We add 3 docs: doc0, doc1, doc2
http://localhost:12345/solr/update?stream.body=<add><doc><field name="id">doc0</field></doc><doc><field name="id">doc1</field></doc><doc><field name="id">doc2</field></doc></add>&commit=true
Their docids would be: doc0:0, doc1:1, doc2:2

Then we delete doc0 and commit with expungeDeletes=true (deletes are also merged away when segments are merged):
http://localhost:12345/solr/update?stream.body=<delete><query>id:doc0</query></delete>&commit=true&expungeDeletes=true

Now their docids would have changed: doc1:0, doc2:1

But in the following request handler, will the docid change between the two queries?
import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.search.SolrIndexSearcher;

public class TestDocIdHandler extends RequestHandlerBase {
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
      throws Exception {
    int docid = getLookupDocId(req.getSearcher(), "doc1");
    // pause here (e.g. in a debugger) and delete doc0:
    // http://localhost:12345/solr/update?stream.body=<delete><query>id:doc0</query></delete>&commit=true&expungeDeletes=true
    // then check whether the docid has changed
    int newdocid = getLookupDocId(req.getSearcher(), "doc1");

    System.out.println(docid == newdocid);
  }

  private int getLookupDocId(SolrIndexSearcher searcher, String lookup)
      throws IOException {
    TermQuery tq = new TermQuery(new Term("id", lookup));
    TopDocs hits = searcher.search(tq, 1);
    ScoreDoc[] docs = hits.scoreDocs;
    if (docs.length == 1) {
      return docs[0].doc;
    }
    return -1; // not found
  }

  public String getDescription() {
    return "Demonstrates docid stability within one searcher";
  }

  public String getSource() {
    return null;
  }
}

The answer is no:
the docid stays the same, because we are querying with the same SolrIndexSearcher: a SolrIndexSearcher holds a snapshot of the index (data) at a specific point in time, and it will not reflect changes (adds, deletes, etc.) until it is reopened.
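Note that inside a request handler, req.getSearcher() always returns the one searcher bound to the request, which is why the handler above prints true. If we need to hold a searcher across operations ourselves, a sketch using Solr's RefCounted API (core is a SolrCore reference):
RefCounted<SolrIndexSearcher> ref = core.getSearcher();
try {
  SolrIndexSearcher searcher = ref.get();
  // docids resolved through this searcher stay stable, no matter
  // what updates are committed in the meantime
} finally {
  ref.decref(); // always release the reference so the searcher can be closed
}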

Below, we demonstrate how we can use this behaviour in our code.

Practical Example: Use docid to boost performance
The Use Case:
Given some query (q, fq, maybe with join or grouping), we want to know the position of one doc, given its id.

We can first get the docid of this document, then run the query (SolrIndexSearcher.search().scoreDocs)
and iterate over the docids until we find it:
public class GetDocPositionReqHandler extends RequestHandlerBase {
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
      throws Exception {
    SolrParams params = req.getParams();
    String lookup = Preconditions.checkNotNull(params.get("lookup"));
    
    SolrIndexSearcher searcher = req.getSearcher();
    int lookupId = getLookupDocId(searcher, lookup);
    
    if (lookupId != -1) {
      boolean isGroup = params.getBool(GroupParams.GROUP, false);
      if (!isGroup) {
        nonGroupImpl(req, rsp, lookupId);
      } else {
        groupImpl(req, rsp, params, lookupId);
      }
    }
  }
    
  private void nonGroupImpl(SolrQueryRequest req, SolrQueryResponse rsp,
      int lookupId) throws SyntaxError, IOException {
    int lookupPos = -1;
    ScoreDoc[] docs = runReqQuery(req);
    int newPos = 0;
    for (ScoreDoc doc : docs) {
      newPos++;
      if (doc.doc == lookupId) {
        lookupPos = newPos;
        break;
      }
    }
    
    rsp.add("newPos", lookupPos);
  }
  private void groupImpl(SolrQueryRequest req, SolrQueryResponse rsp,
      SolrParams params, int lookupId) throws SyntaxError, IOException {
    ScoreDoc[] docs = runReqQuery(req);
    // split the results into groups
    // in our case, group.field is a string field and group.sort uses a long field
    Map<String,Set<Integer>> groupMap = new LinkedHashMap<String,Set<Integer>>();
    String lookupGroup = null;
    
    String groupField = Objects.requireNonNull(
        params.get(GroupParams.GROUP_FIELD),
        "No group field in the request string.");
    BinaryDocValues groupCache = FieldCache.DEFAULT.getTerms(req.getSearcher()
        .getAtomicReader(), groupField);
    for (ScoreDoc doc : docs) {
      int docid = doc.doc;
      BytesRef result = new BytesRef();
      groupCache.get(docid, result);
      String groupValue = result.utf8ToString();
      Set<Integer> groupItems = groupMap.get(groupValue);
      if (groupItems == null) {
        groupItems = new LinkedHashSet<Integer>();
        groupMap.put(groupValue, groupItems);
      }
      groupItems.add(docid);
      if (doc.doc == lookupId) {
        lookupGroup = groupValue;
      }
    }
    int lookupPos = -1;
    if (lookupGroup != null) {
      // then iterate the map to get the position
      int newPos = 0;
      Iterator<Entry<String,Set<Integer>>> it = groupMap.entrySet().iterator();
      
      outer: while (it.hasNext()) {
        Entry<String,Set<Integer>> entry = it.next();
        String groupName = entry.getKey();
        if (lookupGroup.equals(groupName)) {
          Set<Integer> items = entry.getValue();
          for (Integer item : items) {
            newPos++;
            if (item == lookupId) {
              lookupPos = newPos;
              break outer;
            }
          }
        } else {
          newPos += entry.getValue().size();
        }
      }
    }
    rsp.add("newPos", lookupPos);
  }
  
  private ScoreDoc[] runReqQuery(SolrQueryRequest req) throws SyntaxError,
      IOException {
    SolrParams params = req.getParams();
    SolrIndexSearcher searcher = req.getSearcher();
    String qstr = params.get(CommonParams.Q);
    
    QParser parser = QParser.getParser(qstr, ExtendedDismaxQParserPlugin.NAME,
        req);
    Query newQuery = parser.parse();
    Sort sort = SolrPluginUtils.getSort(req);
    
    String[] fqs = params.getParams(CommonParams.FQ);
    ChainedFilter chainedFilter = null;
    if (fqs != null) {
      Filter[] filters = new Filter[fqs.length];
      int i = 0;
      for (String fq : fqs) {
        filters[i++] = new QueryWrapperFilter(QParser.getParser(fq,
            ExtendedDismaxQParserPlugin.NAME, req).parse());
      }
      chainedFilter = new ChainedFilter(filters);
    }
    TopDocs topDocs;
    if (sort != null) {
      topDocs = searcher.search(newQuery, chainedFilter, searcher.maxDoc(),
          sort);
    } else {
      topDocs = searcher.search(newQuery, chainedFilter, searcher.maxDoc());
    }
    ScoreDoc[] docs = topDocs.scoreDocs;
    return docs;
  }
  
  private int getLookupDocId(SolrIndexSearcher searcher, String lookup)
      throws IOException {
    TermQuery tq = new TermQuery(new Term("id", lookup));
    TopDocs hits = searcher.search(tq, 1);
    ScoreDoc[] docs = hits.scoreDocs;
    if (docs.length == 1) {
      return docs[0].doc;
    }
    return -1; // not found
  }
  
}

Who Reports


// { employee_id, manager_id }
// ( { 2, 1 }, { 3, 1 }, { 4, 2 }, { 5, 7 }, { 6, 3 }, { 7, 4 } )

// whoReportsTo( 3 ) --> ( 6 )
// whoReportsTo( 2 ) --> ( 4, 7, 5 )
// { 4, 2 }, { 7, 4 } ==> 7 reports to 4 (manager), who reports to 2 (manager)

// Graph represented by an adjacency list
public static class Relations {
  // managerId --> ids of direct subordinates
  private Map<Integer, Set<Integer>> subRelations = new HashMap<>();

  public Relations(int[][] relations) {
    for (int i = 0; i < relations.length; i++) {
      managed(relations[i][0], relations[i][1]);
    }
    // if we also want the forward relationship, store it as
    // Map<Integer, Integer>
  }

  public void managed(int empId, int managerId) {
    Set<Integer> subs = subRelations.get(managerId);
    if (subs == null) {
      subs = new HashSet<>();
      subRelations.put(managerId, subs);
    }

    subs.add(empId);
  }

  public Set<Integer> whoReportsToBFS(int managerId) {
    Set<Integer> mySubs = subRelations.get(managerId);
    // if (mySubs == null)
    //   throw new IllegalArgumentException(managerId + " doesn't exist");

    // direct subordinates come first
    Set<Integer> result = new LinkedHashSet<>();

    if (mySubs == null)
      return result;
    // BFS, but only over the paths that start at managerId
    Queue<Integer> queue = new LinkedList<>();
    queue.addAll(mySubs);
    while (!queue.isEmpty()) {
      Integer subId = queue.poll();
      // one employee is managed by exactly one manager, so there
      // should be no loop
      if (result.contains(subId)) {
        throw new LoopExistException("subId: " + subId);
      }
      result.add(subId);

      if (subRelations.containsKey(subId)) {
        queue.addAll(subRelations.get(subId));
      }
    }
    return result;
  }

  public Set<Integer> whoReportsToDFS(int managerId) {
    Set<Integer> mySubs = subRelations.get(managerId);
    // if (mySubs == null)
    //   throw new IllegalArgumentException(managerId + " doesn't exist");

    // direct subordinates come first
    Set<Integer> result = new LinkedHashSet<>();

    if (mySubs == null)
      return result;
    // DFS, but only over the paths that start at managerId
    LinkedList<Integer> stack = new LinkedList<>();
    stack.addAll(mySubs);
    while (!stack.isEmpty()) {
      Integer subId = stack.pollLast();
      // one employee is managed by exactly one manager, so there
      // should be no loop
      if (result.contains(subId)) {
        throw new LoopExistException("subId: " + subId);
      }
      result.add(subId);
      if (subRelations.containsKey(subId)) {
        stack.addAll(subRelations.get(subId));
      }
    }
    return result;
  }

  static class LoopExistException extends RuntimeException {
    public LoopExistException(String message) {
      super(message);
    }
  }

  public static void main(String[] args) {
    // { employee_id, manager_id }
    // ( { 2, 1 }, { 3, 1 }, { 4, 2 }, { 5, 7 }, { 6, 3 }, { 7, 4 } )
    int[][] arr = new int[][] { { 2, 1 }, { 3, 1 }, { 4, 2 }, { 5, 7 },
        { 6, 3 }, { 7, 4 } };
    Relations relations = new Relations(arr);
    // System.out.println(relations.whoReportsToBFS(2));
    System.out.println(relations.whoReportsToDFS(2));

    // there is a loop in this input: { 2, 1 }, { 1, 2 }
    arr = new int[][] { { 2, 1 }, { 1, 2 }, { 3, 1 }, { 4, 2 },
        { 5, 7 }, { 6, 3 }, { 7, 4 } };
    relations = new Relations(arr);
    System.out.println(relations.whoReportsToDFS(2)); // throws LoopExistException
  }
}

Solr: When it's safe to change field type


It's inevitable that at some point we have to change a field's type: for example from tint to tlong. So we need to know whether it is safe or not to change the field type.

Case 1: field is stored-only (indexed=false)
Answer: it's safe.
After we change the type from tint to tlong, the field type of old data is still int (even after segment merges or an optimize); for example, previously stored values such as 12345 and 23456 are returned unchanged.
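For example, a minimal sketch of the schema.xml change (the field name is illustrative):
<!-- before -->
<field name="count" type="tint" indexed="false" stored="true"/>
<!-- after: previously stored values still come back with their original type -->
<field name="count" type="tlong" indexed="false" stored="true"/>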

Case 2: field is also indexed
In most cases it's not safe.

when we





