Extend Guava TreeMultimap to Get Topn elements Sorted by Values

It is a common requirement that we have a Hashmap, we want only return the topn elements sorted on the values, for example, in the following HashMap, the key is username, the value is user's score. We want to return top 2 users with highest scores.
David->85, Jeffery->85, Tom->90, Vivian->95

In previous article, I extend TreeSet to add <key, value> pair as Map.Entry, and sort it by value, and override add method to limit TreeSet size.


It works well, and with good time and space complexity. But one of my colleague argued that: in a map, we should sort by key not by value, so the key should be score, the value should be user name, the data structure should be sorted multi-value map: there can be multiple values for a same key, and it's sorted by key.


Of course this works, but the performance would be a little worse, as for a same key, there can be multiple values, we have to iterate the value list.


Neither am I good at arguing, nor like arguing. However arguing and negotiation are important skills for a software engineer: they are a part of our life, so I should try my best to improve these skills.  

I think my solution is better, but I can't clearly express why at that time.
Less Learned: when compare 2 implementations, we should compare: time/space complexity, implementation complexity.

Anyway, I would like to implement my colleague's idea. Just curious, and want to know how to implement a sorted multi-value TreeMap, and limit its size.


Google Guava provides TreeMultimap: there can be multiple values for a same key, and it's sorted by keys.  


We only need extend put method to limit its size. It's easy, but the bad part is that TreeMultimap's constructor is package level, to extend it, the subclass must be in the same package.


The code is like below: you can review the complete code at Github.

public class TopnTreeMultimap<K extends Comparable, V extends Comparable>
  extends TreeMultimap<K, V> {
 private final boolean DEBUG = true;
 private static final long serialVersionUID = 1L;

 private int maxSize;

 public TopnTreeMultimap(Comparator<? super K> keyComparator,
   Comparator<? super V> valueComparator, int maxSize) {
  super(keyComparator, valueComparator);
  this.maxSize = maxSize;
 }

 public static <K extends Comparable, V extends Comparable> TopnTreeMultimap<K, V> create(
   int maxSize) {
  return new TopnTreeMultimap<K, V>(Ordering.natural(), Ordering.natural(),
    maxSize);
 }

 @SuppressWarnings("unchecked")
 @Override
 public boolean put(K newKey, V newValue) {
  boolean added = false;
  if (this.size() > maxSize)
   throw new RuntimeException("Code error, should never happen.");
  if (this.size() == maxSize) {
   Iterator<K> keyIterator = this.keySet().iterator();
   K currentMinKey = keyIterator.next();
   if (newKey.compareTo(currentMinKey) <= 0) {
    if (DEBUG) {
     System.out.println("Ignore the put element: " + newKey + " : "
       + newValue);
    }
   } else {
    added = super.put(newKey, newValue);

    if (added) {
     // remove the first element - the min
     Iterator entryiIterator = this.entryIterator();
     entryiIterator.next();
     entryiIterator.remove();
    }
   }
  } else {
   added = super.put(newKey, newValue);
  }
  return added;
 }

 public static void main(String[] args) throws Exception {
  TopnTreeMultimap<Integer, String> instance = TopnTreeMultimap.create(2);
  instance.put(85, "David");
  instance.put(85, "Jeffery");
  instance.put(90, "Tom");
  instance.put(95, "Vivian");
  instance.put(95, "Vivian");

  System.err.println(instance);
  // print {90=[Tom], 95=[Vivian]}
 }
}
Post a Comment

Labels

Java (159) Lucene-Solr (110) Interview (61) All (58) J2SE (53) Algorithm (45) Soft Skills (36) Eclipse (34) Code Example (31) Linux (24) JavaScript (23) Spring (22) Windows (22) Web Development (20) Nutch2 (18) Tools (18) Bugs (17) Debug (15) Defects (14) Text Mining (14) J2EE (13) Network (13) PowerShell (11) Troubleshooting (11) Chrome (9) Design (9) How to (9) Learning code (9) Performance (9) UIMA (9) html (9) Http Client (8) Maven (8) Problem Solving (8) Security (8) bat (8) blogger (8) Big Data (7) Continuous Integration (7) Google (7) Guava (7) JSON (7) ANT (6) Coding Skills (6) Database (6) Scala (6) Shell (6) css (6) Algorithm Series (5) Cache (5) Dynamic Languages (5) IDE (5) Lesson Learned (5) Programmer Skills (5) Tips (5) adsense (5) xml (5) AIX (4) Code Quality (4) GAE (4) Git (4) Good Programming Practices (4) Jackson (4) Memory Usage (4) Miscs (4) OpenNLP (4) Project Managment (4) Spark (4) System Design (4) Testing (4) ads (4) regular-expression (4) Android (3) Apache Spark (3) Become a Better You (3) Concurrency (3) Eclipse RCP (3) English (3) Happy Hacking (3) IBM (3) J2SE Knowledge Series (3) JAX-RS (3) Jetty (3) Restful Web Service (3) Script (3) regex (3) seo (3) .Net (2) Android Studio (2) Apache (2) Apache Procrun (2) Architecture (2) Batch (2) Bit Operation (2) Build (2) Building Scalable Web Sites (2) C# (2) C/C++ (2) CSV (2) Career (2) Cassandra (2) Distributed (2) Fiddler (2) Firefox (2) Google Drive (2) Gson (2) How to Interview (2) Html Parser (2) Http (2) Image Tools (2) JQuery (2) Jersey (2) LDAP (2) Life (2) Logging (2) Python (2) Software Issues (2) Storage (2) Text Search (2) xml parser (2) AOP (1) Application Design (1) AspectJ (1) Chrome DevTools (1) Cloud (1) Codility (1) Data Mining (1) Data Structure (1) ExceptionUtils (1) Exif (1) Feature Request (1) FindBugs (1) Greasemonkey (1) HTML5 (1) Httpd (1) I18N (1) IBM Java Thread Dump Analyzer (1) JDK Source Code (1) JDK8 (1) JMX (1) Lazy Developer (1) Mac (1) Machine Learning (1) Mobile (1) My Plan for 2010 (1) Netbeans (1) Notes (1) Operating System (1) Perl (1) Problems (1) Product Architecture (1) Programming Life (1) Quality (1) Redhat (1) Redis (1) Review (1) RxJava (1) Solutions logs (1) Team Management (1) Thread Dump Analyzer (1) Visualization (1) boilerpipe (1) htm (1) ongoing (1) procrun (1) rss (1)

Popular Posts