Java: Get HashMap's Topn elements Sorted by Values

It is a common requirement that we have a Hashmap, we want to only return the topn elements sorted by the value.
For example, in the following HashMap, the key is username, the value is user's score. We want to return top 2 users with highest scores.
David->85, Jeffery->85, Tom->90, Vivian->95

We may consider using TreeMap, but it is sorted by the key not the value.

In this article, I will construct a new TreeSet subclass, its element is Entry: a key-value pair, we will sort it by value via a custom Comparator:
new Comparator<Map.Entry<K, V>>() {
 public int compare(Map.Entry<K, V> e1, Map.Entry<K, V> e2) {
  int res = e1.getValue().compareTo(e2.getValue());
  if (res == 0) {
   res = e1.getKey().compareTo(e2.getKey());
  }
  return res;
 }
};
We will limit its size to n, by override add method: if its size is less than n, add the entry, if not:
1. If the value of the new entry is less than the minimum-value element(the first element in the TreeSet), then discard the new entry.
2. If the value of the new entry is larger than the first element in the TreeSet, remove the first element, and add the new entry.

The code is like below. You can review the complete source code at Github:

class TreeSetTopnAscendingByValue<K extends Comparable<? super K>, V extends Comparable<? super V>>
 extends TreeSet<Map.Entry<K, V>> {
private static final boolean DEBUG = false;
private static final long serialVersionUID = 1L;

private int maxSize = Integer.MAX_VALUE;
private Comparator<Map.Entry<K, V>> comparator;

public TreeSetTopnAscendingByValue(Comparator<Map.Entry<K, V>> comparator,
  int maxSize) {
 super(comparator);
 this.comparator = comparator;
 this.maxSize = maxSize;
}

public TreeSetTopnAscendingByValue(int maxSize) {
 this(new Comparator<Map.Entry<K, V>>() {
  public int compare(Map.Entry<K, V> e1, Map.Entry<K, V> e2) {
   int res = e1.getValue().compareTo(e2.getValue());
   if (res == 0) {
    res = e1.getKey().compareTo(e2.getKey());
   }
   return res;
  }
 }, maxSize);
}

public boolean add(K newKey, V newValue) {
 return add(new SimpleEntry<K, V>(newKey, newValue));
}

public boolean add(IKeyValue<K, V> obj) {
 return add(new SimpleEntry<K, V>(obj.getKey(), obj.getValue()));
}

@Override
public boolean add(Entry<K, V> newValue) {
 boolean added = false;
 if (this.size() > maxSize)
  throw new RuntimeException("Code error, should never happen.");
 if (this.size() == maxSize) {
  // we can first check whether the map already contains this element, if
  // yes, ignore.
  // or we can first add, then if the size is increased, remove the first
  // element again, maybe the latter is better : we can run test to compare
  // it.
  Iterator<Entry<K, V>> iterator = this.iterator();
  Entry<K, V> currentMin = iterator.next();
  // only remove currentMin if newValue > currentMin. if equals, do nothing
  // for better performance.
  if (comparator.compare(newValue, currentMin) > 0) {
   added = super.add(newValue);
   if (DEBUG) {
    System.out.println("size: " + this.size() + ", add "
      + newValue.toString() + ", remove: " + currentMin);
   }
   // the added element may already exist, if the map doesn't have this
   // element(added is true), remove currentMin
   // or we can us this.size() > maxSize
   if (added) {
    iterator = this.iterator();
    iterator.next();
    iterator.remove();
   }
  } else {
   // don't add newValue;
   if (DEBUG) {
    System.out.println("size: " + this.size() + ", Skip "
      + newValue.toString());
   }
  }
 } else {
  added = super.add(newValue);
 }
 return added;
}


interface IKeyValue<K, V> {
 public K getKey();

 public V getValue();
}
The test code is like below:
public class HashMapSorterByValue {
// http://stackoverflow.com/questions/2864840/treemap-sort-by-value
public static <K extends Comparable<? super K>, V extends Comparable<? super V>> SortedSet<Map.Entry<K, V>> getTopnbyValue(
  Map<K, V> map, int topn) {
 TreeSetTopnAscendingByValue<K, V> sortedEntries = new TreeSetTopnAscendingByValue<K, V>(
   topn);
 sortedEntries.addAll(map.entrySet());
 return sortedEntries;
}

private static void testSortByValues() {
 TreeSetTopnAscendingByValue<String, Integer> sortedEntries = new TreeSetTopnAscendingByValue<String, Integer>(
   3);
 sortedEntries.add("David", 85);
 sortedEntries.add("Jeffery", 85);
 sortedEntries.add("Tom", 90);
 sortedEntries.add("Vivian", 95);
 sortedEntries.add("Jacky", 95);
 sortedEntries.add("Jacky", 95);

 System.out.println(sortedEntries);

 // test iterate
 Iterator<Entry<String, Integer>> iterator = sortedEntries.iterator();
 while (iterator.hasNext()) {
  Entry<String, Integer> entry = iterator.next();
  System.out.println(entry.getKey() + " : " + entry.getValue());
 }
 sortedEntries.remove(new SimpleEntry<String, Integer>("Jacky", 95));
 System.out.println(sortedEntries);
}

private static void testSrotedByPeopleScore() {
 TreeSetTopnAscendingByValue<String, Double> sortedEntries = new TreeSetTopnAscendingByValue<String, Double>(
   2);
 Person p = new Person("David", 85);
 sortedEntries.add(p);
 p = new Person("Jeffery", 85);
 sortedEntries.add(p);
 p = new Person("Tom", 90);
 sortedEntries.add(p);
 p = new Person("Vivian", 95);
 sortedEntries.add(p);

 assertEquals(2, sortedEntries.size());
 Iterator<Entry<String, Double>> iterator = sortedEntries.iterator();
 assertEquals(90.0, iterator.next().getValue());
 assertEquals(95.0, iterator.next().getValue());
 System.out.println(sortedEntries);
}

class Person implements IKeyValue<String, Double> {
 private String name;
 private double score;

 public Person(String name, double score) {
  this.name = name;
  this.score = score;
 }

 public String getName() {
  return name;
 }

 public String getKey() {
  return getName();
 }

 public Double getValue() {
  return score;
 }

 @Override
 public String toString() {
  return name + ":" + score;
 }
}
Post a Comment

Labels

Java (159) Lucene-Solr (110) All (58) Interview (58) J2SE (53) Algorithm (41) Soft Skills (36) Eclipse (34) Code Example (31) Linux (25) JavaScript (23) Spring (22) Windows (22) Web Development (20) Nutch2 (18) Tools (18) Bugs (17) Debug (15) Defects (14) Text Mining (14) J2EE (13) Network (13) PowerShell (11) Chrome (9) Design (9) How to (9) Learning code (9) Performance (9) UIMA (9) html (9) Continuous Integration (8) Dynamic Languages (8) Http Client (8) Maven (8) Security (8) Trouble Shooting (8) bat (8) blogger (8) Big Data (7) Google (7) Guava (7) JSON (7) Problem Solving (7) ANT (6) Coding Skills (6) Database (6) Scala (6) Shell (6) css (6) Algorithm Series (5) Cache (5) IDE (5) Lesson Learned (5) Programmer Skills (5) System Design (5) Tips (5) adsense (5) xml (5) AIX (4) Code Quality (4) GAE (4) Git (4) Good Programming Practices (4) Jackson (4) Memory Usage (4) Miscs (4) OpenNLP (4) Project Managment (4) Python (4) Spark (4) Testing (4) ads (4) regular-expression (4) Android (3) Apache Spark (3) Become a Better You (3) Concurrency (3) Eclipse RCP (3) English (3) Happy Hacking (3) IBM (3) J2SE Knowledge Series (3) JAX-RS (3) Jetty (3) Restful Web Service (3) Script (3) regex (3) seo (3) .Net (2) Android Studio (2) Apache (2) Apache Procrun (2) Architecture (2) Batch (2) Bit Operation (2) Build (2) Building Scalable Web Sites (2) C# (2) C/C++ (2) CSV (2) Career (2) Cassandra (2) Distributed (2) Fiddler (2) Firefox (2) Google Drive (2) Gson (2) Html Parser (2) Http (2) Image Tools (2) JQuery (2) Jersey (2) LDAP (2) Life (2) Logging (2) Software Issues (2) Storage (2) Text Search (2) xml parser (2) AOP (1) Application Design (1) AspectJ (1) Chrome DevTools (1) Cloud (1) Codility (1) Data Mining (1) Data Structure (1) ExceptionUtils (1) Exif (1) Feature Request (1) FindBugs (1) Greasemonkey (1) HTML5 (1) Httpd (1) I18N (1) IBM Java Thread Dump Analyzer (1) JDK Source Code (1) JDK8 (1) JMX (1) Lazy Developer (1) Mac (1) Machine Learning (1) Mobile (1) My Plan for 2010 (1) Netbeans (1) Notes (1) Operating System (1) Perl (1) Problems (1) Product Architecture (1) Programming Life (1) Quality (1) Redhat (1) Redis (1) Review (1) RxJava (1) Solutions logs (1) Team Management (1) Thread Dump Analyzer (1) Troubleshooting (1) Visualization (1) boilerpipe (1) htm (1) ongoing (1) procrun (1) rss (1)

Popular Posts