Algorithms: Find Valid n-Characters-Word from m Characters Iteratively

I was asked to help solve the following word puzzle: find valid n-characters-word from m characters.

In the following example, The answer is combine.
I am not good at this kind of game. But I am a programmer, so I can write a program to do it.

I use open source English dictionary libraries:
WordNet http://wordnet.princeton.edu/wordnet/download/.
JWI: http://projects.csail.mit.edu/jwi/.
JWI is a Java library for interfacing with Wordnet.

First I will load all English words into memory, later, when we call dict.getIndexWord, it will directly search the word HashMap in the memory.
private void init(String dictFile) throws IOException, InterruptedException {
  dict = new RAMDictionary(new File(dictFile), ILoadPolicy.NO_LOAD);
  dict.open();
  dict.load(true);
 } 
In previous post, I explained how to solve it with recursion, and the disadvantage of it: requires a lot of memory.

In this article, I solve this problem iteratively.
The main difference is that:
1. Get all combinations of n-characters-word from m characters: order is not important.
2. For every n-characters-word returned in step 1, using counting quickPerm algorithm to iterate each word in its permutations, if it is a valid word, save it to a list.

The code is like below:
public List getAllValidWords(String str, int len) {
  List result = new ArrayList<>();
  List nStrs = getNChars(str, len);

  for (String aStr : nStrs) {
   result.addAll(getValidWords(aStr));
  }

  return result;
 }

Another similar word game is to find the longest words from m characters.
To solve this, I call getAllValidWords(str, n), getAllValidWords(str, n-1) etc in multiple threads.

Implementation
The code is like below: you can review the complete code at Github.
The output is like below:
getAllValidWords took 755 mill seconds, found [combine, newcomb]
getMaxLenValidWord took 17601, len: 8 found [combine, newcomb]

package org.codeexample.algorithms.collected.wordpuzzle;
import edu.mit.jwi.RAMDictionary;
import edu.mit.jwi.data.ILoadPolicy;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.POS;

public class WordPuzzleIteratively {
 private RAMDictionary dict;
 public WordPuzzleIteratively(String dictFile) throws IOException,
   InterruptedException {
  init(dictFile);
 }
 private void init(String dictFile) throws IOException, InterruptedException {
  dict = new RAMDictionary(new File(dictFile), ILoadPolicy.NO_LOAD);
  dict.open();
  dict.load(true);
 }
 public void close() {
  dict.close();
 }

 public List<String> getAllValidWords(String str, int len) {
  long start = new Date().getTime();
  List<String> result = new ArrayList<>();
  List<String> nStrs = getNChars(str, len);

  for (String aStr : nStrs) {
   result.addAll(getValidWords(aStr));
  }
  System.out.println("getAllValidWords took "
    + (new Date().getTime() - start) + " mill seconds, found " + result);
  return result;
 }

 public List<String> getMaxLenValidWordThreaded(final String str)
   throws MalformedURLException, IOException, InterruptedException {
  final ArrayList<ArrayList<String>> list = new ArrayList<ArrayList<String>>();
  if (str == null || str.length() == 0)
   return new ArrayList<>();
  long start = new Date().getTime();

  for (int len = str.length(); len > 0; len--) {
   list.add(new ArrayList<String>());
  }
  ExecutorService executor = Executors.newFixedThreadPool(str.length());
  for (int len = str.length(); len > 0; len--) {
   final int length = len;
   Runnable task = new Runnable() {
    @Override
    public void run() {

     long innerStart = new Date().getTime();
     ArrayList<String> result = (ArrayList<String>) getAllValidWords(str,
       length);
     System.out.println("getMaxLenValidWordThreaded len: " + length
       + " took " + (new Date().getTime() - innerStart) + ", found "
       + result.size() + " : " + result);

     list.set(length, result);
    }
   };
   executor.submit(task);
  }

  executor.shutdown();
  boolean terminated = executor.awaitTermination(Long.MAX_VALUE,
    TimeUnit.SECONDS);
  if (!terminated) {
   throw new RuntimeException("Request takes too much time");
  }

  List<String> result = new ArrayList<>();

  int index = str.length() - 1;
  for (; index >= 0; index--) {
   result = list.get(index);
   if (!result.isEmpty())
    break;
  }
  System.out.println("getMaxLenValidWord took "
    + (new Date().getTime() - start) + ", len: " + (index + 1) + " found "
    + result);
  return result;
 }

 public List<String> getMaxLenValidWord(String str)
   throws MalformedURLException, IOException {
  long start = new Date().getTime();
  List<String> list = new ArrayList<>();
  for (int len = str.length(); len > 0; len--) {
   long innerStart = new Date().getTime();
   list = getAllValidWords(str, len);
   System.out.println("getMaxLenValidWord len: " + len + " took "
     + (new Date().getTime() - innerStart));
   if (!list.isEmpty()) {
    break;
   }
  }
  System.out.println("getMaxLenValidWord took "
    + (new Date().getTime() - start));
  return list;
 }

 /**
  * Idea comes from book: Cracking code interview.
  */
 private List<String> getNChars(String strs, int len) {
  // for each char, it can be in the response or not, if
  List<String> nStrs = new ArrayList<String>();
  long max = 1 << strs.length();
  for (int i = 0; i < max; i++) {
   StringBuilder sb = new StringBuilder(len);
   int k = i;
   int index = 0;
   while (k > 0) {
    if ((k & 1) > 0) {
     sb.append(strs.charAt(index));
    }
    k >>= 1;
    index++;
   }

   if (sb.length() == len) {
    nStrs.add(sb.toString());
   }
  }
  return nStrs;
 }

 /**
  * Ideas comes from: The Counting QuickPerm Algorithm:
  * http://www.freewebs.com/permute/quickperm.html<br>
  * http://stackoverflow.com/questions/11915026/permutations-of-a-string-using-
  * iteration
  */
 private List<String> getValidWords(String str) {
  List<String> validWords = new ArrayList<>();
  if (isValidWorld(str)) {
   validWords.add(str);
  }
  char[] a = str.toCharArray();
  int n = a.length;
  int[] p = new int[n]; // Index control array initially all zeros
  int i = 1;
  while (i < n) {
   if (p[i] < i) {
    int j = ((i % 2) == 0) ? 0 : p[i];
    swap(a, i, j);
    String word = toString(a);
    if (isValidWorld(word)) {
     validWords.add(word);
    }
    p[i]++;
    i = 1;
   } else {
    p[i] = 0;
    i++;
   }
  }
  return validWords;
 }

 private static String toString(char[] a) {
  StringBuilder builder = new StringBuilder();
  builder.append(a);
  return builder.toString();
 }

 private static void swap(char[] a, int i, int j) {
  char temp = a[i];
  a[i] = a[j];
  a[j] = temp;
 }

 public static void main(String[] args) throws Exception {
  String str = "oivmwujbecn";
  int len = 7;
  WordPuzzleIteratively instance = new WordPuzzleIteratively(
    "C:/Program Files (x86)/WordNet/2.1/dict");

  List<String> list = null;
  // getAllValidWords took 755 mill seconds, found [combine, newcomb]
  list = instance.getAllValidWords(str, len);
  System.out.println(list.size());
  System.out.println(list);

  // getMaxLenValidWord took 17601, len: 8 found [combine, newcomb]
  list = instance.getMaxLenValidWordThreaded(str);
  System.out.println(list.size());
  System.out.println(list);
  
  instance.close();
 }

 private boolean isValidWorld(String word) {
  Boolean isWord = false;
  IIndexWord idxWord = dict.getIndexWord(word, POS.NOUN);
  if (idxWord != null && !idxWord.getWordIDs().isEmpty()) {
   isWord = true;
  }
  return isWord;
 }
}
Post a Comment

Labels

Java (159) Lucene-Solr (110) All (60) Interview (59) J2SE (53) Algorithm (37) Eclipse (35) Soft Skills (35) Code Example (31) Linux (26) JavaScript (23) Spring (22) Windows (22) Web Development (20) Tools (19) Nutch2 (18) Bugs (17) Debug (15) Defects (14) Text Mining (14) J2EE (13) Network (13) PowerShell (11) Chrome (9) Continuous Integration (9) How to (9) Learning code (9) Performance (9) UIMA (9) html (9) Design (8) Dynamic Languages (8) Http Client (8) Maven (8) Security (8) Trouble Shooting (8) bat (8) blogger (8) Big Data (7) Google (7) Guava (7) JSON (7) Problem Solving (7) ANT (6) Coding Skills (6) Database (6) Scala (6) Shell (6) css (6) Algorithm Series (5) Cache (5) IDE (5) Lesson Learned (5) Miscs (5) Programmer Skills (5) System Design (5) Tips (5) adsense (5) xml (5) AIX (4) Code Quality (4) GAE (4) Git (4) Good Programming Practices (4) Jackson (4) Memory Usage (4) OpenNLP (4) Project Managment (4) Python (4) Spark (4) Testing (4) ads (4) regular-expression (4) Android (3) Apache Spark (3) Become a Better You (3) Concurrency (3) Eclipse RCP (3) English (3) Firefox (3) Happy Hacking (3) IBM (3) J2SE Knowledge Series (3) JAX-RS (3) Jetty (3) Restful Web Service (3) Script (3) regex (3) seo (3) .Net (2) Android Studio (2) Apache (2) Apache Procrun (2) Architecture (2) Batch (2) Build (2) Building Scalable Web Sites (2) C# (2) C/C++ (2) CSV (2) Career (2) Cassandra (2) Distributed (2) Fiddler (2) Google Drive (2) Gson (2) Html Parser (2) Http (2) Image Tools (2) JQuery (2) Jersey (2) LDAP (2) Life (2) Logging (2) Software Issues (2) Storage (2) Text Search (2) xml parser (2) AOP (1) Application Design (1) AspectJ (1) Bit Operation (1) Chrome DevTools (1) Cloud (1) Codility (1) Data Mining (1) Data Structure (1) ExceptionUtils (1) Exif (1) Feature Request (1) FindBugs (1) Greasemonkey (1) HTML5 (1) Httpd (1) I18N (1) IBM Java Thread Dump Analyzer (1) JDK Source Code (1) JDK8 (1) JMX (1) Lazy Developer (1) Mac (1) Machine Learning (1) Mobile (1) My Plan for 2010 (1) Netbeans (1) Notes (1) Operating System (1) Perl (1) Problems (1) Product Architecture (1) Programming Life (1) Quality (1) Redhat (1) Redis (1) Review (1) RxJava (1) Solutions logs (1) Team Management (1) Thread Dump Analyzer (1) Troubleshooting (1) Visualization (1) boilerpipe (1) htm (1) ongoing (1) procrun (1) rss (1)

Popular Posts