Algorithms: Find Valid n-Characters-Word from m Characters Recursively


I was asked to help solve the following word puzzle: find valid n-characters-word from m characters.
In the following example, The answer is combine.
I am not good at this kind of game. But I am a programmer, so I can write a program to do it.
I use open source English dictionary libraries:
WordNet http://wordnet.princeton.edu/wordnet/download/.
JWI: http://projects.csail.mit.edu/jwi/.
JWI is a Java library for interfacing with Wordnet.

First I will load all English words into memory, later, when we call dict.getIndexWord, it will directly search the word HashMap in the memory.
First I will load all English words into memory, later, when we call dict.getIndexWord, it will directly search the word HashMap in the memory.
private void init(String dictFile) throws IOException, InterruptedException {
  dict = new RAMDictionary(new File(dictFile), ILoadPolicy.NO_LOAD);
  dict.open();
  dict.load(true);
 } 
In this article, I solve this problem with recursion.
1. Get all combinations n-characters-word of m characters.
2. Iterative every word in the list returned in step 1, determine whether it is a valid word.

Another similar word game is to find the longest words from m characters.
To solve this, I tried get m-length word, then (m-1) length word, etc. The code is like below:
public List<String> getMaxLenValidWord(String str)
   throws MalformedURLException, IOException {
  List<String> list = new ArrayList<>();
  for (int len = str.length(); len > 0; len--) {
   list = getAllValidWords(str, len);
   if (!list.isEmpty()) {
    return list;
   }
  }
  return list;
 }
This is not very efficient, as it will stores all all combinations n-characters-word of m characters in memory, this is huge. When n and m is large, it will throw OOM error.

In later post , I will explain how to solve this problem iteratively.
Implementation
The code is like below: you can review the complete code at Github.
The output is like below:
getAllValidWords took 5095 mill seconds.
getMaxLenValidWord throws Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

package org.codeexample.algorithms.collected.wordpuzzle;
import edu.mit.jwi.RAMDictionary;
import edu.mit.jwi.data.ILoadPolicy;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.POS;
public class WordPuzzleWithRecusion {
 private RAMDictionary dict;
 public WordPuzzleWithRecusion(String dictFile) throws IOException,
   InterruptedException {
  init(dictFile);
 }
 private void init(String dictFile) throws IOException, InterruptedException {
  dict = new RAMDictionary(new File(dictFile), ILoadPolicy.NO_LOAD);
  dict.open();
  dict.load(true);
 }
 public void close() {
  dict.close();
 }
 
 public List<String> getAllValidWords(String str, int length)
   throws MalformedURLException, IOException {
  List<String> lists = new ArrayList<String>();
  long start = new Date().getTime();
  HashSet<String> result = getNCombination(str, length);
  System.out.println("getAllValidWords took "
    + (new Date().getTime() - start) + " mill seconds.");

  for (String word : result) {
   if (isValidWorld(word)) {
    lists.add(word);
   }
  }
  System.out.println("getAllValidWords took "
    + (new Date().getTime() - start) + " mill seconds.");
  return lists;
 }

 public List<String> getMaxLenValidWord(String str)
   throws MalformedURLException, IOException {
  long start = new Date().getTime();
  List<String> list = new ArrayList<>();
  for (int len = str.length(); len > 0; len--) {
   list = getAllValidWords(str, len);
   if (!list.isEmpty()) {
    return list;
   }
  }
  System.out.println("getMaxLenValidWord took "
    + (new Date().getTime() - start) + " mill seconds, found " + list);
  return list;
 }

 /**
  * Solve the problem using recursion, need a lot of memory.
  */
 private HashSet<String> getNCombination(String strs, int len) {
  HashSet<String> result = new HashSet<String>();
  if (strs == null) { // error case
   return result;
  }
  if (len == 0 || len > strs.length())
   return result;

  if (strs.length() == 1 && len == 1) { // base case
   result.add(strs);
   return result;
  }

  HashSet<String> result1 = getNCombination(strs.substring(1), len);

  result.addAll(result1);

  String firstStr = strs.substring(0, 1);
  if (firstStr.length() == len) {
   result.add(firstStr);
  } else {
   HashSet<String> result2 = getNCombination(strs.substring(1), len - 1);
   for (String tmp : result2) {
    for (int i = 0; i < tmp.length(); i++) {
     String tmpResult = tmp.substring(0, i) + firstStr + tmp.substring(i);
     result.add(tmpResult);
    }

    result.add(tmp + firstStr);
   }
  }

  return result;
 }

 public static void main(String[] args) throws Exception {
  String str = "oivmwujbecn";
  int length = 7;

  // getAllValidWords took 5095 mill seconds.
  WordPuzzleWithRecusion instance = new WordPuzzleWithRecusion(
    "C:/Program Files (x86)/WordNet/2.1/dict");
  System.out.println(instance.getAllValidWords(str, length));

  // This will throw OOM
  System.out.println(instance.getMaxLenValidWord(str));
  instance.close();
 }

 private boolean isValidWorld(String word) {
  Boolean isWord = false;
  IIndexWord idxWord = dict.getIndexWord(word, POS.NOUN);
  if (idxWord != null && !idxWord.getWordIDs().isEmpty()) {
   isWord = true;
  }
  return isWord;
 }
}

Labels

adsense (5) Algorithm (69) Algorithm Series (35) Android (7) ANT (6) bat (8) Big Data (7) Blogger (14) Bugs (6) Cache (5) Chrome (19) Code Example (29) Code Quality (7) Coding Skills (5) Database (7) Debug (16) Design (5) Dev Tips (63) Eclipse (32) Git (5) Google (33) Guava (7) How to (9) Http Client (8) IDE (7) Interview (88) J2EE (13) J2SE (49) Java (186) JavaScript (27) JSON (7) Learning code (9) Lesson Learned (6) Linux (26) Lucene-Solr (112) Mac (10) Maven (8) Network (9) Nutch2 (18) Performance (9) PowerShell (11) Problem Solving (11) Programmer Skills (6) regex (5) Scala (6) Security (9) Soft Skills (38) Spring (22) System Design (11) Testing (7) Text Mining (14) Tips (17) Tools (24) Troubleshooting (29) UIMA (9) Web Development (19) Windows (21) xml (5)