Algorithms: Find Valid n-Characters-Word from m Characters Recursively

I was asked to help solve the following word puzzle: find valid n-characters-word from m characters.
In the following example, The answer is combine.
I am not good at this kind of game. But I am a programmer, so I can write a program to do it.
I use open source English dictionary libraries:
WordNet http://wordnet.princeton.edu/wordnet/download/.
JWI: http://projects.csail.mit.edu/jwi/.
JWI is a Java library for interfacing with Wordnet.

First I will load all English words into memory, later, when we call dict.getIndexWord, it will directly search the word HashMap in the memory.
First I will load all English words into memory, later, when we call dict.getIndexWord, it will directly search the word HashMap in the memory.
private void init(String dictFile) throws IOException, InterruptedException {
  dict = new RAMDictionary(new File(dictFile), ILoadPolicy.NO_LOAD);
  dict.open();
  dict.load(true);
 } 
In this article, I solve this problem with recursion.
1. Get all combinations n-characters-word of m characters.
2. Iterative every word in the list returned in step 1, determine whether it is a valid word.

Another similar word game is to find the longest words from m characters.
To solve this, I tried get m-length word, then (m-1) length word, etc. The code is like below:
public List<String> getMaxLenValidWord(String str)
   throws MalformedURLException, IOException {
  List<String> list = new ArrayList<>();
  for (int len = str.length(); len > 0; len--) {
   list = getAllValidWords(str, len);
   if (!list.isEmpty()) {
    return list;
   }
  }
  return list;
 }
This is not very efficient, as it will stores all all combinations n-characters-word of m characters in memory, this is huge. When n and m is large, it will throw OOM error.

In later post , I will explain how to solve this problem iteratively.
Implementation
The code is like below: you can review the complete code at Github.
The output is like below:
getAllValidWords took 5095 mill seconds.
getMaxLenValidWord throws Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

package org.codeexample.algorithms.collected.wordpuzzle;
import edu.mit.jwi.RAMDictionary;
import edu.mit.jwi.data.ILoadPolicy;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.POS;
public class WordPuzzleWithRecusion {
 private RAMDictionary dict;
 public WordPuzzleWithRecusion(String dictFile) throws IOException,
   InterruptedException {
  init(dictFile);
 }
 private void init(String dictFile) throws IOException, InterruptedException {
  dict = new RAMDictionary(new File(dictFile), ILoadPolicy.NO_LOAD);
  dict.open();
  dict.load(true);
 }
 public void close() {
  dict.close();
 }
 
 public List<String> getAllValidWords(String str, int length)
   throws MalformedURLException, IOException {
  List<String> lists = new ArrayList<String>();
  long start = new Date().getTime();
  HashSet<String> result = getNCombination(str, length);
  System.out.println("getAllValidWords took "
    + (new Date().getTime() - start) + " mill seconds.");

  for (String word : result) {
   if (isValidWorld(word)) {
    lists.add(word);
   }
  }
  System.out.println("getAllValidWords took "
    + (new Date().getTime() - start) + " mill seconds.");
  return lists;
 }

 public List<String> getMaxLenValidWord(String str)
   throws MalformedURLException, IOException {
  long start = new Date().getTime();
  List<String> list = new ArrayList<>();
  for (int len = str.length(); len > 0; len--) {
   list = getAllValidWords(str, len);
   if (!list.isEmpty()) {
    return list;
   }
  }
  System.out.println("getMaxLenValidWord took "
    + (new Date().getTime() - start) + " mill seconds, found " + list);
  return list;
 }

 /**
  * Solve the problem using recursion, need a lot of memory.
  */
 private HashSet<String> getNCombination(String strs, int len) {
  HashSet<String> result = new HashSet<String>();
  if (strs == null) { // error case
   return result;
  }
  if (len == 0 || len > strs.length())
   return result;

  if (strs.length() == 1 && len == 1) { // base case
   result.add(strs);
   return result;
  }

  HashSet<String> result1 = getNCombination(strs.substring(1), len);

  result.addAll(result1);

  String firstStr = strs.substring(0, 1);
  if (firstStr.length() == len) {
   result.add(firstStr);
  } else {
   HashSet<String> result2 = getNCombination(strs.substring(1), len - 1);
   for (String tmp : result2) {
    for (int i = 0; i < tmp.length(); i++) {
     String tmpResult = tmp.substring(0, i) + firstStr + tmp.substring(i);
     result.add(tmpResult);
    }

    result.add(tmp + firstStr);
   }
  }

  return result;
 }

 public static void main(String[] args) throws Exception {
  String str = "oivmwujbecn";
  int length = 7;

  // getAllValidWords took 5095 mill seconds.
  WordPuzzleWithRecusion instance = new WordPuzzleWithRecusion(
    "C:/Program Files (x86)/WordNet/2.1/dict");
  System.out.println(instance.getAllValidWords(str, length));

  // This will throw OOM
  System.out.println(instance.getMaxLenValidWord(str));
  instance.close();
 }

 private boolean isValidWorld(String word) {
  Boolean isWord = false;
  IIndexWord idxWord = dict.getIndexWord(word, POS.NOUN);
  if (idxWord != null && !idxWord.getWordIDs().isEmpty()) {
   isWord = true;
  }
  return isWord;
 }
}
Post a Comment

Labels

Java (159) Lucene-Solr (112) Interview (61) All (58) J2SE (53) Algorithm (45) Soft Skills (39) Eclipse (33) Code Example (31) Linux (24) JavaScript (23) Spring (22) Windows (22) Web Development (20) Tools (19) Nutch2 (18) Bugs (17) Debug (16) Defects (14) Text Mining (14) J2EE (13) Network (13) Troubleshooting (13) PowerShell (11) Problem Solving (10) Chrome (9) Design (9) How to (9) Learning code (9) Performance (9) UIMA (9) html (9) Http Client (8) Maven (8) Security (8) bat (8) blogger (8) Big Data (7) Continuous Integration (7) Google (7) Guava (7) JSON (7) Shell (7) ANT (6) Coding Skills (6) Database (6) Lesson Learned (6) Programmer Skills (6) Scala (6) Tips (6) css (6) Algorithm Series (5) Cache (5) Dynamic Languages (5) IDE (5) System Design (5) adsense (5) xml (5) AIX (4) Code Quality (4) GAE (4) Git (4) Good Programming Practices (4) Jackson (4) Memory Usage (4) Miscs (4) OpenNLP (4) Project Managment (4) Spark (4) Testing (4) ads (4) regular-expression (4) Android (3) Apache Spark (3) Become a Better You (3) Concurrency (3) Eclipse RCP (3) English (3) Happy Hacking (3) IBM (3) J2SE Knowledge Series (3) JAX-RS (3) Jetty (3) Life (3) Restful Web Service (3) Script (3) regex (3) seo (3) .Net (2) Android Studio (2) Apache (2) Apache Procrun (2) Architecture (2) Batch (2) Bit Operation (2) Build (2) Building Scalable Web Sites (2) C# (2) C/C++ (2) CSV (2) Career (2) Cassandra (2) Distributed (2) Fiddler (2) Firefox (2) Google Drive (2) Gson (2) How to Interview (2) Html Parser (2) Http (2) Image Tools (2) JQuery (2) Jersey (2) LDAP (2) Logging (2) Python (2) Software Issues (2) Storage (2) Text Search (2) xml parser (2) AOP (1) Application Design (1) AspectJ (1) Chrome DevTools (1) Cloud (1) Codility (1) Data Mining (1) Data Structure (1) ExceptionUtils (1) Exif (1) Feature Request (1) FindBugs (1) Greasemonkey (1) HTML5 (1) Httpd (1) I18N (1) IBM Java Thread Dump Analyzer (1) Invest (1) JDK Source Code (1) JDK8 (1) JMX (1) Lazy Developer (1) Mac (1) Machine Learning (1) Mobile (1) My Plan for 2010 (1) Netbeans (1) Notes (1) Operating System (1) Perl (1) Problems (1) Product Architecture (1) Programming Life (1) Quality (1) Redhat (1) Redis (1) Review (1) RxJava (1) Solutions logs (1) Team Management (1) Thread Dump Analyzer (1) Visualization (1) boilerpipe (1) htm (1) ongoing (1) procrun (1) rss (1)

Popular Posts