Solr: form-urlencoded content length exceeds upload limit

The Problem:
Our Solr client application(.Net) received the following exception:
<lst name="error"><str name="msg">application/x-www-form-urlencoded content length (20971671 bytes) exceeds upload limit of 2048 KB</str><int name="code">400</int></lst>

From the error message, seems the exception is thrown from Solr code. Search "exceeds upload limit of" in Solr code, it takes me to org.apache.solr.servlet.SolrRequestParsers.parseFormDataContent.
final long maxLength = ((long) uploadLimitKB) * 1024L;
if (totalLength > maxLength) {
	throw new SolrException(ErrorCode.BAD_REQUEST, "application/x-www-form-urlencoded content length (" +
		totalLength + " bytes) exceeds upload limit of " + uploadLimitKB + " KB");
}
Follow its call hierarchy, it finally takes me to org.apache.solr.servlet.SolrRequestParsers.SolrRequestParsers(Config)
public SolrRequestParsers( Config globalConfig ) {
  final int multipartUploadLimitKB, formUploadLimitKB;
	multipartUploadLimitKB = globalConfig.getInt( 
			"requestDispatcher/requestParsers/@multipartUploadLimitInKB", 2048 );
	
	formUploadLimitKB = globalConfig.getInt( 
			"requestDispatcher/requestParsers/@formdataUploadLimitInKB", 2048 );
  init(multipartUploadLimitKB, formUploadLimitKB);
}

Now it's obvious that Solr read parameter from requestDispatcher/requestParsers/@formdataUploadLimitInKB, if not set, will use use its default value: 2048KB: max size of form post body is 2048*1024 length.
The Solution The fix is to configure the formdataUploadLimitInKB, make it bigger.
<requestParsers enableRemoteStreaming="true" 
			multipartUploadLimitInKB="2048000" formdataUploadLimitInKB="2048000"  />
Verify the Solution
Now, to prove the change fixed the issue, I need first reproduce the issue without the formdataUploadLimitInKB change.
	public void testSolrJForm() throws IOException {
		CloseableHttpClient httpClient = HttpClientBuilder.create().build();

		HttpPost post = createUrlEncodePost();
		CloseableHttpResponse rsp = httpClient.execute(post);
		try {
			System.out.println(rsp.getStatusLine().getStatusCode());
			String rspStr = EntityUtils.toString(rsp.getEntity());
			System.out.println(rspStr);
		} finally {
			post.releaseConnection();
			rsp.close();
			httpClient.close();
		}
	}

	public HttpPost createUrlEncodePost() throws UnsupportedEncodingException {
		HttpPost post = new HttpPost(
				"http://localhost:8080/solr/update?commit=true");
		post.setHeader("Content-type", "application/x-www-form-urlencoded");
		String str = createBigString();
		StringBuilder xmlSb = new StringBuilder();
		xmlSb.append(
				"<add><doc><field name=\"contentid\">100</field><field name=\"content\">")
				.append(str).append("</field></doc></add>");

		List<NameValuePair> nameValuePairs = Lists.newArrayList();
		nameValuePairs.add(new BasicNameValuePair("stream.body", xmlSb
				.toString()));
		HttpEntity entity = new UrlEncodedFormEntity(nameValuePairs);
		post.setEntity(entity);
		return post;
	}

	public String createBigString() {
		char[] chars = new char[2048 * 1024 * 10];
		Arrays.fill(chars, 'a');
		String str = new String(chars);
		return str;
	}
It receives exception as expected.

Now make formdataUploadLimitInKB bigger in solrconfig.xml, restart solr server, rerun the test. 
Now it successfully upload the big SolrDocuemnt into Solr.

Now the problem solved.

Client application uses form-urlencoded to send solr xml doc, in post body, the key is stream.body, the value is the xml. 

This is kind of weird, we should set Content-type as application/xml and send the XML as http post body like below:
	public HttpPost createApllicationXMLPost()
			throws UnsupportedEncodingException {
		HttpPost post = new HttpPost(
				"http://localhost:8080/solr/update?commit=true");
		post.setHeader("Content-type", "application/xml");
		String str = createBigString();

		StringBuilder xmlSb = new StringBuilder();
		xmlSb.append(
				"<add><doc><field name=\"contentid\">100</field><field name=\"content\">")
				.append(str).append("</field></doc></add>");

		StringEntity entity = new StringEntity(xmlSb.toString());
		entity.setContentEncoding("UTF-8");
		post.setEntity(entity);

		return post;
	}

Java ==: String Intern and Integer Cache

Integer int ==

public void testIntegerCache() {
  Integer i = 123;
  Integer j = 123;
  
  assertTrue(i == j);
  
  Integer heapK = new Integer(123);
  assertFalse(i == heapK);
  // heapK is auto-unboxed to the primitive value 123
  assertTrue(123 == heapK);
  
  // when value is beteween [-128, 127], java returns from its constant cache
  assertTrue(Integer.valueOf("123") == Integer.valueOf("123"));
  
  Integer cacheL = Integer.valueOf("123");
  assertTrue(i == cacheL);
}

Integer and Long cache numbers between -128 and 127.
public static Integer valueOf(int i) {
    assert IntegerCache.high >= 127;
    if (i >= IntegerCache.low && i <= IntegerCache.high)
        return IntegerCache.cache[i + (-IntegerCache.low)];
    return new Integer(i);
}
String ==
public void testStringIntern() {
  String a = "123";
  String b = "123";
  
  // There is only one 123, stored in java string constant pool.
  assertTrue(a == b);
  
  String heapC = new String("123");
  assertFalse(a == heapC);
  
  String internD = new String("123").intern();
  assertTrue(a == internD);
  
  // Special case
  assertTrue(String.valueOf(Integer.MIN_VALUE) == String
      .valueOf(Integer.MIN_VALUE));
  assertFalse(String.valueOf(Integer.MAX_VALUE) == String
      .valueOf(Integer.MAX_VALUE));
}
Boolean boolean ==
public void testBoolean() {
  boolean a = true;
  Boolean b = new Boolean(true);
  // auto-unbox, here Boolean b is auto-unboxed to the primitive boolean true.
  assertTrue(a == b);
  
  Boolean c = Boolean.valueOf("1");
  // different object
  assertFalse(b == c);
  
  // Boolean.valueOf actually returns one of two static final instance: TRUE,
  // FALSE
  Boolean d = Boolean.valueOf("1");
  assertTrue(c == d);
}


Codility Training: Java Solution to TapeEquilibrium


A non-empty zero-indexed array A consisting of N integers is given. Array A represents numbers on a tape.
Any integer P, such that 0 < P < N, splits this tape into two non-empty parts: A[0], A[1], ..., A[P − 1] and A[P], A[P + 1], ..., A[N − 1].
The difference between the two parts is the value of: |(A[0] + A[1] + ... + A[P − 1]) − (A[P] + A[P + 1] + ... + A[N − 1])|
In other words, it is the absolute difference between the sum of the first part and the sum of the second part.
Write a function:
class Solution { public int solution(int[] A); }
that, given a non-empty zero-indexed array A of N integers, returns the minimal difference that can be achieved.
Complexity:
  • expected worst-case time complexity is O(N);
  • expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).
The Solution
The idea is simple: first scan the array to get its sum, then scan again to compute the difference between first part and second part.

We can use add accumulatively to get the sum from a[0] to a[i], to get the difference: diff = Math.abs(2 * prefixSum - sum);

/**
 * O(1) space
 */
public int solution(int[] A) {
  if (A == null)
    throw new IllegalArgumentException("Can't be null");
  if (A.length == 0) {
    return 0;
  }
  if (A.length == 1) {
    return Math.abs(A[0]);
  }

  int sum = 0;
  for (int i = 0; i < A.length; i++) {
    sum += A[i];
  }
  int minRst = Integer.MAX_VALUE;

  // ignore last one, as second array can't be empty
  int prefixSum = 0;
  for (int i = 0; i < A.length - 1; i++) {
    prefixSum += A[i];
    int diff = Math.abs(2 * prefixSum - sum);
    if (diff < minRst) {
      minRst = diff;
    }
  }

  return minRst;
}
This is my first solution, it takes O(n) extra space.
/**
 * O(n) space
 */
public int solution1(int[] A) {
  if (A == null)
    throw new IllegalArgumentException("Can't be null");
  if (A.length == 0) {
    return 0;
  }
  if (A.length == 1) {
    return Math.abs(A[0]);
  }

  int[] prefixSum = new int[A.length];

  prefixSum[0] = A[0];
  for (int i = 1; i < A.length; i++) {
    prefixSum[i] += prefixSum[i - 1] + A[i];
  }
  int sum = prefixSum[A.length - 1];
  int minRst = Integer.MAX_VALUE;

  // ignore last one, as second array can't be empty
  for (int i = 0; i < A.length - 1; i++) {
    int diff = Math.abs(2 * prefixSum[i] - sum);
    if (diff < minRst) {
      minRst = diff;
    }
  }

  return minRst;
}

Test Code
To take a real interview test in codility, it's important to add more test cases and run them in our own IDE before submit the solution.

public static void main(String[] args) {
  TapeEquilibrium instance = new TapeEquilibrium();

  Map<int[], Integer> inAns = new HashMap<int[], Integer>();
  // the input and answer from the origin question
  inAns.put(new int[] { 3, 1, 2, 4, 3 }, 1);

  inAns.put(new int[] { 1 }, 1);
  inAns.put(new int[] { 1, 3 }, 2);
  inAns.put(new int[] { 3, 1 }, 2);
  inAns.put(new int[] { -1 }, 1);
  inAns.put(new int[] { 1, -1 }, -1);

  for (Entry<int[], Integer> entry : inAns.entrySet()) {
    assertEquals(instance.solution(entry.getKey()), entry.getValue()
        .intValue());
  }
 }

Lesson Learned
Notice common bug pattern: < or <=, init value is 0 or 1, A.length - 1 or A.length etc.
Handle and test edge cases.
Analyze time and space complexity, and whether we can improve it.

Algorithm: Two Numbers Missing

When I read Harry He's book Coding Interviews: Questions, Analysis & Solutions
Question 39 Two numbers out of n numbers from 1 to n are missing. The remaining n-2 numbers are restored in an array, not in any particular order. Please write a method (or a function) to find the missing two numbers.

He uses another array with size 2n-2, of which the first n-2 numbers are copies of numbers in the original array and others are numbers from 1 to n. All numbers except two numbers, num1 and num2, occur twice in the new array, and num1 and num2 occur only once, and convert the problem to find two numbers occurring once in an array where others occur twice.

This uses O(n) extra space, also the code to "find two numbers occurring once in an array where others occur twice" itself is complex.

The Solution
One classic idea from programming pearls is to use array index to indicate whether element exists or not. Same approach can apply here.

We scan the array two times: 
At the first time, if current element is X, and X<array.length, then negative the item at index X-1. Also remember whether array.length + 1, array.length+2 exist in the array.

Then we scan the array again, if ith value is negative, it means value (i+1) exists, we also restore its original value by negative; if ith is positive, means i+1 is missing. If we array.length + 1 or array.length+2 is not found, add them into the missing items two.

The Code
public static List<Integer> finding2Missing1ToN(int[] array) {
 // use index i to indicate whether (i+1) exists
 // if current element is X, and X<array.length, then negative the iem at
 // index X-1.
 // Also remember whether array.length + 1, array.length+2 exist in the
 // array.
 boolean plus1Exist = false, plus2Exist = false;
 for (int i = 0; i < array.length; i++) {
  int curr = array[i];
  if (curr < 0) {
   curr = -curr; // or ~curr+1
  }
  if ((curr - 1) < array.length) {
   // convert arr[curr-1] to negative value
   array[curr - 1] = -array[curr - 1];
  } else {
   if (curr == (array.length + 1)) {
    plus1Exist = true;
   } else if (curr == (array.length + 2)) {
    plus2Exist = true;
   }
  }
 }

 // scan the array again, if ith value is negative, it means (i+1)
 // exists, if
 // ith is positive, means i+1 is missing. at the same time, we change
 // negative value to absolute value to revert to array's original status.
 List<Integer> missingItems = new ArrayList<>();
 for (int i = 0; i < array.length; i++) {
  int curr = array[i];
  if (curr < 0) {
   array[i] = -array[i];
  } else {
   // i+1 is missing
   missingItems.add(i + 1);
  }
 }

 if (!plus1Exist) {
  missingItems.add(array.length + 1);
 }
 if (!plus2Exist) {
  missingItems.add(array.length + 2);
 }
 return missingItems;
}
Test Code
public void testFinding2Missing1ToN() {
 int[] array = { 3, 5, 4 };
 assertEquals(Lists.newArrayList(1, 2), finding2Missing1ToN(array));

 array = new int[] { 1, 2, 3 };
 assertEquals(finding2Missing1ToN(array), Lists.newArrayList(4, 5));

 array = new int[] { 2, 3, 4 };
 assertEquals(Lists.newArrayList(1, 5), finding2Missing1ToN(array));
}


Eclipse Force Return Feature


Eclipse: Escape Text when Pasting into a String Literal

The Problem
I have to copy some multil-line text and save it into a String variable. 
Unfortunately, Java doesn't support heredoc or Multi-line String literals.

So if to do this manually, I have to add \r\n at the end of each line, and escape special characters such as ", \ etc.

To do this manually would be quite boring and time wasting.

The Solution
Luckily, Eclipse has a nice and less unknown feature which will escape text for us when pasting text into a string literal.

In Eclipse, go to Preferences > Java > Editor > Typing, check the last option: "Escape text when pasting into a string literal", now when we paste text into a string literal, Eclipse will add \r\n, and escape special characters for us. Is it good, right?

After I pasted the text, Eclipse converted and escaped the string  like below:
private static final String FORMAT_NEW_TYPE ="   <typeDescription>\r\n" + 
"    <name>%s</name>\r\n" + 
"    <description />\r\n" + 
"    <supertypeName>uima.tcas.Annotation</supertypeName>\r\n" + 
"    <features>\r\n" + 
"     <featureDescription>\r\n" + 
"      <name>confidence</name>\r\n" + 
"      <description />\r\n" + 
"      <rangeTypeName>uima.cas.Float</rangeTypeName>\r\n" + 
"     </featureDescription>\r\n" + 
"    </features>\r\n" + 
"   </typeDescription>";

Algorithms & Data Structures - Interview Preparation

Books
Elements of Programming Interviews and source codes from github

Cracking the Coding Interview: Gayle Laakmann McDowell
Programming Pearls (2nd Edition): Jon Bentley
The Algorithm Design Manual: Steven S. Skiena
The Art of Computer Programming

How to Think About Algorithms
Data Structures And Algorithms Made Easy In Java 2nd Edition 2nd Edition

LeetCode

http://leetcode.com/
Solutions for all LeetCode Questions in Darren's Blog
http://www.darrensunny.me/

geeksforgeeks
http://www.geeksforgeeks.org/

programcreek
http://www.programcreek.com/category/java-2/algorithms/

http://en.wikipedia.org/wiki/List_of_algorithms

Bit Twiddling Hacks
http://graphics.stanford.edu/~seander/bithacks.html
Hacker's Delight
http://www.hackersdelight.org/

Exercise & Quiz
http://programmingpraxis.com/
http://geeksquiz.com/

Others

http://codercareer.blogspot.com/
http://www.crazyforcode.com/algorithm/
http://tech-queries.blogspot.com/
http://www.algorithmist.com/index.php/Main_Page
https://www.hackerrank.com/categories/algorithms/bit-manipulation

Puzzles, Brainteaser& Logic Thinking
http://www.mytechinterviews.com/
http://puzzlersworld.com/puzzles/interview-questions/
http://www.programmerinterview.com/index.php/puzzles/
http://www.cartalk.com/content/puzzlers
http://www.techinterview.org/

Java Regex Practice: Using Backreferences and Named Group

Recently, I need write one regular expression to validate SSN for our UIMA related project.
I found this article Validating Social Security Numbers through Regular Expressions
^(?!219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$
^(?!219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}$

The regex I need is a little different: the ssn must be separated by dash or one space.

Using Backreferences
The first separator can be dash or space, but the second separator has to be same. This is exactly where we should use backreferences, which repeats the text previous pattern matches.
(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}([- ])(?!00)\\d{2}\\1(?!0{4})(\\d{4})
\\1 here is to repeat the previous text matched bt pattern: ([- ])

Using Named Group
When the regex is complex, it's hard to figure out what \\1 refers to. Also we may change the regex in future, and may point \\1 to a different regex pattern by mistake.

We can use named group feature which is added in Java 7 to improve the regex's readability.
(1) (?X<NAME>) to define a named group NAME"  
(2) \\k<NAME>to backref a named group "NAME" 
(3) <$<NAME>to reference to captured group in matcher's replacement str 
(4) group(String NAME) to return the captured input subsequence by the given "named group"

The final regex is like below:
(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}(?<SEP>[- ])(?!00)\\d{2}\\k<SEP>(?!0{4})(\\d{4})
(?<SEP>[- ]) gives the pattern a name: SEP, \\k<SEP> refers to it.

Java Code
@Test
  public void regexBackReference() {
    String pStr = "(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}([- ])(?!00)\\d{2}\\1(?!0{4})(\\d{4})";
    testSSNRegex(pStr);
  }
  
  @Test
  public void regexBackNamedGroupReference() {
    String pStr = "(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}(?<SEP>[- ])(?!00)\\d{2}\\k<SEP>(?!0{4})(\\d{4})";
    testSSNRegex(pStr);
  }
  
  public void testSSNRegex(String pStr) {
    Matcher m = Pattern.compile(pStr).matcher("123 45 6789");
    if (m.matches()) {
      System.out.println(m.group());
    } else {
      fail("Didn't find match");
    }
    m = Pattern.compile(pStr).matcher("123-45-6789");
    if (m.matches()) {
      System.out.println(m.group());
    } else {
      fail("Didn't find match");
    }
    
    m = Pattern.compile(pStr).matcher("123-456789");
    if (m.matches()) {
      fail("Should not find match");
    }
  }
References
Validating Social Security Numbers through Regular Expressions
Named Capturing Group in JDK7 RegEx

Algorithm Interview: Bit Convertision Between Two Integers

Interview Question:
Write a Java method that will return the number of bits that will need to be changed in order to convert an integer, X, into another integer, Y and vice versa. The method should accept two different integers as input.

Answer:
The key here is to use XOR operation z = x ^ y; to get the bit difference between them, then count the bit set in the result z.

public static int bitConvertBetweenTwoTintegers(int x, int y) {
  // XOR x and y to get the bit difference between x and y
  int z = x ^ y;
  return numberofBits(z);
}

public static int numberofBits(int x) {
  int n = 0;
  while (x != 0) {
    n = n + 1;
    // clear the least significant bit set
    x = x & (x - 1);
  }
  return n;
}

Labels

Java (159) Lucene-Solr (112) Interview (61) All (58) J2SE (53) Algorithm (45) Soft Skills (38) Eclipse (33) Code Example (31) Linux (25) JavaScript (23) Spring (22) Windows (22) Web Development (20) Tools (19) Nutch2 (18) Bugs (17) Debug (16) Defects (14) Text Mining (14) J2EE (13) Network (13) Troubleshooting (13) PowerShell (11) Chrome (9) Design (9) How to (9) Learning code (9) Performance (9) Problem Solving (9) UIMA (9) html (9) Http Client (8) Maven (8) Security (8) bat (8) blogger (8) Big Data (7) Continuous Integration (7) Google (7) Guava (7) JSON (7) Shell (7) ANT (6) Coding Skills (6) Database (6) Lesson Learned (6) Programmer Skills (6) Scala (6) Tips (6) css (6) Algorithm Series (5) Cache (5) Dynamic Languages (5) IDE (5) System Design (5) adsense (5) xml (5) AIX (4) Code Quality (4) GAE (4) Git (4) Good Programming Practices (4) Jackson (4) Memory Usage (4) Miscs (4) OpenNLP (4) Project Managment (4) Spark (4) Testing (4) ads (4) regular-expression (4) Android (3) Apache Spark (3) Become a Better You (3) Concurrency (3) Eclipse RCP (3) English (3) Happy Hacking (3) IBM (3) J2SE Knowledge Series (3) JAX-RS (3) Jetty (3) Restful Web Service (3) Script (3) regex (3) seo (3) .Net (2) Android Studio (2) Apache (2) Apache Procrun (2) Architecture (2) Batch (2) Bit Operation (2) Build (2) Building Scalable Web Sites (2) C# (2) C/C++ (2) CSV (2) Career (2) Cassandra (2) Distributed (2) Fiddler (2) Firefox (2) Google Drive (2) Gson (2) How to Interview (2) Html Parser (2) Http (2) Image Tools (2) JQuery (2) Jersey (2) LDAP (2) Life (2) Logging (2) Python (2) Software Issues (2) Storage (2) Text Search (2) xml parser (2) AOP (1) Application Design (1) AspectJ (1) Chrome DevTools (1) Cloud (1) Codility (1) Data Mining (1) Data Structure (1) ExceptionUtils (1) Exif (1) Feature Request (1) FindBugs (1) Greasemonkey (1) HTML5 (1) Httpd (1) I18N (1) IBM Java Thread Dump Analyzer (1) JDK Source Code (1) JDK8 (1) JMX (1) Lazy Developer (1) Mac (1) Machine Learning (1) Mobile (1) My Plan for 2010 (1) Netbeans (1) Notes (1) Operating System (1) Perl (1) Problems (1) Product Architecture (1) Programming Life (1) Quality (1) Redhat (1) Redis (1) Review (1) RxJava (1) Solutions logs (1) Team Management (1) Thread Dump Analyzer (1) Visualization (1) boilerpipe (1) htm (1) ongoing (1) procrun (1) rss (1)

Popular Posts