Solr: form-urlencoded content length exceeds upload limit

The Problem:
Our Solr client application (.NET) received the following exception:
<lst name="error"><str name="msg">application/x-www-form-urlencoded content length (20971671 bytes) exceeds upload limit of 2048 KB</str><int name="code">400</int></lst>

Judging from the error message, the exception is thrown by Solr itself. Searching the Solr source for "exceeds upload limit of" leads to org.apache.solr.servlet.SolrRequestParsers.parseFormDataContent:
final long maxLength = ((long) uploadLimitKB) * 1024L;
if (totalLength > maxLength) {
	throw new SolrException(ErrorCode.BAD_REQUEST, "application/x-www-form-urlencoded content length (" +
		totalLength + " bytes) exceeds upload limit of " + uploadLimitKB + " KB");
}
Following its call hierarchy eventually leads to the constructor org.apache.solr.servlet.SolrRequestParsers.SolrRequestParsers(Config):
public SolrRequestParsers( Config globalConfig ) {
  final int multipartUploadLimitKB, formUploadLimitKB;
  multipartUploadLimitKB = globalConfig.getInt(
      "requestDispatcher/requestParsers/@multipartUploadLimitInKB", 2048 );

  formUploadLimitKB = globalConfig.getInt(
      "requestDispatcher/requestParsers/@formdataUploadLimitInKB", 2048 );
  init(multipartUploadLimitKB, formUploadLimitKB);
}

So Solr reads the limit from requestDispatcher/requestParsers/@formdataUploadLimitInKB. If it is not set, Solr falls back to the default of 2048 KB, i.e. a maximum form POST body of 2048 * 1024 = 2,097,152 bytes. Our request body was 20,971,671 bytes, about ten times that.
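To make the arithmetic concrete, here is a minimal Java sketch that mirrors the size check in parseFormDataContent. The class and method names are hypothetical and this is not Solr code; it only reproduces the `totalLength > uploadLimitKB * 1024` comparison shown above. It confirms that the 20,971,671-byte request is rejected under the 2048 KB default and computes the smallest whole-KB limit that would accept it.

```java
// Hypothetical helper class (not part of Solr) mirroring the check in
// SolrRequestParsers.parseFormDataContent: reject when totalLength > uploadLimitKB * 1024.
final class FormUploadLimit {

    // Solr's fallback when formdataUploadLimitInKB is not configured.
    static final int DEFAULT_LIMIT_KB = 2048;

    // Convert the configured limit in KB to bytes, using long arithmetic like Solr does.
    static long maxLengthBytes(int uploadLimitKB) {
        return ((long) uploadLimitKB) * 1024L;
    }

    // True when a form body of totalLength bytes would be rejected with HTTP 400.
    static boolean exceedsLimit(long totalLength, int uploadLimitKB) {
        return totalLength > maxLengthBytes(uploadLimitKB);
    }

    // Smallest whole-KB limit that accepts a body of totalLength bytes (ceiling division).
    static long minLimitKB(long totalLength) {
        return (totalLength + 1023L) / 1024L;
    }

    public static void main(String[] args) {
        long totalLength = 20971671L; // size reported in the error message
        System.out.println(maxLengthBytes(DEFAULT_LIMIT_KB));            // 2097152
        System.out.println(exceedsLimit(totalLength, DEFAULT_LIMIT_KB)); // true
        System.out.println(minLimitKB(totalLength));                     // 20481
        System.out.println(exceedsLimit(totalLength, 2048000));          // false
    }
}
```

In other words, any formdataUploadLimitInKB of at least 20481 would have accepted this particular request; a larger value simply leaves headroom for bigger documents.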
The Solution:
The fix is to set formdataUploadLimitInKB to a larger value on the requestParsers element in solrconfig.xml:
<requestParsers enableRemoteStreaming="true"
                multipartUploadLimitInKB="2048000" formdataUploadLimitInKB="2048000" />
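As the constructor shows, Solr resolves this attribute at requestDispatcher/requestParsers/@formdataUploadLimitInKB relative to the config root and falls back to 2048 when it is absent. The sketch below reproduces that lookup with the JDK's own XPath API rather than Solr's Config class. The class name is hypothetical, and it assumes the usual solrconfig.xml layout with a `<config>` root element. Note that 2048000 KB is 2,097,152,000 bytes, roughly 2 GB.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical reader (not Solr's Config class) that looks up formdataUploadLimitInKB
// the same way the constructor does: attribute value if present, otherwise 2048.
final class FormLimitConfigReader {

    static final String XPATH =
        "/config/requestDispatcher/requestParsers/@formdataUploadLimitInKB";

    static int formUploadLimitKB(String solrConfigXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(solrConfigXml.getBytes(StandardCharsets.UTF_8)));
            // XPath.evaluate returns an empty string when the attribute does not exist
            String value = XPathFactory.newInstance().newXPath().evaluate(XPATH, doc);
            return value.isEmpty() ? 2048 : Integer.parseInt(value.trim());
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read solrconfig.xml content", e);
        }
    }

    public static void main(String[] args) {
        String tuned = "<config><requestDispatcher>"
            + "<requestParsers enableRemoteStreaming=\"true\""
            + " multipartUploadLimitInKB=\"2048000\" formdataUploadLimitInKB=\"2048000\"/>"
            + "</requestDispatcher></config>";
        String untuned = "<config><requestDispatcher><requestParsers/></requestDispatcher></config>";
        System.out.println(formUploadLimitKB(tuned));   // 2048000
        System.out.println(formUploadLimitKB(untuned)); // 2048
    }
}
```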
Verify the Solution:
To prove the change fixes the issue, I first need to reproduce the issue without the formdataUploadLimitInKB change.
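import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

// Hypothetical class name: the methods below are what matter.
public class SolrFormUploadTest {

	public void testSolrJForm() throws IOException {
		// try-with-resources closes the response and the client even if execute() throws
		try (CloseableHttpClient httpClient = HttpClientBuilder.create().build();
				CloseableHttpResponse rsp = httpClient.execute(createUrlEncodePost())) {
			// Before the config change this prints 400 and the "exceeds upload limit" error
			System.out.println(rsp.getStatusLine().getStatusCode());
			System.out.println(EntityUtils.toString(rsp.getEntity()));
		}
	}

	public HttpPost createUrlEncodePost() throws UnsupportedEncodingException {
		HttpPost post = new HttpPost(
				"http://localhost:8080/solr/update?commit=true");
		post.setHeader("Content-type", "application/x-www-form-urlencoded");
		String str = createBigString();
		// Solr XML update message: one document with a very large "content" field
		StringBuilder xmlSb = new StringBuilder();
		xmlSb.append(
				"<add><doc><field name=\"contentid\">100</field><field name=\"content\">")
				.append(str).append("</field></doc></add>");

		// Send the XML as the value of the stream.body form parameter
		List<NameValuePair> nameValuePairs = new ArrayList<>();
		nameValuePairs.add(new BasicNameValuePair("stream.body", xmlSb.toString()));
		HttpEntity entity = new UrlEncodedFormEntity(nameValuePairs);
		post.setEntity(entity);
		return post;
	}

	// 2048 * 1024 * 10 = 20,971,520 characters, ten times the default 2048 KB limit
	public String createBigString() {
		char[] chars = new char[2048 * 1024 * 10];
		Arrays.fill(chars, 'a');
		return new String(chars);
	}
}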
It fails with the exception, as expected.

Next, I increased formdataUploadLimitInKB in solrconfig.xml, restarted the Solr server, and reran the test.
This time the large SolrDocument was uploaded into Solr successfully.

The problem is solved.

Note that our client application uses form-urlencoded encoding to send the Solr XML document: in the POST body, the key is stream.body and the value is the XML.
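Form encoding also inflates the payload, because every markup character in the XML is percent-encoded. The sketch below rebuilds the same stream.body form body as createUrlEncodePost, using java.net.URLEncoder (Java 10+ for the Charset overload) in place of HttpClient's UrlEncodedFormEntity. For these characters the two are assumed to produce the same output. The class and method names are hypothetical. The XML wrapper plus the "stream.body=" key adds exactly 151 bytes, so with 2048 * 1024 * 10 = 20,971,520 content characters the body is 20,971,671 bytes, the figure in the error message.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that rebuilds the form body sent by createUrlEncodePost().
final class FormBodySize {

    static String formBody(int contentChars) {
        StringBuilder content = new StringBuilder(contentChars);
        for (int i = 0; i < contentChars; i++) {
            content.append('a');
        }
        String xml = "<add><doc><field name=\"contentid\">100</field><field name=\"content\">"
            + content + "</field></doc></add>";
        // '<', '>', '/', '"' and '=' become %XX, spaces become '+', and 'a' is left unchanged
        return "stream.body=" + URLEncoder.encode(xml, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(formBody(0));
        System.out.println(formBody(0).length());  // 151
        System.out.println(formBody(10).length()); // 161
    }
}
```

Since each 'a' is one byte and is not encoded, the body length grows one-for-one with the content, which is why the overhead is a constant 151 bytes.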

This is a rather odd approach. Instead, we should set the Content-Type to application/xml and send the XML directly as the HTTP POST body, as below:
	// Additional imports: org.apache.http.entity.ContentType, org.apache.http.entity.StringEntity,
	// java.nio.charset.StandardCharsets
	public HttpPost createApplicationXMLPost() {
		HttpPost post = new HttpPost(
				"http://localhost:8080/solr/update?commit=true");
		String str = createBigString();

		StringBuilder xmlSb = new StringBuilder();
		xmlSb.append(
				"<add><doc><field name=\"contentid\">100</field><field name=\"content\">")
				.append(str).append("</field></doc></add>");

		// The entity sets "Content-Type: application/xml; charset=UTF-8" and encodes the body as UTF-8.
		// (setContentEncoding() would set the Content-Encoding header, which is meant for gzip etc.)
		StringEntity entity = new StringEntity(xmlSb.toString(),
				ContentType.create("application/xml", StandardCharsets.UTF_8));
		post.setEntity(entity);

		return post;
	}
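For comparison, the same raw-XML POST can be built without Apache HttpClient. This is a sketch assuming Java 11+ and the JDK's java.net.http client; the class name is hypothetical. It reuses the update URL and XML shape from the examples above, and it only builds the request. Sending it is shown in a comment.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical builder using the JDK 11+ HTTP client instead of Apache HttpClient.
final class XmlUpdateRequest {

    static HttpRequest build(String updateXml) {
        return HttpRequest.newBuilder(URI.create("http://localhost:8080/solr/update?commit=true"))
            // Declare the body as XML encoded in UTF-8
            .header("Content-Type", "application/xml; charset=UTF-8")
            // The XML itself is the POST body; no stream.body form parameter is involved
            .POST(HttpRequest.BodyPublishers.ofString(updateXml, StandardCharsets.UTF_8))
            .build();
    }

    public static void main(String[] args) {
        String xml = "<add><doc><field name=\"contentid\">100</field>"
            + "<field name=\"content\">aaaa</field></doc></add>";
        HttpRequest request = build(xml);
        System.out.println(request.method()); // POST
        System.out.println(request.bodyPublisher().get().contentLength());
        // To actually send it:
        // java.net.http.HttpClient.newHttpClient()
        //     .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```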