Deploying Hadoop and Solr with Docker

Docker is an open platform for building, shipping, and running distributed applications. Many Docker images are available, built on different operating systems and bundled with applications such as Hadoop or MongoDB.

When we want to learn a tool or give it a try, we can just call docker run with the specific image, for example: docker run --name some-mongo -d mongo
This will not mess up our host environment. When we are done, we can just call docker kill to stop the running container.
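
The try-and-discard cycle described above can be wrapped in two small shell helpers. This is only a sketch: the function names try_image and discard are made up for illustration, and it assumes the docker CLI is on the PATH and that the image's default command keeps the container running.

```shell
# try_image IMAGE NAME: start a throwaway container in the background.
try_image() {
  docker run --name "$2" -d "$1"
}

# discard NAME: stop the container, then delete it so nothing is left behind.
# docker kill only stops the container; docker rm removes the stopped one.
discard() {
  docker kill "$1" && docker rm "$1"
}
```

For example, try_image mongo some-mongo starts the container from the command above, and discard some-mongo cleans it up afterwards.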

We can also use Docker to create a consistent environment that can be run on any Docker-enabled machine.

In this article, I would like to show how to run Hadoop and Solr in Docker.

Install the Hadoop Image and Run It
Search for Hadoop in the Docker registry (https://registry.hub.docker.com). I chose the most popular image, sequenceiq/hadoop-docker.
Then I ran this command on my Ubuntu host:
docker run -i -t sequenceiq/hadoop-docker /etc/bootstrap.sh -bash

This downloads the hadoop-docker image and starts it. After several minutes, we get a bash prompt inside the hadoop-docker container.

Install Solr in the Hadoop Container
Run the following commands to download Solr 4.10.1 (the latest release when this was written) and unpack it.
mkdir -p /home/lifelongprogrammer/src/solr; cd /home/lifelongprogrammer/src/solr
curl -O http://mirrors.advancedhosters.com/apache/lucene/solr/4.10.1/solr-4.10.1.tgz
tar -xf solr-4.10.1.tgz
cd /home/lifelongprogrammer/src/solr/solr-4.10.1/example
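
Mirrors tend to drop old releases, so the download step above may stop working over time. The sketch below builds the URL from a version number instead. It assumes the Apache archive layout (archive.apache.org/dist/lucene/solr/VERSION/), which is not the mirror used in this article, and the helper names solr_url and fetch_solr are hypothetical.

```shell
# solr_url VERSION: print the download URL for a Solr release tarball.
# Assumption: the Apache archive layout dist/lucene/solr/<version>/.
solr_url() {
  printf 'https://archive.apache.org/dist/lucene/solr/%s/solr-%s.tgz\n' "$1" "$1"
}

# fetch_solr VERSION DIR: download and unpack that release under DIR.
# The subshell keeps the caller's working directory unchanged.
fetch_solr() {
  mkdir -p "$2" || return 1
  (
    cd "$2" || exit 1
    curl -fLO "$(solr_url "$1")" && tar -xf "solr-$1.tgz"
  )
}
```

For example, fetch_solr 4.10.1 /home/lifelongprogrammer/src/solr would repeat the steps above against the archive.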

Then run the following command to start Solr on HDFS, listening on the default port 8983.
java -Dsolr.directoryFactory=HdfsDirectoryFactory \
     -Dsolr.lock.type=hdfs \
     -Dsolr.data.dir=hdfs://$(hostname):9000/solr/datadir \
     -Dsolr.updatelog=hdfs://$(hostname):9000/solr/updateLog -jar start.jar
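
Both HDFS paths in the command above share the same NameNode address, so it can help to build the options in one place. The helper below is a sketch with a made-up name, solr_hdfs_opts; it assumes the NameNode listens on port 9000 unless a second argument says otherwise.

```shell
# solr_hdfs_opts HOST [PORT]: print the -D options that point Solr at HDFS,
# one per line. PORT defaults to 9000, the port used in this article.
solr_hdfs_opts() {
  base="hdfs://$1:${2:-9000}/solr"
  printf '%s\n' \
    "-Dsolr.directoryFactory=HdfsDirectoryFactory" \
    "-Dsolr.lock.type=hdfs" \
    "-Dsolr.data.dir=$base/datadir" \
    "-Dsolr.updatelog=$base/updateLog"
}
```

It could then be used as java $(solr_hdfs_opts "$(hostname)") -jar start.jar, leaving the command substitution unquoted on purpose so that each option becomes a separate argument.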

Run Solr in the Background on Startup
Edit /etc/bootstrap.sh and add the following command after the line that calls $HADOOP_PREFIX/sbin/start-yarn.sh:
cd /home/lifelongprogrammer/src/solr/solr-4.10.1/example && nohup java -Dsolr.directoryFactory=HdfsDirectoryFactory \
   -Dsolr.lock.type=hdfs \
   -Dsolr.data.dir=hdfs://$(hostname):9000/solr/datadir \
   -Dsolr.updatelog=hdfs://$(hostname):9000/solr/updateLog -jar start.jar &
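
If you prefer not to edit the file by hand, the same change can be scripted. This is a sketch, assuming the script contains exactly one line mentioning start-yarn.sh; the function name add_after_yarn is hypothetical.

```shell
# add_after_yarn FILE SNIPPET: insert SNIPPET on a new line after every
# line of FILE that mentions start-yarn.sh.
add_after_yarn() {
  target=$1
  tmp=$(mktemp) || return 1
  # Pass the snippet through the environment so awk prints it verbatim.
  SNIPPET=$2 awk '{ print } /start-yarn\.sh/ { print ENVIRON["SNIPPET"] }' \
    "$target" > "$tmp" &&
    cat "$tmp" > "$target"   # overwrite in place, keeping file permissions
  status=$?
  rm -f "$tmp"
  return $status
}
```

Calling add_after_yarn /etc/bootstrap.sh with the Solr startup command as the second argument (in single quotes, so that $(hostname) is evaluated at container startup rather than now) has the same effect as the manual edit.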

Commit Changes and Create a Docker Image
First, run docker ps in another terminal on the host to get the container ID:
CONTAINER ID        IMAGE 
2cd8fadba668        93186936bee2

Then let's commit the changes and create our own Docker image:
docker commit 2cd8fadba668   hadoop_docker_withsolr
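
Copying the container ID by hand can be avoided. The sketch below uses docker ps -l -q, which prints only the ID of the most recently created container. It assumes the Hadoop container is the last one you created; if you have started other containers since, pass the ID explicitly as shown above. The function name commit_latest is made up for illustration.

```shell
# commit_latest IMAGE_NAME: snapshot the most recently created container
# as a new image called IMAGE_NAME.
commit_latest() {
  id=$(docker ps -l -q)
  [ -n "$id" ] || { echo "no container found" >&2; return 1; }
  docker commit "$id" "$1"
}
```

For example: commit_latest hadoop_docker_withsolr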

Run exit in the open container shell to leave it. Then run:
docker run -d -t -p 8983:8983 hadoop_docker_withsolr /etc/bootstrap.sh -d

The first -d tells Docker to run the container in detached mode, and -p tells Docker to publish a container's port to the host.
The last -d is an argument to /etc/bootstrap.sh.

After several minutes, we can open http://linuxhostip:8983/solr/#/ to reach the Solr admin page. Solr is now running in a container based on the Hadoop Docker image.
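
Rather than guessing how long "several minutes" is, we can poll the port until Solr answers. This is a minimal sketch: the name wait_for_solr, the 5-second interval, and the default of 60 attempts are all arbitrary choices, not something the image provides.

```shell
# wait_for_solr URL [TRIES]: request URL every 5 seconds until it responds
# successfully. Returns 0 on success, 1 after TRIES failed attempts.
wait_for_solr() {
  tries=${2:-60}
  while [ "$tries" -gt 0 ]; do
    # -s: quiet, -f: treat HTTP errors as failure, -o: discard the body
    curl -sf -o /dev/null "$1" && return 0
    tries=$((tries - 1))
    sleep 5
  done
  return 1
}
```

For example: wait_for_solr http://linuxhostip:8983/solr/ && echo "Solr is up"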

After we are done with our test, we run docker ps to get the container ID, then call docker kill $container_id to stop it.

Persist Modified Image
Now let's save our modified docker image:
docker save hadoop_docker_withsolr  > hadoop_docker_withsolr_save.tar

Now we can copy this tar to another machine, and load it:
docker load < hadoop_docker_withsolr_save.tar

