Docker is an open platform for building, shipping, and running distributed applications. There are many Docker images based on different operating systems and bundled with different applications, such as Hadoop or MongoDB.
When we want to learn a tool or give it a try, we can just call docker run with the specific image, for example: docker run --name some-mongo -d mongo
This will not mess up our host environment; when we are done, we can just call docker kill to kill the running container.
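The whole try-and-dispose cycle, using the MongoDB example above, looks like this (the image and container names match the example; the docker rm step to reclaim disk space is an extra convenience, not required):

```shell
# Start a disposable MongoDB container in detached mode
docker run --name some-mongo -d mongo

# ... experiment against it ...

# Done: kill the running container, then remove it to reclaim disk space.
# The host environment is untouched throughout.
docker kill some-mongo
docker rm some-mongo
```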
We can also use Docker to create a consistent environment which can be run on any Docker-enabled machine.
In this article, I would like to show how to run Hadoop and Solr in Docker.
Install the Hadoop Image and Run It
Search for Hadoop in the Docker registry (https://registry.hub.docker.com); I chose the most popular image, sequenceiq/hadoop-docker.
Run the following command on the Ubuntu host:
docker run -i -t sequenceiq/hadoop-docker /etc/bootstrap.sh -bash
This will download the hadoop-docker image and start it. After several minutes, it will drop us into a bash shell inside the hadoop-docker container.
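Before installing anything, it is worth confirming the cluster is actually up. A quick sanity check inside the container's bash shell (the $HADOOP_PREFIX variable and paths are what the sequenceiq image sets up; adjust if yours differs):

```shell
# List the running Java daemons; a healthy single-node setup shows
# NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager
jps

# Try a trivial HDFS operation to confirm the filesystem answers
$HADOOP_PREFIX/bin/hdfs dfs -ls /
```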
Install Solr in the Hadoop Container
Run the following commands; they will download Solr 4.10.1 (the latest release at the time of writing) and unpack it.
mkdir -p /home/lifelongprogrammer/src/solr; cd /home/lifelongprogrammer/src/solr
curl -O http://mirrors.advancedhosters.com/apache/lucene/solr/4.10.1/solr-4.10.1.tgz
tar -xf solr-4.10.1.tgz
cd /home/lifelongprogrammer/src/solr/solr-4.10.1/example
Then run the following command; it will start Solr on HDFS with the default port 8983.
java -Dsolr.directoryFactory=HdfsDirectoryFactory \
-Dsolr.lock.type=hdfs \
-Dsolr.data.dir=hdfs://$(hostname):9000/solr/datadir \
-Dsolr.updatelog=hdfs://$(hostname):9000/solr/updateLog -jar start.jar
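Once Solr is up, we can verify from another shell in the container that it answers and that its index really lives on HDFS. (collection1 is the default core shipped with the Solr 4.x example distribution; the ping URL below assumes that default.)

```shell
# Ping the default example core; a working instance reports status OK
curl "http://localhost:8983/solr/collection1/admin/ping?wt=json"

# Confirm Solr created its data directory on HDFS rather than local disk
$HADOOP_PREFIX/bin/hdfs dfs -ls /solr/datadir
```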
Run Solr in the Background on Startup
Edit /etc/bootstrap.sh, and add the following commands after the $HADOOP_PREFIX/sbin/start-yarn.sh line:
cd /home/lifelongprogrammer/src/solr/solr-4.10.1/example && nohup java -Dsolr.directoryFactory=HdfsDirectoryFactory \
-Dsolr.lock.type=hdfs \
-Dsolr.data.dir=hdfs://$(hostname):9000/solr/datadir \
-Dsolr.updatelog=hdfs://$(hostname):9000/solr/updateLog -jar start.jar &
Commit Changes and Create Our Own Docker Image
First run docker ps to get the container id:
CONTAINER ID        IMAGE
2cd8fadba668        93186936bee2
Then let's commit the change and create our own Docker image:
docker commit 2cd8fadba668 hadoop_docker_withsolr
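To confirm the commit worked, list the local images; the new name should appear (filtering with grep is just a convenience):

```shell
# The newly committed image should show up alongside the base image
docker images | grep hadoop_docker_withsolr
```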
Run exit in the opened bash shell to log out of the container. Then run:
docker run -d -t -p 8983:8983 hadoop_docker_withsolr /etc/bootstrap.sh -d
The first -d tells Docker to run the image in detached mode, and -p tells Docker to publish a container's port to the host.
The last -d is a parameter passed to /etc/bootstrap.sh.
After several minutes, we can open http://linuxhostip:8983/solr/#/ to reach the Solr admin page. Solr is now running inside the Hadoop Docker container.
After we are done with our test, we run docker ps to get its container id, then call docker kill $container_id to kill it.
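Looking up the container id by hand gets tedious; a small shell lookup can do it in one go (the grep/awk pipeline over docker ps output is my own convenience, not part of the original steps):

```shell
# Find the container started from our image and kill it
container_id=$(docker ps | awk '/hadoop_docker_withsolr/ {print $1}')
docker kill "$container_id"
```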
Persist the Modified Image
Now let's save our modified docker image:
docker save hadoop_docker_withsolr > hadoop_docker_withsolr_save.tar
Now we can copy this tar to another machine, and load it:
docker load < hadoop_docker_withsolr_save.tar
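Briefly: save/load operate on images and preserve layers, tags, and history, while export/import operate on containers and flatten the filesystem into a single snapshot. Side by side (the container id is whatever docker ps reports; the hadoop_docker_flat name is just an example):

```shell
# Image round-trip: keeps layers, tags and metadata
docker save hadoop_docker_withsolr > hadoop_docker_withsolr_save.tar
docker load < hadoop_docker_withsolr_save.tar

# Container round-trip: a flattened filesystem snapshot, no history
docker export $container_id > container_fs.tar
docker import container_fs.tar hadoop_docker_flat
```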
Check this page for the difference between save and export in Docker.