I recently answered a question on Quora about Dockerizing a data analytics application, which sparked the idea of starting a series of posts on Docker. This is the first post in that series.
Unlike a traditional series, I will touch only briefly on preliminary details about Docker and its purpose, and instead dive straight into the design and architecture aspects of dockerizing a system.
First things first
Docker is a very lightweight application engine that deploys VM-like containers which share system-level resources, allowing easy deployment and multi-tenancy. Each container has its own network and process space, as well as a layered union mount file system. Docker is written in Go. The three main components are:
· the docker client,
· the docker daemon or server (REST API),
· docker containers.
How big is the docker container?
The docker container rootfs (loosely speaking, the operating system FS layer) and tmpfs can be of any size depending on the service you are dockerizing. A small Python app can be under 1 MB, while a full-blown service can range around 16 GB, or whatever it is configured to be.
The docker image can be a few hundred megabytes, if that is what you are asking. Launching a container from an image usually takes a fraction of a second.
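You can check these numbers yourself on any host with docker installed; a rough sketch (the image name here is an illustrative assumption):

```shell
# List local images with their sizes
docker images

# Time how long it takes to start a container from an image
# (an official small base image is used here as an example)
time docker run --rm alpine echo "hello"
```
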
Can I put a data analytics product in it?
Like I said, yes, you can. But let's break the problem down here.
· Services - Docker centers around Service Oriented Architecture, or SOA. How will you reorganize your application into micro-level, self-sufficient services that can communicate with each other? Let's say you have a web app. You need the web engine server (WARs) to be dockerized, and there are plenty of examples on the internet showing how to do this. You surely have a database instance, and that can be in a container. Then say you have a few daemons running for something - you need to make a call on where to put them. In short, the key design principle is to identify the services to dockerize. Then maybe start by writing your own Dockerfile for one component and get the ball rolling from there.
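As a starting point, a Dockerfile for the web-engine service might look like the sketch below. The base image, WAR filename, and port are illustrative assumptions, not part of any particular product:

```dockerfile
# Hypothetical Dockerfile for the WAR-serving component of the app.
# Start from an official Tomcat base image.
FROM tomcat:9-jre11

# Copy the pre-built application archive into Tomcat's webapps directory
# (the WAR name is an assumption for illustration)
COPY target/analytics-web.war /usr/local/tomcat/webapps/ROOT.war

# Tomcat listens on 8080 inside the container
EXPOSE 8080
```

Build it with `docker build -t analytics-web .` and you have your first dockerized service.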
· Networking - Docker solves port conflicts in multi-tenancy by dynamically mapping ports. Each docker container has configurable, statically mapped ports exposed to the user that map to physical ports on the system (a process abstracted by docker). Docker containers are also assigned IPs that are not discoverable outside the host. When services are not colocated on the same host, you might also need the host IPs, or you can configure the docker containers with unique, discoverable IPs.
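The port mapping described above can be sketched with the docker CLI; image and container names here are illustrative assumptions:

```shell
# Two tenants of the same image: each maps container port 8080
# to a different host port, avoiding the conflict
docker run -d -p 8081:8080 --name web-a analytics-web
docker run -d -p 8082:8080 --name web-b analytics-web

# Ask docker which host port a container's port was mapped to
docker port web-a 8080

# Or let docker pick a free host port dynamically
docker run -d -P --name web-c analytics-web
```
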
· The data - Docker does not work with all filesystems, so depending on how your files and other data are stored, it might become a tall order. But in general, you can expose a volume on the host, or even dockerize a volume, and make it available to dockerized services. So yes, data can be ported too.
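Both approaches to volumes look roughly like this; the paths, volume name, and image name are assumptions for illustration:

```shell
# Bind-mount a host directory into the container
docker run -d -v /data/analytics:/var/lib/analytics analytics-web

# Or create a named, docker-managed volume and share it between services
docker volume create analytics-data
docker run -d -v analytics-data:/var/lib/analytics analytics-web
```

A named volume outlives any single container, which is what lets several dockerized services share the same data.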
· Handling - You might want a resource manager like YARN to allocate containers. Zookeeper or Consul can take care of failover. Consul has built-in support for configuration management too.