Monday, December 7, 2015

How can I use docker to containerize my data analytics app: A general overview

I recently answered a question on Quora about Dockerizing a data analytics application, which prompted me to start a series of posts on Docker.

This is the first post in that series. Unlike most series, I will spend very little time on the preliminaries of Docker and its purpose. Instead, I will dive straight into the design and architecture aspects of dockerizing a system.

First things first
Docker is a very lightweight application engine that deploys VM-like containers, which share system-level resources to allow easy deployment and multi-tenancy. Each container nevertheless has its own network and process space, as well as a layered union mount file system. It's all written in Go. The three main components are:
·        docker client,
·        docker daemon or server (REST API),
·        docker containers.

How big is the docker container?
The docker container rootfs (loosely speaking, the operating system FS layer) and tmpfs can be of any size, depending on the service you are dockerizing. A small Python app can be under 1MB, while a full-blown service can range around 16G or whatever it is configured to be.
The docker image can be a few hundred megabytes, if that is what you are asking. Launching a container from an image usually takes a fraction of a second.

Can I put a data analytics product in it?
Like I said, yes, you can. But let's break the problem down here.
·        Services - Docker centers around Service Oriented Architecture, or SOA. How will you reorganize your application into micro-level, self-sufficient services that can communicate with each other? Let's say you have a web app. You need the web engine server (WARs) to be dockerized, and there are plenty of examples on the internet of how to do this. You surely have a database instance, and that can go in a container. Then say you have a few daemons running for something - you need to make a call on where to put them. In short, the key design principle is to identify the services to dockerize. Then maybe start by writing your own Dockerfile for one component and get the ball rolling from there.
·        Networking - Docker solves port conflicts in multi-tenancy by mapping ports dynamically. Each docker container has configurable, statically mapped ports exposed to the user, and these map to physical ports on the host system (a process abstracted away by docker). Docker containers are also assigned IPs that are not discoverable outside the host. If your services are not colocated on one host, you might also need the host IPs, or you may have to configure the docker containers with unique discoverable IPs.
·        The data - Docker does not work with all filesystems, so depending on how your files and other data are stored, this might become a tall order. But in general, you can expose a volume on the host, or even dockerize a volume, and make it available to the dockerized services. So yes, data can be ported too.
·        Handling - You might want a resource manager like YARN to allocate containers. ZooKeeper or Consul can take care of failover. Consul has built-in support for configuration management too.
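
To picture the networking point, here is a toy model in Java. It is only a sketch of the idea, not how Docker is implemented: the class name PortMapSketch, the publish method, the tenant names, and the starting host port are all made up for illustration. It shows two tenants that both expose port 8080 inside their containers and still end up on different host ports.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: illustrates the idea of mapping container ports to
// distinct host ports. This is NOT Docker's actual implementation.
public class PortMapSketch {
    // Key is "containerName:containerPort", value is the assigned host port.
    private final Map<String, Integer> hostPortByEndpoint = new LinkedHashMap<>();
    private int nextHostPort;

    public PortMapSketch(int firstHostPort) {
        this.nextHostPort = firstHostPort;
    }

    // Returns the host port for a container's internal port, assigning a
    // fresh one the first time this container/port pair is seen.
    public int publish(String containerName, int containerPort) {
        String key = containerName + ":" + containerPort;
        Integer existing = hostPortByEndpoint.get(key);
        if (existing != null) {
            return existing;
        }
        int hostPort = nextHostPort++;
        hostPortByEndpoint.put(key, hostPort);
        return hostPort;
    }

    public static void main(String[] args) {
        // The starting port is an arbitrary value chosen for this sketch.
        PortMapSketch host = new PortMapSketch(49153);
        int a = host.publish("tenant-a-web", 8080);
        int b = host.publish("tenant-b-web", 8080);
        System.out.println("tenant-a-web 8080 -> host port " + a);
        System.out.println("tenant-b-web 8080 -> host port " + b);
    }
}
```

Both containers keep using 8080 internally; only the host-side port differs, which is why the tenants do not collide.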


Monday, April 6, 2015

Unpaired element in array

I started using Codility again after a long time, and turned to a very basic problem. It was easy and did not take me more than 10 minutes to solve. I worked out the test cases by hand, browsed a bit on Java basics after finishing the code, and then submitted it. It was so easy that I doubt it would come up in any interview.

Still I felt that it was beneficial to solve it because:
>> It helped me refresh bitwise operators in my mind, ensuring I wouldn't get the syntax wrong due to rustiness (we barely use bitwise operators in the day-to-day job, do we?)
>> It gave me a confidence boost to crack the best solution, with all test cases / edge cases passing, in one go.

Problem:
A non-empty zero-indexed array A consisting of N integers is given. The array contains an odd number of elements, and each element of the array can be paired with another element that has the same value, except for one element that is left unpaired.
For example, in array A such that:
  A[0] = 9  A[1] = 3  A[2] = 9
  A[3] = 3  A[4] = 9  A[5] = 7
  A[6] = 9
  • the elements at indexes 0 and 2 have value 9,
  • the elements at indexes 1 and 3 have value 3,
  • the elements at indexes 4 and 6 have value 9,
  • the element at index 5 has value 7 and is unpaired.
Write a function:
class Solution { public int solution(int[] A); }
that, given an array A consisting of N integers fulfilling the above conditions, returns the value of the unpaired element.
For example, given array A such that:
  A[0] = 9  A[1] = 3  A[2] = 9
  A[3] = 3  A[4] = 9  A[5] = 7
  A[6] = 9
the function should return 7, as explained in the example above.
Assume that:
  • N is an odd integer within the range [1..1,000,000];
  • each element of array A is an integer within the range [1..1,000,000,000];
  • all but one of the values in A occur an even number of times.
Complexity:
  • expected worst-case time complexity is O(N);
  • expected worst-case space complexity is O(1), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.

My solution that received 100%
class Solution {
    public int solution(int[] A) {
        // XOR all elements: equal values cancel out, leaving the unpaired one.
        int temp = 0;
        for (int i = 0; i < A.length; i++) {
            temp = temp ^ A[i];
        }
        return temp;
    }
}
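
Why this works: x ^ x is 0, x ^ 0 is x, and XOR can be applied in any order, so every value that occurs an even number of times cancels itself out and only the unpaired value is left. The small sketch below traces that on the example array from the problem. The class name XorTrace and the method name unpaired are made up for this illustration; the loop itself is the same XOR fold as in the solution above.

```java
public class XorTrace {
    // Same XOR fold as the solution above, as a standalone static method.
    static int unpaired(int[] a) {
        int acc = 0;
        for (int value : a) {
            acc ^= value; // equal values cancel: x ^ x == 0
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] example = {9, 3, 9, 3, 9, 7, 9};

        // Print the running XOR after each element to show the cancellation.
        int acc = 0;
        for (int value : example) {
            acc ^= value;
            System.out.println("after " + value + ": " + acc);
        }

        System.out.println("unpaired = " + unpaired(example)); // prints "unpaired = 7"
    }
}
```

The running value goes 9, 10, 3, 0, 9, 14, 7. It never needs more than a single int of extra storage, which is what keeps the space complexity at O(1).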