Monday, December 7, 2015

How can I use Docker to containerize my data analytics app? A general overview

I recently answered this question on Quora about Dockerizing a data analytics application, and it prompted me to start a series of posts on Docker.

This is the first post in that series. Unlike a traditional series, I will touch only briefly on the preliminary details of Docker and its purpose, and dive straight into the design and architecture aspects of dockerizing a system.

First things first
Docker is a very lightweight application engine that deploys VM-like containers, which share the host's kernel and system-level resources to allow easy deployment and multi-tenancy. Yet each container has its own network and process space, as well as a layered union-mount file system. Docker is written in Go. Its three main components are:
·        the Docker client,
·        the Docker daemon, or server (REST API), and
·        Docker containers.

How big is a Docker container?
A container's rootfs (loosely speaking, the operating system file system layer) and tmpfs can be of any size, depending on the service you are dockerizing. A small Python app can be under 1 MB, while a full-blown service can be around 16 GB, or whatever size it is configured to be.
If you are asking about the Docker image, it can be a few hundred megabytes. Launching a container from an image usually takes a fraction of a second.
Can I put a data analytics product in it?
As I said, yes, you can. But let's break the problem down.
·        Services - Docker centers around service-oriented architecture (SOA). How will you reorganize your application into small, self-sufficient services that can communicate with each other? Say you have a web app. You need the web application server (serving your WARs) to be dockerized, and there are plenty of examples online for doing this. You surely have a database instance, and that can go in a container too. Then say you have a few daemons running for something; you need to decide where to put them. In short, the key design principle is to identify the services to dockerize. Then start by writing your own Dockerfile for one component and build from there.
·        Networking - Docker solves port conflicts in multi-tenancy by mapping ports. Each container exposes configurable ports that are mapped, either explicitly or dynamically, to physical ports on the host (a process abstracted by Docker). Containers are also assigned IPs that are not discoverable outside the host. If your services are not colocated on one host, you might need to use the host IPs or configure the containers with unique, discoverable IPs.
·        The data - Docker does not work with every file system, so depending on how your files and other data are stored, this might be a tall order. In general, though, you can expose a volume on the host, or even dockerize a volume, and make it available to dockerized services. So yes, data can be ported too.
·        Handling - You might want a resource manager like YARN to allocate containers. ZooKeeper or Consul can take care of failover, and Consul has built-in support for configuration management too.


Monday, April 6, 2015

Unpaired element in array

I started using Codility again after a long break and began with a very basic problem. It was easy and took me no more than 10 minutes to solve. I worked out the test cases by hand, browsed a bit on Java basics after finishing the code, and then submitted it. It was so easy that I doubt it would come up in any interview.

Still, I felt it was worth solving because:
>> It refreshed bitwise operators in my mind, so I wouldn't get the syntax wrong out of rustiness (we barely use bitwise operations in our day-to-day jobs, do we?)
>> It gave me a confidence boost to crack the optimal solution, with all test cases and edge cases passing, in one go.

Problem:
A non-empty zero-indexed array A consisting of N integers is given. The array contains an odd number of elements, and each element of the array can be paired with another element that has the same value, except for one element that is left unpaired.
For example, in array A such that:
  A[0] = 9  A[1] = 3  A[2] = 9
  A[3] = 3  A[4] = 9  A[5] = 7
  A[6] = 9
  • the elements at indexes 0 and 2 have value 9,
  • the elements at indexes 1 and 3 have value 3,
  • the elements at indexes 4 and 6 have value 9,
  • the element at index 5 has value 7 and is unpaired.
Write a function:
class Solution { public int solution(int[] A); }
that, given an array A consisting of N integers fulfilling the above conditions, returns the value of the unpaired element.
For example, given array A such that:
  A[0] = 9  A[1] = 3  A[2] = 9
  A[3] = 3  A[4] = 9  A[5] = 7
  A[6] = 9
the function should return 7, as explained in the example above.
Assume that:
  • N is an odd integer within the range [1..1,000,000];
  • each element of array A is an integer within the range [1..1,000,000,000];
  • all but one of the values in A occur an even number of times.
Complexity:
  • expected worst-case time complexity is O(N);
  • expected worst-case space complexity is O(1), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.

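My solution, which received 100%:
class Solution {
    public int solution(int[] A) {
        // XOR all elements: paired values cancel out (x ^ x == 0)
        // and x ^ 0 == x, so only the unpaired value remains.
        int temp = 0;
        for (int i = 0; i < A.length; i++) {
            temp = temp ^ A[i];
        }
        return temp;
    }
}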
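The XOR trick works because x ^ x == 0, x ^ 0 == x, and XOR is commutative and associative, so every paired value cancels out regardless of where it sits in the array. Below is a small standalone sketch (the class UnpairedDemo and its method names are made up for illustration) that runs the same idea on the example array, next to a HashSet-based alternative for comparison. The HashSet version is not what I submitted, and it needs O(N) extra space, so it would not meet the O(1) space requirement.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; names are illustrative, not part of the Codility task.
public class UnpairedDemo {

    // XOR approach: O(N) time, O(1) extra space.
    static int xorUnpaired(int[] a) {
        int acc = 0;
        for (int x : a) {
            acc ^= x; // paired values cancel out because x ^ x == 0
        }
        return acc;
    }

    // HashSet approach for comparison: O(N) time but O(N) extra space.
    static int setUnpaired(int[] a) {
        Set<Integer> open = new HashSet<>();
        for (int x : a) {
            // Toggle membership: remove if present, otherwise add.
            if (!open.remove(x)) {
                open.add(x);
            }
        }
        // Only the value with an odd count is left in the set.
        return open.iterator().next();
    }

    public static void main(String[] args) {
        int[] a = {9, 3, 9, 3, 9, 7, 9};
        System.out.println(xorUnpaired(a)); // prints 7
        System.out.println(setUnpaired(a)); // prints 7
    }
}
```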

Saturday, March 1, 2014

Maximum Binary Gap: O(log n) solution

A binary gap within a positive integer N is any maximal sequence of consecutive zeros that is surrounded by ones at both ends in the binary representation of N.
For example, number 9 has binary representation 1001 and contains a binary gap of length 2. The number 529 has binary representation 1000010001 and contains two binary gaps: one of length 4 and one of length 3. The number 20 has binary representation 10100 and contains one binary gap of length 1. The number 15 has binary representation 1111 and has no binary gaps.
Write a function:
class Solution { public int solution(int N); }
that, given a positive integer N, returns the length of its longest binary gap. The function should return 0 if N doesn't contain a binary gap.
For example, given N = 1041 the function should return 5, because N has binary representation 10000010001 and so its longest binary gap is of length 5.
Assume that:
·    N is an integer within the range [1..2,147,483,647].
Complexity:
expected worst-case time complexity is O(log(N));
expected worst-case space complexity is O(1).

// My perfect score solution
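class Solution {
    public int solution(int N) {
        // Guard against inputs outside the stated range.
        if (N < 1)
            return -1;
        int res = 0;                  // longest gap seen so far
        int gapLen = 0;               // length of the current run of zeros
        boolean binGapStart = false;  // true once a 1 has been seen
        boolean gapZeroes = false;    // true while counting zeros after a 1
        // Scan bits from least significant to most significant.
        while (N >= 1) {
            if (N % 2 == 1) {
                // A 1 closes a gap only if a 1 was seen before and zeros followed it.
                binGapStart = binGapStart && gapZeroes;
                if (binGapStart) {
                    res = (res > gapLen) ? res : gapLen;
                    gapZeroes = false;
                    gapLen = 0;
                }
                else
                    binGapStart = true;
            }
            else {
                // Zeros count toward a gap only after a 1 (trailing zeros are ignored).
                gapZeroes = binGapStart;
                if (gapZeroes)
                    gapLen++;
            }
            N = N / 2;
        }
        return res;
    }
}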
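As a cross-check, here is a more compact alternative (not the solution above; the class name BinaryGapAlt is made up for illustration). It first strips the trailing zeros with Integer.numberOfTrailingZeros, since they are never bounded by a 1 on the right, and then tracks runs of zeros that get closed by a 1. It still does one step per bit, so it stays O(log N) time and O(1) space.

```java
// Hypothetical alternative for cross-checking; BinaryGapAlt is an illustrative name.
public class BinaryGapAlt {

    static int longestGap(int n) {
        if (n < 1) {
            return 0; // outside the stated input range [1..2,147,483,647]
        }
        // Trailing zeros are not surrounded by ones, so drop them.
        n >>>= Integer.numberOfTrailingZeros(n);
        int best = 0;
        int current = 0;
        while (n > 0) {
            if ((n & 1) == 1) {
                // A 1 closes the current run of zeros (if any).
                best = Math.max(best, current);
                current = 0;
            } else {
                current++;
            }
            n >>>= 1;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestGap(9));    // 1001 -> 2
        System.out.println(longestGap(529));  // 1000010001 -> 4
        System.out.println(longestGap(20));   // 10100 -> 1
        System.out.println(longestGap(15));   // 1111 -> 0
        System.out.println(longestGap(1041)); // 10000010001 -> 5
    }
}
```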


Java Tidbidz 1: Access modifiers, Arrays

  • Access Modifiers - All members of interfaces are implicitly public. It is, in fact, a compile-time error to specify any access modifier for an interface member other than public (omitting the modifier altogether also gives public access). This holds up to Java 8; Java 9 later allowed private interface methods.
  • Arrays - Arrays are special objects in Java with no "class definition" (no .class file).
  • Array length vs. String length - An array's length is a public final int field (arr.length), whereas a String's length comes from a method (str.length()).
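A small sketch illustrating these points (Greeter, TidbitsOne and the helper methods are made-up names for illustration). The implementing method has to be declared public, because the interface method it implements is implicitly public and an override cannot weaken access. The last lines show that arrays still have a runtime class object even though no .class file defines them.

```java
// Illustrative names only (Greeter, TidbitsOne).
interface Greeter {
    // No modifier here: the method is implicitly public (and abstract).
    String greet(String name);
}

public class TidbitsOne implements Greeter {

    // Must be public: declaring it with weaker access would not compile.
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }

    static int arrayLength(int[] arr) {
        return arr.length;   // a field, no parentheses
    }

    static int stringLength(String s) {
        return s.length();   // a method call
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3};
        System.out.println(arrayLength(arr));          // 3
        System.out.println(stringLength("docker"));    // 6
        // The runtime class of an int[] is named "[I".
        System.out.println(arr.getClass().getName());  // [I
        System.out.println(new TidbitsOne().greet("Java"));
    }
}
```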

Java Tidbidz 2: Abstract class and Interface

  • Interface variables are public, static and final by default [interfaces cannot be instantiated in their own right, so the value of the variable must be assigned in a static context in which no instance exists. The final modifier ensures the value assigned to the interface variable is a true constant that cannot be re-assigned by program code.]
  • We can have an abstract class without any abstract methods, but not an abstract method in a non-abstract class, because declaring a class abstract only means it cannot be instantiated on its own, while an abstract method must be implemented by subclasses.
  • You can even have abstract classes with final methods, but never final classes with abstract methods (a final class cannot be subclassed, so its abstract methods could never be implemented).
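Here is a minimal sketch of these rules in one file (Limits, Shape, Base, Square and TidbitsTwo are hypothetical names for illustration). The commented-out lines mark what the compiler would reject.

```java
// Illustrative names only (Limits, Shape, Base, Square, TidbitsTwo).
interface Limits {
    // Implicitly public static final: a true constant.
    int MAX_SIDES = 12;
}

// Abstract class mixing a final method with an abstract one.
abstract class Shape {
    // final: subclasses cannot override this method.
    public final String describe() {
        return getClass().getSimpleName() + " with area " + area();
    }

    // abstract: every concrete subclass must implement it.
    public abstract double area();
}

// Abstract class with no abstract methods: legal, just not instantiable.
abstract class Base {
    int value() {
        return 1;
    }
}

// final abstract class Broken { }  // would not compile: final and abstract conflict

class Square extends Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }
}

public class TidbitsTwo {
    public static void main(String[] args) {
        System.out.println(Limits.MAX_SIDES);             // 12
        System.out.println(new Square(3).describe());     // Square with area 9.0
        // new Base();            // would not compile: Base is abstract
        Base b = new Base() { };  // an anonymous subclass is fine
        System.out.println(b.value());                    // 1
        // Limits.MAX_SIDES = 5;  // would not compile: the field is final
    }
}
```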