Java Tech Discussion: August 2014

Wednesday 20 August 2014

Why wait (), notify () and notifyAll () must be called from synchronized block or method in Java ?

Most of Java developer knows that wait() ,notify() and notifyAll() method of object class must have to be called inside synchronized method or synchronized block in Java but how many times we thought why ?

We use wait () and notify () or notifyAll () method mostly for inter-thread communication. One thread is waiting after checking a condition e.g. In Producer Consumer example Producer Thread is waiting if buffer is full and Consumer thread notify Producer thread after he creates a space in buffer by consuming an element. calling notify() or notifyAll() issues a notification to a single or multiple thread that a condition has changed and once notification thread leaves synchronized block , all the threads which are waiting fight for object lock on which they are waiting and lucky thread returns from wait() method after reacquiring the lock and proceed further. Let’s divide this whole operation in steps to see a possibility of race condition between wait () and notify () method in Java, we will useProduce Consumer thread example to understand the scenario better:

1. The Producer thread tests the condition (buffer is full or not) and confirms that it must wait (after finding buffer is full).

2. The Consumer thread sets the condition after consuming an element from buffer.

3. The Consumer thread calls the notify () method; this goes unheard since the Producer thread is not yet waiting.

4. The Producer thread calls the wait () method and goes into waiting state.

So due to race condition here we potential lost a notification and if we use buffer or just one element Produce thread will be waiting forever and your program will hang.

Now let's think how does this potential race condition get resolved? This race condition is resolved by using synchronized keyword and locking provided by java. In order to call the wait (), notify () or notifyAll () methods in Java, we must have obtained the lock for the object on which we're calling the method. Since the wait () method in Java also releases the lock prior to waiting and reacquires the lock prior to returning from the wait () method, we must use this lock to ensure that checking the condition (buffer is full or not) and setting the condition (taking element from buffer) is atomic which can be achieved by using synchronized method or block in Java.

Just to summarize we call wait (), notify () or notifyAll method in Java from synchronized method or synchronized block in Java to avoid:

1) IllegalMonitorStateException in Java which will occur if we don't call wait (), notify () or notifyAll () method from synchronized context.

2) Any potential race condition between wait and notify method in Java.

You have thread T1, T2 and T3, how will you ensure that thread T2 run after T1 and thread T3 run after T2?

This thread interview questions is mostly asked in first round or phone screening round of interview and purpose of this multi-threading question is to check whether candidate is familiar with concept of "join" method or not. Answer of this multi-threading questions is simple it can be achieved by using join method of Thread class.

main(){

t1.start();

t1.join();

t2.start();

t2.join();

t3.start();

}

What is difference between Executor.submit() and Executer.execute() method ?

The answer is that former returns an object of Future which can be used to find result from worker thread)

There is a difference when looking at exception handling. If your tasks throws an exception and if it was submitted with execute this exception will go to the uncaught exception handler (when you don't have provided one explicitly, the default one will just print the stack trace to System.err). If you submitted the task with submit any thrown exception, checked exception or not, is then part of the task's return status. For a task that was submitted with submit and that terminates with an exception, the Future.get will re-throw this exception, wrapped in an ExecutionException.

How to create immutable class in Java and what are its pros and cons?

public final class Contacts {

private final String name;

private final String mobile;

public Contacts(String name, String mobile) {

this.name = name;

this.mobile = mobile;

}

public String getName(){

return name;

}

public String getMobile(){

return mobile;

}

This Java class is immutable, because its state can not be changed once created. You can see that all of it’s fields are final. This is one of the most simple way of creating immutable class in Java, where all fields of class also remains immutable like String in above case. Some time you may need to write immutable class which includes mutable classes like java.util.Date, despite storing Date into final field it can be modifiedinternally, if internal date is returned to the client. In order to preserve immutability in such cases, its advised to return copy of original object, which is also one of the Java best practice. here is another example of making a class immutable in Java, which includes mutable member variable.

public final class ImmutableReminder{

private final Date remindingDate;

public ImmutableReminder (Date remindingDate) {

if(remindingDate.getTime() < System.currentTimeMillis()){

throw new IllegalArgumentException("Can not set reminder” +

“ for past time: " + remindingDate);

}

this.remindingDate = new Date(remindingDate.getTime());

}

public Date getRemindingDate() {

return (Date) remindingDate.clone();

}

In above example of creating immutable class, Date is a mutable object. If getRemindingDate() returns actual Date object than despite remindingDate being final variable, internals of Date can be modified by client code. By returning clone() or copy of remindingDate, we avoid that danger and preserves immutability of class.

Pros of Immutable Classes in Java

As I said earlier Immutable classes offers several benefits, here are few to mention:

1) Immutable objects are by default thread safe, can be shared without synchronization in concurrent environment.
2) Immutable object simplifies development, because its easier to share between multiple threads without external synchronization.

3) Immutable object boost performance of Java application by reducing synchronization in code.

4) Another important benefit of Immutable objects is reusability, you can cache Immutable object and reuse them, much like String literals and Integers. You can use static factory methods to provide methods like valueOf(), which can return an existing Immutable object from cache, instead of creating a new one.

Cons of Immutable Classes in Java

Immutable object has disadvantage of creating garbage as well. Since immutable object can not be reused and they are just a use and throw. String being a prime example, which can create lot of garbage and can potentially slow down application due to heavy garbage collection, but again that's extreme case and if used properly Immutable object adds lot of value.

First Normal Form, Second Normal Form, Third Normal Form in Database

To begin: First, memorize the 3 normal forms so that you can recite them in your sleep. The meaning will become clear as we go. Just memorize them for now:

No repeating elements or groups of elements
No partial dependencies on a concatenated key
No dependencies on non-key attributes

Database Normalisation is a technique of organizing the data in the database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics like Insertion, Update and Deletion Anamolies. It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.

Normalization is used for mainly two purpose,

Eliminating reduntant(useless) data.
Ensuring data dependencies make sense i.e data is logically stored.

Problem Without Normalization

Without Normalization, it becomes difficult to handle and update the database, without facing data loss. Insertion, Updation and Deletion Anamolies are very frequent if Database is not Normalized. To understand these anomalies let us take an example of Student table.

S_id	S_Name	S_Address	Subject_opted
401	Adam	Noida	Bio
402	Alex	Panipat	Maths
403	Stuart	Jammu	Maths
404	Adam	Noida	Physics

Updation Anamoly : To update address of a student who occurs twice or more than twice in a table, we will have to update S_Address column in all the rows, else data will become inconsistent.
Insertion Anamoly : Suppose for a new admission, we have a Student id(S_id), name and address of a student but if student has not opted for any subjects yet then we have to insert NULL there, leading to Insertion Anamoly.
Deletion Anamoly : If (S_id) 401 has only one subject and temporarily he drops it, when we delete that row, entire student record will be deleted along with it.

Normalization Rule

Normalization rule are divided into following normal form.

First Normal Form
Second Normal Form
Third Normal Form
BCNF

First Normal Form (1NF)

As per First Normal Form, no two Rows of data must contain repeating group of information i.e each set of column must have a unique value, such that multiple columns cannot be used to fetch the same row. Each table should be organized into rows, and each row should have a primary key that distinguishes it as unique.

The Primary key is usually a single column, but sometimes more than one column can be combined to create a single primary key. For example consider a table which is not in First normal form

Student Table :

Student	Age	Subject
Adam	15	Biology, Maths
Alex	14	Maths
Stuart	17	Maths

In First Normal Form, any row must not have a column in which more than one value is saved, like separated with commas. Rather than that, we must separate such data into multiple rows.

Student Table following 1NF will be :

Student	Age	Subject
Adam	15	Biology
Adam	15	Maths
Alex	14	Maths
Stuart	17	Maths

Using the First Normal Form, data redundancy increases, as there will be many columns with same data in multiple rows but each row as a whole will be unique.

Second Normal Form (2NF)

As per the Second Normal Form there must not be any partial dependency of any column on primary key. It means that for a table that has concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column depends only on one part of the concatenated key, then the table fails Second normal form.

In example of First Normal Form there are two rows for Adam, to include multiple subjects that he has opted for. While this is searchable, and follows First normal form, it is an inefficient use of space. Also in the above Table in First Normal Form, while the candidate key is {Student, Subject}, Age of Student only depends on Student column, which is incorrect as per Second Normal Form. To achieve second normal form, it would be helpful to split out the subjects into an independent table, and match them up using the student names as foreign keys.

New Student Table following 2NF will be :

Student	Age
Adam	15
Alex	14
Stuart	17

In Student Table the candidate key will be Student column, because all other column i.e Age is dependent on it.

New Subject Table introduced for 2NF will be :

Student	Subject
Adam	Biology
Adam	Maths
Alex	Maths
Stuart	Maths

In Subject Table the candidate key will be {Student, Subject} column. Now, both the above tables qualifies for Second Normal Form and will never suffer from Update Anomalies. Although there are a few complex cases in which table in Second Normal Form suffers Update Anomalies, and to handle those scenarios Third Normal Form is there.

Third Normal Form (3NF)

Third Normal form applies that every non-prime attribute of table must be dependent on primary key. Thetransitive functional dependency should be removed from the table. The table must be in Second Normal form. For example, consider a table with following fields.

Student_Detail Table :

Student_id	Student_name	DOB	Street	city	State	Zip

In this table Student_id is Primary key, but street, city and state depends upon Zip. The dependency between zip and other fields is called transitive dependency. Hence to apply 3NF, we need to move the street, city and state to new table, with Zip as primary key.

New Student_Detail Table :

Student_id	Student_name	DOB	Zip

Address Table :

Zip	Street	city	state

The advantage of removing transtive dependency is,

Amount of data duplication is reduced.
Data integrity achieved.

How to work with Fork/Join? Discuss with Example

Fork / Join as the name suggests is designed for work that can be broken into smaller pieces recursively. It is a new addition to the JDK 1.7 to support parallelism. It is an implementation of the ExecutorService interface that helps you take advantage of multiple processors.The goal is to use all the available processing power to enhance the performance of your application.

The Fork/Join framework is designed to make divide-and-conquer algorithms easy to parallelize. That type of algorithms is perfect for problems that can be divided into two or more sub-problems of the same type. They use recursion to break down the problem to simple tasks until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.

The center of the fork/join framework is the ForkJoinPool class, an extension of the AbstractExecutorService class. ForkJoinPool implements the core work-stealing algorithm and can execute ForkJoinTask processes.It is similar to the MapReduce approach used to paralyze tasks. Difference is that Fork/Join tasks will subdivide themselves into smaller tasks only if necessary (if too large), whereas MapReduce algorithms divide up all the work into portions as the first step of their execution.

Basic Algorithm:

if(the job is small enough)

{

   compute directly

}

else

{

   split the work in two pieces (fork)

   invoke the pieces and join the results (join)

}

A ForkJoinTask is an abstract base class for tasks that run within a ForkJoinPool. A ForkJoinTask is a thread-like entity that is much lighter weight than a normal thread. Huge numbers of tasks and subtasks may be hosted by a small number of actual threads in a ForkJoinPool, at the price of some usage limitations.

There are two specialized subclasses of the ForkJoinTask :

1. RecursiveAction : It is to be used when you don’t need the task to return a result, for example, when the task works on positions of an array, it doesn’t return anything because it worked on the array. The method you should implement in order to do the job iscompute():void, notice the void return.

2. RecursiveTask : It is to be used when your tasks return a result. For example, when computing addition of elements in an array, each task must return the number it computed in order to join them and obtain the general solution. The method you should implement in order to do the job is compute():V, where V is the type of return; for example in calculating the sum of integer elements in an array, V may be java.lang.Integer.

A Useful Example

The key for non-dumb examples, which is hinted at nicely by the name RecursiveTask, is that your computemethod can create other RecursiveTask objects and have the pool run them in parallel. First you create another object. Then you call its fork method. That actually starts parallel computation -- fork itself returns quickly, but more computation is now going on. When you need the answer, you call the join method on the object you called fork on. The join method will get you the answer from compute() that was figured out byfork. If it is not ready yet, then join will block (i.e., not return) until it is ready. So the point is to call fork"early" and call join "late", doing other useful work in-between.
Those are the "rules" of how fork, join, and compute work, but in practice a lot of the parallel algorithms you write in this framework have a very similar form, best seen with an example. What this example does is sum all the elements of an array, using parallelism to potentially process different 5000-element segments in parallel. (The types long / Long are just like int / Integer except they are 64 bits instead of 32. They can be a good choice if your data can be large -- a sum could easily exceed 2³², but exceeding 2⁶⁴ is less likely.)

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class Globals {
    static ForkJoinPool fjPool = new ForkJoinPool();
}

class Sum extends RecursiveTask<Long> {
    static final int SEQUENTIAL_THRESHOLD = 5000;

    int low;
    int high;
    int[] array;

    Sum(int[] arr, int lo, int hi) {
        array = arr;
        low   = lo;
        high  = hi;
    }

    protected Long compute() {
        if(high - low <= SEQUENTIAL_THRESHOLD) {
            long sum = 0;
            for(int i=low; i < high; ++i) 
                sum += array[i];
            return sum;
         } else {
            int mid = low + (high - low) / 2;
            Sum left  = new Sum(array, low, mid);
            Sum right = new Sum(array, mid, high);
            left.fork();
            long rightAns = right.compute();
            long leftAns  = left.join();
            return leftAns + rightAns;
         }
     }

     static long sumArray(int[] array) {
         return Globals.fjPool.invoke(new Sum(array,0,array.length));
     }
}

How does this code work? A Sum object is given an array and a range of that array. The compute method sums the elements in that range. If the range has fewer than SEQUENTIAL_THRESHOLD elements, it uses a simple for-loop like you learned in introductory programming. Otherwise, it creates two Sum objects for problems of half the size. It uses fork to compute the left half in parallel with computing the right half, which this object does itself by calling right.compute(). To get the answer for the left, it calls left.join().
Why do we have a SEQUENTIAL_THRESHOLD? It would be correct instead to keep recurring until high==low+1and then return array[low]. But this creates a lot more Sum objects and calls to fork, so it will end up being much less efficient despite the same asymptotic complexity.
Why do we create more Sum objects than we are likely to have procesors? Because it is the framework's job to make a reasonable number of parallel tasks execute efficiently and to schedule them in a good way. By having lots of fairly small parallel tasks it can do a better job, especially if the number of processors available to your program changes during execution (e.g., because the operating system is also running other programs) or the tasks end up taking different amounts of time.
So setting SEQUENTIAL_THRESHOLD to a good-in-practice value is a trade-off. The documentation for the ForkJoin framework suggests creating parallel subtasks until the number of basic computation steps is somewhere over 100 and less than 10,000. The exact number is not crucial provided you avoid extremes.

Gotchas

There are a few "gotchas" when using the library that you might need to be aware of:

It might seem more natural to call fork twice for the two subproblems and then call join twice. This is naturally a little less efficient than just calling compute for no benefit since you are creating more parallel tasks than is helpful. But it turns out to be a lot less efficient, for reasons that are specific to the current implementation of the library and related to the overhead of creating tasks that do very little work themselves.
Remember that calling join blocks until the answer is ready. So if you look at the code:
```
    left.fork();
    long rightAns = right.compute();
    long leftAns  = left.join();
    return leftAns + rightAns;
```
you'll see that the order is crucial. If we had written:
```
    left.fork();
    long leftAns  = left.join();
    long rightAns = right.compute();
    return leftAns + rightAns;
```
our entire array-summing algorithm would have no parallelism since each step would completely compute the left before starting to compute the right. Similarly, this version is non-parallel because it computes the right before starting to compute the left:
```
    long rightAns = right.compute();
    left.fork();
    long leftAns  = left.join();
    return leftAns + rightAns;
```
You should not use the invoke method of a ForkJoinPool from within a RecursiveTask orRecursiveAction. Instead you should always call compute or fork directly even if the object is a different subclass of RecursiveTask or RecursiveAction. You may be conceptually doing a "different" parallel computation, but it is still part of the same parallel task. Only sequential code should call invoke to begin parallelism.
When debugging an uncaught exception, it is common to examine the "stack trace" in the debugger: the methods on the call stack when the exception occurred. With a fork-join computation, this is not as simple since the call to compute occurs in a different thread than the conceptual caller (the code that called fork). The library and debugger try to give helpful information including stack information for the thread running compute and the thread that called fork, but it can be hard to read and it includes a number of calls related to the library implementation that you should ignore. You may find it easier to debug by catching the exception inside the call to compute and just printing that stack trace.
In terms of performance, there are many reasons a fork-join computation might run slower than you expect, even slower than a sequential version of the algorithm. See the next section.

Pages