Home
SOA
Spring-Hibernate
GOF Design Patterns
Java BP
Java Tutorial
Java Traps
Datawarehouse
C++ Tutorial
Unix Commands
Interview QnA
About MDC

Data structure and Algorithm bits

Data Structures in General:

Collections: Collections can be broken down into two types—linear and nonlinear. A linear collection is a list of elements where one element follows the previous element. Elements in a linear collection are normally ordered by position (first, second, third, etc.). In the real world, a grocery list exemplifies a linear collection; in the computer world (which is also real), an array is designed as a linear collection. Nonlinear collections hold elements that do not have positional order within the collection. An organizational chart is an example of a nonlinear collection, as is a rack of billiard balls. In the computer world, trees, heaps, graphs, and sets are nonlinear collections.

A binary tree is a special type of tree collection where each node has no more than two children. A binary tree can become a binary search tree, making searches for large amounts of data much more efficient. This is accomplished by placing nodes in such a way that the path from the root to a node where the data are stored takes the shortest route possible. Yet another tree type, the heap, is organized so that the smallest data value is always placed in the root node. The root node is removed during a deletion, and insertions into and deletions from a heap always cause the heap to reorganize so that the smallest value is placed in the root. Heaps are often used for sorts, called a heap sort. Data elements stored in a heap can be kept sorted by repeatedly deleting the root node and reorganizing the heap.

A set is a collection of unordered data values where each value is unique. A graph is a set of nodes and a set of edges connecting the nodes. Graphs are used to model situations where each of the nodes in a graph must be visited, sometimes in a particular order, and the goal is to find the most efficient way to “traverse” the graph. A network is a special type of graph in which each of the edges is assigned a weight. The weight is associated with a cost for using that edge to move from one node to another.

Bubble sort (VB):
Public Sub BubbleSort()
Dim outer, inner, temp As Integer
For outer = numElements - 1 To 2 Step -1
For inner = 0 To outer - 1
If (arr(inner) > arr(inner + 1)) Then
temp = arr(inner)
arr(inner) = arr(inner + 1)
arr(inner + 1) = temp
End If
Next
Next
End Sub

Selection sort: This sort works by starting at the beginning of the array, comparing the first element with the other elements in the array.
The smallest element is placed in position 0, and the sort then begins again at position 1. This continues until each position except the last
position has been the starting point for a new loop.

Insertion Sort: Regular approach of identifying best position for a card from top and inserting it there.
Public Sub InsertionSort()
Dim inner, outer, temp As Integer
For outer = 1 To numElements - 1
temp = arr(outer)
inner = outer
While (inner > 0 AndAlso (arr(inner - 1) >= temp))
arr(inner) = arr(inner - 1)
inner -= 1
End While
arr(inner) = temp
Next
End Sub

The Selection sort is the most efficient of the algorithms, followed by the Bubble sort, and then the Insertion sort among basic sorting algorithms. None of these algorithms is not well suited for larger data sets.

A sequential search (also called a linear search) is very easy to implement. Start at the beginning of the array and compare each accessed array element to the value you’re searching for. If you find a match, the search is over. If you get to the end of the array without generating
a match, then the value is not in the array.

Self Organizing Data: The concept behind this strategy is that we can minimize search times by putting items that are frequently searched for at the beginning of the data set. Eventually, all the most frequently searched for data items will be located at the beginning of the data set.
The data set is organized not by the programmer before the program runs, but by the program while the program is running.

Binary Search: 50-50 algorithms:
Normal:
Public Function binSearch(ByVal value As Integer) As Integer
Dim upperBound, lowerBound, mid As Integer
upperBound = arr.GetUpperBound(0)
lowerBound = 0
While (lowerBound <= upperBound)
mid = (upperBound + lowerBound) \ 2
If (arr(mid) = value) Then
Return mid
ElseIf (value < arr(mid)) Then
upperBound = mid - 1
Else
lowerBound = mid + 1
End If
End While
Return -1
End Function

Recursive:
Public Function RbinSearch(ByVal value As Integer, ByVal lower As Integer, ByVal upper As Integer) As Integer
If (lower > upper) Then
Return -1
Else
Dim mid As Integer
mid = (upper + lower) \ 2
If (value < arr(mid)) Then
RbinSearch(value, lower, mid - 1)
ElseIf (value = arr(mid)) Then
Return mid
Else
RbinSearch(value, mid + 1, upper)
End If
End If
End Function

Stacks and Queues: Data in a stack are added and removed from only one end of the list (LIFO); data in a queue are added at one end and removed from the other end of a list (FIFO). Stacks are used extensively in programming language implementations, from everything from expression
evaluation to handling function calls. Queues are used to prioritize operating system processes and to simulate events in the real world, such as teller lines at banks and the operation of elevators in buildings. Stack has methods like peek, clear and contains. Queue has operations like peek, enque and deque. Implementing the Queue class using an ArrayList is straightforward. When we need to insert an item into our queue, the Arraylist Add method places the item in the next free element of the list. When we need to remove the front item from the queue, the ArrayList moves each remaining item in the list up one element. We don’t have to maintain a placeholder.

One algorithm that we can use to convert numbers from decimal to octal or binary makes use of a stack. The steps of the algorithm are as follows:

Get number
Get base
Loop
Push the number mod base onto the stack
Number becomes the number integer-divided by the base
While number not equal to 0

Queues are also useful for sorting data. Radix Sort is based on queue. The radix sort works by making two passes over a set of data, in this case integers in the range 0–99. The first pass sorts the numbers based on the 1’s digit and the second pass sorts the numbers based on the
10’s digit. Each number is then placed in a bin based on the digit in each of these places.

Given the numbers: 91 46 85 15 92 35 31 22
the first pass results in this bin configuration:
Bin 0:
Bin 1: 91 31
Bin 2: 92 22
Bin 3:
Bin 4:
Bin 5: 85 15 35
Bin 6: 46
Bin 7:
Bin 8:
Bin 9:

Now put the numbers in order based on which bin they’re in:
91 31 92 22 85 15 35 46

Next, take the list and sort by the 10’s digit into the appropriate bins:
Bin 0:
Bin 1: 15
Bin 2: 22
Bin 3: 31 35
Bin 4: 46
Bin 5:
Bin 6:
Bin 7:
Bin 8: 85
Bin 9: 91 92

Take the numbers from the bins and put them back into a list, which results
in a sorted set of integers:
15 22 31 35 46 85 91 92

Hashing is a very common technique for storing data in such a way that the data can be inserted and retrieved very quickly. Hashing uses a data structure called a hash table. Although hash tables provide fast insertion, deletion, and retrieval, they perform poorly for operations that involve searching, such as finding the minimum or maximum value.

Problem 0: Computing Fibonacci number using recursion:
The Fibonacci numbers can be described by the following sequence:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, . . .

Simple recursive program you can use to generate any specific number in this sequence. Here is the code for the function:

Function recurFib(ByVal n As Integer) As Long
If (n < 2) Then
Return n
Else
Return recurFib(n - 1) + recurFib(n - 2)
End If
End Function

The problem with the recursive solution is that too many values are recomputed during a recursive call.

Better approach:
Function iterFib(ByVal n As Integer) As Long
Dim index As Integer
Dim val(n) As Integer
If (n = 1 Or n = 2) Then
Return 1
Else
val(1) = 1
val(2) = 2
For index = 3 To n - 1
val(index) = val(index - 1) + val(index - 2)
Next
Return val(n - 1)
End If
End Function

Problem: Finding the longest common substring in two strings. For example, in the words “raven” and “havoc,” the longest common substring is
“av”.

Solution: http://en.wikipedia.org/wiki/Longest_common_substring_problem  (increment diagonally and largest sequence of number is longest common
string)

function LCSubstr(S[1..m], T[1..n])
L := array(0..m, 0..n)
z := 0
ret := {}
for i := 1..m
for j := 1..n
if S[i] = T[j]
L[i,j] := L[i-1,j-1] + 1
if L[i,j] > z
z := L[i,j]
ret := {}
if L[i,j] = z
ret := ret {S[i-z+1..i]}
return ret

Problem: The Knapsack Problem: If the safe in our example has five items, the items have a size of 3, 4, 7, 8, and 9, respectively, and values of 4, 5, 10, 11, and 13, respectively, and the knapsack has a capacity of 16, then the proper solution is to pick items 3 and 5 with a total size of 16 and a total value of 23. Write generic code:

Module Module1
Sub Main()
Dim capacity As Integer = 16
Dim size() As Integer = {3, 4, 7, 8, 9}
Dim values() As Integer = {4, 5, 10, 11, 13}
Dim totval(capacity) As Integer
Dim best(capacity) As Integer
Dim n As Integer = values.Length
Dim i, j As Integer
For j = 0 To n - 1
For i = 0 To capacity
If (i >= size(j)) Then
If (totval(i) < totval(i - size(j)) + values(j)) Then
totval(i) = totval(i - size(j)) + values(j)
best(i) = j
End If
End If
Next
Next
Console.WriteLine("The maximum value is: " & _
totval(capacity))
Console.Read()
End Sub
End Module

Problem: Greedy algorithm: A greedy algorithm is one that always chooses the best solution at the time, with no regard for how that choice will affect future choices. Using a greedy algorithm generally indicates that the implementer hopes that the series of “best” local choices made will lead to a final “best” choice. If so, then the algorithm has produced an optimal solution; if not, a suboptimal solution has been found. However, for many problems, it is not worth the trouble to find an optimal solution, so using a greedy algorithm works just fine.

The classic example of following a greedy algorithm is making change. Let’s say you buy some items at the store and the change from your purchase is 63 cents. How does the clerk determine the change to give you? If the clerk follows a greedy algorithm, he or she gives you two quarters, a dime, and three pennies. That is the smallest number of coins that will equal 63 cents (given that we don’t allow fifty-cent pieces). http://en.wikipedia.org/wiki/Greedy_algorithm

Problem: Huffman Algorithm: Compressing data is an important technique for the practice of computing. Data sent over the Internet need to be
sent as compactly as possible. There are many different schemes for compressing data, but one particular scheme makes use of a greedy algorithm—Huffman coding. This algorithm is named for the late David Huffman, an information theorist and computer
scientist who invented the technique in the 1950s. Data compressed using a Huffman code can achieve savings of 20% to 90%. When data are compressed, the characters that make up the data are usually translated into some other representation to save space. A typical
compression scheme is to translate each character to a binary character code, or bit string.

For example, we can encode the character “a” as 000, the character “b” as 001, the character “c” as 010, and so on. This is called a fixed-length code. The Huffman code algorithm takes a string of characters, translates them to a variable-length binary string, and creates a binary tree for the purpose of decoding the binary strings. The path to each left child is assigned the binary character 0 and each right child is assigned the binary character 1. The algorithm works as follows: Start with a string of characters you want
to compress. For each character in the string, calculate its frequency of occurrence in the string. Then sort the characters into order from lowest frequency to highest frequency. Take the two characters with the smallest frequencies and create a node with each character (and its frequency) as children of the node. The parent node’s data element consists of the sum of the frequencies of the two child nodes. Insert the node back into the list. Continue this process until every character is placed into the tree. When this process is complete, you have a complete binary tree that can be used to decode the Huffman code. Decoding involves following a path of 0s and 1s until you get to a leaf node, which will contain a character. http://en.wikipedia.org/wiki/Huffman_coding



Problem: String search algorithms http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm



Problem: Find duplicate number in an sequential array up length 5.
Solution: Since there are n values from 1 to n-1 with one duplicate, we can use the fact that the sum of the integers from 1 to m is (m*(m+1))/2, take the sum from 1 to n-1, subtract that from the actual sum of numbers in the array, and the result will be the duplicated number:

let s ← 0
for 1 <= i <= n:
s ← s + A[i]
return s - n*(n-1)/2

A binary search tree is a binary tree in which data with lesser values are stored in left nodes and data with greater values are stored in right nodes. This property provides for very efficient searches.

There are three traversal methods used with BSTs:
inorder, preorder, and postorder. An inorder traversal visits all the nodes in a BST in ascending order of the node key values. A preorder traversal visits the root node first, followed by the nodes in the subtrees under the left child of the root, followed by the
nodes in the subtrees under the right child of the root.

The code for finding the minimum and maximum values is almost trivial in both cases, owing to the properties of a BST. The smallest value in a BST will always be found at the last left-child node of a subtree beginning with the left child of the root node. In contrast, the largest value in a BST is found at the last right-child node of a subtree beginning with the right child of the root node. Code for finding the minimum value first:

Public Function FindMin() As Integer
Dim current As Node = root
While (Not (current.Left Is Nothing))
current = current.Left
End While
Return current.iData
End Function

Now here’s the code for finding the maximum value in a BST:

Public Function FindMax() As Integer
Dim current As Node = root
While (Not (current.Right Is Nothing))
current = current.Right
End While
Return current.iData
End Function

In general, an algorithm is called polymorphic if it can achieve the same functionality using different data structures. For example, if the same sorting or searching algorithm could be used with different types of containers, such as vectors, lists, maps, etc., then that algorithm would be called polymorphic. As a case in point, the generic algorithms of the STL library in C++ are polymorphic because they can be used for different container classes.
In the context of Java collections, the polymorphic algorithms are all supplied by the various static methods of the java.util.Collections class. Consider, for example, the method sort(List list). This method is capable of sorting any collection that has implemented the List interface, and that includes containers of types ArrayList, LinkedList, Vector, etc.

Puzzles and questions
 

Problem: There are 4 men who want to cross a bridge. They all begin on the same side. You have 17 minutes to get all of them across to the
other side. It is night. There is one flashlight. A maximum of two people can cross at one time. Any party who crosses, either 1 or 2 people,
must have the flashlight with them. The flashlight must be walked back and forth, it cannot be thrown, etc. Each man walks at a different
speed. A pair must walk together at the rate of the slower mans pace.
Man 1:1 minute to cross
Man 2: 2 minutes to cross
Man 3: 5 minutes to cross
Man 4: 10 minutes to cross

Solution: 1 and 2 cross together. 1 comes back. then 3 and 4 cross. 2 comes back. then 1 and 2 cross. Total time is 2+1+10+2+2 = 17.

Problem: You have 5 jars of pills. Each pill weighs 10 gram, except for contaminated pills contained in one jar, where each pill weighs 9 gm.
Given a scale, how could you tell which jar had the contaminated pills in just one measurement?

Solution: Take one pill from first, two from second, three from third and so on. Total pills are n(n+1)/2 and should weigh 10n(n+1)/2. If it
weighs x gm less than that then the x'th jar is contaminated, since we took x pills from that jar which weighed 1 gm less.


Problem: One train leaves Los Angeles at 15 MPH heading for New York. Another train leaves from New York at 20mph heading for Los Angeles on
the same track. If a bird, flying at 25mph, leaves from Los Angeles at the same time as the train and flies back and forth between the two
trains until they collide, how far will the bird have traveled?

Solution: If distance is X miles between NY and LA, then it takes X/(15+20) hours for the trains to collide, and bird will have travelled
25X/(15+20) = 5X/7 miles in that time.

Problem: Imagine that you have 26 constants, labelled A through Z. Each constant is assigned a value in the following way: A = 1; the rest of
the values equal their position in the alphabet (B corresponds to the second position so it equals 2, C = 3, etc.) raised to the power of the
preceeding constant value. So, B = 2 ^ (A's value), or B = 2^1 = 2. C = 3^2 = 9. D = 4^9, etc., etc. Find the exact numerical value to the
following equation:
(X - A) * (X - B) * (X - C) * ... * (X - Y) * (X - Z)

Answer is 0, because (X-X) is present in the product.