My Developer Connection knowledgebase for software developers
Architecture Note
While designing a standard Business Intelligence (BI) solution, one of the primary areas of concern is making the application as flexible as possible so that it can cater to future business needs. In trying to design a universal solution that handles every possible use case, we often end up architecting a highly complex system that is difficult to maintain. A BI solution is typically made of three parts, and if we design these parts independently of each other, the resulting solution is easier to maintain and each module can evolve separately as business requirements change over time.
The first phase is the ETL module, where data is extracted in raw format from various sources, transformed into the desired format and loaded into an OLAP or data warehousing schema (star, snowflake or hybrid). This module can be broken into smaller extraction, transformation and loading tasks or jobs. Hadoop or Spring Batch can be evaluated to make the ETL process more efficient and faster.
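The decomposition into independent extract, transform and load stages can be sketched as below. This is a minimal illustration only, assuming a CSV-like text source with "region,amount" lines and an in-memory list standing in for the warehouse; the names `EtlPipeline`, `SalesRow`, `extract`, `transform` and `load` are hypothetical and do not come from Hadoop or Spring Batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical row type produced by the transform stage.
final class SalesRow {
    final String region;
    final long amountCents;

    SalesRow(String region, long amountCents) {
        this.region = region;
        this.amountCents = amountCents;
    }
}

// Each stage is a separate method with a narrow contract, so it can be replaced,
// tested or moved into its own job without touching the others.
final class EtlPipeline {
    // Extract: read raw lines from a source (an in-memory string stands in for a file or feed).
    static List<String> extract(String rawSource) {
        return Arrays.stream(rawSource.split("\n"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    // Transform: turn "region,amount" text into a typed row with a normalized region key.
    static SalesRow transform(String line) {
        String[] parts = line.split(",");
        String region = parts[0].trim().toUpperCase(Locale.ROOT);
        long cents = Math.round(Double.parseDouble(parts[1].trim()) * 100);
        return new SalesRow(region, cents);
    }

    // Load: append rows to the target (a list stands in for a staging table or warehouse).
    static void load(List<SalesRow> rows, List<SalesRow> target) {
        target.addAll(rows);
    }

    static List<SalesRow> run(String rawSource) {
        List<SalesRow> warehouse = new ArrayList<>();
        List<SalesRow> rows = extract(rawSource).stream()
                .map(EtlPipeline::transform)
                .collect(Collectors.toList());
        load(rows, warehouse);
        return warehouse;
    }
}
```

Note that `transform` throws on a malformed amount; handling such records is exactly the job of the cleanup step discussed next.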
One of the important steps in ETL is not any of the "E", "T" or "L": it is the cleanup that comes just before loading. Sometimes this step is intermingled with transformation, running before or after it. Here we handle exceptions and bad data that is not suitable for loading. These records can be discarded, or sent for more intensive processing that fills in missing data or stubs out proxy data so that they become loadable. The cleanup process is often time consuming, so it should be designed as an offline process that lets the majority of good data make it through the ETL process. The main architectural concerns of the ETL phase are generally high-throughput I/O and network latency while moving data from one source to another. A large number of inserts into a data store can be very slow, so it is common practice to use a staging area away from the transactional system; transactional requirements can then be relaxed to achieve faster writes into the data store.
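The cleanup step can be sketched as a partition: records that pass validation flow on to the loader in fixed-size batches (the kind of bulk write a relaxed staging area makes cheap), while rejects are parked for an offline repair job so they never block the good data. The validation rule, the batch size and the names `CleanupStep` and `Result` are illustrative assumptions, not part of any particular ETL tool.

```java
import java.util.ArrayList;
import java.util.List;

final class CleanupStep {
    // Outcome of one cleanup pass: batches ready to load and raw lines parked for offline repair.
    static final class Result {
        final List<List<String[]>> loadBatches = new ArrayList<>();
        final List<String> rejects = new ArrayList<>();
    }

    // Hypothetical validation rule: exactly two fields, a non-empty key and a numeric amount.
    static boolean isValid(String[] fields) {
        if (fields.length != 2 || fields[0].trim().isEmpty()) {
            return false;
        }
        try {
            Double.parseDouble(fields[1].trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static Result clean(List<String> rawLines, int batchSize) {
        Result result = new Result();
        List<String[]> batch = new ArrayList<>();
        for (String line : rawLines) {
            String[] fields = line.split(",", -1); // -1 keeps empty fields so they can be rejected
            if (!isValid(fields)) {
                // Do not stall the pipeline: keep the raw line for an offline fix-up job.
                result.rejects.add(line);
                continue;
            }
            batch.add(fields);
            if (batch.size() == batchSize) {
                // Batched writes reduce the number of round trips to the staging store.
                result.loadBatches.add(batch);
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            result.loadBatches.add(batch);
        }
        return result;
    }
}
```

For example, cleaning "a,1", "b,x", ",3", "c,2" and "d,4" with a batch size of 2 yields two load batches (three good rows) and two rejects.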
The second phase of a BI solution is classification or enrichment, where data is further refined through an ontology, a business dictionary or classification schemes. Business rules are applied to add or create the appropriate dimensions. This step is processing (CPU) intensive and requires highly efficient algorithms and aggregation logic. An in-memory database or some other in-memory structure is needed to perform enrichment without multiple reads and writes to the database. Once a block of data is enriched, it can be pulled directly by visualization tools such as reporting or analytics interfaces, or inserted into an OLTP data store.
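A minimal sketch of in-memory enrichment, assuming the business dictionary is small enough to hold in memory: it is loaded once into a `HashMap`, and each fact gets its category dimension from a hash lookup rather than a per-row database query, with aggregation done in the same pass. The dictionary contents, the `Enricher` and `Fact` names and the `UNCLASSIFIED` fallback are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class Enricher {
    // One incoming fact: a product code from the source system and an amount.
    static final class Fact {
        final String productCode;
        final long amount;

        Fact(String productCode, long amount) {
            this.productCode = productCode;
            this.amount = amount;
        }
    }

    private final Map<String, String> productToCategory;

    // Load the business dictionary once (e.g. from a reference table) and keep it in memory.
    Enricher(Map<String, String> dictionary) {
        this.productToCategory = new HashMap<>(dictionary);
    }

    // Classify a product with a hash lookup instead of a per-row database query.
    String categoryOf(String productCode) {
        return productToCategory.getOrDefault(productCode, "UNCLASSIFIED");
    }

    // Derive the category dimension and aggregate amounts per category in one in-memory pass.
    Map<String, Long> totalsByCategory(List<Fact> facts) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Fact fact : facts) {
            totals.merge(categoryOf(fact.productCode), fact.amount, Long::sum);
        }
        return totals;
    }
}
```

Only the enriched, aggregated result needs to be written out, which is what keeps database reads and writes to a minimum in this phase.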
The final phase is the user interaction phase, where users can view the outcome of the business intelligence data in many possible ways, interactive or offline. Several commercial tools from vendors such as IBM, SAP, Oracle and Microsoft are available for this. Some organizations also use home-grown solutions composed of free or cheaper tools such as JasperReports, BIRT, Pentaho and others. Usability and business requirements drive this phase, and it should be independent of the previous two phases. It is important to provide some kind of traceability from final data points back to their origin, so that bad results can be corrected and algorithms and logic can be improved over time.
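One lightweight way to provide that traceability, sketched here as an assumption rather than a prescribed design: every aggregated data point carries the identifiers of the source records that contributed to it, so a suspicious figure in a report can be traced back to its origin rows. `TracedValue` and the "system:rowId" identifier format are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// A reported number plus the lineage of source records behind it.
final class TracedValue {
    private long total;
    private final Set<String> sourceIds = new LinkedHashSet<>();

    // Fold one source record into the aggregate and remember where it came from.
    void add(long amount, String sourceId) {
        total += amount;
        sourceIds.add(sourceId);
    }

    long total() {
        return total;
    }

    // Read-only lineage, e.g. for a drill-down view that explains a reported figure.
    Set<String> sources() {
        return Collections.unmodifiableSet(sourceIds);
    }
}
```

The cost is extra storage per data point; for large aggregates a common compromise is to keep lineage at the batch or file level rather than per row.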
Coding Tip
Java 7 introduces AsynchronousServerSocketChannel and AsynchronousSocketChannel. These provide a simple API for creating client-server messaging applications with asynchronous, stream-oriented I/O "out of the box". AsynchronousServerSocketChannel provides open, bind and accept methods to accept connections and process messages on the server.
Similarly, AsynchronousSocketChannel has an open method to open a client-side channel. An InetSocketAddress corresponding to the server can be used to connect to the server channel. ByteBuffer messages can then be written to the stream socket and received by a completion handler on the server side. AsynchronousChannelGroup can also be used to share resources, such as a thread pool, among channels; channels opened without a group use a default group managed by the JVM. Datagram sockets and multicast are also supported: a DatagramChannel can join a multicast group on a given NetworkInterface. The channels are created through static open factory methods. This can serve as an alternative to JGroups. However, developers should perform their own benchmarking, especially in multithreaded environments, as Java 7 was not yet commonly used at the time of writing this tip. You can find a concise tutorial at
http://www.ibm.com/developerworks/java/library/j-nio2-1/index.html?ca=drs-
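A minimal, self-contained sketch of the Java 7 asynchronous socket API described above: a server bound to an ephemeral loopback port accepts one connection through a CompletionHandler and echoes a message back using a non-blocking read-then-write handler chain, while the client uses the Future-returning connect, write and read methods. Both sides share a small AsynchronousChannelGroup. The class name `AsyncEcho`, the one-message protocol, the 1024-byte buffers and the 5-second timeouts are assumptions for illustration; a real server would loop on accept and handle partial reads and writes.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousChannelGroup;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AsyncEcho {
    // Starts a one-shot echo server, sends `message` from a client and returns the reply.
    static String roundTrip(String message) throws Exception {
        // A dedicated group lets these channels share one small thread pool
        // instead of the JVM-managed default group.
        AsynchronousChannelGroup group =
                AsynchronousChannelGroup.withFixedThreadPool(2, Executors.defaultThreadFactory());
        try (AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open(group)
                .bind(new InetSocketAddress("127.0.0.1", 0))) { // port 0: the OS picks a free port
            // Server side: this completion handler is invoked when a client connects.
            server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
                @Override
                public void completed(final AsynchronousSocketChannel peer, Void unused) {
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    // Non-blocking chain: read the message, then write the same bytes back.
                    peer.read(buf, buf, new CompletionHandler<Integer, ByteBuffer>() {
                        @Override
                        public void completed(Integer bytesRead, ByteBuffer b) {
                            b.flip();
                            peer.write(b); // single write; enough for a short message here
                        }

                        @Override
                        public void failed(Throwable exc, ByteBuffer b) {
                            exc.printStackTrace();
                        }
                    });
                }

                @Override
                public void failed(Throwable exc, Void unused) {
                    exc.printStackTrace();
                }
            });

            // Client side: open a channel, connect to the server address, send one ByteBuffer.
            try (AsynchronousSocketChannel client = AsynchronousSocketChannel.open(group)) {
                client.connect(server.getLocalAddress()).get(5, TimeUnit.SECONDS);
                client.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)))
                        .get(5, TimeUnit.SECONDS);
                ByteBuffer reply = ByteBuffer.allocate(1024);
                client.read(reply).get(5, TimeUnit.SECONDS);
                reply.flip();
                return StandardCharsets.UTF_8.decode(reply).toString();
            }
        } finally {
            // Closes every channel still open in the group and stops its threads.
            group.shutdownNow();
            group.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));
    }
}
```

The server handlers deliberately avoid blocking on a Future: completion handlers run on the group's threads, and blocking there can starve other handlers of threads.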
Open Source Pick
Apache Qpid is an open source messaging middleware supporting the AMQP protocol. AMQP is an open standard and a strong competitor to IBM MQ and TIBCO Rendezvous.
Find more details about AMQP at http://www.amqp.org/.
Find more details about Apache Qpid at https://cwiki.apache.org/qpid/index.html. Spring provides a high-level abstraction for AMQP with Spring AMQP. Read more about Spring AMQP at
http://www.springsource.org/spring-amqp
Do you know?
According to the NY Times, Microsoft could not cash in big on its new operating system, Windows 8, and on the Surface tablet devices based on Windows 8, even during the busy holiday shopping season. According to industry experts, Windows 8 and the Surface failed to create the much-needed spark in spite of huge investment and redesign. Microsoft had a late start in this area, and the lack of a significant number of Windows 8 apps could be one of the reasons to blame. Google Play and the Apple App Store already have a big lead in terms of mobile apps.
Oracle has released an early access version of JDK 8 for ARM and is looking for feedback from developers. You can take a look at this potential Android competitor at
http://jdk8.java.net/fxarmpreview.
It is based on JavaFX. Will it click, or face a fate similar to that of Windows 8 on mobile devices?