Posts

Showing posts from 2015

Data Aggregation & Data Discovery - Part II

Expanding on the context of Data Aggregation, variously called data refinery, data factory or data lake, I would like to analyze whether the concept is just a theoretical construct or whether it has a practical side. My opinion is that Data Aggregation (regardless of what it is called) is just a means to an end: an enabler of, or precursor to, Data Discovery. It is truly a facility for bringing together various types of disconnected data sources that were previously leveraged in very “targeted” use cases, the idea being to discover new connections or to explore new usage patterns. These explorations might belong to the realm of identifying proactive growth opportunities or to the domain of preemptive loss prevention. Data scientists can employ statistical algorithms and predictive modeling techniques to see whether new patterns emerge or whether they can ferret out alternate connections. One can also imagine the use of clustering an

Data Aggregation & Data Discovery - Part I

There has been a lot of talk lately about the concept of the data lake, variously known as a data refinery, data factory, etc. I find it interesting that we now hear logical architectural terms that speak to the concepts and purpose of big data technologies such as Hadoop / HDFS and Apache distributed database technologies such as HBase / Cassandra. This may be indicative of a shift. What I am not sure of is whether this means that this suite of open source technologies has achieved a level of maturity. Or could this point to the fact that these technologies have practical applications that solve enterprise-scale problems? Or does it show that enterprises have realized that they can no longer deal with just "structured data", and that the vast majority of information lies in the space of "unstructured content", leaving them no choice but to venture into the realm of big data technologies? Not really sure! The fact re