Showing posts from May, 2015

Data Aggregation & Data Discovery - Part I

There has been a lot of talk lately about the concept of the data lake, also known variously as the data refinery, the data factory, and so on. I find it interesting that we now hear logical architectural terms that speak to the concepts and purpose of big data technologies such as Hadoop / HDFS and Apache distributed database technologies such as HBase and Cassandra. This may be indicative of a shift. What I am not sure of is what that shift means. Has this suite of open source technologies reached a level of maturity? Could it point to the fact that these technologies have practical applications that solve enterprise-scale problems? Or does it show that enterprises have realized they can no longer deal with just "structured data", and that the vast majority of information lies in "unstructured content", leaving them no choice but to venture into the realm of big data technologies? Not really sure! ...