Expanding on the context of Data Aggregation, variously called a data refinery, data factory, or data lake, I would like to examine whether the concept of Data Aggregation is just a theoretical construct or whether it has a practical side.
My opinion is that Data Aggregation (regardless of what it is called) is just a means to an end; an enabler or precursor for Data Discovery. It is, at its core, a facility for bringing together disconnected sources of data that were previously leveraged only in very “targeted” use cases. The idea is to discover new connections or to explore new usage patterns. These explorations might belong to the realm of identifying proactive growth opportunities or to the domain of preemptive loss prevention. Data scientists can employ statistical algorithms and predictive modeling techniques to see whether new patterns emerge, or whether they are able to ferret out alternate connections. One can also imagine using clustering and machine learning techniques to find unknown patterns that could be applied to marketing, operational process, and product placement decisions. None of this would have been possible with dissociated sources of data. Thus, the truly quantifiable benefit of a Data Aggregation platform is that it brings large, disconnected data sets into one holistic platform where traditional statistical modeling techniques can be run for the purpose of Data Discovery.
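To make the clustering idea concrete, here is a minimal sketch, assuming a hypothetical aggregated table whose columns originally lived in separate systems; the file path and column names are illustrative, not from any particular platform:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical aggregated view: each column originally came from a
# separate, disconnected source (e.g., billing, web analytics, support).
df = pd.read_parquet("aggregated_customers.parquet")  # illustrative path
features = df[["annual_spend", "web_visits", "support_tickets"]]

# Standardize so no single source's scale dominates the clustering.
X = StandardScaler().fit_transform(features)

# Unsupervised clustering to surface segments that no single source
# could reveal on its own.
df["segment"] = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)
print(df.groupby("segment")[["annual_spend", "web_visits"]].mean())
```

The point of the sketch is that the interesting signal only exists once the sources sit side by side in one table; run against any single source, the same algorithm has nothing new to find.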
Now you might ask how the concept of Data Aggregation, and the need to create a platform for disparate data sources, connects to Data Discovery and Big Data, and what options, if any, open up. Traditional infrastructure has always been a real limiting factor for any enterprise that wanted to create a hosting platform for large data sets. So most enterprises were more inclined to host Data Warehousing platforms that offered KPIs, which were projections of known trends, rather than to invest in platforms that explored the somewhat unreliable potential that lay in the realm of statistical modeling. Data scientists worked around this by using “sample data sets”, knowing full well that the results could be skewed or that the patterns discovered could be incomplete. This is where Hadoop-based Big Data techniques come in handy.
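A toy illustration of why sampling worried those data scientists: a pattern confined to a small slice of the population can all but vanish from a modest sample. The numbers below are synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic population: 0.1% of customers behave very differently,
# say a high-value niche segment worth discovering.
population = np.concatenate([
    rng.normal(100, 10, 999_000),  # mainstream behavior
    rng.normal(500, 20, 1_000),    # rare segment (0.1%)
])

# A "sample data set" of the kind forced by limited infrastructure.
sample = rng.choice(population, size=1_000, replace=False)

# The rare segment is real in the population, but a small sample
# will typically contain zero or one such row, so any pattern it
# carries is invisible to a model trained on the sample.
print(f"rare rows in population: {(population > 300).sum():,}")
print(f"rare rows in sample:     {(sample > 300).sum()} of 1,000")
```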
Now your IT department can offer cost-effective, Hadoop-based platforms for deploying the large data sets needed for Data Aggregation. Linearly scalable, commodity-hardware-based Data Aggregation platforms make it possible for data scientists to execute their predictive models and algorithms against full, representative data sets instead of scaled-down subsets. Most of all, Hadoop-based Data Aggregation now ensures the reliability of the business outcomes generated by these models and algorithms. The efficacy of the outcomes and the applicability of the predictions ultimately increase the rate of adoption of data science predictive modeling techniques. The bottom line: businesses gain a competitive edge from the process of Data Discovery deployed against these Data Aggregation constructs.
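On a Hadoop cluster, the same kind of model can run against the full aggregated data set rather than a sample. Here is a minimal PySpark sketch of that idea, assuming the aggregated data lives in HDFS; the path and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("data-discovery").getOrCreate()

# Full aggregated data set on HDFS; no down-sampling is needed,
# since the work is distributed across commodity nodes.
df = spark.read.parquet("hdfs:///aggregated/customers")  # illustrative path

assembler = VectorAssembler(
    inputCols=["annual_spend", "web_visits", "support_tickets"],
    outputCol="features",
)
kmeans = KMeans(k=4, seed=42, featuresCol="features")

# Fit on the entire data set, then inspect the segment sizes.
model = kmeans.fit(assembler.transform(df))
model.transform(assembler.transform(df)).groupBy("prediction").count().show()
```

Because the computation moves to the data rather than the other way around, adding nodes grows the workable data set roughly linearly, which is exactly what makes full-population modeling economical on commodity hardware.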
I look forward to hearing from you about what is working for you, and whether you have been able to realize the benefits these technologies tout.