Sunday, June 28, 2015

Data Aggregation & Data Discovery - Part II

Expanding on the context of Data Aggregation, variously called data refinery, data factory or data lake, I would like to examine whether Data Aggregation is just a theoretical construct or whether it has a practical side.

My opinion is that Data Aggregation (regardless of how it is referred to) is just a means to an end; an enabler or precursor for Data Discovery.  It is truly a facility for bringing together various types of disconnected sources of data that were previously leveraged only in very “targeted” use cases, the idea being to discover new connections or to explore new usage patterns.  These explorations might belong to the realm of identifying proactive growth opportunities or to the domain of preemptive loss prevention.  Data scientists can employ statistical algorithms and predictive modeling techniques to see if new patterns emerge, or to see if they are able to ferret out alternate connections.  One can also imagine the use of clustering and machine learning techniques to find unknown patterns that could be applied to marketing, operational process, and product placement decisions.  None of this would have been possible with dissociated sources of data.  Thus, the truly quantifiable benefit of leveraging a Data Aggregation platform is that it brings large, disconnected data into one holistic platform against which traditional statistical modeling techniques can be run for the purpose of Data Discovery.
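As a toy illustration of the kind of connection that only emerges once sources are brought together, here is a minimal Python sketch. The data sets, field names, and the choice of Pearson correlation are all hypothetical; the point is simply that two previously disconnected sources, once aligned on a shared key, can reveal a pattern neither shows on its own.

```python
# Hypothetical example: two disconnected sources joined on a shared key
# so a simple statistical measure can surface a cross-source pattern.

def pearson(xs, ys):
    """Plain Pearson correlation coefficient over two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Source 1: marketing spend per region (e.g., from a campaign system)
spend = {"east": 120, "west": 300, "north": 80, "south": 210}
# Source 2: order volume per region (e.g., from an order-management system)
orders = {"east": 35, "west": 90, "north": 20, "south": 70}

# Aggregation step: align the two sources on their shared key
regions = sorted(spend.keys() & orders.keys())
r = pearson([spend[k] for k in regions], [orders[k] for k in regions])
print(round(r, 3))  # a strong positive correlation across the joined data
```

Real Data Discovery would of course run far richer models over far larger data, but the shape of the exercise, join first, then model, is the same.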

Now you might ask how the concept of Data Aggregation, and the need to create a platform for disparate data sources, is connected to Data Discovery and Big Data.  You may be wondering what options, if any, open up.  Traditional infrastructure has always been a real limiting factor for any enterprise that wanted to create a hosting Platform for large data sets.  So most enterprises were more inclined to host Data Warehousing Platforms that offered KPIs, which were projections of known trends, than to invest in Platforms that explored the somewhat unreliable potential that lay in the realm of statistical modeling.  Data scientists worked around this by using “sample data sets”, knowing full well that the results could be skewed or the patterns discovered could be choppy.  This is where Hadoop based Big Data techniques come in handy.

Now your IT departments can offer cost-effective Hadoop based Platforms for deploying the large data sets needed for Data Aggregation.  Linearly scalable, commodity-hardware based Data Aggregation Platforms make it possible for data scientists to execute their predictive models and algorithms against representative data sets, instead of scaled-down subsets.  Most of all, Hadoop based Data Aggregation now ensures the reliability of the business outcomes generated by these models and algorithms.  The efficacy of the outcomes and the applicability of the predictions ultimately increase the rate of adoption of data science predictive modeling techniques.  The bottom line: businesses gain a competitive edge from the process of Data Discovery deployed against the Data Aggregation constructs.
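For readers new to Hadoop, the programming model underneath such a Platform is a pair of simple phases: a mapper that emits key/value pairs and a reducer that aggregates them per key. Here is a minimal sketch of that pattern in plain Python (the sales records and field layout are invented for illustration). A real Hadoop Streaming job would read stdin and write stdout, with Hadoop performing the sort/shuffle between the two phases.

```python
# Sketch of the map/reduce pattern behind Hadoop-based aggregation.
# Hypothetical data: "region,amount" sales records aggregated per region.

from itertools import groupby

def mapper(records):
    """Emit tab-separated (region, amount) lines from raw sales records."""
    out = []
    for line in records:
        region, amount = line.split(",")
        out.append(f"{region}\t{amount}")
    return out

def reducer(mapped):
    """Sum amounts per region, mimicking the reduce phase."""
    mapped = sorted(mapped)  # the shuffle/sort step Hadoop performs for us
    totals = {}
    for key, group in groupby(mapped, key=lambda kv: kv.split("\t")[0]):
        totals[key] = sum(float(kv.split("\t")[1]) for kv in group)
    return totals

sales = ["east,100", "west,250", "east,50", "west,25"]
print(reducer(mapper(sales)))  # → {'east': 150.0, 'west': 275.0}
```

The appeal for Data Aggregation is that this same two-phase shape scales out linearly across commodity nodes, so the full data set, not a sample, flows through the model.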

I look forward to hearing from you to learn what is working for you and if you have been able to realize the benefits that these technologies seem to tout.

Wednesday, May 27, 2015

Data Aggregation & Data Discovery - Part I

A lot of talk has been heard lately about the concept of the data lake, variously known as data refinery, data factory etc.  I find it interesting that we now hear logical architectural terms that speak to the concepts and purpose of big data technologies such as Hadoop/HDFS and Apache distributed database technologies such as HBase/Cassandra.

This may be indicative of a shift, though I am not sure of what kind.  Does it mean that a level of maturity has been achieved by this suite of open source technologies?  Could it point to the fact that these technologies have practical applications that solve enterprise scale problems?  Or does it show that enterprises have realized that they can no longer deal with just "structured data", and that with a vast majority of information lying in the space of "unstructured content" they have no choice but to venture into the realm of big data technologies?  Not really sure!

The fact remains, when the big name software vendors start getting into the business of marketing big data technologies and start publishing white papers with cool sounding names, then there is something going on!!  I look at the concepts of data lake, data refinery, data factory etc. as synonymous terms for what in the information science realm we call data aggregation!  I could be totally off base here and would love to have more of a conceptual/architectural debate on this topic.

I would love to hear from others actively leveraging these technologies as to how they are applying these concepts/ technologies.

surekha -

Monday, February 11, 2013

Is the Operational Data Store as a concept still relevant in today's Information ecosystem?

Hi Fellow Architects -

I was recently reviewing old publications on EDW and some of the related concepts, such as the Corporate Information Factory, Operational BI etc., and came across an old Inmon article from ’98.  I was curious to find out whether or not an ODS is still relevant in today's landscape, where we have EII, EDW Appliances and cloud based Warehousing solutions.

Thanks for tuning in!!
surekha -

Monday, October 15, 2012

What is a Platform?

Hi Fellow Architects - Here are some common questions Architects often have to deal with when trying to build or define a Platform. Of course, I would like to hear from the experienced among you to seed this discussion.

What is the purpose of the Platform?
Is it for others to build business capabilities on, or is it for serving up some business capability that is your competitive advantage? The former may be called Infrastructure as a Service and the latter Information as a Service. This informs your decision on how much of the interface to expose.

The first, Infrastructure as a Service, requires you to tighten the integration and management interface and the integration pathways via strict SLAs and service contracts that drive the infrastructure usage patterns, infrastructure management and billing.
The latter, Information as a Service, drives you to define a service contract that is unique to your business needs, with very tight service definition and service usage criteria, while treating all of the business rules for aggregation and information interpretation as proprietary Intellectual Property that is kept under tight wraps. Here the interaction is via clean, crisp, stable interfaces that are defined and described in standard formats.
What are the common attributes of either Platform?
It has to be Extensible, Scalable and Flexible, and it has to have the level of reliability that we have come to expect from services such as email, Google Maps etc.
I would like to hear from you on what other key attributes should or could be part of any Platform.
  1. Are there any standards that govern these concepts?
  2. What principles and tenets, if any, apply to the building of a Platform?
  3. Do you treat this as just a technology choice, or is it about using the technology to deliver the Platform capabilities?
  4. How do you ensure that the constraints of the technology do not become the limiting factor for your Platform?
  5. How do you build the Platform so that you are in a forever-beta mode without it becoming an onerous Platform management process?
  6. What are some of the key concepts you would want to establish as ground rules or concrete principles so that the basic premise of the Platform is not broken over multiple iterations?
Thank you for tuning in. Your feedback is invaluable.
surekha -

Sunday, July 22, 2012

Mastering Master Data??

Hi Fellow Architects,

Master Data has almost become a boring topic these days, but I still find that many organizations either have not yet harnessed the power of their master data or have not really turned the corner on the "Master Data Project", which is now becoming an unwieldy, expensive, never-ending project.

I was wondering if any of you have success stories in this space and have been able to come up with some simple best practices on topics related to master data, such as the following -

a) how do you determine what information is valuable enough to be mastered and what is not?

b) how do you determine if the master data continues to be of relevance to the enterprise?

c) how has the profile of master data changed in the world of social networking etc.?

Your feedback is always welcome!!
Best Regards.

surekha -

Monday, June 27, 2011

Possible Use Cases For Platform as a Service?

I am wondering if the following use cases qualify for being a first foray into Application Infrastructure Virtualization or more loosely Platform as a Service (PaaS)?

For instance, would I be able to deploy a shared global service that has a wide variety of consumers with different usage peaks?  These peaks could coincide with global marketing events, whereby the compute cycles available to the service hosted in the different regions could be varied by just changing the provisioning policies.

In addition, I am looking for the platform to be able to ensure that the commerce site can honor the QoS needs of its key subscribers/top rated customers even during times of unpredictable bursts.  So the question is: does anyone know of PaaS or Application Infrastructure Virtualization platforms that provide such elasticity, scaling up or throttling compute resources specifically based on consumer calls?  Is there a possibility of managing these resource pools with specific routing rules that protect high value transactions during peaks without causing overall service disruption?

In addition, I could see how a PaaS could be useful in creating a "shared" prod-like stress/load environment, whereby the usage of this environment is governed by policies that determine who and what gets access to the shared stress environment compute resources.  This would also enable the business services to be tested to ensure that they are able to optimally use dynamic clusters without "wasting" too many resources in "hydrating" and "persisting" in-process state.  Also key would be to leverage capabilities of the platform to test how the service honors calls from different consumers with different SLAs and service priority levels.

Finally, the platform's ability to dial down resource allocation to the business service would come in handy for analyzing how the service handles lower priority tasks that may need to operate under resource starvation conditions.  This use case could also be used to bring down the overall cost of offering a business service at base levels of utilization, keeping the cost of offering this "seasonal" service low.

The idea in all cases would be to insulate the business service from having to "hard wire" workload and resource management decisions, with the deployment platform providing the ability to do so based purely on policies and QoS based resource provisioning.
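To make the idea concrete, here is a minimal sketch of what such policy-driven admission might look like. The tier names and per-window budgets are entirely hypothetical, and a real PaaS would enforce this in the platform layer rather than in application code; the point is only that the policy, not the business service, decides who gets throttled during a burst.

```python
# Sketch of policy-based QoS admission: each consumer tier gets a
# request budget per window (hypothetical tiers and values), so the
# platform can protect high-value calls during bursts while the
# business service itself stays free of resource-management logic.

class PolicyGate:
    def __init__(self, policies):
        # policies: tier name -> allowed requests in the current window
        self.remaining = dict(policies)

    def admit(self, tier):
        """Admit the call if its tier still has budget this window."""
        if self.remaining.get(tier, 0) > 0:
            self.remaining[tier] -= 1
            return True
        return False  # throttled by policy, not by the service

gate = PolicyGate({"gold": 2, "bronze": 1})
calls = ["gold", "bronze", "bronze", "gold", "gold"]
print([gate.admit(t) for t in calls])  # → [True, True, False, True, False]
```

Changing the provisioning behavior, say for a marketing event in one region, would then be a matter of changing the policy values, which is exactly the insulation described above.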

I am curious to find out what if any experience you have had with the commercial PaaS solutions.  Your feedback is invaluable!!

surekha -

Sunday, May 29, 2011

Attributes of a Product Manager...

My fellow blogger wrote about "Should Architects aspire to be Product Managers?", in which he recounts talents like being able to interact with customers, understanding the revenue model, being able to understand and articulate how the technology maps/aligns with the business strategy, being able to work with cross-functional teams and so on (8 attributes in total).  Strikingly, the softer skills of "passion" and "focus" were called out on at least a couple of occasions; in my opinion, these are two key characteristics shared with the consummate architect!!

Not only do I completely agree with the list my colleague outlines, but from experience I can also testify that these are key qualities of a good Product Manager.  In addition, I have found the following two attributes in a couple of "great" Product Managers with whom I have had the pleasure of interacting, and I wish to highlight these qualities as well.

Hence, at nine and ten I propose these abilities.   Please read "Should architects aspire to be Product Managers?" for the other eight!!

- #9 - Ability to differentiate between what is core to the product offering and what is not - guarding against diluting the sweet spot of the product.

- #10 - A good understanding of the competition, of what differentiates this product from the competition, and of whether the product is at risk of losing ground against the competition.

Oftentimes, I find Sales Account Managers touting their product as the be-all end-all elixir that addresses my every need.  By contrast, a Product Manager has to be pragmatic about presenting their product's capabilities and the pain points it was designed to address.

Thanks for listening.  Please let us know if you agree!!
Surekha -