Legacy systems and unstructured data analysis: a clash of ideologies
Data, by its nature, was long assumed to be structured. Data creation followed regimented, controlled processes that ensured defined structures were in place from the design phase through capture to analysis, and the creation of data was never in the hands of the systems' end users. With the advent of Web 2.0 and social media came user-generated content. The very essence of social media is the generation and capture of unstructured data, reflecting the essence of human interaction. Web 2.0 and social media do not restrict the user with structured input methods for data capture; they enable the user to write, speak, and interact in a manner that comes naturally, and therefore in a seemingly unstructured fashion. No two conversations follow the same style, no two speeches carry the same tone, and no two environments in which users interact are identical. Legacy systems were designed to bucket and trend processes that were already understood and structured. They failed miserably at analyzing the human behavior we now describe as sentiment, perception, reaction patterns, and so on. Make no mistake, though: legacy systems fare well at analyzing the structured elements that Web 2.0 and social media platforms expose, such as number of clicks, path analysis, number of followers, and number of comments, because all of this data is, again, structured. What gets missed is what lies in the text (the unstructured part), which is where sentiment and perception are understood. Legacy systems help analyze some of the behavioral aspects, but fail at psychographic analysis.
Web 2.0 and social media also brought another aspect to customer insight and to business in general. For the first time, businesses could directly gather what end users felt about their products and be part of the conversations end users were having within their groups. With rapid exchange of data among customers now possible, the need for real-time analysis became strong and imminent. Legacy systems were designed to analyze data after the fact; new media needs real-time, and increasingly predictive, analysis. With data going unstructured (85% of all data [1]) and volumes increasing manifold (unstructured data growing at 60% versus 20% for structured data [1]), legacy systems needed a redesign of their own even to provide some form of post-action analysis of Web 2.0 and social media content. Modern tools in the market provide basic analytics such as keyword-based segmentation, but the real essence is still missing. A mechanism to give structure to conversation data and store it in an easily analyzable format is the need of the hour. Furthermore, mathematical treatment of textual content requires sophisticated algebraic computations to summarize the information in mathematical form. Only once this is achieved can the core objectives of social media analysis (assessing sentiment and perception, and predicting behavior) be addressed through innovative rules and algorithms. Companies like HP are heavily invested in research to mine unstructured data and bring out the golden nuggets that help the business drive positive outcomes, more specifically around influencing customer perception, predicting product launch success, product lifecycle analysis, and customer satisfaction and service.
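To make the idea of "giving structure to conversation data" a little more concrete, here is a minimal Python sketch, not HP's actual tooling or any particular product's method, that turns free-text comments into term-count vectors, applies a crude keyword lexicon as a stand-in for sentiment scoring, and uses cosine similarity as one simple example of the algebraic treatment mentioned above. The lexicons, sample comments, and function names are invented purely for illustration.

import re
from collections import Counter
from math import sqrt

# Hypothetical, tiny lexicons for illustration only; a real system would use
# far richer, domain-tuned dictionaries or trained models.
POSITIVE = {"love", "great", "fast", "recommend"}
NEGATIVE = {"hate", "slow", "broken", "refund"}

def to_vector(text):
    """Turn an unstructured comment into a structured bag-of-words vector."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

def sentiment_score(vector):
    """Crude keyword-based sentiment: positive minus negative mentions, normalized."""
    pos = sum(count for word, count in vector.items() if word in POSITIVE)
    neg = sum(count for word, count in vector.items() if word in NEGATIVE)
    total = sum(vector.values()) or 1
    return (pos - neg) / total

def cosine_similarity(v1, v2):
    """Algebraic comparison of two conversations represented as term vectors."""
    dot = sum(v1[w] * v2[w] for w in v1)
    norm1 = sqrt(sum(c * c for c in v1.values()))
    norm2 = sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

comments = [
    "Love the new phone, battery life is great and shipping was fast",
    "The update is broken and support is slow, I want a refund",
]
vectors = [to_vector(c) for c in comments]
for comment, vector in zip(comments, vectors):
    print(round(sentiment_score(vector), 2), "-", comment)
print("similarity:", round(cosine_similarity(*vectors), 2))

Once conversations are stored in this kind of structured, vectorized form, the keyword-based segmentation offered by today's tools becomes a starting point rather than the end state, and rules or models for sentiment, perception, and prediction can be layered on top.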
In conclusion, it is evident that legacy systems need an upgrade to handle unstructured data better, and that a new approach is needed to provide real-time and predictive analysis if we are to make the most of the opportunity Web 2.0 and social media have given businesses.
Source: [1] IDC – Digital Universe, May 2010
Labels: business intelligence, legacy systems, listening, social media