The field of Business Analytics can be very complex. Top-level analysts are experts; just like medical specialists, they have undergone years of additional training and know their area of specialty (perhaps price sensitivity, multivariate statistical modeling, survey analysis or mathematical optimization) backwards. Keeping with this analogy, most business managers know about as much about what business analytics can do for them as a patient heading in to see their primary physician knows about medicine; perhaps less.
I've blogged before about the need to use the right tools to hold and manipulate data as data quantity increases (Data Handling the Right Tool for the Job). But I really want to get to some value-enhancing analytics, and as data grows it becomes increasingly hard to apply analytical tools.
Let’s assume that we have a few terabytes of data and that it's sat in an industrial-strength database (Oracle, SQL Server, MySQL, DB2, …) - one that can handle the data volume without choking. Each of these databases has its own dialect of the query language (SQL), and while you can do a lot of sophisticated data manipulation in SQL, even a simple analytical routine like calculating correlations is a chore.
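To make the contrast concrete, here is a minimal sketch of the same correlation calculation once the data is out of the database and in a general-purpose language. It assumes the two columns have already been pulled into plain Python lists; the function name `pearson_correlation` and the price and unit figures are made up for illustration.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: shelf price and units sold for five weeks.
price = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [10.0, 8.0, 6.0, 4.0, 2.0]
r = pearson_correlation(price, units)
print(round(r, 3))
```

In this made-up data, units fall in a straight line as price rises, so the coefficient comes out at the extreme negative end of the scale. Real data will rarely be this tidy.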
The current Wikipedia page on Cluster Analysis, excerpted below, is correct, detailed and makes absolute sense. Then again, if you do not have a background in statistical modeling, I'm guessing these two paragraphs leave you no wiser.
Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters.
Clustering is a main task of explorative data mining, and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.
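For readers who find a small worked example easier than a definition, here is a rough sketch of one way to form such groups. It uses k-means, which is only one of many clustering methods and is my choice for illustration, not something the excerpt prescribes. The customer figures, the function names and the choice of starting centers are all assumptions.

```python
def squared_distance(a, b):
    """Squared straight-line distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, initial_centroids, max_iterations=100):
    """Group points around centroids; returns (labels, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iterations):
        # Step 1: assign each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 2: move each centroid to the average of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # nothing moved, so the grouping has settled
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (store visits per month, average basket value).
customers = [(1, 20), (2, 22), (1, 18), (9, 80), (10, 85), (8, 78)]
labels, centroids = k_means(customers, initial_centroids=[customers[0], customers[3]])
print(labels)
print(centroids)
```

With this toy data the first three customers (infrequent, small baskets) land in one group and the last three (frequent, large baskets) in the other. "Similar" here means close in straight-line distance, which is itself a modeling choice: in practice the two measures would need to be put on comparable scales first.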
In this post I hope to provide a workable introduction for people who need to be educated consumers of cluster analysis.
Reporting is about what happened; Analytics is about answering "what if" and "what's best" questions. Most of the materials that land on a VP/Director’s desk (or inbox) are examples of reporting with no analytical value added.