Finding groups of similar stores (for example) can be a very effective way to manage the complexity of offering each store group what it really needs without having to deal with each store individually, a mammoth task. Whether you are looking to find groups of stores, shoppers, regions, products or even sales patterns, a very similar approach can work for you.
Clustering is part of the journey; it's not a destination. If you don't know and understand what decisions your analytic work should enable (your destination), how can you build a good model?
Let's take a real-life example again around store clustering. Much of this work is done to help define the assortment offering at each store. Get the right products in each store and sales should increase: good news for the retailer and good news for the manufacturer.
I've seen a number of approaches to this work and many start out with "we need a cluster analysis" - good idea. Sadly, many then forget why they needed that cluster analysis, pull all the data they have on stores, throw it at a clustering model and hope something useful falls out the other end. What sort of data would that be? Well, assuming you have access to Point of Sale data, it's easy enough to get a measure of overall sales for your category (to split the big stores from the small ones). Many of you will also have access to demographic data (ethnicity, income levels, household size, age etc.) that you can map to store locations. Maybe you also have geographic data like latitude/longitude by store and you could throw that in too.
As long as your data is (largely) error-free and complete, the cluster analysis will run, but the output may or may not support your decision process. Remember that we started with a goal of wanting to understand what products to put on the shelf. Is this supported by knowing the demographic mix? Only indirectly - if we know that we have a group of stores serving an, on average, older population, we can make assumptions about the sorts of products they would prefer and guess at how much more of these products to put on the shelf. Analytically speaking, "assumption" is a rather bad word: "guess" is a curse-word.
(BTW - there is some very good work being done to more accurately map local demographic data to stores. It gets complicated very quickly, feels cutting-edge and even "cool", but if it does not support you in reaching your particular destination, it's not useful to you right now.)
We can do so much better than this by keeping the destination in mind throughout the process. We want to improve assortment by store, so let's start by ignoring demographics, ignoring geography and looking at sales. We want to know what sells well and what sells badly by store. Even for one category there can be hundreds of products in the mix, so we really can't analyze this by product. We can, though, get sales profiles based on key product characteristics. Here's one such profile (mocked up for this example) for prepared food products.
Just visually looking at these 8 stores, you can see that:
- some stores sell relatively less "American" food (3 and 8) and a higher proportion of Italian and/or Asian foods
- some stores sell relatively more chicken and less beef (4, 5, 7)
- some stores are heavily value oriented (1, 4, 7), some mid-range (3, 5) and some heavy on premium (2, 8)
Deciding which product characteristics are most relevant to your category is still a challenge, and you may go through a few cycles of exploration in determining what really characterizes the difference between stores. Still, I'm hoping you can see that this is already a much better approach than the one we started with, and it's tied directly to the decisions we need to make on product assortment.
Getting the sales profile data may be a bit more challenging: you still need Point-of-Sale data, and potentially a lot of it, as you will need to start with at least item:store level detail. You will most likely need a DSR or a very friendly and flexible database administrator with time on his hands to get it into this format. (If you do not know what a DSR is, check out my posts Point of Sale Data – Basic Analytics and Data Handling - the right tool for the job.)
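To make the reshaping step concrete, here is a minimal sketch of turning item:store level sales into a share-of-sales profile per store. The rows, the "cuisine" attribute and the function name `sales_profiles` are all made up for illustration; in practice this aggregation would more likely happen in your DSR or database.

```python
from collections import defaultdict

# Hypothetical item:store level rows: (store, product attribute, sales value).
# "cuisine" stands in for whatever characteristic matters in your category.
sales_rows = [
    (1, "American", 600.0),
    (1, "Italian", 250.0),
    (1, "Asian", 150.0),
    (3, "American", 200.0),
    (3, "Italian", 450.0),
    (3, "Asian", 350.0),
]


def sales_profiles(rows):
    """Return {store: {attribute: share of that store's total sales}}."""
    totals = defaultdict(float)
    by_attribute = defaultdict(lambda: defaultdict(float))
    for store, attribute, value in rows:
        totals[store] += value
        by_attribute[store][attribute] += value
    # Dividing by the store total removes store size from the picture,
    # so big and small stores with the same mix look alike.
    return {
        store: {attr: value / totals[store] for attr, value in attrs.items()}
        for store, attrs in by_attribute.items()
        if totals[store] > 0
    }


profiles = sales_profiles(sales_rows)
```

Working in shares rather than raw sales is a design choice: it keeps the profile focused on what sells relative to everything else in the store, which is the question the assortment decision asks.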
Based on sales-profile data and a few hours' work, we can have clusters that are built on the product preferences of shoppers in these stores - what better to drive your assortment decisions?
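As an illustration only, the sketch below groups stores by their sales profiles using a small k-means routine. K-means is my choice for the example, not a claim about which clustering method is best for your data; the store labels, the value/mid-range/premium shares and the choice of three clusters are all assumptions.

```python
import math

# Hypothetical sales profiles: share of sales in (value, mid-range, premium).
store_profiles = {
    "A": (0.70, 0.20, 0.10),
    "B": (0.15, 0.25, 0.60),
    "C": (0.25, 0.55, 0.20),
    "D": (0.65, 0.25, 0.10),
    "E": (0.20, 0.60, 0.20),
    "F": (0.10, 0.30, 0.60),
}


def kmeans(points, k, iterations=20):
    """Plain k-means; returns one cluster label per point."""
    # Deterministic start: take the first point, then repeatedly add the
    # point farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = []
    for _ in range(iterations):
        # Assign each store to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Move each center to the mean profile of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


stores = list(store_profiles)
labels = kmeans([store_profiles[s] for s in stores], k=3)
cluster_of = dict(zip(stores, labels))
```

With these made-up numbers the value-heavy stores (A, D), the premium-heavy stores (B, F) and the mid-range stores (C, E) end up in separate clusters. On real data you would want to try several cluster counts and check that the groups make sense for the assortment decision.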
Of course, even if clustering is the right analytic tool, if you have a different destination in mind, this may not be how you want to get there.
A last thought: this does not need to be the end of our journey. We can use geographic and demographic data not to build the clusters but to help us describe them or to find out which geo/demographic features are related to specific sales profiles. More on this in future posts.
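One simple way to start describing clusters after the fact is to average a demographic feature within each cluster. The sketch below assumes you already have a cluster label per store; the labels, the `median_age` figures and the helper name `describe_clusters` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cluster assignments built from sales profiles.
cluster_of = {"A": 0, "B": 1, "C": 2, "D": 0, "E": 2, "F": 1}

# Hypothetical demographic feature mapped to each store (not used to cluster).
median_age = {"A": 50.0, "B": 30.0, "C": 40.0, "D": 54.0, "E": 42.0, "F": 34.0}


def describe_clusters(cluster_of, feature):
    """Return {cluster: mean of the feature across stores in that cluster}."""
    grouped = defaultdict(list)
    for store, cluster in cluster_of.items():
        grouped[cluster].append(feature[store])
    return {cluster: mean(values) for cluster, values in grouped.items()}


age_by_cluster = describe_clusters(cluster_of, median_age)
```

The demographic data plays no part in forming the clusters here; it is only used to describe groups that were built from sales.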