I've created the data for this example so that there are no issues with confidentiality but not only does it closely represent reality, I have seen many more extreme versions of this in real data.
First let's look at 2 years of weekly Point of Sale (POS) data. Anything interesting happening here?
I think we can say that there is no obvious seasonality, and there seems to be no trend in the data: longer term sales are neither growing nor declining. There appears to be no pattern to it: sales oscillate up/down in a fairly random looking manner, sometimes, it's up or down for 2 weeks at a time, sometimes,1 , sometimes 3. It looks like fairly random noise - right ?
There is a lot going on here! Before you dismiss it as noise, look a little closer. Can you see a repeating monthly pattern? Can you see a spike in sales near the beginning of the month and another a few days later? That's worth looking into. How about a repeating weekly pattern: much harder to see but retail outlets generally sell more on Friday/Saturday/Sunday than on other days of the week - chances are that is causing some of the oscillation too.
Let's look at this visually for day of the week. This chart shows sales by day of the week indexed to the "average day".
On Sundays, we see sales of almost 20% more than average, on Friday and Saturday slightly more and for the rest of the week about 80% of average.
Now, let's look at it by day of the month.
This is still showing indexes relative to the "average day" but in comparison to the day of the week visual above these spikes are big! Clear spikes on day 1 and day 10, with increased sales for 2-3 days after each spike and perhaps a small increase just before each spike?
What could be causing such spikes? How about SNAP? (see What drives your sales? SNAP?) ? Or perhaps WIC a similar US government program that subsidizes food stuffs targeted at children.? Any event distributing funds that routinely happens on a monthly calendar could cause such a problem.
Looking at these charts separately by day of the week and then by day of the month, we are trying to ignore the impact of one factor while reviewing the other. Doing so introduces a bias into the output: by pure luck, some days of the week will coincide a little more often with the day of the month spikes and their average will increase. (In this case, Thursday gets an extra spike day, Sunday and Tuesday are one less than average). What we really need is a tool that can examine the impact of both variables at the same time and tease them apart cleanly. A regression model does this quite effectively. I'll not get into how this is constructed today, let's just review a few results.
The regression model predicting sales using just the day of the week and day of the month as inputs explains 83% of the variation in this data. That is a truly staggering result for something that a first glance looked like random noise.
The first chart below shows the daily sales data (blue) with the model prediction (yellow) on top. The model prediction is very close to actual sales with the exception of a few troughs in sales. To help mark these outliers, the second chart, plotted on the same time-scale shows where we have the biggest gaps between predicted and actual sales. We have 6 occasions when we have relatively large errors. Any guesses as to what these are?
This data reflects US retail stores, which are closed, or largely empty of shoppers, at Christmas (end of December), Thanksgiving (End of November) and Easter (April). Thanksgiving and Easter are not on fixed dates so they are not in exactly the same point on the timeline but if you were to look up the actual dates for 2010/2011 you'll find these coincide perfectly. These are clear outliers and we can safely remove them from our data (or build a model that flags them correctly). Removing them further improves the model fit and now we can explain 87% of the data variation in sales from the day of week and day of month.
What does this mean to you? If your products are heavily impacted by hidden events like this, you may be losing substantial sales opportunities.
In this case the spike days are about twice the sales volume of other days and I have seen examples that are much more extreme than this. Even at 2 times normal sales it is likely that you did not have product on-shelf all the time during these spikes. It's very possible you do not really know how high the spike could be.
If the Retailer's forecasting system is based on weekly buckets (and I'm betting it is), it does not know or understand that this problem exists, it only sees random noise. But these spikes can be predicted with surprising accuracy as in the model above. If the retailer forecasting/replenishment system cannot be fixed you will need over-rides every month to ensure product is in place before the peak: specially created seasonal profiles; forecast-adjustments; script-orders; promotional-orders; stock policy exceptions etc. every month to place inventory at the store in advance of the peak.
If shelf planograms are built based on average sales expectations, there may be a need to increase facings for more volatile products or to add temporary positions for volatile product. (I love this analogy for the pain that can be involved in working with averages so I just had to include it again)
Finally, consider your promotional plans. You now know more clearly when shoppers are in the store spending money. Promotional calendars based on a weekly plan may have your promotion kicking off just after the peak shoppers have left for the month.
My takeAggregating point of sale data to weekly buckets, in the presence of patterns that repeat on a non-weekly basis, is a superb way of: hiding sales patterns from you; not keeping product on the shelf and losing sales.
Handling this properly means working with more data than you may be used to. At least 7 times the volume of handling weekly data; 8 times the volume if you must also capture the weekly snapshot.
If you are dealing with programs like SNAP and WIC which vary in their implementation across the country, you will also need store level detail and, as not all products are equally sensitive to any program, you will need item level data too. This can be a lot of data - 1000 products at 1000 stores for 3 years of daily data is just over 1 billion data points ( 1,000 * 1,000 * 1,095 ). You will not manage this is Excel or in Access other than for very small samples of data.
You need the right tools for the job (Data handling - the right tool for the job) a Demand Signal Repository (DSR) that can handle daily data or a very well optimized database AND enough of an analytic engine to run correlation/regression analysis against that mass of data. (Bringing your analytical guns to bear on Big Data ...).
If you're ready to get started - call me.