Better Business Analytics: April 2014

Next-Generation DSRs (multi-retailer)

This post continues my look at the Next Generation DSR. Demand Signal Repositories collect, clean, report-on and analyze Point of Sale data to help CPGs drive increased revenues and reduce costs.

Most CPG implementations of a DSR support just one retailer's POS data. OK before someone get's back to me with "but we have multiple retailers' POS data in our system", I'll clarify:

Having Walmart and Sam's Club data in the same DSR does not count (as the data comes from the same single source, RetailLink) and I bet you are still limited as to what you can report on across them.
If you have multiple-retailer's POS data set up in isolated databases using the same front-end... it does not count
If you have the data in the same database but without common data standards ... it does not count.
If you have the data in the same database but with no way to run analysis or reports across multiple retailers at once... it does not count.

So, yes, a number of CPGs have DSRs that support multi-retailer POS data sources, very, very few (if any?) have integrated that data into a single database with common data standards so they can report and analyze across multiple POS sources at the same time.

Does it matter? I think so, multi-retailer ability opens up big opportunities around promotional-effectiveness, assortment planning, supply-chain forecasting (demand sensing) and ease of use.

The right tools for (structured) BIG DATA handling - columnar, mpp and cloud - AWS Redshift

Today, I'm coming back a little closer to the series of promised posts on the Next Generation DSR to look at some benchmark results for the Amazon Redshift database. Some time ago I wrote a couple of quite popular posts on using columnar databases and faster (solid state) storage to dramatically (4100%) improve the speed of aggregation queries against large data sets. As data volumes even for ad-hoc analyses continue to grow though, I'm looking at other options.

Here's the scenario I've been working with: you are a business analyst charged with providing reporting and basic analytics on more data than you know how to handle - and you need to do it without the combined resources of your IT department being placed at your disposal.

Data Visualization - are pie-charts evil ?

I'll be speaking next week at the Supply Chain Management Conference at the University of Arkansas on how data-visualization enables action.

Good visualization is fairly easy, unfortunately, building bad visualizations that are hard to use, easy to misunderstand and that obscure and distort the data you are trying to present is even easier - many analysts can do it without trying to.

In honor of the event, I'm resurrecting a post I created a couple of years ago "Are pie charts evil or just misunderstood". I wrote this around the time I was moving away from a trial and error approach (and 20 years of trial and error effort does get you cleaner visuals) to attempting to understand why some visuals so clearly work better than others.

It turns out that there are some great frameworks to help in building better visuals. Join me next week and we'll talk about human graphical perception, chart junk and non-data ink.

Enjoy !

Data Visualization - enabling action

I'll be speaking next week at the Supply Chain Management Research Center Conference at the University of Arkansas on how data-visualization enables action.

The basic premise (and one I firmly believe) is that the hardest part of any analytic project is not defining the problem, doing the analytics or finding the "solution", it's enabling action. Far too many otherwise excellent analytic projects, tools and reports go unused because the results are presented in a way that is somewhere between difficult-to-understand and incomprehensible.

Next Generation DSRs - Scale Out !!

Last week, I posted my thoughts on how new technology enables a simpler and faster database to support your DSR applications.Next Generations DSRs (data handling). Over the next few posts I'll extend that idea to show how speed and simplicity are essential to your personal productivity, user experience and the ability to apply powerful analytic tools to your data.

In the meantime though I came across a great post in Rob Klopp's "Database Fog Blog" regarding Redshift. Redshift is Amazon's cloud-based, columnar, parallel database.

Remember that my interest in database technology is all about feeding my insatiable desire for data to drive value-added analytics, my own area of expertise. To that end, I have become adept in a number of programming languages and relational database systems and while I'm a lot better than "competent" I am not "expert". Rob clearly is an expert in this field and I will be following his posts carefully.

Here's a highlight from his post Thoughts on AWS Redshift:

... if you can add nodes and scale out to improve query response then why not throw hardware at performance problems rather than build a fragile infrastructure of aggregate tables, cubes, pre-joined/de-normalized marts, materialized views, indexes, etc. Each of these performance workarounds are both expensive to build and expensive to operate.

He goes on to talk about why scale-out has not been generally adopted and how Amazon Redshift changes the game by making it easy to acquire and release processing power on demand.

The answer does not have to be Redshift, perhaps it's Impala or Hekaton or... whatever. Bottom line for me is that new technology enables DSR's that are simpler and faster and that creates a fundamental shift in system capability.

FYI - I have done some DSR-scale testing with Redshift and the results were very impressive. More on that soon.

Next-Generation DSRs - data handling

This post continues my look at the Next Generation DSR. A DSR (Demand Signal Repository) holds data, typically Point of Sale data, and that data volume is big, not Google-search-engine big, but compared to a CPG's transaction systems, it's huge. Furthermore, the system is required to rapidly load large quantities of new data, clean it, tie it into known data dimensions and report against it in very limited time-frames.

But scale and performance needs aside, why have most (though not all) CPGs chosen to buy rather than build the capability? After all, it is primarily a business-intelligence/database application and most businesses run a number of them. One key reason is that it's challenging to get business reporting performance at this data scale from existing technology.

This post looks at how this problem gets solved today and how newer database technology can change that landscape.