Microsoft Excel must be the most
widely used analytic tool available today but, it’s not the only tool in my tool-bag and it should not be the only one in yours either.
It’s perfect for building small models, has broad embedded functionality, good charting capabilities, pivot-tables (a
superb tool worthy of its own post), comes with a programming language so
you can extend its capability even further and almost everybody has a copy already.
It’s awesome and I use it every day.
But...the data we analyze is getting much bigger and a lot more complex.
Even with newer versions of Excel allowing over 1 million records in a sheet, what can you do usefully in Excel with 1 million records? Certainly, you don’t want to print it, it’s
near impossible to chart or model against, bottom line you are using the wrong
tool for the job. To do the job well you
need to find, learn and use the right tool.
Don’t believe me? Try chopping
down a tree with a hammer!
It may be tempting to aggregate your data to reduce data size and continue using Excel. Beware! When you aggregate (Sum, Average, etc.) you lose detail you might need. As an example, let's look at retailer point-of-sale data: the record of every purchase in every store:
- Aggregate individual shopper transactions into daily summaries (by product and store) and you can no longer see product associations in the basket or product substitution from basket to basket;
- Aggregate daily to weekly data and you will be blind to monthly sales patterns (which can be very extreme).
- Aggregate store to national data and you will be unable to see regional patterns or relate sales profiles to demographics by store.
At this point, most business analysts will open up Microsoft Access.
This can handle more data (up to 2 Gigabytes) and the complexity of
multiple tables better than excel does it in multiple worksheets. It also uses the same programming language so
there is less to learn. It’s a
very good tool when used appropriately and I have a lot of experience with it,
even designing custom data architectures and programming extensions to
drive faster performance from it.
It seems easy to go beyond 2 Gigabytes these
days but even before you reach 2GB you really should move to a more
industrial-strength database (SQL Server, Oracle, DB2, MySQL…) just to handle
the data volume effectively. As data
heads towards Terabytes and the data complexity increases you will need
much more knowledge about database design and programming to do it well.
The analysis tools are not nearly so easy to apply in these
environments as in Excel, requiring access to more high powered and analytical,
data mining and optimization tools (SAS, SPSS, ILOG) or, for simpler analytics
and my preference, a capable programmer that can embed the analytics directly
into your database applications so they are usable on demand.
I have recently been working on a 3 Terabyte database,
around 1500 times larger than the biggest Access database you can build and
substantially larger than most SQL Server databases. At this point, we are really talking ‘Big
Data’. In the analytical world you’ll
see this phrase ‘Big data’ a lot right now.
If, for example, you’re collecting high frequency sensor data for
weather modeling, or collecting data on for a web search engine, 3TB may not
seem that big, but by current CPG standards it certainly is and importantly,
its big enough that it becomes challenging to use regular, even
industrial-strength, database packages.
In this case (for Point of Sale data analysis) we use a
novel and application-specific, data-architecture and associated query engine
developed by Shiloh Technologies (www.shilohtech.com) on a SQL Server
database. The Shiloh application,
enhanced now by a variety of analytic tools handles the huge data volumes for
us, runs analysis, filters, summarizes and presents the data back to the user
in a volume that a mere human can manage (certainly less than 100,000 records,
10,000 would be better). And guess where
we put that data? Back into Excel of
course – it’s a truly phenomenal tool!
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.