tag:blogger.com,1999:blog-36632828541384697742024-03-13T04:55:20.591-07:00Better Business AnalyticsBusiness Analytics and Predictive Modeling for CPG/RetailAndrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.comBlogger67125tag:blogger.com,1999:blog-3663282854138469774.post-49922086105170427372014-09-29T09:19:00.001-07:002014-09-29T09:19:23.052-07:00Next Generation Point of Sale Analytics<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
<img alt="" class="left" data-loading-tracked="true" height="233" src="https://media.licdn.com/mpr/mpr/p/7/005/08c/08c/207a381.jpg" style="border: 0px; box-sizing: border-box; float: left; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; height: auto; line-height: inherit; margin: 0px 30px 15px 0px; max-width: 100%; outline: 0px; padding: 0px; vertical-align: baseline;" width="311" /><span style="font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit;">Over the last few months I've been exploring the features I want to see in a next generation platform for point of sale analytics: It's simpler, faster and cheaper, supports rapid blending of new data sources and is powered up with real analytic capability.</span><span style="font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit;"> </span><span style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Looking back there are a lot of posts on this topic so here is a quick summary with links back to the detail.</span></div>
<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
<em style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Note</em></div>
<ul style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; list-style-image: initial; list-style-position: initial; margin: 0px 0px 30px 40px; outline: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">I have no immediate plans to build such a system for sale, but I do use systems with many of these features for ad-hoc analytics, as they are flexible yet relatively easy to set up and tear down without incurring substantial overheads. Consider this series more of a manifesto and buyer's guide.</em></li>
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><em style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">I do see changes in the marketplace suggesting that a number of DSR vendors are at least considering a move in this direction. As to which one will get there first, I think it will be whoever feels least weighed down by their existing architecture.</em></li>
</ul>
<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
<span style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Database technology has moved on dramatically over the last few years. For this scale of data, analytic solutions should be columnar, parallel and (possibly) in memory. This enables speed, scalability and a simple data structure that makes it easy to hook up whatever analytic or BI tools you wish.</span></div>
<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<ul style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; list-style-image: initial; list-style-position: initial; margin: 0px 0px 30px 40px; outline: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140414152519-11894214-next-generation-dsrs-data-handling?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Data handling</a></li>
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140512155142-11894214-next-generation-dsrs-it-s-all-about-speed?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">It's all about speed</a></li>
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140414153001-11894214-next-generation-dsrs-data-handling-update?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Scale out</a></li>
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140421142634-11894214-the-right-tools-for-structured-big-data-handling-columnar-mpp-and-cloud-aws-redshift?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Structured BIG DATA <span style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">handling with RedShift</span></a></li>
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140519135814-11894214-the-right-tools-for-structured-big-data-handling-more-redshift?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">more Redshift</a></li>
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140812210326-11894214-next-generation-dsrs-analytic-freedom?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Analytic Freedom</a></li>
</ul>
<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
<span style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">If the only data you have in the system is POS sales for a single retailer, you can build a reporting system ("what sold well last week") but you will struggle to understand <em style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">why</em> sales change. Bringing in other data sources (multi-retailer, demographics, weather information, promotional calendars, competitor activity, socio-economic trends, Google Trends, social media, etc.) allows for much more insightful analytics. It's not easy to do this though, particularly if your source database is locked down so that it takes a software engineer to add tables.</span></div>
<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<ul style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; list-style-image: initial; list-style-position: initial; margin: 0px 0px 30px 40px; outline: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140428142136-11894214-next-generation-dsrs-true-multi-retailer-capability?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">True multi-retailer capability</a></li>
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140630133548-11894214-next-generation-dsrs-data-blending?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Data blending</a></li>
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140707144829-11894214-next-generation-dsrs-data-blending-part-2?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Data blending (part 2)</a></li>
</ul>
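<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
To make the idea of blending concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the store names, weeks, unit counts, temperatures, the 90 degree threshold and the helper names <code>blend</code> and <code>average_units_by_heat</code> are all hypothetical and do not come from any real DSR. It simply joins weekly POS sales to a second source (weather) on store and week, then compares average units in hot weeks against the rest.</div>

```python
from collections import defaultdict

# Hypothetical weekly POS sales: (store, week, units sold). Illustrative values only.
pos_sales = [
    ("store_1", "2014-W30", 120),
    ("store_1", "2014-W31", 180),
    ("store_2", "2014-W30", 90),
    ("store_2", "2014-W31", 95),
]

# Hypothetical second data source: average temperature (deg F) by store and week.
weather = {
    ("store_1", "2014-W30"): 78,
    ("store_1", "2014-W31"): 93,
    ("store_2", "2014-W30"): 75,
    ("store_2", "2014-W31"): 77,
}


def blend(sales, weather_by_key):
    """Left-join sales to weather on (store, week); missing weather becomes None."""
    return [
        {"store": s, "week": w, "units": u, "temp_f": weather_by_key.get((s, w))}
        for s, w, u in sales
    ]


def average_units_by_heat(rows, hot_threshold_f=90):
    """Average units for hot weeks vs. all other weeks, skipping rows with no weather."""
    buckets = defaultdict(list)
    for row in rows:
        if row["temp_f"] is None:
            continue
        label = "hot" if row["temp_f"] >= hot_threshold_f else "not_hot"
        buckets[label].append(row["units"])
    return {label: sum(units) / len(units) for label, units in buckets.items()}


blended = blend(pos_sales, weather)
summary = average_units_by_heat(blended)
print(summary)
```

<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
The join itself is trivial once the new table exists; the hard part in practice is being allowed to add that table at all.</div>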
<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
The term "Analytics" in general use covers a lot of activities, most of which involve little more than reporting. In some instances you can slice and dice your way through a dataset to find insight, and reporting is not without value, but it's not analytics. Not even close.</div>
<ul style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; list-style-image: initial; list-style-position: initial; margin: 0px 0px 30px 40px; outline: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140728184532-11894214-next-generation-dsr-reporting-is-not-analytics?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Reporting is NOT Analytics</a></li>
</ul>
<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
Can you buy good analytics? Yes, but there are also a number of pseudo-analytic solutions in the market that have little to no analytic power - caveat emptor!</div>
<ul style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; list-style-image: initial; list-style-position: initial; margin: 0px 0px 30px 40px; outline: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140804163037-11894214-next-generation-dsrs-an-analytic-name-is-not-enough?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">An Analytic name is not enough</a></li>
</ul>
<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
To get to real, deep insights you need real analytic tools. Depending on the taxonomy you are used to, we are talking about <span style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">predictive and prescriptive analytics, </span><span style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">machine learning, </span><span style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">statistics, </span><span style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">optimization or </span><span style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">data science. Most of these tools are not new, but they are not generally found in standard BI offerings, and even when they are (e.g. reporting-level R integration) you may struggle to apply the analytic tools at scale.</span></div>
<ul style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; list-style-image: initial; list-style-position: initial; margin: 0px 0px 30px 40px; outline: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140730161215-11894214-next-generation-dsrs-analytic-power?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Analytic Power</a></li>
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140820152734-11894214-next-generation-dsrs-bring-the-analytics-to-the-data?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Bring the analytics to the data</a></li>
</ul>
<div style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 30px; outline: 0px; padding: 0px; vertical-align: baseline;">
Finally, whether you build your own analytic tools or buy them in to run on your platform, clever math is not enough. If a user cannot comprehend the tool or its suggestions due to poor user interface design and/or bad visualization choices, it's worth precisely ... squat.</div>
<ul style="border: 0px; box-sizing: border-box; color: #4d4f51; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; list-style-image: initial; list-style-position: initial; margin: 0px 0px 30px 40px; outline: 0px; padding: 0px; vertical-align: baseline;">
<li style="border: 0px; box-sizing: border-box; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 0px 5px; outline: 0px; padding: 0px; vertical-align: baseline;"><a href="https://www.linkedin.com/today/post/article/20140825153205-11894214-analytics-is-for-everyone?trk=mp-author-card" style="border: 0px; box-sizing: border-box; color: #7b539d; font-family: inherit; font-size: 15.5555562973022px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Analytics are for everyone</a></li>
</ul>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com1tag:blogger.com,1999:blog-3663282854138469774.post-2457152940449157232014-08-25T08:39:00.000-07:002014-08-25T11:13:08.039-07:00Analytics are for everyone !<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Analytics are for everyone! Well, not building analytics, no. That needs a high level of expertise in statistics, machine-learning, optimization, programming, database skills, a healthy dose of domain knowledge for the problem being addressed and a pretty wide masochistic streak too.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Using analytics, now that is for everyone, or at least it should be. <span style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">We all use analytics and, I think, we use the best examples without thinking about just how complex they are.</span></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<span style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><img alt="" class="left" src="http://m.c.lnkd.licdn.com/mpr/mpr/p/5/005/080/370/0a50009.jpg" height="165" style="border: 0px; float: left; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 1em 1em 0px; max-width: 606px; outline: 0px; padding: 0px; vertical-align: baseline;" width="185" />Is there anyone out there that hasn't used an electronic mapping service (GPS) for directions? Even ignoring the electronics, these are remarkable pieces of engineering! An extensive, detailed database of road systems and advanced routing analytics to help you find the best route from A to B without sending you backwards down one-way roads or across half-finished bridges.</span></div>
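<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
For a flavor of what sits at the core of such a service, here is a toy sketch in Python of shortest-path routing (Dijkstra's algorithm) over a tiny road network with a one-way street. This is an illustration only and not how any particular mapping product works: the <code>roads</code> network, the travel times and the function name <code>shortest_route</code> are all made up, and a real service layers traffic, turn restrictions and a vastly larger, messier database on top.</div>

```python
import heapq

# Hypothetical road network: roads[x][y] is the travel time in minutes from x to y.
# A road appears only in the directions you may legally drive it,
# so D -> B below is a one-way street (there is no B -> D entry).
roads = {
    "A": {"C": 4, "D": 10},
    "C": {"A": 4, "B": 9, "D": 3},
    "D": {"B": 2},
    "B": {"C": 9},
}


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_minutes, path), or None if unreachable."""
    queue = [(0, start, [start])]  # priority queue ordered by minutes travelled so far
    best = {}                      # cheapest known cost at which each node was expanded
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # already expanded this node at an equal or lower cost
        best[node] = cost
        for nxt, minutes in graph.get(node, {}).items():
            heapq.heappush(queue, (cost + minutes, nxt, path + [nxt]))
    return None


print(shortest_route(roads, "A", "B"))
print(shortest_route(roads, "D", "A"))
```

<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Because the one-way street is stored in a single direction, the route from D back to A has to go the long way round through B and C rather than reversing down it.</div>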
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Perhaps you're thinking it's not that hard? Could you build it? What if I got the data for you? No? <span style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">But you can use it, right? </span><span style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">These services are not perfect, mostly I think because of data cleanliness problems, but they are close enough that I don't travel far from home without one.</span></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
More examples. Anyone used a search engine? How about an on-line weather forecast? How about web-sites that predict house-values? Recommendation engines like those used by Amazon and Netflix? All heavy analytic cores wrapped in an easy to consume, highly usable front-end.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
These are, I think, among the exceptions in analytic applications - good analytics AND good delivery.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
I talked about pseudo-analytics in a recent <a href="https://www.linkedin.com/today/post/article/20140804163037-11894214-next-generation-dsrs-an-analytic-name-is-not-enough?trk=mp-edit-rr-posts" style="border: 0px; color: #7b539d; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">post</a><span style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">: shams with no basis in science wrapped in a User Interface with the hope that nobody asks too many questions about what's under the hood. This is not good analytics.</span></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<span style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Unfortunately, even good analytic tools get under-used if they have not been made accessible to the poor people that have to use them. Spreadsheet tools probably top the list for unusable analytic applications: unusable, that is, by anyone except the person that wrote them. Sadly though, I have seen many examples both in reporting and applications where so little effort was put into User Experience that any good analytics is almost completely obscured.</span></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Building new analytic capability is a highly skilled job. Delivering analytic results in an easy-to-consume format so that they get used is also a highly skilled and, frankly, often forgotten step in the process. After all, we do build analytic tools so that they get used. Don't we? Sometimes I wonder.</div>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-28938216186884468832014-08-25T08:35:00.004-07:002014-08-25T08:36:10.997-07:00Next Generation DSRs - Bring the Analytics to the data<span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px;"></span><br />
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<a href="http://m.c.lnkd.licdn.com/mpr/mpr/p/7/005/078/118/3c936fa.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img alt="" border="0" class="left" src="http://m.c.lnkd.licdn.com/mpr/mpr/p/7/005/078/118/3c936fa.jpg" height="175" style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-top: 0px; max-width: 606px; outline: 0px; padding: 0px; vertical-align: baseline;" width="206" /></a>Under old world analytics, you move data from the DSR to your analytic server, build models, then write results (sometimes models too) back out for integration into the DSR.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<span style="font-size: 15.555556297302246px;">Now, consider this:</span></div>
<ul style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; list-style-image: initial; list-style-position: initial; margin: 1em 0px; outline: 0px; padding: 0px 0px 0px 3em; vertical-align: baseline;">
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><strong style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">DSR datasets are often enormous.</strong> (Two years of data from a DSR I worked with recently, used as input to a model, came to approximately 270 GB.)</li>
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><strong style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Analytic tools are small. </strong>(The R base software, all 150 packages I have installed and the development environment together come to 625 MB.)</li>
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><strong style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Analytic models are tiny. </strong>(A 10-component regression model expressed in SQL takes just 288 bytes, and most of that is down to variable names.)</li>
</ul>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Let's try that visually.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<img alt="" class="left" src="http://m.c.lnkd.licdn.com/mpr/mpr/p/6/005/078/104/399ec26.jpg" style="border: 0px; float: left; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 1em 1em 0px; max-width: 606px; outline: 0px; padding: 0px; vertical-align: baseline;" /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
The input data is huge, everything needed to run R (my analytics tool of choice) is barely a blip on the scale, and the resulting model can't be seen on this scale at all. And today we move the DSR data to the analytic server to run the analytics... Anyone else having an issue with this?</div>
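<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
To put rough numbers on that comparison, here is a small Python sketch using the three sizes quoted above. It assumes decimal units (1 GB = 10^9 bytes); the labels and the <code>ratio</code> helper are mine, purely for illustration.</div>

```python
# Sizes quoted in the post, converted to bytes.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 MB = 10**6 bytes).
SIZES_BYTES = {
    "DSR data (2 years)": 270 * 10**9,
    "R + packages + IDE": 625 * 10**6,
    "10-term regression in SQL": 288,
}


def ratio(larger: str, smaller: str) -> float:
    """How many times bigger one item is than another."""
    return SIZES_BYTES[larger] / SIZES_BYTES[smaller]


if __name__ == "__main__":
    for name, size in SIZES_BYTES.items():
        print(f"{name:28s} {size:>16,d} bytes")
    print("data vs. tools:", ratio("DSR data (2 years)", "R + packages + IDE"))
    print("data vs. model:", ratio("DSR data (2 years)", "10-term regression in SQL"))
```

<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
On these figures the data is a few hundred times larger than the entire analytic toolset, and hundreds of millions of times larger than the model.</div>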
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Where the data is small enough that you can pull what you need via a query over an ODBC connection and hold it in memory to run the analytics, perhaps you can live with the network overhead.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Similarly, if the DSR and analytic servers are co-located with a big fat data pipe connecting them, it doesn't matter so much. I'm not necessarily after the same machine, but the same rack would be nice.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
What happens though, when the data is too big and the connection too slow (think wide area network) to be feasible? Now we need to build database structures on the analytic server, load the data (taking a copy), and if we are to re-run the analytics routinely, keep it in sync with the source on an ongoing basis. This is a lot of (non-analytic) maintenance work before we can even get started on the analytics.</div>
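<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
A back-of-envelope sketch of why the connection matters. The 270 GB figure is the dataset mentioned earlier; the two link speeds (a 100 Mbit/s wide area link and a 10 Gbit/s same-rack link) are my own illustrative assumptions, and the calculation ignores protocol overhead, so real transfers would be slower.</div>

```python
def transfer_hours(size_bytes: float, link_megabits_per_s: float) -> float:
    """Idealized time to move size_bytes over a link, ignoring all overhead."""
    bits = size_bytes * 8
    seconds = bits / (link_megabits_per_s * 10**6)
    return seconds / 3600


DATASET_BYTES = 270 * 10**9   # the 270 GB model input (decimal units assumed)
WAN_MBITS = 100               # hypothetical wide area link
RACK_MBITS = 10_000           # hypothetical same-rack 10 Gbit/s link

if __name__ == "__main__":
    print(f"WAN:  {transfer_hours(DATASET_BYTES, WAN_MBITS):.2f} hours")
    print(f"Rack: {transfer_hours(DATASET_BYTES, RACK_MBITS) * 60:.1f} minutes")
```

<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Under those assumptions a full copy takes hours over the wide area link and minutes within the rack, before any work to keep the copy in sync.</div>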
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
So why do we do this?</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<em style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">"The analytic server is a high power, high memory machine great for analytics!"</em> That's true but chances are your database servers have the same thing.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
There are also valid concerns around how an analytic tool connecting directly to a database may impact other users. I do have a little sympathy for this, certainly much more than I used to, but think on this: a DSR is not a mission-critical system. The failure of a mission-critical system stops your business. If the DSR stops (and the chances are very good that you will have no issue at all), your reports are a bit late. Relax!</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
I have a suspicion that some of this is related to licensing. If you pay a small fortune for your analytic tools and they are priced per server, per CPU or per core, I can see why you would not want to go installing that software everywhere you might want to use it. Cheaper perhaps to bring the data to the software. Working with free open-source tools, it's not been an issue for me to install co-located or even on the same machine as needed.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Recently a number of database and BI vendors have moved to integrate analytic tools (often R, sometimes SAS) into their offerings, trying to deliver real in-database analytics. I do think this is a great direction to move in, though I have some concerns about the level of integration currently available. See my post on <a href="https://www.linkedin.com/today/post/article/20140730161215-11894214-next-generation-dsrs-analytic-power?trk=mp-edit-rr-posts" style="border: 0px; color: #7b539d; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Analytic Power !</a> <span style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">for more details.</span></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Even if you can't execute true in-database analytics (which should be a Next Generation feature) there are still things you should be able to do to<strong style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"> bring the analytics to the data.</strong></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<strong style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">First let's make a distinction between model-building</strong> (the act of creating new models from data) and <strong style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">model-scoring </strong>(running existing models against new data to make new predictions). All predictive analytic models I can think of can have this same split. (Descriptive and Prescriptive analytics do not)</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<strong style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Model-building</strong> is an intensive task. This is where all the heavy lifting happens in analytic work, so processing and memory needs can be substantial, though this varies widely with the analytic method and, to some extent, the implementation. If you have installed analytic tools directly on your database servers, this may be enough to cause something of a slow-down. OK, try to co-locate instead. If you absolutely must replicate data to an analytic server on the other side of the world and try to keep your data in sync, I pity you.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<strong style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Model-scoring</strong> is fast. A model is just a set of simple calculations. Deciding exactly what simple calculations you needed was the job of model-building but now you have done that, scoring new data against that model is quick.</div>
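<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
A minimal Python sketch of that split, assuming the simplest possible case: a one-variable least-squares regression on made-up data. The function names <code>build_model</code> and <code>score</code> are illustrative, not from any particular tool. Building has to pass over all of the history; scoring is one multiplication and one addition per new row.</div>

```python
def build_model(xs, ys):
    """Model-building: fit y = slope * x + intercept by ordinary least squares.

    This is the expensive step: it needs every historical observation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def score(model, x):
    """Model-scoring: apply the stored weights to one new observation."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Toy history, invented for illustration only.
    history_x = [1, 2, 3, 4, 5]
    history_y = [3, 5, 7, 9, 11]
    model = build_model(history_x, history_y)  # done once, or occasionally
    print(model, score(model, 10))             # done for every new row
```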
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
This is what the result of a simple regression model looks like (in SQL):</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
[Variable_1] *-49.8916 + [Variable_2] *-24.2773 + [Variable_3] *-48.1305 + [Variable_4] *-253.7238 + [Variable_5] *-20.7173 + [Variable_6] *17.722 + [Variable_7] *12.9865 + [Variable_8] *-17.4036 + [Variable_9] *2.2738 + [Variable_10] *-7.9186 + 6.668 AS Prediction</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
If you think it looks complex, look again, it's just a set of input variables multiplied by specific weights (as found by model-building) and then added together. This is easy work for the database. More complicated models will have more complex expressions, you may see logs, exponents, trig., perhaps an if..then..else statement. Nothing the database will find difficult to execute if it's expressed in the right language.</div>
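<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
For anyone who reads Python more easily than SQL, the same calculation can be sketched as a weighted sum. The weights and the constant are the ones in the expression above, on the assumption that every variable (including Variable_4) is multiplied by its weight; the <code>predict</code> function name is mine.</div>

```python
# Weights for Variable_1 .. Variable_10, taken from the SQL expression above.
# Assumption: -253.7238 is the weight on Variable_4, like the other terms.
WEIGHTS = [
    -49.8916, -24.2773, -48.1305, -253.7238, -20.7173,
    17.722, 12.9865, -17.4036, 2.2738, -7.9186,
]
INTERCEPT = 6.668


def predict(values):
    """Multiply each input by its weight, add them up, then add the constant."""
    if len(values) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} values, got {len(values)}")
    return sum(w * v for w, v in zip(WEIGHTS, values)) + INTERCEPT


if __name__ == "__main__":
    # With every input at zero, only the constant term is left.
    print(predict([0.0] * 10))
```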
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<strong style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Unless models change with every input of new data (and so need re-building) there is no excuse not to score the model directly against the data.</strong> How you execute the model scoring is a different question and you have some options:</div>
<ul style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; list-style-image: initial; list-style-position: initial; margin: 1em 0px; outline: 0px; padding: 0px 0px 0px 3em; vertical-align: baseline;">
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">you may load the model, new data and score directly in your analytic tools. This is using a sledgehammer to crack a nut, but it's easy to do if a little heavyweight/slow.</li>
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">for simpler models, converting the model into SQL is not that difficult (though you do need to know SQL pretty well and have permission to build it into the database as a view, stored procedure or user-defined function). This is probably the most difficult option but the fastest to execute.</li>
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">try converting the model to PMML (predictive model markup language) and use a server based tool designed to execute PMML against your database. (Many analytic tools have an option to export models as PMML.) A PMML enabled DSR would be a great enhancement for the Next Generation.</li>
</ul>
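<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
The second option can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production approach: it uses an in-memory SQLite database as a stand-in for the DSR, and the table, column and view names (<code>sales</code>, <code>price</code>, <code>promo</code>, <code>scored_sales</code>) and the weights are invented. The idea is simply to turn fitted weights into a SQL expression and install it as a view, so the database does the scoring.</div>

```python
import sqlite3


def model_to_sql(weights: dict, intercept: float) -> str:
    """Render a linear model as a SQL expression: [col] * weight + ... + intercept.

    Column names are interpolated directly, so they must come from a trusted source.
    """
    terms = [f"[{name}] * {weight!r}" for name, weight in weights.items()]
    return " + ".join(terms + [repr(intercept)])


def score_in_database():
    """Create a toy table, install the model as a view, and read the predictions."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real DSR database
    conn.execute("CREATE TABLE sales (store TEXT, price REAL, promo REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("A", 2.0, 1.0), ("B", 4.0, 0.0)],
    )
    # Hypothetical weights, as if produced by an earlier model-building step.
    expr = model_to_sql({"price": -1.5, "promo": 2.0}, 10.0)
    conn.execute(
        f"CREATE VIEW scored_sales AS SELECT store, {expr} AS Prediction FROM sales"
    )
    rows = conn.execute(
        "SELECT store, Prediction FROM scored_sales ORDER BY store"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(model_to_sql({"price": -1.5, "promo": 2.0}, 10.0))
    print(score_in_database())
```

<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Once the view exists, any reporting tool that can query the database sees the prediction as just another column, and no data leaves the server.</div>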
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<strong style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Bring your analytics to the data,</strong> spend more time doing analytics and less time wrangling data.</div>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-25540195318784479732014-08-12T14:10:00.001-07:002014-08-12T14:28:50.218-07:00Next Generation DSRs - Analytic Freedom !<img alt="" class="left" src="http://m.c.lnkd.licdn.com/mpr/mpr/p/7/005/078/0fa/078be3c.jpg" height="227" style="background-color: white; border: 0px; color: #333333; float: left; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin: 0px 1em 1em 0px; max-width: 606px; outline: 0px; padding: 0px; vertical-align: baseline;" width="302" /><span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px;"></span><br />
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Current Demand Signal Repositories don't play well with others. Their data is locked away behind layers of security and you can only access it through the shackles of their chosen front-end for reporting. There is no good way to get that rich dataset into other tools: you have to copy it into a new database and new data structures. (In some cases you may have to do this twice, once to rearrange the data from the DSR into a format you can understand, then again to match the data structure needs of the downstream tool.)</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
For small-scale models (do we do those anymore?) that sip data from the original repository, you can do this through the reporting engine and live with the pain; for large-scale modeling it's really not an option.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<strong style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">I want freedom. Freedom to analyze with whatever tools I need</strong>: The freedom to report in Business Objects, visualize in Tableau, analyze in R and run existing applications (order-forecasting, master-data-checking, clustering, assortment optimization, etc.) directly against this data. <em style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">(I'm not endorsing any of these tools and you can replace the named software above with anything you deem relevant - that's kind of the point).</em></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Much of this freedom comes from a simplified data model, enabled by new database technologies (massively parallel processing, scale-out, in-memory and columnar). See more details at <a href="https://www.linkedin.com/today/post/article/20140414152519-11894214-next-generation-dsrs-data-handling?trk=mp-author-card" style="border: 0px; color: #7b539d; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">data handling</a>.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
It also needs a security model that is handled by the database, NOT the reporting layer; otherwise, as soon as you get to the underlying data you can see lots of interesting things you shouldn't :-)</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
I suppose I could live with a little less freedom if a DSR offered all the tools I need but I don't think that's realistic. Not all DSR reporting layers are equal, data visualization is hit and miss, and as I posted in <a href="https://www.linkedin.com/today/post/article/20140804163037-11894214-next-generation-dsrs-an-analytic-name-is-not-enough?trk=mp-author-card" style="border: 0px; color: #7b539d; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">An Analytic name is not enough</a> while there are some good DSR based analytic applications you will find many use pseudo-analytics and some have no analytic basis at all.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Do you think, perhaps, that the Next Generation DSR will provide the best reporting, visualization and analytic tools available? Sorry, I don't think so. DSRs cover a dizzying array of analytic need and developing robust, flexible analytic applications, even assuming easy access to the data, is an expensive proposition for any DSR vendor to do alone. I anticipate a few strong analytic "flag-ship" tools will emerge alongside more me-too/check-the-box applications packed with pseudo-analytics.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
So, what can the Next Generation DSR do to help?</div>
<ul style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; list-style-image: initial; list-style-position: initial; margin: 1em 0px; outline: 0px; padding: 0px 0px 0px 3em; vertical-align: baseline;">
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">make it (much) easier to get at the data in large quantities,</li>
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">make it (much) easier to bring analytics to bear on that data. (Perhaps with an integrated analytic toolset)</li>
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">open up the system to whatever analytic tools work best for you</li>
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><strong style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">make it easy for other software vendors to provide add-in analytics on the DSR data/analytics platform.</strong></li>
</ul>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Think about that last point for a moment: no DSR vendor is big enough to provide state-of-the-art analytic applications in all areas, but a platform that is easy enough to integrate with could enable specialist analytics vendors to offer their tools as add-ins. (This could be good news for the analytics vendors too: it removes the need for them to install and maintain their own DSR just to enable the analytics.)</div>
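<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
To make the add-in idea a little more concrete, here is a purely hypothetical Python sketch. No DSR I know of exposes this interface; the <code>AddInRegistry</code> class, the <code>top_seller</code> add-in and the row layout are all invented. The point is only that if the platform agrees on how data is handed over, a vendor's analytics can be plugged in without building its own copy of the DSR.</div>

```python
from typing import Callable, Dict, List

# Hypothetical contract: an add-in is any function that takes rows of DSR data
# (as dictionaries) and returns a result.
AddIn = Callable[[List[dict]], object]


class AddInRegistry:
    """A toy 'analytic marketplace': add-ins are registered by name and run on demand."""

    def __init__(self) -> None:
        self._addins: Dict[str, AddIn] = {}

    def register(self, name: str, addin: AddIn) -> None:
        if name in self._addins:
            raise ValueError(f"add-in already registered: {name}")
        self._addins[name] = addin

    def available(self) -> List[str]:
        return sorted(self._addins)

    def run(self, name: str, rows: List[dict]) -> object:
        return self._addins[name](rows)


def top_seller(rows: List[dict]) -> str:
    """Example third-party add-in: the item with the most units sold."""
    return max(rows, key=lambda row: row["units"])["item"]


if __name__ == "__main__":
    registry = AddInRegistry()
    registry.register("top_seller", top_seller)
    data = [{"item": "A", "units": 10}, {"item": "B", "units": 25}]
    print(registry.available(), registry.run("top_seller", data))
```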
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Let's look at an example.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Today if you want assortment-optimization capability, you can</div>
<ul style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; list-style-image: initial; list-style-position: initial; margin: 1em 0px; outline: 0px; padding: 0px 0px 0px 3em; vertical-align: baseline;">
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">wait for your DSR vendor to develop it and hope they use real analytics; or</li>
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">search for another solution and work to interface the (very large) quantities of data you need between the applications; or</li>
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">write your own (always fun, but you had better know what you are doing), in which case you will still need to interface the data; or</li>
<li style="border: 0px; font-family: inherit; font-size: 15.555556297302246px; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">decide not to bother.</li>
</ul>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
All but the last one of these are slow - I'm guessing 12 months plus.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15.555556297302246px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
In the NextGen world, if you want new analytic capability, you could still write your own (it's easy to hook up the analytic engine) or just go to the DSR's analytic marketplace and shop for it.</div>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-58646763474359840482014-08-04T07:10:00.000-07:002014-08-04T07:10:00.399-07:00Next Generation DSRs - An Analytic name is not enough<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
You need not always build your analytic tools; sometimes you should buy them in. If the chosen application does what you need, that often makes good economic sense... as long as you know what you are buying.</div>
<h4 style="color: #333333; font-family: Arial, sans-serif; font-size: 1.17em; font-weight: normal; line-height: 21.059999465942383px; margin: 1.33em 0px;">
Let's be clear, an Analytic name does NOT mean there are any real Analytics under the hood.</h4>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
<img alt="" class="left" data-mce-src="http://m.c.lnkd.licdn.com/mpr/mpr/p/8/005/077/3c0/08ec692.jpg" height="218" src="http://m.c.lnkd.licdn.com/mpr/mpr/p/8/005/077/3c0/08ec692.jpg" style="float: left; margin-bottom: 1em; margin-right: 1em; max-width: 606px;" width="320" /></div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
For many managers, Analytics is akin to magic. They do not know how an analytics application works in a meaningful way and have no real interest in knowing. At the same time, there is no business standard for what makes up "forecasting", "inventory optimization", "cluster analysis", "pricing analysis", "shopper analytics", "like products" or even (my favorite) "optimization". <strong>Don't buy a lemon!</strong></div>
<a name='more'></a><strong>In the worst examples, there is nothing under the hood at all.</strong> One promotion-analytic tool I came across recently proudly proclaimed that you (the user) could calculate the baseline and lift for each promotion however you saw fit and then just enter the result into their system. They presented this as a positive feature, but calculating a meaningful baseline and lift is the difficult part!<br />
<br />
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
I've seen similar approaches for:</div>
<ul style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; list-style-image: initial; list-style-position: initial; margin: 1em 0px; padding-left: 3em;">
<li>off-shelf alerting tools that ask you how long a period of zero sales is abnormal (so they can report exceptions);</li>
<li>supply chain systems that need you to enter safety-stocks or re-order-points (so they can figure out when to order);</li>
<li>assortment optimization tools that want you to input product substitution rates.</li>
</ul>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
Hmmm, is a car without an engine still a car?</div>
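<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
As a rough illustration of what an off-shelf alerting tool could work out for itself instead of asking the user: if we assume daily unit sales of an item in a store follow a Poisson distribution with a constant rate λ (a simplifying assumption of mine for this sketch, not a description of any product), the chance of zero sales on k consecutive days is exp(-λk). The function name and the 1% threshold below are hypothetical.</div>

```python
import math


def abnormal_zero_run(daily_sales, alpha=0.01):
    """Return the shortest run of zero-sales days whose probability is below alpha.

    Assumption: daily sales are Poisson with a constant rate, estimated as the
    historical mean. Real history contains seasonality, promotions and past
    stock-outs, so treat this as a sketch only.
    """
    if not daily_sales:
        raise ValueError("need some sales history")
    rate = sum(daily_sales) / len(daily_sales)  # estimated units per day
    if rate <= 0:
        raise ValueError("item has never sold; no threshold can be estimated")
    # P(zero sales on k days in a row) = exp(-rate * k) < alpha
    # which rearranges to k > -ln(alpha) / rate
    return math.floor(-math.log(alpha) / rate) + 1


# Made-up histories: a fast seller and a slow seller.
fast_seller = [2, 3, 1, 2, 2, 2, 3, 1]          # mean 2.0 units per day
slow_seller = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]    # mean 0.1 units per day
print(abnormal_zero_run(fast_seller), abnormal_zero_run(slow_seller))
```

<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
The point is only that the threshold depends on each item's rate of sale and can be estimated from data, so a single user-entered number for every item and store cannot be right.</div>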
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
<strong>Many applications use pseudo-analytics.</strong> After all, how hard can it be? "Cluster analysis", that's finding groups of things, right? I reckon I can figure that out, no stats required. Yeah, right, of course you can... FYI: meaningful, useful clusters may be a little more difficult. It's not that cluster analysis is particularly hard, but neither is it something you can knock together without the right tools or any statistical understanding.</div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
Sadly, I have also seen real-world examples of pseudo-analytics in pricing analytics, off-shelf alerting, demographic analyses, inventory optimization and forecasting.</div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
<strong>The right tool for the right job. </strong>There are many good analytic applications available, but even a good one is useless if it does not suit the task you have in mind. Using a time-bucket oriented optimization program to schedule production runs with sequencing comes to mind. OK, relatively few people are going to understand that one and it's not a DSR application, but it is real: the software vendor did not come out shouting that there would be a problem, and two years down the line that project was abandoned.</div>
<h4 style="color: #333333; font-family: Arial, sans-serif; font-size: 1.17em; font-weight: normal; line-height: 21.059999465942383px; margin: 1.33em 0px;">
Are DSRs worse than other applications?</h4>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
I think this kind of feature-optimism is a general issue in buying any analytic app, but my perception is that it is a bigger problem in the DSR space. Perhaps because the DSR is trying to offer so much analytic functionality to so many functional areas? Is a DSR really going to handle forecasting, pricing-analytics, cluster-analysis, weather-sensitivity-modeling, promotional analytics, inventory optimization, assortment selection and demographic analysis (note - not a complete list), all as packaged software, for $50K a year? Not unless they can scale that investment across a huge user-base. Some will be good, others not so much - be warned. </div>
<h4 style="color: #333333; font-family: Arial, sans-serif; font-size: 1.17em; font-weight: normal; line-height: 21.059999465942383px; margin: 1.33em 0px;">
Spotting a lemon</h4>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
An expert in the field (with analytic and domain knowledge) can spot a lemon from quite a distance. If you do not have one on staff, you would be wise to invest in some consulting to bolster your purchasing team. For those applications that pass the sniff-test, the proof of any analytic system is in its performance. <strong>Define rational performance criteria, test, validate, pilot and never, ever, ever rely on a software vendor ticking the box in your RFP.</strong></div>
<div>
<strong><br /></strong></div>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-33626480083396413762014-07-30T06:10:00.000-07:002014-07-30T09:12:59.004-07:00Next Generation DSRs - Analytic power !<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
<img alt="" class="left" data-mce-src="http://m.c.lnkd.licdn.com/mpr/mpr/p/8/005/077/39e/1f048dc.jpg" height="189" src="http://m.c.lnkd.licdn.com/mpr/mpr/p/8/005/077/39e/1f048dc.jpg" style="float: left; margin-bottom: 1em; margin-right: 1em; max-width: 606px;" width="284" />To handle real Analytics (see my recent post <a data-mce-href="https://www.linkedin.com/today/post/article/20140728184532-11894214-next-generation-dsr-reporting-is-not-analytics" href="https://www.linkedin.com/today/post/article/20140728184532-11894214-next-generation-dsr-reporting-is-not-analytics" style="color: #006699; outline: medium;" target="_blank">Reporting is NOT Analytics</a>) you need real Analytic power. BI tools are only as capable as the language they use to interrogate the database (typically SQL), and with no library of analytic tools that is not nearly enough.</div>
<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
<br /></div>
<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
<br /></div>
<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
</div>
<a name='more'></a><br />
<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
We use <a data-mce-href="http://en.wikipedia.org/wiki/SQL" href="http://en.wikipedia.org/wiki/SQL" rel="nofollow" style="color: #006699; outline: medium;" target="_blank">SQL</a> (Structured Query Language) to query relational databases like SQL Server, Oracle, MySQL and Access. SQL is a great tool for handling large quantities of data, joining tables, filtering results and aggregating data. However, SQL's math library is only sufficient for accounting (sum, product, division, count) and, while I do know it can do a few more things, it's not enough to be useful for Analytics. Even getting it to <a data-mce-href="http://andrewg-crabtreeanalytics.blogspot.com/2012/04/bringing-your-analytical-guns-to-bear.html" href="http://andrewg-crabtreeanalytics.blogspot.com/2012/04/bringing-your-analytical-guns-to-bear.html" rel="nofollow" style="color: #006699; outline: medium;" target="_blank">calculate a simple correlation-coefficient</a> is a big challenge. Want to build a simple regression model? That's just not going to happen in base SQL; we need something designed for the task.</div>
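<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
To show why even a correlation is awkward with accounting-style math, here is a minimal sketch of Pearson's r built only from counts and sums, which is roughly what you are left assembling by hand in base SQL. It is written in Python purely for illustration (it is not the approach from the linked post), and the temperature and sales figures are made up.</div>

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation using only a count and five running sums.

    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))

    Note: this "sums" form is easy to express with aggregates but can lose
    precision on large values; statistical packages use more stable methods.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denominator = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    if denominator == 0:
        raise ValueError("one of the series is constant; r is undefined")
    return (n * sxy - sx * sy) / denominator


# Hypothetical weekly temperature and ice-cream unit sales.
temperature = [18, 21, 25, 28, 31]
ice_cream_units = [110, 135, 160, 200, 240]
print(round(pearson_r(temperature, ice_cream_units), 3))
```

<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
In a dedicated statistical tool this whole function is a single built-in call, and it comes with the significance tests and diagnostics that the hand-rolled version lacks.</div>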
<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
R, SAS, SPSS, Statistica, and a good number of others, are the real deal and the difference between any of them and what you can do in SQL (or Excel) is vast! With these tools it's no longer a question of "can you build a regression model?" now it's "which particular flavor of regression do you need?". What! There's more than one? Oh yeah!</div>
<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
I'm not getting into which analytic tool is the best. I use R, and that's what I'll speak to, but I have good friends, analytic powerhouses, who insist on using SAS or SPSS. These tools have different strengths and weaknesses and within the analytic community a lot of time, blog posts and misinformation go into arguing the relative merits of one vs. another. My take is that for most business-analytic purposes any of them will get the job done. The one you choose should be driven most heavily by <strong>your ability to get the analytic tool working against your data</strong>.</div>
<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
The problem is that these analytic tools do not generally reside in the same space as your database or BI tool, so you spend a lot of time interfacing data between systems. It's slow, sometimes very slow, and it means duplicating data and resources across systems.</div>
<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
In recent years many database and BI tools have started offering integration with statistical tools (Oracle, SAP HANA, Tableau, Spotfire, MicroStrategy). The ideal here is in-database analytics, where we run the complex stats in tandem with the database, indeed in the same memory space. That is very attractive but I would look very carefully at the depth of integration offered before getting too excited. In some cases, I think, vendors have done just enough to tick the box without making it truly useful. As examples:</div>
<ul style="font-size: 15.555556297302246px; list-style-image: initial; list-style-position: initial; margin: 1em 0px; padding-left: 3em;">
<li>One vendor limits the transfer of data between database and R to simple table structures. Now, imagine running a regression model. What goes into the regression is very likely a simple table - check! What comes out is anything but: it's a complex object combining multiple tables of different dimensionality and named values (like r-sq). We need this data to determine the validity of the model and make future predictions. Force me to return just one table structure and I must throw most of the information and capability away. Before anyone asks, no, this is not unique to regression models.</li>
<li>Another vendor has integrated R into the reporting layer. This is relatively functional as long as the data you want to work with can be generated in a report. If you need very large amounts of input data you may well exceed reporting limits. If you want to build a separate model for each product in your database, you may have to run the report separately for each one.</li>
<li>Standard R was not originally designed for parallel execution (though you can get around this with a little coding help). Current processors (CPUs), even on low-end laptops, are multi-core. Servers routinely run more cores per CPU and more CPUs per server, and we want to scale out across multiple servers. A BI offering that provides only single-core R execution is wasting your resources and time.</li>
</ul>
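<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
To illustrate the first example, here is a sketch of what even a one-variable least-squares fit hands back. It is plain Python rather than R, and the class and function names are hypothetical, but the shape of the problem is the same: the input is one simple table, while the output mixes a per-term table, a per-row vector and a single named value.</div>

```python
from dataclasses import dataclass


@dataclass
class RegressionResult:
    coefficients: dict   # one value per model term
    residuals: list      # one value per input row
    r_squared: float     # a single named value

    def predict(self, x):
        """Use the fitted terms to score a new observation."""
        return self.coefficients["intercept"] + self.coefficients["slope"] * x


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x is constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return RegressionResult(
        coefficients={"intercept": intercept, "slope": slope},
        residuals=residuals,
        r_squared=r_squared,
    )


# Made-up data that happens to lie exactly on y = 1 + 2x.
model = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(model.coefficients, model.r_squared, model.predict(6))
```

<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
Flatten this to a single table and you must choose between the coefficients (needed for prediction), the residuals (needed for validation) and the fit statistics. A real regression object in a statistical package carries considerably more than these three pieces.</div>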
<div style="font-size: 15.555556297302246px; margin-bottom: 1em; margin-top: 1em;">
<strong>Bottom line, to do real Analytics, you need real Analytic tools. But even the best tools must be able to get at the data to be useful. Choose carefully.</strong></div>
</div>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-57945648838898669862014-07-28T11:52:00.000-07:002014-07-28T11:52:26.678-07:00Next Generation DSRs - Reporting is NOT analytics<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6iX397xuI91zYA04h2bmKnoNbJfaZTfcjVYx-HVWEdJ_Vmpvx00mzJZc8-FgSmMwF9kyIlE9HG1DRfpFZAjZgyMgq-dsjEe8Y3P7NGqp9Gdgay9CdlgwJr-BnjRe0pkJh987dVa-jZGI/s1600/problem-analysis-solution.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6iX397xuI91zYA04h2bmKnoNbJfaZTfcjVYx-HVWEdJ_Vmpvx00mzJZc8-FgSmMwF9kyIlE9HG1DRfpFZAjZgyMgq-dsjEe8Y3P7NGqp9Gdgay9CdlgwJr-BnjRe0pkJh987dVa-jZGI/s1600/problem-analysis-solution.jpg" height="197" width="320" /></a></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
I've written a number of posts now on the next generation of Demand Signal Repositories. DSRs are the specialized database and reporting tools primarily used by CPGs for retail Point of Sale data.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
So far, I've looked at the challenges (and big opportunities) around handling the large quantities of data involved: better database technologies, scale-out platforms, true multi-retailer environments, effective data blending and dramatic simplification of data structures.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Taken as a whole this gets the necessary data into one place where it is relatively simple to overlay it with the BI or analytic tools of your choice and still get good performance. This is the starting point.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Now, we can get to the fun stuff, Analytics. Let's start by addressing a widespread misunderstanding.</div>
<h3 style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 1.5em; font-weight: normal; line-height: 27px; margin: 1em 0px; outline: 0px; padding: 0px; vertical-align: baseline;">
Reporting is NOT analytics<a name='more'></a></h3>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
I've blogged on this <a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/04/reporting-is-not-analysis.html" rel="nofollow" style="border: 0px; color: #7b539d; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">before</a> (in fact, in one of my very first blog posts), but it bears repeating and extending.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<strong style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Reporting is about <em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">"what happened"</em>; Analytics is concerned with <em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">"why?"</em>, <em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">"what if?"</em> and <em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">"what's best?"</em></strong><span style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><strong style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">.</strong></span></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<span style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><strong style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">You need reports.</strong> Hopefully they are well constructed, with appropriate metrics, good visualization and exception highlighting. Perhaps they are also interactive so you can drill-down, pivot and filter. These are useful tools for exploratory "what happened" work, but, almost exclusively, reports leave it up to the reader to construct the "why".</span></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<span style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Great reporting can pull together facts that you think are related for visual inspection (e.g. weekly temperature and ice-cream sales by region). Perhaps you can see a pattern, sort of, but reports will not quantify or test the validity of the pattern; that's up to you, the reader, to guess at.</span></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<span style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Even great reports can't help you much with more complex relationships. In reality, ice-cream sales are also dependent on rainfall, pricing, promotions, competitor activity etc. Who knew? Well we all did of course, but there is no reasonable way to visualize this in a standard report. Want to predict sales next week given weather, price and promo data for all products in all regions? You're going to need some good analytics.</span></div>
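<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
As a toy version of that prediction problem, the sketch below fits one product's sales against temperature, price and a promotion flag using ordinary least squares (solving the normal equations by Gaussian elimination). Everything here is an assumption for illustration: the data is invented and noiseless, the function names are hypothetical, and a real model would need far more history, validation and a proper statistical package.</div>

```python
def solve(a, b):
    """Solve the linear system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] minimising squared error."""
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Invented weekly history: (temperature, price, on_promotion).
history = [(20, 3.0, 0), (25, 3.0, 1), (30, 2.5, 0),
           (22, 2.5, 1), (28, 3.5, 0), (18, 3.5, 1)]
# Invented sales generated from a known rule so the fit can be checked.
sales = [50 + 4 * t - 20 * p + 30 * promo for t, p, promo in history]

coefs = fit_ols(history, sales)
print([round(c, 3) for c in coefs])
print(round(predict(coefs, (26, 3.0, 1)), 1))  # next week's scenario
```

<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Each coefficient quantifies one driver while holding the others fixed, which is exactly what a report showing the same columns side by side cannot do.</div>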
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<span style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><strong style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">You need Analytics too</strong>. In some cases, basic high-school math is all you need. In most, it doesn't even get you close to the 80% solution beloved of business managers. </span>"Winging it" in Excel, Access, PowerPivot etc. can give you very bad answers that are seriously dangerous to your success and/or employment.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Want to understand and predict the impact to sales of promotions, pricing or weather events? You need Analytics for that.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Want to know where you can safely reduce inventory in your supply chain while increasing service level? You need Analytics.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Want to alert when sales of your product are abnormally low? Analytics!</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<span style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Want to know how rationalizing products across retailers would impact your supply chain? Yep, Analytics.</span></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<span style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Want to know which shopper demographics are most predictive of sales velocity? I think you get it...</span></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<strong style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">If your business question is something other than "what happened" you need Analytics.</strong></div>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-82195785027424893882014-07-07T08:06:00.000-07:002014-07-07T08:06:45.865-07:00Next Generation DSRs - data blending (part 2)<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<strong style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">My most recent post on <a href="http://andrewg-crabtreeanalytics.blogspot.com/search/label/Next%20Generation%20DSR" target="_blank">Demand Signal Repositories</a> bemoaned their general lack of ability to rapidly ingest new and interesting data sources</strong> (e.g. promotions, Twitter feeds, sentiment analysis, Google Trends, shipment history, master data, geographic features, proximity to competitor stores, demographic profiles, economic time series, exchange rates etc.).</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<img alt="" class="left" height="209" src="https://media.licdn.com/mpr/mpr/p/7/005/071/052/23b4256.jpg" style="border: 0px; float: left; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 1em 1em 0px; max-width: 606px; outline: 0px; padding: 0px; vertical-align: baseline;" width="167" /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
As a result, analysts spend far too much time collecting/copying data into ad-hoc data marts to enable useful modeling work. At the extreme, you can move terabytes of low-level data from a perfectly good database into another one (probably on a lower-powered machine) so as to manually merge it with a few hundred records of new data you need for analysis. This is slow (actually very slow), error-prone and leaves very little time to do value-added work.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Based on questions from blog readers via email, I think that I failed to call out how big the gap is between where we are now and where we should be. Let me spell it out:</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
If I go to my (or your) IS department now and ask <em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><strong style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">"how long would it take to integrate XXX data into the DSR so it is loaded, cleaned, gap-filled, matched to appropriate dimensions and ready for some interesting analytic work?"</strong></em> I would expect to hear back <em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">"<strong style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">between 6 and 12 months"</strong></em>, and that's assuming they have both some developer availability and the necessary access to add/modify data structures - some DSRs are locked down tight. If I went to the DSR vendor, it might be a little faster, depending on just how tightly the data structure is tied into their product release schedule. But here's the thing - <strong style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">I want to do this, myself, in real-time and certainly in less than a day.</strong></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Tools such as <a href="http://www.alteryx.com/" rel="nofollow" style="border: 0px; color: #006699; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Alteryx</a> are designed to do data blending. Alteryx in particular seems to do especially well handling geo/demographic data, some of which comes with it as standard. They also have a number of pre-defined macros to help you get at standard data sources like Google Trends and Twitter. If I understand it correctly, it does this by loading all data sources into memory. Perhaps it constructs its own data repository on the fly, but, certainly, it does not touch the source database's data structure at all.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
This would work well for relatively small quantities of data. Let's say you are examining annual sales for a product group by store - you aggregate that to a few thousand records of POS data in the DSR, load it into Alteryx, geocode the locations, match up the geo/demographic data you are interested in and you are ready to run some math. I doubt that would take more than a couple of hours. There is also some analytic power in the platform and at least some level of R integration if you wish to extend it further. For ad-hoc small (sub 10 million record?) data analytics this looks really good.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
What if you want to do your modeling at a much lower level of detail though? Do you have the capacity to match across billions of records outside the DSR? Perhaps, but it's going to cost you, and why move it all into another database on another expensive server when you've already paid for those in your DSR? What if you want to run analytics repeatedly? Do you really want to do geocoding and ad-hoc matching every time you want to use census data in an analysis? Chances are the stores haven't moved :-) and the most frequently updated census data, I think, isn't updated any more often than annually.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Better to do it once, load it into new data structures in the DSR and enable it for ongoing reporting/analytics. Or did you want to force even basic reporting through the data blending platform because that's the only place you can match up multiple data sources? I didn't think so.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
I would definitely look at something like Alteryx for ad-hoc work. If you can also use it to source, transform, handle dimensional matching, deal with missing data etc. and load the results back into your DSR (where you just defined new data structures to receive it), I think you might have something.</div>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-46057216110918990102014-06-30T06:33:00.000-07:002014-06-30T06:33:04.604-07:00Next Generation DSRs - data blending<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
Over the last few months I've written a series of posts on Demand Signal Repositories. These are the specialized database and reporting tools primarily used by CPGs for reporting against retail Point of Sale data. </div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
There are a number of good tools in the marketplace and you can derive substantial value from them today, but the competitive landscape is changing... fast. Existing tools found a market because they are capable of sourcing, loading and reporting against vast amounts of data quickly. To do so they have employed a variety of complicated architectures that are now largely obsolete, given recent advances in technology that make solutions faster, cheaper and more flexible.</div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
Cheaper alone may be a win in the market today, but if all we do with this new power is report on "what I sold last week" more quickly and at a lower price-point I think we are missing the point. </div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
The promise of a DSR has always been to explain what happened but much more importantly <em>why</em> and existing tools struggle with this:</div>
<ul style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; list-style-image: initial; list-style-position: initial; margin: 1em 0px; padding-left: 3em;">
<li>they do not hold a rich enough repository of data to test out hypotheses.</li>
<li>their primary analytic tools are report-writers and pivot-tables (by which I mean that they really don't have any)</li>
</ul>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
<b>We'll come to analytics in a later post, but for now let's think data because without that there isn't very much to analyze.</b></div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
Imagine that I've spent a few hundred thousand acquiring point of sale data into my own DSR and now I want to really figure out what it is that drives my sales. </div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
How about weather? Ignore for the moment whether or not a future forecast is useful, but how about using weather data to explain some of the strange sales in history so that I don't trend them forwards into the coming year? I can get very detailed weather data from a number of sources, but can I, a system user, get that data into my DSR to start reporting against it and, better yet, modeling? Probably not.</div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
How about SNAP, the US government's benefit program that funds grocery purchases for roughly 1 in 6 US households? SNAP can drive huge spikes in demand for key products and I can easily go to <a data-mce-href="http://www.fns.usda.gov/snap/snap-monthly-benefit-issuance-schedule" href="http://www.fns.usda.gov/snap/snap-monthly-benefit-issuance-schedule" rel="nofollow" style="color: #006699; outline: medium;">usda.gov</a> and find out exactly when SNAP dollars are dropped into the marketplace by day of the month and by state. With a little time on Google I can even see when this schedule has changed in the past few years. Can I, a system user, get this data into the DSR for reporting/modeling? Nope.</div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
The same is true for many additional data sources you wish to work with (Promotional records, Twitter feeds, Sentiment analysis, Google trends, Shipment history, master data, geographic features, proximity to competitor stores, demographic profiles, economic time series, exchange rates etc.). </div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
These are all relatively easy-to-source datasets, but if the DSR vendor has not set them up as part of the standard product, you are out of luck: the technical sophistication necessary to source, load and, especially, match key fields in the data is beyond what a super-user and, in many cases, a system administrator can handle. Can it be done? Maybe, depending on your system, skill-level and security-access, but it's going to cost you in time and money. <br />
<br />
Matching data in particular can be a real bear: it will be rare that you are matching data at the same level of granularity (item, location, date) and with the exact same key fields. It is far more common to be matching weekly or monthly data to daily, state or county data to zip-codes and product groups to shoppable items. And you need to do it without losing any data, while sensibly handling missing data and flagging suspect data for manual follow-up.</div>
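<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
To make the matching problem concrete, here is a minimal sketch in plain Python of joining daily, store-level POS rows to a state-level schedule that is keyed by day of the month. Every name and value in it (<i>snap_days</i>, <i>store_state</i>, the store ids, the states and the issuance days) is a made-up placeholder, not real SNAP data and not any DSR's API. The point is only the shape of the match: every POS row is kept, and rows that cannot be matched are flagged rather than dropped.</div>

```python
from datetime import date

# Hypothetical SNAP schedule: state -> days of the month when benefits are issued.
# Placeholder values for illustration only, not the real usda.gov schedule.
snap_days = {"State A": {1, 2, 3}, "State B": {5, 10, 15}}

# Hypothetical store master: storeid -> state.
store_state = {101: "State A", 102: "State B", 103: "State C"}

# Hypothetical daily POS facts: (storeid, sale_date, units).
pos_rows = [
    (101, date(2014, 6, 1), 40),
    (101, date(2014, 6, 4), 25),
    (102, date(2014, 6, 5), 30),
    (103, date(2014, 6, 5), 12),  # State C has no schedule loaded
    (104, date(2014, 6, 5), 8),   # store is missing from the store master
]


def blend(pos_rows, store_state, snap_days):
    """Attach a SNAP-day flag to every POS row without dropping any rows."""
    blended = []
    for storeid, sale_date, units in pos_rows:
        state = store_state.get(storeid)
        schedule = snap_days.get(state)
        if schedule is None:
            # No store master record or no schedule: keep the row, flag it.
            snap_day, status = None, "needs follow-up"
        else:
            snap_day, status = sale_date.day in schedule, "matched"
        blended.append({"storeid": storeid, "date": sale_date, "units": units,
                        "state": state, "snap_day": snap_day, "status": status})
    return blended


blended = blend(pos_rows, store_state, snap_days)
for row in blended:
    print(row["storeid"], row["date"], row["snap_day"], row["status"])
```

<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
This is trivial at five rows. The difficulty described above comes from doing the same thing across billions of fact rows, with keys that do not line up as neatly as they do here.</div>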
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
<b>So if you really want to do some analysis against e.g. SNAP what must you do? Download a small ocean of detailed POS data so you can (carefully) join it to your few hundred records of SNAP release data in a custom database or analytic app, build the models and then (because you can't write the results back out to the DSR) build a custom reporting engine against these results. This makes no sense to me</b>. </div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
<img alt="" class="left" data-mce-src="https://media.licdn.com/mpr/mpr/p/8/005/06d/3a2/32f51dd.jpg" src="https://media.licdn.com/mpr/mpr/p/8/005/06d/3a2/32f51dd.jpg" style="float: left; margin-bottom: 1em; margin-right: 1em; max-width: 606px;" /></div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
The solution is something called <span data-mce-style="overflow:hidden;line-height:0px" data-mce-type="bookmark" id="mce_7_start" style="line-height: 0px; overflow: hidden;"><a data-mce-href="https://www.google.com/search?q=data+blending&rlz=1C1CHMO_enUS468US469&oq=data+blending&aqs=chrome..69i57j69i65l2j0l3.3726j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=%22data+blending%22" href="https://www.google.com/search?q=data+blending&rlz=1C1CHMO_enUS468US469&oq=data+blending&aqs=chrome..69i57j69i65l2j0l3.3726j0j7&sourceid=chrome&es_sm=93&ie=UTF-8#q=%22data+blending%22" rel="nofollow" style="color: #006699; outline: medium;">data-blending</a></span> which tries to reduce the pain of integrating multiple data sources to a level that you could contemplate it in near real-time. While I have not yet seen a solution I would call perfect the contrast with the standard, locked-down, DSR scenario is impressive. </div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
Much of what I have seen so far happens at the individual's level: where you are doing the match in-memory and without impacting the underlying database or fellow users in any way. In many cases, particularly for exploratory work, this is preferable, but it's far from an ideal solution if you need to process against the detail of the entire database or have multiple needs for the same data.</div>
<div style="color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em;">
The future, I think, will include such ad-hoc capability, but I suspect it also includes a more flexible data model that lets an administrator rapidly integrate new data sources into the standard offering.</div>
<div>
<br /></div>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-63058515338849213502014-06-02T07:05:00.000-07:002014-06-02T07:05:00.040-07:00Averages work ! (At least for ensemble methods)After an early start, I was sitting at breakfast downtown enjoying a burrito and an excellent book on "ensemble methods". (Yes, I do that sometimes... don't judge)<br />
<br />
<div class="image imageContainer" style="background-color: white; float: left; font-family: arial, helvetica, sans-serif; font-size: 11.111111640930176px; margin-bottom: 0px; margin-right: 10px; position: relative; text-align: center; width: 160px;">
<a href="http://www.amazon.com/Ensemble-Methods-Data-Mining-Predictions/dp/1608452840/ref=sr_1_1?s=books&ie=UTF8&qid=1396540700&sr=1-1&keywords=ensemble+methods+elder" style="color: #996633; text-decoration: none;"></a><br />
<div class="imageBox" style="display: inline-block; margin-bottom: 0px; position: relative;">
<a href="http://www.amazon.com/Ensemble-Methods-Data-Mining-Predictions/dp/1608452840/ref=sr_1_1?s=books&ie=UTF8&qid=1396540700&sr=1-1&keywords=ensemble+methods+elder" style="color: #996633; text-decoration: none;"><img alt="Product Details" class="productImage cfMarker" src="http://ecx.images-amazon.com/images/I/41IN45E8NbL._SL160_PIsitb-sticker-arrow-dp,TopRight,12,-18_SH30_OU01_AA160_.jpg" style="border: 0px; height: 160px; width: 160px;" /></a></div>
<a href="http://www.amazon.com/Ensemble-Methods-Data-Mining-Predictions/dp/1608452840/ref=sr_1_1?s=books&ie=UTF8&qid=1396540700&sr=1-1&keywords=ensemble+methods+elder" style="color: #996633; text-decoration: none;">
</a></div>
<div class="data" style="background-color: white; display: table; font-family: arial, helvetica, sans-serif; font-size: 13.333333969116211px; margin-bottom: 0px; padding-bottom: 6px;">
<h3 class="title" style="font-size: 14.44444465637207px; font-weight: normal; margin: 0px; padding: 0px 0px 6px;">
<a class="title" href="http://www.amazon.com/Ensemble-Methods-Data-Mining-Predictions/dp/1608452840/ref=sr_1_1?s=books&ie=UTF8&qid=1396540702&sr=1-1&keywords=ensemble+methods+elder" style="color: #996633; font-size: 14.44444465637207px; font-weight: bold; text-decoration: none;">Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions (Synthesis Lectures on Data...</a> <span class="ptBrand">by <a href="http://www.amazon.com/Giovanni-Seni/e/B003BQUROY/ref=sr_ntt_srch_lnk_1?qid=1396540700&sr=1-1" style="color: #996633; text-decoration: none;">Giovanni Seni</a>, John Elder and Robert Grossman</span> <span class="bindingAndRelease">(Feb 24, 2010)</span></h3>
</div>
<br />
<br />
<br />
<br />
<br />
<br />
<a name='more'></a><br />
<br />
For those who have built a few predictive models (regression, neural-nets, decision trees, ...) I think this is an excellent read, outlining an approach that can deliver big improvements on hard-to-predict problems. The introduction provides a very good overview:<br />
<blockquote class="tr_bq">
Ensemble methods have been called the most influential development in Data Mining and Machine Learning in the past decade. <span style="background-color: white;">They combine multiple models into one usually more accurate than the best of its components.</span> Ensembles can provide a critical boost to industrial challenges...</blockquote>
Ensemble models use teams of models. Each model uses a different modeling approach or different samples of the available data or emphasizes different features of your data-set and each is built to be as good as it can be. Then we combine ("average") the prediction results and, typically, get a better prediction than any of the component team members.<br />
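<br />
As a toy illustration of the averaging step only (not an example from the book), the sketch below combines three hypothetical sets of predictions by simple averaging and compares mean absolute error. The numbers are constructed so that the component errors partly cancel; with real models the gain is usual, not guaranteed.<br />
<br />

```python
def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, ignoring its sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actuals and predictions from three different models.
actual = [10, 12, 14, 16]
predictions = {
    "model_a": [12, 13, 13, 18],
    "model_b": [9, 10, 16, 15],
    "model_c": [10, 14, 13, 14],
}

# Combine by simple averaging across models, one data point at a time.
ensemble = [sum(values) / len(values) for values in zip(*predictions.values())]

component_mae = {name: mean_absolute_error(actual, p)
                 for name, p in predictions.items()}
ensemble_mae = mean_absolute_error(actual, ensemble)

for name, mae in component_mae.items():
    print(name, round(mae, 2))
print("ensemble", round(ensemble_mae, 2))
```

The averaged prediction has a lower error here because the three models miss in different directions. If they all made the same mistakes, averaging would not help.<br />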
<br />
When I was first learning predictive modeling as an under-graduate the emphasis was on finding the <i>best</i> model from a group of potential candidates. Embracing ensemble methods, initially, just felt wrong, but the proof is in the performance.<br />
<br />
It sounds easy, but, clearly, this is more complex than building a single model and if you can get a good-enough result using simple approaches you should. You'll know when it's worth trying something more high powered.<br />
<br />
With thanks to my friend Matt for this simplification, this may be one of the few contexts where we can say<b> "Averages work!!" </b> <br />
<br />
<i>As a reminder that working with averages (or aggregations of any kind) is generally dangerous to your insight, take another look at this post on why you should be using <a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/10/do-you-really-need-daily-point-of-sale.html" target="_blank">daily point-of-sale data</a>.</i><br />
<i><br /></i>
<i>Or, consider this...</i><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5eziRnass4OnWTwZy7ixWYY8cekyihz_t_sW0ZS6LQaMpU3tmUNXUwL5jcxyoqlkGLjWrQuieqpjF58704qVF0XeLf_elUYKw9IHpxkpmHuFIgIFH-yHVMem9ZcmE4FFILbn-Zjp4m2c/s320/AveragesCanHurt.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5eziRnass4OnWTwZy7ixWYY8cekyihz_t_sW0ZS6LQaMpU3tmUNXUwL5jcxyoqlkGLjWrQuieqpjF58704qVF0XeLf_elUYKw9IHpxkpmHuFIgIFH-yHVMem9ZcmE4FFILbn-Zjp4m2c/s320/AveragesCanHurt.png" /></a></div>
<br />Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-63872467811823226092014-05-19T07:10:00.000-07:002014-05-19T07:10:00.068-07:00The right tools for (structured) BIG DATA handling - more Redshift<div class="tr_bq">
In my recent post on <a href="http://andrewg-crabtreeanalytics.blogspot.com/2014/04/the-right-tools-for-structured-big-data.html" target="_blank">The right tools for (structured) BIG DATA handling</a>, I looked at using <a href="https://aws.amazon.com/redshift/" target="_blank">AWS Redshift</a> to generate summaries from a large fact table and compared it to previous benchmark <a href="http://andrewg-crabtreeanalytics.blogspot.com/2013/02/the-right-tools-for-big-structured-data.html" target="_blank">results</a> using a columnar database on a fast SSD drive.</div>
<br />
Redshift performed very well indeed, especially as the number of facts returned by the queries increased. In that initial testing I was aggregating the entire fact table to get tests comparable to the previous benchmark, but that's typically not how a reporting (or analytic) system would access the data. In this follow-up post, then, let's look at how Redshift performs when we want to aggregate across particular records.<br />
<br />
<a name='more'></a><br />
<br />
<h3>
Test setup</h3>
For this test, I am using the same database as before (simulated Point of Sale data at item-store-week level with item, store and calendar master tables) on 4 'dw1.xlarge' AWS nodes. For each query I am summarizing 5 facts from the main fact table, joining to each of the master tables and using a variety of filters to restrict the records I want to aggregate over.<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrIaCype-0GFKw0SgHan8FjirzePHx5B_I8k7Q2essRu96m2-JAs9gcN1XdkQdqIu2WRfe89CBwuptcoN97k_zkAmjrkYjvB2_CSrYdsfGhjZ_vJ2R-im0MzkEDi33mCxVShriRSTqc1A/s1600/AWS+Redshift+WHERE+Clauses.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrIaCype-0GFKw0SgHan8FjirzePHx5B_I8k7Q2essRu96m2-JAs9gcN1XdkQdqIu2WRfe89CBwuptcoN97k_zkAmjrkYjvB2_CSrYdsfGhjZ_vJ2R-im0MzkEDi33mCxVShriRSTqc1A/s1600/AWS+Redshift+WHERE+Clauses.png" height="130" width="640" /></a></div>
<div>
<br /></div>
<div>
The first record shows performance when we have no filters at all, summarizing all data in the fact table. That's 416 million records in just over 30 seconds with an average speed of 13.4 million records per second. Very respectable !<br />
<br />
The second row uses a filter - WHERE category = 'Type 2' - based on a field in the item master table, which is associated with roughly 20% of the fact table records. Aggregating 83 million records in 26 seconds is almost as slow as aggregating across all records. Not good.<br />
<br />
The third row filters on a field from the Calendar master table to return only those weeks in the year 2011: 50 million records in 2.9 seconds. This is quick and in speed terms, faster at 18.4 million records/second than the original query.<br />
<br />
<h3>
What's going on ? </h3>
This apparently odd behavior is driven by my choices of <i>distkey</i> and <i>sortkey</i> when defining the table (see the SQL below). <br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbOEWlak5KFDzS06rOgJ8aB_cIF2wYDw7I_Y9Yf5vzuylpEAMKr8zwviG396gD3i7_m3JyYBmGvd-7D322mWnvJx6SK6bSr4mfUcPle4tDxBD68ryzAgfh6LGrkPr_0IE_XVkTqYV3Fpg/s1600/sort+and+dist+key.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbOEWlak5KFDzS06rOgJ8aB_cIF2wYDw7I_Y9Yf5vzuylpEAMKr8zwviG396gD3i7_m3JyYBmGvd-7D322mWnvJx6SK6bSr4mfUcPle4tDxBD68ryzAgfh6LGrkPr_0IE_XVkTqYV3Fpg/s1600/sort+and+dist+key.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
Note that Redshift doesn't use indexes or partitions as I am used to seeing them in relational databases so, in many ways, table definition is a lot simpler. Remember that Redshift is running on a cluster of processing nodes, not just one machine.<br />
<br />
<i><b>distkey</b></i> defines how the data in this fact table should be spread across the multiple nodes in the cluster. In this instance I chose to spread it out based on the store identifier (storeid). Redshift will try to put records with the same storeid on the same node. (More details on selecting a distkey <a href="http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-best-dist-key.html" target="_blank">here</a>.) Note that this would primarily help with faster joins. I did not add the same distkey to the store master table, but as that is small, just a few hundred records, copying it between nodes to make a join should not have much impact.<br />
<br />
<i><b>sortkey</b></i> defines how records will be sorted on each node. Redshift uses this information to optimize query plans and will (hopefully) skip past entire sections of data that are not within the filter. I could have used multiple fields in the sortkey but chose to get started with just one: periodid, the week identifier in the fact table and associated calendar master table. (More details on selecting a sortkey <a href="http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-sort-key.html" target="_blank">here</a>.)<br />
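<br />
The two ideas can be mimicked in a few lines of plain Python. This is a toy simulation under stated assumptions, not how Redshift is implemented: a simple modulo on storeid stands in for the real distribution hash, the block size is tiny, and the fact rows are invented. It shows rows with the same storeid landing on one node, and a filter on the sorted column reading only the blocks whose min/max range overlaps the filter.<br />
<br />

```python
from collections import defaultdict

NODES = 4        # same node count as the test cluster
BLOCK_ROWS = 4   # rows per storage block; tiny, for illustration only

# Hypothetical fact rows: (storeid, periodid, units) for 8 stores and 12 weeks.
facts = [(store, period, 1) for store in range(8) for period in range(1, 13)]


def distribute(rows, nodes):
    """Distkey idea: rows with the same storeid always land on the same node."""
    by_node = defaultdict(list)
    for row in rows:
        by_node[row[0] % nodes].append(row)  # modulo stands in for a real hash
    return by_node


def build_blocks(node_rows, block_rows):
    """Sortkey idea: sort by periodid, then record min/max periodid per block."""
    ordered = sorted(node_rows, key=lambda r: r[1])
    blocks = []
    for start in range(0, len(ordered), block_rows):
        chunk = ordered[start:start + block_rows]
        blocks.append({"min": chunk[0][1], "max": chunk[-1][1], "rows": chunk})
    return blocks


def scan(blocks, first_period, last_period):
    """Read only blocks whose min/max range overlaps the filter."""
    blocks_read, units = 0, 0
    for block in blocks:
        if block["max"] < first_period or block["min"] > last_period:
            continue  # whole block is outside the filter, skip it
        blocks_read += 1
        units += sum(r[2] for r in block["rows"]
                     if first_period <= r[1] <= last_period)
    return blocks_read, units


total_blocks = blocks_read = total_units = 0
for node_rows in distribute(facts, NODES).values():
    blocks = build_blocks(node_rows, BLOCK_ROWS)
    read, units = scan(blocks, 5, 6)   # filter: periodid between 5 and 6
    total_blocks += len(blocks)
    blocks_read += read
    total_units += units

print(f"read {blocks_read} of {total_blocks} blocks, summed {total_units} units")
```

A filter on a column that is not the sort column (an item category, say) would find matching rows scattered through every block, so nothing could be skipped.<br />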
<br />
So with this in mind let's look at the results table again.<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrIaCype-0GFKw0SgHan8FjirzePHx5B_I8k7Q2essRu96m2-JAs9gcN1XdkQdqIu2WRfe89CBwuptcoN97k_zkAmjrkYjvB2_CSrYdsfGhjZ_vJ2R-im0MzkEDi33mCxVShriRSTqc1A/s1600/AWS+Redshift+WHERE+Clauses.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrIaCype-0GFKw0SgHan8FjirzePHx5B_I8k7Q2essRu96m2-JAs9gcN1XdkQdqIu2WRfe89CBwuptcoN97k_zkAmjrkYjvB2_CSrYdsfGhjZ_vJ2R-im0MzkEDi33mCxVShriRSTqc1A/s1600/AWS+Redshift+WHERE+Clauses.png" height="130" width="640" /></a><br />
<br />
I don't think I'm benefiting from the distkey at all in this test set as I set the distkey to be storeid and none of these filters are store-based. The filters are either based on time (the sortkey) or category, an item attribute which is not part of either sortkey or distkey. And yet, the speed difference between row 2 (which presumably sees no benefit from either setting) and row 3 (enhanced just by the sortkey) is dramatic: almost a 6-fold speed increase!<br />
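<br />
The "almost 6-fold" figure follows from the numbers quoted above. A quick check, using only values stated in this post (variable names are just labels for this sketch):<br />
<br />

```python
# Figures quoted in the post: millions of records, seconds, millions of records/second.
all_rows_millions, all_rows_rate = 416, 13.4    # row 1: no filter
category_millions, category_seconds = 83, 26    # row 2: filter on item category
calendar_rate = 18.4                            # row 3: filter on year (sortkey)

row1_seconds = all_rows_millions / all_rows_rate        # about 31 seconds
category_rate = category_millions / category_seconds    # about 3.2 million records/second
speedup = calendar_rate / category_rate                 # about 5.8x

print(f"row 1 elapsed: {row1_seconds:.1f} s")
print(f"row 2 rate: {category_rate:.1f} million records/s")
print(f"row 3 vs row 2: {speedup:.1f}x")
```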
<br />
The drop in speed for the 4th and 5th rows has, I think, more to do with some latency in query execution, rather like we saw in the previous <a href="http://andrewg-crabtreeanalytics.blogspot.com/2014/04/the-right-tools-for-structured-big-data.html" target="_blank">tests</a>. These queries hit significantly less data and, as the data quantity falls, any latency becomes an increasingly large proportion of the whole. <br />
<br />
I did not put a lot of thought into choosing distkey and sortkey values for this test, but it certainly seems as though choosing these correctly could have a dramatic impact on the speed of queries. <b> Truthfully, there isn't very much to tweak here, so optimizing within these boundaries should not take too long. I could really grow to like simple.</b><br />
<br />
More testing to follow.<br />
<br />
<br /></div>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-12051173463216332542014-05-12T08:47:00.002-07:002014-05-12T08:47:59.089-07:00Next Generation DSRs - it's all about speed !<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJZhXIndGQavpDSSS0G1CTy4zijJFg0kDWHnCz2t5HWbdivcrDLkjYwKFUpWSDi9wz2e0QCosnXMkvyxSnvsp9NnsOVk1FMMhdGQmcxwBo0k-kJOSNr9vhfdUAFY5wTd0YCuI19DZmjrU/s1600/speed-limit-5mph.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJZhXIndGQavpDSSS0G1CTy4zijJFg0kDWHnCz2t5HWbdivcrDLkjYwKFUpWSDi9wz2e0QCosnXMkvyxSnvsp9NnsOVk1FMMhdGQmcxwBo0k-kJOSNr9vhfdUAFY5wTd0YCuI19DZmjrU/s1600/speed-limit-5mph.jpg" /></a>Recently, I have been working with a new-to-me BI tool that has reminded me just how much speed matters. I'm not mentioning any names here, and it's not a truly bad tool, it's just too slow and that's an insight killer!<br />
<br />
Continuing my series on <a href="http://andrewg-crabtreeanalytics.blogspot.com/search/label/Next%20Generation%20DSR" target="_blank">Next Generation DSRs</a>, let's look at how speed impacts the exploratory process and the ability to generate insight and, more importantly, value.<br />
<br />
Many existing DSRs do little more than spit out standard reports on a schedule and if that's all you want, it doesn't matter too much if it takes a while to build the 8 standard reports you need. Pass off the build to the cheapest resource capable of building them and let them suffer. Once built, if it takes 30 minutes to run when the scheduler kicks it off, nobody is going to notice.<br />
<br />
Exploratory, ad-hoc, work is a different animal and one that can generate much more value than standard reports. It's a very iterative/interactive process. Define a query, see what results you get back and kick off 2-3 more queries to explain the anomalies you've discovered: filter it, order it, plot it, slice it, summarize it, mash it up with data from other sources, correlate, .., model. This needs speed.<br />
<br />
<a name='more'></a><br />
<br />
For a recent project, I was pulling data to support analytics: descriptive, inventory-modeling and predictive models. Define a query based on the features I am searching for, submit it to run, then wait... 20 minutes to an hour to get a result. When the results come through (or fail to do so with an error message that defies understanding) I have long since moved on to some other task so as not to completely destroy my productivity. It takes time to get my head back in the game and to remember what I was trying to achieve and productivity takes a dive. I didn't need just one query of course, more like 10, so I would have 3-4 running simultaneously and extensive notes scribbled on a scratch pad to try and keep track.<br />
<br />
Admittedly, what I am doing here is complex and the tasks I was using to fill-in gaps with were also relatively complex (e.g. simulating a large-scale, retail, supply-chain replenishment and forecasting system in R), but still, it took 2 days of fighting with the beast to get what I needed. Progress was painfully slow on everything I attempted in this time period and my frustration levels were off the scale.<br />
<br />
This system is forcing me to multitask. According to one <a href="http://psychology.about.com/od/cognitivepsychology/a/costs-of-multitasking.htm" target="_blank">study</a>, this can reduce your productivity by 40%. A 40% decline in productivity is a bad thing, but, frankly, it felt worse: I did not measure it and I'm not about to create a study to prove it, but switching between highly complex tasks and with a BI tool that kept interrupting me felt much worse than a 40% drop.<br />
<br />
Whether my perception is right or not, it's perception that drives behavior. If using the system in this manner is painful it will inevitably be used less often by fewer people and more of the insights buried in the data will stay there.<br />
<br />
Not that things haven't improved. One of my first jobs after college was to build computer simulations of factory production lines to test out changes in new equipment or layouts before incurring any significant capital expense. Some of these studies were very successful, but very complex to build and, running on the hardware of the time, I would start a simulation run when I went home and check the results when I got in the following morning. Some mornings could be very depressing; realizing that I had an error in a part of the model, had no useful results to build on and no chance to run again until that evening. Consequently, studies that took 1-2 weeks of work time, could take elapsed months to execute.<br />
<br />
If you've been following this <a href="http://andrewg-crabtreeanalytics.blogspot.com/search/label/Next%20Generation%20DSR" target="_blank">series</a> you'll know that I am a strong proponent of using newer database technologies (MPP, in-memory, columnar, ...) to both simplify the data architecture AND provide substantial speed increases over existing systems.<br />
<br />
If you still just want your standard reports, don't worry about it, just hope your competition is doing the same.Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-90562657796022793002014-05-08T08:28:00.001-07:002014-05-08T10:41:52.522-07:00Visualizing Forecast Accuracy. When not to use the "start at zero" rule ?<div style="clear: both; text-align: left;">
I recently joined a discussion on Kaiser Fung's blog Junk Charts, <a href="http://junkcharts.typepad.com/junk_charts/2014/04/when-to-use-the-start-at-zero-rule-.html" target="_blank">When to use the start-at-zero rule</a>, concerning when charts should force a 0 into the Y-axis. BTW, if you have not done so, add his blog to your RSS feed; it's superb and I have become a frequent visitor.</div>
<div style="clear: both; text-align: left;">
<br /></div>
<div style="clear: both; text-align: left;">
On this particular post, I would completely agree with his thoughts were it not for this one metric I have problems visualizing: Forecast Accuracy.</div>
<div style="clear: both; text-align: left;">
</div>
<a name='more'></a><br />
<br />
<div style="clear: both; text-align: left;">
Forecast Accuracy is a very, very widely used sales-forecasting metric that is based on a statistical one, so let's start there. </div>
<div style="clear: both; text-align: left;">
<br /></div>
<div style="clear: both; text-align: left;">
The statistical metric (Mean Absolute Percentage Error) looks at the average absolute forecast error as a percentage of actual sales. Some of the errors will be positive and some negative but by taking the absolute value we lose the sign and just look at the magnitude of error. (We handle optimism or pessimism in the forecast with a different "bias" metric). </div>
<div style="clear: both; text-align: left;">
<br /></div>
<div style="clear: both; text-align: left;">
There is occasionally heated discussion in the sales forecasting community about exactly how this should be calculated but let's save that for another day as all forms I am familiar with have the same properties with regard to plotting results.</div>
<ul>
<li>perfect forecasts would have no error and return 0% MAPE, this is our base.</li>
<li>there is no effective upper bound on the metric</li>
</ul>
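<div>
Both properties are easy to see in a small calculation. The sketch below uses one common form of MAPE (the mean of the per-period absolute percentage errors); as noted above, other forms exist, and this one assumes actual sales are never zero. The sales and forecast figures are hypothetical.</div>

```python
def mape(actuals, forecasts):
    """Mean of |forecast - actual| / actual across periods (one common form)."""
    errors = [abs(f - a) / a for a, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)


actuals = [100, 80, 120, 50]   # hypothetical weekly sales, all non-zero

perfect = mape(actuals, actuals)               # no error at all
typical = mape(actuals, [110, 72, 120, 60])    # period errors: 10%, 10%, 0%, 20%
wild = mape(actuals, [300, 240, 360, 150])     # every forecast is 3x actual

for label, value in [("perfect", perfect), ("typical", typical), ("wild", wild)]:
    print(f"{label}: {value:.0%}")
```

<div>
The perfect forecast sits at the 0% base, while the badly over-forecast case lands well above 100%.</div>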
<div>
If we were to look at this across a range of product groups (A thru K) it might look something like this. The Y-axis is forced to start at 0 and the length of each bar has meaning: product D really does have almost twice the error rate of product A. This plots out very nicely, it's hard to misunderstand and the start-at-zero rule certainly does apply.</div>
<div style="text-align: center;">
<img border="0" height="270" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidMjz_eUKjMmNSpqrzJYEXocBJFMF1NFwtw96dhxw1chDguaHFqqd3HUv-tIG4dm0IIR_DgsuKGMWeaJ9ixrdn1vvkFOA2GjYgLnNd7zOR0P50DsF95X_nGR87UVVwegXrxVHoMf7gLzc/s1600/FA1.png" width="400" /></div>
<div style="text-align: center;">
<br /></div>
<div style="text-align: center;">
<br /></div>
Now convert MAPE into a Forecast Accuracy with this simple calculation.<br />
<blockquote class="tr_bq">
Forecast Accuracy = 1 - MAPE</blockquote>
I can only assume this metric was created in the sense of <i>"bigger numbers are better"</i>. It's in widespread use, it's part of the business forecasting language, and no, I can't change it. As you can see below, perfect forecasts are now at 100% and there is no lower bound on the metric; it can easily be negative.<br />
<blockquote class="tr_bq">
<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse; text-align: center; width: 207px;">
<colgroup><col style="mso-width-alt: 2779; mso-width-source: userset; width: 57pt;" width="76"></col>
<col style="mso-width-alt: 4790; mso-width-source: userset; width: 98pt;" width="131"></col>
</colgroup><tbody>
<tr height="40" style="height: 30.0pt;">
<td class="xl66" height="40" style="height: 30.0pt; width: 57pt;" width="76"><b>MAPE</b></td>
<td class="xl66" style="width: 98pt;" width="131"><b>Forecast Accuracy</b></td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15pt; text-align: right;">0%</td>
<td class="xl65" style="text-align: right;">100%</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15pt; text-align: right;">20%</td>
<td class="xl65" style="text-align: right;">80%</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15pt; text-align: right;">40%</td>
<td class="xl65" style="text-align: right;">60%</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15pt; text-align: right;">60%</td>
<td class="xl65" style="text-align: right;">40%</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15pt; text-align: right;">80%</td>
<td class="xl65" style="text-align: right;">20%</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15pt; text-align: right;">100%</td>
<td class="xl65" style="text-align: right;">0%</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15pt; text-align: right;">120%</td>
<td class="xl65" style="text-align: right;">-20%</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15pt; text-align: right;">140%</td>
<td class="xl65" style="text-align: right;">-40%</td>
</tr>
<tr height="20" style="height: 15.0pt;">
<td class="xl65" height="20" style="height: 15pt; text-align: right;">160%</td>
<td class="xl65" style="text-align: right;">-60%</td></tr>
</tbody></table>
</blockquote>
This causes me a problem. Check out the chart below: this is the same data as before but now expressed as Forecast Accuracy rather than MAPE in a standard Excel chart. Excel is trying to help (bless it) and put the 0 value in without my help. Work in supply-chain and you will see a lot of these.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbBHkw1azY3D8BnP7fvvvqmbYHSTYbfWpOQO70SO5NfX8tN-XsG_PqM6yEeqrPbV3erCbAhwPDR-f5e-pHDQ4QrOxB34RLEYeBluK5S-uQlpFCc2XkblHw-_XO1wLIBPwD1bPV5B8bJEU/s1600/FA3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbBHkw1azY3D8BnP7fvvvqmbYHSTYbfWpOQO70SO5NfX8tN-XsG_PqM6yEeqrPbV3erCbAhwPDR-f5e-pHDQ4QrOxB34RLEYeBluK5S-uQlpFCc2XkblHw-_XO1wLIBPwD1bPV5B8bJEU/s1600/FA3.png" height="267" width="400" /></a></div>
The zero value has no special meaning on this metric, so starting at 0 is very misleading: 80% accuracy (20% MAPE) is not twice as good as 40% accuracy (60% MAPE).<br />
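To make the arithmetic concrete, here is a small sketch using the 20% and 60% MAPE values from the sentence above. The function and variable names are hypothetical. It compares the ratio a bar chart starting at 0 would show with the ratio of the underlying error rates.<br />

```python
def forecast_accuracy(mape):
    """Forecast Accuracy as defined in the post: 1 - MAPE."""
    return 1.0 - mape


# Reproduce the conversion table: MAPE from 0% to 160% in steps of 20%.
for pct in range(0, 180, 20):
    m = pct / 100
    print(f"MAPE {m:>5.0%}  ->  Forecast Accuracy {forecast_accuracy(m):>5.0%}")

good_mape, poor_mape = 0.20, 0.60

# What bar lengths on a zero-based Forecast Accuracy chart imply.
accuracy_ratio = forecast_accuracy(good_mape) / forecast_accuracy(poor_mape)

# What the error rates themselves say.
error_ratio = poor_mape / good_mape

print(f"accuracy ratio: {accuracy_ratio:.1f}, error ratio: {error_ratio:.1f}")
```

The bars suggest one group is twice as good as the other, while the error rates differ by a factor of three. The two ratios only agree by coincidence.<br />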
<br />
Allowing the minimum of the y-axis to float does not solve this either (below).<br />
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbDMUb6-wLJW6387j5o-3VokbEEpPK94RmkOn-NWRibOAS6IvomRYTM5oqIr89t2XDwGHyWKiPyozfU6OLbjljeU796Q87nnrY8EbaOf7x91CNWcvqriYzeLdrd0ZVwNZnxJcHRI6RS2Y/s1600/FA2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbDMUb6-wLJW6387j5o-3VokbEEpPK94RmkOn-NWRibOAS6IvomRYTM5oqIr89t2XDwGHyWKiPyozfU6OLbjljeU796Q87nnrY8EbaOf7x91CNWcvqriYzeLdrd0ZVwNZnxJcHRI6RS2Y/s1600/FA2.png" height="270" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
I really don't know what this is trying to tell me... some product groups are better than others perhaps ? Certainly, relative size is meaningless.<br />
<br />
"Abandon it" you say "go to a line chart". Line charts often have floating axes and yes they do not emphasize relative size nearly as much as a bar-chart does (below). <br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqPPpAD8YTj1n-NF0HrxyMCn3Vavlb20_TcQ2IOP5POdlqH2QFFRXJA4wCgk8hlrTaORAppCH6eEL0PsJj2O-0E-ts2oovnWdhl-OVwVnVH-BKi4bqe-MYNZMPPLJZOwYDhEvWl8pVqes/s1600/FA4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqPPpAD8YTj1n-NF0HrxyMCn3Vavlb20_TcQ2IOP5POdlqH2QFFRXJA4wCgk8hlrTaORAppCH6eEL0PsJj2O-0E-ts2oovnWdhl-OVwVnVH-BKi4bqe-MYNZMPPLJZOwYDhEvWl8pVqes/s1600/FA4.png" height="255" width="400" /></a></div>
<br />
Perhaps it's less confusing/misleading than the previous charts but I still don't like it, because there is data I want to compare relative sizes for (the MAPE) and line-charts seem most useful when trying to show patterns. I have no reason to expect a useful pattern to form from product categories: I just sorted them alphabetically.<br />
<br />
My thanks to the contributors on Junk Charts for helping me clarify my thinking on this. I don't know that there is a great answer but as it's one I run into all the time I do want to find a better solution. (FYI - It's just hit me that there is another set of supply-chain metrics, for order fill-rates, that has the exact same problem)<br />
<br />
The best I have been able to do with it so far is shown below. By forcing the upper limit on the Y-axis to 100% and letting the lower limit float, I am trying to emphasize the negative space between the top of the bar and 100%, essentially the error rate.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiS17hZzDNpmedFe-Shi6FDAmcTUejF3W945QlaYA-NF6vKXmmWkYRzeY0fkQuP6fg4mgM7yXcYnO9oCbMJTiXME4sasKijCnKJU_pCmvsckqf2SI1XINY6XlL9Ohg6K66khZmgI-9q6I8/s1600/FA5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiS17hZzDNpmedFe-Shi6FDAmcTUejF3W945QlaYA-NF6vKXmmWkYRzeY0fkQuP6fg4mgM7yXcYnO9oCbMJTiXME4sasKijCnKJU_pCmvsckqf2SI1XINY6XlL9Ohg6K66khZmgI-9q6I8/s1600/FA5.png" height="258" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
I'm not entirely happy though, those heavy bars do draw the eye, how about a dot-plot instead ?</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjInYKCJNYzfEaoYWSHoWRPrkLswHOfFSBkoWloav0Bm0B6AQgWeU_khMTrGTf4_W-duJ-LDyO3Lj6LxSDTWPsgCvPGKMXxDp5RbjwkbOQXzqQuKmjk98rZXjvSiBe9mm65aecM34tKnfc/s1600/FA6.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjInYKCJNYzfEaoYWSHoWRPrkLswHOfFSBkoWloav0Bm0B6AQgWeU_khMTrGTf4_W-duJ-LDyO3Lj6LxSDTWPsgCvPGKMXxDp5RbjwkbOQXzqQuKmjk98rZXjvSiBe9mm65aecM34tKnfc/s1600/FA6.png" height="258" width="400" /></a></div>
<br />
You would still have to learn how to read it properly ...<br />
<br />
Or how about this? Inspiration or desperation? I'm now plotting the bars down from the 100% mark, emphasizing MAPE while still using the Forecast Accuracy scale. I'm not entirely sure yet, but I <i>think </i>I like it and if I generalize the "start at 0" idea to "start at base" it may even fit the rule.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDf8iRTkfO1JkrUpIM7xgSlB6HoEYLV-nokl5EHJg935YHHz_-KiyV52acbchVA4F7haSqfJO6geCPXf_fK2wieIUMcHM0YaQCDKdcWROZ8gMPI79JsE2eW60B-ZZ-zqZ-G61RxyqlZ90/s1600/FA7.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDf8iRTkfO1JkrUpIM7xgSlB6HoEYLV-nokl5EHJg935YHHz_-KiyV52acbchVA4F7haSqfJO6geCPXf_fK2wieIUMcHM0YaQCDKdcWROZ8gMPI79JsE2eW60B-ZZ-zqZ-G61RxyqlZ90/s1600/FA7.png" height="256" width="400" /></a></div>
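<br />
For anyone who wants to try the "start at base" layout, the geometry is simple enough to sketch. Each bar hangs from the 100% base, so its length is the MAPE while its free end sits at the Forecast Accuracy value. The helper name and the sample accuracies below are hypothetical, and the result is plain data that can be handed to whatever charting tool you use (for instance as the height and bottom of a bar in matplotlib).<br />

```python
def hanging_bars(accuracy_by_group, base=1.0):
    """Return (group, bottom, height) for bars drawn down from `base`.

    bottom is the Forecast Accuracy value where the bar ends;
    height is base - accuracy, which is the MAPE when base is 100%.
    """
    return [
        (group, accuracy, base - accuracy)
        for group, accuracy in accuracy_by_group.items()
    ]


# Hypothetical accuracies, including one below zero (MAPE above 100%).
sample = {"A": 0.80, "D": 0.60, "K": -0.20}

for group, bottom, height in hanging_bars(sample):
    print(f"{group}: bar from {bottom:.0%} up to 100%, length {height:.0%}")
```

Because every bar shares the same base, relative lengths are meaningful again, and a negative accuracy needs no special handling.<br />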
<br />
What do you think? Which version best handles the compromise between a user's desire to see the metric they know and my desire to show them relative error rates? Have you a better idea? I would love to hear it - this one really bugs me ! Can you think of any other examples of metrics where 0 is meaningless?<br />
<br />
<!-- Blogger automated replacement: "https://images-blogger-opensocial.googleusercontent.com/gadgets/proxy?url=http%3A%2F%2F2.bp.blogspot.com%2F-Xt8Ur71hcDQ%2FU2uREQj_miI%2FAAAAAAAAB6Q%2FTwfhiklGyzA%2Fs1600%2FFA1.png&container=blogger&gadget=a&rewriteMime=image%2F*" with "https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidMjz_eUKjMmNSpqrzJYEXocBJFMF1NFwtw96dhxw1chDguaHFqqd3HUv-tIG4dm0IIR_DgsuKGMWeaJ9ixrdn1vvkFOA2GjYgLnNd7zOR0P50DsF95X_nGR87UVVwegXrxVHoMf7gLzc/s1600/FA1.png" -->Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com1tag:blogger.com,1999:blog-3663282854138469774.post-73080816684069953952014-05-05T06:58:00.004-07:002014-05-05T07:12:17.417-07:00Recommended Reading: The Definitive Guide To Inventory ManagementA little over 15 years ago now, I was set the task to model how much inventory was needed for all of our 3000 or so products at every distribution center. Prior to this point, inventory targets had been set at aggregate level based off experience and my management felt it was likely we had too much inventory in total and what we did have was probably not where it was most needed. (BTW - they were absolutely right and we were ultimately able to make substantial cuts in inventory while raising service levels).<br />
<br />
I came to the project with a math degree, some programming expertise, practical experience simulating production lines, optimizing distribution networks, analyzing investments and with no real idea of how to get the job done. The books I managed to get my hands on gave you some idea how to use such a system but no real idea how to build it. They left out all the hard/useful bits I think. So, I set about to work it out for myself with a lot of simulation models to validate that the outputs made sense.<br />
<a href="http://www.amazon.com/Definitive-Guide-Inventory-Management-Professionals-ebook/dp/B00J4N8TQQ/ref=sr_1_1?ie=UTF8&qid=1399296781&sr=8-1&keywords=the+definitive+guide+to+inventory+management" style="clear: right; color: #996633; float: right; margin-bottom: 1em; margin-left: 1em; text-decoration: none;"><img alt="Product Details" class="productImage cfMarker" src="http://ecx.images-amazon.com/images/I/51FVZsBoW8L._SL160_PIsitb-sticker-arrow-dp,TopRight,12,-18_SH30_OU01_AA160_.jpg" style="border: 0px;" title="" /></a><br />
I still work occasionally in inventory modeling and I'll be teaching some components this fall, so I have been eagerly awaiting this new book : <a href="http://www.amazon.com/Definitive-Guide-Inventory-Management-Professionals-ebook/dp/B00J4N8TQQ/ref=sr_1_1?ie=UTF8&qid=1399296781&sr=8-1&keywords=the+definitive+guide+to+inventory+management">The Definitive Guide to Inventory Management: Principles and Strategies for the Efficient Flow of Inventory across...</a> by CSCMP, Waller, Matthew A. and Esper, Terry L. (Mar 19, 2014)<br />
<a name='more'></a><br />
Full disclosure here: one of the authors, Dr Matt Waller, is a friend and colleague of mine. He brings an astonishing level of expertise to many areas of supply-chain management and inventory modeling is clearly no exception. <b>Together Matt and Terry Esper have produced a book that (had I possessed it 15 years before it was published) could have short-cut my inventory modeling project by approximately 6 months.</b><br />
<br />
This is not a long book, not quite 200 pages in fact, but it is no lightweight. If you just want an overview of the topic you could skip the math, but my guess is that if you do that, you will never really understand. The math is not particularly hard and it's presented in a sort of hybrid math/Excel fashion that I find easy to follow. I'll also say that I hit my first "ahah!" moment before I got to page 20. I won't embarrass myself by telling you what it was but something that had bothered me for years suddenly clicked into place.<br />
<br />
Unlike most discussion on this topic, this book looks at inventory modeling from the manufacturer's or supplier's point of view right through to the retail shelf. They also provide a number of means to estimate components of inventory from historical data so you can assess how well your planning and execution systems are tracking to plan: something I was aware of but had never really thought through how useful it could be. Details on how to conduct your own simulation studies in Excel and an overview of the most commonly used forecasting approaches that feed the inventory models round it out.<br />
<br />
It's all here: what you need to understand (and if you so wish, build) a system to optimize your inventory holding. I highly recommend it.<br />
<br />
<br />Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-9814716554745554622014-04-28T07:15:00.000-07:002014-04-28T07:15:00.164-07:00Next-Generation DSRs (multi-retailer)This post continues my look at the <a href="http://andrewg-crabtreeanalytics.blogspot.com/2014/03/the-next-generation-dsr.html">Next Generation DSR</a>. Demand Signal Repositories collect, clean, report-on and analyze Point of Sale data to help CPGs drive increased revenues and reduce costs.<br />
<br />
<b>Most CPG implementations of a DSR support just one retailer's POS data.</b> OK, before someone gets back to me with <i>"but we have multiple retailers' POS data in our system"</i>, I'll clarify:<br />
<ul>
<li>Having Walmart and Sam's Club data in the same DSR does not count (as the data comes from the same single source, RetailLink) and I bet you are still limited as to what you can report on across them.</li>
<li>If you have multiple retailers' POS data set up in isolated databases using the same front-end... it does not count.</li>
<li>If you have the data in the same database but without common data standards ... it does not count.</li>
<li>If you have the data in the same database but with no way to run analysis or reports across multiple retailers at once... it does not count.</li>
</ul>
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjlCpQEdcaBvTcyVdR056zXu_LBQYEa6ZY3Eps7DJ6d_eu7YyRjsBONQVrdaJqFrspzxzEYOT7mf9qefBiqVt8c5Eh5C1S2YfD6k3zYRgKWJHD3aMYCqVbRW92hivHTvaT6iBX4TjcDGg/s1600/shopping-carts.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjlCpQEdcaBvTcyVdR056zXu_LBQYEa6ZY3Eps7DJ6d_eu7YyRjsBONQVrdaJqFrspzxzEYOT7mf9qefBiqVt8c5Eh5C1S2YfD6k3zYRgKWJHD3aMYCqVbRW92hivHTvaT6iBX4TjcDGg/s1600/shopping-carts.jpg" height="240" width="320" /></a></div>
So, yes, a number of CPGs have DSRs that support multi-retailer POS data sources, but very, very few (if any?) have integrated that data into a single database with common data standards so they can report and analyze across multiple POS sources at the same time.</div>
<div>
<br /></div>
<div>
<b>Does it matter? I think so, multi-retailer ability opens up big opportunities around promotional-effectiveness, assortment planning, supply-chain forecasting (demand sensing) and ease of use.</b><br />
<b><br /></b>
<a name='more'></a></div>
<div>
<br /></div>
<div>
So, why are we not doing this already?<br />
<br />
From a historical perspective, you can track most DSRs back to starting out with a particular retailer's data and supporting CPG sales-teams for that retailer. The sales-team were the folks with the checkbook and they were not very interested in what the system could do with any other retailer's data. DSR solutions are still often sold to individual sales-teams, which is why CPGs support numerous DSR implementations.</div>
<div>
<br /></div>
<div>
Can these solutions support multiple-retailers - yes - sort of - maybe - probably not. The key issues to resolve are data-volume, data-standardization, localization and security.</div>
<div>
<br /></div>
<h3>
Data Volume</h3>
<div>
In my previous post (<a href="http://andrewg-crabtreeanalytics.blogspot.com/2014/04/next-generation-dsrs-data-handling.html" target="_blank">Next-Generation DSRs - data handling</a>) I stressed how circa-2010 technology was struggling to handle the volume and velocity of data involved in a DSR, and that was with single-retailer solutions. Newer database applications give us the capability to maintain or improve performance while handling substantially more data through columnar, massively parallel and in-memory technology. I fully acknowledge I may be missing a few ideas on that list, but it doesn't matter: the point is that a 10-fold increase in data volume is no longer something to be worried about. Trade up to new technology and you can handle it. </div>
<h3>
Data Standardization</h3>
This is dull, really dull, it's right up there with <a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/03/data-cleansing-boring-painful-tedious.html">Data Cleansing (boring, painful, tedious and very, very important)</a>. There is no standard for what data a retailer chooses to share with their CPG suppliers. There is overlap, yes of course, but no actual standards. They will:<br />
<ul>
<li>call the same facts (e.g. point of sale units) by different names.</li>
<li>report facts in different time buckets (weekly, daily)</li>
<li>report facts that are 100% unique to a particular retailer (some of which may be useful)</li>
<li>have similar (but subtly different) meanings for what appears to be the same fact</li>
<li>not provide key facts that seem essential (like on-hand inventory at stores)</li>
</ul>
<div>
And through all this you are trying to find enough common ground to generate reports and analytics that work across retailers. I can hear the cries now of <i>"but Retailer-X is completely unique, that won't work for us". </i>Ignoring for the moment the impossibility of degrees of "unique-ness", they are wrong, this really can be done. All retailers sell, order, hold inventory and promote (to list but a few things). What is common between data sources is huge, but it takes real discipline to find the commonality wherever it exists and map it to a single data-structure for reporting/analytic purposes. And when you do find something unique, that's ok: map it to a new fact, store it and wait. Perhaps it's only unique because you haven't seen it in another retailer's data feed... yet.</div>
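<div>
A minimal sketch of that mapping discipline, assuming a simple lookup table per retailer. The retailer keys, the source field names and the common fact names are all hypothetical, not taken from any real data feed. Known fields are renamed to the common fact; anything unrecognized is kept under a retailer-qualified name rather than thrown away.</div>

```python
# Hypothetical mapping from each retailer's field names to common facts.
FACT_MAP = {
    "retailer_x": {"POS Sales": "pos_revenue", "POS Qty": "pos_units"},
    "retailer_y": {"Point of Sale $": "pos_revenue", "Units Sold": "pos_units"},
}


def standardize(retailer, record):
    """Rename a retailer's fields to the single common data model.

    Fields with no mapping yet are stored as new, retailer-specific
    facts so they can be mapped later if another retailer supplies them.
    """
    mapping = FACT_MAP.get(retailer, {})
    common = {}
    for name, value in record.items():
        key = mapping.get(name, f"{retailer}:{name}")
        common[key] = value
    return common


row_x = standardize("retailer_x", {"POS Sales": 1250.0, "POS Qty": 310})
row_y = standardize("retailer_y", {"Point of Sale $": 980.0, "Shelf Capacity": 24})
print(row_x)
print(row_y)
```

<div>
Both rows now carry pos_revenue under the same name, so a single report can run across them, while the unmapped field waits under its own name.</div>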
<div>
<br /></div>
<div>
Bottom line - It's dull (I did warn you about that right?) but it can be done.</div>
<h3>
Localization</h3>
When I'm generating a report for retailer X, they call the Point of Sale revenue fact 'POS Sales', retailer Y calls it 'Point of Sale $', retailer Z calls it 'POS Revenue'. Internally, and when reporting across multiple retailers, we call it just 'POS'. How can we support this?<br />
<br />
I've coded custom solutions for this before, it's not that hard, but it strikes me that this is just another example of "language" and if we can have the same application work in English, German, Italian, Spanish and Russian, how hard should it be to translate between variations on the same language?
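<br />
Treated as a translation problem, this can be sketched as a per-audience label table with a fallback to the internal name. The display names come from the example above; the dictionary keys, the internal fact name pos_revenue and the function name are assumptions for illustration.<br />

```python
# Display names per audience, keyed by a hypothetical internal fact name.
LABELS = {
    "retailer_x": {"pos_revenue": "POS Sales"},
    "retailer_y": {"pos_revenue": "Point of Sale $"},
    "retailer_z": {"pos_revenue": "POS Revenue"},
}

# Names used internally and on cross-retailer reports.
DEFAULT_LABELS = {"pos_revenue": "POS"}


def label_for(fact, audience=None):
    """Return the audience's preferred name for a fact, else the default."""
    fallback = DEFAULT_LABELS.get(fact, fact)
    return LABELS.get(audience, {}).get(fact, fallback)


for audience in ("retailer_x", "retailer_y", "retailer_z", None):
    print(audience, "->", label_for("pos_revenue", audience))
```

The report definition only ever refers to the internal fact; the label is looked up at display time, much as a translated string would be.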
<h3>
Security</h3>
<div>
Is Retailer X allowed to see data from Retailer-Y - no way ! Are the Retailer-X sales-team allowed to see Retailer-Y point of sale data - very probably not. Are my sales folks allowed to see any competitor sales data provided to category managers - nope. Do I want the sales-team to see the profit margin on the products they sell? (This sounds sensible, but actually some CPGs do not want this. I guess, if they don't know, they can't tell the customer).<br />
<br />
These are all issues with DSRs as they stand today and are all resolved already with solid user account management. If this process is done well, security is not a problem. If the processes around security management are sloppy, it's already a problem. Adding more data into the system really doesn't make a difference one way or another.<br />
<h3>
Bottom line</h3>
</div>
<div>
If a DSR was designed from scratch to support multiple retailers, it would have one single data model and all new data sources get mapped to this single model. <br />
<br />
Localization means that the same report for Retailer-X and Retailer-Y is shown with their own naming preferences.<br />
<br />
Security controls who is allowed to see what.<br />
<br />
And what's in it for you ?<br />
<br />
<ul>
<li>You now have the ability to rapidly leverage learnings (in the form of new analytics and reports) across all retailers and sales teams.</li>
<li>As team members move from one sales-team to another they do not need to learn a new system or even, necessarily, a new "language".</li>
<li>You get to maintain, develop, learn and train against just one system</li>
<li><b>And the really big pay-off is that you can now start to run value-added analytics that require access to multiple retailers' POS data. Think about significantly enhanced promotional-effectiveness, assortment planning and supply-chain forecasting (demand sensing). More on this very soon.</b></li>
</ul>
<div>
<b><br /></b></div>
</div>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-46575005620907993622014-04-21T07:10:00.000-07:002014-04-21T07:10:00.398-07:00The right tools for (structured) BIG DATA handling - columnar, mpp and cloud - AWS RedshiftToday, I'm coming back a little closer to the series of promised posts on the <a href="http://andrewg-crabtreeanalytics.blogspot.com/search/label/Next%20Generation%20DSR" target="_blank">Next Generation DSR</a> to look at some benchmark results for the Amazon Redshift database. Some time ago I wrote a couple of quite popular posts on using columnar databases and faster (solid state) storage to dramatically (4100%) improve the speed of aggregation queries against large data sets. As data volumes even for ad-hoc analyses continue to grow though, I'm looking at other options.<br />
<div class="post-header" style="background-color: #fcfbf5; color: #333333; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 12.727272033691406px; line-height: 1.6; margin: 0px 0px 1em;">
<div class="post-header-line-1">
</div>
</div>
Here's the scenario I've been working with: you are a business analyst charged with providing reporting and basic analytics on more data than you know how to handle - and you need to do it without the combined resources of your IT department being placed at your disposal.<br />
<a name='more'></a><div>
<br />
Previously, (<a href="http://andrewg-crabtreeanalytics.blogspot.com/2013/02/the-right-tools-for-big-structured-data.html" style="background-color: #fcfbf5; color: #7d181e; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 12.727272033691406px; line-height: 16.545454025268555px; text-decoration: none;" target="_blank">here</a><span style="background-color: #fcfbf5; color: #333333; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 12.727272033691406px; line-height: 16.363636016845703px;">) </span> I looked at the value of upgrading hard-drives (to make sure the CPU is actually busy) and the benefit of using columnar storage, which lets the database pull back data in larger chunks and with fewer trips to the hard-drive. The results were... staggering: a combined 4100% increase in processing speed, so that I could read and aggregate 10 facts from a base table with over 40 million records on a laptop in just 37 seconds. (I'm using simulated Point of Sale data at item-store-week level just because it's an environment I'm used to and it's normal to have hundreds of millions or even billions of records to work with)</div>
<div>
<br />
I then increased the data volume by a factor of 10 (<a href="http://andrewg-crabtreeanalytics.blogspot.com/2013/02/the-right-tools-for-structured-big-data.html" target="_blank">here</a>), repeated the tests and got very similar results without further changing the hardware. The column-storage databases were much faster, scaling well to both extra records (the SQL 2012 column-store aggregating 10x the data volume in less than 6x the elapsed time) and to more facts (see below).</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqjoOQmSTRwQcvB8aOLjSeKR0eBXxM7_P2H-j7yKs6tppTZxCKvBKRqyv3RtJa3CFZ19q4xLiMaL5a0QddflQMfPZvuStmNaEuCqMeOed-P8L8Vn3XDeZ0CyD-17cyYhZkR2u5pMaGels/s1600/Results_chart.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqjoOQmSTRwQcvB8aOLjSeKR0eBXxM7_P2H-j7yKs6tppTZxCKvBKRqyv3RtJa3CFZ19q4xLiMaL5a0QddflQMfPZvuStmNaEuCqMeOed-P8L8Vn3XDeZ0CyD-17cyYhZkR2u5pMaGels/s1600/Results_chart.png" height="372" width="640" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<div class="" style="clear: both;">
400 million records (the test set I used) is not enormous but it's certainly big enough to cause 99.2% of business analysts to come to a screeching halt and to beg for help. It's also enough to tax the limits of local storage on my test equipment when I have the same data replicated across multiple databases.</div>
<div class="" style="clear: both;">
<br /></div>
<div class="" style="clear: both;">
<b>I've been considering Amazon Redshift </b>for some time - it's cloud-based, columnar, simple to set up, uses standard SQL and it <b style="font-style: italic;">enables parallel execution and storage across multiple computers</b> (nodes) in the cloud.</div>
<div class="" style="clear: both;">
<br /></div>
<div class="" style="clear: both;">
First let's look at a simple test - the same data as before but now on Redshift. I tested 2 configurations using their smallest available "dw1.xlarge" nodes currently costing $0.85 per hour per node. These nodes each have 2 processor cores, 2TB of (non SSD) storage and 15GB of RAM. I'm going to drop the "SQL 2012 Base" setup that I used previously from the ongoing comparison - it's just not in the race.</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="" style="clear: both;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBjPG499yMzWy7omDlPcKu_jv8zWdX44Qkrk5sh4Yt9_bDtBwmmx4kIoT8iaYqukCcjejRK5G_-Ylc5bijyhubyBgU7PXkacWoTSYVqN8AVH91XS_8tXI00JjAaAZM-2E9gRjf3kSKYdY/s1600/RedShift+-+InfiniDb+-+SQL2012.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBjPG499yMzWy7omDlPcKu_jv8zWdX44Qkrk5sh4Yt9_bDtBwmmx4kIoT8iaYqukCcjejRK5G_-Ylc5bijyhubyBgU7PXkacWoTSYVqN8AVH91XS_8tXI00JjAaAZM-2E9gRjf3kSKYdY/s1600/RedShift+-+InfiniDb+-+SQL2012.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><br /></td></tr>
</tbody></table>
<div class="" style="clear: both;">
<br /></div>
<div class="" style="clear: both;">
SQL Server 2012 (with the ColumnStore Index) was the clear winner in the previous test and for a single fact query it still does very well indeed. The 2-node Redshift setup takes almost twice as long for a single fact, but, remember that these AWS nodes are not using fast SSD storage (and together cost just $1.70 per hour) so 41 seconds is a respectable result. Note, also, that it scales to summarizing 10 facts very well indeed, taking about 50% of the time that SQL Server did on my local machine.</div>
<div class="" style="clear: both;">
<br /></div>
<div class="" style="clear: both;">
How performance scales to more records and more facts is key and, ideally, I want something that scales linearly (or better): 10x the data volume should result in no more than 10x the time. Redshift here is doing substantially better than that - is that suggesting a better than linear scaling ? Let's take a closer look. </div>
<div class="" style="clear: both;">
<br /></div>
<div class="" style="clear: both;">
For this test I extended the base table to include 40 fact fields against the same 3 key fields (Item, Store and Week). I then ran test aggregation queries against the full database for 1, 5, 10, 20 and 30 facts.</div>
<div class="" style="clear: both;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyqvgRmn_qa9XNrrY8bzGeomFRPoLvr7fG6O1GwOj2-eJq8602oMyIbpo4rc6LDFAdFdcwmkRefprUVWD4LXBXpg3kfd1WNEnGjSO4ZG5hVdQN8ghZiLXeAYA9IXddIPONeWmEeY5Y0TA/s1600/RedShift+scaling+by+number+of+facts.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyqvgRmn_qa9XNrrY8bzGeomFRPoLvr7fG6O1GwOj2-eJq8602oMyIbpo4rc6LDFAdFdcwmkRefprUVWD4LXBXpg3kfd1WNEnGjSO4ZG5hVdQN8ghZiLXeAYA9IXddIPONeWmEeY5Y0TA/s1600/RedShift+scaling+by+number+of+facts.png" height="405" width="640" /></a></div>
<div class="" style="clear: both;">
<br /></div>
<div class="" style="clear: both;">
The blue dots show elapsed time (on the vertical axis) against the number of facts summarized in each query for the 2 node setup.</div>
<div class="" style="clear: both;">
<br /></div>
<div class="" style="clear: both;">
The red dots show the same data but for the 4 node setup.</div>
<div class="" style="clear: both;">
<br /></div>
<div class="" style="clear: both;">
For both series, I have included a linear model fit and they are very definitely linear. (R-squared values of 0.99 normally tell you that you did something wrong, it's just too good, but this data is real.) However, there appears to be a substantial "setup" time for query processing: 31.943 seconds in the case of the 2 node system and 10.391 seconds for the 4 node system. These constants are the same whether you pull 1 fact, 5 facts or 30 on this basic aggregation query. Now, as all these queries join to the same item and period master tables and aggregate on the same category and year attributes from those tables, that should not be a big surprise. Change that scope and this setup time will change too. (more on that later)</div>
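<div class="" style="clear: both;">
<br />
The linear model here is simply elapsed time = setup time + (time per fact x number of facts). Below is a minimal least-squares sketch of that fit. The fact counts match the test (1, 5, 10, 20 and 30) but the timings are toy values invented purely to show the mechanics; they are not the benchmark results, and the function name is hypothetical.</div>

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


facts = [1, 5, 10, 20, 30]        # number of facts summarized per query
toy_seconds = [12, 20, 30, 50, 70]  # INVENTED timings, not measured results

setup_time, per_fact = fit_line(facts, toy_seconds)
print(f"setup time: {setup_time:.1f}s, per fact: {per_fact:.1f}s")

# The fitted model can then estimate a query size that was not tested.
print(f"estimated time for 40 facts: {setup_time + per_fact * 40:.1f}s")
```

<div class="" style="clear: both;">
The intercept is the "setup" time that is paid regardless of how many facts are pulled, and the slope is the incremental cost of each extra fact.</div>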
<div class="" style="clear: both;">
<br /></div>
<div class="" style="clear: both;">
Note also that as the number of nodes was doubled, processing speed (roughly) doubled too.</div>
<div class="" style="clear: both;">
<br /></div>
<div class="" style="clear: both;">
<b>Redshift is a definite contender for large scale ad-hoc work. It's easy to set up, scales well to additional data and when you need extra speed you can add extra nodes directly from the AWS web console. (It took about 30 minutes to resize my 2 node cluster to 4 nodes.) </b></div>
<div class="" style="clear: both;">
<b><br /></b></div>
<div class="" style="clear: both;">
<b>When the work is done, shut down the cluster, stop paying the hourly rate and take a snapshot of the system to cheap AWS S3 storage. You can then restore that snapshot to a new cluster whenever you need it.</b><br />
<b><br /></b>
Is it the only option? Certainly not, but it is fast, easy to use and to scale out. That may be hard to beat for my needs, but I will also be looking at some SQL on Hadoop options soon.</div>
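<div class="" style="clear: both;">
<br /></div>
<div class="" style="clear: both;">
I did all of this from the AWS web console, but the same resize, snapshot-and-shut-down and restore steps could be scripted. The sketch below assumes the boto3 library's Redshift client (created with <code>boto3.client("redshift")</code>); the function names and the cluster and snapshot identifiers are hypothetical, and it is an outline rather than a tested deployment script.</div>

```python
# Sketch only: each function expects a boto3 Redshift client, e.g.
#   import boto3
#   client = boto3.client("redshift")
# Cluster and snapshot identifiers are hypothetical placeholders.


def resize(client, cluster_id, nodes):
    """Ask Redshift to change the number of nodes in the cluster."""
    return client.modify_cluster(ClusterIdentifier=cluster_id, NumberOfNodes=nodes)


def shut_down_with_snapshot(client, cluster_id, snapshot_id):
    """Delete the cluster, taking a final snapshot first."""
    return client.delete_cluster(
        ClusterIdentifier=cluster_id,
        SkipFinalClusterSnapshot=False,
        FinalClusterSnapshotIdentifier=snapshot_id,
    )


def restore(client, new_cluster_id, snapshot_id):
    """Create a new cluster from a previously taken snapshot."""
    return client.restore_from_cluster_snapshot(
        ClusterIdentifier=new_cluster_id,
        SnapshotIdentifier=snapshot_id,
    )
```

<div class="" style="clear: both;">
A typical session would call <code>resize(client, "pos-test", 4)</code> before the heavy queries, <code>shut_down_with_snapshot(client, "pos-test", "pos-test-final")</code> when the work is done, and <code>restore(client, "pos-test", "pos-test-final")</code> next time the data is needed.</div>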
<div class="separator" style="clear: both; text-align: center;">
</div>
<br /></div>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-57982109515569222742014-04-18T06:00:00.000-07:002014-04-18T06:11:43.265-07:00Data Visualization - are pie-charts evil ?<span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.454545021057129px; line-height: 20px;">I'll be speaking next week at the <a href="http://scmr.uark.edu/conference.asp" target="_blank">Supply Chain Management Conference at the University of Arkansas </a>on how data-visualization enables action. </span><br />
<span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.454545021057129px; line-height: 20px;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0OGJrYHvsHfxrYNLLYfZiIrh5crYJD1-jvId_x6RT4-6F_kQEpRIS3MDM-vTdO_Pc4xIzQE_8vO39H1NvuAD3IHFtCyIODC_E2Ps50qYc1V-p7hDRvCghmqLbc23J7ItWv4hpErXwL5c/s1600/pie+9.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0OGJrYHvsHfxrYNLLYfZiIrh5crYJD1-jvId_x6RT4-6F_kQEpRIS3MDM-vTdO_Pc4xIzQE_8vO39H1NvuAD3IHFtCyIODC_E2Ps50qYc1V-p7hDRvCghmqLbc23J7ItWv4hpErXwL5c/s1600/pie+9.png" height="192" width="320" /></a><span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.454545021057129px; line-height: 20px;">Good visualization is fairly easy. Unfortunately, building bad visualizations that are hard to use, easy to misunderstand and that obscure and distort the data you are trying to present is even easier: many analysts can do it without trying.</span><br />
<span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.454545021057129px; line-height: 20px;"><br /></span>
<br />
<span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.454545021057129px; line-height: 20px;"><br /></span>
<span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.454545021057129px; line-height: 20px;">In honor of the event, I'm resurrecting a post I created a couple of years ago "<a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/05/are-pie-charts-evil-or-just.html" target="_blank">Are pie charts evil or just misunderstood</a>". I wrote this around the time I was moving away from a trial and error approach (and 20 years of trial and error effort does get you cleaner visuals) to attempting to understand <i>why </i>some visuals so clearly work better than others. </span><br />
<span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.454545021057129px; line-height: 20px;"><br /></span>
<span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.454545021057129px; line-height: 20px;">It turns out that there are some great frameworks to help in building better visuals. Join me next week and we'll talk about human graphical perception, chart junk and non-data ink.</span><br />
<span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.454545021057129px; line-height: 20px;"><br /></span>
<span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.454545021057129px; line-height: 20px;">Enjoy!</span><br />
<span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.454545021057129px; line-height: 20px;"><br /></span>
<span style="background-color: white; color: #333333; font-family: Arial, sans-serif; font-size: 15.454545021057129px; line-height: 20px;"><br /></span>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-88307997035273209772014-04-17T06:00:00.000-07:002014-04-17T06:35:50.222-07:00Data Visualization - enabling action<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
I'll be speaking next week at the Supply Chain Management Research Center Conference at the University of Arkansas on how data-visualization enables action.<br />
<a href="https://media.licdn.com/mpr/mpr/p/8/005/056/2c2/2fc72db.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img alt="" border="0" height="200" src="https://media.licdn.com/mpr/mpr/p/8/005/056/2c2/2fc72db.jpg" style="border: 10px solid white;" width="154" /></a></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 3px; margin-top: 3px; outline: 0px; padding: 3px; vertical-align: baseline;">
The basic premise (and one I firmly believe) is that the hardest part of any analytic project is not defining the problem, doing the analytics or finding the "solution", it's <em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">enabling action</em>. Far too many otherwise excellent analytic projects, tools and reports go unused because the results are presented in a way that is somewhere between difficult-to-understand and incomprehensible.<br />
<br />
<br />
<br />
<br />
<br /></div>
<a name='more'></a><br />
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Managers typically do not have the time to just figure it out, double-check their understanding or rework the results into something they can work with.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
By making your analytics easy to consume (through good visualization practice) you make it possible for decision-makers to find what is important, understand it correctly and make good decisions, quickly.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Frankly, many analytics providers don't try very hard to make their results easy to consume, and their outputs are confusing, hard to use, easy to misunderstand and a long, long way from enabling decisions.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
For those that do try, there is a tension between making things look "cool" or "interesting" and having them function well. Ideally we want both, but very few examples deliver well on both fronts. Indeed, a lot of the attempts to provide interest seem to be designed to obfuscate or distort meaning.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Here are some examples I plucked from a leading visualization vendor's web site. Each and every one of these charts is difficult to read because of limitations in our visual perception. We'll talk more about that at the conference next week.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<img alt="" class="left" src="https://media.licdn.com/mpr/mpr/p/7/005/056/2c8/3c551b0.jpg" style="border: 0px; float: left; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 1em 1em 0px; max-width: 606px; outline: 0px; padding: 0px; vertical-align: baseline;" /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
And trying to make charts more interesting/attractive/eye-catching typically makes things worse. This "Funnel Chart" (below) is hilarious! It's being terribly misused and gets almost everything wrong. I defy you to use this and make sensible decisions.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<img alt="" class="left" height="268" src="https://media.licdn.com/mpr/mpr/p/7/005/056/2cc/3d9dc3d.jpg" style="border: 0px; float: left; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 1em 1em 0px; max-width: 606px; outline: 0px; padding: 0px; vertical-align: baseline; width: auto;" width="314" /></div>
<ul style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; list-style-image: initial; list-style-position: initial; margin: 1em 0px; outline: 0px; padding: 0px 0px 0px 3em; vertical-align: baseline;">
<li style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Color serves no purpose</li>
<li style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">It's very unclear whether values are represented by length, area or volume (thank goodness they included numbers)</li>
<li style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">The top value is (visually) about 100 times bigger than the bottom one but actually less than 5 times bigger in value.</li>
<li style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">I need another legend to tell me where all these regions are</li>
<li style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">Why, exactly, is it a funnel? What does that imply? The Northeast feeds the South which feeds into Central...</li>
<li style="border: 0px; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">It has no contextual information. Perhaps Northwest is the smallest because that is our smallest market?</li>
</ul>
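<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
The length/area/volume question matters because the same difference in value looks very different depending on how many dimensions of the shape are scaled. Here is a small sketch of the arithmetic, assuming (and this is only my guess at what the funnel is doing) that every scaled dimension grows in proportion to the value. The function name <code>apparent_ratio</code> is made up for illustration.</div>

```python
# How big a "5 times larger" value looks when a shape is scaled in
# 1, 2 or 3 dimensions. Assumes each scaled dimension grows in
# proportion to the value, which is an assumption about the chart.


def apparent_ratio(value_ratio, dimensions):
    """Visual size ratio when `dimensions` axes are scaled by the value."""
    return value_ratio ** dimensions


value_ratio = 5  # the top value is a little under 5 times the bottom one

for name, dims in [("length", 1), ("area", 2), ("volume", 3)]:
    print(f"{name}: looks {apparent_ratio(value_ratio, dims)} times bigger")
```

<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Scaling all three dimensions turns a 5 to 1 difference into something over 100 to 1, which is in the same range as the visual impression the funnel gives.</div>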
<div>
<span style="color: #333333; font-family: Arial, sans-serif;"><span style="font-size: 15px; line-height: 20px;"><br /></span></span></div>
<div>
<span style="color: #333333; font-family: Arial, sans-serif;"><span style="font-size: 15px; line-height: 20px;"><br /></span></span></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<img alt="" class="left" height="248" src="https://media.licdn.com/mpr/mpr/p/8/005/056/2ce/2420d35.jpg" style="border: 0px; float: left; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 1em 1em 0px; max-width: 606px; outline: 0px; padding: 0px; vertical-align: baseline; width: auto;" width="317" /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Here's an example we will be working with at the conference. It's very hard to read, slow to use, <em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">easy to make mistakes with</em> and distinctly <em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">over-dressed.</em></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><br /></em></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><br /></em></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><br /></em></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<em style="border: 0px; clear: right; float: right; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin-bottom: 1em; margin-left: 1em; margin-top: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><img alt="" class="left" src="https://media.licdn.com/mpr/mpr/p/5/005/056/2cb/2169f74.jpg" style="border: 0px; float: left; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px 1em 1em 0px; max-width: 606px; outline: 0px; padding: 0px; vertical-align: baseline;" /></em></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">And exactly the same data once it's been stripped bare (below). It's now easy/quick to read, practically error-proof, has no distracting "chart junk" and has contextual data (budget) to understand what "good" is.</em></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<br /></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
My interest in visualization is in enabling action from my analytic work. As a consultant, you may think that I get paid whether a client implements my work or not. That may be true, but I like to get paid more than once by the same client.</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
<em style="border: 0px; font-family: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">If you're going to be at the conference next week, drop by and see me: Supply Chain, Analytics and Visualization are among my favorite discussion topics.</em></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
I'll be posting more on this over the next few months but if you're looking for more right now, here are some excellent resources:</div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Stephen Few's blog, <a href="http://www.perceptualedge.com/blog/" style="border: 0px; color: #006699; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Visual Business Intelligence</a></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Kaiser Fung's blog, <a href="http://junkcharts.typepad.com/" target="_blank">Junk Charts</a></div>
<div style="background-color: white; border: 0px; color: #333333; font-family: Arial, sans-serif; font-size: 15px; line-height: 20px; margin-bottom: 1em; margin-top: 1em; outline: 0px; padding: 0px; vertical-align: baseline;">
Nathan Yau's blog, <a href="http://flowingdata.com/" style="border: 0px; color: #006699; font-family: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; outline: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Flowing Data</a></div>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-75785924984780425862014-04-14T07:00:00.000-07:002014-04-17T09:13:39.539-07:00Next Generation DSRs - Scale Out !!Last week, I posted my thoughts on how new technology enables a simpler and faster database to support your DSR applications: <a href="http://andrewg-crabtreeanalytics.blogspot.com/2014/04/next-generation-dsrs-data-handling.html" target="_blank">Next Generation DSRs (data handling)</a>. Over the next few posts I'll extend that idea to show how speed and simplicity are essential to your personal productivity, user experience and the ability to apply powerful analytic tools to your data.<br />
<br />
In the meantime though I came across a great post in Rob Klopp's "Database Fog Blog" regarding Redshift. Redshift is Amazon's cloud-based, columnar, parallel database. <br />
<br />
Remember that my interest in database technology is all about feeding my insatiable desire for data to drive value-added analytics, my own area of expertise. To that end, I have become adept in a number of programming languages and relational database systems and while I'm a lot better than "competent" I am not "expert". Rob clearly is an expert in this field and I will be following his posts carefully.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhptsgSmFZ6kjQZ-RzqTA0ij_nVVzSHYkL3yJg_ooo0opigBeJvRSHozSxrHdOTiwifYBx3Eo-zpFVyXYBqPXMX1WYSvqaJnZkDukh0OfmZyNiqLHd-R54vfK8tby33CBRpkDHUcNCmCDo/s1600/servers.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhptsgSmFZ6kjQZ-RzqTA0ij_nVVzSHYkL3yJg_ooo0opigBeJvRSHozSxrHdOTiwifYBx3Eo-zpFVyXYBqPXMX1WYSvqaJnZkDukh0OfmZyNiqLHd-R54vfK8tby33CBRpkDHUcNCmCDo/s1600/servers.jpg" height="320" width="300" /></a></div>
Here's a highlight from his post <a href="http://robklopp.wordpress.com/2013/03/11/thoughts-on-aws-redshift/" target="_blank">Thoughts on AWS Redshift</a>:<br />
<blockquote class="tr_bq">
... if you can add nodes and <a href="http://en.wikipedia.org/wiki/Scalability">scale out</a> to improve query response then why not throw hardware at performance problems rather than build a fragile infrastructure of aggregate tables, cubes, pre-joined/de-normalized marts, materialized views, indexes, etc. Each of these performance workarounds are both expensive to build and expensive to operate.</blockquote>
He goes on to talk about why scale-out has not been generally adopted and how Amazon Redshift changes the game by making it easy to acquire and release processing power on demand. <br />
<br />
The answer does not have to be <a href="http://aws.amazon.com/redshift/" target="_blank">Redshift</a>; perhaps it's <a href="http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html" target="_blank">Impala </a>or <a href="http://www.zdnet.com/microsoft-sql-server-2014-released-to-manufacturing-7000027439/" target="_blank">Hekaton</a> or... whatever. The bottom line for me is that new technology enables DSRs that are simpler and faster, and that creates a fundamental shift in system capability.<br />
<br />
FYI - I have done some DSR-scale testing with Redshift and the results were very impressive. More on that soon.<br />
<br />
<br />
Posted by <a href="http://www.blogger.com/profile/02721554488813333205">Andrew Gibson</a><br />
<br />
Next-Generation DSRs - data handling<br />
Published 2014-04-07 (updated 2014-04-17)<br />
<br />
This post continues my look at the <a href="http://andrewg-crabtreeanalytics.blogspot.com/2014/03/the-next-generation-dsr.html">Next Generation DSR</a>. A DSR (Demand Signal Repository) holds data, typically Point of Sale data, and that data volume is big, not Google-search-engine big, but compared to a CPG's transaction systems, it's huge. Furthermore, the system is required to rapidly load large quantities of new data, clean it, tie it into known data dimensions and report against it in very limited time-frames.<br />
<br />
But scale and performance needs aside, why have most (though not all) CPGs chosen to buy rather than build the capability? After all, it is primarily a business-intelligence/database application and most businesses run a number of them. One key reason is that it's challenging to get business reporting performance at this data scale from existing technology.<br />
<br />
This post looks at how this problem gets solved today and how newer database technology can change that landscape.<br />
<br />
<a name='more'></a><br />
<div>
<h3>
Handling the data volume (1) Cube it</h3>
<div>
One existing approach to the problem is to use two databases: one to store the detailed granular data in relational form and another holding data "cubes" that contain pre-calculated summaries (aggregations) of the relational data. Most uses of the data involve working with summaries, so you can save users a lot of time by pre-calculating them.</div>
<div>
<br /></div>
<div>
Once built, analyzing data within a cube is fast but you do have to decide a number of things up-front to populate the cube.</div>
<div>
<ul>
<li>what aggregation levels do you need e.g.:</li>
<ul>
<li>county, state, region for store locations</li>
<li>brand, pack-type, category for product</li>
</ul>
<li>what facts do you want included e.g.</li>
<ul>
<li>pos $ and units for the sales cube</li>
<li>on hand and in-transit inventory and forecasts for the supply chain cube.</li>
</ul>
</ul>
<div>
The more data (and aggregation levels) you add to the cube the longer it will take to build; to take hours is normal, days is not unknown. Additionally, once a cube is built, it is essentially disconnected from any changes in the underlying database until it is next rebuilt. If your master data is assigning the wrong category to a product, fixing it won't help your reports until you rebuild that cube.<br />
<br /></div>
</div>
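<div>
The disconnect between a built cube and its source data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the product names, the categories, the sales rows and the <code>build_cube</code> helper are all hypothetical, and a real cube engine does far more than sum values into a dictionary.</div>

```python
from collections import defaultdict

# Hypothetical master data: product -> category (cola is wrongly categorized).
product_category = {"cola_12pk": "snacks", "chips_lg": "snacks"}

# Hypothetical detail rows: (product, state, pos_dollars).
sales = [
    ("cola_12pk", "TX", 120.0),
    ("cola_12pk", "OK", 80.0),
    ("chips_lg", "TX", 50.0),
]


def build_cube(sales_rows, categories):
    """Pre-calculate POS $ by (category, state) from the detail rows."""
    cube = defaultdict(float)
    for product, state, dollars in sales_rows:
        cube[(categories[product], state)] += dollars
    return dict(cube)


cube = build_cube(sales, product_category)

# Fix the master data: cola belongs in beverages.
product_category["cola_12pk"] = "beverages"

# The cube was built before the fix, so it knows nothing about it.
stale_value = cube.get(("beverages", "TX"), 0.0)

# Only a rebuild picks up the corrected category.
cube = build_cube(sales, product_category)
fresh_value = cube[("beverages", "TX")]
```

<div>
Here <code>stale_value</code> is read from the cube built before the master-data fix and <code>fresh_value</code> from the rebuilt one. With three rows the rebuild is instant; at DSR scale it is the hours-long step described above.</div>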
<div>
<h3>
Handling the data volume (2) Hyper-complex data models</h3>
</div>
<div>
Logically we can do everything we want in a standard relational database like SQL Server or Oracle. The data structures are not actually that complex: we need master/lookup tables for product, location and time and one fact table to store all the information collected for each product, location and time bucket (POS sales, inventory, store receipts etc.). That's just 4 tables. Yes, we could get more complex by adding other data sources with additional dimensions, but it would still be a simple structure. Build this in your favorite relational SQL database and it will work, but it is <i>most definitely not fast</i>. </div>
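<div>
For readers who prefer to see it, here is a minimal sketch of that 4-table structure using Python's built-in <code>sqlite3</code> module. The table names, column names and sample rows are my own assumptions for illustration, not the schema of any particular DSR.</div>

```python
import sqlite3

# In-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")

# Three master/lookup tables plus one fact table (all names are hypothetical).
conn.executescript("""
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, brand TEXT, category TEXT);
CREATE TABLE location (store_id   INTEGER PRIMARY KEY, state TEXT, region TEXT);
CREATE TABLE period   (week_id    INTEGER PRIMARY KEY, week_start TEXT);
CREATE TABLE fact (
    product_id    INTEGER REFERENCES product(product_id),
    store_id      INTEGER REFERENCES location(store_id),
    week_id       INTEGER REFERENCES period(week_id),
    pos_dollars   REAL,
    pos_units     INTEGER,
    on_hand_units INTEGER,
    PRIMARY KEY (product_id, store_id, week_id)
);
""")

conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "BrandA", "Beverages"), (2, "BrandB", "Snacks")])
conn.executemany("INSERT INTO location VALUES (?, ?, ?)",
                 [(10, "TX", "South"), (11, "OK", "South")])
conn.executemany("INSERT INTO period VALUES (?, ?)", [(1, "2014-03-31")])
conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 10, 1, 120.0, 40, 15),
    (1, 11, 1, 80.0, 25, 9),
    (2, 10, 1, 50.0, 20, 30),
])

# Ad-hoc aggregation straight from the detail: POS $ and units by brand and region.
rows = conn.execute("""
    SELECT p.brand, l.region, SUM(f.pos_dollars), SUM(f.pos_units)
    FROM fact f
    JOIN product  p ON p.product_id = f.product_id
    JOIN location l ON l.store_id   = f.store_id
    GROUP BY p.brand, l.region
    ORDER BY p.brand
""").fetchall()
```

<div>
Because the report is computed from the detail rows at query time, a change to the <code>product</code> table shows up in the very next query. The catch is the one noted above: on a conventional row-based database this query pattern gets slow at DSR data volumes.</div>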
<div>
<br /></div>
<div>
To get speed in these systems, developers have created some very complex, novel but nonetheless effective data-models. (Complex enough that an unwary developer taking their first look inside could be forgiven for a <strike>little</strike> lot of bad language.)</div>
<div>
<br /></div>
<div>
These data structures enable rapid reporting with no intermediary steps, no aggregations, no cubes. Once the data is loaded it is ready to go. Re-load some POS data or change a product category and it is immediately reflected in the next report. Now that is <i>very</i> cool, and for analytic or reporting projects where you need ad-hoc aggregation against product groups that did not exist this morning, and were not 100% correct until the 5th iteration sometime this afternoon, a <i>very important feature</i>. </div>
<div>
<br /></div>
<div>
The complexity of the data model comes at a price though. </div>
<div>
<ul>
<li><i>You will probably only ever use the Business Intelligence tool supplied with the DSR</i>. This tool has been extensively configured, customized, or even written, to handle the complex nature of the data structure it sits upon. Putting another tool on top is a huge investment and would most likely need an additional, simpler database, either (slow) relational or (slow to build) cubes, that would be populated from the main DSR occasionally but otherwise disconnected from the data source and subsequent changes. That rather defeats the point, doesn't it?</li>
<li>These models spread data across a multitude of tables in the database. Not a big problem for most reporting which aggregates each fact table to the desired level (e.g. brand by country) then stitches together the relatively small result sets for a human-readable report. For predictive analytics however, we want the lowest level of data and need all of the facts in the same table before we can start modeling. Sadly, the database just doesn't store it that way, so every analytic project starts with a complex data-manipulation project. </li>
</ul>
<div>
<h3>
Handling the data volume (3) Next Generation</h3>
</div>
</div>
</div>
<div>
<b>Database technology is evolving rapidly and I believe we are at the point that it can now provide good performance with no pre-aggregation of data, no cubes and a data-model that is easily understood so you can bring your own Business Intelligence tools or analytic apps to bear on it.</b></div>
<div>
<br /></div>
<div>
I'm an analyst not a database expert so I would not want to put too much money on which of the competing approaches will win out longer term but I think the key words to follow here are "columnar", "massively parallel", "in memory" and maybe, perhaps, possibly..."Hadoop".</div>
<div>
<br /></div>
<div>
<b>Columnar databases </b>change the way that data is stored in the database: values are stored column by column rather than row by row. This makes them relatively slow for transaction updates but dramatically faster for report-style aggregations even with a simple data-model. (See my previous post <span style="color: #d52a33; font-family: Georgia, Utopia, 'Palatino Linotype', Palatino, serif; font-size: x-small;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2013/02/the-right-tools-for-structured-big-data.html" style="color: #d52a33; font-family: Georgia, Utopia, 'Palatino Linotype', Palatino, serif;">here</a></span> for an example.)</div>
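<div>
A toy Python sketch of the idea, assuming a hypothetical fact table with five fields: the same records are held once as rows and once as columns, and a report-style total is taken from each. Real columnar engines add compression and much more, so treat this as a picture of the layout only.</div>

```python
# Hypothetical fact rows: (product_id, store_id, week_id, pos_dollars, pos_units).
FIELDS = ("product_id", "store_id", "week_id", "pos_dollars", "pos_units")

row_store = [
    (1, 10, 1, 120.0, 40),
    (1, 11, 1, 80.0, 25),
    (2, 10, 1, 50.0, 20),
]


def to_column_store(rows, fields):
    """Pivot row tuples into one list of values per field."""
    return {name: [row[i] for row in rows] for i, name in enumerate(fields)}


column_store = to_column_store(row_store, FIELDS)

# Row layout: every complete record is visited just to read one field.
dollars_index = FIELDS.index("pos_dollars")
total_from_rows = sum(row[dollars_index] for row in row_store)

# Column layout: only the pos_dollars values are read.
total_from_columns = sum(column_store["pos_dollars"])
```

<div>
Both totals agree, but the column layout reads one list of values while the row layout walks through every field of every record. The flip side is the update cost: adding one new record means appending to every column.</div>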
<div>
<br /></div>
<div>
Existing systems typically run on a single server. Want it to run faster? Then you need to buy a bigger server. MPP (<b>Massively Parallel Processing</b>) systems use clusters of hardware, dividing up the work across multiple, relatively cheap, servers (nodes). If you need more performance add more nodes to the cluster. Do this with a cloud-based service and you can flex the number of nodes in your cluster to meet processing demand: double-up as needed for your data-load or your predictive-model run.</div>
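<div>
The divide-and-combine pattern behind MPP can be sketched as follows. Threads stand in for nodes here, and the sample rows, the choice to distribute by store and the function names are all assumptions made for illustration; a real MPP database handles distribution, networking and failures for you.</div>

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical detail rows: (brand, store_id, pos_dollars).
sales = [
    ("BrandA", 10, 120.0),
    ("BrandA", 11, 80.0),
    ("BrandB", 10, 50.0),
]


def partition(rows, n_nodes):
    """Spread rows across nodes, here simply by store_id modulo node count."""
    shards = [[] for _ in range(n_nodes)]
    for row in rows:
        shards[row[1] % n_nodes].append(row)
    return shards


def node_aggregate(shard):
    """Work done independently on one node: partial POS $ by brand."""
    partial = Counter()
    for brand, _store_id, dollars in shard:
        partial[brand] += dollars
    return partial


def cluster_aggregate(rows, n_nodes):
    """Run the per-node work in parallel, then merge the partial results."""
    shards = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(node_aggregate, shards))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values key by key
    return dict(total)


two_nodes = cluster_aggregate(sales, 2)
four_nodes = cluster_aggregate(sales, 4)
```

<div>
The answer is the same whether the work is split 2 ways or 4; only the amount of work per node changes, which is why adding nodes can buy performance.</div>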
<div>
<br /></div>
<div>
<b>In memory</b> databases deliver speed increases by pulling the data off disk storage and loading the whole thing into memory (and accessing data in memory is certainly much faster than reading it off disk). I've not tried one of these yet and I would be interested to hear comments from those that have. It sounds good but I don't think the price-point is yet where I could justify the use. 10TB of RAM is certainly much cheaper than it was 10 years ago, but my gut-feel is that the economics will suggest a hybrid RAM and disk/SSD model for some time to come. There is a thoughtful blog post on SQL Server's new in-memory offering, including a few limitations, <a href="http://www.enterprisetech.com/2014/03/18/microsoft-turbocharges-transactions-hekaton-memory/" target="_blank">here</a>.</div>
<div>
<br />
Finally, let's talk <b>Hadoop. </b>I know it's "sexy" and often appears in the same sentence as "Big-Data" but I'm not yet convinced that it's appropriate for this use, where we want rapid response on a very large number of typically small and ad-hoc queries. I could be wrong though: a friend and colleague that I respect has recently moved to Cloudera after a lifetime of SQL/Oracle and is very excited about the possibilities using Hadoop/HBase/Impala. Looking at these benchmark <a href="https://amplab.cs.berkeley.edu/benchmark/" target="_blank">results</a> comparing a number of Hadoop-based systems to Redshift (columnar, MPP), he may well have a point. I will try to keep an open mind.<br />
<br />
Are there other options? You bet! A number I have deliberately ignored and, I'm certain, plenty out there I have not heard of, but this set will do the job and if another technology will do it even better now or in 5 years' time, great! <b>The bottom line is that database speed and storage capability are growing faster than the amount of data you want in your DSR. </b> We need to take advantage of it.<br />
<br />
<h3>
So what does this get us?</h3>
<div>
Using database technology to increase speed and to get a simpler data structure is a big win. Simpler, faster systems come with less maintenance, lower learning curves, more productivity and, I strongly believe, the capability for better insights. Slow response times are an "insight killer" (more on this in an upcoming blog post).</div>
<br />
The simpler data structure means that it's relatively easy to swap out the front-end for the BI, analytics or visualization tool of your choice. Want that data in Business Objects or Tableau? No problem! Connect from R/SPSS/SAS/RapidMiner? Absolutely! <br />
<br />
<h3>
What does this mean for DSR vendors?</h3>
<ul>
<li>The ability to handle DSR-sized data volume is no longer a competitive advantage.</li>
<li>If it's easy to set up any new BI, visualization or analytic tool against the database, providing the "best" user interface is of limited value.</li>
<li>Rapidly loading new data is important</li>
<li>Providing clean data is important (and often overlooked)</li>
<li>Helping users navigate the data ocean to find the things that must be done via process specific exceptions and workflow is important.</li>
<li>Helping users drive better decisions by embedding the analytics against the right data in real-time... now that's really important.</li>
</ul>
</div>
<div>
<br /></div>
Posted by <a href="http://www.blogger.com/profile/02721554488813333205">Andrew Gibson</a><br />
<br />
The Next-Generation DSR<br />
Published 2014-03-31 (updated 2014-04-14)<br />
<br />
CPGs have had access to Point of Sale (POS) data now for many years and many of them use a Demand Signal Repository (<a href="http://andrewg-crabtreeanalytics.blogspot.com/search/label/DSR" target="_blank">DSR</a>) to gather, clean and report on this data. <i>(Actually most of them use a number of DSR's and even when they do have just one, still can't handle truly cross-retailer analytics). </i><br />
<br />
I've been involved with a number of these systems as a software-buyer, a system-administrator, consultant and most recently, leading the analytic development at <a href="http://www.orchestro.com/" target="_blank">Orchestro</a>. <br />
<br />
There are some excellent tools available and, in their current form, they can help you drive both additional revenue and reduced costs when used well. However, in my experience many of these tools have been sold under the guise of "saving time" through reporting automation. That's valuable, but it's not <i>"finding a new sales opportunity"</i> valuable.<br />
<br />
I think we are still in the infancy of DSR development: systems are operating at the limits of the technology they were built on and necessary trade-offs mean that being good at one thing (e.g. speed) makes it more challenging to be good at others (e.g. analytics).<br />
<br />
The next generation of DSR can be dramatically more effective. In particular, it will be:<br />
<a name='more'></a><br />
<ul>
<li>much faster while handling much more data</li>
<li>much easier to use</li>
<li>easy to load with CPG's data</li>
<li>easily integrated with additional data feeds (weather, economic time-series, google-trends, twitter feeds, geo-demographic data)</li>
<li>truly cross-retailer</li>
<li>easily integrated with your chosen BI, visualization and ad-hoc analytics tools.</li>
<li>able to couple rapid data-handling with effective predictive analytics to drive discovery, insight and <b>better decisions.</b></li>
</ul>
<div>
<i>I'm not going to tell you that these ideas are new </i>(or mine). This list sets a very high standard and, against it, DSRs have consistently under-delivered.</div>
<div>
<br /></div>
<div>
<i>What I am saying, </i>is that the technology now exists to deliver on the promise. <br />
<br />
<b>Over this upcoming series of posts we'll look at developments in database technology, analytics and visualization that will enable DSR 2.0. (Or should that be DSR 3.0?). Sign up for the blog feed and make sure you don't miss it.</b></div>
Posted by <a href="http://www.blogger.com/profile/02721554488813333205">Andrew Gibson</a><br />
<br />
Back to blogging on "Better Business Analytics"<br />
Published 2014-03-28 (updated 2014-03-28)<br />
<br />
It's been quite a while, just over 12 months in fact since my last blog post. In that time, I've been hard at work developing analytic applications for the <a href="http://www.orchestro.com/">Orchestro</a> DSR. (Orchestro's off-shelf alerting tool is especially cool and something I am very proud of contributing to). I enjoyed my time at Orchestro, they're a good team and have big plans, but one key thing I found out about myself is that I prefer working real-life problems to developing software for someone else to have all the fun :-) <br />
<br />
So, I'm now back full-time on consulting and I will occasionally blog on topics of interest to me. Expect to see more soon on:<br />
<br />
<ul>
<li>Next-generation DSRs (Demand Signal Repositories)</li>
<li>Retail supply-chain analytics</li>
<li>Handling (BIG-ish) data for analytics</li>
<li>The right tools for the job (Predictive Analytics, Business Models, Optimization)</li>
<li>Some more thoughts on store-clustering</li>
<li>Inventory modeling at retail (and why it's different, again)</li>
<li>Order forecasting using POS data</li>
<li>Further thoughts on SNAP and other ignored demand drivers</li>
<li>and if there is something you would like to hear more on ... just drop me a line.</li>
</ul>
<div>
<br /></div>
<br />
<br />
Posted by <a href="http://www.blogger.com/profile/02721554488813333205">Andrew Gibson</a><br />
<br />
Business Analytics - finding the balance between complexity and readability<br />
Published 2013-02-21 (updated 2013-05-31)<br />
<div class="separator" style="clear: both; text-align: left;">
In this blog I try to present analytic material for a non-analytic audience. I focus on point of sale and supply chain analytics: it's a complex area and frankly, it's far too easy, whether writing for a blog or presenting to a management team, to slip into the same language I would use with an expert. </div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
So, I was inspired by a recent post on Nathan Yau's excellent blog <a href="http://flowingdata.com/" target="_blank">FlowingData</a> to look at the "readability" of my own posts and apply some simple analytics to the results.</div>
<a name='more'></a><br />
<div class="separator" style="clear: both; text-align: left;">
I've followed Nathan's blog for a couple of years now for the many and varied examples of data-visualization he builds and gathers from other sources. One that particularly caught my eye was this one published by the Guardian just before the recent State of the Union address in the United States (click to enlarge).</div>
<blockquote class="tr_bq">
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7M-zCGklc5FKxmmC209D9XN6Cwy4bDWanRyNJFXPNZ1_WxCNuSRMMZIjvb-6bIn7r3C-4f1k46N_-4WXcannIeIUp80xBxV2zsdx1MnshYXbSSoztWImrC-vgQjcgRxY8bKBZOBD87Bc/s1600/GuardianAnalysis.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="349" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7M-zCGklc5FKxmmC209D9XN6Cwy4bDWanRyNJFXPNZ1_WxCNuSRMMZIjvb-6bIn7r3C-4f1k46N_-4WXcannIeIUp80xBxV2zsdx1MnshYXbSSoztWImrC-vgQjcgRxY8bKBZOBD87Bc/s640/GuardianAnalysis.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: justify;"><em style="background-color: white; font-family: Arial, sans-serif; font-size: 13.333333015441895px; line-height: 19.79166603088379px; text-align: start;">The Guardian</em><span style="background-color: white; font-family: Arial, sans-serif; font-size: 13.333333015441895px; line-height: 19.79166603088379px; text-align: start;"> </span><a href="http://www.guardian.co.uk/world/interactive/2013/feb/12/state-of-the-union-reading-level" style="background-color: white; color: #821122; font-family: Arial, sans-serif; font-size: 13.333333015441895px; line-height: 19.79166603088379px; text-align: start;" target="_blank">plotted the Flesch-Kincaid grade levels for past addresses</a><span style="background-color: white; font-family: Arial, sans-serif; font-size: 13.333333015441895px; line-height: 19.79166603088379px; text-align: start;">. Each circle represents a state of the union and is sized by the number of words used. Color is used to provide separation between presidents. For example, Obama's state of the union last year was around the eighth-grade level, and in contrast, James Madison's 1815 address had a reading level of 25.3.</span><span style="font-size: xx-small; text-align: start;"> </span></td></tr>
</tbody></table>
</blockquote>
Neither the original post nor Nathan's goes into much detail around why the linguistic standard has declined. Within this period, the nature of the address and the intended audience have certainly changed. Frankly, having scanned a few of the earlier addresses I think we can all be grateful not to be on the receiving end of one of them.<br />
<br />
So, <b>I was inspired to find out the reading level of my own blog</b>. It's intended to present analytic concepts to a non-analytic audience. I can probably go a little higher than recent presidential addresses (8th-10th grades, roughly ages 13-15) but I don't want to be writing college-level material either.<br />
<br />
All the books my kids read are graded in this (or a very similar) way but I had never thought about how such a grading system could be constructed. The <a href="http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_test" target="_blank">Flesch-Kincaid</a> grade level estimate is based on a simple formula:<br />
<br />
<div style="text-align: center;">
<img alt="
0.39 \left ( \frac{\mbox{total words}}{\mbox{total sentences}} \right ) + 11.8 \left ( \frac{\mbox{total syllables}}{\mbox{total words}} \right ) - 15.59
" src="http://upload.wikimedia.org/math/a/3/a/a3a80e6e52fda2b5f7647a451c9c6c13.png" /></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
That's just a linear combination of: </div>
<div style="text-align: left;">
</div>
<ul>
<li>average words per sentence;</li>
<li>average syllables per word</li>
<li>a constant term.</li>
</ul>
<div>
In fact (though I have not yet found details of how it was constructed) it looks to be the result of a regression model. (Simple) data science in action from the 1970s.</div>
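<div>
To make the formula concrete, here is a minimal Python sketch of the calculation. The 0.39, 11.8 and 15.59 coefficients come straight from the formula above; everything else is my own assumption. In particular the syllable counter (count runs of vowels) and the sentence splitter (split on ., ! and ?) are crude heuristics, so expect scores to differ somewhat from any online utility.</div>

```python
import re


def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per run of vowels, minimum 1."""
    vowel_runs = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_runs))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level from words/sentence and syllables/word."""
    # Naive sentence split on ., ! and ? (abbreviations will fool it).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, allowing internal hyphens and apostrophes.
    words = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(word) for word in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


short_grade = flesch_kincaid_grade("The cat sat. The dog ran.")
long_grade = flesch_kincaid_grade("The cat sat on the mat and the dog ran.")
```

<div>
Both samples use only one-syllable words, so the second scores higher purely because the same kind of words are packed into one longer sentence.</div>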
<div>
<br /></div>
<div>
<b>Note that Flesch-Kincaid says nothing about the length of the book or the nature of the vocabulary it's all down to long sentences and the presence of multi-syllabic words.</b> <br />
<br />
(BTW - the preceding sentence has a Flesch-Kincaid grade score of <span style="font-family: sans-serif; text-align: -webkit-right;">13.63,</span> calculated with this online <a href="http://www.online-utility.org/english/readability_test_and_improve.jsp" target="_blank">utility</a>). Now that's pretty high, worthy of an early 1900s president and (supposedly) understandable by young college students. The sentence is longer than typical: 31 words vs. my average of 18 (see below), and words like "vocabulary", "sentences" and "multi-syllabic" are not helping me either.</div>
<div>
<br /></div>
<h3>
Approach</h3>
<div>
I could have used copy/paste into the online utility I used above, recorded the results in a spreadsheet and pulled some stats from that. That would work, but if I ever want to repeat the exercise or modify it, perhaps to use a different readability index, I would have to do all that work again. At the time of writing, there are 44 published posts on this blog - there must be a better way.</div>
<div>
<br /></div>
<div>
Actually there are probably many better ways but as I also wanted to flex some <a href="http://en.wikipedia.org/wiki/R_(programming_language)" target="_blank">R</a>-programming muscle I built a web-scraper in R to do the work for me and analyze the results (more on this in a later post).<br />
<br /></div>
<h3>
Results</h3>
<div>
Let's start with some simple summaries of the results I collected.</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUj463-J_hdrFtBvOhAJFKC6H3vtCfLTdt0w8OsApDzQphcpBDPdkqKMXeG4xcrV1_gBOI7BzwvY6j9JJdwuEkTWkngb2lS7c1wUrOoGbvLg0YmRLwzE8NZsuYJWRS8q64_QqQ3aTowLM/s1600/histograms.png" imageanchor="1" style="margin-left: auto; margin-right: auto; text-align: center;"><img border="0" height="539" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUj463-J_hdrFtBvOhAJFKC6H3vtCfLTdt0w8OsApDzQphcpBDPdkqKMXeG4xcrV1_gBOI7BzwvY6j9JJdwuEkTWkngb2lS7c1wUrOoGbvLg0YmRLwzE8NZsuYJWRS8q64_QqQ3aTowLM/s640/histograms.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><div style="text-align: justify;">
<b>Histograms showing the % of posts from this blog (prior to 2/14/13)</b>, the average (mean) value shown in red. <span style="font-size: xx-small; text-align: start;">There is some variety in the grade reading level indicated by Flesch-Kincaid for my blog posts, averaging around 10 but ranging from 7 through 14. I average about 750 words, but occasionally go much longer and have a number of very short "announcement" style posts. Average words per sentence of 18.</span><br />
<span style="font-size: xx-small; text-align: start;"><br /></span></div>
</td></tr>
</tbody></table>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
OK, so now I know, but is that good? I don't know that I have a definitive source but according to at least one <a href="http://www.kerryr.net/webwriting/tools_readability.htm" target="_blank">source</a> the target range on Flesch-Kincaid for Technical or Industry readers is 7-12, so I'm feeling pretty good about that.</div>
<div>
<br /></div>
<div>
I did wonder whether there was any other, hidden, structure to the data though. I know the equation is based on words per sentence and syllables per word so there is no point looking at those: obviously I'll find a relationship. But is my writing style influenced by anything else?</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmdb56GrJsX07N5tcMBSX0c93uzqnU3H_QDeZ7iYuWYkx8wjidHtXDhD2iEtLmekv5mrfG55yas5wWHehnOGCQZ86M0aB-7NkvZwkztMOH0PAGqQObchk1dwqLVLeli45-IK_MP9VAJbM/s1600/gradeLevelVWords.png" imageanchor="1" style="margin-left: auto; margin-right: auto; text-align: center;"><img border="0" height="515" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmdb56GrJsX07N5tcMBSX0c93uzqnU3H_QDeZ7iYuWYkx8wjidHtXDhD2iEtLmekv5mrfG55yas5wWHehnOGCQZ86M0aB-7NkvZwkztMOH0PAGqQObchk1dwqLVLeli45-IK_MP9VAJbM/s640/gradeLevelVWords.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><div style="text-align: left;">
<b><span style="font-family: inherit;">Flesch-Kincaid grade level vs. the number of words by post on this blog. </span></b> <span style="font-size: xx-small; text-align: start;">Other than<span style="font-family: inherit;"> a h</span>andful of long posts that rate lower in the range 8-10, I don't see much going on here.</span></div>
</td></tr>
</tbody></table>
<div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCYXyCFxWG8Xlhy3Kd14Ev1bTFmV4dXl0b0H0C6jEbe_MouR-TrmUEb93EePiX79jX8g0HtlNDxZR6IwageEFSVslcGsnjjMGmlFDa3M-RLnP7omYGiUoVTosinC-hWRFsq8JetBVzQ9c/s1600/gradeLevelVPublicationWithTrendAndLegend.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="515" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCYXyCFxWG8Xlhy3Kd14Ev1bTFmV4dXl0b0H0C6jEbe_MouR-TrmUEb93EePiX79jX8g0HtlNDxZR6IwageEFSVslcGsnjjMGmlFDa3M-RLnP7omYGiUoVTosinC-hWRFsq8JetBVzQ9c/s640/gradeLevelVPublicationWithTrendAndLegend.png" width="640" /></a></td></tr>
<tr><td class="tr-caption"><b>Flesch-Kincaid grade level vs. the publication date by post on this blog. </b> The size of each post (in words) is shown by the area of each point, color is used purely to help visually differentiate each of the points. Apart from a couple of recent "complex" posts this does seem to be showing a trend, so I added a regression line and labeled the more extreme posts. Point (b) is a very short "announcement" style post (you can hardly see the point at all) and I could probably ignore it completely. Point (e) is a more fun piece I did around using pie-charts that's probably not very representative of the general material either.<br />
<br /></td></tr>
</tbody></table>
</div>
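<div>
For anyone curious how a trend line like the one in the chart above is fitted, here is a minimal ordinary least-squares sketch in Python. The numbers are made up purely for illustration (they are not the data behind the chart), and <code>fit_line</code> is a hypothetical helper name; my own analysis was done in R.</div>

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustration: x = months since first post, y = grade level.
months = [0, 1, 2, 3]
grades = [12.0, 11.0, 10.0, 9.0]
slope, intercept = fit_line(months, grades)
```

<div>
A negative slope means the grade level is falling over time. With real posts the points scatter around the line, so it is worth checking how much a few extreme posts are driving the fit.</div>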
<div style="text-align: left;">
</div>
<div>
<br />
<br />
<br />
<br />
<br />
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br />
<br />
<div>
<div>
If you want to compare readability for yourself, here are the top (and bottom) posts ranked by Flesch-Kincaid grade level:<br />
<div style="text-align: center;">
<b><br /></b></div>
<table border="0" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; margin-left: 4.65pt; width: 613px;"><tbody>
<tr style="height: 39pt;"><td style="border: 1pt solid windowtext; height: 39pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="font-size: 10pt;">Rank</span></b></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: solid solid solid none; border-top-color: windowtext; border-top-width: 1pt; height: 39pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
<b><span style="font-size: 10pt;">Post</span></b></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: solid solid solid none; border-top-color: windowtext; border-top-width: 1pt; height: 39pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
<b><span style="font-size: 10pt;">Flesch-Kincaid grade level</span></b></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: solid solid solid none; border-top-color: windowtext; border-top-width: 1pt; height: 39pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
<b><span style="font-size: 10pt;">words</span></b></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: solid solid solid none; border-top-color: windowtext; border-top-width: 1pt; height: 39pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="center" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: center;">
<b><span style="font-size: 10pt;">sentences</span></b></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">1</span></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/05/analytic-tools-so-easy-10-year-old-can.html"><span style="color: blue;">Analytic tools "so easy a 10 year-old can use it" </span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">13.3</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">784</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">33</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">2</span></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2013/01/point-of-sale-analytics-newsletter.html"><span style="color: blue;">Point of Sale Analytics - newsletter released</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">13.1</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">82</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">4</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">3</span></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/06/point-of-sale-data-category-analytics.html"><span style="color: blue;">Point of Sale Data – Category Analytics</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">12.8</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">676</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">29</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">4</span></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/10/how-to-save-real-money-in-truckload.html"><span style="color: blue;">How to save real money in truckload freight (Part I)</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">12.8</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">723</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">31</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">5</span></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/04/primary-analytics-practitioner.html"><span style="color: blue;">The Primary Analytics Practitioner</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">12.7</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">541</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">29</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">6</span></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/04/reporting-is-not-analysis.html"><span style="color: blue;">Reporting is NOT Analytics</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">12.4</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">891</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">43</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">7</span></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/06/point-of-sale-data-sales-analytics.html"><span style="color: blue;">Point of Sale Data – Sales Analytics</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">12.1</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">478</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">24</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">8</span></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/03/right-tool-for-job.html"><span style="color: blue;">Data handling - the right tool for the job</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">11.9</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">762</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">38</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">9</span></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/03/data-cleansing-boring-painful-tedious.html"><span style="color: blue;">Data Cleansing: boring, painful, tedious and very, very important</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">11.8</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">297</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">16</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">10</span></div>
</td><td style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/06/point-of-sale-data-supply-chain.html"><span style="color: blue;">Point of Sale Data – Supply Chain Analytics</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">11.6</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">958</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">41</span></div>
</td></tr>
<tr style="height: 9pt;"><td nowrap="" style="height: 9pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"></td><td nowrap="" style="height: 9pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"></td><td nowrap="" style="height: 9pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"></td><td nowrap="" style="height: 9pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"></td><td nowrap="" style="height: 9pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"></td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border: 1pt solid windowtext; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">35</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: solid solid solid none; border-top-color: windowtext; border-top-width: 1pt; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2013/02/the-right-tools-for-big-structured-data.html"><span style="color: blue;">The right tools for (structured) BIG DATA handling</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: solid solid solid none; border-top-color: windowtext; border-top-width: 1pt; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">9.0</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: solid solid solid none; border-top-color: windowtext; border-top-width: 1pt; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">1878</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: solid solid solid none; border-top-color: windowtext; border-top-width: 1pt; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">114</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">36</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/10/better-weekly-point-of-sale-reports.html"><span style="color: blue;">Better Point of Sale Reports with "Variance Analysis": Velocity...</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">8.9</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">1264</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">78</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">37</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/12/better-point-of-sale-reports-with.html"><span style="color: blue;">Better Point of Sale Reports with Variance Analysis (update)</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">8.5</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">177</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">10</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">38</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/11/better-business-reporting-in-excel.html"><span style="color: blue;">Better Business Reporting in Excel - XLReportGrids 1.0 released</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">8.4</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">70</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">5</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">39</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/08/what-drives-your-sales-snap.html"><span style="color: blue;">What's driving your Sales? SNAP?</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">8.3</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">651</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">42</span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">40<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/10/do-you-really-need-daily-point-of-sale.html"><span style="color: blue;">Do you need daily Point of Sale data?...</span></a><o:p></o:p></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;"> 8.2<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">1395<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">83<o:p></o:p></span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">41<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/11/snap-analytics-1-funding-and-spikes.html"><span style="color: blue;">SNAP Analytics (1) - Funding and spikes.</span></a><o:p></o:p></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;"> 8.1<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">531<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">32<o:p></o:p></span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">42<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/11/snap-analytics-2-purchase-patterns.html"><span style="color: blue;">SNAP Analytics (2) - Purchase Patterns</span></a><o:p></o:p></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;"> 7.9<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">773<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">44<o:p></o:p></span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">43<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2013/02/business-analytics-right-tool-for-job.html"><span style="color: blue;">Business Analytics - The Right Tool For The Job</span></a><o:p></o:p></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;"> 7.6<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">483<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">36<o:p></o:p></span></div>
</td></tr>
<tr style="height: 15pt;"><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-left-color: windowtext; border-left-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid; height: 15pt; padding: 0in 5.4pt; width: 31.55pt;" valign="bottom" width="42"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-size: 10pt;">44<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 284.2pt;" valign="bottom" width="379"><div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<u><span style="color: blue; font-size: 10pt;"><a href="http://andrewg-crabtreeanalytics.blogspot.com/2012/05/are-pie-charts-truly-evil-or-just.html"><span style="color: blue;">Are pie charts truly evil or just misunderstood ?</span></a></span></u></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 49.5pt;" valign="bottom" width="66"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;"> 7.1</span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 40.5pt;" valign="bottom" width="54"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">1097<o:p></o:p></span></div>
</td><td nowrap="" style="border-bottom-color: windowtext; border-bottom-width: 1pt; border-right-color: windowtext; border-right-width: 1pt; border-style: none solid solid none; height: 15pt; padding: 0in 5.4pt; width: 0.75in;" valign="bottom" width="72"><div align="right" class="MsoNormal" style="margin-bottom: 0.0001pt; text-align: right;">
<span style="font-size: 10pt;">70</span></div>
</td></tr>
</tbody></table>
</div>
<h3>
Conclusions</h3>
<b>It appears that my material is (largely) written at a level that should be accessible to the reader. </b> And I am using more readable language in recent posts, which sounds like a good thing. <br />
<br />
But there remains a key question for me that these stats can't really answer.<b> Am I getting better at explaining the </b><b>complex (my goal) or just explaining simpler things? What do you think?</b><br />
<b><br /></b>
<b>In case you are wondering, this post has a Flesch-Kincaid grade level of about 8. So if you can follow the "State of the Union" address you should have been just fine with this.</b><br />
<br /></div>
<div>
<br /></div>
<div>
</div>
</div>
Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com2tag:blogger.com,1999:blog-3663282854138469774.post-79544553571933658522013-02-18T10:05:00.000-08:002013-05-31T09:59:12.728-07:00The right tools for (structured) BIG DATA handling (update)A couple of weeks ago, I ran a somewhat rough benchmark to show just how much faster large database queries can run if you use better tools. <br />
<blockquote class="tr_bq">
<a href="http://andrewg-crabtreeanalytics.blogspot.com/2013/02/the-right-tools-for-big-structured-data.html" target="_blank">The right tools for (structured) BIG DATA handling</a><span style="background-color: #fcfbf5; color: #333333; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 12.727272033691406px; line-height: 16.363636016845703px;"> Here's the scenario: you are a business analyst charged with providing reporting and basic analytics on more data than you know how to handle - and you need to do it without the combined resources of your IT department being placed at your disposal. Sounds familiar?</span></blockquote>
I looked at the value of upgrading hard-drives (to make sure the CPU is actually busy) and the benefit of using columnar storage, which lets the database pull back data in larger chunks and with fewer trips to the hard-drive. The results were... staggering: a combined 4100% increase in processing speed, so that I could read and aggregate 10 facts from a base table with over 40 million records on my laptop in just 37 seconds.<br />
<br />
At the time I promised an update on a significantly larger data-set to see whether the original results scaled well. I also wanted to see whether query times scaled well to fewer facts. Ideally, querying against 5 facts should take about 50% of the time of the original 10-fact aggregation query.<br />
<br />
<a name='more'></a><br />
<br />
<h3>
Test environment</h3>
My test environment remains the same: a mid-range laptop with a quad-core AMD CPU and 8 GB of RAM, running Windows 7 (64-bit), with a relatively cheap (under $400), fast solid-state drive.<br />
<br />
<b>This time, though, I increased the data quantity 10-fold to 416 million records.</b><br />
<br />
Then I ran the same aggregation SQL to pull back summaries of 10 facts from this table.<br />
<blockquote class="tr_bq">
<span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.63636302947998px; line-height: 16.363636016845703px;">SELECT Item.Category, Period.Year, SUM(POSFacts.Fact1) AS Fact1, SUM(POSFacts.Fact2) AS Fact2, SUM(POSFacts.Fact3) AS Fact3, SUM(POSFacts.Fact4) AS Fact4, SUM(POSFacts.Fact5) AS Fact5, SUM(POSFacts.Fact6) AS Fact6, SUM(POSFacts.Fact7) AS Fact7, SUM(POSFacts.Fact8) AS Fact8, SUM(POSFacts.Fact9) AS Fact9, SUM(POSFacts.Fact10) AS Fact10 </span><span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.63636302947998px; line-height: 16.363636016845703px;">FROM Item INNER JOIN POSFacts ON Item.ItemID = POSFacts.ItemID INNER JOIN Period ON POSFacts.PeriodID = Period.PeriodID </span><span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px; line-height: 16.363636016845703px;">GROUP BY Item.Category, Period.Year</span></blockquote>
<div>
<span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px; line-height: 16.363636016845703px;"><br /></span></div>
<div>
<span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px; line-height: 16.363636016845703px;">I repeated this timed exercise 5 times for:</span></div>
<div>
<ul>
<li>standard (row-based) SQL Server 2012</li>
<li>SQL Server 2012 with the ColumnStoreIndex applied</li>
<li>InfiniDB (a purpose-built column-store database)</li>
</ul>
</div>
<div>
<span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px; line-height: 16.363636016845703px;"><br /></span></div>
<div>
Finally I ran it again on each configuration but just summarizing for 1 fact:</div>
<blockquote class="tr_bq">
<span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.63636302947998px; line-height: 16.363636016845703px;">SELECT Item.Category, Period.Year, SUM(POSFacts.Fact1) </span><span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.63636302947998px; line-height: 16.363636016845703px;">FROM Item INNER JOIN POSFacts ON Item.ItemID = POSFacts.ItemID INNER JOIN Period ON POSFacts.PeriodID = Period.PeriodID </span><span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px; line-height: 16.363636016845703px;">GROUP BY Item.Category, Period.Year</span></blockquote>
<br />
<h3>
Results</h3>
<div>
Before we get to the query timing let's look at what was happening to my machine while queries were running. </div>
<div>
<br /></div>
<div>
The first screen-shot below (click to enlarge) was taken while running queries with base SQL-Server (no column store indexes). You can see that the CPU is just not busy. In fact it's averaging only 30% and that's with the solid-state disk installed. The drive is busy, but only serving up about 50MB/s. (I say "only" but of course that's much better than the old hard-drive.)</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTyBCOBsVGvH124BURYcOVlhcSOEzhn_wUavV8VDTM5auPHZuIstJXxJXTZOeH4gDNyxl7MN3eLkJkbaitmwqaYYCXSC61BW-FrG8KiWJ4ovebgSriQZ_ALWmHB-ZvH8dGrpsiTazlTMY/s1600/Resource+Monitor+during+base+sql+2012+run.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTyBCOBsVGvH124BURYcOVlhcSOEzhn_wUavV8VDTM5auPHZuIstJXxJXTZOeH4gDNyxl7MN3eLkJkbaitmwqaYYCXSC61BW-FrG8KiWJ4ovebgSriQZ_ALWmHB-ZvH8dGrpsiTazlTMY/s400/Resource+Monitor+during+base+sql+2012+run.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">System resources running base SQL Server query</td></tr>
</tbody></table>
<div>
<br /></div>
<div>
The next screenshot shows system resources while running a query with the ColumnStore Index applied. The CPU is now busy on average 77% of the time and peaking at 100% on occasion. The disk utilization chart may be misleading because it's now plotted on a much larger scale, but the same disk is now hitting 200MB/s. I think we can expect great things!</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2EhGBh9ajXJOQ8PUI5xBuLFvsQ9LWB7na_tva330OdhH4Wf1rwr6NcyYcyhlvaAim91zkgkQZoDJdH5mHed9LiJCaUrH5TiG7AVWkFLkbSg4_PUX5UmuQPdX533b6rISxC69WABtzScs/s1600/Resource+Monitor+during+colstoreindex+sql+2012+run.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="223" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2EhGBh9ajXJOQ8PUI5xBuLFvsQ9LWB7na_tva330OdhH4Wf1rwr6NcyYcyhlvaAim91zkgkQZoDJdH5mHed9LiJCaUrH5TiG7AVWkFLkbSg4_PUX5UmuQPdX533b6rISxC69WABtzScs/s400/Resource+Monitor+during+colstoreindex+sql+2012+run.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">System resources running SQL Server with the ColumnStore Index</td></tr>
</tbody></table>
<br />
<br />
So, on to the timed results. I ran each scenario 5 times and all results were very consistent, within +/-10% of the average.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqjoOQmSTRwQcvB8aOLjSeKR0eBXxM7_P2H-j7yKs6tppTZxCKvBKRqyv3RtJa3CFZ19q4xLiMaL5a0QddflQMfPZvuStmNaEuCqMeOed-P8L8Vn3XDeZ0CyD-17cyYhZkR2u5pMaGels/s1600/Results_chart.png" imageanchor="1" style="clear: left; display: inline !important; margin-bottom: 1em; margin-left: auto; margin-right: auto; text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqjoOQmSTRwQcvB8aOLjSeKR0eBXxM7_P2H-j7yKs6tppTZxCKvBKRqyv3RtJa3CFZ19q4xLiMaL5a0QddflQMfPZvuStmNaEuCqMeOed-P8L8Vn3XDeZ0CyD-17cyYhZkR2u5pMaGels/s1600/Results_chart.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Test results against 416 million records</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div style="margin-left: 1em; margin-right: 1em; text-align: left;">
</div>
<br />
Again, SQL Server 2012 with the Columnstore Index is the clear winner. Just 217 seconds to aggregate all 10 facts and, amazingly, just 21 seconds to aggregate 1 fact across the same 416 million records. InfiniDB takes over twice as long against 10 facts and does not scale nearly as well with the single-fact query. <br />
<br />
<br />
Now compare with the results we got last time to see how well each database scaled with the increase in data volume.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_MgKT7KLrGO-0dvZIbQsjVdhmlT10ORwE7OKqz5TMFOjiTITQXwjtIBNunfkrdujWNFoxy394K0-kuZVOWVlH1DnPqx0pRbjnfNXp1fEefcT5I4yLdoPMoxFeAG1NMEKOSk5q5HYBfyY/s1600/Benchmark_comparisons.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_MgKT7KLrGO-0dvZIbQsjVdhmlT10ORwE7OKqz5TMFOjiTITQXwjtIBNunfkrdujWNFoxy394K0-kuZVOWVlH1DnPqx0pRbjnfNXp1fEefcT5I4yLdoPMoxFeAG1NMEKOSk5q5HYBfyY/s640/Benchmark_comparisons.png" width="640" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
Data volume increased by a factor of 10 and:<br />
<br />
<ul>
<li>InfiniDB and base SQL Server both increased query time by a factor of about 10, roughly in proportion.</li>
<li>SQL Server with the ColumnStore index only increased by a factor of 5.9! </li>
</ul>
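The scaling factor for the ColumnStore index can be checked from the two 10-fact timings quoted in these posts (37 seconds before, 217 seconds now). The snippet below is only a back-of-the-envelope sketch; the variable names are mine, and it takes the "10-fold" data increase at face value.<br />

```python
# Timings quoted in the posts for SQL Server 2012 with the ColumnStore index.
DATA_GROWTH = 10        # "10-fold" increase in records, as stated in the post
secs_small = 37         # 10 facts, just over 40 million records (earlier post)
secs_large = 217        # 10 facts, 416 million records (this post)
secs_one_fact = 21      # 1 fact, 416 million records (this post)

time_growth = secs_large / secs_small      # how much longer the bigger run took
per_record = time_growth / DATA_GROWTH     # relative time spent per record
fact_ratio = secs_one_fact / secs_large    # 1-fact query as a share of the 10-fact query

print(f"query time grew {time_growth:.1f}x for {DATA_GROWTH}x the data")
print(f"time per record is {per_record:.0%} of what it was")
print(f"the 1-fact query takes {fact_ratio:.0%} of the 10-fact time")
```

So the time per record fell as the table grew, and the single-fact query took very nearly one tenth of the 10-fact time, which is the proportional scaling hoped for at the start of this post.<br />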
<br />
<br />
To be fair I am comparing the (free) community edition of InfiniDB against (decidedly not free) SQL Server and neither tool is really intended to be run on a laptop. But if you need rapid aggregation of data and do not have access to a cluster of commodity servers - it is clear that columnar storage helps you get that data out <u>fast</u>.<br />
<br />
The other thing you may want to consider is that it took me substantially less time to load the data into InfiniDB (sorry I did not time it but we're talking minutes not seconds), and building that ColumnStore index in SQL actually took longer than the base query ~ 4500 seconds. You may not want to go to this trouble if you just need a couple of quick aggregations.<br />
<br />
Remember also that the table with the ColumnStore index is read-only after the index is applied. Want to make some updates? That would be easier in InfiniDB.<br />
<br />
<h3>
Conclusions</h3>
<div>
Ultimately I'm not trying to sell you on either option, but if you have a lot of structured data to feed your analytic project, a columnar database may well be the way to go right now. </div>
<div>
<br /></div>
<br />
<br />
<br />
<br />
<br />Andrew Gibsonhttp://www.blogger.com/profile/02721554488813333205noreply@blogger.com0tag:blogger.com,1999:blog-3663282854138469774.post-48928129301387528502013-02-11T06:35:00.000-08:002013-05-31T09:59:25.669-07:00The right tools for (structured) BIG DATA handlingHere's the scenario: you are a business analyst charged with providing reporting and basic analytics on more data than you know how to handle - and you need to do it without the combined resources of your IT department being placed at your disposal. Sounds familiar?<br />
<br />
Let's use Point of Sale data as an example, as POS data can easily generate more data volume than the ERP system. The data is simple and easily organized in conventional relational database tables - you have a number of "facts" (sales-revenue, sales-units, inventory, etc.) defined by product, store and day going back a few years and then some additional information about products, stores and time stored in master ("dimension") tables.<br />
<br />
The problem is that you have thousands of stores, thousands of products and hundreds (if not thousands) of days - <b>this can very quickly feel like "big data". Use the right tools and my rough benchmarks suggest you can not only handle the data but see a <u>huge</u> increase in speed.</b><br />
<b></b><br />
<a name='more'></a><br />
Let's see just how big this data could be:<br />
<blockquote class="tr_bq">
<i>If, on each day, you collect 10 facts for 1,000 products at 1,000 stores, that would be 10 million facts every day (10 x 1,000 x 1,000). Look at it annually: that's 3.65 billion facts every year. </i></blockquote>
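The arithmetic is easy to re-run for your own numbers. Here is a minimal sketch in Python; the constant names are just illustrative, and it assumes one record per product, store and day with no gaps.<br />

```python
# Back-of-the-envelope sizing for the point-of-sale example.
FACTS_PER_RECORD = 10     # e.g. sales revenue, sales units, inventory, ...
PRODUCTS = 1_000
STORES = 1_000
DAYS_PER_YEAR = 365

# Assumes every product is recorded at every store on every day.
records_per_day = PRODUCTS * STORES
facts_per_day = FACTS_PER_RECORD * records_per_day
facts_per_year = facts_per_day * DAYS_PER_YEAR

print(f"{records_per_day:,} records per day")
print(f"{facts_per_day:,} facts per day")
print(f"{facts_per_year:,} facts per year")
```

Swap in your own product, store and fact counts to see how quickly the volume grows.<br />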
Is it big compared to an index of the world-wide-web? No, it's tiny, but in comparison to the data a business analyst normally encounters it's not just "big", it's "enormous". Just handling basic data manipulation (joins, filters, aggregation, etc.) is a problem. <b>Trying to handle this in desktop tools like Excel or Access is completely impossible</b>. <br />
<br />
As usual, there are better tools and worse tools - you must use a database, but even with a conventional server-based database like Microsoft's SQL Server, you may have problems with speed. <b>I wanted to see how speed is impacted, first by upgrading the hard-drive and second by using two varieties of column-store databases. </b><br />
<br />
A couple of relatively simple changes and bench-marking shows a<b> 4100% increase in speed. </b>If a 4100% increase does not indicate to you that there may be a better tool for the job, I don't know what will.<br />
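For anyone who prefers "times faster" to percentages, the conversion is simple. The sketch below assumes "increase in speed" means throughput (work done per second); the function name and the 60-minute job are hypothetical, chosen only to illustrate the conversion.<br />

```python
def speedup_factor(percent_increase: float) -> float:
    """Convert an 'N% increase in speed' into an 'X times as fast' multiplier."""
    return 1 + percent_increase / 100


factor = speedup_factor(4100)
print(f"A 4100% increase in speed means running {factor:.0f}x as fast")

# Hypothetical example: a job that used to take 60 minutes.
old_minutes = 60
new_minutes = old_minutes / factor
print(f"A {old_minutes}-minute job would finish in about {new_minutes:.1f} minutes")
```

In other words, 4100% faster means the same work gets done in roughly 1/42 of the time.<br />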
<div>
<br /></div>
Running analytics against this data (once it has been delivered from a tool that has joined, filtered and aggregated it appropriately) is another challenge that we will get to in a later post.<br />
<br />
<b>First a little disclosure:</b> I am first and foremost an analyst: my technologies of choice are statistics, mathematics, data-mining, predictive-modeling, operations-research,... NOT databases and NOT hardware-engineering. To feed my need for data I have become adept in a number of programming languages and relational database systems. I'm most comfortable in SQL Server just because I'm more familiar with that tool though I have used other databases too. Bottom line, I'm a lot better than "competent" but I am not "expert". <br />
<div>
<br />
<h3>
Test environment</h3>
I built a test database in SQL Server 2012 with 4 tables in a simple "star schema": 1 "fact" table with 10 facts per record and 3 associated "dimension" tables as follows:<br />
<div>
<span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px; margin-left: 1em; margin-right: 1em;">
<img alt="Inline image 1" class="" src="https://mail.google.com/mail/u/0/?ui=2&ik=d567623692&view=att&th=13afb9956a3d571d&attid=0.4&disp=emb&realattid=ii_13afb8a6d5218979&zw&atsh=1" /></div>
<br />
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
The data itself is junk I generated randomly in SQL Server, with appropriate keys and indexes defined. </div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br /></div>
<div style="background-color: white;">
<div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
This represents approximately 8 GB of data. Not enormous, and (as you will see later) perhaps not big enough to test one of the options fully, but big enough to get started and much bigger than many analysts ever see.</div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br /></div>
I'm testing this on a mid-range laptop, quad-core AMD CPU, with 8 GB of RAM running Windows 7 (64 bit) that cost substantially less than $1000 new. You probably have something very like it sat on your desk.</div>
</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br /></div>
<div style="background-color: white;">
I then wanted to see how long it would take to perform a simple aggregation. My test SQL (below) joins the fact table to both the product and period dimension tables then sums each fact (1 thru 10) for each year and brand. Not very exciting perhaps but a very common question<span style="color: #222222; font-family: arial, sans-serif; font-size: x-small;"> </span><i style="color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">"what did I sell by brand by year".</i></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse; width: 366px;"><tbody>
<tr height="20" style="height: 15.0pt;">
<td height="20" style="height: 15.0pt; width: 275pt;" width="366"><blockquote class="tr_bq">
SELECT
Item.Category, Period.Year, SUM(POSFacts.Fact1) AS Fact1, SUM(POSFacts.Fact2)
AS Fact2, SUM(POSFacts.Fact3) AS Fact3, SUM(POSFacts.Fact4) AS Fact4,
SUM(POSFacts.Fact5) AS Fact5, SUM(POSFacts.Fact6) AS Fact6,
SUM(POSFacts.Fact7) AS Fact7, SUM(POSFacts.Fact8) AS Fact8,
SUM(POSFacts.Fact9) AS Fact9, SUM(POSFacts.Fact10) AS Fact10 FROM Item INNER
JOIN POSFacts ON Item.ItemID = POSFacts.ItemID INNER JOIN Period ON
POSFacts.PeriodID = Period.PeriodID <span style="color: #222222; font-size: 13.333333969116211px;">GROUP BY Item.Category, Period.Year</span></blockquote>
</td></tr>
</tbody></table>
</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br /></div>
Each run was repeated 5 times and the elapsed time for each run averaged to get the results shown below. While there was variation in run times, this was typically within about 10% of the average for each test.<br />
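If you want to try the same "run it 5 times and average" approach yourself, here is a rough, self-contained sketch in Python. It is not my actual test setup: it uses a tiny in-memory SQLite database as a stand-in for SQL Server, the table and column names are taken from the query above, and everything else (row counts, helper names, the random data) is made up for illustration.<br />

```python
import random
import sqlite3
import statistics
import time

N_FACTS = 10
FACT_COLS = [f"Fact{i}" for i in range(1, N_FACTS + 1)]


def build_db(n_rows: int = 50_000, seed: int = 0) -> sqlite3.Connection:
    """Create a tiny in-memory star schema filled with random junk data."""
    rng = random.Random(seed)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, Category TEXT)")
    con.execute("CREATE TABLE Period (PeriodID INTEGER PRIMARY KEY, Year INTEGER)")
    con.execute(
        "CREATE TABLE POSFacts (ItemID INTEGER, PeriodID INTEGER, "
        + ", ".join(f"{c} REAL" for c in FACT_COLS) + ")"
    )
    # 100 items in 5 categories; 730 daily periods covering 2 years.
    con.executemany("INSERT INTO Item VALUES (?, ?)",
                    [(i, f"Category{i % 5}") for i in range(100)])
    con.executemany("INSERT INTO Period VALUES (?, ?)",
                    [(p, 2010 + p // 365) for p in range(730)])
    placeholders = ", ".join("?" * (2 + N_FACTS))
    con.executemany(
        f"INSERT INTO POSFacts VALUES ({placeholders})",
        ((rng.randrange(100), rng.randrange(730),
          *(rng.random() for _ in FACT_COLS)) for _ in range(n_rows)),
    )
    con.commit()
    return con


# Same shape as the benchmark query: sum every fact by category and year.
QUERY = (
    "SELECT Item.Category, Period.Year, "
    + ", ".join(f"SUM(POSFacts.{c}) AS {c}" for c in FACT_COLS)
    + " FROM Item INNER JOIN POSFacts ON Item.ItemID = POSFacts.ItemID"
      " INNER JOIN Period ON POSFacts.PeriodID = Period.PeriodID"
      " GROUP BY Item.Category, Period.Year"
)


def time_query(con: sqlite3.Connection, sql: str, runs: int = 5):
    """Run the query `runs` times; return (mean seconds, largest relative deviation)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        con.execute(sql).fetchall()
        timings.append(time.perf_counter() - start)
    mean = statistics.mean(timings)
    spread = max(abs(t - mean) for t in timings) / mean
    return mean, spread


con = build_db()
mean, spread = time_query(con, QUERY)
print(f"mean {mean:.3f}s over 5 runs, largest deviation {spread:.0%} of the mean")
```

The absolute timings from a toy database like this mean nothing, of course; the point is only the pattern of repeating the run and checking how far individual runs stray from the average.<br />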
<br />
<h4>
Baseline</h4>
<div>
This is my starting point: SQL Server 2012 in its regular row storage mode. </div>
<div>
<br /></div>
<br />
<h4>
Faster Storage</h4>
<div>
This database query is going to need a lot of data from the hard-disk; actually almost all the data in the database. The hard-drive that came with my laptop was not especially slow but it was clearly a bottleneck on my system. While running this query the standard disk could not deliver data fast enough to keep the CPU busy - in fact the CPU was rarely operating at even 50% capacity. An upgrade seemed to be in order ($350 for a 480 GB solid-state disk).<br />
<br /></div>
<h4>
SQL Server 2012 ColumnStore Indexes</h4>
<div>
SQL Server 2012 has a new feature called a <a href="http://msdn.microsoft.com/en-us/library/gg492088.aspx" target="_blank">Columnstore Index</a>. Per the Microsoft website:</div>
<blockquote class="tr_bq">
<i> "An xVelocity memory optimized columnstore index, groups and stores data for each column and then joins all the columns to complete the whole index. This differs from traditional indexes which group and store data for each row and then join all the rows to complete the whole index. For some types of queries, the SQL Server query processor can take advantage of the columnstore layout to significantly improve query execution times... Columnstore indexes can transform the data warehousing experience for users by enabling faster performance for common data warehousing queries such as filtering, aggregating, grouping, and star-join queries."</i></blockquote>
<div>
To put that in plainer English - for data warehousing applications (like reporting and analytics) <b>a columnar database structure can pull its data with fewer trips to the disk - and that's faster, potentially a LOT faster.</b> (By the way, if you want your database to support a transactional system where you will repeatedly be hitting it with a handful of new records or record changes, this could be an excellent way to slow it down.)</div>
<div>
<br /></div>
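<div>
To see why fewer trips to the disk matter, here is a deliberately simplified model of how much data a full scan has to read under each layout. It assumes fixed-width values, no compression and no other indexes, so real numbers will differ. The 40 million rows match the fact table in this test, but the column counts (13 in the table, 3 used by the query) are hypothetical.</div>

```python
def bytes_scanned(rows: int, total_columns: int, columns_needed: int,
                  bytes_per_value: int = 8, columnar: bool = False) -> int:
    """Estimate bytes read by a full scan under a simplified storage model.

    A row store has to read every column of every row; a column store
    only reads the columns the query actually references.
    """
    if columns_needed > total_columns:
        raise ValueError("columns_needed cannot exceed total_columns")
    columns_read = columns_needed if columnar else total_columns
    return rows * columns_read * bytes_per_value


if __name__ == "__main__":
    rows = 40_000_000          # size of the fact table in this test
    total_columns = 13         # hypothetical: keys plus ten fact columns
    columns_needed = 3         # hypothetical: columns the query touches

    row_store = bytes_scanned(rows, total_columns, columns_needed)
    column_store = bytes_scanned(rows, total_columns, columns_needed, columnar=True)
    print(f"row store:    {row_store / 1e9:.2f} GB")
    print(f"column store: {column_store / 1e9:.2f} GB")
```

<div>
Real columnar engines typically also compress each column, which this sketch ignores, so the gap in practice can be wider than a simple column count suggests.</div>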
<div>
Now adding a ColumnStoreIndex does take a while but it's not exactly difficult. It's just a SQL statement that you run once:</div>
<blockquote class="tr_bq">
CREATE NONCLUSTERED COLUMNSTORE INDEX [ColIndex_POSFacts] ON [dbo].[POSFacts] ([Fact1],[Fact2],[Fact3],[Fact4],[Fact5],[Fact6],[Fact7],[Fact8],[Fact9],[Fact10]) WITH (DROP_EXISTING = OFF) ON [PRIMARY];</blockquote>
<div>
<i>Note: Once the ColumnStoreIndex is applied the SQL Server table is effectively read-only unless you do some clever things with partitioning. For one-off projects this doesn't matter at all of course. For routine reporting projects you may need a DBA to help out.</i><br />
<i><br /></i></div>
<h4>
InfiniDB columnar database</h4>
<div>
<div>
Columnar databases are not really "new" of course, just new to SQL Server so I also wanted to test against a "best of breed", purpose-built columnar database. </div>
</div>
<div>
<br /></div>
<div>
Why InfiniDB? From my minimal <a href="http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/" target="_blank">research</a> it seems to test very well against other columnar databases, it's open source (based on MySQL), will run on Windows and comes with a free community edition. I actually found the learning curve relatively gentle; in fact, as InfiniDB handles its own indexing needs, it's perhaps even simpler than SQL Server. Frankly, the hardest part was remembering how to export 40 million records neatly from SQL Server so they could easily be read into InfiniDB using their (very fast) data importer.<br />
<br /></div>
<div>
<h4>
The Results</h4>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
Here are the (average) elapsed times to run this query under each disk and database configuration. </div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvaXeT0cqXZoEWrBZgO_4wCFbRkn9DenPF5ILMIulPntqFyRNRi4uWRDIITZZzHYxQPoWDif43ii3l0Ax7QTY8yTcrivRFz5QyxyvwZIw1BvhjvnNIegdOBZp-AQwF0ZUatedm1aD5pzA/s1600/ColumnStoreIndexResults_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvaXeT0cqXZoEWrBZgO_4wCFbRkn9DenPF5ILMIulPntqFyRNRi4uWRDIITZZzHYxQPoWDif43ii3l0Ax7QTY8yTcrivRFz5QyxyvwZIw1BvhjvnNIegdOBZp-AQwF0ZUatedm1aD5pzA/s1600/ColumnStoreIndexResults_1.png" /></a></div>
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
So the basic SQL Server 2012 configuration on a regular hard-drive took... 1,535 seconds to run my query. That's over 25 minutes. I can drink a lot of coffee in 25 minutes.</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
Upgrade to the Solid State Disk (SSD) and it runs in 5 minutes and 32 seconds, about 4.6 times as fast. Bear in mind that my laptop does not use the fastest connection to this SSD: its spec says it can handle 2.5 Gb per second, and I believe newer laptops run at 6 Gbps. That being said, at least now the quad-core CPU was being kept busy.</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
If instead of upgrading the disk we add a ColumnStoreIndex to the fact table, we do even better, reducing from 1,535 seconds to 126 - that's over 12 times as fast!</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
So which option should we use? Both of course! I can now run a query that used to take 25 minutes in 37 seconds. That's over 41 times as fast as when I started.</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br /></div>
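<div>
For anyone who wants to check the arithmetic, the elapsed times quoted above convert to speedup factors as follows. The timings are the ones from this test; the function names are just illustrative. Note that "N times as fast" and "percent faster" are different measures: a query that runs 2 times as fast is 100% faster, not 200%.</div>

```python
BASELINE_SECONDS = 1535  # row storage on the regular hard-drive

ELAPSED_SECONDS = {
    "row storage + SSD": 5 * 60 + 32,          # 5 minutes 32 seconds
    "columnstore index + hard-drive": 126,
    "columnstore index + SSD": 37,
}


def speedup(baseline: float, elapsed: float) -> float:
    """How many times as fast the new run is compared with the baseline."""
    return baseline / elapsed


def percent_faster(baseline: float, elapsed: float) -> float:
    """Speedup expressed as a percentage improvement over the baseline."""
    return (speedup(baseline, elapsed) - 1.0) * 100.0


if __name__ == "__main__":
    for name, seconds in ELAPSED_SECONDS.items():
        print(f"{name}: {speedup(BASELINE_SECONDS, seconds):.1f} times as fast "
              f"({percent_faster(BASELINE_SECONDS, seconds):.0f}% faster)")
```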
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
Now let's take a look at that InfiniDB number. (I did not test with InfiniDB before swapping out the hard-drive so I only have data for it on the SSD.) Surprisingly it was not quite as good as the SQL Server speed with the Columnstore Index. I talked to the folks at <a href="http://www.calpont.com/" target="_blank">Calpont</a> who develop InfiniDB and they kindly explained that a key part of their optimization splits large chunks of data into smaller ones for processing. Sadly my 41 million record table was not even big enough to be worth splitting into 2 "small" chunks so this particular feature never engaged in the test. Still, it's almost 30 times as fast as base SQL Server even on this "small" dataset and the community edition is free. </div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br /></div>
<div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
Based on the success of this test I think it's time to scale up the test data by a factor of 10 - watch this space.</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
(Check out the following <a href="http://andrewg-crabtreeanalytics.blogspot.com/2013/02/the-right-tools-for-structured-big-data.html" target="_blank">update</a> post for more details.)</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br /></div>
</div>
<h3>
Conclusions</h3>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
My test-bed for this benchmark was a mid-range laptop with a few nice extras (more RAM, solid state disk and 64 bit OS) but certainly not an expensive piece of equipment, and it managed to handle an enormous amount of data with very little effort. This opens up possibilities for analyzing and reporting on much more data than was previously possible on your desktop.</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
The implications are not just relevant to desktop tools though or to tools we think of as databases. Numerous other tools now claim to handle data storage in columnar form (see Tableau and PowerPivot for Excel).</div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<br />
Is this the best tool for the job? Perhaps, perhaps not: there is an enormous amount of activity and innovation in the database space right now and many, many other software providers. <span style="font-size: 13.333333969116211px;">It's certainly a lot faster for this specific purpose and a major step forward over more traditional approaches.</span><br />
<br /></div>
<div style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: 13.333333969116211px;">
<b>Look hard at columnar databases to speed up your raw data processing and don't spend any longer waiting on slow hard-drives. </b><span style="background-color: transparent;"> </span><br />
<span style="background-color: transparent;"><br /></span>
</div>
</div>
</div>
</div>