Recently, I have been working with a new-to-me BI tool that has reminded me just how much speed matters. I'm not mentioning any names here, and it's not a truly bad tool; it's just too slow, and that's an insight killer!
Continuing my series on Next Generation DSRs, let's look at how speed impacts the exploratory process and the ability to generate insight and, more importantly, value.
Many existing DSRs do little more than spit out standard reports on a schedule, and if that's all you want, it doesn't matter too much if it takes a while to build the 8 standard reports you need. Pass off the build to the cheapest resource capable of building them and let them suffer. Once built, if a report takes 30 minutes to run when the scheduler kicks it off, nobody is going to notice.
Exploratory, ad hoc work is a different animal, and one that can generate much more value than standard reports. It's a very iterative, interactive process. Define a query, see what results you get back, and kick off 2-3 more queries to explain the anomalies you've discovered: filter it, order it, plot it, slice it, summarize it, mash it up with data from other sources, correlate, ..., model. This needs speed.
For a recent project, I was pulling data to support analytics: descriptive, inventory-modeling and predictive models. Define a query based on the features I am searching for, submit it to run, then wait... 20 minutes to an hour to get a result. When the results come through (or fail to do so, with an error message that defies understanding), I have long since moved on to some other task so as not to completely destroy my productivity. It takes time to get my head back in the game and to remember what I was trying to achieve, and productivity takes a dive. I didn't need just one query, of course, more like 10, so I would have 3-4 running simultaneously and extensive notes scribbled on a scratch pad to try and keep track.
Admittedly, what I was doing here was complex, and the tasks I was using to fill in the gaps were also relatively complex (e.g. simulating a large-scale retail supply-chain replenishment and forecasting system in R), but still, it took 2 days of fighting with the beast to get what I needed. Progress was painfully slow on everything I attempted in this time period and my frustration levels were off the scale.
This system was forcing me to multitask. According to one study, multitasking can reduce your productivity by 40%. A 40% decline in productivity is a bad thing, but, frankly, it felt worse. I did not measure it and I'm not about to create a study to prove it, but switching between highly complex tasks while a BI tool kept interrupting me felt much worse than a 40% drop.
Whether my perception is right or not, it's perception that drives behavior. If using the system in this manner is painful, it will inevitably be used less often, by fewer people, and more of the insights buried in the data will stay there.
Not that things haven't improved. One of my first jobs after college was to build computer simulations of factory production lines to test out changes in new equipment or layouts before incurring any significant capital expense. Some of these studies were very successful, but they were very complex to build and slow to run on the hardware of the time: I would start a simulation run when I went home and check the results when I got in the following morning. Some mornings could be very depressing: realizing that I had an error in a part of the model, had no useful results to build on and no chance to run again until that evening. Consequently, studies that took 1-2 weeks of work time could take months of elapsed time to execute.
If you've been following this series, you'll know that I am a strong proponent of using newer database technologies (MPP, in-memory, columnar, ...) to both simplify the data architecture AND provide substantial speed increases over existing systems.
If you still just want your standard reports, don't worry about it, just hope your competition is doing the same.