One of the Drucker concepts that resonated with me is “purposeful abandonment”. He argues that it’s easy to take on more responsibility, build more products, and support more customers; the hard part is letting go. By taking a concerted, proactive approach to identifying “what you won’t do anymore”, one creates the space needed to move forward in the areas that matter.
The concept is surprisingly relevant when applied to Data Science. Here’s my take on it:
1) Do you really need all those data fields and metrics?
The thrill of Big Data is having no limits on the number of fields in our datasets. With storage so cheap and distributed computing power so abundant, there seems to be no need to scrutinize the fields that we’re tracking. But isn’t this just a form of Parkinson’s law in action (i.e., data expands to fill the space available for storage)? Every data field and metric brings the need to do quality assurance, test for face validity, and understand the underlying quirks. Letting go of those “nice to have” data fields and metrics allows Data Scientists to focus on the ones that really matter. Less time checking redundant fields and metrics equals more time for insightful and impactful analyses.
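As a rough illustration of how one might shortlist fields for abandonment, the sketch below flags fields that are mostly missing, constant, or exact copies of another field. The function name `abandonment_candidates`, the 50% missing-value threshold, and the sample rows are all assumptions made up for this example, and a flagged field is only a candidate for review, not an automatic deletion.

```python
def abandonment_candidates(rows, max_missing=0.5):
    """Flag fields that are mostly missing, constant, or exact duplicates.

    `rows` is a list of dicts sharing the same keys (a hypothetical layout).
    Returns a dict mapping field name -> reason it was flagged.
    """
    fields = list(rows[0].keys())
    columns = {f: [row.get(f) for row in rows] for f in fields}
    flagged = {}
    seen = {}  # column contents -> first field name with those contents
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing_share = 1 - len(present) / len(values)
        if missing_share > max_missing:
            flagged[name] = "mostly missing"
        elif len(set(present)) <= 1:
            flagged[name] = "constant"
        else:
            key = tuple(values)
            if key in seen:
                flagged[name] = f"duplicate of {seen[key]}"
            else:
                seen[key] = name
    return flagged


# Made-up example data, for illustration only.
rows = [
    {"user_id": 1, "country": "US", "region": "US", "plan": "free", "fax": None},
    {"user_id": 2, "country": "DE", "region": "DE", "plan": "free", "fax": None},
    {"user_id": 3, "country": "US", "region": "US", "plan": "free", "fax": "555-0100"},
]
candidates = abandonment_candidates(rows)
for field, reason in candidates.items():
    print(f"{field}: {reason}")
```

Here `region`, `plan`, and `fax` are flagged, while `user_id` and `country` are kept. Whether a flagged field is truly "nice to have" is still a judgment call for the people who know the data.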
2) Do you really need all those records?
As with fields, what’s the big deal? Why not analyze all the data records in our datasets, all the time? There are certainly times when we really need the full dataset, but often this stage can wait until the first exploratory analyses have been done. Sadly, some analysts get stuck in a mindset of always running analyses on the full dataset. And so they spend lots of time and effort on Big Data tools when they could have used good old-fashioned statistical samples to cut to the chase. Less time running every analysis on every record can equal more time nimbly running exploratory analyses to find the hidden gems you’re looking for.
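To make the sampling point concrete, here is a minimal sketch that draws a simple random sample and compares its mean with the full-data mean. The data is synthetic and the names (`sample_records`, `orders`) are invented for this example; the sample size of 10,000 is an arbitrary choice, not a recommendation.

```python
import random
import statistics


def sample_records(records, n, seed=0):
    """Return a simple random sample of n records, without replacement."""
    rng = random.Random(seed)  # fixed seed so exploratory runs are repeatable
    if n >= len(records):
        return list(records)
    return rng.sample(records, n)


# Synthetic stand-in for a large table: 200,000 made-up order values.
data_rng = random.Random(42)
orders = [data_rng.gauss(100.0, 20.0) for _ in range(200_000)]

sample = sample_records(orders, 10_000)
full_mean = statistics.fmean(orders)
sample_mean = statistics.fmean(sample)
# Standard error of the sample mean: a rough gauge of how far off it may be.
std_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"full mean:   {full_mean:.2f}")
print(f"sample mean: {sample_mean:.2f} (standard error {std_error:.2f})")
```

For a first exploratory pass, an estimate with a known margin of error from 5% of the records is often enough to decide whether the full run is worth the effort.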
3) Do you really need all those reports?
Many Big Data groups are tasked with providing management reports and scorecards in response to the demands of the organization. There are always requests for new reports, new analyses, new dashboards and so on, yet old reports are rarely retired. With each published report comes the responsibility of getting the numbers right. By taking a “Google Analytics” approach to older reports, that is, tracking how often each one is actually viewed, one can discover which ones are no longer relevant. Less time supporting irrelevant reports equals more time making reports that make an impact.
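One way to read the “Google Analytics” approach is to log report views and list the reports nobody has opened recently. The sketch below assumes a hypothetical view log of `(report_name, date_viewed)` pairs and a 90-day window; the report names and dates are made up for illustration.

```python
from datetime import date, timedelta


def stale_reports(all_reports, view_log, today, window_days=90):
    """Return reports with no recorded view in the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    recently_viewed = {name for name, viewed_on in view_log if viewed_on >= cutoff}
    return sorted(set(all_reports) - recently_viewed)


# Made-up report catalogue and view log.
reports = ["weekly_sales", "churn_scorecard", "legacy_kpi_pack"]
view_log = [
    ("weekly_sales", date(2024, 5, 28)),
    ("churn_scorecard", date(2024, 1, 15)),
    ("weekly_sales", date(2024, 6, 1)),
]

retirement_candidates = stale_reports(reports, view_log, today=date(2024, 6, 3))
print(retirement_candidates)  # ['churn_scorecard', 'legacy_kpi_pack']
```

A report on this list is a prompt for a conversation with its owner rather than proof that it can be switched off; some reports are only needed once a quarter or once a year.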
4) Do you really need all of those models?
It’s easy to get into a mindset of using a model or a technique because “that’s how we’ve always done it”. It takes a critical eye to step back from one’s work and really question whether it’s harder than it needs to be. While using an advanced analytical technique can make the work seem more sophisticated, at the end of the day, people care about performance. If a simple, low-tech approach yields the same results with less effort, this can be an easy win. Less time using complicated analytical techniques on problems that don’t need them equals more time focusing on better predictive performance.
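The “same results with less effort” test can be sketched as a holdout comparison: keep the simple approach unless the advanced one is better by a meaningful margin. Everything here is hypothetical, including the two toy predictors, the holdout points, and the 5% relative-gain threshold, which in practice would depend on what an improvement is worth to the business.

```python
from statistics import fmean


def mean_absolute_error(predict, examples):
    """Average absolute gap between predictions and actual values."""
    return fmean([abs(predict(x) - y) for x, y in examples])


def choose_model(simple, advanced, holdout, min_relative_gain=0.05):
    """Prefer the simple model unless the advanced one cuts error enough."""
    simple_err = mean_absolute_error(simple, holdout)
    advanced_err = mean_absolute_error(advanced, holdout)
    gain = (simple_err - advanced_err) / simple_err if simple_err else 0.0
    choice = "advanced" if gain >= min_relative_gain else "simple"
    return choice, simple_err, advanced_err


# Toy holdout set of (input, actual) pairs and two stand-in predictors.
holdout = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
simple_model = lambda x: 2 * x
advanced_model = lambda x: 2 * x + 0.01 * x ** 2

choice, simple_err, advanced_err = choose_model(simple_model, advanced_model, holdout)
print(f"simple MAE={simple_err:.3f}, advanced MAE={advanced_err:.3f} -> keep {choice}")
```

In this toy case the extra term makes the holdout error slightly worse, so the simple model stays. The useful habit is making the comparison explicit instead of assuming the sophisticated technique earns its keep.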
I’m sure there will be people who disagree with the “less is more” concept as applied to Big Data – it is admittedly a contrarian perspective. If you disagree, please share your thoughts and ideas.