Applying “Purposeful Abandonment” to Big Data

I’ve recently been reading “Inside Drucker’s Brain” by Jeffrey Krames. I’ve read some of Drucker’s hits, but I found this book pulls his great ideas together in an easy-to-digest format.

One of the Drucker concepts that resonated with me is “purposeful abandonment”. He argues that it’s easy to take on more responsibility, launch more products, and support more customers; the hard part is the letting go. By taking a concerted, proactive approach to identifying “what you won’t do anymore”, one creates the space needed to move forward in the areas that matter.

The concept is surprisingly relevant when applied to Data Science. Here’s my take on it:

1) Do you really need all those data fields and metrics?
The thrill of Big Data is having no limits on the number of fields in our datasets. With storage so cheap and distributed computing power so abundant, there seems to be no need to scrutinize the fields we’re tracking. But isn’t this just a form of Parkinson’s law in action (i.e., data expands to fill the space available for storage)? Every data field and metric brings the need to do quality assurance, test for face validity, and understand the underlying quirks. Letting go of those “nice to have” fields and metrics allows Data Scientists to focus on the ones that really matter. Less time checking redundant fields and metrics equals more time for insightful and impactful analyses.
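To make this concrete, here’s a minimal pandas sketch of that kind of field triage, dropping columns that are mostly missing or effectively constant before any deeper QA. The helper name, thresholds, and example columns are all illustrative, not a prescription:

```python
import pandas as pd

def prune_low_value_columns(df: pd.DataFrame,
                            max_missing: float = 0.5,
                            min_unique: int = 2) -> pd.DataFrame:
    """Keep only columns that are reasonably populated and actually vary."""
    keep = []
    for col in df.columns:
        missing_rate = df[col].isna().mean()          # fraction of missing values
        n_unique = df[col].nunique(dropna=True)       # distinct non-null values
        if missing_rate <= max_missing and n_unique >= min_unique:
            keep.append(col)
    return df[keep]

# Toy example: one constant column and one mostly-missing column get dropped.
df = pd.DataFrame({
    "revenue": [120, 95, 210, 140],
    "region": ["east", "west", "east", "south"],
    "schema_version": ["v2", "v2", "v2", "v2"],   # constant -> dropped
    "legacy_score": [None, None, None, 0.7],      # 75% missing -> dropped
})
print(prune_low_value_columns(df).columns.tolist())  # ['revenue', 'region']
```

Even a crude pass like this surfaces the fields nobody would miss, which is exactly the “what you won’t do anymore” conversation Drucker is asking for.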


2) Do you really need all those records?
Just like with data fields, what’s the big deal? Why not analyze all the records in our datasets, all the time? There are certainly times when we really need the full dataset, but often that stage can wait until the first exploratory analyses have been done. Sadly, some analysts get stuck in a mindset of always running analyses on the full dataset. They spend lots of time and effort on Big Data tools when good old-fashioned statistical samples would cut to the chase. Less time running every analysis on every record can equal more time nimbly running exploratory analyses to find the hidden gems you’re looking for.
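As a quick illustration, here’s a sketch of sampling first and saving the full-data run for later. The file name and column name are hypothetical stand-ins; the point is that a sample mean plus a rough confidence interval is often enough to decide whether the full run is even worth it:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset; substitute your own file and column names.
df = pd.read_csv("events.csv")

# Explore a random sample instead of the full table; a fixed seed keeps it reproducible.
n = min(10_000, len(df))
sample = df.sample(n=n, random_state=42)

# Sample mean with a rough 95% confidence interval on a column of interest.
mean = sample["revenue"].mean()
half_width = 1.96 * sample["revenue"].std() / np.sqrt(n)
print(f"revenue: {mean:.2f} +/- {half_width:.2f}")
```

If the interval is already tight enough to answer your question, you just saved yourself a cluster job.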