Tips for Data Teams – The Consistency Check

Have you ever delivered an analysis, only to hear from your client that “these numbers can’t be right”? It’s hard to convince someone that your results are credible when they don’t even pass the first 5 seconds of review. As much as we may not want to admit it, sometimes the numbers are indeed wrong, so how do we prevent these situations from happening? One type of check that a Data Team can adopt is the “Consistency Check”. Here are some questions that you can ask yourself when doing a consistency check:

Question 1) Are the numbers consistent with themselves?
When building complicated analyses, different sections of the analysis can fall “out of sync” with each other if they are not all updated in the same way. When this happens it can produce inconsistent summary results (e.g. the cover page reports 255 conversions per hour, but the supporting details on other pages show 237 conversions per hour). Sometimes we place too much faith in our reporting tools and assume that they will report exactly as intended. In other situations it’s just a matter of being too close to the work. After a while the numbers are burned into your short-term memory and you lose your ability to critically review them with an objective eye. Suggested work-arounds include:

  • Have another member of your team do a consistency check on the results, preferably someone who hasn’t been involved in the work.
  • Take an old school approach. Print out the results, and use different colored highlighters for each type of metric. Highlight the summary numbers that represent the same result, and confirm that they are indeed consistent. Continue until you’ve highlighted all summary numbers.
  • Take another old school approach. Get your calculator out or use a separate spreadsheet, and confirm that you can replicate the summary numbers just based on the results that are being presented. You may be surprised with how many of your clients are doing this with your results already.
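The "separate spreadsheet" check above can also be scripted. The following is only a minimal sketch: the detail rows, the helper names, and the rounding tolerance are all made up for illustration, and the two figures simply mirror the 255 versus 237 example mentioned earlier.

```python
# Hypothetical detail rows as they might appear on the supporting pages:
# (hour, conversions). The values are invented for illustration.
detail_rows = [
    ("09:00", 240),
    ("10:00", 231),
    ("11:00", 240),
]

# Hypothetical figure copied from the cover page of the same report.
reported_conversions_per_hour = 255


def recompute_average(rows):
    """Recompute conversions per hour using only the detail rows."""
    return sum(count for _, count in rows) / len(rows)


def is_consistent(reported, recomputed, tolerance=0.5):
    """True when the two figures agree within a rounding tolerance (assumed)."""
    return abs(reported - recomputed) <= tolerance


recomputed = recompute_average(detail_rows)
if not is_consistent(reported_conversions_per_hour, recomputed):
    print(f"Mismatch: cover page says {reported_conversions_per_hour}, "
          f"details give {recomputed:.0f}")
```

A mismatch like this does not say which number is wrong, only that the report disagrees with itself and needs a second look before it goes out.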

Question 2) Are the numbers consistent with your previous analyses?
When a client receives a new set of results they often pull up the previous results that you gave them. They are asking the question “how much have things changed?” You can beat them to the punch by doing this consistency check yourself. To be more specific:

  • Start with the previous result that was presented or released. Compare the summary numbers from the previous results to your current summary numbers.
  • Assess if the changes are interpretable. If they are, then this interpretation will likely be part of what you communicate when you release the new results. If the changes are not interpretable, then it’s time to go back into your current results, or your previous results to diagnose why the changes aren’t explainable.
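One way to beat the client to the punch is to line up the two sets of summary numbers automatically. This is a rough sketch under stated assumptions: the metric names and values are invented, and the 10% threshold is an arbitrary starting point rather than a recommendation.

```python
def flag_large_changes(previous, current, threshold_pct=10.0):
    """Return metrics whose percent change exceeds the threshold,
    plus metrics that appear in only one of the two releases."""
    flagged = {}
    for name in sorted(set(previous) | set(current)):
        if name not in previous or name not in current:
            flagged[name] = "missing from one release"
            continue
        old, new = previous[name], current[name]
        if old == 0:
            # Percent change is undefined; flag any move away from zero.
            if new != 0:
                flagged[name] = "changed from zero"
            continue
        change_pct = (new - old) / abs(old) * 100
        if abs(change_pct) > threshold_pct:
            flagged[name] = f"{change_pct:+.1f}%"
    return flagged


# Hypothetical summary numbers from the previous and current releases.
previous_summary = {"conversions_per_hour": 237, "visits": 12000, "bounce_rate": 0.41}
current_summary = {"conversions_per_hour": 255, "visits": 15000, "avg_order": 52.0}

flags = flag_large_changes(previous_summary, current_summary)
for metric, reason in flags.items():
    print(f"{metric}: {reason}")
```

Anything that gets flagged is a change you should be ready to interpret, or a prompt to go back into the current or previous results.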

Question 3) Are the numbers consistent with other reports?
Stepping into the shoes of your audience, you can think about the other reports that they are referring to on an on-going basis. It doesn’t matter if the other reports that they use came from a completely different source – from their perspective all data from all sources is supposed to tell the same story. In a similar manner to Question 2, you can do some additional homework so that your results are as valuable to your audience as possible. For example you could:

  • Ask your clients if they have any other reports that they use frequently, and if they would be willing to share them with you. You can frame it honestly – you want to make sure that your results are valid, and if they are different from other sources, you want to be able to explain why.
  • Do a little research on your own, in particular, reviewing any routine corporate reporting, or industry reporting. Sometimes, a skeptic can be won over by proving that you did your homework. Again if the numbers line up from other sources, it becomes something you can report as proof of consistency. If the numbers don’t line up and you can’t explain the difference, then it may be an indication that you need to review your analysis.

Question 4) Are you telling the right story?
Taking all of the above into account, you should be able to deliver your results confidently. You should now know that the numbers in the report are consistent amongst themselves, that the analysis is consistent with previous analyses, and that the results are interpretable in comparison to other sources. This now can become part of your summary and presentation of your stunning new work. Or at least it can form an addendum to the email or the presentation that shows your audience the efforts that you went through to ensure that the numbers are the right numbers. Then you have the foundation to begin telling the actual story of the analysis (the “so what” message).

These are just a few tips, but I’m sure there are many experts out there who have many more great ideas. If you have suggestions, or alternate points of view, please weigh in.

Note: What is a Data Team?
When we refer to “Data Teams” it’s a catch all for groups of technical, statistical, and subject-matter domain experts that are involved in providing information to support their organization. These teams are sometimes called “Business Intelligence”, “Decision Support”, or “Information Management”, but they can also be internal consultants such as “Operations Analysts”, “Strategic Information” or “Research”. Many of these concepts equally apply to teams of Data Scientists.


Reducing Rework in a Data Team

As much as we’d all like to get things done right the first time, with analysis and modeling it’s not always possible.

When delivering results, it’s fairly common to receive requests for minor revisions – and most of that we can all handle. But every so often the situation catches you by surprise. You’re delivering what you think is a great piece of work only to learn that it missed the mark completely. You hear statements like “This isn’t what I asked for!” or “You misunderstood what I asked for!” and you wonder where things went wrong.

Sometimes you can rightfully blame the person who requested the analysis, and then conveniently changed their mind. But more often the breakdown happens around communication and agreeing on expectations.

So what do you do? Here are some coping strategies:

1) Ask the question “What does a job well done look like?”
The next time you’re asked to run a major analysis where you feel that you don’t have an adequate understanding of what is being asked, try this script:

“I want to make sure that I give you what you want. Would you mind if I grabbed a couple of minutes to clarify a few things?”

Then ask your clarifying questions. For example:

  • What’s the business question that this analysis is supporting you with?
  • Do you just want the summary, or did you want the supporting details?
  • Is this analysis just for your reference, or is it going to be distributed?
  • How accurate does this need to be?

The answers to these questions can make a big difference in determining the final deliverable. If you only have time for one question, the first question is the best one to ask.

If you’re lucky enough that the person making the request is willing to spend more than a couple minutes with you, then you can try to get crystal clear on “What does a job well done look like?” The following are some of the statements that you might hear:

  • It will help me answer this question …
  • The numbers will be consistent with our annual report
  • The summary of results will be jargon-free
  • The results will be delivered by Friday morning at 10 am, both by email as well as a color print out on my desk

2) Put your understanding in writing
With your heightened clarity, you can now put it into writing. A short follow up email of the form “Thanks for clarifying. So, just to recap I will …” will provide one more opportunity for corrective feedback.

In many situations you won’t be able to do the first step (getting clear on “what a job well done looks like”) because the person making the request is too busy. But even in these situations it’s still worthwhile putting your understanding into writing. You can write the same short email, but this time it will have an opening line of the form “I know you’re too busy to discuss the analysis, so I’ll make the following assumptions when I do it …” And then, you can add a closing line “Hopefully that captures it. If I don’t hear otherwise from you, I’ll deliver results based on this understanding.”

3) When delivering your result, include the original request
You’ve done the hard work of clarifying expectations, you’ve done the analysis, and now this is the easy part. When summarizing the results, make sure that you attach your analysis to the clarifying email. If you’re delivering it in hard copy, you can attach a print out of the clarifying email to the top.

Using this approach the person making the request will be able to see their role in the entire process. It won’t take long for people to see the value of slowing down and spending a few minutes getting clear on the request.

4) Follow up after the fact
The worst situations are when you’ve put in the hard work, but it wasn’t really what the requester wanted, and so they don’t use it. They’ve wasted their time and your time, and they still didn’t get what they wanted. Because they feel embarrassed about not using the work, they will often not bother giving you feedback.

So, it’s up to you to solicit feedback after each major deliverable. A brief check-in after the fact can yield great feedback. If you’re not getting rave reviews about the great work you did, you can ask “What could I have done to make it even better?” This seemingly innocent question prompts the requester to give candid feedback, and demonstrates that you really care about the value of your work.

These coping strategies are not for everyone, and are not needed in every situation (especially the quick and easy analyses). But it’s the times when we get it wrong where we really appreciate the value of clarifying expectations. If you have your own coping strategies, please weigh in.


Tips for Managing Priorities in a Data Team

We work with a lot of different Data Teams, and most of them are faced with the same challenge:

How do you handle all of these competing requests for information?

Below are some relatively easy-to-implement tips for dealing with this situation, but first let’s see why this can be so hard. The following are some of the more common reasons we’ve seen in the field:

  • Every request seems to be urgent. Most Data Teams are all too familiar with the expression “we need it yesterday”.
  • Every request seems to be very important. How can a Data Team not give priority to a request that comes from the CEO’s office or from the Board? What about situations where Public Relations needs good information to handle an emerging PR issue?
  • Requests for information are “free”, meaning that in most situations, the people requesting the information don’t have to pay for it. As a result, demand for information grows much faster than the capacity of the Data Team.

Here are some tips for Managing Priorities in a Data Team:

1) Keep a log of all active requests
As simple as it sounds, keeping an up-to-date log of all active requests is a “must have” enabler for managing competing requests in a Data Team. Many Data Team leads feel that they don’t need such a log, citing that they have it all under control, and that they are too busy to keep another list up to date. But such a log can help identify the capacity needed in the Data Team, and the skill mix that’s required. At minimum the Active Request Log should include the following information for each information request:

  • Who is asking for the information?
  • What are they asking for?
  • When did they ask for it?
  • Who in the Data Team is handling the request?
  • When did we promise to get it done?
  • What’s the status of the request (not started, active, completed, cancelled)?

In addition, the following information can be very helpful for planning purposes:

  • When was the information delivered?
  • How many hours of effort were involved in preparing it?
  • Was the due date pushed back? If so, how many times and by how many days?
  • Was there any feedback from the person who requested the information?

This list can be as simple as a whiteboard, a shared spreadsheet, a SharePoint list, or a Google Doc. The hard part is having the discipline to keep it up to date.
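For teams that prefer something scriptable, the fields listed above map naturally onto a small record type. The sketch below is one possible shape, not a prescribed format: the class name, field names, and sample entries are all hypothetical, and a shared spreadsheet with the same columns would serve just as well.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Status values taken from the list above.
STATUSES = ("not started", "active", "completed", "cancelled")


@dataclass
class Request:
    # Minimum fields for each information request.
    requester: str
    description: str
    requested_on: date
    owner: str
    promised_on: date
    status: str = "not started"
    # Optional fields that help with planning.
    delivered_on: Optional[date] = None
    hours_of_effort: Optional[float] = None
    times_pushed_back: int = 0
    feedback: str = ""

    def __post_init__(self):
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


def open_requests(log):
    """Requests that still need work, earliest promised date first."""
    pending = [r for r in log if r.status in ("not started", "active")]
    return sorted(pending, key=lambda r: r.promised_on)


# Hypothetical entries, invented for illustration.
log = [
    Request("CEO office", "quarterly revenue breakdown",
            date(2024, 3, 1), "Ana", date(2024, 3, 8), status="active"),
    Request("Public Relations", "complaint volumes by month",
            date(2024, 3, 4), "Ben", date(2024, 3, 5)),
    Request("Finance", "budget variance summary",
            date(2024, 2, 20), "Ana", date(2024, 2, 27), status="completed",
            delivered_on=date(2024, 2, 28), hours_of_effort=6.5,
            times_pushed_back=1),
]

for r in open_requests(log):
    print(r.promised_on, r.requester, r.description, r.owner, r.status)
```

Sorting the open items by promised date gives a ready-made agenda for a quick team review.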

2) Review the log as a Data Team every day
Having a daily 5 minute meeting as a Data Team may seem like a big burden. Who needs another meeting in their already-too-busy schedule? But if done right, a daily 5 minute meeting to review the Active Request Log can help a too-busy Data Team work together to make sure that the most important things are being worked on every day. Specific things that can be clarified during this 5 minute check-in include:

  • What must we get done today?
  • What must we get done in the next couple of days?
  • Who has the lead on each piece of work?
  • What requests need more support?
  • What counts as “good enough” for the requests that we’ll be working on today and tomorrow?

This quick meeting can set the entire Data Team in the right direction at the start of each day, and in doing so, go a long way to reducing the last-minute scramble, and make sure that the Data Team works to its full potential as a team.

3) When handling new requests, use the active request log to set expectations
If you have the discipline to do the above 2 steps, then after not too long you will have great information for managing expectations with new requests. For example, if there is a last minute urgent and important request for information, then at minimum you will now know:

  • How long will this really take us to complete?
  • Are there any recent requests for information that are similar to this one? If so, can that request be modified to meet this urgent need?
  • Will any active requests not be completed on time, as a result of this new urgent request? If so, is the person making this new urgent request willing to take the heat?
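Once the log holds some history, the first two questions above can be answered from data rather than memory. This is a small sketch that assumes completed entries record a description and the hours of effort; the entries, the keyword matching, and the use of the median are all illustrative choices.

```python
from statistics import median

# Hypothetical completed entries from an Active Request Log.
completed = [
    {"what": "monthly churn report", "hours": 6.0},
    {"what": "churn by region", "hours": 9.0},
    {"what": "board KPI pack", "hours": 20.0},
    {"what": "churn forecast refresh", "hours": 12.0},
]


def similar_requests(log, keyword):
    """Completed requests whose description mentions the keyword."""
    return [r for r in log if keyword.lower() in r["what"].lower()]


def typical_effort(log, keyword):
    """Median hours for similar past requests, or None if there are none."""
    matches = similar_requests(log, keyword)
    if not matches:
        return None
    return median(r["hours"] for r in matches)


estimate = typical_effort(completed, "churn")
print(f"Similar past requests took about {estimate} hours")
```

The median is used here so that one unusually painful request does not dominate the estimate; a simple keyword match is crude, and a team may prefer to tag requests by type instead.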

In a lot of respects, most Data Teams are already carrying out all three of these functions, but often it’s in people’s heads. By adding a little bit of tracking and daily discipline, the Data Team can significantly improve their work effectiveness, and at the same time better meet the needs of their customers.

We’re sure you have perspectives of your own on this subject. If so, please share your thoughts and ideas.


Applying “Purposeful Abandonment” to Big Data

I’ve recently been reading “Inside Drucker’s Brain” by Jeffrey Krames. I’ve read some of Drucker’s hits, but I found this book put his great ideas all together in an easy to digest format.

One of the Drucker concepts that resonated with me is the concept of “purposeful abandonment”. He argues that it’s easy to take on more responsibility, do more products, support more customers, but the hard part is the “letting go” part. By taking a concerted and proactive approach to identifying “what you won’t do anymore” one creates the space needed to move forward in the areas that matter.

The concept is surprisingly relevant when applied to Data Science. Here’s my take on it:

1) Do you really need all those data fields and metrics?
The thrill of Big Data is having no limits on the number of fields that we have in our datasets. With space being so cheap, and an abundance of distributed computing power, there’s no need to scrutinize the fields that we’re tracking. But, isn’t this just a form of Parkinson’s law in action (i.e. Data expands to fill the space available for storage)? With every data field and metric comes the need to do quality assurance, test for face-validity, and understand the underlying quirks. Letting go of those “nice to have” data fields and metrics allows Data Scientists to better focus on the ones that really matter. Less time checking redundant fields and metrics equals more time for insightful and impactful analyses.

2) Do you really need all those records?
Just like the previous concept, what’s the big deal? Why not analyze all the data records in our data sets, all the time? There are certainly times when we really need the full dataset, but often this stage can wait until the first exploratory analyses have been done. Sadly, some analysts can get stuck in a mindset of always running analyses on the full dataset. And so, they spend lots of time and effort on using Big Data tools, when they could have used good old-fashioned statistical samples to just cut to the chase. Less time running all analyses on all of the data records can equal more time nimbly running exploratory analysis to find the hidden gems you’re looking for.
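To make the sampling point concrete, here is a minimal sketch using a simple random sample. The dataset is synthetic and the function name is hypothetical; real data may call for stratified or weighted sampling instead.

```python
import random
from statistics import mean


def explore_with_sample(records, sample_size, seed=0):
    """Draw a simple random sample (without replacement) for exploration."""
    if sample_size >= len(records):
        return list(records)
    rng = random.Random(seed)  # fixed seed so the exploration is repeatable
    return rng.sample(records, sample_size)


# Synthetic stand-in for a large dataset: 100,000 made-up order values.
generator = random.Random(42)
full_dataset = [generator.gauss(100, 15) for _ in range(100_000)]

sample = explore_with_sample(full_dataset, 1_000)

print(f"full mean:   {mean(full_dataset):.2f}")
print(f"sample mean: {mean(sample):.2f}")
```

With 1% of the records, the sample mean typically lands close to the full-data mean, which is usually enough to decide whether a line of analysis is worth running at full scale.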

New Year’s Resolutions for Data Scientists

As a group, Data Scientists seem like the type of people that would seize any opportunity to improve. So in the spirit of fun, the following are 4 “tongue in cheek” resolutions for this year.

1) Gain More Weight
Data Scientists are getting a lot of attention these days, which is great. We need to continue to gain our collective weight as people who help other people make sense of the ever-growing mass of data, translating what the numbers mean into something actionable for non-Data Scientists.

2) Keep Smoking!
Yes, really, keep smoking! The concept of the Data Scientist is smoking hot, and in a self-promotion kind of way, it makes sense to keep this momentum going. So this means doing things like being a good ambassador of Data Scientists as a group, and explaining to people (e.g. your mother, your neighbor, the person on the street) what the heck we do.

3) Learn a New Language … Spanish, SQL, R …
Data Scientists are human too, and so it’s not uncommon for a Data Scientist to get really comfortable with a set of analytical tools – almost too comfortable. This could be the year to broaden your horizons and try something new. Different technologies often have completely different ways of approaching the same problem, and some are better than others depending on the task at hand. Knowing the options can save a lot of time in the long run. The article Top Holiday Gifts for Data Scientists has some good references for books and other resources.

4) Learn How to Make Friends and Influence People
Data Scientists can suffer from being too analytical, too technical and just too darn scientific. The greatest insights in the world don’t matter if they can’t be communicated to people in a way that they can be understood. Data Scientists often can do with a little help in this area. These are two books that I’d recommend for Data Scientists that are looking to improve their game at presenting:

And let’s not forget the “making friends” part. The Data Scientist community is a growing one, and as good friends there’s a lot we can learn from each other.

I’m sure there are more resolutions in store for Data Scientists – please share your suggestions and thoughts.


The Science of Data Scientists

The concept of the Data Scientist may very well be the next big thing in the field of analytics. Recently several industry leaders have weighed in on the question “What is a Data Scientist?”, but another way of looking at this is to ask the question “What is the Science of Data Scientists?”

A dictionary definition of science is a “systematic knowledge of the physical or material world gained through observation and experimentation”. So let’s look at the use of science in three areas that Data Scientists all need to do in carrying out their basic work:

  1. They transform the data into a format and structure that is conducive to analysis
  2. They carry out some kind of descriptive, interpretative, or predictive analysis
  3. They communicate their results

Using Science in Data Transformation:

Anyone who’s worked with data for a while knows that the data you have available is usually less than perfect. Missing data, inconsistently formatted data, and duplicate data are fairly routine obstacles, and then linking data from different sources is even more challenging. Data Scientists are also often required to work with “secondary data” that has been generated through an operational system or process. The data was originally designed to meet a functional requirement, rather than with the intention of it being analysed in the future. Even if the data is clean and error-free, there is a requirement to reorganize the data into a structure that is conducive to the analysis that needs to be performed.

So, in response, most Data Scientists develop skills in transforming data, and are quite good at it too. They use tools ranging from statistical analysis software to standard database technologies. Where the science comes in is that there is often a lot of experimentation that takes place along the way, as the Data Scientist figures out how best to transform the data while introducing little to no error.

Many Data Scientists have learned the hard way that using a scientific method to prove that the data transformation has been done correctly ultimately saves time and reduces rework in the end.
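A few automated checks can stand in for "proving" that a transformation behaved. The sketch below assumes a transformation that is meant to preserve the number of rows and a numeric total; the function name, column names, and sample rows are hypothetical, and a transformation that intentionally filters or aggregates would need different checks.

```python
def check_transformation(source_rows, transformed_rows, key, amount):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    # 1) The transformation should not add or drop rows (assumed intent).
    if len(source_rows) != len(transformed_rows):
        problems.append(
            f"row count changed: {len(source_rows)} -> {len(transformed_rows)}")

    # 2) Keys should still be unique, e.g. no rows duplicated by a join.
    keys = [row[key] for row in transformed_rows]
    if len(keys) != len(set(keys)):
        problems.append("duplicate keys after transformation")

    # 3) A control total should survive the transformation unchanged.
    source_total = sum(row[amount] for row in source_rows)
    output_total = sum(row[amount] for row in transformed_rows)
    if abs(source_total - output_total) > 1e-9:
        problems.append(f"total changed: {source_total} -> {output_total}")

    return problems


# Hypothetical source data and two transformed versions of it.
source = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 5.5},
    {"id": 3, "amount": 4.5},
]
good = [dict(row, region="north") for row in source]
bad = good + [dict(good[0])]  # simulates a row duplicated by a faulty join

print(check_transformation(source, good, "id", "amount"))
print(check_transformation(source, bad, "id", "amount"))
```

Running checks like these after every transformation step is cheap, and it catches problems at the step where they were introduced rather than in the final numbers.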

            
Using Science in Performing Analysis:

Here the use of scientific method is more obvious. It is taken as a given that Data Scientists conduct their analysis and modeling systematically, and that the essence of the work involves observation and experimentation. In carrying out the work, often “the proving” is a key component of what the Data Scientist does, so that they know they are drawing the right conclusions.

However, there is a wide range of scientific tools that Data Scientists can use to understand and interpret massive amounts of complex data. Data Scientists are not unlike other skilled experts, and can sometimes be like a carpenter with a hammer who sees every problem as a nail. For example, some Data Scientists are truly exceptional when it comes to logistic regression modeling (making the best guess of a “yes/no” variable), but then are complete novices when it comes to multivariate analysis (such as condensing information captured in 1,000 correlated variables into 10 summary variables). As is often the case with niche skills, it takes a while to really get good at using them effectively, and it’s rare to find Data Scientists that are truly effective in all domains. The scientific connection here is that Data Scientists sometimes have to come to grips with the limits of their own skill set, and have to experiment in new directions to expand their knowledge base.

Using Science in Communicating Results:

This angle is less intuitive, but ultimately what’s the point of doing high-brow analysis, if nobody is able to understand the result, or even worse, if they can’t use the result to support a key decision?

Data Scientists that are in high demand are those that are able to truly understand the business question being asked, and why it’s being asked. Then they communicate their complex findings in a way that the decision-makers can actually do something with the result.

This important skill takes a while to develop, often through experimentation (e.g. what happens when I present it this way?), and then observation (e.g. what did the CFO do with the last findings I sent her?). Even better is when the Data Scientist applies basic market research approaches to their own work, specifically by following up with their clients and/or end-users and discovering how the results could be even more useful. Or, taking a more traditional approach, they can literally post their results with on-line reporting tools and run analytics to see how often and how deeply their results are being viewed.

The concept of the Data Scientist is still relatively new and will be shaped by those of us who work in and around the industry. Please offer your own comments and feedback, even if you disagree with any of these ideas.