Data Science and Critical Thinking

Tags: data science

Published May 13, 2020, 8:55 p.m. by Morgan

So, data science is something I've been learning a lot about lately. This isn't to say that I had no prior knowledge of dealing with and interpreting data, only that I had no formal introduction to it. Beyond the shiny and flashy aspects like machine learning, I have so far been pleased to see an emphasis on careful consideration and asking the right questions. This is often neglected in the use (or, more accurately, abuse) of statistics in discussions. As a result, we have such gems as

"There are three kinds of lies: lies, damned lies, and statistics."

as far back as the turn of the 20th century. Yup, people have been abusing stats for practically the entirety of recent memory.

Now, as mentioned above, I've only recently engaged in the formal study of data science, but I have practiced an important aspect of it for most of my adult life: critical thinking. And this is what I think merits discussion. And by discussion, I mean blog post.

As in any good discussion, we need to start with first principles and make sure we're on the same page about what we're even talking about. Those with a scientific background have likely heard this referred to as making an "operational definition." So, for our purposes here, what exactly is critical thinking?

The Oxford English Dictionary lists it as,

the objective analysis and evaluation of an issue in order to form a judgment.

But I find that to be a gross (not to mention imprecise) oversimplification. I trust we all have a more or less common understanding of the word 'thinking,' but maybe we should dig deeper into the word 'critical.' Looking at the definitions for this in the same dictionary, we find,

Interestingly, upon examination we find that all of the definitions for 'critical' are quite qualitative and subjective in nature, contrary to the word's associations in the context of science and, more to the point, to the definition of critical thinking given by the same dictionary. Also interestingly, all of these can apply to our purpose.

The first, because constructive criticism is key to improvement (I'd also say recognition of what's good is quite important); the second and third, because what you go on to do will be based on your conclusion (and in the case of data science, that could mean time and money); and the fourth and fifth, because critical thinking is often a matter of judgment rather than deduction. While you can break some parts of critical thinking down into a set of algorithms, meaning pre-defined rules and methods (like decision trees), it often comes down to judgment based on experience (making an educated guess, for instance), which is a heuristic: a sort of shortcut or logical leap. While you can't really formalize this sort of heuristic, you can train your thinking by starting with the algorithmic aspect and building from there, which we can do right now in our definition.

In my own words, I'd make our operational definition something along the lines of,

the careful consideration of a situation and its surrounding context, including identifying ambiguities and missing information, sources of bias, and the management of assumptions.

When it comes to ambiguities, something as simple as unclear grammar, also called 'amphiboly,' could be a source of serious confusion. I'd wager, though, that simply overlooking the need for operational definitions is a far more common pitfall. Identifying missing information is also a crucial step, even if you ultimately can't fill it in, because of confounding factors: unaccounted-for contextual information that affects your results. For example, we might read, "this month Joe's Market had twice as many injuries as the previous month." If, however, you looked deeper at the data, you might find that the extra injuries were slips and falls outside, and that last month had been sunny while this one had many rainstorms. Maybe Joe should do something about the safety record, but whatever he did likely wouldn't be effective without knowing those key details. Further, "twice" is a relative term, and if the previous month had only one injury, we'd probably feel a bit misled by the phrase "twice as many." The missing information there is the actual raw numbers.
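To make the Joe's Market example a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the injury records are made up, the one-injury baseline is just the "what if" from above, and the field layout (month, cause, location, weather) is hypothetical. It only shows how the same data can be reported as a relative figure, as raw counts, and with the context attached.

```python
from collections import Counter

# Hypothetical injury log for Joe's Market. Every record here is invented
# purely for illustration: (month, cause, location, weather).
injuries = [
    ("last month", "cut", "inside", "sunny"),
    ("this month", "cut", "inside", "rain"),
    ("this month", "slip and fall", "outside", "rain"),
]

# Raw counts per month: the numbers the headline left out.
counts = Counter(month for month, _cause, _location, _weather in injuries)
previous, current = counts["last month"], counts["this month"]

# The relative figure behind "twice as many", next to the absolute change.
ratio = current / previous
extra = current - previous

# Context for this month's injuries: what happened, where, and in what weather.
this_month_breakdown = Counter(
    (cause, location, weather)
    for month, cause, location, weather in injuries
    if month == "this month"
)

print(f"Relative: {ratio:g}x as many injuries")
print(f"Absolute: {previous} -> {current} (+{extra})")
for (cause, location, weather), n in this_month_breakdown.items():
    print(f"  {n} x {cause}, {location}, {weather}")
```

With these made-up numbers, "twice as many" amounts to a single extra injury, and the breakdown shows it was a slip and fall outside in the rain. A real analysis would need far more data before blaming the weather, but even this much context changes how the headline reads.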

I'm not going to go into sources of bias here, since they merit their own discussion. Perhaps on another day.

I was originally going to phrase the last part as "assumption management," but was both surprised and not surprised to find that this was a term already in use (so much for my ambitions in groundbreaking project management coaching), so I went with the above phrasing to avoid a source of potential ambiguity. Along those same lines, let's be explicit: managing your assumptions does not mean eliminating them all through rigorous proof (that is, changing each one from an assumption to a fact). Just as you may not be able to fill in all the missing data you identify, you may not be able to eliminate assumptions, and, in fact, might not care to. At a certain point, you have to go with what you have available, if only because of time constraints, and there is often little you can do about this. What you can do, however, is acknowledge your assumptions. Even if you aren't going to do anything with them as you continue, keeping a list or structure of them in the back of your mind can help you if you run into trouble. For instance, let's say you are doing a math problem and are hitting a wall where you simply can't figure out how to proceed. One of the assumptions you've made is that the steps and calculations you've done up to that point are correct. The person who kept this in mind on some level would probably think to go back and check that assumption, perhaps fixing an earlier mistake that changes how the problem plays out. The person who did not might find themselves pounding their head against the wall until they gave up.

By this point, I hope it is self-evident how important critical thinking is to our decision-making, in and out of data science, so let's follow along with a thought experiment (credit to this book review for the brilliant comparison) on a piece of journalism:

Imagine you wanted to learn about the Exxon Valdez oil spill in Prince William Sound in 1989. You pick up a book written by a reporter for the New York Times who interviewed everyone involved in the spill. You have the following basic questions: Was Prince William Sound ecologically pristine or already spoiled prior to the spill? Was Prince William Sound considered a tricky run for tankers? What actions did the captain of the Valdez take immediately before it ran aground? Did Exxon prepare a risk assessment for this run? Did Exxon discuss double-hulled tankers specifically for this run to prevent oil spills? Did BP, Shell, or Chevron run tankers through this same run without incident? Is there a company that has always avoided spills?

In this case, the writer has identified some key contextual data that set the tone for the incident. Taking each datum, we can combine and further simplify the points to, "what is the historical trend for the environment and the action performed? What, if anything, was exceptional? How does it match up to other players in an apples-to-apples comparison?" Just by having these sorts of questions in mind, you can make your own starting procedure for critical thinking. The key is to remember that this is only a starting algorithm, and you need to consciously develop your judgment through the heuristic of trial and error. With that in mind, hopefully we will ask the right questions and avoid finding something like

Sticking with this thought experiment, about fifty pages into the book you realize the author has passed by these basic questions.

among the reviews for our work.

1 comment

Comment 1 by 是我 (It's Me) Aug. 12, 2024, 8:39 p.m.

Begone, spam!

Turns out that when the site doesn't send you alerts for comments, spammers find it and clutter things up when you're not looking.

-Morgan
