Lesson 5.2: Variant data



As we were discussing in the last lesson, one of the challenges about doing web searches is that there is so much data out there.

One of the problems with so much data out there is that you will often get variant data; that is, different versions of the same piece of information.

Even a simple fact might have some more subtlety involved in it.

For example, compare the results for the [circumference of the earth].


Figure: Search results for the query [circumference of the earth].

One question to ask yourself before you search is: Does this fact vary based on some contextual information?

Perhaps you know that the circumference of the earth varies depending on whether you measure around the equator or around the poles. If you are not aware of that, you might believe that the earth is a perfect sphere and that a single number is a perfect description of its circumference. In reality, it’s very close to a perfect sphere, but there is variation.
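As a rough illustration of how much the answer depends on context, the short Python sketch below computes both circumferences. The radii are the WGS 84 reference values and the ellipse perimeter uses Ramanujan's approximation; both are assumptions of this sketch, since the lesson does not name a particular model of the earth.

```python
import math

# WGS 84 reference radii in kilometres (an assumption of this sketch;
# the lesson does not say which earth model the search results use).
EQUATORIAL_RADIUS_KM = 6378.137
POLAR_RADIUS_KM = 6356.752


def equatorial_circumference(a):
    """Circumference of the circle traced around the equator."""
    return 2 * math.pi * a


def meridional_circumference(a, b):
    """Approximate perimeter of the ellipse passing through both poles.

    Uses Ramanujan's approximation for the perimeter of an ellipse
    with semi-axes a and b.
    """
    h = ((a - b) / (a + b)) ** 2
    return math.pi * (a + b) * (1 + 3 * h / (10 + math.sqrt(4 - 3 * h)))


around_equator = equatorial_circumference(EQUATORIAL_RADIUS_KM)
around_poles = meridional_circumference(EQUATORIAL_RADIUS_KM, POLAR_RADIUS_KM)

print(f"Around the equator: {around_equator:,.0f} km")
print(f"Around the poles:   {around_poles:,.0f} km")
print(f"Difference:         {around_equator - around_poles:,.0f} km")
```

Under these assumptions the two figures come out at roughly 40,075 km and 40,008 km, so two pages that quote different numbers may both be right about the thing they measured.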

Since Google search is so fast, it is easy to explore variations of an idea and check whether the result you found actually applies to the context you are trying to understand.

1. To verify the source of a piece of information, use the precise information you have.

One piece of advice that's given to every investigative reporter, that every detective follows, and that every searcher should follow, is to identify the sources of information. If you read something like, "63 percent of all kids looked at pictures of funny cats today," one of the first things you should be asking yourself is, “Where did that come from?”

To track a particular fact back to its source, try pulling out terms that you can be fairly sure will appear in any page that talks about your particular fact. In this case, terms like 63, kids, pictures, funny, and cats might all be useful. Try searching:

[63 percent kids pictures funny cats]


Figure: Search results for the query [63 percent kids pictures funny cats].

What you may notice here is that there is no sign of the statistic you seek. You could play around some more, but you would still be unable to find an original source, because this is not a real statistic. The fact that nothing comes up with this search starts to suggest that something is amiss.
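The step of pulling distinctive terms out of a claim can be sketched in a few lines of Python. This is only an illustration: the function name source_query and the hand-picked FILLER_WORDS list are hypothetical, chosen for this one example, and do not reflect how Google processes a query.

```python
import re

# Hand-picked words that are unlikely to be distinctive of this claim.
# Hypothetical list for illustration only; a real searcher makes this
# judgment by eye, claim by claim.
FILLER_WORDS = {
    "a", "an", "the", "of", "all", "at", "in", "on", "to",
    "is", "are", "was", "were", "that", "looked", "today",
}


def source_query(claim):
    """Keep the distinctive words and numbers of a claim, in original order."""
    words = re.findall(r"[a-z0-9]+", claim.lower())
    kept = []
    for word in words:
        # Skip filler words and avoid repeating a term already kept.
        if word not in FILLER_WORDS and word not in kept:
            kept.append(word)
    return "[" + " ".join(kept) + "]"


claim = "63 percent of all kids looked at pictures of funny cats today"
print(source_query(claim))
```

For the claim above, this reproduces the query [63 percent kids pictures funny cats]. Note that the precise number is kept on purpose: when tracing a source, the exact figure is one of the best clues you have.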

Compare the above search with this scenario: you hear that the British were projected to win 62 medals in this summer’s Olympic games. You want to find out where that statistic originated, so you try a search like:

[62 british medals olympics]


Figure: Search results for the query [62 british medals olympics].

In these results, you can see that there is, indeed, a study making such a claim. You could click on these links and find out who wrote the study, and either follow links or run one more search to pull up the study itself.

This process says nothing about the quality of the information in the study, but it does allow you to identify the precise source of your original statistic.

Whenever you ask a question like this, you want to understand: Where did that number come from? How was it measured? What are the other factors that go into making that assertion true or not true? Then, you start considering: credible or not credible?

2. To confirm a fact, use a generic description for what you seek.

While identifying a source often includes searching for a particular number, name, or other precise piece of information, if you want to confirm that a fact is true, you need to take a different approach. Searching for [president alexander hamilton] may bring back documents that mistakenly identify the historical figure Alexander Hamilton as a former president of the United States, whereas searching for [alexander hamilton biography] or just [alexander hamilton] would allow you to find and read documents that make it clear he never served as President.

In another example, take a query like [the average length of an octopus is 18 inches].


Figure: Search results for the query [the average length of an octopus is 18 inches].

With a query like this, you get some very nice looking results. But then, if you look in the snippet, it says “13 to 18 inches in diameter” and “18 inches across.” People will scan these initial results, find the thing that they were asking about, in this case 18 inches, and think that their hypothesis is confirmed.

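Don't be misled. The snippet is only an extract from the underlying web page, and the number it shows may describe something other than what you asked about. Here, the 18 inches refers to a diameter or a span, not an average length.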

If your query contains the answer you think you are looking for (that is, you're looking for confirmation), you fall prey to what we call confirmation bias: you're only looking for things you already believe to be true.

In fact, what you should be doing is running a query more like [average length octopus].


Figure: Search results for the query [average length octopus].

This gives you a wider range of results that you can then compare to your initial assumption.
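The rewrite from the leading query to the neutral one can also be sketched in Python. This is a crude illustration under stated assumptions: the function name neutral_query and the FILLER_WORDS and UNITS lists are hypothetical, and the rule simply drops every number and unit word rather than working out which part of the query is the asserted answer.

```python
import re

# Hypothetical word lists for illustration only.
FILLER_WORDS = {"a", "an", "the", "of", "is", "are", "was", "were"}
UNITS = {
    "inch", "inches", "foot", "feet", "cm", "meters", "metres",
    "km", "miles", "percent",
}


def neutral_query(query):
    """Strip the asserted answer (numbers and units) from a query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    kept = []
    for word in words:
        if word.isdigit():
            continue  # drop the number the searcher hoped to confirm
        if word in UNITS or word in FILLER_WORDS:
            continue  # drop unit words and filler
        kept.append(word)
    return "[" + " ".join(kept) + "]"


print(neutral_query("the average length of an octopus is 18 inches"))
```

For the octopus example, this yields [average length octopus]. This is the opposite of source tracing: there you keep the exact number to find where it came from, while here you remove it so the results are not steered toward the answer you expect.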

So, keep in mind that there are different ways you can phrase your query, depending on your desired outcome. You can look for confirmation of information you already know, or you can try to find the range of variation in what is out there. What you should do as a good searcher is gather a variety of information and then use your skills to synthesize, filter, and organize that information to get to the bottom of it.

Try to do that in the next activity.

Power Searching with Google © 2015 Google, Inc.  (9-3-15) DMR