How Google Flu Trends Works


GFT: Model Updates, Accuracy and the Big Data Trap
One of the problems with analyzing search data to determine illness trends is that it doesn’t account for people who aren’t sick, but are fretful about coming down with something.
© Hemera/Thinkstock

Before each flu season, the Google Flu Trends model is refreshed with 45 of the most useful influenza-related queries from previous years. Those search terms are chosen using logistic regression, but the exact queries and how they're weighted against one another are kept secret.

Additionally, GFT's post-season estimates are assessed against the traditional data surveillance reports used by the CDC to see how well the two match. Based on the prediction tool's ability to accurately estimate when that year's flu season begins, when the season will peak, and how severe it will be, the model may be updated. When it first launched in 2008, GFT had a mean correlation of 97 percent with CDC data [source: Ginsberg].
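To make the idea of a "correlation with CDC data" concrete, here is a minimal sketch of how a Pearson correlation between two weekly series could be computed. The numbers are hypothetical and purely illustrative (they are not GFT or CDC figures), and the function name `pearson` is just a label chosen for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly percentages of doctor visits for flu-like illness.
cdc_reported = [1.2, 1.8, 2.9, 4.1, 3.3, 2.0]
model_estimate = [1.3, 1.9, 3.1, 4.4, 3.2, 2.1]

r = pearson(cdc_reported, model_estimate)
print(f"correlation: {r:.3f}")
```

A value close to 1 means the two curves rise and fall together. It does not by itself mean the estimates are the right size, which is why a model can correlate well and still overestimate.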

In September 2009, the U.S. version of Google Flu Trends got its first model update, which added search query data from the H1N1 outbreak. The update came because GFT's model had badly underestimated the H1N1 swine flu pandemic (which happened in the summertime, outside the usual flu season). Even after the update, GFT continued to miss the mark.

During the 2011/2012 flu season, GFT overestimated the prevalence of flu by 50 percent. GFT also overestimated the 2012/2013 flu season, estimating as many as double the number of outpatient visits for influenza-like illness (ILI) that the CDC actually reported. At the peak of the 2013/2014 flu season, GFT estimated that as many as 11 percent of the U.S. population had the flu. If that seems like a lot, it's because it is: the CDC, in comparison, reported 6 percent that season. Researchers report that the tool's accuracy may actually be much worse; they found that, beginning in August 2011, GFT had overestimated flu prevalence in 100 out of 108 weeks [sources: Hodson, Walsh, Lazer].
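The size of these misses is easier to compare when expressed as a relative error. The short sketch below works through the arithmetic using only the figures quoted above; the helper name `relative_overestimate` is made up for this example.

```python
def relative_overestimate(estimate, reference):
    """How far an estimate exceeds a reference value, as a fraction of the reference."""
    return (estimate - reference) / reference

# Peak of the 2013/2014 season: GFT said 11 percent, the CDC reported 6 percent.
peak_error = relative_overestimate(11.0, 6.0)

# Weeks in which GFT overestimated, out of the 108 weeks studied.
share_of_weeks = 100 / 108

print(f"peak overestimate: {peak_error:.0%}")
print(f"weeks overestimated: {share_of_weeks:.1%}")
```

In other words, an estimate of 11 percent against a reported 6 percent is roughly 83 percent too high, and overestimating in 100 of 108 weeks means the error ran in the same direction about 93 percent of the time.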

The most common explanation for Google's flu prevalence overestimation is nothing more than our own jitteriness when flu season rolls around. You know, when you search the word "cough" in an effort to figure out if you're coming down with the flu, a cold or, maybe, wait, could it be pneumonia? Seasonal flu coverage in the media, with phrases like "the worst flu season in years," also contributes to our cough-obsessed searches. The problem is that GFT doesn't know whether you're sick or just worried about getting sick; consider that only about 10 percent of all the people who seek medical care for the flu actually have influenza [source: Salzberg]. Search queries carry no context, and the model can't tell what your intent was.

But that might not be the complete answer.

In addition to ILI-related media hype inflating flu searches, working with big data can produce correlations that aren't meaningful. It's the big data trap. Mining the data may turn up a relationship between seasonal search queries and, say, doctor visits, but when a huge pool of candidate queries is compared against a relatively short record of flu data, some queries will track the flu curve purely by chance. A strong correlation found that way can't be trusted on its own.
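The big data trap can be reproduced with nothing but random numbers. The sketch below is an illustration, not a reconstruction of anything Google did: it generates a short, purely random "flu" series, then searches thousands of equally random "query" series for the one that correlates best with it. The series length, the number of candidates and the seed are arbitrary choices made for this example.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(42)  # fixed seed so the run is repeatable
WEEKS = 10               # a short record of "flu activity"
CANDIDATES = 5000        # a large pool of unrelated "search queries"

# Both the target and every candidate are pure noise: no real relationship exists.
target = [rng.gauss(0, 1) for _ in range(WEEKS)]

best_r = 0.0
for _ in range(CANDIDATES):
    candidate = [rng.gauss(0, 1) for _ in range(WEEKS)]
    r = pearson(target, candidate)
    if abs(r) > abs(best_r):
        best_r = r

print(f"strongest correlation found among {CANDIDATES} random series: {best_r:.2f}")
```

Even though nothing here is related to anything else, the best match looks impressively strong. The more candidates there are and the shorter the record, the easier it is to find such accidental matches.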

Another possible cause of GFT's overestimation lies in Google's own search engine updates. Researchers propose that the introduction of the autosuggest feature in Google Search changed user behavior in a way that inflated GFT's estimates: users searching for one flu symptom were now being encouraged to search for more (Google-recommended) flu-related terms, increasing the overall volume of ILI-related searches.

In 2012, the search engine began including possible conditions related to the symptoms queried, also potentially adding to the overestimation problem.

After another poor performance in the 2012/2013 flu season, GFT's algorithm was updated again. It would now downplay media-driven spikes in search activity and base its estimates on a statistical method called Elastic Net, a form of regularized linear regression. But there was still room for improvement; the revised algorithm still overestimated by as much as 30 percent [source: Lohr].
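For readers curious about what Elastic Net regression does, here is a minimal sketch. It is not Google's implementation, which has not been published in this form; it is a small coordinate-descent solver written for illustration, using the objective that common statistics libraries use (squared error plus a mix of L1 and L2 penalties). The toy data, the parameter names `alpha` and `l1_ratio`, and the assumption that the inputs are already centered (so no intercept is fitted) are all choices made for this example.

```python
def soft_threshold(value, threshold):
    """Shrink a value toward zero, setting it to exactly zero if it is small."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, alpha=0.2, l1_ratio=0.5, n_iter=200):
    """Fit weights by coordinate descent, minimizing
    (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    Assumes X and y are centered, so no intercept term is fitted."""
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # how well feature j explains what the other features miss
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * weights[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            weights[j] = soft_threshold(rho, l1) / (z + l2)
    return weights

# Hypothetical, centered "query volume" signals for three search terms over four weeks.
X = [
    [1.0, 1.0, 1.0],
    [-1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# Hypothetical flu signal: driven almost entirely by the first term,
# with only a faint trace of the third.
y = [2.05, -2.05, 1.95, -1.95]

weights = elastic_net(X, y)
print(weights)
```

On this toy data the first term keeps a large, slightly shrunken weight, while the unrelated second term and the barely related third term are dropped to exactly zero. That combination of shrinking and pruning is what makes the method appealing when there are many candidate queries and some of them match only by accident.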

In 2014, GFT engineers updated the GFT tool to include not only refreshed search data but also the traditional clinical and virological so-called small data from the CDC for the 2014/2015 flu season. Both engineers and scientists agree a combination of this information should lead to more accurate results.

Author's Note: How Google Flu Trends Works

What a week to immerse yourself in influenza; the day I was writing about how the CDC monitors and analyzes flu data was the same day CDC health officials announced that this year's flu season could be severe — because one of the virus strains (and the one that's most dominant so far this season) used in this year's vaccine has mutated. Keep your eye on Google Flu Trends.

Sources

  • Arce, Nicole. "Google Flu Trends got it wrong: Flu prediction tool gets updated." Tech Times. Nov. 1, 2014. (Dec. 5, 2014) http://www.techtimes.com/articles/19247/20141101/google-flu-trends-got-it-wrong-flu-prediction-tool-gets-updated.htm
  • Arthur, Charles. "Google Flu Trends is no longer good at predicting flu, scientists find." The Guardian. March 27, 2014. (Dec. 5, 2014) http://www.theguardian.com/technology/2014/mar/27/google-flu-trends-predicting-flu
  • Butler, Declan. "When Google got flu wrong." Nature. Feb. 13, 2013. (Dec. 5, 2014) http://www.nature.com/news/when-google-got-flu-wrong-1.12413
  • Centers for Disease Control and Prevention. "Deaths: Final Data for 2011." (Dec. 5, 2014) http://www.cdc.gov/nchs/data/nvsr/nvsr63/nvsr63_03.pdf
  • Centers for Disease Control and Prevention. "Influenza (Flu)." Dec. 4, 2014. (Dec. 5, 2014) http://www.cdc.gov/flu/
  • CNBC. "The world's 10 leading causes of death." (Dec. 5, 2014) http://www.cnbc.com/id/101388499/page/1
  • Copeland, Patrick. "Google Disease Trends: An Update." Google.org. (Dec. 5, 2014) http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/41763.pdf
  • Fox, Susannah. "The social life of health information." Pew Research Center. Jan. 15, 2014. (Dec. 5, 2014) http://www.pewresearch.org/fact-tank/2014/01/15/the-social-life-of-health-information/
  • Fung, Kaiser. "Google Flu Trends' Failure Shows Good Data > Big Data." Harvard Business Review. March 25, 2014. (Dec. 5, 2014) https://hbr.org/2014/03/google-flu-trends-failure-shows-good-data-big-data/
  • Ginsberg, Jeremy. "Letter: Detecting influenza epidemics using search engine query data." Nature. Vol. 457. Pages 1012-1014. Feb. 19, 2009. (Dec. 5, 2014) http://www.nature.com/nature/journal/v457/n7232/suppinfo/nature07634.html
  • Goldschmidt, Debra. "CDC: Flu shot less effective this year because current virus has mutated." CNN. Dec. 4, 2014. (Dec. 5, 2014) http://www.cnn.com/2014/12/04/health/flu-vaccine-mutated-virus/
  • Google.org. "Flu Trends." 2014. (Dec. 5, 2014) http://www.google.org/flutrends/
  • Harvard Medical School - Harvard University. "10 flu myths." (Dec. 5, 2014) http://www.health.harvard.edu/flu-resource-center/10-flu-myths.htm
  • Hodson, Hal. "Google Flu Trends gets it wrong three years running." New Scientist. March 13, 2014. (Dec. 5, 2014) http://www.newscientist.com/article/dn25217-google-flu-trends-gets-it-wrong-three-years-running.html
  • Lazer, David. "Google Flu Trends Still Appears Sick: An Evaluation of the 2013-2014 Flu Season." Social Science Research Network. March 13, 2014. (Dec. 5, 2014) http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2408560
  • Lazer, David. "The Parable of Google Flu: Traps in Big Data Analysis." Science. Vol. 343, No. 6176, Pages 1203-1205. March 14, 2014. (Dec. 5, 2014) http://www.sciencemag.org/content/343/6176/1203
  • Lohr, Steve. "Google Flu Trends: The Limits of Big Data." The New York Times. March 28, 2014. (Dec. 5, 2014) http://bits.blogs.nytimes.com/2014/03/28/google-flu-trends-the-limits-of-big-data/
  • Oremus, Will. "Going Viral." Slate. Jan. 9, 2013. (Dec. 5, 2014) http://www.slate.com/articles/technology/technology/2013/01/flu_shot_time_google_flu_trends_predicts_worst_season_on_record.html
  • Salzberg, Steven. "Why Google Flu Is A Failure." Forbes. March 23, 2014. (Dec. 5, 2014) http://www.forbes.com/sites/stevensalzberg/2014/03/23/why-google-flu-is-a-failure/
  • Stefansen, Christian. "Google Flu Trends gets a brand new engine." Google Research Blog - Google. Oct. 31, 2014. (Dec. 5, 2014) http://googleresearch.blogspot.com/2014/10/google-flu-trends-gets-brand-new-engine.html
  • Stromberg, Joseph. "Why Google Flu Trends Can't Track the Flu (Yet)." Smithsonian Magazine. March 13, 2014. (Dec. 5, 2014) http://www.smithsonianmag.com/ist/?next=/science-nature/why-google-flu-trends-cant-track-flu-yet-180950076/
  • Walsh, Bryan. "Google's Flu Project Shows the Failings of Big Data." Time. March 13, 2014. (Dec. 5, 2014) http://time.com/23782/google-flu-trends-big-data-problems/