Before each new flu season, the Google Flu Trends model is refreshed with 45 of the most useful influenza-related queries from previous years (those search terms are chosen using logistic regression, but the exact queries and how they're weighted against one another are kept secret).
Additionally, GFT's post-season estimates are compared with the CDC's traditional surveillance reports to see how well the two match. The model may then be updated, depending on how accurately it estimated when that year's flu season began, when it peaked and how severe it was. When it first launched in 2008, GFT had a mean correlation of 97 percent with CDC data [source: Ginsberg].
In September 2009, the model for the U.S. version of Google Flu Trends got its first update, which added search query data from the H1N1 outbreak. The update was needed because GFT's model had badly underestimated the H1N1 swine flu pandemic (which happened in the summertime). Even after the update, it continued to miss the mark.
During the 2011/2012 flu season, GFT overestimated the prevalence of flu by 50 percent. GFT also overestimated the 2012/2013 flu season, predicting as much as double the number of ILI-related outpatient visits that the CDC actually reported. At the peak of the 2013/2014 flu season, GFT estimated that as many as 11 percent of the U.S. population had the flu. If that seems like a lot, it's because it is: the CDC, in comparison, reported 6 percent that season. Researchers report that the tool's accuracy may actually be much worse; they found that, beginning in August 2011, GFT had overestimated flu prevalence in 100 out of 108 weeks [sources: Hodson, Walsh, Lazer].
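To put those figures on a common scale, here is a minimal Python sketch of the arithmetic. It uses only the numbers quoted above; the function name `relative_overestimate` is made up for illustration and is not part of GFT or any CDC tool.

```python
def relative_overestimate(estimate, reference):
    """Fraction by which an estimate exceeds a reference value.

    A result of 0.5 means the estimate was 50 percent too high.
    """
    return (estimate - reference) / reference


# Peak-season figures quoted in the text: GFT said 11 percent, the CDC said 6 percent.
peak_gap = relative_overestimate(11.0, 6.0)

# Share of weeks in which GFT overestimated: 100 out of 108.
share_weeks_high = 100 / 108

print(f"GFT peak estimate was {peak_gap:.0%} above the CDC figure")
print(f"GFT overestimated in {share_weeks_high:.0%} of the weeks studied")
```

In other words, an 11 percent estimate against a 6 percent report is roughly 83 percent too high, and 100 out of 108 weeks works out to about 93 percent of the period.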
The most common explanation for Google's flu prevalence overestimation is nothing more than our own jitteriness when flu season rolls around. You know, when you search the word "cough" in an effort to figure out if you're coming down with the flu, a cold or, maybe, wait, could it be pneumonia? Seasonal media reports, with phrases like "the worst flu season in years," also contribute to our cough-obsessed searches. The problem is that GFT doesn't know whether you're sick or just worried about getting sick; consider that only about 10 percent of all the people who seek medical care for the flu actually have influenza [source: Salzberg]. Search queries carry no context, and GFT can't know your intent.
But that might not be the complete answer.
In addition to ILI-related media hype inflating flu searches, working with big data can lead to correlations that aren't real. It's the big data trap. Mining the data may turn up a relationship between seasonal search queries and, say, doctor visits, but with a vast pool of candidate search terms and comparatively few weeks of flu data to match them against, some terms will track flu activity purely by chance. A correlation found that way can't be trusted to hold up in the future.
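The big data trap is easy to reproduce with purely random numbers. The sketch below is a toy simulation, not Google's method: the series lengths, the number of candidate "queries" and the function names are all assumptions chosen for illustration. It generates one random "flu" series, compares it with thousands of equally random "query" series, and reports the strongest correlation it finds.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_chance_correlation(n_weeks=20, n_queries=2000, seed=0):
    """Strongest absolute correlation between a random 'flu' series and
    many unrelated random 'query' series (all values are pure noise)."""
    rng = random.Random(seed)
    flu = [rng.gauss(0, 1) for _ in range(n_weeks)]
    best = 0.0
    for _ in range(n_queries):
        query = [rng.gauss(0, 1) for _ in range(n_weeks)]
        best = max(best, abs(pearson(flu, query)))
    return best


best = best_chance_correlation()
print(f"strongest correlation among unrelated series: {best:.2f}")
```

None of the simulated series has anything to do with the others, yet the best match out of thousands still looks like a solid relationship. The more candidates you test against a short record, the more impressive the best accidental match becomes.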
Another possible cause of GFT's overestimation lies in Google's own search engine updates. Researchers propose that the introduction of the autosuggest feature in Google Search changed user behavior in a way that could inflate GFT's estimates; users searching for one flu symptom were now being encouraged to search for more (Google-recommended) flu-related terms, pushing up overall ILI-related searches.
In 2012, the search engine began including possible conditions related to the symptoms queried, also potentially adding to the overestimation problem.
After another poor performance in the 2012/2013 flu season, GFT's algorithm was updated again. It would now dampen any media-driven irregularities and make its forecasts using a statistical method called ElasticNet (a form of regularized linear regression that penalizes large or unnecessary coefficients). But there was still room for improvement; the revised algorithm still overestimated by as much as 30 percent [source: Lohr].
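For readers curious about what ElasticNet does, the general textbook form of the method is sketched below. This is not Google's actual formulation, which has not been published in this detail; the symbols are assumptions introduced here: y_i is the reported flu level in week i, x_i is the vector of search-query volumes for that week, β holds the weight given to each query, and λ and α are tuning settings.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - x_i^{\top}\beta\bigr)^{2}
  + \lambda\left(\alpha\,\lVert\beta\rVert_{1}
  + \frac{1-\alpha}{2}\,\lVert\beta\rVert_{2}^{2}\right)
```

The first term rewards weights that fit past flu data. The second term is the penalty: the part scaled by α can push the weights of unhelpful queries all the way to zero, while the remaining part keeps the other weights from growing too large. Setting λ to zero removes the penalty and leaves ordinary least-squares regression.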
In 2014, GFT engineers updated the tool for the 2014/2015 flu season to include not only refreshed search data but also the CDC's traditional clinical and virological data, the so-called small data. Both engineers and scientists agree that combining the two kinds of information should lead to more accurate results.