As many as 72 percent of American adult Internet users say they've looked up health information online in the past year. That's about 90 million people, most of them searching for information about specific conditions, such as a cough or the flu, or about treatments, such as antibiotics. And more than three-quarters of those who look for health information online start their search at Google, Bing or Yahoo [sources: Fox, Ginsberg]. Think about what kind of information is sitting in those search engine databases. Well, Google did.
Google Flu Trends (GFT) is an Internet-based influenza surveillance tool that uses aggregated search query data to predict flu trends in more than 25 countries, including the U.S. The project began in 2008 as an initiative of Google's philanthropic arm, Google.org. The idea grew out of an observation: searches for certain types of terms spike at predictable times of year.
For example, when springtime allergies strike, we're more likely to search for antihistamines. During the winter flu season, we're more likely to search for information about cold and flu symptoms, such as fever or chills.
Google engineers worked with five years of historical search data, and it was big data in every sense. They tapped their database of the 50 million most common search queries to establish a baseline of general flu activity. The initial algorithm for the prediction tool relied solely on regional flu-related search query data (with each query's region determined by its IP address), covering broad topics such as general influenza symptoms, cold remedies and antiviral medications.
The algorithm compares real-time search query data (the word or phrase you typed as your search term, such as "sore throat") against the baseline to gauge regional flu activity, which it rates on a five-level scale from minimal to intense. In theory, GFT could report flu activity in near real time and predict influenza outbreaks weeks before the CDC compiles a report.
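Ginsberg and colleagues reported a simple linear model on log-odds values, logit(P) = β0 + β1 × logit(Q), where Q is the fraction of searches that are flu-related and P is the estimated fraction of doctor visits for influenza-like illness. The Python sketch below applies that form and then maps the estimate onto five levels by comparing it with a regional baseline. The coefficients, the baseline and the cut points between levels are hypothetical values chosen for illustration (the article doesn't give GFT's real thresholds), and the three middle level names are assumed.

```python
import math
from bisect import bisect_right


def logit(p: float) -> float:
    """Log-odds transform; p must lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical coefficients. A real model would fit these against
# historical surveillance data for each region.
BETA0 = -0.5
BETA1 = 0.9


def estimate_ili(query_fraction: float) -> float:
    """Estimate the fraction of doctor visits for flu-like illness
    from the fraction of searches that are flu-related."""
    return inv_logit(BETA0 + BETA1 * logit(query_fraction))


# Five activity levels; only "minimal" and "intense" come from the text.
LEVELS = ["minimal", "low", "moderate", "high", "intense"]
# Hypothetical cut points on the ratio (estimate / regional baseline).
CUTS = [1.0, 1.5, 2.0, 3.0]


def activity_level(query_fraction: float, baseline_ili: float) -> str:
    """Classify activity by how far the estimate sits above the baseline."""
    ratio = estimate_ili(query_fraction) / baseline_ili
    # bisect_right counts how many cut points the ratio has reached.
    return LEVELS[bisect_right(CUTS, ratio)]


# Example: this week's flu-related share of searches in one region,
# against a hypothetical baseline of 2 percent of doctor visits.
print(activity_level(0.03, baseline_ili=0.02))
```

With these made-up numbers, a flu-query share of 1 percent rates as "minimal" and a share of 20 percent rates as "intense". Working in log-odds keeps every estimate between 0 and 1, however extreme the query share.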
According to its creators, though, GFT's near-real-time reporting is meant to complement, not replace, the clinical and virological data gathered through traditional surveillance (the CDC and its networks). GFT's speed is intended to aid early detection not only of flu epidemics but also of new viral strains and potential pandemics.