Study Finds Online Surveys Vulnerable to Bad Data

(CN) – Online political polls have surged in recent years as an alternative to traditional telephone surveys, but the Pew Research Center said Tuesday that a small but measurable percentage of online survey respondents provide bogus answers.

According to Pew’s analysis, more than 80% of political polls are now conducted online via an opt-in approach. Typically, opt-in respondents are either paid to take surveys through crowdsourcing websites or recruited into sample populations gathered from various sources. In either case, Pew found that 4% to 7% of opt-in respondents did not answer questions about themselves truthfully or gave illogical responses.

For example, researchers found that those labeled as bogus respondents often answered questions as though the survey were for market research, such as replying “great product” to a question about the Affordable Care Act, colloquially known as Obamacare after former President Barack Obama.

The data also suggested that such respondents were more likely to indicate approval on any given question, whether it concerned the health care law or President Donald Trump’s job performance. In an aggregate of surveys conducted last year, Pew found a significant disparity between overall approvals and approvals from bogus respondents. Overall, an average of 41% of respondents approved of President Trump, but respondents flagged as bogus in quality checks indicated 78% average approval.

Pew also found that bogus respondents tended to take the same surveys multiple times, rush through the questions, give unrelated answers or even plagiarize answers to open-ended questions.

In one colorful example, a bogus respondent answered a question with, “Y’all need a panda tail to go to bed and go get food or drinks sugar or drinks and then I eat a chicken nuggets.” In another open-ended question about policy direction in Washington, D.C., one respondent simply copied the Wikipedia summary for the state of Washington.

In sum, Pew found that bogus respondents were often trying to take many surveys for the small fees they earned. Analysts also indicated that a certain number of the respondents are bots, or artificial intelligence software, though they noted it is difficult to separate human respondents flagged in quality checks from outright bots.

Notably, this phenomenon is nonpartisan in nature. Analysts found that the only bias bogus respondents appeared to hold was toward giving positive answers to questions, rather than answering in ways that would support or detract from policy goals of either major political party. Overtly partisan questions that could not be answered with a simple positive or negative response reduced the likelihood of bogus data.

Analysts found that the most genuine data came from sample populations gathered through traditional methods, such as sampling residential addresses. These surveys were not opt-in, but were taken online after the sample population was compiled. With this method, researchers found that only 1% of respondents were bogus, compared with 4% to 7% in opt-in surveys.

The inclusion of measurable amounts of bogus data in online surveys, however, is a relatively minor problem overall. Bogus data tended to fit within the established margin of error, and subsequent polling diluted tainted data further by indicating broader trends.

Despite growing concerns about the reliability of polling as a metric in public discourse, analysts found that traditional sampling methods paired with sheer volume will nonetheless yield accurate results, even if the actual survey is conducted online.
