Too Good to be True: How AI is Impacting Data Quality

Picture this: you’re reviewing survey data and reading open-ended responses. You’ve just found the epitome of a good open end: a thorough, insightful answer with no spelling errors and no profanity. It’s perfect.

But is it… too perfect? This time last year, this probably wasn’t a question we would be asking ourselves.

We have tracked a steady decline in sample quality conversions since November 2022, driven by an increase in suspected fraud activity and rising project-level data quality removals. It’s probably no coincidence that ChatGPT, an artificial intelligence chatbot developed by OpenAI, launched on November 30, 2022. ChatGPT exploded into the mainstream practically overnight, reaching over 1 million users in just five days. Since then, we have seen more suspicious open ends from historically valid panelists using AI to cut corners, more nefarious organizations bypassing fraud detection systems, and a resurgence of the haunting ghost completes.

The use of artificial intelligence by fraudsters is one of the clearest examples of how fraud evolves over time, always keeping us on our toes. Quality checks that work now, or worked in the past, are not future-proof. The research industry will continue to improve its tools to block fraud, but fraudsters are also becoming more sophisticated in their ability to break into surveys. It will take a concerted, collaborative effort across the industry to counter AI-enabled fraud.

While we do see an increase in possible fraud and questionable quality, heightened awareness of these activities has also made the everyday data cleaner suspicious of everything: not only responses lacking insight, but also those with too much of it. Where we used to flag open ends with too few words, we’re now also suspicious of too many words, repeated use of similar wording, responses with exactly the same word count, responses with perfect punctuation, and multiple responses sharing similar misspellings.
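
To make a few of these heuristics concrete, here is a minimal Python sketch. It assumes open ends arrive as a dictionary mapping respondent IDs to response text. The function names and the thresholds (120 words, 15 words, 0.6 word overlap) are hypothetical illustrations, not industry standards. It flags three things: unusually long answers, longer answers that share an exact word count, and pairs of answers whose vocabularies overlap heavily (measured as Jaccard similarity). Punctuation and misspelling checks are left out because they would need a dictionary or language model of their own.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical thresholds for illustration only; tune them on your own data.
MAX_WORDS = 120                 # answers longer than this get a "too_long" flag
MIN_WORDS_FOR_COUNT_MATCH = 15  # short answers often share a word count by chance
SIMILARITY_THRESHOLD = 0.6      # Jaccard overlap that counts as "similar wording"


def tokenize(text: str) -> list[str]:
    """Lowercase a response and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of distinct words that two responses have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_open_ends(responses: dict[str, str]) -> dict[str, set[str]]:
    """Return review flags per respondent ID.

    Flags are signals for a human reviewer, not grounds for automatic removal.
    """
    tokens = {rid: tokenize(text) for rid, text in responses.items()}
    flags: dict[str, set[str]] = {rid: set() for rid in responses}

    # 1. Unusually long answers.
    for rid, words in tokens.items():
        if len(words) > MAX_WORDS:
            flags[rid].add("too_long")

    # 2. Longer answers that share an exact word count with another answer.
    counts = Counter(len(words) for words in tokens.values())
    for rid, words in tokens.items():
        n = len(words)
        if n >= MIN_WORDS_FOR_COUNT_MATCH and counts[n] > 1:
            flags[rid].add("shared_word_count")

    # 3. Pairs of answers whose vocabularies overlap heavily.
    vocab = {rid: set(words) for rid, words in tokens.items()}
    for a, b in combinations(vocab, 2):
        if jaccard(vocab[a], vocab[b]) >= SIMILARITY_THRESHOLD:
            flags[a].add("similar_wording")
            flags[b].add("similar_wording")

    return flags


if __name__ == "__main__":
    sample = {
        "r1": "I like the product because it is easy to use and saves me time every day.",
        "r2": "I like the product because it is easy to use and saves me time each day.",
        "r3": "Too pricey.",
    }
    for rid, found in flag_open_ends(sample).items():
        print(rid, sorted(found))
```

On the sample, r1 and r2 each receive `shared_word_count` and `similar_wording`, and r3 receives no flags. The function returns flags instead of deleting rows, so a reviewer can weigh several signals before removing anyone. That keeps the risk of tossing out honest human answers lower. The pairwise comparison grows quadratically with the number of respondents. For very large samples, a blocking or hashing step would be needed.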

With more thorough data cleaning measures being implemented (as they should be), there is cause for concern about the percentage of honest human responses that are being tossed out of an abundance of caution.

One quick tip for spotting AI-generated responses is to include an open-ended question that asks for the respondent’s opinion or feelings on a topic. AI chatbots currently tend not to express genuine opinions or feelings, so their answers are usually fact-driven and a bit easier to identify. We also suggest including copy/paste detection in your survey. Simply blocking a respondent who attempts to paste an answer is not enough.

There is no good reason for a respondent to copy a response or question, either, so copying can also be added as a flag in your programmatic quality checks. When AI use is discovered, it is important to report the details back to the recruitment source so they can take a deeper dive into the panelist’s behavior.
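
Here is a minimal Python sketch of recording copying and pasting as flags, rather than only blocking them. It assumes the survey page already logs the browser’s `copy` and `paste` events along with a respondent ID and question ID. The `ClipboardLogEntry` record, `clipboard_flags`, and `format_report` are hypothetical names, not part of any particular survey platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipboardLogEntry:
    """One logged clipboard event from the survey page (hypothetical schema)."""
    respondent_id: str
    question_id: str
    kind: str  # e.g. "copy", "paste", or other logged event types


def clipboard_flags(entries: list[ClipboardLogEntry]) -> dict[str, set[str]]:
    """Map each respondent ID to the copy/paste flags it triggered.

    Both copying and pasting are recorded, rather than only blocked, so the
    evidence survives for the quality review.
    """
    flags: dict[str, set[str]] = {}
    for entry in entries:
        if entry.kind in ("copy", "paste"):
            flags.setdefault(entry.respondent_id, set()).add(
                f"{entry.kind}:{entry.question_id}"
            )
    return flags


def format_report(flags: dict[str, set[str]]) -> list[str]:
    """Render flags as sorted, one-line-per-respondent text for the recruitment source."""
    return [f"{rid}: {', '.join(sorted(found))}" for rid, found in sorted(flags.items())]


if __name__ == "__main__":
    log = [
        ClipboardLogEntry("r1", "q5", "paste"),
        ClipboardLogEntry("r1", "q2", "copy"),
        ClipboardLogEntry("r2", "q5", "focus"),  # not a clipboard event, ignored
    ]
    for line in format_report(clipboard_flags(log)):
        print(line)
```

Keeping the question ID with each flag gives the recruitment source something specific to investigate, rather than a bare "suspicious" label.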

Some market research organizations have already started acquiring, building, and implementing AI platforms to assist with their research. We anticipate good things as the industry collaborates to better understand AI technology and continues to strive toward quality. Moral of the story: don’t throw the baby out with the bathwater, but make sure the little guy gets a good scrub before we dress him up and send him out into the world.