HiveBrain v1.2.0

In what ways can we distinguish between human and bot behavior?

Submitted by: @import:stackexchange-cs

Problem

Updated based on comments:

In what ways can we distinguish between a human doing certain activities online and a bot programmed to do similar activities, say checking email, downloading music files, shopping on eBay, searching on Google, etc., or maybe trying to deface/hack a website, brute-force a login password, and so on?

To limit the scope of the question and make it clearer, let us restrict our observations to network-oriented behavior only; some examples being the amount of time spent doing XYZ online, the amount/type of data downloaded (say) from a file-sharing website, the number of friends/followers on social media websites, etc.

I guess it should be possible to obtain some 'patterns' which distinguish human behavior from programmed behavior.

The Turing test is not what I am looking for.

What techniques can be useful here? Machine learning? Game theory?

References to relevant academic/research articles will also be good.

Solution

The most common/obvious way is a challenge-response test that is easy for humans but hard for computers (most notably, but not only, CAPTCHA).

This kind of test is very effective{1}, but it falls under the HIP (Human Interactive Proofs) area: it is not transparent to the user.
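To make the protocol shape concrete, here is a toy challenge-response sketch. The arithmetic question stands in for a real CAPTCHA puzzle (which would use distorted images or behavioural signals); the function names are illustrative assumptions, not any real CAPTCHA API.

```python
import secrets

def make_challenge():
    """Issue a question that is easy for a human (in this toy, trivial
    arithmetic) and return the expected answer for server-side storage."""
    a, b = secrets.randbelow(10), secrets.randbelow(10)
    challenge_id = secrets.token_hex(8)
    # In a real system the expected answer is stored server-side,
    # keyed by challenge_id, and never sent to the client.
    return challenge_id, f"What is {a} + {b}?", a + b

def verify(expected, submitted):
    """Check the client's submitted answer against the stored expectation."""
    return str(expected) == submitted.strip()

cid, question, expected = make_challenge()
print(question)
print(verify(expected, str(expected)))  # prints True
```

The point is the asymmetry: generating and grading the challenge is cheap for the server, while solving it automatically is (ideally) expensive for a bot.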

Typical "simple" approaches to distinguish human website traffic from bot traffic are:

  • the time it takes to populate all the fields of an input form and click the submit button (frequently used but simple to bypass). Watching the cadence / pace of the communication is a more robust alternative (this is one of the features of Google's No CAPTCHA reCAPTCHA);

  • honeypots, i.e. traps for bots: a link or form field present on the page that is invisible to the human eye;

  • analysis of maximal continuous session length (humans have to rest) and its correlation with time of day (see Distinguishing Humans from Bots in Web Search Logs).
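The simple heuristics above can be sketched as a server-side filter. All thresholds and the hidden-field name below are made-up illustrative assumptions, not recommended values:

```python
import time

# Illustrative heuristics: a hidden honeypot field, a minimum plausible
# form-fill time, and a cap on continuous session length.
MIN_FILL_SECONDS = 2.0        # humans rarely submit a form faster than this
MAX_SESSION_HOURS = 14.0      # humans have to rest eventually

def looks_like_bot(form, render_ts, submit_ts, session_start_ts):
    # 1. Honeypot: the field is invisible in the browser, so any
    #    non-empty value means an automated form filler touched it.
    if form.get("website_url_hp"):          # hypothetical hidden field name
        return True
    # 2. Fill time: submitting faster than a human plausibly could.
    if submit_ts - render_ts < MIN_FILL_SECONDS:
        return True
    # 3. Continuous session length: no breaks over many hours.
    if (submit_ts - session_start_ts) / 3600.0 > MAX_SESSION_HOURS:
        return True
    return False

now = time.time()
print(looks_like_bot({"website_url_hp": "spam"}, now - 10, now, now - 60))  # True
print(looks_like_bot({}, now - 30, now, now - 600))                          # False
```

As the answer notes, each rule alone is simple to bypass once known, which is why such checks are usually layered or replaced by learned models.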

It must be considered that bot characteristics vary widely across different crawlers and different sites, so it is difficult to derive simple, deterministic heuristics: rule-based systems imply a long list of static rules that are difficult to define and maintain (even by experts).

Machine learning techniques are often used:

  • Web robot detection: A probabilistic reasoning approach constructs a Bayesian network that automatically classifies log sessions as crawler- or human-induced

  • Discovery of web robot sessions based on their navigational patterns uses the C4.5 decision tree algorithm (after deriving the session features)

  • Detecting Click Fraud in Pay-Per-Click Streams of Online Advertising Networks develops Bloom-filter-derived techniques

  • Neural networks applied to speed cheating detection in online computer games adopts an artificial neural network for bot detection in MMORPGs

  • Using Sentiment to Detect Bots on Twitter: Are Humans more Opinionated than Bots? tries Gaussian naive Bayes, support vector machines and random forests

Almost every available AI/ML tool has been tried. The main problem with these supervised machine learning approaches is labeling the training dataset.
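As a minimal sketch of this supervised approach, here is a decision-tree classifier (the model family used in the navigational-patterns paper, via scikit-learn) trained on synthetic session features. The feature names, value ranges, and labels are all invented for illustration, precisely because labeling real traffic is the hard part:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic per-session features: [requests_per_minute,
# fraction_of_image_requests, mean_seconds_between_clicks].
n = 200
humans = np.column_stack([
    rng.uniform(1, 10, n),     # modest request rate
    rng.uniform(0.3, 0.8, n),  # page views pull in many images
    rng.uniform(2, 30, n),     # seconds of "think time" between clicks
])
bots = np.column_stack([
    rng.uniform(30, 300, n),   # high request rate
    rng.uniform(0.0, 0.1, n),  # crawlers often skip images
    rng.uniform(0.01, 1, n),   # near-constant, tiny inter-request gaps
])
X = np.vstack([humans, bots])
y = np.array([0] * n + [1] * n)  # 0 = human, 1 = bot

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict([[120, 0.02, 0.1], [4, 0.5, 8.0]]))  # prints [1 0]
```

On real logs the two classes overlap far more than these cleanly separated synthetic ranges, which is where the labeling problem bites.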

Even restricting the analysis to network-oriented behavior, this is a question of tremendous scope; for this reason I'm giving some keywords for further searches.

Notes

  • {1} Machine-learning-based attacks are improving, and CAPTCHAs also serve as a benchmark task for artificial intelligence technologies (e.g. The End is Nigh: Generic Solving of Text-based CAPTCHAs)



Keywords

HIP (Human Interactive Proofs), CAPTCHA, Keystroke dynamics, Keystroke cadence, typing dynamics, IDS (Intrusion Detection System), honeypot, click fraud, spambot

References

  • Designing Human Friendly Human Interaction Proofs (HIPs) by Kumar Chellapilla, Kevin Larson, Patrice Simard, Mary Czerwinski (Microsoft Research)

  • The End is Nigh: Generic Solving of Text-based CAPTCHAs by Elie Bursztein, Jonathan Aigrain, Angelika Moscicki, John C. Mitchell (2014)

  • Keystroke Dynamics User Authentication Based on Gaussian Mixture Model and Deep Belief Nets by Yunbin Deng, Yu Zhong (2013)

  • User Authentication Through Typing Biometrics Features by Lívia C. F. Araújo, Luiz H. R. Sucupira Jr., Miguel G. Lizárraga, Lee L. Ling, and João B. T. Yabu-Uti (2005)

  • Distinguishing Humans from Bots in Web Search Logs by Omer M. Duskin, Dror G. Feitelson

  • Web robot detection: A probabilistic reasoning approach by Athena Stassopoulou, Marios D. Dikaiakos (2008)

  • An investigation of WWW crawler behavior: characterization and metrics by M. D. Dikaiakos, A. Stassopoulou, L. Papageorgiou (Computer Communications, 2005)

  • Discovery of web robot sessions based on their navigational patterns by Pang-Ning Tan, Vipin Kumar (2002)

  • Telling humans and computers apart automatically by Luis von Ahn, Manuel Blum, John Langford (Comm. ACM, 2004)

  • Detecting Click Fraud in Pay-Per-Click Streams of Online Advertising Networks by Linfeng Zhang, Yong Guan (IEEE, 2008)

  • Bots Problem in Online Games by Dewanshu Jain, Alok Gupta

  • Neural networks applied to speed cheating detection in online computer games by Gaspareto, Barone, Schneider (2008)

  • Quantifying Online Advertising Fraud: Ad-Click Bots vs Humans by Adrian Neal, Sander Kouwenhoven (2015). Other insights: https://news.ycombinator.com/item?id=9023939

  • Comparison of Classification Algorithms to tell Bots and Humans Apart by Christian Hadiwijaya Saputra, Erwin Adi, Shintia Revina (Bina Nusantara University, 2014)

  • Using Sentiment to Detect Bots on Twitter: Are Humans more Opinionated than Bots? by John P. Dickerson, Vadim Kagan, V. S. Subrahmanian (2014)

Context

StackExchange Computer Science Q#13580, answer score: 10
