Clickstream User Behavior Models
Gang Wang
Xinyi Zhang
Shiliang Tang
Christo Wilson
Haitao Zheng
Ben Y. Zhao
ACM Transactions on the Web, Vol. 11, No. 4, Article 21, July 2017
Paper Abstract
The next generation of Internet services is driven by users and user-generated content. The complex nature of user behavior makes it highly challenging to manage and secure online services. On one hand, service providers cannot effectively prevent attackers from creating large numbers of fake identities to disseminate unwanted content (e.g., spam). On the other hand, abusive behavior from real users also poses significant threats (e.g., cyberbullying).
In this article, we propose clickstream models to characterize user
behavior in large online services. By analyzing clickstream traces
(i.e., sequences of click events from users), we seek to achieve two
goals: (1) detection: to capture distinct user groups for the detection
of malicious accounts, and (2) understanding: to extract semantic
information from user groups to understand the captured behavior. To
achieve these goals, we build two related systems. The first one is a
semisupervised system to detect malicious user accounts (Sybils). The
core idea is to build a clickstream similarity graph where each node is
a user and an edge captures the similarity of two users' clickstreams.
Based on this graph, we propose a coloring scheme to identify groups of
malicious accounts without relying on a large labeled dataset. We
validate the system using ground-truth clickstream traces of 16,000 real
and Sybil users from Renren, a large Chinese social network. The second
system is an unsupervised system that aims to capture and understand
fine-grained user behavior. Instead of binary classification (malicious
or benign), this model identifies the natural groups of user behavior
and automatically extracts features to interpret their semantic
meanings. Applying this system to Renren and another online social
network, Whisper (100K users), we help service providers identify
unexpected user behaviors and even predict users' future actions. Both
systems received positive feedback from our industrial collaborators,
including Renren, LinkedIn, and Whisper, after testing on their internal
clickstream data.
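
To make the similarity-graph idea concrete, here is a minimal sketch of a graph in which each node is a user and each edge weight reflects how similar two users' clickstreams are. The abstract does not specify the similarity metric, so this sketch assumes Jaccard similarity over k-grams of click events. The function names, the `k` and `threshold` parameters, and the sample event names are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def kgrams(events, k=2):
    """Return the set of length-k contiguous subsequences in one clickstream."""
    return {tuple(events[i:i + k]) for i in range(len(events) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_similarity_graph(clickstreams, k=2, threshold=0.0):
    """Build a weighted, undirected graph as {(user_u, user_v): similarity}.

    Assumption: similarity is Jaccard over k-grams; an edge is kept only
    when its weight is strictly greater than `threshold`.
    """
    grams = {user: kgrams(events, k) for user, events in clickstreams.items()}
    edges = {}
    for u, v in combinations(sorted(grams), 2):
        weight = jaccard(grams[u], grams[v])
        if weight > threshold:
            edges[(u, v)] = weight
    return edges


# Hypothetical traces: users "a" and "b" repeat friend requests, "c" browses.
sample = {
    "a": ["login", "friend", "friend", "friend"],
    "b": ["login", "friend", "friend"],
    "c": ["login", "photo", "comment"],
}
graph = build_similarity_graph(sample)
```

In this toy input, users with near-identical click patterns end up strongly connected, while the dissimilar user stays isolated. Grouping or coloring the resulting graph, as the abstract describes, would be a separate step that this sketch does not attempt.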