I Introduction
Online services and social networks (Snapchat, Quora, Amazon, etc.) have become common means of engagement in human activities including socialization, information sharing and commerce. However, given their dissemination power and reach, they have long been exploited by bad actors looking to manipulate user perception, spread misinformation, and falsely promote bad content. Such malicious activities are motivated by numerous factors, ranging from monetary incentives [thomas2013trafficking] to personal [smith2008cyberbullying] and political interests [bessi2016social]. Tackling online misbehavior is a challenging research problem, given how widely its manifestation varies across platforms, incentive structures and time. Despite these challenges, researchers have made progress in a variety of scenarios including fake account creation [cao2014uncovering, xiao2015detecting], bot follower detection [shah2014spotting, jiang2014catchsync], malware detection [chau2011polonium] and more. However, many of these solutions require large, labeled datasets, and offer little to no extensibility. Unfortunately, given the ever-increasing multitude of new abuse vectors, application-specific anti-abuse solutions and rich labeled sets are not always feasible or timely, motivating research towards flexible, unsupervised methods.
In this work, we propose an unsupervised solution for a wide class of misbehavior problems in which we are tasked with discovering group-level suspicious behavior across attributed data. Our work is inspired by prior works, which have demonstrated that such behaviors often occur in lockstep and are better discernible on a group, rather than individual, level [kumar2018false, shah2017many]. However, we tackle a problem which is left unaddressed by prior work:
Problem 1 (Informal).
How can we quantify and automatically uncover highly suspicious entity groups with multiple associated attribute types and values?
This setting naturally extends to a number of abuse detection scenarios, such as discovering online ecommerce scammers given profile creation, posting and email address attributes, or pinpointing fraudulent advertisers given ad URLs, targeting criteria and key phrases. Our work leverages the insight that groups of entities who share too many, and too unlikely, attribute values are unusual and worth investigation. We build on this intuition by (a) casting the problem of mining suspicious entity groups over multiple attributes as mining cohesive subgraphs across multiple views, (b) proposing a novel metric to quantify group suspiciousness in multiview settings, (c) developing a scalable algorithm to mine such groups quickly, and (d) demonstrating effectiveness. Specifically, our contributions are as follows.
Formulation. We propose modeling multiattribute entity data as a multiview graph, which captures notions of similarity via shared attribute values between entities. In our model, each node represents an entity (e.g. account, organization), each view represents an attribute (e.g. files uploaded, IP addresses used, etc.) and each nonzero edge indicates some attribute value overlap (see Figure 0(a)).
Suspiciousness Metric. We design a novel suspiciousness metric, which quantifies subgraph (entity group) suspiciousness in multiview graphs (multiattribute settings). We identify desiderata that suspiciousness metrics in this setting should obey, and prove that our metric adheres to these properties while previously proposed options do not.
Algorithm. We propose SliceNDice, an algorithm which scalably extracts highly suspicious subgraphs in multiview graphs. Our algorithm uses a greedy, locally optimal search strategy to expand seeds of similar nodes into larger subgraphs with more cohesion. We discuss design decisions which improve performance, including careful seeding, context-aware similarity weighting and performance optimizations. Our practical implementation leverages all of these, enabling efficient entity group mining.
Practicality. We evaluate SliceNDice both on real-world data and simulated settings, demonstrating high detection performance and efficiency. We show effectiveness in detecting suspicious behaviors on the Snapchat advertiser platform, with distinct attributes and over 230K entities (advertiser organizations). Figure 0(b) shows a large network of ecommerce fraudsters uncovered by our approach (nodes are advertiser organizations, and edge colors indicate shared attributes, ranked in suspiciousness). We also conduct synthetic experiments to evaluate empirical performance against baselines, demonstrating significant improvements in multiattribute suspicious entity group detection. Overall, our methods and results offer significant promise in bettering web platform integrity.
II Related Work
We discuss prior work in two related contexts below.
Mining entity groups. Prior works have shown that suspicious behaviors often manifest in synchronous group-level behaviors [shah2017many, kumar2018false]. Several works assume inputs in the form of a single graph snapshot. [prakash2010eigenspokes, jiang2016inferring] mine communities using eigenplots from singular value decomposition (SVD) over adjacency matrices. [charikar2000greedy, hooi2016fraudar, blondel2008fast] propose greedy pruning/expansion algorithms for identifying dense subgraphs. [chakrabarti2004fully, dhillon2003information] co-cluster nodes based on information-theoretic measures relating to minimum description length. Some prior works [atzmueller2016description, atzmueller2011efficient] tackle subgroup mining from a single graph which also has node attributes, via efficient subgroup enumeration using tree-based branch-and-bound approaches which rely on specialized community goodness scoring functions. Unlike these works, our work entails mining suspicious groups based on overlap across multiple attributes and graph views, such that attribute importance is respected and an appropriate suspiciousness measure is used.
Several works consider group detection in a multiview (also known as multilayer and sometimes multiplex) formulation. [kim2015community] gives a detailed overview of community mining algorithms in this setting. Our work is unlike these in that it is the only one which operates in settings with views, considers different view importances, and allows for flexible view selection in subgroup discovery. Moreover, our work focuses on detection of suspicious groups in a multiview setting, rather than communities. [mao2014malspot] uses PARAFAC tensor decomposition to find network attacks. [jiang2016spotting, beutel2013copycatch, shin2016mzoom] use greedy expansion methods for finding entity groups. [shah2015timecrunch, metwally2015scalable] use single graph clustering algorithms, followed by stitching. Our work differs in that we (a) cast multiattribute group detection into a multiview graph formulation, (b) simultaneously mine a flexible set of nodes and views to maximize a novel suspiciousness metric, and (c) use a compressed data representation, unlike other methods which incur massive space complexity from dense matrix/tensor representation.
Quantifying suspiciousness. Most approaches quantify behavioral suspiciousness as likelihood under a given model. Given labeled data, supervised learners have shown use in suspiciousness ranking tasks for video views [chen2015analysis], account names [freeman2013using], registrations [xiao2015detecting] and URL spamicity [ma2009beyond]. However, given considerations of label sparsity, many works posit unsupervised and semi-supervised graph-based techniques. [shah2016edgecentric, hooi2016birdnest] propose information-theoretic and Bayesian scoring functions for node suspiciousness in edge-attributed networks. [shah2014spotting, akoglu2010oddball] use reconstruction error from low-rank models to quantify suspicious connectivity. [lamba2017zoo] ranks node suspiciousness based on participation in highly dense communities. [gyongyi2004combating, guacho2018semi] exploit propagation-based methods with few known labels to measure account sybil likelihood, review authenticity and article misinformation propensity. Our work (a) focuses on groups rather than entities, and (b) needs no labels.
Several methods exist for group subgraph/tensor scoring; [lee2010survey] gives an overview. The most common are density [lee2010survey], average degree [charikar2000greedy, hooi2016fraudar], subgraph weight, and singular value [prakash2010eigenspokes]. [jiang2016spotting] defines suspiciousness using the log-likelihood of subtensor mass assuming a Poisson model. Our work differs in that we (a) quantify suspiciousness in multiview graphs, and (b) show that alternative metrics are unsuitable in this setting.
III Problem Formulation
Symbol  Definition 

Number of total attributes (graph views)  
Set of possible values for attribute  
Maps nodes to attr. valueset:  
Multiview graph over all views  
Indicator for multiview graph views,  
Single () view of multiview graph  
over chosen views,  
Number of nodes in  
Volume of graph :  
Mass of graph  
Density of graph :  
Edge weight between nodes in , s.t.  
Multiview subgraph over all views  
Indicator for multiview subgraph views,  
view of multiview subgraph  
over chosen views,  
Number of nodes in  
Volume of subgraph of :  
Mass of subgraph  
Density of subgraph :  
Constraint on chosen subgraph views,  
Massparameterized suspiciousness metric  
Densityparameterized suspiciousness metric 
The problem setting we consider (Problem 1) is commonly encountered in many real anti-abuse scenarios: a practitioner is given data spanning a large number of entities (e.g. users, organizations, objects) with a number of associated attribute types such as webpage URL, observed IP address and creation date, and associated (possibly multiple) values such as xxx.com, 5.10.15.20, 5.10.15.25 and 01/01/19, and is tasked with finding suspicious behaviors such as fake sybil accounts or engagement boosters.
In tackling this problem, one must make several considerations: What qualifies as suspicious? How can we quantify suspiciousness? How can we discover such behavior automatically? We build from the intuition that suspicious behaviors often occur in lockstep across multiple entities (see Section II), and are most discernible when considering relationships between entities. For example, it may be challenging without context to determine suspiciousness of an advertiser linking to URL xxx.com, logging in from IPs 5.10.15.20 and with creation date 01/01/19; however, knowing that 99 other advertisers share these exact properties, our perception of suspiciousness increases drastically. Thus, we focus on mining suspicious entity groups, where suspiciousness is governed by the degree of synchronicity between entities, and across various attributes.
We draw inspiration from graph mining literature; graphs offer significant promise in characterizing and analyzing between-entity similarities, and are natural data structures for the same. Graphs model individual entities as nodes and relationships between them as edges; for example, a graph could be used to describe who-purchased-what relationships between users and products on Amazon. Below, we discuss our approach for tackling Problem 1, by leveraging the concept of multiview graphs. We motivate and discuss three associated building blocks: (a) using multiview graphs to model our multiattributed entity setting, (b) quantifying group suspiciousness, and (c) mining highly suspicious groups in such multiview graphs. Throughout this paper, we utilize formal notation for reader clarity and convenience wherever possible; Table I summarizes the frequently used symbols and definitions, partitioned into attribute-related, graph-related and subgraph-related notation.
III-A Representing Multiattribute Data
In this work, we represent multiattribute entity data as a multiview graph (MVG). An MVG is a type of graph in which we have multiple views of interactions, usually in the form of distinct edge types; to extend our previous example, we could consider who-purchased-what, who-rates-what and who-viewed-what relationships as different views between users and products. Each view of an MVG can individually be considered as a single facet or mode of behavior, and spans the same, fixed set of nodes. In our particular setting, we consider a representation of multiattribute data in which we have a fixed set of nodes (entities), and their relationships across multiple views (attributes). Thus, edges in each graph view represent relationships between entities, and are weighted based on their similarities in that attribute space.
Formally, we have a set of entities with associated attribute types, each defined over its own attribute value space. For notational convenience, we introduce attribute-value accessor mapping functions for the attribute spaces, each of which maps a node to the subset of that attribute's values associated with the node. We can construe this as an MVG on nodes (entities) and views (attributes), i.e. a set of individual graph views. For convenience, we introduce notation to refer to a specific graph view, and to a specific subset of graph views selected by an indicator vector with one entry per view. We consider an edge with positive weight to exist between two nodes in a view if their value sets for the corresponding attribute intersect, or informally, if they share at least one common feature value on that attribute. If there is no overlap between feature values, we consider that no edge between them exists in that view, or equivalently that the edge has weight 0. In general, we consider nonnegative weights. We can consider many weighting strategies, but posit the notion that a large weight between two nodes intuitively indicates more, or rarer, similarities.
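The construction above can be illustrated with a minimal sketch. The function name `build_mvg`, the nested-dictionary input layout, and the choice of "number of shared values" as the edge weight are all assumptions made for illustration; the weighting actually used is discussed later in the implementation section.

```python
from itertools import combinations


def build_mvg(entity_attrs):
    """Build one weighted edge map per view from per-entity attribute value sets.

    entity_attrs: {entity: {view_name: set_of_values}}  (hypothetical layout)
    returns:      {view_name: {(entity_a, entity_b): weight}}
    """
    # Every attribute type becomes a view, even if it ends up with no edges.
    views = {view: {} for attrs in entity_attrs.values() for view in attrs}
    # Naive quadratic pass over entity pairs; fine for a toy example only.
    for a, b in combinations(sorted(entity_attrs), 2):
        for view in views:
            shared = entity_attrs[a].get(view, set()) & entity_attrs[b].get(view, set())
            if shared:
                # Assumed weighting: count of shared values (an edge exists iff > 0).
                views[view][(a, b)] = float(len(shared))
    return views


toy = {
    "a": {"ip": {"5.10.15.20"}, "url": {"xxx.com"}},
    "b": {"ip": {"5.10.15.20", "5.10.15.25"}, "url": {"yyy.com"}},
    "c": {"ip": {"9.9.9.9"}, "url": {"xxx.com"}},
}
mvg = build_mvg(toy)
```

Node pairs with no overlap simply have no entry in a view's edge map, which corresponds to a weight of 0.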
III-B Quantifying Group Suspiciousness
Given the above representation, our next aim is to define a means of quantifying the suspiciousness of an arbitrary multiview subgraph (MVSG). This problem is important in practice, given its implications on manual review prioritization, and practitioner decisionmaking on enforcement and actioning against various discovered behaviors. Our basic intuition is that a subset of nodes (groups of entities) which are highly similar to each other (have considerable edge weights between them in multiple views) are suspicious. But how can we compare across different MVSGs with varying edge weights, sizes and views in a principled way? For example, which is more suspicious: 5 organizations with the same IP address and URL, or 10 organizations with the same postal code and creation date? The answer is not obvious; this motivates us to formally define criteria that any MVSG scoring metric should obey, and define principled ways of quantifying how suspiciousness increases or decreases in these contexts.
To do so, we first propose several preliminaries which aid us in formalizing these criteria. Informally, we consider an MVSG as a subset of nodes, views or both from the MVG, and sometimes abuse notation (when clear) to refer to an MVSG by its associated node set. We introduce similar indexing notation as in the MVG case, to refer to a specific subgraph view, and to a subset of subgraph views selected by an indicator vector with one entry per view. We define the mass of a subgraph view as the total sum of edge weights for all edges between its nodes. We define the volume of a subgraph on n nodes as n choose 2, which denotes the possible number of edges between nodes. Note that the volume is invariant to the view chosen and is only dependent on n (thus, we drop the view subscript). Finally, we define the density of a subgraph view as the ratio between its mass and volume. We define analogs for mass, volume and density of the associated MVG. In general, uppercase variables denote properties of the MVG, while lowercase letters denote properties of the MVSG. Given these terms, which are summarized in Table I, we propose the following axioms which should be satisfied by an MVSG scoring metric. We consider two parameterizations of the desired MVSG scoring metric: one by subgraph mass, and one by subgraph density.
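As a small illustration of these three quantities, the sketch below computes mass, volume and density for one view of a subgraph. The function names and the edge-map layout (`{(a, b): weight}`) are assumptions for the example, not notation from the paper.

```python
from math import comb


def volume(num_nodes):
    """Number of possible edges between num_nodes nodes (n choose 2)."""
    return comb(num_nodes, 2)


def mass(view_edges, nodes):
    """Sum of edge weights over edges with both endpoints inside `nodes`."""
    return sum(w for (a, b), w in view_edges.items() if a in nodes and b in nodes)


def density(view_edges, nodes):
    """Mass divided by volume; needs at least two nodes to be defined."""
    if len(nodes) < 2:
        raise ValueError("density needs at least one node pair")
    return mass(view_edges, nodes) / volume(len(nodes))


ip_view = {("a", "b"): 2.0, ("a", "c"): 1.0, ("c", "d"): 5.0}
group = {"a", "b", "c"}
```

Applying the same functions to the full node set gives the graph-level analogs (the uppercase quantities in the text).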
Axiom 1 (Mass).
Given two subgraphs with the same volume, and same mass in all except one view s.t. , is more suspicious. Formally,
Intuitively, more mass in a view indicates increased attribute similarities between entities, which is more suspicious. For example, it is more suspicious for a group of users to all share the same profile picture, than to all have different profile pictures.
Axiom 2 (Size).
Given two subgraphs with same densities , but different volume s.t. , is more suspicious. Formally,
Intuitively, larger groups which share attributes are more suspicious than smaller ones, controlling for density of mass. For example, 100 users sharing the same IP address is more suspicious than 10 users doing the same.
Axiom 3 (Contrast).
Given two subgraphs , with same masses and size , s.t. and have the same density in all except one view s.t. , is more suspicious. Formally,
Intuitively, a group with fixed attribute synchrony is more suspicious when background similarities between attributes are rare. For example, 100 users using the same IP address is generally more rare (lower ) than 100 users all from the same country (higher ).
Axiom 4 (Concentration).
Given two subgraphs with same masses but different volume s.t. , is more suspicious. Formally,
Intuitively, a smaller group of entities sharing the same number of similarities is more suspicious than a larger group doing the same. For example, finding 10 instances (edges) of IP sharing between a group of 10 users is more suspicious than finding the same in a group of 100 users.
Axiom 5 (Crossview Distribution).
Given two subgraphs with same volume and same mass in all except two views with densities s.t. has and has and , is more suspicious. Formally,
Intuitively, a fixed mass is more suspicious when distributed towards a view with higher edge rarity. For example, given 100 users, it is more suspicious for 100 pairs to share IP addresses (low ) and 10 pairs to share the same country (high ), than vice versa. This axiom builds from Axiom 3.
Figure 2 illustrates these axioms via toy examples. Informally, these axioms assert that when other subgraph attributes are held constant, suspiciousness constitutes: higher mass (Axiom 1), larger volume with fixed density (Axiom 2), higher sparsity in overall graph (Axiom 3), higher density (Axiom 4), and more mass distributed in sparser views (Axiom 5). These axioms serve to formalize desiderata, drawing from intuitions stemming from prior works; however, prior works [jiang2016inferring, hooi2016fraudar] do not consider multiview cases, and as we show later are unable to satisfy these axioms. Notably, such metrics produce unexpected results when scoring MVSGs, and thus lead to misaligned expectations in resulting rankings. We propose the following problem:
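The five axioms can be written compactly as inequalities. The block below is a sketch under assumed notation, since the symbols are not recoverable from the surrounding text: $f(n, \vec{c}, N, \vec{C})$ stands for the mass-parameterized metric and $\hat{f}(n, \vec{\rho}, N, \vec{P})$ for the density-parameterized one, with lowercase $n, c_k, \rho_k$ the node count, per-view mass and per-view density of the subgraph, and uppercase $N, C_k, P_k$ the corresponding graph quantities. Primes mark the second subgraph (or graph) in each comparison, and all unlisted arguments are held equal.

```latex
% Assumed notation; each implication holds with all other arguments fixed.
\begin{align*}
\text{Axiom 1 (Mass):} \quad
  & c_k > c'_k \implies f(n, \vec{c}, N, \vec{C}) > f(n, \vec{c}\,', N, \vec{C}) \\
\text{Axiom 2 (Size):} \quad
  & n > n' \implies \hat{f}(n, \vec{\rho}, N, \vec{P}) > \hat{f}(n', \vec{\rho}, N, \vec{P}) \\
\text{Axiom 3 (Contrast):} \quad
  & P_k < P'_k \implies \hat{f}(n, \vec{\rho}, N, \vec{P}) > \hat{f}(n, \vec{\rho}, N, \vec{P}\,') \\
\text{Axiom 4 (Concentration):} \quad
  & n < n' \implies f(n, \vec{c}, N, \vec{C}) > f(n', \vec{c}, N, \vec{C}) \\
\text{Axiom 5 (Cross-view Distribution):} \quad
  & P_i < P_j,\; c_i = c'_j > c_j = c'_i
    \implies f(n, \vec{c}, N, \vec{C}) > f(n, \vec{c}\,', N, \vec{C})
\end{align*}
```

In Axiom 5 the two subgraphs swap their masses between views $i$ and $j$, so the subgraph that places the larger mass in the sparser view $i$ is the more suspicious one.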
Problem 2 (MVSG Suspiciousness Scoring).
We discuss details of our solution in Section IV.
III-C Mining Suspicious Groups
Given an MVSG scoring metric, our next goal is to automatically extract MVSGs which score highly with respect to it. This is a challenging problem, as computing the metric for each possible MVSG is intractable; there are $2^N - 1$ nonempty candidate node subsets, and $2^K - 1$ nonempty view subsets over which to consider them. Clearly, we must resort to intelligent heuristics to mine highly suspicious MVSGs while avoiding an enumeration strategy. We make several considerations in approaching this task. Firstly, since suspiciousness is clearly related to shared attribute behaviors, we propose exploiting our data representation to identify candidate “seeds” of nodes/entities which are promising to evaluate in terms of suspiciousness.
Moreover, we focus on mining MVSGs (WLOG) given a constraint $z$ on the number of views. This is for a few reasons: Firstly, it is not straightforward to compare the suspiciousness of two MVSGs with varying numbers of views, as these are defined on probability spaces with different numbers of variables. Secondly, we consider that (a) entities may not exhibit suspicious behaviors in all $K$ views/attributes simultaneously, but rather only a subset, and (b) in evaluation, practitioners can only interpretably parse a small number of relationship types between a group at once; thus, in practice the constraint $z$ is generally small and can be suitably chosen and adapted according to empirical interpretability. In effect, this simplifies our problem as we only consider evaluating and mining MVSGs defined over $z$ views, so that we consider $\binom{K}{z}$ total view subsets, rather than $2^K - 1$. We propose the following problem:
Problem 3 (Mining Suspicious MVSGs).
Given an MVG on views, a suspiciousness metric over the set of candidate MVSGs, and an interpretability constraint , find the highest scoring MVSGs (WLOG) for which .
We detail our approach and implementation strategy for solving this problem in Section V.
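To make the size of the search space concrete, the following sketch counts the candidate node subsets and view subsets with and without the view constraint. The function name and the example sizes (10 nodes, 12 views, 3 chosen views) are illustrative assumptions only.

```python
from math import comb


def search_space(num_nodes, num_views, z):
    """Count candidate subsets for exhaustive MVSG enumeration."""
    node_subsets = 2 ** num_nodes - 1          # nonempty node subsets
    all_view_subsets = 2 ** num_views - 1      # nonempty view subsets, unconstrained
    constrained_view_subsets = comb(num_views, z)  # view subsets of size exactly z
    return node_subsets, all_view_subsets, constrained_view_subsets


counts = search_space(num_nodes=10, num_views=12, z=3)
```

Fixing the number of views shrinks only the view side of the search; the exponential number of node subsets is what the greedy strategy is meant to avoid enumerating.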
IV Proposed Suspiciousness Metric
We propose an MVSG scoring metric based on an underlying data model in which undirected edges between the nodes are distributed i.i.d. within each of the views. For single graph views, this model is widely known as the Erdős-Rényi (ER) model [newman2002random]. The ER model is a standard “null” model in graph theory, as it provides a framework to reason about a graph with pure at-chance node interactions, and is a reasonable assumption given no prior knowledge about how a given graph is structured. In our scenario, we consider two extensions to such a model unexplored by prior work: (a) we consider multiple graph views, and (b) we consider weighted cell values instead of binary ones. The first facet is necessary to support multiattribute or multiview settings, in which behaviors in one view may be very different than another (e.g. shared IP addresses may be much rarer than shared postal codes). The second facet is necessary to support continuous edge weights capable of richly describing arbitrarily complex notions of similarity between multiple entities (e.g. incorporating both number of shared properties, as well as their rarities). To handle these extensions, we propose the Multi-View Erdős-Rényi-Exponential model (MVERE):
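Definition 1 (MVERE Model).
A multiview graph generated by the MVERE model is defined such that the edge weights within each view are drawn i.i.d. from an Exponential distribution with a view-specific rate parameter.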
The MVERE model is a natural fit in our setting for several reasons. Firstly, the Exponential distribution is continuous and defined on the nonnegative reals, which is intuitive as similarity is generally nonnegative. Secondly, it has mode 0, which is intuitive given that sharing behaviors are sparse (most entities should not share properties), and the likelihood of observing high similarity drops rapidly.
Given that every node pair is treated as an edge (including 0-weight edges) in each view, we can derive the closed-form MLE of each view's rate simply as the inverse of that view's density, i.e. of its mean edge weight. From this, we can write the distribution of single-view MVSG mass as follows:
Lemma 1 (MVSG subgraph mass).
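The mass of each view of an MVERE-distributed subgraph follows a Gamma distribution, with shape equal to the subgraph volume and rate equal to that view's Exponential rate.
Proof.
This follows from convolution of the Exponential distribution: the sum of $v$ i.i.d. $\mathrm{Exp}(\lambda)$ variables is $\mathrm{Gamma}(v, \lambda)$ distributed. ∎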
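The lemma can be sanity-checked numerically. The sketch below simulates subgraph masses as sums of i.i.d. Exponential edge weights and compares their empirical mean and variance with those of a Gamma distribution with shape `volume` and rate `rate` (mean `volume / rate`, variance `volume / rate**2`). The function names, the rate of 2.0, the volume of 10 and the trial count are illustrative assumptions.

```python
import random
from statistics import fmean, pvariance


def mle_rate(weights):
    """MLE of an Exponential rate: sample count divided by the sample sum."""
    return len(weights) / sum(weights)


def simulate_masses(rate, volume, trials, seed=0):
    """Draw `trials` subgraph masses, each a sum of `volume` Exp(rate) edge weights."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(rate) for _ in range(volume)) for _ in range(trials)]


rate, volume = 2.0, 10
masses = simulate_masses(rate, volume, trials=20000)

empirical_mean = fmean(masses)
empirical_var = pvariance(masses)
gamma_mean = volume / rate       # mean of Gamma(shape=volume, rate=rate)
gamma_var = volume / rate ** 2   # variance of the same Gamma distribution
```

Matching the first two moments does not prove the distributional claim, but it is a cheap check that the shape and rate parameters are wired up as the lemma states.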
This enables us to define the suspiciousness of a given MVSG across multiple views in terms of the likelihood of observing some quantity of mass in those views. Our metric is defined as
Definition 2 (MVSG Scoring Metric ).
The suspiciousness of an MVSG is the negative log-likelihood of its view masses under the MVERE model:
We can write this metric in longer form as follows:
The last line is due to $\Gamma(v) = (v-1)!$, after which we use Stirling’s approximation to simplify the factorial. It is sometimes convenient to write suspiciousness in terms of densities; thus, we also introduce a so-parameterized variant, obtained by substituting each mass with the product of the corresponding density and volume and simplifying.
The intuition for this metric is that high suspiciousness is indicated by low probability of observing certain mass. Since we are interested in MVSGs with unlikely high density (indicating synchrony between entities), we consider only cases where the subgraph density exceeds the graph density for all views, to avoid focusing on excessively sparse MVSGs.
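A minimal sketch of the metric, computed exactly rather than through Stirling's approximation, is given below. It assumes each view's mass is Gamma distributed with shape equal to the subgraph volume and rate equal to the inverse of the graph density in that view, as described above; the function and argument names are assumptions, and `math.lgamma` is used for the log of the Gamma function.

```python
from math import comb, lgamma, log


def view_suspiciousness(n, c, P):
    """Negative log-density of mass c under Gamma(shape=v, rate=1/P), v = n choose 2.

    n: number of subgraph nodes (n >= 2)
    c: subgraph mass in this view (c > 0)
    P: graph density in this view (P > 0)
    """
    v = comb(n, 2)
    rate = 1.0 / P
    log_pdf = v * log(rate) + (v - 1) * log(c) - rate * c - lgamma(v)
    return -log_pdf


def suspiciousness(n, masses, graph_densities):
    """Sum of per-view scores, reflecting the assumed independence across views."""
    return sum(view_suspiciousness(n, c, P) for c, P in zip(masses, graph_densities))


# Toy comparisons in the dense regime (subgraph density above graph density).
more_mass = view_suspiciousness(5, 8.0, 0.1) > view_suspiciousness(5, 6.0, 0.1)
sparser_background = view_suspiciousness(5, 8.0, 0.1) > view_suspiciousness(5, 8.0, 0.5)
more_concentrated = view_suspiciousness(4, 8.0, 0.1) > view_suspiciousness(5, 8.0, 0.1)
```

The three toy comparisons mirror the Mass, Contrast and Concentration axioms for a single view; they are spot checks on specific numbers, not proofs.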
Lemma 2 (Adherence to Axioms).
Proof.
We give the full proofs in Section VIII. ∎
We note that while the model and metric above consider independent views, we could easily consider adapting the assumed model to account for arbitrary factorizations of the joint distribution, given this knowledge. However, choosing a more complex factorization is a problem-specific inference task, and also raises issues with the curse of dimensionality which are broader than the scope of our work. We employ the independence assumption due to its limited estimation challenges under sparsity, its generality, and its demonstrated usefulness in a wealth of prior works despite its simplicity.
IV-A Issues with Alternative Metrics
Table II: Axiom adherence.
Axiom | Mass [lee2010survey] | AvgDeg [charikar2000greedy] | Dens [lee2010survey] | SingVal [prakash2010eigenspokes] | CSSusp [jiang2016spotting] | SliceNDice
Mass | ✔ | ✔ | ✔ | ✔ | ?¹ | ✔
Size | ✔ | ✘ | ✘ | ✔ | ? | ✔
Contrast | ✘ | ✘ | ✘ | ✘ | ? | ✔
Concentration | ✘ | ✘ | ✔ | ✔ | ? | ✔
Cross-View Distr. | ✘ | ✘ | ✘ | ✘ | ✘ | ✔
¹ CSSusp is limited to discrete edge counts and cannot handle continuous mass settings.
One may ask, why not use previously established metrics of suspiciousness? We next show that these metrics produce results which violate one or more proposed Axioms, and are thus unsuitable for our problem. We compare their performance on the toy settings from Figure 2. Each subgraph pair illustrates one of Axioms 1-5, and the shaded figure indicates higher intuitive suspiciousness. We consider 5 alternative metrics: mass (Mass) and density (Dens) [lee2010survey], average degree (AvgDeg) [charikar2000greedy], singular value (SingVal) [prakash2010eigenspokes] and CSSusp (metric from [jiang2016spotting]).
Overview of Alternative Metrics. Prior work has suggested Mass, AvgDeg and Dens as suspiciousness metrics for single graph views [shin2016mzoom, beutel2013copycatch, hooi2016fraudar]. We extend these to multiview cases by construing an aggregated view with edge weights summed across the views. [jiang2016spotting] proposes CSSusp for suspiciousness in discrete, multimodal tensor data; we can apply this by construing an MVSG as a 3-mode tensor over node pairs and views. In short, other metrics are agnostic to differences across views, hence they all violate Axiom 5 (Cross-View Distribution). Below, we discuss the specifics of each alternative metric, and their limitations with respect to Axioms 1-4.
Mass: Mass is defined as the sum over all edge weights and views. Table II shows that it violates Axioms 3 (Contrast) and 4 (Concentration) by not considering graph density or subgraph size.
AvgDeg: Average degree is defined as average Mass per node. It does not consider subgraph density and thus violates Axioms 2 (Size) and 4 (Concentration). It also violates Axiom 3 (Contrast), by not considering graph density.
Dens: Density is defined as average Mass per edge, taken over all possible edges in the subgraph. It trivially violates Axiom 2 (Size) by not considering the ratio of subgraph density and size. It also violates Axiom 3 (Contrast) by not considering graph density.
SingVal: Singular values are factor “weights” associated with the singular value decomposition ; here, we consider the leading singular value over the viewaggregated . [shah2014spotting] shows that for i.i.d. cells in , , though this does not hold generally. Under this assumption, the metric violates Axiom 3 (Contrast), by not considering graph density .
CSSusp: [jiang2016spotting] defines block suspiciousness as the negative log-likelihood of subtensor mass, under the assumption that cells are discrete Poisson draws. However, this constrains adherence to Axioms 1-4 (Mass, Size, Contrast and Concentration) only for discrete settings, and is unusable for continuous edge weights/cell values. This limitation is notable, as later shown in Sections V-A and VI.
V Proposed Algorithm: SliceNDice
Given the metric defined in the previous section, we next aim to efficiently mine highly suspicious groups, as proposed in Problem 3. At a high level, our approach is to start with a small MVSG over a few nodes and views, and expand via a greedy, alternating maximization approach (between nodes and views), evaluating suspiciousness until a local optimum is reached. Our main considerations are twofold: How can we scalably expand a given seed until convergence? and How can we select good seeds in the first place? We next address these questions.
Our goal is to find MVSGs (WLOG) which score highest on our metric, and also meet the interpretability constraint on the number of views. As mentioned in Section III-C, full enumeration is combinatorial and computationally intractable in large data. Thus, we resort to a greedy approach which allows scalable convergence to locally optimal MVSGs. Our approach, SliceNDice, is outlined in Algorithm 1.
In short, the algorithm begins by seeding an MVSG defined over a few nodes and views according to some notion of suitability, and then utilizes an alternating maximization approach to improve the seed: the node set is kept fixed while the view set is updated, and subsequently the view set is kept fixed while the node set is updated. The update steps only occur when suspiciousness increases, and since suspiciousness is bounded for any given graph (i.e. there are a finite number of possible choices for nodes and views), we ensure convergence to a local optimum. Next, we discuss updating and seeding choices, where we cover the referenced update and seeding methods.
Updating choices. In order to find a highly suspicious group of entities, we aim to optimize the view set and node set selection via the UpdateViews and UpdateNodes methods. UpdateViews can be written concisely as , subject to . This is straightforward given our metric; we independently choose the top most suspicious views, given the fixed node set from the prior iteration. For UpdateNodes, we limit our search space to adding or removing a single node in the MVSG, which is dramatically more tractable than the possible node set choices over . We write UpdateNodes concisely as , subject to , meaning that each update changes the node set by, at most, a single entity (one more or one less).
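The alternating update scheme can be sketched as follows. This is a simplified illustration, not the paper's Algorithm 1: it assumes the overall score is a sum of per-view scores, takes the per-view scorer as a callable, and uses a deliberately simple toy scorer (view mass minus half the volume) in place of the proposed metric. All function names and the toy data are hypothetical.

```python
from math import comb


def update_views(nodes, all_views, z, view_score):
    """With the node set fixed, pick the z views with the highest per-view score."""
    ranked = sorted(all_views, key=lambda k: view_score(nodes, k), reverse=True)
    return frozenset(ranked[:z])


def update_nodes(nodes, views, all_nodes, score):
    """With the view set fixed, take the best single-node addition or removal, if any."""
    best, best_score = nodes, score(nodes, views)
    for a in all_nodes:
        candidate = nodes - {a} if a in nodes else nodes | {a}
        if len(candidate) < 2:
            continue  # a subgraph needs at least one node pair
        s = score(candidate, views)
        if s > best_score:
            best, best_score = candidate, s
    return best


def expand_seed(seed_nodes, seed_views, all_nodes, all_views, z, view_score):
    """Alternate view and node updates, accepting only strict score increases."""
    def score(nodes, views):
        return sum(view_score(nodes, k) for k in views)

    nodes, views = frozenset(seed_nodes), frozenset(seed_views)
    current = score(nodes, views)
    improved = True
    while improved:
        improved = False
        new_views = update_views(nodes, all_views, z, view_score)
        if score(nodes, new_views) > current:
            views, current, improved = new_views, score(nodes, new_views), True
        new_nodes = update_nodes(nodes, views, all_nodes, score)
        if score(new_nodes, views) > current:
            nodes, current, improved = new_nodes, score(new_nodes, views), True
    return nodes, views, current


# Toy data: a, b, c overlap on both "ip" and "url"; d, e overlap only on "country".
EDGES = {
    "ip": {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0},
    "url": {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0},
    "country": {("d", "e"): 1.0},
}


def toy_view_score(nodes, k):
    """Stand-in scorer (NOT the paper's metric): view mass minus half the volume."""
    m = sum(w for (a, b), w in EDGES[k].items() if a in nodes and b in nodes)
    return m - 0.5 * comb(len(nodes), 2)


result = expand_seed({"a", "b"}, {"ip"}, sorted("abcde"), sorted(EDGES), 2, toy_view_score)
```

Because every accepted step strictly increases the score and there are finitely many node and view sets, the loop terminates at a local optimum; on the toy data the seed {a, b} grows to {a, b, c} over the "ip" and "url" views.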
Seeding choices. Clearly, update quality relies on a reasonable seeding strategy which is able to find candidate suspicious MVSGs and also explore the graph. The SeedViews method can be achieved in multiple ways; ultimately, the goal is to choose initial views such that the seeds expand to a diverse set of suspicious MVSGs. Given the desire for diversity, a reasonable approach is to sample views uniformly as done in prior work [jiang2014inferring, shin2016mzoom]. However, a downside of random sampling is that it does not respect our intuition regarding the nonuniform value of entity similarity across views. For example, consider that a view of country similarity has only 195 unique values, whereas a view of IP address similarity has vastly more; naturally, overlap is much more common in the former than the latter, despite the latter having a higher signal-to-noise ratio. Thus, in practice, we aim to sample views in a weighted fashion, favoring those in which overlap occurs less frequently. We considered using the inverse of view densities as weights, but we observe that density is highly sensitive to outliers from skewed value frequencies. We instead use the inverse of a fixed percentile of the value frequencies in each view as a more robust estimate of its overlap propensity (a single fixed percentile works well in practice). The effect is that lower signal-to-noise ratio views such as country are more rarely sampled.
Defining SeedNodes is more challenging. As Section IV mentions, we target overly dense MVSGs (WLOG), i.e. those whose density exceeds the graph density in all chosen views. The key challenge is identifying seeds which satisfy this constraint, and thus offer promise in expanding to more suspicious MVSGs. Again, one could consider randomly sampling node sets and discarding unsatisfactory ones, but given that there are $z$ constraints (one per chosen view), the probability of satisfaction rapidly decreases as $z$ increases (Section VI elaborates). To this end, we propose a carefully designed, greedy seeding technique called GreedySeed (see Algorithm 2) which enables us to quickly discover good candidates. Our approach exploits the fact that satisfactory seeds occur when entities share more similarities, and strategically constructs a node set across views which share properties with each other. Essentially, we initialize a candidate seed with two nodes that share a similarity in a (random) one of the chosen views, and try to incrementally add other nodes connected to an existing seed node in views where the constraint is yet unsatisfied. If unable to do so after a number of attempts, we start fresh. The process is stochastic, and thus enables high seed diversity and diverse suspicious MVSG discovery.
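The weighted view sampling described for SeedViews can be sketched as below. The percentile `q=95`, the nearest-rank percentile rule, the input layout and all function names are assumptions for illustration; the text does not fix them here.

```python
import math
import random
from collections import Counter


def nearest_rank_percentile(sorted_values, q):
    """q-th percentile (0 < q <= 100) of an ascending list, nearest-rank method."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def view_sampling_weights(view_values, q=95):
    """Weight each view by the inverse of the q-th percentile of its value frequencies.

    view_values: {view: list of attribute values, one entry per (entity, value) pair}
    """
    weights = {}
    for view, values in view_values.items():
        frequencies = sorted(Counter(values).values())
        weights[view] = 1.0 / nearest_rank_percentile(frequencies, q)
    return weights


def seed_views(view_values, z, q=95, rng=random):
    """Sample z distinct views, favoring views where value overlap is rarer."""
    remaining = view_sampling_weights(view_values, q)
    chosen = []
    for _ in range(min(z, len(remaining))):
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[k] for k in names], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen


toy_values = {
    "country": ["US"] * 50 + ["CA"] * 30,
    "ip": ["1.1.1.1", "1.1.1.1", "2.2.2.2", "3.3.3.3"],
}
toy_weights = view_sampling_weights(toy_values)
```

On the toy data the heavily shared "country" view receives a much smaller weight than the "ip" view, so it is drawn far less often as a seed view.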
V-A Implementation
We describe several considerations in practical implementation, which improve result relevance and computational efficiency.
Result quality. The quality of resulting MVSGs depends on the degree to which they accurately reflect truly suspicious synchronous behaviors. Not all synchronicity is equally suspicious, particularly for attributes derived from free-form user text input. Consider an attribute “File Name” for user uploads. An edge between two users who share an uploaded file named “Cheapgucciad.jpg” is, intuitively, more suspicious than if the file was named “Test” (a common placeholder that unrelated users are likely to use). To avoid considering the latter case, which is a type of spurious synchronicity, we use techniques from natural language processing to carefully weight similarities in the MVG construction. Firstly, we enable value blacklisting for highly common stopwords (e.g. Test). Overlap on a stopworded value produces 0 mass (no penalty). Next, we weight edges for other value overlaps according to the common TF-IDF NLP technique, which in this case is best characterized as a value’s inverse entity frequency (IEF) [aggarwal2012mining]. We define the IEF for value in view as: , which significantly discounts common values. We let , so that edge weight between two nodes (entities) depends on both number and rarity of shared values.
Computational efficiency. Next, we discuss several optimizations to improve the speed of suspicious MVSG discovery. Firstly, we observe that SliceNDice is trivially parallelizable. In our implementation, we are able to run thousands of seed generation and expansion processes simultaneously in a multithreaded setting, aggregating a ranked list afterwards. Another notable performance optimization involves the computation of view masses in UpdateNodes, which is the slowest part of SliceNDice. A naïve approach to measure synchronicity given nodes is to evaluate pairwise similarity quadratically, which is highly inefficient. We instead observe that it is possible to compute by cheaply maintaining value frequencies in a view-specific hashmap , such that . Specifically, indicates that similarities exist in the subgraph view on the value , and since each of them contributes weight, we can write the total mass as . This approach makes it possible to calculate view mass in linear time with respect to the number of subgraph nodes, which drastically improves efficiency. Furthermore, the space requirements are, for each view, a hashmap of value frequencies as well as a table of unique attribute values for each entity, a compressed representation compared to tensor-based approaches [prakash2010eigenspokes]. The time and space complexity requirements are more formally described in Section VI-C.
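To make the hashmap-based mass computation concrete, the following Python sketch contrasts it with the naïve pairwise evaluation. It assumes that a value shared by n subgraph nodes induces n(n-1)/2 pairwise similarities, each contributing that value's IEF weight. The IEF weighting is passed in as a callable `ief` because its exact form is not reproduced here; the function names and the input layout (one set of values per entity) are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def view_mass_hashmap(entity_values, ief):
    """View mass from value frequencies: one pass over the subgraph nodes."""
    counts = Counter()
    for values in entity_values:
        counts.update(set(values))  # each entity counts a value at most once
    # A value held by n entities yields n * (n - 1) / 2 weighted similarities.
    return sum(ief(v) * n * (n - 1) / 2 for v, n in counts.items())


def view_mass_pairwise(entity_values, ief):
    """Reference implementation: quadratic evaluation of all entity pairs."""
    total = 0.0
    for a, b in combinations(entity_values, 2):
        total += sum(ief(v) for v in set(a) & set(b))
    return total
```

Both functions return the same mass; the hashmap variant avoids enumerating entity pairs, and a stopworded value can be modeled by having `ief` return 0 for it.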
VI Evaluation
Our experiments aim to answer the following questions.

Q1. Detection performance: How effective is SliceNDice in detecting suspicious behaviors in real and simulated settings? How does it perform in comparison to prior works?

Q2. Efficiency: How do GreedySeed and SliceNDice scale theoretically and empirically on large datasets?
VI-A Datasets
We use both real and simulated settings in evaluation.
Snapchat advertiser platform. We use a dataset of advertiser organization accounts on Snapchat, a leading social media platform. Our data consists of organizations created on Snapchat from 2016-2019. Each organization is associated with several single-value (e.g. contact email) and multi-value attributes (e.g. names of ad campaigns). All in all, we use 12 attributes deemed most relevant to suspicious behavior detection, made available to us by Snapchat’s Ad Review team, with whom we partner for domain expert analysis and investigation:

Account details (6): Organization name, emails, contact name, zipcode, login IP addresses, browser user agents.

Advertisement content (6): Ad campaign name (e.g. Superbowl Website), media asset hashes, campaign headlines (e.g. 90% off glasses!), brand name (e.g. GAP), iOS/Android app identifiers, and external URLs.
Based on information from the domain experts, we pruned 1.7K organizations from the original data, primarily including advertisement agencies and known affiliate networks which can have high levels of synchrony (often marketing for the same subsets of companies), and limited our focus to the remaining organizations.
Simulated settings. We additionally considered several simulated attack settings, based on realistic attacker assumptions. Our intuition is that attack nodes will have a higher propensity to share attribute values than normal ones, and may target varying attributes and have varying sophistication. Our simulation parameters include (entity count) and (total attribute types), (length, cardinalities of attribute value spaces), (entities and attributes per attack), (number of attacks), (value count per normal entity), and (attack temperature, s.t. attackers choose from a restricted attribute space with cardinalities ). Together, these parameters can express a wide variety of attack types. Our specific attack procedure is:

Initialize normal entities. For each attr. , draw specific values uniformly over .

Conduct an attack, by randomly sampling entities and attributes. For each attr. , draw specific attr. values uniformly over . Repeat times.
Unless otherwise mentioned, for each scenario we fix parameters as nodes, views, attr. cardinality, nodes per attack, views per attack, mean values drawn per node and attribute, and temperature.
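The two-step attack procedure above can be sketched in Python as follows. This is an illustrative sketch rather than the released simulation code: the parameter names are hypothetical, the restricted attack space is assumed to have cardinality equal to the original cardinality divided by an integer temperature, every entity is assumed to draw the same number of values per attribute, and attack values are assumed to be added to (rather than replace) an entity's normal values.

```python
import random


def simulate_attacks(num_entities, cardinalities, values_per_entity,
                     num_attacks, attack_entities, attack_views,
                     temperature, seed=0):
    """Return (data, attacks); data[e][v] is the value set of entity e in view v."""
    rng = random.Random(seed)
    num_views = len(cardinalities)

    def draw(cardinality):
        # Up to values_per_entity distinct values, uniform over the value space.
        return {rng.randrange(cardinality) for _ in range(values_per_entity)}

    # Step 1: normal entities draw uniformly from each full attribute space.
    data = [[draw(c) for c in cardinalities] for _ in range(num_entities)]

    # Step 2: each attack samples entities and views, then draws values
    # uniformly from a restricted (assumed: cardinality // temperature) space.
    attacks = []
    for _ in range(num_attacks):
        entities = rng.sample(range(num_entities), attack_entities)
        views = rng.sample(range(num_views), attack_views)
        for view in views:
            restricted = max(1, cardinalities[view] // temperature)
            for entity in entities:
                data[entity][view] |= draw(restricted)
        attacks.append((entities, views))
    return data, attacks
```

The returned `attacks` list records which entities and views each attack touched, which can serve as ground truth when labeling attribute overlaps.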
VI-B Detection performance
We discuss performance in both settings.
Snapchat advertiser platform. We deployed SliceNDice on Google Cloud compute engine instances with high vCPU count to enable effective parallelization across many seeds over 2 days. During this time, we obtained a total of suspected entity group behaviors within Snapchat data. We fixed for this run, based on two reasons: (a) input from our Ad Review partners, who noted that too many apparent features hindered the review process via information overload, and (b) balances the two extremes of low preventing discovery of more distributed, stealthier fraud which can only be uncovered by overlaying multiple graph views, and high hurting interpretability and increasing the difficulty of satisfying density constraints. Finally, after mining and ranking many MVSGs, we aggregate and filter results for expert review by pruning “redundant” MVSGs which cover the same set of nodes; we use a Jaccard similarity threshold to determine overlap. We note that the number of distinct blocks was fairly robust to ; for example, a more liberal threshold of yielded distinct groups. We chose conservatively in order to minimize redundancy in results shared with the Ad Review team. We evaluated our methods both quantitatively and qualitatively in coordination with them. We were not able to compare with other suspicious group mining methods which rely on block decomposition or search [papalexakis2013more, prakash2010eigenspokes, charikar2000greedy, shin2016mzoom, jiang2016spotting], as these require highly dense matrices/tensors which are too large to materialize at this scale.
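The redundancy pruning step described above can be sketched as a greedy filter over the ranked list of groups. The sketch below is a plausible reading rather than the exact procedure used: it assumes groups are processed from most to least suspicious, and that a group is dropped when its Jaccard similarity to any already-kept group reaches the threshold. Function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two node sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def prune_redundant(ranked_groups, threshold):
    """Keep a group only if it is sufficiently distinct from all kept groups.

    ranked_groups: node sets ordered from most to least suspicious.
    """
    kept = []
    for nodes in ranked_groups:
        if all(jaccard(nodes, other) < threshold for other in kept):
            kept.append(set(nodes))
    return kept
```

A lower threshold prunes more aggressively, which matches the conservative choice motivated above.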
Quantitative Evaluation: We sorted the suspected subgraphs in descending order of suspiciousness and submitted the top 50 groups to the Ad Review team for in-depth, manual validation of the constituent organizations. We provided the domain experts with 3 assets to aid evaluation: (a) network visualizations like those in Figure 3, (b) mappings between the most frequently occurring attribute values for each attribute and all organization entities associated with those values, and (c) entity-level indices listing all instances of attribute synchrony and associated values, for each organization involved per group. While the exact review process and signals used are masked for security reasons, the review process generally consisted of evaluating individual advertiser accounts in each cluster, and determining their suspiciousness based on previous failed payments, previously submitted spammy or fraudulent-looking ads, similarity with previously discovered advertiser abuse vectors, and other proprietary signals. From surveying these 2.7K organizations, reviewers found that an overwhelming majority were connected to fraudulent behaviors that violated Snapchat’s advertiser platform Terms of Service, resulting in an organization-level precision of 89%. The organizations spanned diverse behaviors, including individuals who created multiple (similar) accounts to generate impressions before defaulting early on their spending budget (avoiding payment), those selling counterfeit goods or running e-commerce scams, fraudulent surveys with falsely promised rewards, and more. Intuitively, the diversity of these behaviors supports the flexibility of our problem formulation; despite SliceNDice’s agnosticism to the multiple types of abuse, it was able to uncover them automatically using a shared paradigm.
Qualitative Evaluation: We inspect two instances of advertiser fraud, shown in Figure 3, with help from the Ad Review team. Although SliceNDice selected groups based on suspiciousness according to only the top () views, we show the composites of similarities for all 12 attributes to illustrate the differences between the two attacks. On the left, we show the first case of blatant fraud across organizations, which are connected across all of the considered attributes to varying degrees. Many of these organizations were accessed using a common subset of IP addresses, with the cluster near the center all sharing browser user agents. Upon further inspection, we find that these accounts are engaging in e-commerce fraud, and link to a series of URLs that follow the pattern contactusXX.[redacted].com, where XX ranges from 0 to 127. Multiple accounts link to these common URLs and share common advertisement headlines, which, combined with shared login activities, ranks the group very highly according to our metric. Our second case study is of a smaller ring of organizations, which could be considered stealthy fraud. These organizations appear to market a line of Tarot card and Horoscope applications. We noticed attempts by the fraudsters to cloak this ring from detection: no app identifier appears more than 4 times across the dataset, but most of the organizations have identifiers associated with similar app variants. This discovery illustrates SliceNDice’s ability to “string together” shared properties across multiple attribute views to discover otherwise hard-to-discern behaviors.
Simulated settings. We consider detection performance on simulations matching several realistic scenarios.

High attacker synchrony: Default settings; attacks sample attributes from th of associated attribute spaces, making them much denser than the norm.

Low attacker synchrony: . Attribute spaces are restricted to , making attacks much harder to detect.

High-signal attribute attacks: Attack views are sampled with weights (more likely to land in sparse views).

Low-signal attribute attacks: Attack views are sampled with weights (more likely to land in dense views).

Attacks in high dimensionality: . Attacks are highly distributed in higher dimensionality.
Our detection task is to classify each attribute overlap as suspicious or non-suspicious; thus, we aim to label each nonzero entry (“behavior”) in the resulting tensor. We evaluate SliceNDice’s precision/recall performance along with several state-of-the-art group detection approaches in this task. For each method, we penalize each behavior in a discovered block using the block score given by that method, and sum results over multiple detected blocks. The intuition is that a good detector will penalize behaviors associated with attack entities and attributes more highly. We compare against PARAFAC decomposition [mao2014malspot], MultiAspectForensics (MAF) [maruhashi2011multiaspectforensics], Mzoom [shin2016mzoom], SVD [prakash2010eigenspokes], and AvgDeg [charikar2000greedy]. For SVD and AvgDeg, which only operate on single graph views, we use the aggregated view adjacency matrix, whereas for the others we use the adjacency tensor. SliceNDice utilizes the compressed representation discussed in Section V-A. For SVD, we use the singular value (SingVal) as the block score. For PARAFAC and MAF, we use the block norm, which is effectively the higher-order SingVal. AvgDeg uses the average subgraph degree (AvgDeg), and Mzoom measures mass suspiciousness under the Poisson assumption (CSSusp). Section VIII gives further details regarding baseline comparison.
Figure 4 shows precision/recall curves for all 5 attack scenarios; note that SliceNDice significantly outperforms competitors in all cases, often maintaining over 90% precision while producing few false positives. Alternatives quickly drop to low precision at substantial recall, due to their inability to both (a) find and score groups while accounting for differences across attributes (Axiom 5), and (b) correctly discern the suspicious from non-suspicious views, even when the right subgraph is discovered. In practice, this allows attributes with low signal-to-noise ratio and higher natural density to overpower suspicious attacks which occur in less dense views.
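The scoring protocol described above (penalize each behavior with the score of every detected block containing it, sum over blocks, then sweep a threshold) can be sketched as follows. The sketch assumes a behavior is a triple `(u, v, view)` and a block is a triple `(nodes, views, score)`; these representations and the function names are illustrative assumptions, not the benchmarking code itself.

```python
from collections import defaultdict


def score_behaviors(blocks, behaviors):
    """Sum, per behavior, the scores of all detected blocks that contain it."""
    scores = defaultdict(float)
    for nodes, views, score in blocks:
        nodes, views = set(nodes), set(views)
        for u, v, view in behaviors:
            if u in nodes and v in nodes and view in views:
                scores[(u, v, view)] += score
    return scores


def precision_recall_points(scores, suspicious):
    """(threshold, precision, recall) at each distinct positive score."""
    suspicious = set(suspicious)
    points = []
    for t in sorted({s for s in scores.values() if s > 0}, reverse=True):
        flagged = {b for b, s in scores.items() if s >= t}
        tp = len(flagged & suspicious)
        recall = tp / len(suspicious) if suspicious else 0.0
        points.append((t, tp / len(flagged), recall))
    return points
```

Behaviors that fall in no detected block keep an implicit score of zero and are never flagged, so a method's maximum recall is bounded by the coverage of its blocks.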
VI-C Efficiency
We consider the efficiency of both SliceNDice and our seeding algorithm, GreedySeed. The time complexity of SeedViews is trivially , as it involves only choosing random views. We can write the complexity of any suitable method for SeedNodes loosely as , given views and iterations to satisfy each density constraint successively. However, in practice, this notion of is ill-defined and can adversely affect performance. We explore practical performance in terms of the average time to find suitable seeds which satisfy constraints, for both random seeding and our GreedySeed: Figure 4(a) shows that GreedySeed finds seeds faster than random seeding on real data; note the log scale. Random seeding struggles significantly in meeting constraints as the number of views increases, further widening the performance gap between the methods. Each iteration of UpdateNodes is given nodes, views, and values per attribute. UpdateViews is simply , as UpdateNodes already updates subgraph masses across all views. The runtime in practice is dominated by UpdateNodes; assuming iterations per MVSG, the overall SliceNDice time complexity is . Figures 4(b) and 4(c) show that SliceNDice scales linearly with respect to both entities (over fixed iterations) and iterations (over fixed entities). Moreover, the overall space complexity is , due to the compact attribute-oriented data representation described in Section V-A. Note that alternative group mining methods which rely on block decomposition or search were infeasible to run on real data, due to the sheer attribute sharing density in the tensor representation, which grows quadratically with each shared value.
VII Conclusion
In this work, we tackled the problem of scoring and discovering suspicious behavior in multi-attribute entity data. Our work makes several notable contributions. Firstly, we construed this data as a multi-view graph, and formulated this task in terms of mining suspiciously dense multi-view subgraphs (MVSGs). We next proposed and formalized intuitive desiderata (Axioms 1-5) that MVSG scoring metrics should obey to match human intuition, and designed a novel suspiciousness metric based on the proposed MVERE model which satisfies these axioms, unlike alternatives. Next, we proposed the SliceNDice algorithm, which enables scalable ranking and discovery of MVSGs suspicious according to our metric, and discussed practical implementation details which help result relevance and computational efficiency. Finally, we demonstrated strong empirical results, including experiments on real data from the Snapchat advertiser platform where we achieved 89% precision over 2.7K organizations and uncovered numerous fraudulent advertiser rings, consistently high precision/recall (over 97%) and outperformance of several state-of-the-art group mining algorithms, and linear scalability.
References
VIII Reproducibility
VIII-A Satisfaction of Axioms
Below, we show that our suspiciousness metric (and ) satisfies Axioms 1-5, as posited in Lemma 2. In each case, we consider how or changes as individual properties vary; since they are simply reparameterizations of one another, it suffices to prove adherence to each axiom using the more convenient parameterization. We reproduce the axioms with each proof below for reader convenience.
Axiom (Mass).
Given two subgraphs with the same volume, and same mass in all except one view s.t. , is more suspicious. Formally,
Proof of Axiom 1 (Mass).
Because we are operating under the constraint , then which implies that . Therefore, and so as mass increases, suspiciousness increases. ∎
Axiom (Size).
Given two subgraphs with same densities , but different volume s.t. , is more suspicious. Formally,
Proof of Axiom 2 (Size).
The derivative of with respect to subgraph volume is:
This implies that , because always holds when (which holds given ). Therefore, suspiciousness increases as volume increases. ∎
Axiom (Contrast).
Given two subgraphs , with same masses and size , s.t. and have the same density in all except one view s.t. , is more suspicious. Formally,
Proof of Axiom 3 (Contrast).
The derivative of suspiciousness with respect to density is:
This implies that , because and . Thus, as graph density increases, suspiciousness decreases. Alternatively, as sparsity increases, suspiciousness increases. ∎
Axiom (Concentration).
Given two subgraphs with same masses but different volume s.t. , is more suspicious. Formally,
Proof of Axiom 4 (Concentration).
The derivative of view ’s contribution to suspiciousness (parameterized by mass) w.r.t. the volume is:
because , so therefore . Therefore, for a fixed subgraph mass , suspiciousness decreases as volume increases. ∎
Axiom (Cross-view Distribution).
Given two subgraphs with same volume and same mass in all except two views with densities s.t. has and has and , is more suspicious. Formally,
Proof of Axiom 5 (Cross-view Distribution).
Assume that view is sparser than view (), and we are considering a subgraph which has identical mass in both views (). Adding mass to the sparser view will increase suspiciousness more than adding the same amount of mass to the denser view because:
∎
VIII-B Baseline Implementations for Comparison
We compared against 5 baselines in Section VI: PARAFAC [mao2014malspot], MAF [maruhashi2011multiaspectforensics], Mzoom [shin2016mzoom], AvgDeg [charikar2000greedy] and SVD [prakash2010eigenspokes]. Below, we give background and detail our implementations for these baselines.
VIII-B1 PARAFAC
PARAFAC [papalexakis2013more] is one of the most common tensor decomposition approaches, and can be seen as the higher-order analog of matrix singular value decomposition. An rank PARAFAC decomposition aims to approximate a multimodal tensor as a sum of rank-one factors which, when summed, best reconstruct the tensor according to a Frobenius loss. In our case, the decomposition produces factor matrices of , of and of , such that we can write the tensor associated with MVG as
where denotes the column vector of (analogously for and ), and . Since PARAFAC gives continuous scores per node in the and vectors for each rank-one factor, we sum them and then use the decision threshold suggested in [shin2016mzoom] () to mark nodes above that threshold as part of the block. We then select the top views which individually have the highest singular value over the associated submatrix, and penalize the associated entries with the norm of the rank-one tensor (the closest approximation of SingVal in higher dimensions). We use the Python tensorly library implementation, and decomposition.
VIII-B2 MAF
MAF [maruhashi2011multiaspectforensics] also utilizes PARAFAC decomposition, but proposes a different node inclusion method. Their intuition is to look for the largest “bands” of nodes which have similar factor scores, as they are likely clusters. Since in our case, and both reflect node scores, we sum them to produce the resulting node factor scores. We then compute a log-spaced histogram over these using 20 bins (as proposed by the authors), and sort the histogram bins from highest to lowest frequency. We move down this histogram, including nodes in each bin until reaching the 90% energy criterion or 50% bin size threshold proposed by the authors. We mark these nodes as included in the block, and select the top views with the highest associated submatrix singular values, as for PARAFAC. We likewise penalize entries in this block using the associated block norm. We use the Python tensorly library implementation, and specify a decomposition.
VIII-B3 Mzoom
Mzoom [shin2016mzoom] proposes a greedy method for dense subtensor mining, which is flexible in handling various block-level suspiciousness metrics. It has been shown to outperform [jiang2016spotting] in terms of discovering blocks which maximize the CSSusp metric, and hence we use it over the method proposed in [jiang2016spotting]. The algorithm works by starting with the original tensor , and greedily shaving values from modes so as to maximize the improvement in CSSusp. When a block is found which maximizes CSSusp over the local search, Mzoom prunes it from the overall tensor in order to not repeatedly converge to that block, and repeats the next iteration with . As Mzoom does not allow selection of a fixed number of views, we only modify their implementation to add the constraint that for any blocks found with views, we limit output to the first for fairness. We penalize entries in each block with that block’s CSSusp score. We use the authors’ original implementation, which was written in Java (and is available on their website), and specify 500 blocks to be produced (unless the tensor is fully deflated/empty sooner).
VIII-B4 SVD
SVD [prakash2010eigenspokes], as discussed in Section II, is a matrix decomposition method which aims to produce a low-rank optimal reconstruction of according to the Frobenius norm. In our case, since we aggregate over the views and produce a resulting matrix for , a rank SVD decomposes the matrix , where are , and is and diagonal, containing the singular values. Loosely, SVD effectively discovers low-rank structures in the form of clusters, such that indicate cluster affinity and indicates cluster strength or scale. We take a similar approach as for PARAFAC, in that for each rank , we sum the factor scores and and use a decision threshold of to mark node membership in the given block. Then, we rank the individual views according to their respective leading singular values, and choose the top for inclusion. We then penalize entries in the block with the submatrix SingVal score. We use the Python scipy package, and specifically the svds method to run sparse SVD, for which we use an decomposition.
VIII-B5 AvgDeg
[charikar2000greedy] proposes an algorithm, which we call AvgDeg, for greedily mining dense subgraphs according to the AvgDeg notion of suspiciousness. The proposed algorithm gives a 2-approximation in terms of returning the maximally dense subgraph, and works by considering a single graph , and greedily shaving nodes with minimum degree while keeping track of the AvgDeg metric at each iteration. Upon convergence, the algorithm returns a subgraph which scores highly given AvgDeg. As for SVD, we consider to be aggregated over all views. Though the algorithm proposed by the author was initially defined in terms of unweighted graphs, we adapt it to the weighted graph setting by greedily shaving nodes with minimum weighted degree rather than minimum adjacent edge count. Upon discovery of one subgraph , we repeat the next iteration with . After finding the nodes for each subgraph, we choose the top views with the highest individual AvgDeg for inclusion in the block. We request up to 500 blocks, but in practice find that the algorithm converges very quickly because it incorrectly prunes a large number of nodes in earlier iterations.
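For reference, the weighted greedy shaving step described above can be sketched in Python as follows. This is a minimal illustration of minimum-weighted-degree peeling on a single aggregated graph, not the baseline code used in our experiments. It assumes a symmetric adjacency dictionary with mutually comparable node identifiers, and it measures density as total edge weight divided by node count (half the average weighted degree); the function name is hypothetical.

```python
import heapq


def greedy_densest_subgraph(adj):
    """adj: dict node -> dict neighbor -> weight (symmetric, no self-loops).

    Returns (nodes, density) for the densest subgraph seen while peeling.
    """
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    alive = set(adj)
    total = sum(degree.values()) / 2.0  # each edge is counted from both ends
    heap = [(d, u) for u, d in degree.items()]
    heapq.heapify(heap)
    best_density = total / len(alive) if alive else 0.0
    removal_order, best_removed = [], 0
    while len(alive) > 1:
        d, u = heapq.heappop(heap)
        if u not in alive or d != degree[u]:
            continue  # stale heap entry left over from an earlier update
        alive.remove(u)
        removal_order.append(u)
        total -= degree[u]
        for v, w in adj[u].items():
            if v in alive:
                degree[v] -= w
                heapq.heappush(heap, (degree[v], v))
        density = total / len(alive)
        if density > best_density:
            best_density, best_removed = density, len(removal_order)
    return set(adj) - set(removal_order[:best_removed]), best_density
```

To mine multiple blocks, one would remove the returned nodes (or their edges) from `adj` and call the function again, mirroring the repeated iterations described above.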
VIII-B6 SliceNDice
We use the standard implementation as described in Section V-A, evaluating over 500 blocks. Our implementation is written in Python, and will be made available publicly.
VIII-C Source Code and Datasets
All source code, including the calculation of the proposed suspiciousness metric and our implementation of the SliceNDice algorithm, is available at http://github.com/hamedn/SliceNDice/. Our implementation was done using Python 3.7. The algorithm takes as input a CSV, where each row is an entity and each column is an attribute, and returns a ranked list of suspicious groups by row identifier. The code used to generate simulated attacks as discussed in Section VI is also included, and allows researchers and practitioners to create their own simulated attack datasets by modifying simulation parameters: (entity count), (total attribute types), (length, cardinalities of attribute value spaces), (entities and attributes per attack), (number of attacks), (value count per normal entity), and (attack temperature, s.t. attackers choose from a restricted attribute space with cardinalities ). We also include benchmarking code used to compare the performance of SliceNDice against the aforementioned baselines. Unfortunately, given that the Snapchat advertiser data contains sensitive PII (personally identifiable information), it is not possible for us to release the dataset publicly. In fact, to the best of our knowledge, no such real-world social datasets are publicly available in the multi-attribute setting. This is because although the multi-attribute detection setting is a highly common one in many social platforms, attributed data in these settings is typically PII.