# Subgraph Mining

DS 352 Syllabus

last updated 15-Oct-2018

## Introduction of frequent subgraph mining

Extends association analysis to finding frequent subgraphs

Useful for

• Web Mining (interlinked web pages),
• computational chemistry (atoms and ions bonds),
• bioinformatics,
• spatial data sets,
• XML document elements,
• etc. ### Some graph definitions A graph is isomorphic if it is topologically equivalent to another graph ## Topological Equivalence ## Mapping transactions to graphs

Each transaction is a clique of nodes Graphs as transactions ## Counting Support ## Challenges

Node may contain duplicate labels

Support and confidence measures

• How to define them?

Additional constraints imposed by pattern structure

• Support and confidence are not the only constraints
• Assumption: frequent subgraphs must be connected

Apriori-like approach: Use frequent k-subgraphs to generate frequent (k+1) subgraphs

What is k?

Support: number of graphs that contain a particular subgraph

Apriori principle still holds

Level-wise (Apriori-like) approach:

• Vertex growing: k is the number of vertices
• Edge growing: k is the number of edges

## Vertex growing ## Edge growing  ## Apriori-like algorithm

Find frequent 1-subgraphs

Repeat:

Candidate generation

• Use frequent (k-1)-subgraphs to generate candidate k-subgraph

Candidate pruning

• Prune candidate subgraphs that contain infrequent (k-1)-subgraphs

Support counting

• Count the support of each remaining candidate

Eliminate candidate k-subgraphs that are infrequent

### Example dataset  In Apriori: Merging two frequent k-itemsets will produce a candidate (k+1)-itemset

In frequent subgraph mining (vertex/edge growing)

Merging two frequent k-subgraphs may produce more than one candidate (k+1)-subgraph

## Multiplicity of candidates (vertices) ## Multiplicity of candidates (edges)

Case 1: identical vertex labels ### Case 2: Core contains identical labels ### Case 3: Core multiplicity ## Candidate Generation by edge growing Case 1: if a ≠ c and b ≠ d Case 2: if a = c and b ≠ d Case 3: if a ≠ c and b = d   ### Case 4: if a = c and b = d  