# Subgraph Mining

DS 352 Syllabus

last updated 15-Oct-2018

## Introduction of frequent subgraph mining

Extends association analysis to finding frequent subgraphs

Useful for

• Web Mining (interlinked web pages),
• computational chemistry (atoms and ions bonds),
• bioinformatics,
• spatial data sets,
• XML document elements,
• etc.

### Some graph definitions

A graph is isomorphic if it is topologically equivalent to another graph

## Mapping transactions to graphs

Each transaction is a clique of nodes

Graphs as transactions

## Challenges

Node may contain duplicate labels

Support and confidence measures

• How to define them?

Additional constraints imposed by pattern structure

• Support and confidence are not the only constraints
• Assumption: frequent subgraphs must be connected

Apriori-like approach: Use frequent k-subgraphs to generate frequent (k+1) subgraphs

What is k?

Support: number of graphs that contain a particular subgraph

Apriori principle still holds

Level-wise (Apriori-like) approach:

• Vertex growing: k is the number of vertices
• Edge growing: k is the number of edges

## Apriori-like algorithm

Find frequent 1-subgraphs

Repeat:

Candidate generation

• Use frequent (k-1)-subgraphs to generate candidate k-subgraph

Candidate pruning

• Prune candidate subgraphs that contain infrequent (k-1)-subgraphs

Support counting

• Count the support of each remaining candidate

Eliminate candidate k-subgraphs that are infrequent

### Example dataset

In Apriori: Merging two frequent k-itemsets will produce a candidate (k+1)-itemset

In frequent subgraph mining (vertex/edge growing)

Merging two frequent k-subgraphs may produce more than one candidate (k+1)-subgraph

## Multiplicity of candidates (edges)

Case 1: identical vertex labels

## Candidate Generation by edge growing

Case 1: if a ≠ c and b ≠ d Case 2: if a = c and b ≠ d Case 3: if a ≠ c and b = d