SimoneAvellino/CMiner

CMiner

CMiner is an algorithm for mining patterns from graphs using a user-defined support technique. This implementation provides a command-line interface for running the algorithm, with configurable options such as the minimum and maximum number of nodes, the support threshold, and the search approach.

Installation

Prerequisites

Make sure the following are installed before running the project:

  • Python: Version 3.11.6
  • pip: Version 24.2

Installation steps

  1. Clone the repository:

    git clone https://github.com/SimoneAvellino/CMiner
  2. Alternatively, download the repository from https://github.com/SimoneAvellino/CMiner.

  3. Move into the repository folder:

    cd CMiner
  4. Install the dependencies:

    pip install -r requirements.txt
  5. Install the library in editable mode:

    pip install -e .

Usage

    CMiner <db_file> <support> [options]

Required arguments:

  • db_file: Absolute path to the graph database file.
  • support: Minimum support for pattern extraction. A value between 0 and 1 is read as a fraction of the graphs in the database (e.g., 0.2 for 20%); a value greater than 1 is read as an absolute number of graphs (e.g., 20 for at least 20 graphs). Because 1 means 100% (patterns present in all graphs), use a value slightly greater than 1 (e.g., 1.1) to find patterns present in at least one graph.
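
The two readings of the support value can be sketched as a small function. This is an illustration, not CMiner's code: the function name `min_graph_count` is hypothetical, and the rounding (ceiling for fractions, floor for absolute values) is an assumption chosen so that 1.1 maps to "at least one graph", as described above.

```python
import math


def min_graph_count(support: float, num_graphs: int) -> int:
    """Translate a support value into a minimum number of graphs.

    Hypothetical helper: the rounding rules are assumptions, not taken
    from the CMiner source.
    """
    if support <= 0:
        raise ValueError("support must be positive")
    if support <= 1:
        # Fractional support: a share of the graphs in the database.
        return math.ceil(support * num_graphs)
    # Support above 1: an absolute number of graphs (1.1 -> 1, 20 -> 20).
    return math.floor(support)


if __name__ == "__main__":
    for value in (0.5, 1, 1.1, 20):
        print(value, "->", min_graph_count(value, num_graphs=100))
```

Under these assumptions, 1 selects every graph in the database, while any value strictly between 1 and 2 selects a single graph.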

Additional options:

  • -l, --min_nodes: Minimum number of nodes in the pattern (default: 1).
  • -u, --max_nodes: Maximum number of nodes in the pattern (default: infinite).
  • -n, --num_nodes: Exact number of nodes in the pattern (if this option is set, -l and -u are ignored).
  • -d, --directed: Flag to indicate if the graphs are directed (default: 1, directed).
  • -m, --show_mappings: Display mappings of found patterns (default: 0, not displayed).
  • -t, --templates_path: Paths of the template files from which to start the search. Node indices in a template must start from 0.
  • -f, --with_frequencies: Display the frequency of each pattern in each graph (default: 0, not displayed).
  • -x, --pattern_type: Type of patterns that CMiner returns. It can be 'all' or 'maximum' (default: all). NOTE: this feature is under development and may contain bugs.
  • -o, --output_path: File path where the results are saved. If not set, the results are shown in the console.
  • -w, --worker: Number of parallel workers to mine the patterns.

Basic usage examples

  • Mine patterns with 2 to 3 nodes that are present in at least 50% of the graphs in the database:

    CMiner /path/to/db.data 0.5 -l 2 -u 3
  • Mine all patterns with exactly 5 nodes that are present in at least 2 graphs in the database:

    CMiner /path/to/db.data 2 -n 5
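
When CMiner is driven from a script, the command lines above can be assembled programmatically. The sketch below is not part of CMiner: `build_cminer_command` is a hypothetical helper that uses only the flags documented in this README and assumes the `CMiner` executable is on the PATH after installation.

```python
import shlex


def build_cminer_command(db_file, support, *, min_nodes=None, max_nodes=None,
                         num_nodes=None, templates_path=None,
                         output_path=None, workers=None):
    """Build the argument list for a CMiner run (hypothetical helper)."""
    cmd = ["CMiner", str(db_file), str(support)]
    if num_nodes is not None:
        # -n takes precedence: the README states -l and -u are not considered.
        cmd += ["-n", str(num_nodes)]
    else:
        if min_nodes is not None:
            cmd += ["-l", str(min_nodes)]
        if max_nodes is not None:
            cmd += ["-u", str(max_nodes)]
    if templates_path is not None:
        cmd += ["-t", str(templates_path)]
    if output_path is not None:
        cmd += ["-o", str(output_path)]
    if workers is not None:
        cmd += ["-w", str(workers)]
    return cmd


if __name__ == "__main__":
    cmd = build_cminer_command("/path/to/db.data", 0.5, min_nodes=2, max_nodes=3)
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets the result be passed to `subprocess.run` without shell quoting concerns.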

Template usage examples

Some usage examples from the folder experiments/Datasets/OntoUML:

  • Mine all patterns present in at least 2 graphs in the database that match the template defined in S1.txt:

    CMiner ./ontographs.data 2 -t ./S1.txt -n 3

Note: -n 3 restricts the results to patterns with exactly three nodes, so only solutions that coincide with the template are returned.

File:

    t # 1
    v 0 kind
    v 1 subkind
    v 2 subkind
    e 1 0 Generalization
    e 2 0 Generalization

Graphically: (image: S1 Graph)
  • Same as before, but this time node labels are not specified:

    CMiner ./ontographs.data 2 -t ./S2.txt -n 3

File:

    t # 1
    v 0
    v 1
    v 2
    e 1 0 Generalization
    e 2 0 Generalization

Graphically: (image: S1 Graph)
  • You can also partially or completely omit labels for both nodes and edges:

    CMiner ./ontographs.data 2 -t ./S3.txt -n 3

File:

    t # 1
    v 0 kind
    v 1
    v 2
    e 1 0
    e 2 0

Graphically: (image: S1 Graph)
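
The template files above follow a simple line-oriented layout: a `t # <id>` line opens a graph, `v <index> [label]` declares a node, and `e <source> <target> [label]` declares an edge. The parser below is a minimal sketch inferred only from these three examples. The names `Template` and `parse_templates` are hypothetical, and treating everything after the indices as a single optional label is an assumption, not CMiner's documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    graph_id: str
    nodes: dict = field(default_factory=dict)   # node index -> label or None
    edges: list = field(default_factory=list)   # (source, target, label or None)


def parse_templates(text: str) -> list:
    """Parse template text in the 't / v / e' layout shown above (sketch)."""
    templates = []
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        kind = parts[0]
        if kind == "t":
            # "t # <id>": start a new template graph.
            templates.append(Template(graph_id=parts[-1]))
        elif kind == "v":
            # "v <index> [label]": a missing label is stored as None.
            templates[-1].nodes[int(parts[1])] = " ".join(parts[2:]) or None
        elif kind == "e":
            # "e <source> <target> [label]"
            label = " ".join(parts[3:]) or None
            templates[-1].edges.append((int(parts[1]), int(parts[2]), label))
        else:
            raise ValueError(f"unrecognized line: {raw!r}")
    return templates


S3_TEXT = """t # 1
v 0 kind
v 1
v 2
e 1 0
e 2 0
"""

templates = parse_templates(S3_TEXT)
print(templates[0].nodes)
print(templates[0].edges)
```

A check such as `min(template.nodes) == 0` can be added to catch templates whose node indices do not start from 0, which the -t option requires.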

About

A Frequent Subgraph Mining Algorithm for multigraphs
