Skip to content

b18050/buzzdb-graph

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

buzzdb-graph

Comparing Graph-Based Indexing vs. Normal Indexing

This project compares two methods of indexing relationships in a dataset: Graph-Based Indexing and Normal Indexing. The primary goal is to evaluate how these approaches differ in terms of storage, query efficiency, and traversal capability.


Problem Context

The dataset represents relationships (e.g., friendships) among entities (e.g., people). For example:

  • Entities: Ram, Shyam, Geet, Smit, Harry
  • Relationships:
    • Ram → Shyam, Geet, Smit
    • Harry → Geet, Smit
    • Geet → Ram, Harry
    • Smit → Ram, Harry

Indexing Approaches

1. Graph-Based Indexing

In this method:

  • Relationships are represented as directed edges in a graph structure.
  • Each entity (node) directly points to its related entities (neighbors).
  • Example:
    • Ram → {Shyam, Geet, Smit}
    • Harry → {Geet, Smit}
    • Geet → {Ram, Harry}
    • Smit → {Ram, Harry}
    • Shyam → {Ram}

Advantages:

  • Enables efficient traversal of relationships, especially for multi-level queries (e.g., BFS for mutual friends).
  • Supports graph-specific algorithms like DFS or BFS directly.
  • Compact representation of relationships.

Operations:

  • Apply BFS directly for tasks such as finding mutual friends or traversing relationships.

2. Normal Indexing

In this method:

  • Relationships are stored as tuples in a tabular format.
  • Each relationship is represented as a separate tuple (Entity1, Entity2).
  • Example Tuples:
    • (Ram, Shyam)
    • (Ram, Geet)
    • (Ram, Smit)
    • (Geet, Harry)
    • (Smit, Harry)

Normalized Representation:

  • Each entity maps to a list of tuples that describe its relationships:
    • Ram → {Tuple 1, Tuple 2, Tuple 3}
    • Geet → {Tuple 4, Tuple 9}
    • Smit → {Tuple 5, Tuple 8}
    • Harry → {Tuple 6, Tuple 7}
    • Shyam → {Tuple 10}

Advantages:

  • Easy to store in relational databases.
  • Simple to query for single-level relationships.

Limitations:

  • Traversing multiple levels (e.g., finding second-degree relationships) requires more computation (e.g., joins or repeated lookups).
  • Relational operations can be slower for complex queries.

Key Differences

Feature Graph-Based Indexing Normal Indexing
Storage Representation Compact graph with edges Flat list of tuples
Traversal Capability Directly supports BFS/DFS Requires lookups and joins
Efficiency for Multi-Level Queries High Moderate to Low
Querying Relationships Suited for graph traversal Suitable for simple queries
Scalability Scales well with dense graphs Can become cumbersome as size increases

Use Cases

  • Graph-Based Indexing: Ideal for scenarios requiring multi-level traversal, such as social networks or recommendation systems.
  • Normal Indexing: Suitable for storing large volumes of data with simple querying needs.

Implementation Details

  1. Graph-Based Indexing:

    • Use an adjacency list to store relationships.
    • Apply graph traversal algorithms like BFS or DFS for querying.
  2. Normal Indexing:

    • Use a relational table or flat file to store tuples.
    • Query tuples directly for first-level relationships and apply joins for deeper traversal.

Conclusion

Graph-based indexing offers a natural and efficient way to represent and traverse relationships for graph-like data, such as social networks. Normal indexing, while simpler to implement and store, can struggle with multi-level queries due to its reliance on relational operations.

This comparison highlights the trade-offs between the two approaches and provides insights into choosing the right method based on the requirements of the system.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published