Skip to content

vuldin/trie-rs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

trie-rs

A Rust implementation of a trie data structure with a focus on matching phrases using word stems.

This is the Rust port of the JavaScript @vuldin/trie package, providing the same phrase-matching functionality with Rust's performance and safety guarantees.

Features

  • Phrase-based matching: Nodes represent word stems rather than individual characters
  • Intelligent text processing: Filters common words and uses stemming for fuzzy matching
  • Flexible API: Support for method chaining and multiple phrase formats
  • Zero dependencies: All text processing implemented with standard library only
  • Performance: Built with Rust for speed and memory efficiency

Usage

Add to your Cargo.toml:

[dependencies]
trie-rs = "0.1.0"

Basic Example

use trie_rs::Trie;

fn main() {
    let mut trie = Trie::new();
    
    // Add phrases to the trie
    trie.add("Hello world!")
        .add("This is a test sentence that contains some common words.")
        .add("Method calls can be chained");
    
    // Search for phrases
    if let Some(result) = trie.find("test sentence") {
        println!("Found {} matches, exact: {}", result.count, result.exact);
        // Output: Found 2 matches, exact: true
    }
    
    // Common words are automatically filtered
    if let Some(result) = trie.find("test a sentence") {
        println!("Found {} matches, exact: {}", result.count, result.exact);
        // Output: Found 2 matches, exact: true (same as above)
    }
}

How It Works

Text Processing Pipeline

  1. Phrase Extraction: Input text is split into phrases using punctuation
  2. Word Filtering: Common English words are removed from each phrase
  3. Stemming: Remaining words are converted to their stem forms (e.g., "running" → "run")
  4. Trie Construction: Stems are inserted into the trie structure in reverse order

Example Transformation

The text:

"Here is a test sentence that contains some common words in English."

Becomes the stems:

["english", "word", "common", "contain", "sentenc", "test", "here"]

Data Structure

Unlike traditional tries that store individual characters, this implementation:

  • Stores word stems at each node
  • Tracks which phrases contain each stem
  • Supports partial phrase matching
  • Enables fuzzy matching through stemming

API Reference

Trie::new()

Creates a new empty trie.

trie.add(text: &str) -> &mut Self

Adds text to the trie. Supports method chaining.

trie.find(phrase: &str) -> Option<FindResult>

Searches for a phrase in the trie. Returns None if no matches found.

FindResult

pub struct FindResult {
    pub count: usize,  // Number of matching stems
    pub exact: bool,   // Whether all stems in the phrase were found
}

Testing

Run the test suite:

cargo test

Performance

The Rust implementation provides significant performance improvements over the JavaScript version:

  • 2.8-3.5x faster execution for insertion and search operations
  • Zero dependencies for smaller bundle size (137KB vs 193KB with dependencies)
  • Zero-copy string operations where possible
  • Efficient HashMap-based lookups
  • Memory-safe operations without garbage collection overhead
  • Custom text processing algorithms optimized for performance

License

MIT License - see the LICENSE file for details.

About

Rank text by best match to seeded words and phrases

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages