O(K)hash (pronounced OK hash or oꭓash) is a hash function with constant time complexity with respect to file size.
Calculating the hash of a large file normally involves reading the entire file, which can be time-consuming, especially when I/O is slow. O(K)hash addresses this by reading a fixed number of bytes from random positions inside the file, giving a constant time complexity of $O(1)$ with respect to file size.
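
To make the idea concrete, here is a minimal sketch of a sampled hash. It is not O(K)hash's actual algorithm: the names `sampled_hash`, `sample_bytes` and `block_size` are hypothetical, and seeding the offset generator with the file size and mixing the size into the digest are assumptions made only so that the sketch is deterministic.

```python
import hashlib
import os
import random


def sampled_hash(path, sample_bytes=1024 * 1024, block_size=4096):
    """Hash a bounded number of bytes from `path`, regardless of file size."""
    size = os.path.getsize(path)
    digest = hashlib.sha256()
    # Assumption: mix in the size so that a change in length changes the hash.
    digest.update(size.to_bytes(8, "big"))
    with open(path, "rb") as f:
        if size < 2 * sample_bytes:
            # Small file: sampling saves little, so hash every byte.
            for chunk in iter(lambda: f.read(1 << 16), b""):
                digest.update(chunk)
            return digest.digest()
        # Assumption: seed with the file size so the same offsets are chosen
        # on every run for files of that size.
        rng = random.Random(size)
        n_blocks = sample_bytes // block_size
        offsets = sorted(
            rng.randrange(0, size - block_size + 1) for _ in range(n_blocks)
        )
        for offset in offsets:
            f.seek(offset)
            digest.update(f.read(block_size))
    return digest.digest()
```

The amount of data read is capped at roughly `sample_bytes` however large the file is, which is where the constant cost comes from. Bytes that fall outside the sampled blocks never influence the result.
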
After all, it's just an OK hash:

- Not Suitable for Detecting Corruptions: O(K)hash is not suitable for detecting file corruptions that do not change the size of the file. In the case of a bit flip or small corruptions, the probability of detecting corruption is lower than $\frac{\text{base size}}{\text{file size}}$.
- Consider File Size Checking: Depending on the nature and number of large files you are working with, it may be more effective to check the file size before calculating a conventional hash to ensure data integrity.

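
As a rough illustration of how small that ratio gets, the snippet below evaluates it for one case. The helper name `detection_bound` is hypothetical and is not part of the `okhash` package; it only computes the ratio quoted above.

```python
def detection_bound(base_size, file_size):
    """Ratio base size / file size, capped at 1 for files smaller than the base size."""
    return min(1.0, base_size / file_size)


MIB = 1024 ** 2
GIB = 1024 ** 3

# K=2 uses a base size of 1 MiB; for a 10 GiB file the ratio is 1/10240.
print(f"{detection_bound(MIB, 10 * GIB):.6%}")
```

For a 10 GiB file hashed with K=2, a single flipped bit would therefore be noticed in fewer than about one in ten thousand cases.
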
You can install O(K)hash using pip:

```
pip install okhash
```

- Calculate O(K)Hash of a String:

  ```python
  import okhash

  data = "Hello, world!"
  checksum = okhash.okhash(data.encode('utf-8'), K=3)
  print(checksum.hex())
  ```

- Calculate & compare O(K)Hash of Files:

  ```python
  import okhash

  file1_checksum = okhash.okhash_filepath("file1.bin")
  file2_checksum = okhash.okhash_filepath("file2.bin")

  if okhash.compare_okhashes(file1_checksum, file2_checksum):
      print("Checksums match.")
  else:
      print("Checksums do not match.")
  ```

- Calculate Checksums: To calculate checksums for a file with a specified K value (default is K=2), use the following command:

  ```
  python3 -m okhash -K 3 file.bin
  ```

- Check Checksums: The result of the previous command can be used to check checksums for multiple files:

  ```
  python3 -m okhash *.bin > okhashes.txt
  python3 -m okhash --check okhashes.txt
  ```

- Additional Options:

  ```
  python3 -m okhash --help
  ```

Here's a table describing the strengths (K) and their corresponding parameters:
| K | Base Size (Subset Data Size for Hash Calculation) | Block Size |
|---|---|---|
| 1 | 1024 B = 1 KiB | 1024 B |
| 2 | 1048576 B = 1 MiB | 4096 B |
| 3 | 1073741824 B = 1 GiB | 262144 B |
| 4 | 1099511627776 B = 1 TiB | 16777216 B |
| K | $1024^K$ B | |

For a given K, the minimum file size is twice the base size; smaller files are hashed in full with SHA-256 instead.
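
The base sizes in the table are successive powers of 1024, so the minimum file size for each K can be worked out directly. The sketch below does that arithmetic; `base_size` and `uses_sampling` are hypothetical helper names and are not part of the `okhash` API.

```python
def base_size(k):
    """Base size in bytes for strength K, following the table: 1024**K."""
    return 1024 ** k


def uses_sampling(file_size, k=2):
    """True if the file reaches the minimum size (twice the base size) for K."""
    return file_size >= 2 * base_size(k)


# Print K, the base size, and the minimum file size for the tabulated strengths.
for k in range(1, 5):
    print(k, base_size(k), 2 * base_size(k))
```

With the default K=2, for example, a file needs to be at least 2 MiB (2097152 B); anything smaller falls back to a full SHA-256 hash.
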
O(K)hash is released under the MIT License.