add a new tool in libbpf-tools named ext4File #5440

Open
niebowen666 wants to merge 1 commit into iovisor:master from niebowen666:ext4File

@niebowen666 niebowen666 commented Dec 18, 2025

ext4file introduction

Overview

ext4file monitors the I/O pattern (buffered or direct) of each file in the target ext4 filesystem, as well as the hint used by each file.

Why ext4file

ext4file tracks buffered and direct I/O at the file level. The repository currently has block-layer tools that monitor the I/O patterns of a whole disk (such as biopattern and biolatency) and VFS-layer tools that trace file lifecycle and file I/O behavior across the entire VFS (such as filelife and vfsstat).

Below is a comparative summary:

| Tool Name | Layer | Main Function | Tracks Filename? | Differences |
|-----------|-------|---------------|------------------|-------------|
| biopattern | Block | Measures the proportion of random vs. sequential I/O on a storage device | ❌ No | Can not achieve file-layer tracing (the other bio tools all have this problem) |
| fsdist | VFS | Tracks latency distribution of operations like read, write, open, and sync | ❌ No | Focus on latency, not I/O pattern |
| fsslower | VFS | Traces slow file operations (e.g., long-latency reads/writes), focuses on I/O size | ✅ Yes | Trace the I/O size and latency, not I/O pattern |
| filelife | VFS | Monitors file lifecycle events (creation and deletion) | ✅ Yes | Only focus on file creation and deletion |
| filetop | VFS | Shows real-time I/O activity of active files (displays only top entries to avoid verbosity) | ✅ Yes | There exists no distinction between buffer and direct I/O |
| ext4file | ext4 Filesystem | Tracks buffer vs. direct I/O patterns per file, enables fine-grained file-level monitoring using inode | ✅ Yes | |

How to use

Run ext4file before executing your test. Run ./ext4file -h to see the tool's usage:

Show I/O pattern for every file in ext4 filesystem.

Usage: ./ext4file [-h] [-d DIR] [-o FILE] [interval] [count]

Options:
  -h, --help                   Print this help message
  -d DIR, --dir=DIR            Trace the ext4 filesystem mounted on the specified directory
  -o FILE, --output=FILE       Write output to a file (optional; default: stdout)
  interval                     Time interval (in seconds) between reports (default: unlimited)
  count                        Number of reports to generate (default: unlimited)

Examples:
  ./ext4file -d /mnt/ext4                      # Trace I/O patterns of files on the ext4 filesystem mounted at /mnt/ext4
  ./ext4file -d /mnt/ext4 1 10                 # Generate 10 reports, one per second
  ./ext4file -d /mnt/ext4 -o output 1 10       # Generate 10 reports at 1-second intervals, saving output to ./output
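
The positional `interval` and `count` arguments could be handled along these lines. This is only a minimal sketch: the thread does not show how ext4file parses its command line, and `struct report_opts`, `parse_positive`, and `parse_positional` are hypothetical names. It assumes both values must be positive integers and uses 0 to mean "not given".

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical holder for the two positional arguments. */
struct report_opts {
	long interval; /* seconds between reports; 0 = not given */
	long count;    /* number of reports; 0 = not given (unlimited) */
};

/* Parse a strictly positive decimal integer; returns 0 on success. */
static int parse_positive(const char *s, long *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || end == s || *end != '\0' || v <= 0)
		return -1;
	*out = v;
	return 0;
}

/* args[0..n-1] are the arguments left over after option parsing. */
static int parse_positional(int n, char **args, struct report_opts *o)
{
	o->interval = 0;
	o->count = 0;
	if (n > 2)
		return -1;
	if (n >= 1 && parse_positive(args[0], &o->interval))
		return -1;
	if (n == 2 && parse_positive(args[1], &o->count))
		return -1;
	return 0;
}
```

With the arguments `1 10` from the second example, this yields an interval of 1 second and a count of 10 reports.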

Sample output:

root@server:/home/nbw/OpenSource/biohint/libbpf-tools# ./ext4file -d /mnt/ext4File/
EXT4 Filesystem Info: blocks_count=3750232064 blocks_per_group=32768 bg_cnt=114448
Tracing Ext4 read/write... Hit Ctrl-C to end.
2026-01-14 13:58:21
file_name            inode      pa_inode   hint   buffer_read     direct_read     buffer_write    direct_write    delete
test3                83361794   83361793   0      0               0               0               0               False
test2                34         2          2      8               0               1               0               False
dir1                 83361793   2          0      0               0               0               0               False
dir2                 440467457  2          0      0               0               0               0               True
test3                33         2          3      8               0               1               0               False
test1                33         2          5      8               0               1               0               True
test3                440467458  440467457  0      0               0               0               0               True
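
The `bg_cnt` value in the filesystem info line is consistent with the usual ext4 block group arithmetic: 3750232064 / 32768 = 114448 exactly. The sketch below shows that calculation, rounding up for a partial last group. Whether ext4file derives `bg_cnt` this way is an assumption, and `block_group_count` is a hypothetical helper. `first_data_block` is 0 for block sizes larger than 1 KiB.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of block groups needed to cover the filesystem.
 * Rounds up so that a partially filled last group is still counted.
 */
static uint64_t block_group_count(uint64_t blocks_count,
				  uint64_t first_data_block,
				  uint64_t blocks_per_group)
{
	assert(blocks_per_group != 0);
	return (blocks_count - first_data_block + blocks_per_group - 1) /
	       blocks_per_group;
}
```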

Below is the detailed explanation of each field in the ext4file output. This tool traces per-file I/O patterns (buffered vs. direct) on ext4 filesystems, providing fine-grained visibility into application behavior.

| Field | Description |
|-------|-------------|
| file_name | The name of the file (without full path). Note: multiple files may share the same name. |
| inode | The inode number of the file. Inode is the unique identifier for a file within the filesystem, even across renames or hard links. This enables accurate tracking of I/O for specific files. |
| pa_inode | The inode number of the file’s parent directory. |
| hint | The FDP (Flexible Data Placement) hint value associated with the file. FDP is a new NVMe feature that enables the host to guide data placement on the SSD. |
| buffer_read | Number of buffered read operations performed on the file. Buffered I/O goes through the kernel page cache. |
| direct_read | Number of direct read operations performed on the file. Direct I/O bypasses the page cache. |
| buffer_write | Number of buffered write operations performed on the file. Data is first written to the page cache and later flushed to disk asynchronously by the kernel. |
| direct_write | Number of direct write operations performed on the file. Like direct read, it skips the page cache and writes data directly from user space to storage. |
| delete | Indicates whether the file has been unlinked (deleted). If True, the file was removed from the directory but may still be accessible if held open by a process. I/O on such files can indicate resource leaks or long-running file handles. |
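
The four I/O counters can be pictured with a small userspace sketch. It assumes the classification is made from the direct-I/O bit in `kiocb->ki_flags`, which the thread does not confirm. `struct io_counters`, `account_io`, and `SKETCH_IOCB_DIRECT` are hypothetical names, and the bit value is a placeholder standing in for the kernel's `IOCB_DIRECT`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder for the kernel's IOCB_DIRECT bit (assumption). */
#define SKETCH_IOCB_DIRECT (1u << 17)

/* Hypothetical per-file counters mirroring the four I/O columns. */
struct io_counters {
	uint64_t buffer_read;
	uint64_t direct_read;
	uint64_t buffer_write;
	uint64_t direct_write;
};

/* Bump the counter that matches the direction and the direct-I/O bit. */
static void account_io(struct io_counters *c, unsigned int ki_flags,
		       bool is_write)
{
	bool direct = (ki_flags & SKETCH_IOCB_DIRECT) != 0;

	if (is_write) {
		if (direct)
			c->direct_write++;
		else
			c->buffer_write++;
	} else {
		if (direct)
			c->direct_read++;
		else
			c->buffer_read++;
	}
}
```

Eight buffered reads followed by one buffered write would produce the `8 0 1 0` counters shown for test2 in the sample output.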

Target Audience

This tool is intended for ext4 filesystem developers and performance engineers who need to analyze I/O behavior at the file level.

@Bojun-Seo
Contributor

Bojun-Seo commented Jan 6, 2026

Here are my quick notes:

  • Docs: Need more explanation
  • Naming: ext4File -> ext4file
  • Patch splitting: Please split patches functionally or logically

Thanks

@niebowen666
Author

> Here are my quick notes:
>
>   • Docs: Need more explanation
>   • Naming: ext4File -> ext4file
>   • Patch splitting: Please split patches functionally or logically
>
> Thanks

Thanks for your reply.
But I wonder what kind of docs I should provide and which directory I should submit them to.
Besides, I have submitted two other PRs: #5439 and #5429.
Could you take a look if you have time? Thanks a lot!

@Bojun-Seo
Contributor

When I said docs, I actually meant the commit message.
I want you to provide the purpose, necessity, value, and usage instructions in the commit message.

@niebowen666 niebowen666 force-pushed the ext4File branch 2 times, most recently from 7cc4f5e to a5912d8 January 15, 2026 11:43

@niebowen666
Author

> When I said docs, I actually meant the commit message. I want you to provide the purpose, necessity, value, and usage instructions in the commit message.

Hi Bojun,
I have fixed my code and updated the commit message.

  • A detailed explanation has been added to the commit message
  • The name has been changed to ext4file
  • I have removed the tracking of the time for file creation, deletion, and access. Currently, ext4file only focuses on file-level I/O patterns.

tcptop \
vfsstat \
wakeuptime \
ext4file \
Collaborator

Why do we need this new tool ? We already have fsdist/fsslower/filelife/filetop ?

Author

ext4file is a tool for tracking file-level buffered and direct I/O.
If a certain file is expected to be accessed only through direct I/O, ext4file can detect abnormal I/O access.
I have read the source code of the tools you listed above.

  • fsdist focuses on the execution time of operations like read, write, open, and sync, which is different from the issue we are concerned with (buffered I/O vs. direct I/O).
  • Compared to fsdist, fsslower is more powerful. It traces file names and I/O sizes, and it sets a threshold: operations that complete faster than the threshold are not traced. Although it tracks file names, it does not achieve file-level tracking, because a file name does not identify a unique file. In addition, it tracks the size of I/O rather than the split between buffered and direct I/O, so the results of ext4file can complement those of fsslower.
  • filelife ignores I/O and only focuses on the creation and deletion of files.
  • To prevent excessive output, filetop only displays part of the data, and its I/O tracking cannot tell whether an I/O is buffered or direct.

ext4file can complement the tools mentioned above and can determine whether a file exhibits unexpected I/O under complex workloads.
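
The "expected direct I/O" use case could be checked on the collected rows roughly as follows. This is a sketch under assumptions: `struct file_row`, `is_direct_only`, and `count_unexpected_buffered` are hypothetical, the rows are taken to be already parsed from an ext4file report, and the list of direct-only inodes is supplied by the user.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one ext4file report row. */
struct file_row {
	const char *file_name;
	uint64_t inode;
	uint64_t buffer_read;
	uint64_t direct_read;
	uint64_t buffer_write;
	uint64_t direct_write;
};

/* True if the inode is in the caller's list of direct-only files. */
static bool is_direct_only(uint64_t inode, const uint64_t *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] == inode)
			return true;
	return false;
}

/* Count rows for direct-only files that still show buffered I/O. */
static size_t count_unexpected_buffered(const struct file_row *rows,
					size_t nrows,
					const uint64_t *direct_only,
					size_t n)
{
	size_t hits = 0;

	for (size_t i = 0; i < nrows; i++) {
		if (!is_direct_only(rows[i].inode, direct_only, n))
			continue;
		if (rows[i].buffer_read || rows[i].buffer_write)
			hits++;
	}
	return hits;
}
```

Matching on the inode rather than the file name follows the point made above that a file name does not identify a unique file.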

@Bojun-Seo
Contributor

I'm someone who believes that each commit/patch should be self-contained and complete (self-contained atomic unit). I think developers should be able to understand the full context and intent just by reading the commit message alone, without having to dig through the PR description or conversation thread.

Therefore, it would be great if you could include the PR description into the commit message(s). Also, if you revise the patches so that each individual commit/patch maintains its own completeness (rather than scattering fixes across multiple small follow-up commits), it would make the review much easier.

Additionally, it would be helpful to add the answer to @chenhengqi's question directly into the explanation under the Why ext4file section.

Contributor

@Bojun-Seo Bojun-Seo left a comment

  • I don’t think we need to split this into two commits. How about combining them into one?
  • Too long commit title. Commit title is usually shorter than 70 or 80 characters.
  • I just quickly checked bpf.c for now.

@@ -0,0 +1,196 @@
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
// Copyright (c) 2025 Samsung Electronics Co., Ltd.
Contributor

2026?

Author

I have fixed the license header in the .bpf.c, .c, and .h files.

__type(value, struct file_info_val);
} file_info_map SEC(".maps");

static __always_inline bool str_equal(const char *a, const char *b) {
Contributor

Please check coding style consistency.

char* a vs char *a

Author

I have changed the coding style into a unified format: char *a

int BPF_PROG(my_ext4_add_entry, handle_t* handle,
struct dentry* dentry, struct inode* inode)
{
bpf_printk("ext4_add_entry");
Contributor

Using bpf_printk in BPF code can cause performance issues.
I recommend removing it—not just here, but in all other places as well.

You'll see that no other tools except memleak use bpf_printk in their BPF code.

Author

I have deleted the unnecessary bpf_printk.

int BPF_PROG(my_ext4_file_read_iter,
struct kiocb *iocb, struct iov_iter *to)
{
//bpf_printk("ext4_file_read_iter\n");
Contributor

Please remove unnecessary comments.

Author

I have removed the comments.

@niebowen666
Author

>   • I don’t think we need to split this into two commits. How about combining them into one?
>   • Too long commit title. Commit title is usually shorter than 70 or 80 characters.
>   • I just quickly checked bpf.c for now.

Thank you, got it.

@Bojun-Seo
Contributor

@niebowen666
If you're not actually trying to close the PR, but rather want to prevent others from reviewing it temporarily while you're still making changes, I recommend changing the PR's status to Draft not Closed.

@niebowen666 niebowen666 reopened this Feb 9, 2026
@niebowen666
Author

> @niebowen666 If you're not actually trying to close the PR, but rather want to prevent others from reviewing it temporarily while you're still making changes, I recommend changing the PR's status to Draft not Closed.

Thanks for your advice.
I have merged the two commits with a shorter commit title.

@niebowen666
Author

niebowen666 commented Feb 24, 2026

@Bojun-Seo
Hi, Bojun.
I have merged the two commits and shortened the commit title. Is there anything wrong or unsuitable in my PR?
Thanks!

Contributor

@Bojun-Seo Bojun-Seo left a comment

First, please change the title of commit message.

I noticed there are two maps in the bpf.c code. Conceptually, it seems like only one map would be sufficient. The ino_name_map uses the inode as the key and file_info_key as the value, while the file_info_map uses file_info_key as the key and file_info_value as the value. This effectively means that the inode is the ultimate key, and all other information is stored as part of the value in one map.

So, I’m wondering—was there a specific reason for separating them into two maps?

} file_info_map SEC(".maps");

static __always_inline bool str_equal(const char *a, const char *b) {
for (size_t i = 0; i < MAX_FILE_NAME; i++) {
Contributor

What is the value of MAX_FILE_NAME? Where is it defined?

Author

The value is 255 which is defined in ext4file.h. I set this value based on the definition of NAME_MAX in the Linux kernel.

Contributor

Is MAX_FILE_NAME the same as NAME_MAX?
Even if the macro names are different, does it get automatically converted or something?

Author

Yes, I found that the NAME_MAX in Linux kernel is set to 255. So I set a new macro MAX_FILE_NAME to 255

Contributor

> Yes, I found that the NAME_MAX in Linux kernel is set to 255.

OK.

> So I set a new macro MAX_FILE_NAME to 255

I cannot find the code that sets the new macro MAX_FILE_NAME. Could you tell me the line number of ext4file.h where it is defined?

Author

Sorry, NAME_MAX in ext4file.h should be changed to MAX_FILE_NAME. I have fixed it
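
For readers following this thread, a bounded comparison in the spirit of `str_equal` might look like the sketch below. Only the function signature, the loop header, and the value 255 appear in the thread, so the loop body here is an assumption and not the PR's actual code. It is written as plain userspace C and expects NUL-terminated inputs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value stated in the thread; chosen to match the kernel's NAME_MAX. */
#define MAX_FILE_NAME 255

/*
 * Compare two NUL-terminated names, looking at no more than
 * MAX_FILE_NAME bytes. The fixed upper bound keeps the loop finite.
 */
static bool str_equal(const char *a, const char *b)
{
	for (size_t i = 0; i < MAX_FILE_NAME; i++) {
		if (a[i] != b[i])
			return false;
		if (a[i] == '\0')
			return true;
	}
	return true;
}
```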

@niebowen666
Author

> First, please change the title of commit message.
>
> I noticed there are two maps in the bpf.c code. Conceptually, it seems like only one map would be sufficient. The ino_name_map uses the inode as the key and file_info_key as the value, while the file_info_map uses file_info_key as the key and file_info_value as the value. This effectively means that the inode is the ultimate key, and all other information is stored as part of the value in one map.
>
> So, I’m wondering, was there a specific reason for separating them into two maps?

As you can see, ext4file tracks the deletion of files. We envision that users need to frequently create and delete files, and the file descriptor (fd) resources in the kernel are limited, so fd reuse may occur. In this case, an fd may not represent a specific file. Based on the idea, we believe that the existing method of uniquely representing a file is more reasonable.
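
The sample output in the PR description lists inode 33 twice, once as test1 (deleted) and once as test3, which illustrates why an identifier that can be reused is not enough on its own. The sketch below shows one way a composite key keeps such files apart. The layout of `file_info_key` is not shown in the thread, so `struct file_key`, `struct file_val`, `make_key`, and `lookup_or_create` are hypothetical, and a small array stands in for the BPF hash map.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NAME_LEN 32
#define SKETCH_MAX_FILES 16

/* Hypothetical composite key: inode plus parent inode plus name. */
struct file_key {
	uint64_t ino;
	uint64_t parent_ino;
	char name[SKETCH_NAME_LEN];
};

/* Hypothetical value: one counter and the delete flag. */
struct file_val {
	uint64_t buffer_read;
	int deleted;
};

struct file_entry {
	struct file_key key;
	struct file_val val;
	int used;
};

/* Array standing in for the hash map. */
static struct file_entry table[SKETCH_MAX_FILES];

static struct file_key make_key(uint64_t ino, uint64_t parent,
				const char *name)
{
	struct file_key k;

	memset(&k, 0, sizeof(k));
	k.ino = ino;
	k.parent_ino = parent;
	strncpy(k.name, name, SKETCH_NAME_LEN - 1);
	return k;
}

static int key_equal(const struct file_key *a, const struct file_key *b)
{
	return a->ino == b->ino && a->parent_ino == b->parent_ino &&
	       strncmp(a->name, b->name, SKETCH_NAME_LEN) == 0;
}

/* Return the entry for this key, creating it if needed; NULL when full. */
static struct file_val *lookup_or_create(const struct file_key *k)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_FILES; i++)
		if (table[i].used && key_equal(&table[i].key, k))
			return &table[i].val;
	for (i = 0; i < SKETCH_MAX_FILES; i++) {
		if (!table[i].used) {
			table[i].used = 1;
			table[i].key = *k;
			memset(&table[i].val, 0, sizeof(table[i].val));
			return &table[i].val;
		}
	}
	return NULL;
}
```

With this key, the deleted test1 and the later test3 get separate entries even though both report inode 33.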

@Bojun-Seo
Contributor

> > First, please change the title of commit message.
> > I noticed there are two maps in the bpf.c code. Conceptually, it seems like only one map would be sufficient. The ino_name_map uses the inode as the key and file_info_key as the value, while the file_info_map uses file_info_key as the key and file_info_value as the value. This effectively means that the inode is the ultimate key, and all other information is stored as part of the value in one map.
> > So, I’m wondering, was there a specific reason for separating them into two maps?
>
> As you can see, ext4file tracks the deletion of files. We envision that users need to frequently create and delete files, and the file descriptor (fd) resources in the kernel are limited, so fd reuse may occur. In this case, an fd may not represent a specific file. Based on the idea, we believe that the existing method of uniquely representing a file is more reasonable.

Got it — that makes sense.
By the way, could you include this in the commit message?

@niebowen666
Author

> > > First, please change the title of commit message.
> > > I noticed there are two maps in the bpf.c code. Conceptually, it seems like only one map would be sufficient. The ino_name_map uses the inode as the key and file_info_key as the value, while the file_info_map uses file_info_key as the key and file_info_value as the value. This effectively means that the inode is the ultimate key, and all other information is stored as part of the value in one map.
> > > So, I’m wondering, was there a specific reason for separating them into two maps?
> >
> > As you can see, ext4file tracks the deletion of files. We envision that users need to frequently create and delete files, and the file descriptor (fd) resources in the kernel are limited, so fd reuse may occur. In this case, an fd may not represent a specific file. Based on the idea, we believe that the existing method of uniquely representing a file is more reasonable.
>
> Got it, that makes sense. By the way, could you include this in the commit message?

Sure, I have already modified the commit message.

Contributor

@Bojun-Seo Bojun-Seo left a comment

Please add ext4file on .gitignore file.

#include "trace_helpers.h"
#include "ext4file.h"

#define BG_LIST_NUM 57232
Contributor

Please remove dead code

Author

I have removed it.

@niebowen666
Author

> Please add ext4file on .gitignore file.

Do you mean I should add the compiled binary file ext4file (not the source files) to .gitignore? Or something else?

@Bojun-Seo
Contributor

> > Please add ext4file on .gitignore file.
>
> Do you mean I should add the compiled binary file ext4file (not the source files) to .gitignore? Or something else

I mean the compiled binary file (ext4file) should be added to the .gitignore file.
