Scot Breitenfeld edited this page Jun 26, 2025 · 109 revisions

Meeting Notes of 2025

💻 Zoom link: https://us06web.zoom.us/j/89601195963

📆 Meeting calendar invite.

Note

🎥 Please note that by joining and participating in these Working Group meetings, you acknowledge that your name will be visible to other attendees in the Zoom session, and this participation will be considered a public record. Furthermore, your verbal or written contributions may be included in the publicly accessible meeting notes and summary.

Please provide time estimates for each agenda item.

Agenda items must be added at least 48 hours prior to the meeting.


2025-07-10

  • Facilitator/time-keeper: Scot Breitenfeld
  • Note-taker/Editor: AI/Scot Breitenfeld

Old Action Items

  • Aleksandar will set up a filter working subgroup to discuss further next steps; Quincey was interested in contributing.
  • Quincey will initiate discussions about the accelerator native storage and sharded storage proposals in the forum within the next month, in light of a more formal HEP framework. (Quincey)

Agenda

Minutes

Action Items

  • [ ] None
• —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– •

2025-06-26

  • Facilitator/time-keeper: Scot Breitenfeld
  • Note-taker/Editor: AI/Scot Breitenfeld

Old Action Items

  • Aleksandar will set up a filter working subgroup to discuss further next steps; Quincey was interested in contributing.
  • Quincey will initiate discussions about the accelerator native storage and sharded storage proposals in the forum within the next month, in light of a more formal HEP framework. (Quincey)

Agenda (✋FINALIZED)

Minutes

Quick recap

The team discussed updates on an HDF5 accelerator project focused on optimizing GPU usage and data transfer efficiency, with plans for further improvements and collaboration on chunk cache development. They explored challenges related to type conversion, data storage formats, and metadata handling while considering various technical solutions and standards for different storage systems. The group also discussed potential changes to the HDF5 library version defaults and checksum algorithms, with plans to continue these discussions in future meetings.

Summary

HDF5 GPU Accelerator Updates

Quincey presented updates on his accelerator for HDF5, focusing on optimizing GPU usage to reduce CPU workload. He explained the current design, in which data transfers directly into GPU buffers, with compression and type-conversion kernels running on the GPU. Quincey outlined plans to improve the pipeline and discussed potential changes to the HDF5 library in coordination with Neil's work on the chunk cache.

VFD Interface and Memory Optimization

Quincey discussed the need to revise the VFD interface to make it more pluggable and to address memory allocation issues for buffers on accelerators. Dana suggested that POSIX support would be sufficient for most users, while Scot raised questions about the need, given that CXL is in development. Quincey proposed a new type of filter for data movement. He discussed the challenges of type conversion on GPUs, emphasizing the need for flexibility in composing pipelines for AI and ML codes. Neil mentioned ongoing work on the shared chunk cache, and Quincey planned to collaborate with an intern on this in the coming months.

Advancing Data Storage and Access

The team discussed challenges with type conversion and data efficiency, with Dana expressing concerns about making these systems accessible to average scientists. They explored the potential of Compute Express Link (CXL) as a future standard. Quincey presented on sharded storage for HDF5, emphasizing the need to maintain compatibility with existing APIs while implementing a well-documented format that separates metadata from raw data. The team discussed moving away from some novel features to reduce maintenance burden, with Quincey expressing a long-term goal of making the new sharded connector the default for both local and cloud storage.

Pluggable Data Storage Connector Design

Quincey and Aleksandar discussed the design of a connector for handling data storage and metadata, emphasizing the need for a pluggable interface to accommodate various storage solutions. They highlighted the importance of meeting vendor requirements within a year or two to remain competitive, as current cloud capabilities have not met expectations for the past decade. Aleksandar noted the shift towards Zarr-based data storage in Earth sciences.

Efficient Metadata Management Strategies

Quincey discussed the need for a metadata container to manage HDF5 files and perform queries efficiently. He proposed using SQLite for metadata storage, emphasizing the importance of a single I/O operation to retrieve all necessary information. Aleksandar mentioned the trade-offs between JSON and other formats, while Dana highlighted the growing interest in local object stores for efficiency in machine learning applications. Quincey also touched on advancements in object storage, featuring sub-microsecond access times, which suggests a shift towards more efficient data handling.
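Quincey's single-I/O-lookup goal can be illustrated with Python's built-in sqlite3 module. This is a conceptual sketch only: the table layout and column names are invented for the example and do not reflect any actual sharded-storage design.

```python
import sqlite3

def make_metadata_store():
    """In-memory SQLite container for HDF5-style object metadata.

    All table and column names here are hypothetical illustrations of
    the idea discussed (a queryable metadata container), not a spec.
    """
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE objects (
            path   TEXT PRIMARY KEY,  -- e.g. '/group1/dset1'
            kind   TEXT NOT NULL,     -- 'group' or 'dataset'
            dtype  TEXT,              -- element type, datasets only
            shape  TEXT,              -- serialized dims, datasets only
            attrs  TEXT               -- JSON-encoded attributes
        )""")
    return db

def lookup(db, path):
    """Fetch everything needed to open an object in one operation."""
    return db.execute(
        "SELECT kind, dtype, shape, attrs FROM objects WHERE path = ?",
        (path,)).fetchone()

db = make_metadata_store()
db.execute("INSERT INTO objects VALUES ('/g1', 'group', NULL, NULL, '{}')")
db.execute("INSERT INTO objects VALUES "
           "('/g1/temperature', 'dataset', 'float64', '1024x1024', '{}')")
print(lookup(db, "/g1/temperature"))
```

The point of the single `SELECT` is that one indexed query returns type, shape, and attributes together, in contrast to walking a chain of on-disk metadata objects.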

Storage Performance Optimization Discussion

Quincey presented performance data comparing different storage systems, noting that Zarr's local object store performance was close to POSIX but still an order of magnitude slower. He suggested that aligning Zarr's orange curve with their green curve could improve performance, potentially achieving results 5x better than current levels.

Checksum Algorithms Implementation Discussion

The team discussed implementing alternative checksum algorithms in HDF5 files, with Mark proposing to add support for CRC-32 and Fletcher-32 in the superblock to increase data transparency and enable more flexible data writing. Quincey and Neil raised security concerns about allowing users to choose checksum algorithms, particularly regarding the superblock's bootstrap process. At the same time, Neil suggested using CRC as the default algorithm when using the latest library version. The team agreed that while implementing multiple checksum options would make the format more complex, it could be explored as an enhancement, with the implementation details to be determined in future discussions.
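For readers comparing the two algorithm families under discussion, the sketch below contrasts CRC-32 (via Python's stdlib zlib) with a textbook Fletcher-32 over 16-bit words. This is an illustration of the algorithms, not the library's code: HDF5's Fletcher32 filter may differ in padding and byte-order details.

```python
import zlib

def fletcher32(data: bytes) -> int:
    """Generic Fletcher-32 over little-endian 16-bit words.

    Illustrative only; HDF5's Fletcher32 filter implementation may
    differ from this textbook formulation.
    """
    if len(data) % 2:                   # pad odd-length input
        data += b"\x00"
    s1 = s2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)
        s1 = (s1 + word) % 65535        # running sum
        s2 = (s2 + s1) % 65535          # sum of running sums
    return (s2 << 16) | s1

payload = b"HDF5 superblock bytes"
print(f"crc32:      {zlib.crc32(payload):#010x}")
print(f"fletcher32: {fletcher32(payload):#010x}")
```

Fletcher-32 needs only additions, which is why it is attractive as a cheap filter checksum, while CRC-32 has stronger error-detection properties for the bit-flip patterns a superblock check cares about.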

HDF5 Default Version Update Discussion

The group discussed changing the default lower bound for HDF5 library versions from 1.6 to 1.8, which would improve compatibility with newer file formats and performance. Quincey and Neil agreed that 1.8 has been available for a long time, providing enough time for users to upgrade. Mark pointed out that this could signal that HDF5 2.0 is an important milestone, but he suggested that users could still save files in compatible formats if needed.

Action Items

  • [ ] None
• —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– •

2025-06-12

  • Facilitator/time-keeper: Gerd Heber
  • Note-taker/Editor: AI/Gerd Heber

Old Action Items

  • Aleksandar will set up a filter working subgroup to discuss further next steps; Quincey was interested in contributing.
  • Quincey will initiate discussions about the accelerator native storage and sharded storage proposals in the forum within the next month, in light of a more formal HEP framework. (Quincey)

Agenda

Minutes

Quick recap

Neil presented improvements to the virtual data set feature, focusing on optimizing storage and memory usage through various file format changes and string handling strategies. The team concluded by discussing memory usage issues, chunk caching configurations, and the transition to CMake while also addressing concerns about Zlib integration and cross-compilation.

Summary

HUG'25 Recordings and Action Updates

The meeting began with Gerd welcoming attendees and mentioning that HUG'25 recordings are now available on YouTube. He thanked the Organizing Committee, sponsors, and hosts in Hamburg for their efforts in making the event successful. Gerd then addressed two old action items: setting up a filter working subgroup and discussing the accelerator native storage and sharded storage proposals. Quincey mentioned he would update the agenda regularly to avoid missing deadlines. The conversation ended with Neil preparing to present an RFC on an improvement he is working on.

Optimizing Virtual Data Set Storage

Neil discussed improvements to the virtual data set feature, focusing on optimizing storage and memory usage by reducing redundant string data. He proposed adding a flags parameter to the file format, which would allow common source file and dataset names to be shared across multiple mappings. This change would reduce the amount of repeated string data stored in memory and on disk, potentially leading to performance improvements. Neil outlined the specific file format changes, including the use of indices to reference shared file names, and explained how the new system would work in practice.

Shared String Optimization Strategies

Neil discussed optimization strategies for handling shared strings in both the on-disk and in-memory formats. He proposed using a hash table to detect shared strings during file reading, which would help reduce memory allocation costs. Neil also suggested simplifying the memory format by using null pointers to indicate shared files, which could further reduce copying time. He clarified that reference counting would likely not be necessary, since shared buffers are freed only when all mappings are freed.
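The hash-table detection Neil described can be illustrated with a small Python sketch: a dict interns each source file and dataset name so that repeated mappings share one string instance instead of carrying their own copies. The names and data structure here are invented for illustration.

```python
def build_mappings(raw_mappings):
    """Share source file / dataset name strings across VDS mappings.

    Mirrors the hash-table idea from the discussion: a dict spots names
    already seen while reading the file, so each unique string is stored
    once and later mappings reference the shared copy.
    """
    interned = {}                       # string -> single shared instance
    mappings = []
    for src_file, src_dset in raw_mappings:
        src_file = interned.setdefault(src_file, src_file)
        src_dset = interned.setdefault(src_dset, src_dset)
        mappings.append((src_file, src_dset))
    return mappings, interned

raw = [("data0.h5", "/dset"), ("data0.h5", "/dset"), ("data1.h5", "/dset")]
mappings, table = build_mappings(raw)
print(len(raw) * 2, "name slots, but only", len(table), "unique strings stored")
```

With many mappings into the same source files, the savings grow linearly with the number of mappings, which is the effect the proposed flags parameter targets on disk.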

Virtual Data Sets and String Storage

The team discussed the implementation of virtual data sets and string storage in the global heap. Quincey suggested using the global heap's reference counting for string storage, but Neil pointed out potential I/O overhead for unique strings. Elena raised questions about string encoding and proposed considering UTF8 for security and efficiency. The group also discussed the relationship between virtual data sets and sparse design, with Elena and Gerd agreeing to take that discussion offline. Neil emphasized that virtual data sets are fundamentally metadata, not raw data, and suggested exploring compression options if they were stored differently.

Enhancing Cache Size for Users

The team discussed increasing the default chunk cache size from 1 MB to 8 MB to address issues for HDF5 library users who cannot configure advanced settings in their software. Elena suggested adding configuration file support to override defaults, while Neil and Quincey proposed using environment variables as an intermediate solution. The group agreed to implement the 8 MB change as an interim step, with plans to explore configuration file options and error-handling mechanisms in future development.

Chunk Caching Memory Optimization Discussion

The team discussed memory usage issues with chunk caching, agreeing to set the default to 8 MB while making it configurable in future versions. Elena emphasized the need to alert users about increased memory consumption when working with chunked datasets, particularly for AI users. Aleksandar raised concerns about the complexity of the software stack and the need to balance performance improvements with user experience.
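Why the default matters can be shown with a toy LRU chunk cache (a simplified stand-in for HDF5's rdcc parameters, not the library's implementation): for an access pattern that revisits a working set of four 1 MiB chunks, a 1 MB cache thrashes while an 8 MB cache holds the entire working set.

```python
from collections import OrderedDict

class ChunkCache:
    """Toy LRU chunk cache; `nbytes` plays the role of rdcc_nbytes."""
    def __init__(self, nbytes, chunk_bytes):
        self.capacity = max(1, nbytes // chunk_bytes)   # chunks that fit
        self.cache = OrderedDict()
        self.misses = 0

    def read_chunk(self, idx):
        if idx in self.cache:
            self.cache.move_to_end(idx)                 # mark recently used
        else:
            self.misses += 1                            # would trigger real I/O
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)          # evict LRU chunk
            self.cache[idx] = object()                  # stand-in for chunk data

# Three passes over a working set of four 1 MiB chunks:
pattern = [0, 1, 2, 3] * 3
results = {}
for nbytes in (1 << 20, 8 << 20):                       # 1 MB vs 8 MB cache
    cc = ChunkCache(nbytes, 1 << 20)
    for idx in pattern:
        cc.read_chunk(idx)
    results[nbytes >> 20] = cc.misses
    print(f"{nbytes >> 20} MiB cache -> {cc.misses} misses")
```

Every miss in the small-cache case is a chunk re-read (and, for compressed data, a re-decompression), which is the cost the 8 MB default is meant to avoid for users who never touch property lists.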

HDF5 Zlib Linking Requirements Discussion

The team discussed the build process and linking requirements for HDF5 with Zlib. Elena and Quincey clarified that while Zlib can be statically linked into the HDF5 binary, users still need to link their applications with the Zlib library at build time. Aleksandar shared that Allen had told him the library could operate without requiring Zlib after build, but the team determined this was not the case for Zlib-ng. The discussion concluded with Gerd requesting benchmark comparisons for Zlib and Zlib-ng, which Aleksandar mentioned were available in the Zlib-ng GitHub repository and an AWS blog post from a few years ago.

CMake Transition and Distribution Impact

The team discussed the transition to CMake and its impact on various distributions, including Conda. Aleksandar explained that they would recommend switching to CMake and potentially changing the default to Zlib-ng. He also mentioned speaking with the Conda community about the changes. Elena brought up a forum message about broken cross-compilation, which Gerd confirmed was addressed. The group agreed to discuss CRC 32 in the next meeting, with Quincey planning to add some items after completing his current task.

Action Items

  • [ ] None
• —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– •

2025-05-29 ❌ CANCELED

• —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– •

2025-05-15 ❌ CANCELED

• —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– •

2025-05-01

  • Facilitator/time-keeper: Gerd Heber
  • Note-taker/Editor: AI/Gerd Heber

Old Action Items

  • A Manifesto for the Future of HDF document will be discussed in a follow-up meeting (Gerd).
  • Aleksandar will set up a filter working subgroup to discuss further next steps; Quincey was interested in contributing.
  • Quincey will initiate discussions about the accelerator native storage and sharded storage proposals in the forum within the next month, in light of a more formal HEP framework. (Quincey)

Agenda

Minutes

Quick recap

The team discussed using a GitHub repository to capture enhancement proposals and the need for a detailed report card to track unsupported features. The main agenda item was a discussion led by Neil on approaches and decisions concerning the RFC, covering fine-grain capability reporting by HDF5 VOL Connectors.

Summary

HDF Alliance GitHub Repository Development

Gerd presented the HDF Alliance's GitHub repository, which uses the MyST framework to capture enhancement proposals. The repository is still being developed and is serving as a guinea pig for the HDF manifesto. Gerd encouraged everyone to contribute to the repository and improve the manifesto. Quincey expressed interest in contributing but needed more information on how to propose a new HEP. Aleksandar explained that each proposal has its own folder and that authors can use Markdown; he also offered to write up the mechanics of the process. Quincey planned to fork the repository and start with the sharded storage and accelerator I/O proposals. Scot suggested that contributors be added as collaborators to the repository to avoid the need for forking.

Unsupported Features Tracking and Reporting

Neil discussed the need for a detailed report card to track unsupported features and suggested using a text string to provide reasons for unsupported callbacks. Quincey agreed to find a way for a connector to report why a feature was not supported.

VOL Connector Failure Handling Options

Quincey discussed the potential for a VOL connector to return a typical fail value while retaining some state internally. He suggested that the HDF5 library could then call back into the connector to determine whether the failure was due to an unsupported operation or a regular failure. Quincey also proposed extending the capability flags for VOL connectors, but expressed concerns about the scalability of that approach. He ranked the options for handling connector failures, with return values and a magic return value as his top choices. Quincey also suggested revising the return values of all callbacks and proposed a callback approach for the public API. He concluded by discussing the possibility of suppressing error stacks for unsupported operations and updating the async support in a backward-compatible way.

Error Stack Suppression Discussion

Quincey, Jordan, and Neil discussed suppressing error stacks in the library. Jordan expressed concern about losing error information with the pause mechanism, but Quincey suggested retaining it along the way and only suppressing the output. Neil suggested printing the error stack by default, while Quincey leaned towards not printing it by default. They also discussed the API level, with Quincey preferring a return value approach and Neil agreeing. Robinson pointed out known concurrency problems with the global variable approach.

RFC Return Values and Data Types

The team agreed to rewrite the return values approach in the RFC to list the preferred approach. They also discussed renaming data types, particularly the floating-point types, and removing endianness from the naming scheme. The team decided to proceed with these changes.

Action Items

  • [ ] None
• —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– •

2025-04-17

  • Facilitator/time-keeper: Scot Breitenfeld
  • Note-taker/Editor: AI/Scot Breitenfeld

Old Action Items

  • A Manifesto for the Future of HDF document will be discussed in a follow-up meeting (Gerd).
  • Aleksandar will set up a filter working subgroup to discuss further next steps; Quincey was interested in contributing.
  • Quincey will initiate discussions about the accelerator native storage and sharded storage proposals in the forum within the next month, in light of a more formal HEP framework. (Quincey)

Agenda

Minutes

Quick recap

The meeting covered discussions on various technical aspects of HDF5 VOL connectors, including handling unsupported HDF5 operations, error reporting, and compatibility issues. The team explored different approaches to improve automated VOL testing and enhance sustainability while considering user-friendliness and standardized testing across connectors. Additionally, the group discussed naming conventions for new data types.

Summary

Adding Unsupported Error Codes to Vol Connector, RFC review

Neil discussed the need for a new return value in the VOL Connector interface to indicate an unsupported status. He proposed adding a new return value to the existing integer type, herr_t, which currently uses 0 for success and -1 for failure. Neil suggested using -2 to indicate an unsupported operation, with room to add other error codes later. He also considered changing the function signature or adding a special pointer to indicate an unsupported error. Neil emphasized the importance of avoiding breaking existing code and considering compatibility with current VOL Connectors.
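Neil's proposed convention can be modeled in a few lines. This is illustrative Python, not HDF5 code: the constant names and the dict-based connector are hypothetical stand-ins, and only the 0 / -1 values reflect the existing herr_t convention; -2 is the proposal under discussion.

```python
# Hypothetical constants modeling the herr_t discussion: 0 and -1 are the
# existing success/failure convention; -2 is the proposed third state.
SUCCEED, FAIL, UNSUPPORTED = 0, -1, -2

def vol_dataset_create(connector, name):
    """Dispatch a create call through a connector, distinguishing
    'the operation failed' from 'this connector never implements it'."""
    cb = connector.get("dataset_create")
    if cb is None:
        return UNSUPPORTED          # connector does not provide the callback
    return cb(name)                 # connector may still fail normally

native = {"dataset_create": lambda name: SUCCEED if name else FAIL}
readonly = {}                       # a connector with no creation support

print(vol_dataset_create(native, "/dset"))    # 0
print(vol_dataset_create(native, ""))         # -1
print(vol_dataset_create(readonly, "/dset"))  # -2
```

The third state is what lets an automated test suite skip, rather than fail, features a connector legitimately does not implement.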

Neil and Quincey discussed the need for thread safety in their system, particularly regarding pass-throughs. They agreed that the pass-throughs should not be affected and that they must ensure the state is correctly passed up the stack. They discussed adding a new parameter to all callbacks to include an error code. They also considered requiring VOL connectors to use the public HDF5 interface to create an error stack and parse it to look for a specific code or string. Quincey suggested a sixth approach where the VOL connector issues a normal error but retains the internal state so the library can call back and determine if the failure was due to unsupported cases.

Quincey, Neil, Elena, and Scot discussed the system's need for stateful and thread-safe mechanisms. They also addressed the issue of error reporting to applications, with Elena suggesting that real error messages should be provided to users. Neil expressed concerns about requiring VOL connectors to use HDF5 error scheme and the difficulty of reporting unsupported features without printing a lot of output. Scot suggested that applications still have the option to use the error stack in addition to an error return value. The team also discussed the potential need for best practices for writing VOL connectors and the importance of error reporting.

Volume Connector Compatibility and Error Reporting

Neil mentioned the need for error reporting in the VOL connector and the possibility of asynchronous failures. Neil raised a potential compatibility issue with existing applications that may not check for return values correctly, which could lead to problems with applications using VOL connectors.

Handling Unsupported Operations in HDF5

The group discussed various approaches to handling unsupported operations in HDF5 VOL connectors. Neil presented several options, including adding new API calls, modifying existing functions, or using special error codes, and suggested that the option adding a new function might be the best approach. Elena proposed redirecting unsupported operations to the native library, but Quincey and Neil explained that this isn't always possible for some VOL connectors. The discussion also considered error stack printing and the handling of asynchronous operations.

Neil proposed implementing a system to handle unsupported operations in VOL connectors, primarily to improve automated VOL testing. The motivation is to enhance sustainability and give users confidence in VOL connectors' capabilities. Jordan explained that this approach addresses issues with connectors like DAOS, which have unique features unsupported in HDF5. The team discussed the challenges of implementing such a system, including covering all edge cases with capability flags. They considered options such as dynamic determination of supported features, expanding the capability flags, and creating a "report card" for VOL connectors. They also discussed the need to balance user-friendliness with technical feasibility and the importance of standardized testing across different VOL connectors. As next steps for the RFC, the group decided to continue the discussion on a forum post created by Neil and to revisit it at the next meeting.
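The "report card" idea might look roughly like the sketch below, combining capability reporting with the text-string reasons suggested earlier: a connector pairs each feature with either support or a human-readable explanation. All feature names and the daos_like table are invented for illustration.

```python
def report_card(connector_caps, required):
    """Build a human-readable 'report card' for a VOL connector.

    connector_caps maps feature name -> None (supported) or a string
    explaining why it is unsupported; features the connector does not
    mention at all are flagged separately. Purely illustrative.
    """
    lines = []
    for feature in required:
        reason = connector_caps.get(feature, "not reported")
        status = "PASS" if reason is None else f"SKIP ({reason})"
        lines.append(f"{feature:20s} {status}")
    return lines

daos_like = {
    "dataset_io": None,
    "async": None,
    "external_links": "object model has no external links",
}
for line in report_card(daos_like, ["dataset_io", "async",
                                    "external_links", "swmr"]):
    print(line)
```

A test harness could treat PASS entries as testable, SKIP entries as legitimately unsupported, and "not reported" entries as gaps to investigate.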

Discussing Naming Conventions

Jordan mentioned creating a forum post to gather feedback on naming conventions for the new data types. The group debated the merits of including predefined big-endian variants of the new types and discussed potential naming conventions. They agreed to continue the discussion on the forum if needed and aim to resolve the naming convention issue soon.

Action Items

  • [ ] None
• —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– •

2025-04-03

  • Facilitator/time-keeper: Gerd Heber
  • Note-taker/Editor: AI/Scot

Old action items

  • Scot will check whether collaborators can create branches within the HDF Group organization on GitHub.

    ✅ Collaborators can create branches.

  • The HDF Group will present its vision for community collaboration at the next meeting.

    ❎ A draft document is under review and will be presented in a follow-up meeting.

  • The HDF Group will provide an update on the HEP (HDF5 Enhancement Proposal) infrastructure at the next meeting.

    ✅ Added to agenda

  • Quincey will initiate discussions about the accelerator native storage and sharded storage proposals in the forum within the next month, in light of a more formal HEP framework.

Agenda

  • Review Meeting Etiquette (Gerd, 4min)

  • Meeting etiquette - key points

    • Many of us attend plenty of ineffective meetings. Let this not be one of them!
    • Read & improve the etiquette! Don't like something? => Come back next time & discuss!
    • The default facilitator is The HDF Group's Sustaining Engineer of the Week, but volunteers are always welcome. Just pencil in your name!
    • Be present and respectful: (ChatGPT's impression)
    • ✋ Raise your hand to speak, and the facilitator will call on people in the order in which they raise their hands, but may alter that based on who has not spoken recently or to follow a thread.
    • Practice makes perfect. Let's try this!
  • Discussion/Resolution on non-standard naming conventions for floating point data types. See PR for the motivation behind the discussion. (Jordan, 14min)

  • Future role for HDFGroup/hdf5_plugins filter plugin repository (@ajelenak, 19 min)

  • Quick preview of initial HDF5 Extensions Proposal (HEP) framework (@ajelenak, 9 min)

  • Review Monte Carlo testing of H5FL package, in [PR](https://github.com/HDFGroup/hdf5/pull/5195). (Quincey, 14 min)

Minutes

Quick recap

The meeting covered various topics, including meeting etiquette and naming schemes in HDF5 data types. There were also discussions about managing HDF5 plugins, filters, and repositories, focusing on improving accessibility and maintenance across different platforms. Finally, the group discussed new approaches for publishing enhancement proposals and implementing thread-safe mechanisms for memory allocation.

Summary

Improving Meeting Etiquette and Facilitation

Gerd was the facilitator of the meeting. He emphasized the importance of meeting etiquette and encouraged everyone to review and improve it. Gerd also encouraged volunteers to take on the role of facilitator to gain experience.

New Naming Scheme for Floating Point Formats

Jordan proposed a new naming scheme for predefined data types in the library, focusing on floating-point formats used in machine learning. The proposal includes adding a leading type class specifier to identify the data type immediately. Quincey suggested refining the scheme to include type class, qualifier, endianness, and size. The group discussed the challenges of naming non-standard types and the potential need to deprecate them in the future. They considered seeking broader input on the forum but expressed concerns about diluting decision-making.

Future of HDF5 Plugins

Aleksandar then spoke about the future role of the HDF Group's HDF5 filter plugins repository and the need for better management. He expressed concerns about the accessibility of HDF5 filters and the need for the group to take a more proactive role in providing them. Aleksandar also mentioned that the ZFP filter developers had raised similar issues.

Aleksandar discussed the role of the HDF Group and the ecosystem of filters. Scot suggested that the GitHub repository should use submodules rather than copies. Elena reminded the group of the community's desire for the most useful filters to be built into the library. Allen clarified that some filters do not have separate repositories for their plugins. Aleksandar raised the issue of the relationship between a filter and its filter plugin, and Allen confirmed that the registered identifiers are for the filter plugin, not the compression filter.

Aleksandar discussed the challenges of maintaining and building libraries for various platforms, highlighting the exhaustion of volunteers in package repositories. He suggested that the repository should include plugins only if their maintainers are willing to fix any issue discovered by HDFG's testing across various compilers and platforms. He also proposed that the repository managers should not be solely responsible for fixing plugin issues. Elena agreed with Aleksandar's points and suggested that simplifying the process could be beneficial. Quincey expressed interest in helping with the issue and suggested setting up a subgroup to address it.

The meeting discussed issues surrounding the repository, including what should be included as a submodule and who is responsible for fixing issues on certain compilers and platforms. They also discussed the need to make the repository more accessible to its user base, particularly for the Python ecosystem of data science. Elena suggested defining the purpose of the repository first before making decisions. Allen proposed including the repository in the HDF5 build process instead of building from it separately. The team also discussed the possibility of creating a CMake preset for Conda Forge to simplify the process.

New Approach for Publishing Enhancement Proposals

Aleksandar presented a new approach for publishing HEPs, focusing on web publishing rather than PDFs or Word files. He introduced MyST, a technology based on the Markdown text markup format, which is easy for people to adopt. The goal is to enable high-quality web-published proposals. He also mentioned that the technology comes from the Jupyter Book publishing community, which aims to make Jupyter notebooks a first-class scientific publishing format.

Proposal Management

Aleksandar presented a framework for managing proposals, which he believes is user-friendly and doesn't require complex technical skills. He suggested hosting the proposals on GitHub Pages for easy access. Elena expressed concerns about creating barriers for users, saying that proposals should be public and easily commentable.

Thread-Safe Memory Allocation Mechanism

Quincey discussed implementing a thread-safe mechanism for memory allocation and deallocation, using a free list to reallocate similar-sized memory quickly. He sought feedback on his approach, which involves generating test vectors of operations and executing them a million times to identify failures. Jordan suggested that exhaustive testing is becoming more feasible with modern computing power. Neil and Gerd provided additional insights and suggestions. The team agreed to continue refining the testing approach.
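Quincey's testing approach, generating random vectors of operations and replaying them against the allocator, can be sketched as follows. This is a toy single-threaded Python model of a size-bucketed free list with invented names, not the actual H5FL code or its thread-safe mechanism.

```python
import random

class FreeList:
    """Toy size-bucketed free list: freed blocks of a given size are
    kept for quick reuse by later allocations of the same size."""
    def __init__(self):
        self.buckets = {}                 # size -> list of recycled blocks

    def alloc(self, size):
        bucket = self.buckets.get(size)
        if bucket:
            return bucket.pop()           # reuse a recycled block
        return bytearray(size)            # fall back to the allocator

    def free(self, size, block):
        self.buckets.setdefault(size, []).append(block)

def run_vector(ops, fl):
    """Replay one randomly generated alloc/free vector, checking that
    every allocation yields a usable block of the requested size."""
    live = []
    for op in ops:
        if op == "alloc" or not live:     # can't free with nothing live
            size = random.choice((16, 64, 256))
            block = fl.alloc(size)
            assert len(block) == size
            live.append((size, block))
        else:
            fl.free(*live.pop(random.randrange(len(live))))
    return True

random.seed(0)
fl = FreeList()
vectors = [[random.choice(("alloc", "free")) for _ in range(50)]
           for _ in range(1000)]
print(all(run_vector(v, fl) for v in vectors))  # True
```

Scaling the vector count toward the "million executions" Quincey mentioned, and interleaving vectors across threads, is where such a harness would start exposing races in a real thread-safe free list.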

Action Items

  • A Manifesto for the Future of HDF document will be presented in a follow-up meeting (Gerd).
  • Aleksandar will set up a filter working subgroup to discuss further next steps; Quincey was interested in contributing.
• —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– • ·· • —– ٠ ✤ ٠ —–· · • —– ٠ ✤ ٠ —– •

2025-03-20

  • Facilitator/time-keeper: Neil Fortner
  • Note-taker/recorder: Scot Breitenfeld/AI

Old action items

(Presumably, none.)

Agenda

  • Changes to the agenda?
  • Ideas for how to enable community collaboration (Quincey, 20 min)
    • Branch management, technical discussions, etc.
    • Possibly create a GitHub org, with HEPs and other collaborations
  • NVIDIA roadmap collaboration opportunities (Quincey, 40 min)
    • Accelerator-enabled I/O operations
    • Sharded storage

Minutes

Quick Recap

The HDF5 Working Group meeting focused on the potential for establishing a separate organization for community discussions and collaborations and the need for increased community involvement in two upcoming NVIDIA-related projects. The meeting also explored the current CPU-GPU node architecture, emphasizing the importance of concentrating on the accelerator components and the potential of implementing a true metadata database for quicker operations. Additionally, the group discussed the design of the HDF5 architecture, the advantages of the Zarr format over HDF5, and two technical proposals: accelerator-enabled storage and sharded storage.

Summary

HDF5 Working Group Meeting Agenda
Neil, the facilitator, shared the agenda and invited any additions. Scot informed the group about two methods for receiving meeting cancellation notices: via the mailing list and calendar invites. He recommended using these automated methods instead of relying on forum posts for updates.

Neutral Platform for Community Discussions
Quincey proposed creating a separate, broader, and independent organization for community discussions and collaborations, suggesting it could serve as a neutral space not explicitly organized by the HDF Group. He highlighted that this could benefit Nvidia and AMD collaborations, managing branches, and reviewing documents. Steve questioned the necessity of this organization apart from the HDF Group, while Gerd raised concerns about its scope and neutrality. Quincey emphasized the need for a discussion platform that could operate independently from HDF Group meetings. Gerd suggested that the HDF Group could be a neutral platform for such discussions.

Revenue Generation for HDF Group
Scot and Steve discussed the importance of generating revenue for the HDF Group, with Steve expressing concern about the time and resources spent on initiatives that do not produce revenue. Scot emphasized the need for a stable outlook on HDF5 and their services and the potential for increased revenue through outreach. Quincey suggested that the HDF Group allow collaborators to create branches, which Scot agreed to investigate. The conversation concluded with Quincey seeking clarity on Steve's comments regarding the HDF Group's responsibilities.

Increased Community Involvement in Projects
Quincey discussed the need for greater community involvement in two projects: one focused on accelerator-native storage and the other on sharded storage. He wanted increased participation from management and other community members in both projects. Quincey said he would begin inviting people to participate if necessary, and he planned to wrap up the multi-threading discussions in the next couple of weeks.

New GPU Architecture for Data Transfer
Quincey spoke about the current CPU-GPU node architecture, where data is cached in CPU memory before being transferred to the GPU. He proposed a new architecture where the GPU would handle most operations, with data transferred in and out more efficiently. Quincey emphasized the importance of establishing a vendor-neutral mechanism for GPU-related tasks and encouraged participation from others. He also suggested integrating this architecture with MPI for collective I/O operations, proposing that type conversion could be incorporated into the new I/O pipeline.
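
A minimal host-side sketch of the type-conversion idea discussed above: in the proposed pipeline the conversion would run on the accelerator during the transfer, but the repacking step itself (float64 to float32 here, using only the standard library) is purely illustrative and not any agreed-upon design.

```python
import struct

def convert_f64_to_f32(buf: bytes) -> bytes:
    """Repack a buffer of little-endian float64 values as float32.

    In the proposed I/O pipeline this conversion would happen on the
    accelerator during transfer; here it is simulated on the host.
    """
    count = len(buf) // 8
    values = struct.unpack(f"<{count}d", buf)
    return struct.pack(f"<{count}f", *values)

# Simulated staged transfer: source buffer -> convert -> destination.
src = struct.pack("<3d", 1.0, 2.5, -4.0)
dst = convert_f64_to_f32(src)
assert struct.unpack("<3f", dst) == (1.0, 2.5, -4.0)
```

The values chosen are exactly representable in float32, so the round-trip check holds; a real pipeline would also need to handle precision loss and byte-order conversion.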

Improving HDF5 Compatibility and Design
Quincey stressed the need to focus on the accelerator components and enhance the techniques introduced by Zarr to ensure HDF5 compatibility with POSIX and object stores. He proposed a design that includes a directory resembling a container, sharding out the dataset storage and utilizing databases for metadata management. He encouraged feedback from a variety of stakeholders to enhance the design. Aleksandar questioned how this approach differs from Zarr and the existing HDF5 schema. Quincey clarified that there are indeed distinctions.
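
The directory-as-container idea can be pictured with standard-library pieces: a toy layout in which chunk bytes are sharded into files and a SQLite database stands in for the proposed metadata database. The file names, table schema, and dataset names below are illustrative assumptions, not part of any HDF5 design.

```python
import os
import sqlite3
import tempfile

# A toy "container directory": chunk data is sharded into files and a
# SQLite database (standing in for the proposed metadata database)
# records each chunk's shard file, byte offset, and length.
root = tempfile.mkdtemp()
db = sqlite3.connect(os.path.join(root, "metadata.db"))
db.execute("CREATE TABLE chunks (dataset TEXT, chunk_id INTEGER, "
           "shard TEXT, offset INTEGER, length INTEGER)")

def write_chunk(dataset, chunk_id, payload, shard="shard-0.bin"):
    # Append the chunk bytes to a shard file and index it in the DB.
    path = os.path.join(root, shard)
    with open(path, "ab") as f:
        offset = f.tell()
        f.write(payload)
    db.execute("INSERT INTO chunks VALUES (?, ?, ?, ?, ?)",
               (dataset, chunk_id, shard, offset, len(payload)))

def read_chunk(dataset, chunk_id):
    # Look up the chunk's location in the metadata DB, then read it.
    shard, offset, length = db.execute(
        "SELECT shard, offset, length FROM chunks "
        "WHERE dataset=? AND chunk_id=?", (dataset, chunk_id)).fetchone()
    with open(os.path.join(root, shard), "rb") as f:
        f.seek(offset)
        return f.read(length)

write_chunk("temperature", 0, b"chunk-0-bytes")
write_chunk("temperature", 1, b"chunk-1-bytes")
assert read_chunk("temperature", 1) == b"chunk-1-bytes"
```

The point of the sketch is the separation of concerns: chunk lookup goes through a queryable database rather than through file naming conventions, which is the distinction from Zarr raised in the discussion.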

Metadata Database for Faster Operations
Quincey discussed using an actual metadata database for faster operations, which could provide an advantage over Zarr. Aleksandar agreed with Quincey’s points but emphasized the importance of understanding and accessibility for scientists. He expressed concern about HDF5's complexity and the lack of alternatives, stating that these factors should be considered in their decision-making processes.

GPU Memory Storage and Compatibility
During the meeting, Joe Lee and Quincey discussed the potential for storing key-value pairs in GPU memory. Quincey confirmed this is feasible but stressed the importance of an abstract and pluggable interface. They also discussed using NVIDIA's compression library, nvCOMP. Joe Lee asked about the openness of the instruction set for the H200 chip, to which Quincey admitted he did not know the answer. Furthermore, they discussed the need for a vendor-neutral interface between HDF5 and NVIDIA GPUs, as well as the use of NVIDIA GPUDirect® Storage (GDS) APIs to communicate with them.
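
One way to picture the "abstract and pluggable interface" mentioned above is a vendor-neutral key-value API with swappable backends. The in-memory backend below is only a stand-in for a hypothetical GPU-memory implementation; none of these class names come from HDF5.

```python
from abc import ABC, abstractmethod

class KeyValueBackend(ABC):
    """Abstract, vendor-neutral key-value interface (illustrative).

    A GPU-resident implementation built on vendor memory APIs could be
    plugged in behind this interface; the host-memory backend below
    only demonstrates the shape of the abstraction.
    """

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> bytes: ...

class HostMemoryBackend(KeyValueBackend):
    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store[key]

# Code written against the abstract interface is backend-agnostic.
backend: KeyValueBackend = HostMemoryBackend()
backend.put(b"chunk/0", b"\x00\x01\x02")
assert backend.get(b"chunk/0") == b"\x00\x01\x02"
```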

HDF5 Architecture and Acceleration Discussion
Quincey presented the design of the HDF5 architecture, emphasizing the significance of source and destination data buffers on accelerators. He proposed a vendor-neutral approach to enhance performance and suggested collaborating with the HDF5 GPU VFD. Joe Lee inquired about benchmarking the HDF5 GPU VFD against other I/O libraries using an AI application that NVIDIA can showcase at GTC 2026; Quincey responded that the key metric is the acceleration of I/O for HDF5-based applications. He noted that his components demonstrate improvements, although the final product is not yet built. Gerd sought clarification regarding Aleksandar's remark about scientists understanding storage concepts, and Aleksandar explained that these individuals are early adopters of storage software.

Zarr’s Advantages Over HDF5
Aleksandar outlined the advantages of the Zarr format compared to HDF5, highlighting its simplicity and ease of implementation. He pointed out that scientists have adopted Zarr's direct implementation in various programming languages. Aleksandar also mentioned that Zarr is now adding features, such as chunks in a file, which were previously lacking. Quincey proposed that the interface for interacting with the metadata database should enable interaction with a JSON plain text metadata file, which could serve as another plugin for the metadata.
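
Quincey's suggestion of a JSON plain-text metadata plugin is easy to picture with Zarr v2, whose array metadata (the `.zarray` file) is plain JSON. The abbreviated document below sketches the kind of file such a plugin would read and write; only the standard library is needed to parse it.

```python
import json

# An abbreviated Zarr v2 ".zarray"-style document: array metadata as
# plain JSON that a JSON-file metadata plugin could read and write.
zarray_text = """{
  "zarr_format": 2,
  "shape": [1000, 1000],
  "chunks": [100, 100],
  "dtype": "<f8",
  "order": "C",
  "fill_value": 0
}"""

meta = json.loads(zarray_text)
assert meta["shape"] == [1000, 1000]
assert meta["chunks"] == [100, 100]
assert meta["dtype"] == "<f8"
```

This simplicity (any language with a JSON parser can read the metadata) is the adoption advantage Aleksandar described.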

On-Node Storage and Sharded Proposals
Quincey introduced two technical proposals, accelerator-native storage and sharded storage, seeking interest from participating organizations. Aleksandar expressed interest in the sharded proposal, while Neil indicated an interest in both proposals but highlighted funding constraints. Steve from Lifeboat raised concerns about aligning community and commercial interests. Quincey plans to begin sketching designs for the proposals but mentioned that thread-safety work is currently consuming his time. The group agreed to reconvene in two weeks to continue the discussion.

Action Items

  • [] Scot will check whether collaborators can create branches within the HDF Group organization on GitHub.
  • [] The HDF Group will present its vision for community collaboration at the next meeting.
  • [] The HDF Group will provide an update on the HEP (HDF5 Enhancement Proposal) process at the next meeting.
  • [] Quincey will initiate discussions about the accelerator native storage and sharded storage proposals in the forum within the next month, in light of a more formal HEP framework.