It feels more natural to me to design SFA so that the library itself accepts an OpenDAL operator (rather than a Path). That way, every user of SFA can read directly from S3. Is …
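One way to read this suggestion is that SFA's reader would be generic over "something that can serve byte ranges" instead of taking a `Path`. The sketch below assumes that design; `RangeSource`, `MemSource` and `ArchiveReader` are hypothetical names, not part of SFA's or OpenDAL's API, and only an in-memory source is implemented so the example stays self-contained.

```rust
use std::io;

/// Hypothetical trait: anything that can serve byte ranges of one object.
trait RangeSource {
    /// Total size of the object in bytes.
    fn size(&self) -> io::Result<u64>;
    /// Read exactly `len` bytes starting at `offset`.
    fn read_range(&self, offset: u64, len: u64) -> io::Result<Vec<u8>>;
}

/// In-memory source; a local file or a remote object could implement the same trait.
struct MemSource(Vec<u8>);

impl RangeSource for MemSource {
    fn size(&self) -> io::Result<u64> {
        Ok(self.0.len() as u64)
    }

    fn read_range(&self, offset: u64, len: u64) -> io::Result<Vec<u8>> {
        // Reject ranges that overflow or run past the end of the buffer.
        let end = offset
            .checked_add(len)
            .filter(|&end| end <= self.0.len() as u64)
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds"))?;
        Ok(self.0[offset as usize..end as usize].to_vec())
    }
}

/// Reader generic over where the bytes come from, instead of taking a `Path`.
struct ArchiveReader<S: RangeSource> {
    source: S,
}

impl<S: RangeSource> ArchiveReader<S> {
    fn new(source: S) -> Self {
        Self { source }
    }

    /// Read the last `n` bytes of the object (where a footer would live).
    fn tail(&self, n: u64) -> io::Result<Vec<u8>> {
        let size = self.source.size()?;
        let n = n.min(size);
        self.source.read_range(size - n, n)
    }
}
```

An adapter holding an OpenDAL operator plus an object path could implement the same trait by issuing ranged reads; it is not shown here because the exact calls depend on the OpenDAL version in use.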
I found OpenDAL recently while working with ZIP files efficiently (from a streaming/partial-reading perspective, not full decompression), and it was a very good fit for my use case: the original library I was working on did buffered HTTP reads and used other strategies that OpenDAL already covers. So I thought OpenDAL could expand toward this "archive" scenario, since it already has very nice examples of opening files over multiple protocols and serving as an object store, for instance, or even the "ofs" project. That way, archive files could generally be read as "virtual file systems".

When I open a ZIP file, what I'm doing in practice is listing files, just as I would when I open an S3 bucket. Even Windows (which I haven't touched in a while, by the way) has some kind of "folder-like" integration in its iconic File Explorer.

In terms of API, a ZIP file could be "opened" in OpenDAL like any other operator. The twist is that it can be thought of as "an operator inside another operator", since I may have ZIP files inside S3 buckets, served directly over HTTP, or even on the local filesystem. This is the only "challenge" I see in how OpenDAL could handle this kind of scenario.

Examples of how this could be represented as URIs:

# 1) as a "path", directly (would have to be parsed by extension, which is not great)
s3://<bucket>/folder-a/file.zip/inner-file-1.txt
# 2) same as above, but with a "hint" in the protocol, as used by Git and others
s3+zip://<bucket>/folder-a/file.zip/inner-file-1.txt

There are other, more complex scenarios, like nested "VFS":

s3://<bucket>/data.iso/file-a.zip/file-b.zip/inner-file-a.7z/target-file.txt
# considering "hints" in the protocol (yeah, just to be a little bit crazy here)
s3+iso+zip+7z://<bucket>/data.iso/file-a.zip/file-b.zip/inner-file-a.7z/target-file.txt

This takes it to the limit and covers some edge cases (which aren't impossible anyway). From an architectural perspective, it seems to me that this would give rise to some kind of "processor" entity that could be enabled when needed, via options/protocol hints or as a layer (e.g. a middleware that intercepts processing and takes only the part of the URI it is interested in, like …).

Well, this is just a big brainstorm, mostly to say that I think OpenDAL would be even more impressive if it supported this kind of scenario, as it would round out what it already supports.
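The protocol-hint idea can be prototyped as plain string splitting, independent of OpenDAL. This sketch assumes one reading of the hints: the first scheme component is the outer backend, and the remaining components are a set of archive formats, so any path segment whose extension matches a hinted format is treated as an archive to descend into. `Layer` and `split_nested` are hypothetical names, not OpenDAL API.

```rust
/// One layer of a hypothetical nested-archive URI.
#[derive(Debug, PartialEq)]
struct Layer {
    kind: String, // backend or archive format, e.g. "s3", "zip", "7z"
    path: String, // path inside the previous layer
}

/// Split e.g. "s3+zip://bucket/folder-a/file.zip/inner.txt" into layers.
/// Assumption: everything after the first `+` in the scheme is a set of
/// archive formats, and any path segment whose extension matches one of
/// them is opened as an archive.
fn split_nested(uri: &str) -> Option<Vec<Layer>> {
    let (scheme, rest) = uri.split_once("://")?;
    let mut parts = scheme.split('+');
    let mut kind = parts.next()?.to_string(); // outer backend, e.g. "s3"
    let hints: Vec<&str> = parts.collect(); // archive formats, e.g. ["zip"]

    let mut layers = Vec::new();
    let mut segs: Vec<&str> = Vec::new();
    for seg in rest.split('/') {
        segs.push(seg);
        // Is this segment an archive of one of the hinted formats?
        let ext = seg.rsplit_once('.').map(|(_, e)| e);
        if let Some(ext) = ext.filter(|e| hints.contains(e)) {
            // Close the current layer at the archive itself...
            layers.push(Layer { kind, path: segs.join("/") });
            // ...and continue inside it.
            kind = ext.to_string();
            segs.clear();
        }
    }
    layers.push(Layer { kind, path: segs.join("/") });
    Some(layers)
}
```

With this reading, `s3+iso+zip+7z://` resolves the nested example into five layers (s3, iso, zip, zip, 7z). A URI whose last segment is itself a hinted archive ends with an empty inner path, which could mean "the archive root". A plain directory that happens to end in `.zip` would still be misread, which is the same ambiguity noted for option 1.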
I'm interested in making fjall-rs/sfa accessible through OpenDAL. It's conceptually similar to a ZIP file but simpler: the "file entries" (sections) are basically concatenated, uncompressed, and followed by a footer listing each section's key along with its offset and length. I may also be interested in ZIP files themselves.
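The access pattern for such a format is: read the tail, parse the footer into an index, then fetch a section by (offset, length), which against S3 would be a ranged read. The byte layout below (per entry: little-endian u32 key length, key bytes, u64 offset, u64 length; then a 12-byte trailer holding the entry count and footer length) is an assumption for illustration only, not SFA's actual on-disk format, and all function names are hypothetical.

```rust
use std::collections::HashMap;
use std::convert::TryInto;
use std::io;

/// Fixed trailer at the very end: entry count (u32 LE) + footer length (u64 LE).
const TRAILER_LEN: usize = 4 + 8;

/// key -> (offset, len) of each section.
type Index = HashMap<String, (u64, u64)>;

fn bad(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Build an archive in the assumed layout: sections, footer entries, trailer.
fn write_archive(sections: &[(&str, &[u8])]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut footer = Vec::new();
    for (key, data) in sections {
        let offset = out.len() as u64;
        out.extend_from_slice(data); // sections are stored back to back
        footer.extend_from_slice(&(key.len() as u32).to_le_bytes());
        footer.extend_from_slice(key.as_bytes());
        footer.extend_from_slice(&offset.to_le_bytes());
        footer.extend_from_slice(&(data.len() as u64).to_le_bytes());
    }
    let footer_len = footer.len() as u64;
    out.extend_from_slice(&footer);
    out.extend_from_slice(&(sections.len() as u32).to_le_bytes());
    out.extend_from_slice(&footer_len.to_le_bytes());
    out
}

/// Split `n` bytes off the front of `buf`, advancing it.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> io::Result<&'a [u8]> {
    let whole: &'a [u8] = *buf;
    if whole.len() < n {
        return Err(bad("truncated footer"));
    }
    let (head, tail) = whole.split_at(n);
    *buf = tail;
    Ok(head)
}

/// Parse the footer into an in-memory index (reads only the tail of the archive).
fn read_index(archive: &[u8]) -> io::Result<Index> {
    if archive.len() < TRAILER_LEN {
        return Err(bad("archive shorter than trailer"));
    }
    let footer_end = archive.len() - TRAILER_LEN;
    let trailer = &archive[footer_end..];
    let count = u32::from_le_bytes(trailer[0..4].try_into().unwrap());
    let footer_len = u64::from_le_bytes(trailer[4..12].try_into().unwrap());
    let footer_start = (footer_end as u64)
        .checked_sub(footer_len)
        .ok_or_else(|| bad("footer length exceeds archive"))? as usize;
    let mut footer = &archive[footer_start..footer_end];
    let mut index = Index::new();
    for _ in 0..count {
        let key_len = u32::from_le_bytes(take(&mut footer, 4)?.try_into().unwrap()) as usize;
        let key = std::str::from_utf8(take(&mut footer, key_len)?)
            .map_err(|_| bad("key is not UTF-8"))?
            .to_string();
        let offset = u64::from_le_bytes(take(&mut footer, 8)?.try_into().unwrap());
        let len = u64::from_le_bytes(take(&mut footer, 8)?.try_into().unwrap());
        index.insert(key, (offset, len));
    }
    Ok(index)
}

/// Fetch one section's bytes using the index (remotely, a single ranged read).
fn read_section<'a>(archive: &'a [u8], index: &Index, key: &str) -> Option<&'a [u8]> {
    let &(offset, len) = index.get(key)?;
    let end = offset.checked_add(len)?;
    archive.get(offset as usize..end as usize)
}
```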
I'd like to be able to read these remotely, e.g. an SFA file on S3. I think that means I should be working on a Layer rather than an Operator, although that might break some assumptions about getting AccessorInfo from the Layer.
Initially, I'd make this accessor read-only and keep the index in memory, probably loading it on first read (so putting it behind an RwLock<Option<...>>). Does this sound like the right track?
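The load-on-first-read idea can be sketched with std's RwLock: readers take a shared lock on the fast path, and the first caller takes the write lock and fills the Option only if it is still empty. `LazyIndexed`, `load_index` and `lookup` are hypothetical names, and the loader is a plain closure standing in for "range-read the footer and parse it". The sketch is synchronous to stay self-contained; OpenDAL accessor methods are async, so a real version would likely need an async-aware lock or must avoid holding a std lock across `.await`.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// key -> (offset, len), as it would be parsed from the footer.
type Index = HashMap<String, (u64, u64)>;

/// Hypothetical read-only accessor that loads its index on first lookup.
struct LazyIndexed<F: Fn() -> Index> {
    load_index: F,                // stand-in for fetching and parsing the footer
    index: RwLock<Option<Index>>, // None until the first lookup
    loads: AtomicUsize,           // how many times the loader actually ran
}

impl<F: Fn() -> Index> LazyIndexed<F> {
    fn new(load_index: F) -> Self {
        Self {
            load_index,
            index: RwLock::new(None),
            loads: AtomicUsize::new(0),
        }
    }

    /// Look up a key, loading the index on the first call.
    fn lookup(&self, key: &str) -> Option<(u64, u64)> {
        {
            // Fast path: index already loaded, a shared lock suffices.
            // (`unwrap` panics if a thread panicked while holding the lock.)
            let guard = self.index.read().unwrap();
            if let Some(idx) = &*guard {
                return idx.get(key).copied();
            }
        } // read lock released here, before asking for the write lock

        // Slow path: `get_or_insert_with` re-checks under the write lock,
        // so the loader runs only if no other thread filled it meanwhile.
        let mut guard = self.index.write().unwrap();
        let idx = guard.get_or_insert_with(|| {
            self.loads.fetch_add(1, Ordering::SeqCst);
            (self.load_index)()
        });
        idx.get(key).copied()
    }
}
```

`std::sync::OnceLock` expresses load-once semantics more directly, but its stable `get_or_init` takes an infallible closure; with a fallible remote read, the `RwLock<Option<...>>` form lets a failed load simply leave `None` in place and be retried later.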
Mailing list thread: https://lists.apache.org/thread/4547mv3g2bgp2dv7ykh4zf437skmhwr7