How to convert an N-Triples file into Jelly in Python with fixed memory allocation? #97
Hi there, I use Python and rdflib to work with RDF data. One of my challenges is to process an RDF data file (usually N-Triples or N-Quads) in a streaming fashion, so that memory consumption stays fixed. I was hoping to do this with the Jelly format, but the blocker is that to convert from N-Triples to Jelly, the whole file has to be loaded into memory :( I looked through the docs, but I could not find a solution. Could this be addressed with Jelly? If not now, then potentially in the future. Thanks!
Replies: 1 comment 1 reply
Hi Maksim! Thank you for your interest in the project.
pyjelly can write a Jelly file in a fully streaming fashion; an example of that is here, section "Serializing a stream of statements". However, I don't think that RDFLib's N-Triples parser can actually return a stream of triples, so that would be a blocker. One option is to modify the RDFLib N-Triples parser so that it returns an iterator of triples. Unfortunately, this is a limitation of RDFLib, so we can't do much about it on our side.
Alternatively, you could try working with pyjelly's RDFLib-less integration: there is an example of that here, section "Serializing a stream of statements". We have an unofficial N-Triples/N-Quads parser that works with it, but it's only intended for tests. The code is here, but it's not something we support officially, and we did not run it against the W3C conformance tests.
Finally, you can use jelly-cli from the command line to convert huge files in a streaming manner, much faster than is possible in Python. The tool uses limited memory, and we have used it to convert datasets like Wikidata and OpenStreetMap without any problems. If at all possible, I strongly suggest using jelly-cli: it's very easy to use and very fast.
I'm not sure if any of these options are useful to you. Please let me know if you have further questions or feature suggestions! :)
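To illustrate what "an iterator of triples" could look like, here is a minimal standard-library sketch. It relies on N-Triples being line-based, so a generator can yield one statement at a time and memory use stays roughly constant regardless of file size. The function name `iter_triples` and the regular expressions are assumptions made for this illustration; this is not RDFLib's or pyjelly's parser, and it is not a conformant N-Triples grammar (terms are returned as raw strings, with no escape decoding or validation).

```python
import io
import re
from typing import Iterator, TextIO, Tuple

# Simplified term patterns (illustrative only, NOT the full N-Triples grammar).
_IRI = r"<[^>]*>"
_BNODE = r"_:\S+"
_LITERAL = r'"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
_LINE = re.compile(
    rf"\s*({_IRI}|{_BNODE})\s+({_IRI})\s+({_IRI}|{_BNODE}|{_LITERAL})\s*\.\s*(?:#.*)?$"
)


def iter_triples(lines: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Lazily yield (subject, predicate, object) strings, one line at a time."""
    for line_no, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Skip blank lines and full-line comments.
        if not stripped or stripped.startswith("#"):
            continue
        match = _LINE.match(stripped)
        if match is None:
            raise ValueError(f"line {line_no}: cannot parse {stripped!r}")
        yield match.group(1), match.group(2), match.group(3)


# Usage: an in-memory sample stands in for open("data.nt", encoding="utf-8").
sample = io.StringIO(
    '<http://example.org/s> <http://example.org/p> "hello"@en .\n'
    "# a comment line\n"
    "_:b0 <http://example.org/p> <http://example.org/o> .\n"
)
triples = list(iter_triples(sample))
```

In a real conversion you would not collect the results into a list. Instead, each yielded statement would be converted to whatever term objects the streaming Jelly writer expects and passed on immediately; that step depends on the pyjelly API and is not shown here.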