NestIO for large datasets #1001
Open
Hi!
I was recently working together with @jasperalbers, who is using the `NestIO` to load simulated data from large-scale multi-area models.
His problem was that he has hundreds of thousands of neurons (around 1 GB in total), which, when saved as `neo.SpikeTrain` objects, would take hours to load from disk (with HDF5, pickle and nix alike). Incredible amounts of time were spent building the neo objects themselves. We found a rather unorthodox workaround to this problem: saving the spikes directly as lists of lists, which brought the load time down to a few seconds. We wrote a couple of extra functions for the `NestIO` to load the spike times as lists of lists, alongside the neuron IDs. This is obviously not ideal from a metadata perspective, but we thought it might still be a useful feature to have, especially for agile analysis of large simulated data. Let us know if these functions are any good; if you think they are worth including, I can also write some tests.
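
For illustration, here is a minimal sketch of the lists-of-lists idea, independent of the `NestIO` internals. It assumes a plain-text NEST spike file with two whitespace-separated columns (sender GID and spike time in ms); the function name `read_spikes_as_lists` and the file name `spikes.gdf` are only placeholders to show the approach, not the exact code added in this PR.

```python
import numpy as np


def read_spikes_as_lists(filename):
    """Return (neuron_ids, spike_times) without building neo.SpikeTrain objects.

    neuron_ids  : list of int, one entry per neuron that fired at least once
    spike_times : list of lists of float, spike times (ms) per neuron,
                  in the same order as neuron_ids
    """
    # Hypothetical plain-text input: column 0 = sender GID, column 1 = time (ms)
    data = np.loadtxt(filename)
    gids = data[:, 0].astype(int)
    times = data[:, 1]

    # Sort by GID so each neuron's spikes form one contiguous block
    # (stable sort keeps the original time order within each neuron).
    order = np.argsort(gids, kind="stable")
    gids, times = gids[order], times[order]
    unique_gids, start_idx = np.unique(gids, return_index=True)

    neuron_ids = unique_gids.tolist()
    spike_times = [block.tolist() for block in np.split(times, start_idx[1:])]
    return neuron_ids, spike_times


if __name__ == "__main__":
    ids, spikes = read_spikes_as_lists("spikes.gdf")
    print(f"{len(ids)} neurons, {sum(len(s) for s in spikes)} spikes in total")
```

The point is simply that plain Python lists and NumPy arrays avoid the per-object construction cost of `neo.SpikeTrain`, which is where the hours were being spent.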
Best,
Aitor