Skip to content

Add intan reader for concatenated files #4070

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

h-mayorquin
Copy link
Collaborator

This is yet another modality of intan where files are partitioned by the acquisition system. This reader is a convenience for the users so they can readily load their data as it is.

@h-mayorquin h-mayorquin requested a review from zm711 July 17, 2025 06:05
@h-mayorquin h-mayorquin marked this pull request as ready for review July 17, 2025 06:37
@@ -200,5 +201,6 @@
"read_binary", # convenience function for binary formats
"read_zarr",
"read_neuroscope", # convenience function for neuroscope
"read_intan_segmented", # convenience function for segmented intan files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Can you add this to the docs extracors and API as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be done.

@alejoe91 alejoe91 added the extractors Related to extractors module label Jul 17, 2025
@zm711
Copy link
Member

zm711 commented Jul 17, 2025

I think the PR is in good shape but I have a few motivation questions.

  1. Why a separate extractor? Could we add a mode into the normal Intan?
  2. Segment is loaded in spikeinterface. What this is actually is a concatenated version so maybe we should think about the naming
  3. why not give the user the option to append or concatenate? Maybe they did the splits on purpose (ie they did baseline, experiment, post time period. So they would prefer to append instead. Why should we create a convenience function for only one of the two options and then tell the user if you want to do the other then do it yourself.
  4. I could imagine other readers also have these format styles (though I don't have an example off the top of my head) so why preference Intan (other than you and I work on it a lot)
  5. Why not do a dir mode at the Neo level where you make each file a Neo segment instead? Why here?

@h-mayorquin
Copy link
Collaborator Author

h-mayorquin commented Jul 17, 2025

I think the PR is in good shape but I have a few motivation questions.

  1. Why a separate extractor? Could we add a mode into the normal Intan?

This is my preference to keep the extractor simple. Otherwise, I would need to add logic for which file to select, more arguments a more complicated docstring, etc. This way I highlight that this is an special case by construction and I think makes the implementation simpler.

  1. Segment is loaded in spikeinterface. What this is actually is a concatenated version so maybe we should think about the naming

Yes, open to other naming suggestions.

  1. why not give the user the option to append or concatenate? Maybe they did the splits on purpose (ie they did baseline, experiment, post time period. So they would prefer to append instead. Why should we create a convenience function for only one of the two options and then tell the user if you want to do the other then do it yourself.

I think that the splits are always continuous on Intan, aren't they? The idea is to load this data has the acquisition system produced it. If you have an option that indeed produces the files with gaps in between then I think we should have an option as you suggest.

  1. I could imagine other readers also have these format styles (though I don't have an example off the top of my head) so why preference Intan (other than you and I work on it a lot)

Yes, if other readers also produce segmented files with equivalent logic, there is a clear path for appending (like here by filename), and we have testing data I think we should offer something similar. Nothing special about Intan other than being on my to-do list as I know the format and I have data. (I don't work with Intan so much by the way, I used it in a project at the start of the year and that's it, but I would say I know the format well now).

  1. Why not do a dir mode at the Neo level where you make each file a Neo segment instead? Why here?

This is the quickest way to provide the feature to users. I am not sure how to do it in neo and now that I think about it the implementation in neo is already too complicated for my liking. This is how I feel.

@zm711
Copy link
Member

zm711 commented Jul 17, 2025

I think that the splits are always continuous on Intan, aren't they? The idea is to load this data has the acquisition system produced it. If you have an option that indeed produces the files with gaps in between then I think we should have an option.

True they are continuous in time. But the experimenter could want them to be same segment or separate segment for their own provenance. Here you are forcing them to keep them in one segment. Why? Concatenate vs append is not based on time it is based on wanting them in separate segments for whatever reason you want them separate.

@h-mayorquin
Copy link
Collaborator Author

True they are continuous in time. But the experimenter could want them to be same segment or separate segment for their own provenance. Here you are forcing them to keep them in one segment. Why? Concatenate vs append is not based on time it is based on wanting them in separate segments for whatever reason you want them separate.

I am not forcing them, they still can just load the single segments on intan and use append themselves.

I am not providing a utility for them to do this easily because I come from a different place: The idea for me is that we should have recordings that extract the data in the way that the acquisition system "intended" it and tools for allowing the users to modify that on top of it to whatever representation they like. Loading data as a continuous segment is the former, loading data as different segments is the latter.

@zm711
Copy link
Member

zm711 commented Jul 17, 2025

I am not providing a utility for them to do this easily because I come from a different place: The idea for me is that we should have recordings that extract the data in the way that the acquisition system "intended" it

I think we don't know how acq system "intended" in this case. The software lets the user choose their cutoff length, which means that the software could be allowing the user to intend for this to be individual segments. I think the argument in your favor is that Intan provides a post-hoc stitching software that can make a series of continuous recordings one giant recording (which would be read as a neo mono-segement (ie I don't think there is an initial intention about concatenate vs append, but there is a post-hoc intention for concatenate).

I think we addressed (1), (4), and (5). We disagree about (3) but I'm willing to be overruled. let me think about naming for (2).

stream_name=None,
all_annotations=False,
use_names_as_ids=False,
ignore_integrity_checks: bool = False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go through with this I think we have to remove the ignore_integrity_checks. I think this PR would need to ensure that the recordings are truly continuous no? So two step process.

  1. Make sure no individual file is broken
  2. ensure that the timestamp files are actually continuous

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise you're running into your issue with the sampling rates? This PR could be used to stitch together any arbitrary intan files when we should have a way to do this like Intan does it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? I think it is fine if they want to load some of the files even if they gaps within. I don't see a strong reason to limit user choice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One corrupt file could mess up everything right? How would the user know which file was corrupt?

I guess what I was really proposing was shouldn't we check what we are stitching are actually continuous or do you want people to be able to just pile in a bunch of random Intan files and try to stitch them together with this?

@h-mayorquin
Copy link
Collaborator Author

I don't feel that strong about 3, I was just explaining why I decided to do it this way. I am fine with adding that functionality (automatically creating a mega-segmented file) but not on this PR as that implementation would become more complicated and I don't have time to do it at the moment.

@h-mayorquin
Copy link
Collaborator Author

as for the naming, what about read_intan_split or something like that?

recording_list.append(recording)

# Initialize the parent class with the recording list
ConcatenateSegmentRecording.__init__(self, recording_list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it just be a boolean check here with an AppendSegmentRecording?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it inherits from the respective class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that. Yeah I was thinking you were just taking the concatenate_recording, for my own learning what is the benefit of doing this way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a class for _kwargs and pickling.

@zm711
Copy link
Member

zm711 commented Jul 17, 2025

as for the naming, what about read_intan_split or something like that?

yeah we could riff on that. like read_split_intan_files, read_split_intan. The issue is the gui software isn't super clear with a naming convention. It is just a setting. So for official intan terminology we have "Traditional" with splits?

image

@h-mayorquin
Copy link
Collaborator Author

Changed the naming.

@h-mayorquin h-mayorquin deleted the add_intan_for_concatenated_files branch July 24, 2025 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extractors Related to extractors module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants