Data compression for screen data #81
schewskone wants to merge 38 commits into sensorium-competition:main from
Conversation
…ges and mp4 videos
… for compatibility with allen-exporter and future datasets to avoid duplicate files
Adding a small fix for the ToTensor issue to the decoder PR because a separate PR is unnecessary
Pull Request Overview
This PR adds support for compressed screen data by extending trials to handle encoded images and videos, updating tests and data generation to work with JPEG/MP4, and adjusting interpolation interfaces and CI.
- Introduce EncodedImageTrial and EncodedVideoTrial with compressed data loaders
- Enhance test utilities and screen interpolation tests for both encoded and raw formats
- Update interpolation API to return only data, adjust downstream consumers, and add FFmpeg to CI
Reviewed Changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_sequence_interpolator.py | Update tests for new single-array return value of interpolate |
| tests/test_screen_interpolator.py | Add parameterized tests for encoded vs. raw screen data |
| tests/create_screen_data.py | Extend data generation to save JPEG/MP4 and emit metadata |
| experanto/interpolators.py | Remove valid returns, introduce image_names flag, add encoded trials |
| experanto/experiment.py | Adjust interpolate to drop valid output |
| experanto/datasets.py | Update dataset pipelines to match new return signature |
| configs/default.yaml | Add image_names configuration |
| .github/workflows/test.yml | Install FFmpeg for encoding dependencies |
Comments suppressed due to low confidence (4)
experanto/interpolators.py:152
- The return type annotation still indicates a tuple, but the method now returns only a single array. Update the type hint and docstring to reflect the new signature.
def interpolate(self, times: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
experanto/interpolators.py:425
- [nitpick] Using `format` shadows the built-in Python function. Consider renaming this variable to `file_format` to avoid confusion.
format = metadata.get("file_format")
tests/create_screen_data.py:14
- The new `encoded` parameter isn't documented. Please update the function docstring to explain what `encoded` does and what formats it controls.
def create_screen_data(
tests/create_screen_data.py:118
- The code calls `shutil.rmtree` but `shutil` is not imported. Add `import shutil` at the top of the module.
shutil.rmtree(SCREEN_ROOT)
```diff
-frames = np.random.rand(n_frames, *frame_shape).astype(np.float32)
+# Generate frames with values in [0, 255] for better encoding
+frames = (np.random.rand(n_frames, *frame_shape) * 255).astype(np.uint8)
```
@schewskone why is it now np.uint8 and not float32?
I don't think all of the asserts for valid should be removed here.
@schewskone could you please take a look and plug some of them back, e.g. lines 37 and 41-43 probably should stay, same for 71 and 75-77
(I merged it with the current main - check if you disagree with sth)
```python
    file_ext = ".jpg"
else:
    # Save as numpy array (original behavior)
    np.save(data_dir / f"{i:05d}.npy", frames[i].astype(np.uint8))
```
@schewskone it used to be np.float32 here in the test as well - why have you changed it, and does it match the real exported data?
Also, why are there no grayscale .jpeg files saved (e.g. 1-channel ones)?
```python
# Initialize the decoder only once per unique file
# Assuming ScreenTrial.create or a helper can return just a decoder
if self.device == "cuda":
    with set_cuda_backend("beta"):
```
@schewskone could you please add a comment here explaining what set_cuda_backend("beta") does? And is it necessary here because moving decoded data from CPU to GPU after interpolating is much slower? Have you profiled it?
```python
# 2. Logic to handle shared video decoders
decoder_to_use = None
if file_format in [".mp4", ".avi", ".mov"]:  # Add your video formats here
```
@schewskone should we check that file_format is ".npy" otherwise and raise a NotImplementedError if it is not?
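A minimal sketch of the suggested check; the helper name and the set of video extensions are assumed for illustration, not taken from the PR:

```python
# Hypothetical helper illustrating the suggested dispatch: fall through to
# the numpy path only for .npy files and fail loudly on anything unknown.
VIDEO_FORMATS = {".mp4", ".avi", ".mov"}

def classify_screen_format(file_format: str) -> str:
    if file_format in VIDEO_FORMATS:
        return "video"
    if file_format == ".npy":
        return "numpy"
    raise NotImplementedError(f"Unsupported screen data format: {file_format}")
```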
```python
if data_file_name not in shared_decoders:
    # Initialize the decoder only once per unique file
    # Assuming ScreenTrial.create or a helper can return just a decoder
    if self.device == "cuda":
```
What about device "cuda:0"? I would suggest checking whether the device string contains "cuda".
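A small sketch of that check (the helper name is hypothetical); comparing the device type rather than the full string also matches "cuda:0", "cuda:1", etc.:

```python
def is_cuda_device(device) -> bool:
    # "cuda", "cuda:0", "cuda:1" all have device type "cuda";
    # a plain equality check against "cuda" misses the indexed forms.
    return str(device).split(":")[0] == "cuda"
```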
```python
def _initialize_decoder(self, data_file_name):
    decoder = VideoDecoder(
        str(data_file_name),
```
@schewskone why do you need to cast it to str? Could it be something else?
Suggested change:

```diff
-def _initialize_decoder(self, data_file_name):
-    decoder = VideoDecoder(
-        str(data_file_name),
+def _initialize_decoder(self, data_file_name: str):
+    decoder = VideoDecoder(
+        data_file_name,
```
```python
)

class EncodedImageTrial(ScreenTrial):
```
@schewskone where is it actually used? I cannot find it being called explicitly, but I might be missing something (the only place I see it could be called now is lines 782-783).
```python
cls = globals()[class_name]

# Pass shared_decoder only for EncodedVideoTrials
if cls is EncodedVideoTrial:
```
@schewskone could we please add a small comment here explaining why EncodedImageTrial doesn't need a shared decoder?
```python
def get_data_(self) -> np.array:
    """Override base implementation to load compressed images"""
    img = cv2.imread(str(self.data_file_name))  # returns BGR
```
I guess here even if the original .jpg is a 1-channel (grayscale) image, cv2.imread will load it as 3 duplicated channels. Do we want to catch that somehow, @schewskone (a warning? or a flag to cast to 1 channel if the user knows their .jpg files are grayscale)?
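One hedged option (the helper name and behavior are hypothetical, not part of the PR): detect the duplicated channels that cv2.imread produces for grayscale JPEGs and collapse them back to one channel:

```python
import numpy as np

def maybe_collapse_grayscale(img: np.ndarray) -> np.ndarray:
    """If all three channels are identical (as cv2.imread yields for a
    grayscale JPEG), return a single-channel array instead."""
    if (
        img.ndim == 3
        and img.shape[2] == 3
        and np.array_equal(img[..., 0], img[..., 1])
        and np.array_equal(img[..., 1], img[..., 2])
    ):
        return img[..., :1]
    return img
```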
```python
if img is None:
    raise ValueError(f"Could not read image file: {self.data_file_name}")
# Convert BGR to RGB
img = img[:, :, [2, 1, 0]]
```
@schewskone why not cv2.cvtColor(img, cv2.COLOR_BGR2RGB)? I'm not 100% sure, but quick googling says cv2 should be faster (both create a contiguous copy of the image, though).
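For reference, the fancy-indexed version and a reversed slice give the same result; cv2.cvtColor would be a third equivalent option and is typically the fastest since it runs in optimized C. This demo sticks to NumPy to stay dependency-free:

```python
import numpy as np

bgr = np.arange(24, dtype=np.uint8).reshape(2, 4, 3)  # fake 2x4 BGR image
rgb_fancy = bgr[:, :, [2, 1, 0]]                   # fancy indexing: returns a copy
rgb_slice = np.ascontiguousarray(bgr[:, :, ::-1])  # reversed slice, made contiguous
assert np.array_equal(rgb_fancy, rgb_slice)
```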
```python
def get_data_(self, frame_indices=None) -> np.array:
    """Override base implementation to load compressed videos"""
    # frame_indices is only ever None for caching purposes; we then decode the entire video and cache it
```
Should we maybe add an assert then, that if self._cached_data is None then frame_indices should not be None?
```python
frames = frames[:, [2, 1, 0], ...]
# Reorder dimensions
frames = frames.permute(0, 2, 3, 1).contiguous()  # T,H,W,C
```
Why not do both reorderings together? It looks like some dimensions end up being moved twice.
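To illustrate the point with NumPy (the PR code uses torch tensors, but the transpose/permute semantics match): reversing the channels first and permuting after gives the same result as permuting once and reversing the now-last axis:

```python
import numpy as np

frames = np.arange(2 * 3 * 4 * 5, dtype=np.uint8).reshape(2, 3, 4, 5)  # T,C,H,W
two_step = np.transpose(frames[:, [2, 1, 0], ...], (0, 2, 3, 1))  # as in the PR
one_step = np.transpose(frames, (0, 2, 3, 1))[..., [2, 1, 0]]     # fused variant
assert np.array_equal(two_step, one_step)
assert one_step.shape == (2, 4, 5, 3)  # T,H,W,C
```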
experanto/interpolators.py
Outdated
```python
    "EncodedVideoTrial requires a shared_decoder to be provided."
)

def get_data(self, frame_indices) -> np.array:
```
Suggested change:

```diff
-def get_data(self, frame_indices) -> np.array:
+def get_data(self, frame_indices) -> np.ndarray:
```
```python
out = out.transpose(
    0, 3, 1, 2
)  # transform into (T, C, H, W) after finishing with Cv2 operation
```
@schewskone should this reordering really be here? It has not been here before. Is it backward compatible with the numpy files?
pollytur
left a comment
Please look at the comments - at the very least the tests need the valid checks plugged back in.
requirements.txt
Outdated
```
# Core ML CPU only version
torch==2.9.0
torchvision==0.24.0
torchaudio==2.9.0
```
@schewskone do we really need torchaudio for this PR? I don't think it's used anywhere, is it?
requirements.txt
Outdated
```
torch==2.9.0
torchvision==0.24.0
```
Do we really need to upgrade torch from 2.7 and torchvision from 0.22?
requirements.txt
Outdated
```diff
 hydra-core==1.3.2
-numpy==2.2.5
+numpy==2.2.6
```
Also, does numpy 2.2.6 support Python 3.9 or do we need to upgrade to 3.10?
torch 2.9 and torchcodec 0.9 require Python 3.10 and above: https://github.com/meta-pytorch/torchcodec?tab=readme-ov-file#installing-torchcodec
pollytur
left a comment
@schewskone I merged main into it again and made sure that the CI/CD passes (not sure why it's not displayed in the PR, but it's here).
Please address the rest of the comments.
Implemented support for compressed screen formats

- … `get_data_()` methods that support encoded formats.

Right now the VideoDecoder decodes the entire video and returns it to the interpolation function. This could be optimized with sliced decoding, which would require changes to the interpolation method. @pollytur and I decided we should discuss this first and implement it later on if deemed necessary.
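As a rough sketch of the sliced-decoding direction mentioned above (the class and the `decode_frame` callback are hypothetical stand-ins, not the PR's API): decode only the requested frames and memoize them per index instead of decoding the whole video up front:

```python
class SlicedFrameCache:
    """Decode frames lazily, one index at a time, caching each result."""

    def __init__(self, decode_frame):
        self._decode_frame = decode_frame  # e.g. wraps a real video decoder call
        self._cache = {}

    def get_frames(self, frame_indices):
        out = []
        for i in frame_indices:
            if i not in self._cache:
                self._cache[i] = self._decode_frame(i)  # decode only on first request
            out.append(self._cache[i])
        return out
```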