The script make_cityscapes_buffer.py uses mobilenet_v3_small as the feature encoder to construct the replay buffer. However, the SLAM process uses the depth model's image encoder to extract features, which looks a bit inconsistent. Should I replace mobilenet_v3_small with the depth model's image encoder to reproduce the results reported in the paper?