Thanks to the authors for the impressive work!
I'm wondering how the visualization results of the LangSplat 3D features presented in the paper were achieved. When I directly decompress the LangSplat 3D features into 512-dimensional vectors and match them with textual feature vectors, the results are quite poor, unlike the reasonably good results shown in the paper. Could the authors kindly clarify how this was implemented? Any advice would be greatly appreciated. Thanks!
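For concreteness, the matching step described above amounts to roughly the following. This is only a minimal sketch, not LangSplat code: the function names, the toy 4-dimensional vectors (standing in for the 512-dimensional features), and the use of plain cosine similarity are all illustrative assumptions.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine_scores(point_feats, text_feat):
    """Cosine similarity between each decoded per-point feature and one text feature."""
    t = l2_normalize(text_feat)
    return [sum(a * b for a, b in zip(l2_normalize(p), t)) for p in point_feats]

# Toy stand-ins for decoded per-Gaussian features and a text feature (hypothetical values).
decoded = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [3.0, 3.0, 0.0, 0.0]]
text = [5.0, 0.0, 0.0, 0.0]

scores = cosine_scores(decoded, text)
# Index of the point whose feature is most similar to the text query.
best = max(range(len(scores)), key=scores.__getitem__)
```

Both sides are L2-normalized before the dot product, so the scores depend only on direction and not on feature magnitude. If the paper's visualization relies on additional steps beyond this kind of direct matching, that is the part I would like to understand.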
