Brain Team AI Scientists are encouraged to spend **half of their working hours on each** of **Strategic Research**, research aligned with the company's business direction, and ~~**General Research**, independent research on topics of personal interest~~ (currently suspended due to company circumstances); this is one of the ways we strive to provide the best possible research environment.
If you would like to **attend conferences such as NeurIPS, ICLR, CVPR, ECCV, Interspeech, and ACL, or submit papers to them**, we actively support you! 💸 (For the papers accepted so far, please see the **[Publications](/publications)** tab!)
<features.PaperLinkItem paperLink="https://arxiv.org/abs/2503.23730" title="KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language" />
<features.PaperDescription preview="Query expansion methods powered by large language models (LLMs) have demonstrated effectiveness in zero-shot retrieval tasks. "
description="These methods assume that LLMs can generate hypothetical documents that, when incorporated into a query vector, enhance the retrieval of real evidence. However, we challenge this assumption by investigating whether knowledge leakage in benchmarks contributes to the observed performance gains. Using fact verification as a testbed, we analyzed whether the generated documents contained information entailed by ground truth evidence and assessed their impact on performance. Our findings indicate that performance improvements occurred consistently only for claims whose generated documents included sentences entailed by ground truth evidence. This suggests that knowledge leakage may be present in these benchmarks, inflating the perceived performance of LLM-based query expansion methods, particularly in real-world scenarios that require retrieving niche or novel knowledge."/>
<features.PaperTitle paperLink="https://arxiv.org/abs/2503.23730" title="KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language"/>
<features.PaperDescription preview="The recent emergence of Large Vision-Language Models(VLMs) has resulted in a variety of different benchmarks for evaluating such models. "
description="Despite this, we observe that most existing evaluation methods suffer from the fact that they either require the model to choose from pre-determined responses, sacrificing open-endedness, or evaluate responses using a judge model, resulting in subjective and unreliable evaluation. In addition, we observe a lack of benchmarks for VLMs in the Korean language, which are necessary as a separate metric from more common English language benchmarks, as the performance of generative language models can differ significantly based on the language being used. Therefore, we present KOFFVQA, a general-purpose free-form visual question answering benchmark in the Korean language for the evaluation of VLMs. Our benchmark consists of 275 carefully crafted questions each paired with an image and grading criteria covering 10 different aspects of VLM performance. The grading criteria eliminate the problem of unreliability by allowing the judge model to grade each response based on a pre-determined set of rules. By defining the evaluation criteria in an objective manner, even a small open-source model can be used to evaluate models on our benchmark reliably. In addition to evaluating a large number of existing VLMs on our benchmark, we also experimentally verify that our method of using pre-existing grading criteria for evaluation is much more reliable than existing methods. Our evaluation code is available at https://github.com/maum-ai/KOFFVQA."/>
<features.PaperTitle paperLink="https://arxiv.org/abs/2410.01273" title="CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction"/>
<features.PaperDescription preview="Real-life robot navigation involves more than just reaching a destination; it requires optimizing movements while addressing scenario-specific goals. "
description="An intuitive way for humans to express these goals is through abstract cues like verbal commands or rough sketches. Such human guidance may lack details or be noisy. Nonetheless, we expect robots to navigate as intended. For robots to interpret and execute these abstract instructions in line with human expectations, they must share a common understanding of basic navigation concepts with humans. To this end, we introduce CANVAS, a novel framework that combines visual and linguistic instructions for commonsense-aware navigation. Its success is driven by imitation learning, enabling the robot to learn from human navigation behavior. We present COMMAND, a comprehensive dataset with human-annotated navigation results, spanning over 48 hours and 219 km, designed to train commonsense-aware navigation systems in simulated environments. Our experiments show that CANVAS outperforms the strong rule-based system ROS NavStack across all environments, demonstrating superior performance with noisy instructions. Notably, in the orchard environment, where ROS NavStack records a 0% total success rate, CANVAS achieves a total success rate of 67%. CANVAS also closely aligns with human demonstrations and commonsense constraints, even in unseen environments. Furthermore, real-world deployment of CANVAS showcases impressive Sim2Real transfer with a total success rate of 69%, highlighting the potential of learning from human demonstrations in simulated environments for real-world applications."/>
<features.PaperTitle paperLink="https://openreview.net/forum?id=U6wyOnPt1U" title="Integrating Visual and Linguistic Instructions for Context-Aware Navigation Agents"/>