Latest AI Research Trends: 2025-04-14 (Mon)
8353cc
2025. 4. 14. 20:46
Latest Research Trends: A Roundup of AI Papers in the Spotlight
📄
category : [fine tuning]
InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians
Summary: This paper models photorealistic hand-face interaction in digital avatars using deformable 3D Gaussians. A Dynamic Gaussian Hand model combines a template model with 3D Gaussian Splatting and a dynamic refinement module to capture pose-dependent appearance changes such as fine wrinkles and articulation shadows, while a dedicated hand-face interaction module captures the subtle geometry and appearance dynamics behind common gestures. The result is high-fidelity reconstruction of hand and hand-face interactions from monocular or multiview video, animatable with novel poses, which should give avatar developers a concrete path to more natural hand-avatar interaction in teleconferencing, gaming, and AR/VR.
Authors: Kefan Chen, Sergiu Oprea, Justin Theiss, Sreyas Mohan, Srinath Sridhar, Aayush Prakash
Published: 2025-04-10
With the rising interest from the community in digital avatars coupled with the importance of expressions and gestures in communication, modeling natural avatar behavior remains an important challenge across many industries such as teleconferencing, gaming, and AR/VR. Human hands are the primary tool for interacting with the environment and essential for realistic human behavior modeling, yet existing 3D hand and head avatar models often overlook the crucial aspect of hand-body interactions, such as between hand and face. We present InteractAvatar, the first model to faithfully capture the photorealistic appearance of dynamic hand and non-rigid hand-face interactions. Our novel Dynamic Gaussian Hand model, combining template model and 3D Gaussian Splatting as well as a dynamic refinement module, captures pose-dependent change, e.g. the fine wrinkles and complex shadows that occur during articulation. Importantly, our hand-face interaction module models the subtle geometry and appearance dynamics that underlie common gestures. Through experiments of novel view synthesis, self reenactment and cross-identity reenactment, we demonstrate that InteractAvatar can reconstruct hand and hand-face interactions from monocular or multiview videos with high-fidelity details and be animated with novel poses.
|
📄
category : [fine tuning]
Free monad sequences and extension operations
Summary: This paper analyzes the free monad sequence in non-cocomplete categories, explicitly parametrizing the colimits the construction needs, which yields a more finely grained functoriality principle for free monad and monoid sequences. It then studies the problem of functorially extending, via pullback squares, a category of maps along the coalgebras of an algebraic weak factorization system; this situation arises in model structures, for instance when extending fibrations along trivial cofibrations. Suitable conditions are derived for the algebraic analogue of weak saturation of the extension problem, with the first part's results reducing the technical burden.
Authors: Christian Sattler
Published: 2025-04-10
In the first part of this article, we give an analysis of the free monad sequence in non-cocomplete categories, with the needed colimits explicitly parametrized. This enables us to state a more finely grained functoriality principle for free monad and monoid sequences. In the second part, we deal with the problem of functorially extending via pullback squares a category of maps along the category of coalgebras of an algebraic weak factorization system. This generalizes the classical problem of extending a class of maps along the left class of a weak factorization system in the sense of pullback squares where the vertical maps are in the chosen class and the bottom map is in the left class. Such situations arise in the context of model structures where one might wish to extend fibrations along trivial cofibrations. We derive suitable conditions for the algebraic analogue of weak saturation of the extension problem, using the results of the first part to reduce the technical burden.
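The paper's categorical machinery (free monad sequences, algebraic weak factorization systems) is well beyond a code snippet, but the underlying notion of a free monad can be made concrete. Below is a minimal, hypothetical Python rendering of a free monad over a single-instruction functor: `bind` grafts a continuation onto every leaf of the program tree, and `run` interprets the tree with a concrete handler. All names and the "log" effect are illustrative, not from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Pure:
    value: Any  # a finished computation

@dataclass
class Suspend:
    instruction: str  # the effect to perform (hypothetical "log" effect)
    argument: Any
    next: Callable[[Any], "Pure | Suspend"]  # continuation receiving the result

def bind(program, f):
    """Monadic bind: graft f onto every Pure leaf of the program tree."""
    if isinstance(program, Pure):
        return f(program.value)
    return Suspend(program.instruction, program.argument,
                   lambda result: bind(program.next(result), f))

def run(program, handler):
    """Interpret the free structure with a concrete effect handler."""
    while isinstance(program, Suspend):
        result = handler(program.instruction, program.argument)
        program = program.next(result)
    return program.value

# A program that performs one "log" effect, then doubles the handler's reply.
prog = bind(Suspend("log", "hello", Pure), lambda x: Pure(x * 2))
```

With a handler that replies with the argument's length, `run(prog, ...)` performs the effect and returns the doubled reply; the program itself stays a pure data structure until interpreted.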
|
📄
category : [fine tuning]
MM-IFEngine: Towards Multimodal Instruction Following
Summary: MM-IFEngine is a pipeline for generating high-quality image-instruction pairs to improve the multimodal instruction-following ability of MLLMs. It produces the MM-IFInstruct-23k dataset for supervised fine-tuning (extended as MM-IFDPO-23k for DPO), along with MM-IFEval, a challenging benchmark that pairs output and perception-level constraints with a rule-based plus judge-model evaluation pipeline.
Authors: Shengyuan Ding, Shenxi Wu, Xiangyu Zhao, Yuhang Zang, Haodong Duan, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Dahua Lin, Jiaqi Wang
Published: 2025-04-10
The Instruction Following (IF) ability measures how well Multi-modal Large Language Models (MLLMs) understand exactly what users are telling them and whether they are doing it right. Existing multimodal instruction following training data is scarce, the benchmarks are simple with atomic instructions, and the evaluation strategies are imprecise for tasks demanding exact output constraints. To address this, we present MM-IFEngine, an effective pipeline to generate high-quality image-instruction pairs. Our MM-IFEngine pipeline yields large-scale, diverse, and high-quality training data MM-IFInstruct-23k, which is suitable for Supervised Fine-Tuning (SFT) and extended as MM-IFDPO-23k for Direct Preference Optimization (DPO). We further introduce MM-IFEval, a challenging and diverse multi-modal instruction-following benchmark that includes (1) both compose-level constraints for output responses and perception-level constraints tied to the input images, and (2) a comprehensive evaluation pipeline incorporating both rule-based assessment and a judge model. We conduct SFT and DPO experiments and demonstrate that fine-tuning MLLMs on MM-IFInstruct-23k and MM-IFDPO-23k achieves notable gains on various IF benchmarks, such as MM-IFEval (+10.2%), MIA (+7.6%), and IFEval (+12.3%). The full data and evaluation code will be released on https://github.com/SYuan03/MM-IFEngine.
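As a rough illustration of the rule-based half of such an evaluation pipeline (the paper also uses a judge model), the sketch below scores a response as the fraction of output constraints it satisfies. The constraint names and logic are invented for illustration, not taken from MM-IFEval.

```python
# Toy rule-based instruction-following checks, in the spirit of MM-IFEval's
# rule-based assessment. Constraints here are hypothetical examples.

def check_word_limit(response: str, max_words: int) -> bool:
    return len(response.split()) <= max_words

def check_required_keyword(response: str, keyword: str) -> bool:
    return keyword.lower() in response.lower()

def check_ends_with(response: str, suffix: str) -> bool:
    return response.strip().endswith(suffix)

def evaluate(response: str, constraints) -> float:
    """Return the fraction of constraints satisfied (a per-sample IF score)."""
    results = [check(response) for check in constraints]
    return sum(results) / len(results)

constraints = [
    lambda r: check_word_limit(r, 10),
    lambda r: check_required_keyword(r, "cat"),
    lambda r: check_ends_with(r, "."),
]
score = evaluate("The cat sat on the mat.", constraints)
```

Rule-based checks like these handle exact output constraints (length, keywords, format) deterministically; subtler perception-level constraints are where a judge model earns its keep.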
|
📄
category : [fine tuning]
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing
Summary: C3PO is a test-time optimization method for Mixture-of-Experts LLMs that jointly re-mixes the expert weights of core experts in critical layers for each test sample, optimizing surrogate objectives built from the sample's "successful neighbors" in a reference set.
Authors: Zhongyang Li, Ziyue Li, Tianyi Zhou
Published: 2025-04-10
Mixture-of-Experts (MoE) Large Language Models (LLMs) suffer from severely sub-optimal expert pathways: our study reveals that naive expert selection learned from pretraining leaves a surprising 10-20% accuracy gap for improvement. Motivated by this observation, we develop a novel class of test-time optimization methods to re-weight or "re-mix" the experts in different layers jointly for each test sample. Since the test sample's ground truth is unknown, we propose to optimize a surrogate objective defined by the sample's "successful neighbors" from a reference set of samples. We introduce three surrogates and algorithms based on mode-finding, kernel regression, and the average loss of similar reference samples/tasks. To reduce the cost of optimizing whole pathways, we apply our algorithms merely to the core experts' mixing weights in critical layers, which enjoys similar performance but saves significant computation. This leads to "Critical-Layer, Core-Expert, Collaborative Pathway Optimization (C3PO)". We apply C3PO to two recent MoE LLMs and examine it on six widely-used benchmarks. It consistently improves the base model by 7-15% in accuracy and outperforms widely used test-time learning baselines, e.g., in-context learning and prompt/prefix tuning, by a large margin. Moreover, C3PO enables MoE LLMs with 1-3B active parameters to outperform LLMs of 7-9B parameters, hence improving MoE's advantages on efficiency. Our thorough ablation study further offers novel insights into achieving test-time improvement on MoE.
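The kernel-regression surrogate can be sketched in a few lines: set the test sample's expert mixing weights in a critical layer to a kernel-weighted average of the weights that worked for its successful neighbors. The shapes, the Gaussian kernel, and the data below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# Toy sketch of a neighbor-based re-mixing surrogate: a test sample borrows
# the expert mixing weights of nearby "successful" reference samples.

def remix_weights(test_emb, ref_embs, ref_weights, bandwidth=1.0):
    # Gaussian kernel similarity between the test sample and each reference.
    d2 = np.sum((ref_embs - test_emb) ** 2, axis=1)
    k = np.exp(-d2 / (2 * bandwidth ** 2))
    k = k / k.sum()
    # Kernel-weighted average of reference mixing weights, renormalized so
    # the result is again a valid mixing distribution over experts.
    w = k @ ref_weights
    return w / w.sum()

rng = np.random.default_rng(0)
ref_embs = rng.normal(size=(5, 8))           # 5 successful neighbors
ref_weights = rng.dirichlet(np.ones(4), 5)   # their 4-expert mixing weights
test_emb = ref_embs[0]                       # a test sample near neighbor 0
w = remix_weights(test_emb, ref_embs, ref_weights)
```

Restricting this update to core experts in critical layers, as C3PO does, keeps the optimization cheap while retaining most of the benefit.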
|
📄
category : [fine tuning]
Cat, Rat, Meow: On the Alignment of Language Model and Human Term-Similarity Judgments
Summary: This paper evaluates 32 publicly available language models for how well their internal representations and behavioral responses align with human word-similarity judgments on a word triplet (odd-one-out) task, probing semantic associations beyond the usual pairwise comparisons, and analyzes how this alignment depends on model size, instruction tuning, and layer depth.
Authors: Lorenz Linhardt, Tom Neuhäuser, Lenka Tětková, Oliver Eberle
Published: 2025-04-10
Small and mid-sized generative language models have gained increasing attention. Their size and availability make them amenable to being analyzed at a behavioral as well as a representational level, allowing investigations of how these levels interact. We evaluate 32 publicly available language models for their representational and behavioral alignment with human similarity judgments on a word triplet task. This provides a novel evaluation setting to probe semantic associations in language beyond common pairwise comparisons. We find that (1) even the representations of small language models can achieve human-level alignment, (2) instruction-tuned model variants can exhibit substantially increased agreement, (3) the pattern of alignment across layers is highly model dependent, and (4) alignment based on models' behavioral responses is highly dependent on model size, matching their representational alignment only for the largest evaluated models.
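The triplet odd-one-out setup is easy to make concrete: a model's choice is the word least similar (here, by cosine similarity) to the other two, and alignment is how often this matches the human choice. The toy embeddings below are invented.

```python
import numpy as np

# Toy word-triplet task: pick the odd one out as the word with the lowest
# total cosine similarity to the other two. Embeddings are invented.

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def odd_one_out(words, emb):
    scores = []
    for i in range(len(words)):
        others = [emb[words[j]] for j in range(len(words)) if j != i]
        scores.append(sum(cosine(emb[words[i]], o) for o in others))
    return words[int(np.argmin(scores))]  # least similar to the rest

emb = {
    "cat":  np.array([1.0, 0.1, 0.0]),
    "rat":  np.array([0.9, 0.2, 0.1]),   # close to "cat" (both animals)
    "meow": np.array([0.1, 1.0, 0.0]),   # a sound, far from both
}
human_choice = "meow"  # humans group cat/rat together
model_choice = odd_one_out(["cat", "rat", "meow"], emb)
```

Running this over many annotated triplets and taking the agreement rate gives the behavioral-alignment score; the representational variant applies the same test to hidden-layer embeddings instead.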
|
📄
category : [LLM]
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Summary: MONA (Myopic Optimization with Non-myopic Approval) is a training method that combines short-sighted optimization with a far-sighted approval reward, preventing agents from learning multi-step reward hacks even when humans cannot detect that the behaviour is undesired. By removing the incentive to set up multi-step hacks without requiring extra oversight information, the approach contributes to safer and more trustworthy RL-trained AI systems.
Authors: Sebastian Farquhar, Vikrant Varma, David Lindner, David Elson, Caleb Biddulph, Ian Goodfellow, Rohin Shah
Published: 2025-01-22
Future advanced AI systems may learn sophisticated strategies through reinforcement learning (RL) that humans cannot understand well enough to safely evaluate. We propose a training method which avoids agents learning undesired multi-step plans that receive high reward (multi-step "reward hacks") even if humans are not able to detect that the behaviour is undesired. The method, Myopic Optimization with Non-myopic Approval (MONA), works by combining short-sighted optimization with far-sighted reward. We demonstrate that MONA can prevent multi-step reward hacking that ordinary RL causes, even without being able to detect the reward hacking and without any extra information that ordinary RL does not get access to. We study MONA empirically in three settings which model different misalignment failure modes including 2-step environments with LLMs representing delegated oversight and encoded reasoning and longer-horizon gridworld environments representing sensor tampering.
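The contrast between MONA's myopic target and an ordinary bootstrapped RL target can be shown with toy numbers: a step that merely sets up a later reward hack looks attractive under ordinary RL but not under an immediate-reward-plus-approval objective. All values below are illustrative, not from the paper's experiments.

```python
# Toy contrast of learning targets. Ordinary RL bootstraps future value, so a
# low-reward action that enables a later hack can still score highly; a
# MONA-style myopic target depends only on immediate reward and an overseer's
# far-sighted approval of the action itself.

def ordinary_rl_target(reward, next_value, gamma=0.99):
    return reward + gamma * next_value  # propagates downstream (hacked) value

def mona_target(reward, approval):
    return reward + approval  # no dependence on realized future value

# A tampering setup step: no immediate reward, high downstream value,
# disapproved by the overseer. An honest step: modest on both counts.
hack_setup = {"reward": 0.0, "next_value": 10.0, "approval": -1.0}
honest_step = {"reward": 1.0, "next_value": 1.0, "approval": 1.0}

rl_prefers_hack = (ordinary_rl_target(hack_setup["reward"], hack_setup["next_value"])
                   > ordinary_rl_target(honest_step["reward"], honest_step["next_value"]))
mona_prefers_hack = (mona_target(hack_setup["reward"], hack_setup["approval"])
                     > mona_target(honest_step["reward"], honest_step["approval"]))
```

The point is structural: because the myopic target never sees the realized multi-step payoff, the agent has no gradient toward plans whose value only materializes after steps the overseer would disapprove of.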
|
📄
category : [LLM]
Porting an LLM based Application from ChatGPT to an On-Premise Environment
Summary: This paper reports on porting a real-world LLM-based application, AIPA (a system using LLMs and data analytics to assess procurement call bids), from ChatGPT running in a public cloud to an on-premise environment, motivated by privacy and security legislation around data-intensive ML systems. The central design considerations are the transparency of open-source models and hardware cost; the paper walks through the porting process and weighs its benefits and downsides.
Authors: Teemu Paloniemi, Manu Setälä, Tommi Mikkonen
Published: 2025-04-10
Given the data-intensive nature of Machine Learning (ML) systems in general, and Large Language Models (LLM) in particular, using them in cloud based environments can become a challenge due to legislation related to privacy and security of data. Taking such aspects into consideration implies porting the LLMs to an on-premise environment, where privacy and security can be controlled. In this paper, we study this porting process of a real-life application using ChatGPT, which runs in a public cloud, to an on-premise environment. The application being ported is AIPA, a system that leverages Large Language Models (LLMs) and sophisticated data analytics to enhance the assessment of procurement call bids. The main considerations in the porting process include transparency of open source models and cost of hardware, which are central design choices of the on-premise environment. In addition to presenting the porting process, we evaluate downsides and benefits associated with porting.
|
📄
category : [LLM]
Scaling Laws for Native Multimodal Models
Summary: This paper studies scaling laws for native multimodal models (NMMs), models trained from scratch on all modalities rather than assembled from separately pre-trained components. Across 457 trained models with varying architectures and training mixtures, it examines how performance changes with scale and finds no inherent advantage of late-fusion over early-fusion architectures: early fusion is stronger at lower parameter counts, cheaper to train, and easier to deploy, and adding Mixture-of-Experts layers that learn modality-specific weights improves it further. These results give concrete guidance for how architecture choices should evolve as multimodal models are scaled up.
Authors: Mustafa Shukor, Enrico Fini, Victor Guilherme Turrisi da Costa, Matthieu Cord, Joshua Susskind, Alaaeldin El-Nouby
Published: 2025-04-10
Building general-purpose models that can effectively perceive the world through multimodal signals has been a long-standing goal. Current approaches involve integrating separately pre-trained components, such as connecting vision encoders to LLMs and continuing multimodal training. While such approaches exhibit remarkable sample efficiency, it remains an open question whether such late-fusion architectures are inherently superior. In this work, we revisit the architectural design of native multimodal models (NMMs)--those trained from the ground up on all modalities--and conduct an extensive scaling laws study, spanning 457 trained models with different architectures and training mixtures. Our investigation reveals no inherent advantage to late-fusion architectures over early-fusion ones, which do not rely on image encoders. On the contrary, early-fusion exhibits stronger performance at lower parameter counts, is more efficient to train, and is easier to deploy. Motivated by the strong performance of the early-fusion architectures, we show that incorporating Mixture of Experts (MoEs) allows for models that learn modality-specific weights, significantly enhancing performance.
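A scaling-laws study ultimately fits power laws of the form L(N) = a * N**(-b) to observed losses. The sketch below fits synthetic, noise-free data in log-log space, where the power law becomes linear; the constants are invented for illustration.

```python
import numpy as np

# Fit a power law L(N) = a * N**(-b) to (parameter count, loss) pairs by
# linear least squares in log-log space: log L = log a - b * log N.

N = np.array([1e7, 1e8, 1e9, 1e10])   # parameter counts (synthetic)
a_true, b_true = 50.0, 0.3            # invented constants
L = a_true * N ** (-b_true)           # synthetic, noise-free losses

slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
b_fit, a_fit = -slope, np.exp(intercept)
```

With real training runs the points are noisy and the fit is done per architecture and training mixture; comparing the fitted exponents is what lets a study like this rank early-fusion against late-fusion across scales.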
|
📄
category : [LLM]
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
Summary: VCR-Bench is a comprehensive evaluation framework for video chain-of-thought reasoning in large vision-language models, built on stepwise-annotated CoT rationales in which every step is tagged as perception or reasoning.
Authors: Yukun Qi, Yiming Zhao, Yu Zeng, Xikun Bao, Wenxuan Huang, Lin Chen, Zehui Chen, Jie Zhao, Zhongang Qi, Feng Zhao
Published: 2025-04-10
The advancement of Chain-of-Thought (CoT) reasoning has significantly enhanced the capabilities of large language models (LLMs) and large vision-language models (LVLMs). However, a rigorous evaluation framework for video CoT reasoning remains absent. Current video benchmarks fail to adequately assess the reasoning process and expose whether failures stem from deficiencies in perception or reasoning capabilities. Therefore, we introduce VCR-Bench, a novel benchmark designed to comprehensively evaluate LVLMs' Video Chain-of-Thought Reasoning capabilities. VCR-Bench comprises 859 videos spanning a variety of video content and durations, along with 1,034 high-quality question-answer pairs. Each pair is manually annotated with a stepwise CoT rationale, where every step is tagged to indicate its association with the perception or reasoning capabilities. Furthermore, we design seven distinct task dimensions and propose the CoT score to assess the entire CoT process based on the stepwise tagged CoT rationales. Extensive experiments on VCR-Bench highlight substantial limitations in current LVLMs. Even the top-performing model, o1, only achieves a 62.8% CoT score and a 56.7% accuracy, while most models score below 40%. Experiments show most models score lower on perception than reasoning steps, revealing LVLMs' key bottleneck in temporal-spatial information processing for complex video reasoning. A robust positive correlation between the CoT score and accuracy confirms the validity of our evaluation framework and underscores the critical role of CoT reasoning in solving complex video reasoning tasks. We hope VCR-Bench will serve as a standardized evaluation framework and expose the actual drawbacks in complex video reasoning tasks.
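The stepwise-tagged rationales suggest a simple scoring sketch: judge each annotated step correct or not and aggregate, both overall and per tag, so perception failures can be separated from reasoning failures. The paper's exact CoT-score formula may differ; this toy version just takes fractions of correct steps.

```python
# Toy CoT scoring over stepwise-tagged rationales, in the spirit of
# VCR-Bench: each step carries a perception/reasoning tag and a correctness
# judgment; we report the fraction correct overall and per tag.

def cot_scores(steps):
    def frac(items):
        return sum(s["correct"] for s in items) / len(items) if items else 0.0
    by_tag = {tag: frac([s for s in steps if s["tag"] == tag])
              for tag in ("perception", "reasoning")}
    return frac(steps), by_tag

steps = [  # hypothetical judged rationale for one question
    {"tag": "perception", "correct": True},
    {"tag": "perception", "correct": False},
    {"tag": "reasoning",  "correct": True},
    {"tag": "reasoning",  "correct": True},
]
overall, by_tag = cot_scores(steps)
```

Splitting the score by tag is what lets the benchmark attribute a model's failures to perception rather than reasoning, the diagnosis the paper reports for most current LVLMs.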
|
📄
category : [RAG]
ConceptFormer: Towards Efficient Use of Knowledge-Graph Embeddings in Large Language Models
Summary: ConceptFormer explores the efficient use of knowledge-graph embeddings in large language models: instead of textifying the graph or altering the model's architecture, it injects "concept vectors" for KG nodes directly into the LLM's embedding space, improving factual recall at a fraction of the token cost.
Authors: Joel Barmettler, Abraham Bernstein, Luca Rossetto
Published: 2025-04-10
Retrieval Augmented Generation (RAG) has enjoyed increased attention in the recent past and recent advancements in Large Language Models (LLMs) have highlighted the importance of integrating world knowledge into these systems. Current RAG methodologies often modify the internal architecture of pre-trained language models (PLMs) or rely on textifying knowledge graphs (KGs), which is inefficient in terms of token usage. This paper introduces ConceptFormer, a new approach to augment LLMs with structured knowledge from KGs, such as Wikidata, without altering their internal structure or relying on textual input of KGs. ConceptFormer operates in the LLM embedding vector space, creating and injecting "concept vectors" that encapsulate the information of the KG nodes directly. Trained in conjunction with a frozen LLM, ConceptFormer generates a comprehensive lookup table that maps KG nodes to their respective concept vectors. The approach aims to enhance the factual recall capabilities of LLMs by enabling them to process these concept vectors natively, thus enriching them with structured world knowledge in an efficient and scalable manner. Our experiments demonstrate that the addition of concept vectors to GPT-2 0.1B substantially increases its factual recall ability (Hit@10) by up to 272% when tested on sentences from Wikipedia and up to 348% on synthetically generated sentences. Even injecting only a single concept vector into the prompt increases factual recall ability (Hit@10) by up to 213% on Wikipedia sentences, significantly outperforming RAG with graph textification while consuming 130x fewer input tokens.
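The core mechanic, a frozen lookup table mapping KG nodes to vectors that live in the LLM's embedding space, can be sketched as prepending concept vectors to the prompt's token-embedding matrix. The dimensions, the table contents, and the prepend-vs-interleave choice below are invented for illustration.

```python
import numpy as np

# Sketch of concept-vector injection: a lookup table maps KG entities to
# vectors in the model's embedding space, and these are injected into the
# token-embedding sequence without touching the model's weights.

d_model = 16
concept_table = {
    # hypothetical entry; "Q42" stands in for a Wikidata node id
    "Q42": np.random.default_rng(1).normal(size=(d_model,)),
}

def inject_concepts(token_embs, entity_ids):
    """Prepend each entity's concept vector to the prompt's embedding matrix."""
    vectors = [concept_table[e] for e in entity_ids]
    return np.vstack(vectors + [token_embs])

token_embs = np.zeros((5, d_model))   # 5 prompt tokens (placeholder zeros)
augmented = inject_concepts(token_embs, ["Q42"])
```

Because only the embedding sequence is extended, the LLM itself stays frozen, which is what makes the approach cheap to train and to serve compared with textifying the graph into hundreds of extra tokens.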
|
📄
category : [RAG]
CollEX -- A Multimodal Agentic RAG System Enabling Interactive Exploration of Scientific Collections
Summary: CollEX is a multimodal agentic RAG system that makes extensive scientific collections interactively explorable. Vision-language-model agents behind an intuitive chat interface abstract away complex search interactions, supporting curiosity-driven exploration of collections for learners, educators, and researchers.
Authors: Florian Schneider, Narges Baba Ahmadi, Niloufar Baba Ahmadi, Iris Vogel, Martin Semmann, Chris Biemann
Published: 2025-04-10
In this paper, we introduce CollEx, an innovative multimodal agentic Retrieval-Augmented Generation (RAG) system designed to enhance interactive exploration of extensive scientific collections. Given the overwhelming volume and inherent complexity of scientific collections, conventional search systems often lack necessary intuitiveness and interactivity, presenting substantial barriers for learners, educators, and researchers. CollEx addresses these limitations by employing state-of-the-art Large Vision-Language Models (LVLMs) as multimodal agents accessible through an intuitive chat interface. By abstracting complex interactions via specialized agents equipped with advanced tools, CollEx facilitates curiosity-driven exploration, significantly simplifying access to diverse scientific collections and records therein. Our system integrates textual and visual modalities, supporting educational scenarios that are helpful for teachers, pupils, students, and researchers by fostering independent exploration as well as scientific excitement and curiosity. Furthermore, CollEx serves the research community by discovering interdisciplinary connections and complementing visual data. We illustrate the effectiveness of our system through a proof-of-concept application containing over 64,000 unique records across 32 collections from a local scientific collection from a public university.
|
📄
category : [RAG]
PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization
Summary: PR-Attack is a coordinated Prompt-RAG attack on retrieval-augmented generation in large language models, formulated as a bilevel optimization problem. It jointly crafts a small number of poisoned texts for the knowledge database and a backdoor trigger embedded in the prompt, achieving a high attack success rate with far better stealth than prior heuristic attacks.
Authors: Yang Jiao, Xiaodong Wang, Kai Yang
Published: 2025-04-10
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications, e.g., medical question-answering, mathematical sciences, and code generation. However, they also exhibit inherent limitations, such as outdated knowledge and susceptibility to hallucinations. Retrieval-Augmented Generation (RAG) has emerged as a promising paradigm to address these issues, but it also introduces new vulnerabilities. Recent efforts have focused on the security of RAG-based LLMs, yet existing attack methods face three critical challenges: (1) their effectiveness declines sharply when only a limited number of poisoned texts can be injected into the knowledge database, (2) they lack sufficient stealth, as the attacks are often detectable by anomaly detection systems, which compromises their effectiveness, and (3) they rely on heuristic approaches to generate poisoned texts, lacking formal optimization frameworks and theoretic guarantees, which limits their effectiveness and applicability. To address these issues, we propose coordinated Prompt-RAG attack (PR-Attack), a novel optimization-driven attack that introduces a small number of poisoned texts into the knowledge database while embedding a backdoor trigger within the prompt. When activated, the trigger causes the LLM to generate pre-designed responses to targeted queries, while maintaining normal behavior in other contexts. This ensures both high effectiveness and stealth. We formulate the attack generation process as a bilevel optimization problem leveraging a principled optimization framework to develop optimal poisoned texts and triggers. Extensive experiments across diverse LLMs and datasets demonstrate the effectiveness of PR-Attack, achieving a high attack success rate even with a limited number of poisoned texts and significantly improved stealth compared to existing methods.
|
📄
category : [RAG]
MRD-RAG: Enhancing Medical Diagnosis with Multi-Round Retrieval-Augmented Generation
Summary: MRD-RAG enhances medical diagnosis with multi-round retrieval-augmented generation that mimics a doctor's diagnostic process, analyzing the diagnosis information of potential diseases and refining the diagnosis across successive dialogue rounds rather than answering in a single shot.
Authors: Yixiang Chen, Penglei Sun, Xiang Li, Xiaowen Chu
Published: 2025-04-10
In recent years, accurately and quickly deploying medical large language models (LLMs) has become a significant trend. Among these, retrieval-augmented generation (RAG) has garnered significant attention due to its features of rapid deployment and privacy protection. However, existing medical RAG frameworks still have shortcomings. Most existing medical RAG frameworks are designed for single-round question answering tasks and are not suitable for multi-round diagnostic dialogue. On the other hand, existing medical multi-round RAG frameworks do not consider the interconnections between potential diseases to inquire precisely like a doctor. To address these issues, we propose a Multi-Round Diagnostic RAG (MRD-RAG) framework that mimics the doctor's diagnostic process. This RAG framework can analyze diagnosis information of potential diseases and accurately conduct multi-round diagnosis like a doctor. To evaluate the effectiveness of our proposed frameworks, we conduct experiments on two modern medical datasets and two traditional Chinese medicine datasets, with evaluations by GPT and human doctors on different methods. The results indicate that our RAG framework can significantly enhance the diagnostic performance of LLMs, highlighting the potential of our approach in medical diagnosis. The code and data can be found in our project website https://github.com/YixiangCh/MRD-RAG/tree/master.
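Multi-round diagnostic retrieval can be caricatured as re-retrieving against the accumulated dialogue each round, so the candidate diagnosis can shift as new symptoms arrive. The mini knowledge base and the term-overlap scoring below are stand-ins for illustration, not MRD-RAG's actual retriever.

```python
# Toy multi-round retrieval: score candidate diseases by term overlap with
# everything the patient has said so far, and re-retrieve each round.
# The knowledge base and symptoms are invented.

knowledge_base = {
    "flu":      {"fever", "cough", "aches"},
    "allergy":  {"sneezing", "itchy", "eyes"},
    "migraine": {"headache", "nausea", "light"},
}

def retrieve(dialogue_terms, k=1):
    scored = sorted(knowledge_base.items(),
                    key=lambda kv: len(kv[1] & dialogue_terms), reverse=True)
    return [name for name, _ in scored[:k]]

dialogue = set()
dialogue |= {"fever"}                   # round 1: patient reports fever
round1 = retrieve(dialogue)             # best guess so far
dialogue |= {"sneezing", "itchy"}       # round 2: follow-up answers arrive
round2 = retrieve(dialogue)             # retrieval shifts with new evidence
```

The point of the multi-round framing is exactly this shift: a single-round RAG system would commit to the first retrieval, while a doctor-like loop keeps asking and re-ranking as evidence accumulates.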
|
📄
category : [RAG]
A System for Comprehensive Assessment of RAG Frameworks
Summary: SCARF (System for Comprehensive Assessment of RAG Frameworks) is a modular framework for the end-to-end, black-box evaluation of deployed RAG applications, enabling low-effort comparison across RAG frameworks, vector databases, LLM serving strategies, and deployment configurations.
Authors: Mattia Rengo, Senad Beadini, Domenico Alfano, Roberto Abbruzzese
Published: 2025-04-10
Retrieval Augmented Generation (RAG) has emerged as a standard paradigm for enhancing the factual accuracy and contextual relevance of Large Language Models (LLMs) by integrating retrieval mechanisms. However, existing evaluation frameworks fail to provide a holistic black-box approach to assessing RAG systems, especially in real-world deployment scenarios. To address this gap, we introduce SCARF (System for Comprehensive Assessment of RAG Frameworks), a modular and flexible evaluation framework designed to benchmark deployed RAG applications systematically. SCARF provides an end-to-end, black-box evaluation methodology, enabling a limited-effort comparison across diverse RAG frameworks. Our framework supports multiple deployment configurations and facilitates automated testing across vector databases and LLM serving strategies, producing a detailed performance report. Moreover, SCARF integrates practical considerations such as response coherence, providing a scalable and adaptable solution for researchers and industry professionals evaluating RAG applications. Using the REST API interface, we demonstrate how SCARF can be applied to real-world scenarios, showcasing its flexibility in assessing different RAG frameworks and configurations. SCARF is available in a GitHub repository.
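A black-box harness in SCARF's spirit treats the deployed RAG application as an opaque callable and reports simple metrics over a fixed query set. The real system talks to REST APIs and measures richer qualities such as response coherence; the callable, queries, and metrics below are stand-ins.

```python
# Toy black-box assessment harness: run a query set against an opaque RAG
# system and produce a small performance report. All names are illustrative.

def assess(rag_system, test_set):
    report = {"n": len(test_set), "exact_match": 0, "nonempty": 0}
    for query, expected in test_set:
        answer = rag_system(query)
        report["exact_match"] += int(answer.strip().lower() == expected.lower())
        report["nonempty"] += int(bool(answer.strip()))
    report["accuracy"] = report["exact_match"] / report["n"]
    return report

def dummy_rag(query):  # stand-in for a deployed RAG endpoint
    return {"capital of France?": "Paris"}.get(query, "")

report = assess(dummy_rag, [("capital of France?", "paris"), ("unknown?", "x")])
```

Because the harness only needs a callable, the same report can be produced for any framework or configuration behind the interface, which is the comparison SCARF is built for.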
|