
사람과 AI


Latest AI Research Trends: 2025.04.13 (Sun)

8353cc 2025. 4. 13. 21:08

 

 
📄
category : [fine tuning]
SOFIA/upGREAT imaging spectroscopy of the [C II] 158 um fine structure line toward the Sgr A region in the Galactic center
Summary: Using SOFIA/upGREAT, this study measures and analyzes high-resolution, velocity-resolved spectra of the [C II] 158 μm fine-structure line toward the Sgr A region of the Galactic center. Sgr A hosts the central black hole and luminous star clusters, making it a prime site for astrophysical study. The high-resolution spectroscopy characterizes the [C II] 158 μm emission near Sgr A and provides new information for understanding Galactic-center phenomena such as the large-scale filaments and their excitation.
Authors: A. I. Harris, R. Güsten, M. A. Requena-Torres, D. Riquelme, M. R. Morris, G. J. Stacey, J. Stutzki, Y. Okada, E. Chambers, M. Mertens, C. Fischer
Published: 2025-04-08
We present SOFIA/upGREAT velocity-resolved spectral imaging and analysis of the 158 um [C II] spectral line toward the central 80 by 43 pc region of the Central Molecular Zone of the Galaxy. The field we imaged with 14" (0.6 pc) spatial and 1 km/s spectral resolution contains the Circum-Nuclear Disk (CND) around the central black hole Sgr A*, the neighboring thermal Arched Filaments, the nonthermal filaments of the Radio Arc, and the three luminous central star clusters. [C II] traces emission from the CND's inner edge to material orbiting at a distance of approximately 6 pc. Its velocity field reveals no sign of inflowing material nor interaction with winds from the Sgr A East supernova remnant. Wide-field imaging of the Sgr A region shows multiple circular segments, including the thermal Arched Filaments, that are centered on a region that includes the Quintuplet cluster. We examine the possibility that the Arched Filaments and other large-scale arcs trace transient excitation events from supernova blast waves. Along the Arched Filaments, comparisons among far-IR fine structure lines show changes in ionization state over small scales and that high-excitation lines are systematically shifted in position from the other lines. These also point to transient fast winds that shocked on the surface of the Arches cloud to produce additional local UV radiation to excite the Arched Filaments on a cloud surface illuminated by UV from hot stars.
Download paper
 
📄
category : [fine tuning]
Electronic Structure Guided Inverse Design Using Generative Models
Translated title: Inverse design of materials guided by electronic structure, using generative models.
Authors: Shuyi Jia, Panchapakesan Ganesh, Victor Fung
Published: 2025-04-08
The electronic structure of a material fundamentally determines its underlying physical, and by extension, its functional properties. Consequently, the ability to identify or generate materials with desired electronic properties would enable the design of tailored functional materials. Traditional approaches relying on human intuition or exhaustive computational screening of known materials remain inefficient and resource-prohibitive for this task. Here, we introduce DOSMatGen, the first instance of a machine learning method which generates crystal structures that match a given desired electronic density of states. DOSMatGen is an E(3)-equivariant joint diffusion framework, and utilizes classifier-free guidance to accurately condition the generated materials on the density of states. Our experiments find this approach can successfully yield materials which are both stable and match closely with the desired density of states. Furthermore, this method is highly flexible and allows for finely controlled generation which can target specific templates or even individual sites within a material. This method enables a more physics-driven approach to designing new materials for applications including catalysts, photovoltaics, and superconductors.
Download paper
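
As a rough illustration of the conditioning mechanism the abstract describes, here is a minimal classifier-free guidance step, assuming a denoiser `model` that accepts an optional condition; the function and argument names are hypothetical, not DOSMatGen's actual API.

```python
import torch

def cfg_denoise(model, x_t, t, dos_target, guidance_scale=2.0):
    # Classifier-free guidance: mix conditional and unconditional noise
    # predictions so sampling is steered toward crystal structures whose
    # predicted density of states matches `dos_target`.
    eps_cond = model(x_t, t, cond=dos_target)  # conditioned on the desired DOS
    eps_uncond = model(x_t, t, cond=None)      # unconditional prediction
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```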
 
📄
category : [fine tuning]
Transfer between Modalities with MetaQueries
Summary: MetaQueries are a set of learnable queries that act as an interface between an autoregressive multimodal LLM and a diffusion model, transferring the LLM's understanding and reasoning abilities to generation. This enables transfer between modalities, from text understanding to pixel output, with a simple training recipe.
Authors: Xichen Pan, Satya Narayan Shukla, Aashu Singh, Zhuokai Zhao, Shlok Kumar Mishra, Jialiang Wang, Zhiyang Xu, Jiuhai Chen, Kunpeng Li, Felix Juefei-Xu, Ji Hou, Saining Xie
Published: 2025-04-08
Unified multimodal models aim to integrate understanding (text output) and generation (pixel output), but aligning these different modalities within a single architecture often demands complex training recipes and careful data balancing. We introduce MetaQueries, a set of learnable queries that act as an efficient interface between autoregressive multimodal LLMs (MLLMs) and diffusion models. MetaQueries connects the MLLM's latents to the diffusion decoder, enabling knowledge-augmented image generation by leveraging the MLLM's deep understanding and reasoning capabilities. Our method simplifies training, requiring only paired image-caption data and standard diffusion objectives. Notably, this transfer is effective even when the MLLM backbone remains frozen, thereby preserving its state-of-the-art multimodal understanding capabilities while achieving strong generative performance. Additionally, our method is flexible and can be easily instruction-tuned for advanced applications such as image editing and subject-driven generation.
Download paper
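
A minimal sketch of the interface idea, assuming PyTorch: a set of learnable query vectors cross-attends over the frozen MLLM's hidden states to produce conditioning tokens for a diffusion decoder. Module and dimension names are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

class MetaQueryBridge(nn.Module):
    def __init__(self, num_queries=64, dim=1024, num_heads=8):
        super().__init__()
        # Learnable queries: the only new trainable interface between models.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, mllm_hidden):  # (B, T, dim) latents from the frozen MLLM
        q = self.queries.unsqueeze(0).expand(mllm_hidden.size(0), -1, -1)
        out, _ = self.cross_attn(q, mllm_hidden, mllm_hidden)
        return out  # (B, num_queries, dim) conditioning for the diffusion decoder
```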
 
📄
category : [fine tuning]
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Summary: Hogwild! Inference is a new approach to large-language-model generation in which multiple LLM workers run in parallel while attending to a single, concurrently updated attention (KV) cache. Because every worker sees the others' partial progress, the instances can work out their own collaboration strategy, making generation faster and more efficient.
Authors: Gleb Rodionov, Roman Garipov, Alina Shutova, George Yakushev, Vage Egiazarian, Anton Sinitsin, Denis Kuznedelev, Dan Alistarh
Published: 2025-04-08
Large Language Models (LLMs) have demonstrated the ability to tackle increasingly complex tasks through advanced reasoning, long-form content generation, and tool use. Solving these tasks often involves long inference-time computations. In human problem solving, a common strategy to expedite work is collaboration: by dividing the problem into sub-tasks, exploring different strategies concurrently, etc. Recent research has shown that LLMs can also operate in parallel by implementing explicit cooperation frameworks, such as voting mechanisms or the explicit creation of independent sub-tasks that can be executed in parallel. However, each of these frameworks may not be suitable for all types of tasks, which can hinder their applicability. In this work, we propose a different design approach: we run LLM "workers" in parallel, allowing them to synchronize via a concurrently-updated attention cache and prompt these workers to decide how best to collaborate. Our approach allows the instances to come up with their own collaboration strategy for the problem at hand, all the while "seeing" each other's partial progress in the concurrent cache. We implement this approach via Hogwild! Inference: a parallel LLM inference engine where multiple instances of the same LLM run in parallel with the same attention cache, with "instant" access to each other's generated tokens. Hogwild! inference takes advantage of Rotary Position Embeddings (RoPE) to avoid recomputation while improving parallel hardware utilization. We find that modern reasoning-capable LLMs can perform inference with shared Key-Value cache out of the box, without additional fine-tuning.
Download paper
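
The core idea, workers that generate in parallel while reading one shared cache, can be caricatured in a few lines; this is a toy sketch of the data structure, not the paper's RoPE-aware inference engine.

```python
import threading
from dataclasses import dataclass, field

@dataclass
class SharedTokenCache:
    # Toy stand-in for the concurrently updated attention cache: each
    # worker appends its tokens, and every worker's next step attends
    # over a snapshot containing all workers' partial progress.
    tokens: list = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def append(self, worker_id: int, token: str) -> None:
        with self.lock:
            self.tokens.append((worker_id, token))

    def snapshot(self) -> list:
        with self.lock:
            return list(self.tokens)  # context visible to every worker
```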
 
📄
category : [fine tuning]
GOLLuM: Gaussian Process Optimized LLMs -- Reframing LLM Finetuning through Bayesian Optimization
Translated title: GOLLuM: Gaussian-process-optimized LLMs, reframing LLM finetuning through Bayesian optimization.
Authors: Bojana Ranković, Philippe Schwaller
Published: 2025-04-08
Large Language Models (LLMs) can encode complex relationships in their latent spaces, yet harnessing them for optimization under uncertainty remains challenging. We address this gap with a novel architecture that reframes LLM finetuning as Gaussian process (GP) marginal likelihood optimization via deep kernel methods. We introduce LLM-based deep kernels, jointly optimized with GPs to preserve the benefits of both - LLMs to provide a rich and flexible input space for Bayesian optimization and - GPs to model this space with predictive uncertainty for more efficient sampling. Applied to Buchwald-Hartwig reaction optimization, our method nearly doubles the discovery rate of high-performing reactions compared to static LLM embeddings (from 24% to 43% coverage of the top 5% reactions in just 50 optimization iterations). We also observe a 14% improvement over domain-specific representations without requiring specialized features. Extensive empirical evaluation across 19 benchmarks - ranging from general chemistry to reaction and molecular property optimization - demonstrates our method's robustness, generality, and consistent improvements across: (1) tasks, (2) LLM architectures (encoder, decoder, encoder-decoder), (3) pretraining domains (chemistry-related or general-purpose) and (4) hyperparameter settings (tuned once on a single dataset). Finally, we explain these improvements: joint LLM-GP optimization through marginal likelihood implicitly performs contrastive learning, aligning representations to produce (1) better-structured embedding spaces, (2) improved uncertainty calibration, and (3) more efficient sampling - without requiring any external loss. This work provides both practical advances in sample-efficient optimization and insights into what makes effective Bayesian optimization.
Download paper
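
A hedged sketch of the deep-kernel construction using GPyTorch: an encoder (here an arbitrary embedding function standing in for the LLM) feeds a standard GP kernel, and both are trained jointly by maximizing the exact marginal likelihood. This mirrors the recipe described above but is not the authors' implementation.

```python
import gpytorch

class LLMDeepKernelGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, encoder):
        super().__init__(train_x, train_y, likelihood)
        self.encoder = encoder  # assumed: maps inputs to embedding tensors
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        z = self.encoder(x)  # LLM-derived features define the kernel's input space
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z))

# Joint training maximizes the marginal likelihood through both GP and encoder:
# mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
# loss = -mll(model(train_x), train_y); loss.backward()
```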
 
📄
category : [LLM]
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
Translated title: APIGen-MT: an agentic pipeline for generating diverse multi-turn data via simulated human-agent interplay in realistic settings.
Authors: Akshara Prabhakar, Zuxin Liu, Ming Zhu, Jianguo Zhang, Tulika Awalgaonkar, Shiyu Wang, Zhiwei Liu, Haolin Chen, Thai Hoang, Juan Carlos Niebles, Shelby Heinecke, Weiran Yao, Huan Wang, Silvio Savarese, Caiming Xiong
Published: 2025-04-04
Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. These blueprints are then transformed into complete interaction trajectories through simulated human-agent interplay. We train a family of models -- the xLAM-2-fc-r series with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on τ-bench and BFCL benchmarks, with the smaller models surpassing their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields high-quality training data, enabling the development of more reliable, efficient, and capable agents. We open-source both the synthetic data collected and the trained xLAM-2-fc-r models to advance research in AI agents. Models are available on HuggingFace at https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4 and project website is https://apigen-mt.github.io
Download paper
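
The two-phase flow reads naturally as a loop; the sketch below is purely illustrative pseudocode of that flow (every object and method name here is an assumption, not the released xLAM code).

```python
def generate_dataset(task_specs, llm, reviewers, simulator):
    trajectories = []
    for spec in task_specs:
        # Phase 1: draft a task blueprint with ground-truth actions,
        # refined under iterative feedback from a committee of LLM reviewers.
        blueprint = llm.draft_blueprint(spec)
        while not all(r.approves(blueprint) for r in reviewers):
            feedback = [r.critique(blueprint) for r in reviewers]
            blueprint = llm.revise(blueprint, feedback)
        # Phase 2: expand the verified blueprint into a full multi-turn
        # trajectory via simulated human-agent interplay.
        trajectories.append(simulator.roll_out(blueprint))
    return trajectories
```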
 
📄
category : [LLM]
FEABench: Evaluating Language Models on Multiphysics Reasoning Ability
Translated title: FEABench: evaluating the multiphysics reasoning ability of language models.
Authors: Nayantara Mudur, Hao Cui, Subhashini Venugopalan, Paul Raccuglia, Michael P. Brenner, Peter Norgaard
Published: 2025-04-08
Building precise simulations of the real world and invoking numerical solvers to answer quantitative problems is an essential requirement in engineering and science. We present FEABench, a benchmark to evaluate the ability of large language models (LLMs) and LLM agents to simulate and solve physics, mathematics and engineering problems using finite element analysis (FEA). We introduce a comprehensive evaluation scheme to investigate the ability of LLMs to solve these problems end-to-end by reasoning over natural language problem descriptions and operating COMSOL Multiphysics®, an FEA software, to compute the answers. We additionally design a language model agent equipped with the ability to interact with the software through its Application Programming Interface (API), examine its outputs and use tools to improve its solutions over multiple iterations. Our best performing strategy generates executable API calls 88% of the time. LLMs that can successfully interact with and operate FEA software to solve problems such as those in our benchmark would push the frontiers of automation in engineering. Acquiring this capability would augment LLMs' reasoning skills with the precision of numerical solvers and advance the development of autonomous systems that can tackle complex problems in the real world. The code is available at https://github.com/google/feabench
Download paper
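
The agent the abstract describes, one that calls the FEA software's API, inspects outputs, and iterates, boils down to a propose-execute-revise loop. This is a hypothetical sketch of that loop; `solver_api` and the method names are assumptions, not the benchmark's code.

```python
def solve_with_feedback(llm, problem, solver_api, max_iters=5):
    calls = llm.propose_api_calls(problem)  # natural-language problem -> API calls
    for _ in range(max_iters):
        result = solver_api.execute(calls)  # run against the FEA software
        if result.ok:
            return result.answer
        # Feed errors and outputs back so the agent can revise its solution.
        calls = llm.revise(problem, calls, result.errors)
    return None
```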
 
📄
category : [RAG]
PathGPT: Leveraging Large Language Models for Personalized Route Generation
Topic write-up: "Leveraging large language models for personalized route generation." Large language models (LLMs) have recently achieved remarkable results in natural language processing and dialogue systems. This paper examines how LLMs can be used to generate personalized routes. Route generation matters in many settings, from traffic-flow management to personalized travel services and parking systems. Because LLMs are trained on text, they can understand conversational patterns and learn to satisfy diverse user requirements for route selection. The study highlights how LLMs can support personalized route generation and what this suggests for building intelligent systems.
Authors: Steeve Cuthbert Marcelyn, Yucen Gao, Yuzhe Zhang, Xiaofeng Gao, Guihai Chen
Published: 2025-04-08
The proliferation of GPS-enabled devices has led to the accumulation of a substantial corpus of historical trajectory data. By leveraging these data for training machine learning models, researchers have devised novel data-driven methodologies that address the personalized route recommendation (PRR) problem. In contrast to conventional algorithms such as Dijkstra's shortest-path algorithm, these novel algorithms possess the capacity to discern and learn patterns within the data, thereby facilitating the generation of more personalized paths. However, once these models have been trained, their application is constrained to the generation of routes that align with their training patterns. This limitation renders them less adaptable to novel scenarios, and the deployment of multiple machine learning models might be necessary to address new possible scenarios, which can be costly as each model must be trained separately. Inspired by recent advances in the field of Large Language Models (LLMs), we leveraged their natural language understanding capabilities to develop a unified model that solves the PRR problem while being seamlessly adaptable to new scenarios without additional training. To accomplish this, we combined the extensive knowledge LLMs acquired during training with further access to external hand-crafted context information, similar to RAG (Retrieval-Augmented Generation) systems, to enhance their ability to generate paths according to user-defined requirements. Extensive experiments on different datasets show a considerable uplift in LLM performance on the PRR problem.
Download paper
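
The RAG-style mechanism described above, pairing the LLM with hand-crafted external context, can be sketched as simple prompt assembly. The retriever and LLM interfaces here are assumptions for illustration.

```python
def build_route(query: str, retriever, llm, k: int = 3) -> str:
    # Retrieve hand-crafted context (e.g., historical routes, road facts)
    # and splice it into the prompt, RAG-style.
    context = retriever.top_k(query, k)
    prompt = (
        "Using the context below, propose a route that satisfies the request.\n"
        + "\n".join(f"- {c}" for c in context)
        + f"\nRequest: {query}\nRoute:"
    )
    return llm.generate(prompt)
```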
 
📄
category : [RAG]
Themes of Building LLM-based Applications for Production: A Practitioner's View
Title: Themes of building LLM-based applications for production, from a practitioner's view. This study examines the main themes involved in building LLM-based applications, centered on practitioners' experience and perspectives. LLMs have advanced rapidly, giving rise to new applications across many industries; the study distills strategies and practical guidance that help practitioners track these changes and build applications effectively.
Authors: Alina Mailach, Sebastian Simon, Johannes Dorn, Norbert Siegmund
Published: 2024-11-13
Background: Large language models (LLMs) have become a paramount interest of researchers and practitioners alike, yet a comprehensive overview of key considerations for those developing LLM-based systems is lacking. This study addresses this gap by collecting and mapping the topics practitioners discuss online, offering practical insights into where priorities lie in developing LLM-based applications. Method: We collected 189 videos from 2022 to 2024 from practitioners actively developing such systems and discussing various aspects they encounter during development and deployment of LLMs in production. We analyzed the transcripts using BERTopic, then manually sorted and merged the generated topics into themes, leading to a total of 20 topics in 8 themes. Results: The most prevalent topics fall within the theme Design & Architecture, with a strong focus on retrieval-augmented generation (RAG) systems. Other frequently discussed topics include model capabilities and enhancement techniques (e.g., fine-tuning, prompt engineering), infrastructure and tooling, and risks and ethical challenges. Implications: Our results highlight current discussions and challenges in deploying LLMs in production. This way, we provide a systematic overview of key aspects practitioners should be aware of when developing LLM-based applications. We further point out topics of interest for academics where further research is needed.
Download paper
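
For reference, the transcript-clustering step the authors describe maps onto a few lines of BERTopic; the `load_transcripts` helper and the parameter choice are assumptions, and the paper's actual preprocessing may differ.

```python
from bertopic import BERTopic

transcripts = load_transcripts()          # assumed helper: list[str], one per video
topic_model = BERTopic(min_topic_size=5)  # illustrative setting
topics, probs = topic_model.fit_transform(transcripts)
# Inspect machine-generated topics before manually merging them into themes.
print(topic_model.get_topic_info().head(20))
```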
 
📄
category : [RAG]
On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation
Translated title: On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation. The paper takes a focused look at how consistently LLMs make use of multilingual context during retrieval-augmented generation.
Authors: Jirui Qi, Raquel Fernández, Arianna Bisazza
Published: 2025-04-01
Retrieval-augmented generation (RAG) with large language models (LLMs) has demonstrated strong performance in multilingual question-answering (QA) tasks by leveraging relevant passages retrieved from corpora. In multilingual RAG (mRAG), the retrieved passages can be written in languages other than that of the query entered by the user, making it challenging for LLMs to effectively utilize the provided information. Recent research suggests that retrieving passages from multilingual corpora can improve RAG performance, particularly for low-resource languages. However, the extent to which LLMs can leverage different kinds of multilingual contexts to generate accurate answers, *independently from retrieval quality*, remains understudied. In this paper, we conduct an extensive assessment of LLMs' ability to (i) make consistent use of a relevant passage regardless of its language, (ii) respond in the expected language, and (iii) focus on the relevant passage even when multiple 'distracting' passages in different languages are provided in the context. Our experiments with four LLMs across three QA datasets covering a total of 48 languages reveal a surprising ability of LLMs to extract the relevant information from out-language passages, but a much weaker ability to formulate a full answer in the correct language. Our analysis, based on both accuracy and feature attribution techniques, further shows that distracting passages negatively impact answer quality regardless of their language. However, distractors in the query language exert a slightly stronger influence. Taken together, our findings deepen the understanding of how LLMs utilize context in mRAG systems, providing directions for future improvements.
Download paper
 
📄
category : [RAG]
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Translated title: Search-R1: training LLMs to reason and leverage search engines with reinforcement learning.
Authors: Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan Arik, Dong Wang, Hamed Zamani, Jiawei Han
Published: 2025-03-12
Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs). Prompting advanced LLMs with reasoning capabilities to use search engines during inference is often suboptimal, as the LLM might not fully possess the capability on how to interact optimally with the search engine. This paper introduces Search-R1, an extension of reinforcement learning (RL) for reasoning frameworks where the LLM learns to autonomously generate (multiple) search queries during step-by-step reasoning with real-time retrieval. Search-R1 optimizes LLM reasoning trajectories with multi-turn search interactions, leveraging retrieved token masking for stable RL training and a simple outcome-based reward function. Experiments on seven question-answering datasets show that Search-R1 improves performance by 41% (Qwen2.5-7B) and 20% (Qwen2.5-3B) over various RAG baselines under the same setting. This paper further provides empirical insights into RL optimization methods, LLM choices, and response length dynamics in retrieval-augmented reasoning. The code and model checkpoints are available at https://github.com/PeterGriffinJin/Search-R1.
Download paper
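
The "retrieved token masking" trick mentioned above is easy to picture: tokens copied from search results are excluded from the RL objective so gradients flow only through model-generated tokens. A minimal sketch, with hypothetical tensor inputs:

```python
import torch

def masked_policy_loss(logprobs: torch.Tensor,
                       advantages: torch.Tensor,
                       retrieved_mask: torch.Tensor) -> torch.Tensor:
    # retrieved_mask is True where a token was copied from retrieval;
    # those positions contribute nothing to the policy gradient.
    keep = (~retrieved_mask).float()
    return -(logprobs * advantages * keep).sum() / keep.sum().clamp(min=1.0)
```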
 
📄
category : [RAG]
Decentralizing AI Memory: SHIMI, a Semantic Hierarchical Memory Index for Scalable Agent Reasoning
Translated title: Decentralizing AI Memory: SHIMI, a Semantic Hierarchical Memory Index for Scalable Agent Reasoning. SHIMI is a decentralized AI memory system built around a semantic hierarchical memory index that supports scalable agent reasoning.
Authors: Tooraj Helmi
Published: 2025-04-08
Retrieval-Augmented Generation (RAG) and vector-based search have become foundational tools for memory in AI systems, yet they struggle with abstraction, scalability, and semantic precision - especially in decentralized environments. We present SHIMI (Semantic Hierarchical Memory Index), a unified architecture that models knowledge as a dynamically structured hierarchy of concepts, enabling agents to retrieve information based on meaning rather than surface similarity. SHIMI organizes memory into layered semantic nodes and supports top-down traversal from abstract intent to specific entities, offering more precise and explainable retrieval. Critically, SHIMI is natively designed for decentralized ecosystems, where agents maintain local memory trees and synchronize them asynchronously across networks. We introduce a lightweight sync protocol that leverages Merkle-DAG summaries, Bloom filters, and CRDT-style conflict resolution to enable partial synchronization with minimal overhead. Through benchmark experiments and use cases involving decentralized agent collaboration, we demonstrate SHIMI's advantages in retrieval accuracy, semantic fidelity, and scalability - positioning it as a core infrastructure layer for decentralized cognitive systems.
Download paper
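
Top-down semantic traversal, the retrieval pattern the abstract highlights, can be sketched with a toy tree: descend from abstract concepts toward the child whose embedding best matches the query. The data structure and similarity function are illustrative assumptions, not SHIMI's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptNode:
    concept: str
    embedding: list            # vector for the concept at this level
    children: list = field(default_factory=list)

def retrieve(root: ConceptNode, query_emb, sim) -> str:
    # Walk from abstract intent to specific entity, greedily following
    # the most semantically similar child at each level.
    node = root
    while node.children:
        node = max(node.children, key=lambda c: sim(c.embedding, query_emb))
    return node.concept
```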

 

 

 
