| id (large_string) | title (large_string) | abstract (large_string) | categories (large_string) | update_date (timestamp[ms]) | authors (large_string) | classification_label (large_string) | is_new_dataset (bool) | confidence_score (float64) | classification_date (large_string) | model_version (large_string) | embedding (large_list) | embedding_model (large_string) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
2310.11714 | Consistent Distributed Ranking of Generative Models via Kernel Distances | Ranking generative models based on the fidelity and diversity of their outputs is required to identify the best generator in a group of candidate generative AI models. To rank a group of models in a conventional centralized setting, a standard score is commonly evaluated for each involved model. The selection and desig... | cs.LG | 2026-06-26T00:00:00 | Zixiao Wang, Farzan Farnia, Zhenghao Lin, Yunheng Shen, Bei Yu | no_new_dataset | false | 0.965777 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2409.16395 | HELIOT: LLM-Based CDSS for Adverse Drug Reaction Management | Medication errors significantly threaten patient safety, leading to adverse drug events and substantial economic burdens on healthcare systems. Clinical Decision Support Systems (CDSSs) aimed at mitigating these errors often face limitations when processing unstructured clinical data, including reliance on static datab... | cs.AI | 2026-06-26T00:00:00 | Gabriele De Vito, Filomena Ferrucci, Athanasios Angelakis | new_dataset | true | 0.964089 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2412.09959 | Towards Consistent and Efficient Dataset Distillation via Diffusion-Driven Selection | Dataset distillation provides an effective approach to reduce memory and computational costs by optimizing a compact dataset that achieves performance comparable to the full original. However, for large-scale datasets and complex deep networks (e.g., ImageNet-1K with ResNet-101), the vast optimization space hinders dis... | cs.CV | 2026-06-26T00:00:00 | Xinhao Zhong, Shuoyang Sun, Zhaoyang Xu, Xulin Gu, Bin Chen, Min Zhang, Yaowei Wang | no_new_dataset | false | 0.961972 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2501.07526 | Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization | Distributed-memory implementations of numerical optimization algorithm, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where communication is more expensive than computation, the scalability and performance of th... | cs.DC stat.ML | 2026-06-26T00:00:00 | Aditya Devarakonda, Ramakrishnan Kannan | no_new_dataset | false | 0.961576 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2501.13955 | Guided Persona-based AI Surveys: Can we replicate personal mobility preferences at scale using LLMs? | This study explores the potential of Large Language Models (LLMs) to generate artificial surveys, with a focus on personal mobility preferences in Germany. By leveraging LLMs for synthetic data creation, we aim to address the limitations of traditional survey methods, such as high costs, inefficiency and scalability ch... | cs.CL cs.AI cs.CY | 2026-06-26T00:00:00 | Ioannis Tzachristas, Santhanakrishnan Narayanan and Constantinos Antoniou | no_new_dataset | false | 0.615637 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2501.19274 | GO: The Great Outdoors Multimodal Dataset | The Great Outdoors (GO) dataset is a multi-modal annotated data resource aimed at advancing ground robotics research in unstructured environments. Existing off-road datasets often lack sensor diversity and exclude vital modalities like thermal and radar that are critical for operation in degraded conditions (e.g., low ... | cs.RO | 2026-06-26T00:00:00 | Peng Jiang, Kasi Viswanath, Akhil Nagariya, George Chustz, Maggie Wigness, Philip Osteen, Timothy Overbye, Christian Ellis, Long Quang, Jia Huang, Srikanth Saripalli | new_dataset | true | 0.970379 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2502.06890 | LLMs for Drug-Drug Interaction Prediction: A Comprehensive Comparison | The increasing volume of drug combinations in modern therapeutic regimens needs reliable methods for predicting drug-drug interactions (DDIs). While Large Language Models (LLMs) have revolutionized various domains, their potential in pharmaceutical research, particularly in DDI prediction, remains largely unexplored. T... | cs.LG cs.AI q-bio.QM | 2026-06-26T00:00:00 | Gabriele De Vito, Filomena Ferrucci, Athanasios Angelakis | no_new_dataset | false | 0.950446 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2504.13432 | Circular Quasiconformal Deturbulence: Geometry-Based Restoration from Multiple Turbulent Frames | Imaging through inhomogeneous media often results in severe distortions, posing significant challenges to downstream image-processing tasks. The lack of clean paired images makes supervised learning impractical, motivating unsupervised restoration approaches. In this work, we propose the Circular Quasi-Conformal Deturb... | cs.CV | 2026-06-26T00:00:00 | Chu Chen, Han Zhang, Lok Ming Lui | no_new_dataset | false | 0.963591 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2505.20178 | No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference | Prediction-Powered Inference (PPI) is a popular strategy for combining gold-standard and possibly noisy pseudo-labels to perform statistical estimation. Prior work has shown an asymptotic "free lunch" for PPI++, an adaptive form of PPI, showing that the *asymptotic* variance of PPI++ is always less than ... | stat.ML cs.LG | 2026-06-26T00:00:00 | Pranav Mani, Peng Xu, Zachary C. Lipton, Michael Oberst | no_new_dataset | false | 0.963184 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2507.03122 | Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings | This study investigates the feasibility and performance of federated learning (FL) for multi-label ICD code classification using clinical notes from the MIMIC-IV dataset. Unlike previous approaches that rely on centralized training or fine-tuned large language models, we propose a lightweight and scalable pipeline comb... | cs.IR cs.CL cs.LG | 2026-06-26T00:00:00 | Binbin Xu, Gérard Dray | no_new_dataset | false | 0.96394 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2508.08005 | Learning to Select Maximum Clique Algorithms: From Traditional Machine Learning to a Dual-Channel Hybrid Neural Architecture | The Maximum Clique Problem (MCP) is an NP-hard problem with wide-ranging applications in fields such as bioinformatics, network science, and social computing, yet no single algorithm consistently outperforms all others across diverse graph instances. This underscores the critical need for instance-aware algorithm selec... | cs.LG cs.AI | 2026-06-26T00:00:00 | Xiang Li, Shanshan Wang, Chenglong Xiao | new_dataset | true | 0.966797 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2508.17916 | EndoUFM: Utilizing Foundation Models for Monocular depth estimation of endoscopic images | Depth estimation is a foundational component for 3D reconstruction in minimally invasive endoscopic surgeries. However, existing monocular depth estimation techniques often exhibit limited performance to the varying illumination and complex textures of the surgical environment. While applying foundation models offers a... | cs.CV | 2026-06-26T00:00:00 | Xinning Yao, Bo Liu, Bojian Li, Jingjing Wang, Jinghua Yue, Fugen Zhou | no_new_dataset | false | 0.968252 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2508.21221 | Uncertainty-Aware Ankle Exoskeleton Control | Lower limb exoskeletons show promise to assist human movement, but their utility is limited by controllers designed for discrete, predefined actions in controlled environments, restricting their real-world applicability. We present an uncertainty-aware control framework that enables ankle exoskeletons to operate safely... | cs.RO | 2026-06-26T00:00:00 | Fatima Mumtaza Tourk, Bishoy Galoaa, Sanat Shajan, Aaron J. Young, Michael Everett, Max K. Shepherd | no_new_dataset | false | 0.943242 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2509.09960 | Limited Reference, Reliable Generation: A Two-Component Framework for Tabular Data Generation in Low-Data Regimes | Synthetic tabular data generation is increasingly essential in machine learning, supporting downstream applications when real-world, high-quality tabular data is insufficient. Existing tabular generation approaches, such as generative adversarial networks (GANs) and fine-tuned Large Language Models (LLMs), typically re... | cs.LG cs.AI | 2026-06-26T00:00:00 | Mingxuan Jiang, Keyang Chen, Yongxin Wang, Yongsheng Zhao, Ziyue Dai, Yicun Liu, Zeping Li, Qiuyang Zhang, Hongyi Nie, Hongbin Zhu, Sen Liu, Guangnan Ye, and Hongfeng Chai | no_new_dataset | false | 0.95743 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2510.00586 | Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors | Existing data poisoning attacks on retrieval-augmented generation (RAG) systems scale poorly because they require costly optimization of poisoned documents for each target phrase. We introduce Eyes-on-Me, a modular attack that decomposes an adversarial document into reusable **Attention Attractors** and **Focus Regions... | cs.LG cs.CL cs.CR | 2026-06-26T00:00:00 | Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang, Yun-Nung Chen | no_new_dataset | false | 0.92902 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2510.02809 | Relevance-Aware Thresholding in Online Conformal Prediction for Time Series | Uncertainty quantification has received considerable interest in recent works in Machine Learning. In particular, Conformal Prediction (CP) gains ground in this field. For the case of time series, Online Conformal Prediction (OCP) becomes an option to address the problem of data distribution shift over time. Indeed, th... | cs.LG cs.AI | 2026-06-26T00:00:00 | Théo Dupuy and Binbin Xu and Stéphane Perrey and Jacky Montmain and Abdelhak Imoussaten | no_new_dataset | false | 0.970307 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2510.17459 | Estimating Orbital Parameters of Direct Imaging Exoplanet Using Neural Network | In this work, we propose a flow-matching Markov chain Monte Carlo (FM-MCMC) algorithm for estimating the orbital parameters of exoplanetary systems, especially for those only one exoplanet is involved. Compared to traditional methods that rely on random sampling within the Bayesian framework, our approach first leverag... | astro-ph.EP astro-ph.GA cs.LG | 2026-06-26T00:00:00 | Bo Liang, Hanlin Song, Chang Liu, Tianyu Zhao, Yuxiang Xu, Zihao Xiao, Manjia Liang, Minghui Du, Wei-Liang Qian, Li-e Qiang, Peng Xu, Ziren Luo | no_new_dataset | false | 0.966104 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2510.20769 | CSU-PCAST: A Dual-Branch Transformer Framework for medium-range ensemble Precipitation Forecasting | Accurate medium-range precipitation forecasting is essential for hydrometeorological risk management but remains challenging for both numerical weather prediction (NWP) systems and data-driven models. We present CSU-PCAST, a deep learning-based ensemble forecasting framework for global precipitation prediction. The mod... | physics.ao-ph cs.LG | 2026-06-26T00:00:00 | Tianyi Xiong, Haonan Chen, Kelly Mahoney, Jingyin Tang, Tim Smith and Janice Bytheway | no_new_dataset | false | 0.950369 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2511.18254 | UniFlow: Zero-Shot LiDAR Scene Flow for Autonomous Vehicles | LiDAR scene flow is the task of estimating per-point 3D motion between consecutive point clouds. Recent methods achieve centimeter-level accuracy on popular autonomous vehicle (AV) datasets, but are typically only trained and evaluated on a single sensor. In this paper, we aim to learn general motion priors that transf... | cs.CV | 2026-06-26T00:00:00 | Siyi Li, Qingwen Zhang, Ishan Khatri, Kyle Vedder, Eric Eaton, Deva Ramanan, Neehar Peri | no_new_dataset | false | 0.833481 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2512.02652 | Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training | Existing methods for expressive music performance rendering, a conditional generation task that aims to generate a human-like performance from a symbolic score, rely on supervised learning over small labeled datasets, which limits scaling of both data volume and model size, despite the availability of vast unlabeled mu... | cs.SD cs.AI cs.MM | 2026-06-26T00:00:00 | Hong-Jie You, Jie-Jing Shao, Xiao-Wen Yang, Lin-Han Jia, Lan-Zhe Guo, Yu-Feng Li | no_new_dataset | false | 0.864372 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2512.04890 | Equivariant symmetry-aware head pose estimation for fetal MRI | We present E(3)-Pose, a novel fast pose estimation method that jointly and explicitly models rotation equivariance and object symmetry. Our work is motivated by the challenging problem of accounting for fetal head motion during a diagnostic MRI scan. We aim to enable automatic adaptive prescription of diagnostic 2D MRI... | cs.CV | 2026-06-26T00:00:00 | Ramya Muthukrishnan, Borjan Gagoski, Aryn Lee, P. Ellen Grant, Elfar Adalsteinsson, Benjamin Billot, Polina Golland | no_new_dataset | false | 0.95187 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2512.06401 | LLMCFG-TGen: Using LLM-Generated Control Flow Graphs to Automatically Create Test Cases from Use Cases | Appropriate test-case generation is critical in software testing and significantly impacts testing quality. Requirements-Based Test Generation (RBTG) derives test cases from software requirements to verify whether system behavior aligns with user needs and expectations. Requirements are often documented in Natural Lang... | cs.SE | 2026-06-26T00:00:00 | Zhenzhen Yang, Chenhui Cui, Tao Li, Rubing Huang, Nan Niu, Dave Towey, Shikai Guo | no_new_dataset | false | 0.922819 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2601.01084 | A UAV-Based Multispectral and RGB Dataset for Multi-Stage Paddy Crop Monitoring in Indian Agricultural Fields | We present a large-scale unmanned aerial vehicle (UAV)-based RGB and multispectral image dataset collected over paddy fields in the Vijayawada region, Andhra Pradesh, India, covering nursery to harvesting stages. We used a 20-megapixel RGB camera and a 5-megapixel four-band multispectral camera capturing red, green, re... | cs.CV eess.IV | 2026-06-26T00:00:00 | Adari Rama Sukanya, Puvvula Roopesh Naga Sri Sai, Bodduru Neshika, Rimalapudi Sarvendranath | new_dataset | true | 0.965152 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2601.01701 | Digital Twin-Driven Communication-Efficient Federated Anomaly Detection for Industrial IoT | Anomaly detection is increasingly becoming crucial for maintaining the safety, reliability, and efficiency of industrial systems. Recently, with the advent of digital twins and data-driven decision-making, several statistical and machine-learning methods have been proposed. However, these methods face several challenge... | cs.LG cs.AI | 2026-06-26T00:00:00 | Mohammed Ayalew Belay, Adil Rasheed, Pierluigi Salvo Rossi | no_new_dataset | false | 0.965397 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2601.04390 | SciFig: Towards Automating Editable Figure Generation for Scientific Papers | High-quality methodology figures are central to scientific communication, yet they remain difficult and time-consuming to create. Such figures must distill a method's components and information flow into a clear, revisable diagram as the paper evolves. Existing methodology diagram automation systems typically face a tr... | cs.AI | 2026-06-26T00:00:00 | Siyuan Huang, Yifan Zhou, Yutong Gao, Zi Yin, Juyang Bai, Xinxin Liu, Rama Chellappa, Chun Pong Lau, Cheng Peng, Sayan Nag, Shraman Pramanick | new_dataset | true | 0.971241 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2601.12062 | Learning Language-Driven Sequence-Level Modal-Invariant Representations for Video-Based Visible-Infrared Person Re-Identification | The core of video-based visible-infrared person re-identification (VVI-ReID) lies in learning sequence-level modal-invariant representations across different modalities. Recent research tends to use modality-shared language prompts generated by CLIP to guide the learning of modal-invariant representations. Despite achi... | cs.CV | 2026-06-26T00:00:00 | Xiaomei Yang, Antai Liu, Xizhan Gao, Fa Zhu, Sijie Niu, and Giancarlo Fortino | no_new_dataset | false | 0.963188 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2601.13632 | Resilient Routing: Risk-Aware Dynamic Routing in Smart Logistics via Spatiotemporal Graph Learning | With the rapid development of the e-commerce industry, the logistics network is experiencing unprecedented pressure. The traditional static routing strategy most time cannot tolerate the traffic congestion and fluctuating retail demand. In this paper, we propose a Risk-Aware Dynamic Routing(RADR) framework which integr... | cs.AI | 2026-06-26T00:00:00 | Zhiming Xue, Sichen Zhao, Yalun Qi, Xianling Zeng, Zihan Yu | no_new_dataset | false | 0.964021 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2602.13939 | Adaptive Automatic Model Selection for Demand Forecasting under Heterogeneous Demand Patterns | Demand forecasting is critical for inventory planning, procurement, replenishment, production, and capacity decisions in heterogeneous supply chains. However, selecting the most appropriate model for each demand series remains challenging because performance varies across datasets, demand structures, horizons, and eval... | cs.LG cs.AI | 2026-06-26T00:00:00 | Adolfo González, Víctor Parada | no_new_dataset | false | 0.968194 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2602.16220 | SEMixer: Semantics Enhanced MLP-Mixer for Multiscale Mixing and Long-term Time Series Forecasting | Modeling multiscale patterns is crucial for long-term time series forecasting (TSF). However, redundancy and noise in time series, together with semantic gaps between non-adjacent scales, make the efficient alignment and integration of multi-scale temporal dependencies challenging. To address this, we propose SEMixer, ... | cs.LG | 2026-06-26T00:00:00 | Xu Zhang, Qitong Wang, Peng Wang, Wei Wang | no_new_dataset | false | 0.967578 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2602.18446 | ReportLogic: Evaluating Logical Quality in Deep Research Reports | Users increasingly rely on Large Language Models (LLMs) for Deep Research, using them to synthesize diverse sources into structured reports that support understanding and action. In this context, the practical reliability of such reports hinges on logical quality: whether the report's claims and arguments are explicitl... | cs.CL cs.AI | 2026-06-26T00:00:00 | Jujia Zhao, Zhaoxin Huan, Zihan Wang, Xiaolu Zhang, Jun Zhou, Suzan Verberne, and Zhaochun Ren | new_dataset | true | 0.976244 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2602.18900 | PrivacyBench: Privacy Isn't Free in Hybrid Privacy-Preserving Vision Systems | Privacy preserving machine learning deployments in sensitive deep learning applications; from medical imaging to autonomous systems; increasingly require combining multiple techniques. Yet, practitioners lack systematic guidance to assess the synergistic and non-additive interactions of these hybrid configurations, rel... | cs.CR cs.CV | 2026-06-26T00:00:00 | Nnaemeka Obiefuna and Samuel Oyeneye and Similoluwa Odunaiya and Iremide Oyelaja and Steven Kolawole | no_new_dataset | false | 0.927117 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2603.01195 | VisNec: Measuring and Leveraging Visual Necessity for Multimodal Instruction Tuning | The effectiveness of multimodal instruction tuning depends not only on dataset scale, but critically on whether training samples genuinely require visual reasoning. However, existing instruction datasets often contain a substantial portion of visually redundant samples (solvable from text alone), as well as multimodall... | cs.CV cs.AI | 2026-06-26T00:00:00 | Mingkang Dong, Hongyi Cai, Jie Li, Sifan Zhou, Bin Ren, Kunyu Peng, Yuqian Fu | no_new_dataset | false | 0.769431 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2603.01461 | UltraStar: Semantic-Aware Star Graph Modeling for Echocardiography Navigation | Echocardiography is critical for diagnosing cardiovascular diseases, yet the shortage of skilled sonographers hinders timely patient care, due to high operational difficulties. Consequently, research on automated probe navigation has significant clinical potential. To achieve robust navigation, it is essential to lever... | cs.CV | 2026-06-26T00:00:00 | Teng Wang, Haojun Jiang, Chenxi Li, Diwen Wang, Yihang Tang, Zhenguo Sun, Yujiao Deng, Shiji Song, Gao Huang | no_new_dataset | false | 0.725539 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2603.24991 | Towards Video Anomaly Detection from Event Streams: A Baseline and Benchmark Datasets | Event-based vision, characterized by low redundancy, focus on dynamic motion, and inherent privacy-preserving properties, naturally fits the demands of video anomaly detection (VAD). However, the absence of dedicated event-stream anomaly detection datasets and effective modeling strategies has significantly hindered pr... | cs.CV | 2026-06-26T00:00:00 | Peng Wu, Yuting Yan, Guansong Pang, Yujia Sun, Qingsen Yan, Peng Wang, Yanning Zhang | new_dataset | true | 0.963291 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2604.05920 | Reference Energies for Non-Relativistic Core Ionization Potentials | Deep-lying core electrons carry highly localized, site-specific information that forms the basis of X-ray photoelectron spectroscopy. Accurately predicting their associated core ionization potentials (IPs) is a demanding theoretical task, requiring a balanced treatment of strong orbital relaxation, electron correlation... | physics.chem-ph cond-mat.mtrl-sci nucl-th | 2026-06-26T00:00:00 | Antoine Marie and Loris Burth and Pierre-François Loos | new_dataset | true | 0.95753 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2604.08448 | AfriVoices-KE: A Multilingual Speech Dataset for Kenyan Languages | AfriVoices-KE is a large-scale multilingual speech dataset comprising approximately 3,000 hours of audio across five Kenyan languages: Dholuo, Kikuyu, Kalenjin, Maasai, and Somali. The dataset includes 750 hours of scripted speech and 2,250 hours of spontaneous speech, collected from 4,777 native speakers across divers... | cs.CL | 2026-06-26T00:00:00 | Lilian Wanzare, Cynthia Amol, Ezekiel Maina, Nelson Odhiambo, Hope Kerubo, Leila Misula, Vivian Oloo, Rennish Mboya, Edwin Onkoba, Edward Ombui, Joseph Muguro, Ciira wa Maina, Andrew Kipkebut, Alfred Omondi Otom, Ian Ndung'u Kang'ethe, Angela Wambui Kanyi, Brian Gichana Omwenga | new_dataset | true | 0.969665 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2604.17420 | TransXion: A High-Fidelity Graph Benchmark for Realistic Anti-Money Laundering | Money laundering poses severe risks to global financial systems, driving the widespread adoption of machine learning for transaction monitoring. However, progress remains stifled by the lack of realistic benchmarks. Existing transaction-graph datasets suffer from two pervasive limitations: (i) they provide sparse node-... | cs.LG cs.AI cs.SI | 2026-06-26T00:00:00 | Keyang Chen, Mingxuan Jiang, Yongsheng Zhao, Zeping Li, Zaiyuan Chen, Weiqi Luo, Zhixin Li, Sen Liu, Yinan Jing, Guangnan Ye, Xihong Wu, Hongfeng Chai | new_dataset | true | 0.969492 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2605.03680 | Real Image Denoising with Knowledge Distillation for High-Performance Mobile NPUs | While deep-learning-based image restoration has achieved unprecedented fidelity, deployment on mobile Neural Processing Units (NPUs) remains bottlenecked by operator incompatibility and memory-access overhead. We propose an NPU-aware hardware-algorithm co-design approach for real-world image denoising on mobile NPUs. O... | cs.CV cs.LG | 2026-06-26T00:00:00 | Faraz Kayani, Sarmad Kayani, Asad Ahmed, Radu Timofte, and Dmitry Ignatov | new_dataset | true | 0.945709 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2605.13693 | StayStill: a large-scale 3D idle animation dataset | Idle animations are essential for virtual characters, as they convey realistic behaviour during inactive states. While automatic animation generation has been widely studied, limited attention has been given to idle motion due to the absence of dedicated training datasets. We introduce StayStill, a large-scale dataset ... | cs.GR | 2026-06-26T00:00:00 | Eneko Atxa Landa, Igor Rodriguez, Elena Lazkano, Taras Kucherenko | new_dataset | true | 0.973324 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2605.24417 | LLMTabBench: Evaluating LLMs on Binary Tabular Classification From Zero to Few Shots | Supervised classification on tabular data remains a central machine learning task, but its dependence on large labeled datasets limits its applicability in data-scarce settings. Few-shot methods such as TabPFN achieve strong performance through large-scale synthetic pretraining, yet still require labeled context exampl... | cs.LG | 2026-06-26T00:00:00 | Daria Grushina, Kseniia Kuvshinova, Alina Kostromina, Aziz Temirkhanov, Mile Mitrovic, Dmitry Simakov | new_dataset | true | 0.965517 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2605.24696 | CALIBURN: Operationally Calibrated Streaming Intrusion Detection with Regime-Dependent Conformal Risk Control | Streaming intrusion detection systems must process flows continuously under bounded memory, yet most leave alerting-threshold selection as a post-hoc tuning problem incompatible with production, where operators commit in advance to alert budgets, misclassification costs, and Service Level Objectives. We present CALIBUR... | cs.CR cs.LG | 2026-06-26T00:00:00 | Michel A. Youssef | no_new_dataset | false | 0.868595 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.00827 | Beyond Independent Manipulation: Individual Fairness-aware Strategic Classification with Peer Imitation | Strategic classification (SC) investigates scenarios where agents manipulate their features to obtain favorable decisions from predictive models. Existing fairness-aware SC approaches primarily focus on group fairness and typically assume that agents respond independently. However, when individual fairness is required,... | cs.LG cs.AI | 2026-06-26T00:00:00 | Xinpeng Lv, Chunyuan Zheng, Yunxin Mao, Renzhe Xu, Jinxuan Yang, Yuanlong Chen, Wangrong Huang, Shaowu Yang, Wenjing Yang, Xinwang Liu, Peng Cui, Haotian Wang | no_new_dataset | false | 0.956471 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.03549 | How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration | Hyperparameter optimization (HPO) for Random Forest faces a specific difficulty in tuning the number of trees: the predictive score typically improves monotonically with ensemble size, so standard methods such as Tree-structured Parzen Estimator (TPE) and Hyperband require a predefined search range and often drive the ... | cs.LG math.PR | 2026-06-26T00:00:00 | Vadim Porvatov, Andrey Dukhovny, Andrey Lange | no_new_dataset | false | 0.963651 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.06408 | MODIS Thermal Infrared Sounding (MOTIS): Estimating Tropical Cyclone Central Pressure from Warm-Core Anomalies | This study presents a novel framework for estimating the central sea-level pressure ($P_\mathrm{c}$) of tropical cyclones (TCs) using infrared radiometers. We leverage the long-overlooked combination of high spatial resolution and sounding capability of the Moderate Resolution Imaging Spectroradiometer (MODIS) to measu... | physics.ao-ph | 2026-06-26T00:00:00 | Jinghuai Yao, Chi Yan Kwok, Puyuan Du, Yubo Wang, and Derrick Herndon | new_dataset | true | 0.958641 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.12716 | Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review | The integration of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) into scientific peer-review workflows introduces novel and significant risks for adversarial manipulation, especially given the multimodal nature of scientific papers where figures, not just text, convey core evidence. This creates a significan... | cs.CL | 2026-06-26T00:00:00 | Xinyu Zhao, Rana Muhammad Shahroz Khan, Zhen Xu, Zhen Tan, Tianlong Chen | new_dataset | true | 0.976679 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.13042 | Augmentation techniques for video surveillance in the visible and thermal spectral range | In intelligent video surveillance, cameras record image sequences during day and night. Commonly, this demands different sensors. To achieve a better performance it is not unusual to combine them. We focus on the case that a long-wave infrared camera records continuously and in addition to this, another camera records ... | cs.AI cs.CV | 2026-06-26T00:00:00 | Vanessa Buhrmester, Ann-Kristin Grosselfinger, David Munch, and Michael Arens | no_new_dataset | false | 0.876493 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.14668 | When to Write and When to Suppress: Route-Specialized Dual Adapters for Memory-Assisted Knowledge Editing | Knowledge editing systems must update selected facts while preserving nearby but irrelevant behavior. This paper studies this problem in a memory-assisted setting where an edit memory is retrieved at inference time and a parameter-efficient adapter corrects the model's object preference. We argue that the central desig... | cs.LG | 2026-06-26T00:00:00 | Baijia Zhang, Yining Huang | no_new_dataset | false | 0.961054 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.16325 | Attention-Based Prototype Calibration for Multi-Rater Few-Shot Medical Image Segmentation | Few-shot medical image segmentation methods typically assume a single ground-truth annotation, overlooking systematic variability across expert raters commonly observed in clinical datasets. We propose an attention-based prototype calibration framework for few-shot multi-rater segmentation that models rater-specific de... | cs.CV | 2026-06-26T00:00:00 | Truong Vu, Minh Khoi Ho, Yutong Xie | no_new_dataset | false | 0.959122 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.21097 | GRAG: Generic Response-Augmented Generation Framework for Personalized Conversational Systems | Deploying highly capable personalized conversational agents in resource-constrained or privacy-sensitive environments remains a significant challenge. We identify a fundamental bottleneck in the existing approaches: current training paradigms treat personalization and grounding as a single monolithic learning problem. ... | cs.CL cs.LG | 2026-06-26T00:00:00 | Junfeng Liu, Christopher T. Symons, Ranga Raju Vatsavai | no_new_dataset | false | 0.954078 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.21649 | EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory | Existing embedding models are inherently static: they encode text segments in isolation, ignoring their surrounding context and temporal order. This paper introduces EvoEmbedding, a novel embedding model that generates evolvable representations for retrieval. It is tailored for long-context scenarios, where information... | cs.CL | 2026-06-26T00:00:00 | Chang Nie, Chaoyou Fu, Junlan Feng, Caifeng Shan | new_dataset | true | 0.970956 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.22076 | Learning Cross-View Semantic Priors for Single-Reference Unseen Object Pose Estimation | Single-reference unseen object 6D pose estimation reduces object onboarding by estimating poses of arbitrary novel objects from only one reference view. Recent correspondence-based pipelines have achieved robust performance with vision foundation model (VFM) features. However, they typically treat these features as int... | cs.CV | 2026-06-26T00:00:00 | Jiahong Chen, Jinghao Wang, Ziwen Wang, Zi Wang, Banglei Guan and Qifeng Yu | no_new_dataset | false | 0.944891 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.22537 | NegAS: Negative Label Guided Attention and Scoring for Out-of-Distribution Object Detection with Vision-Language Models | Out-of-Distribution (OOD) detection is essential for ensuring the robustness and reliability of object detection systems deployed in safety-critical applications. While prior research has mainly focused on uni-modal detectors or vision-language model (VLM) based classifiers, the potential of VLM-based object detectors ... | cs.CV | 2026-06-26T00:00:00 | Yingjie Zhang, Shuai Li, Peng Wang | no_new_dataset | false | 0.952479 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.24890 | Small edits, large models: How Wikipedia advocacy shapes LLM values | Can a small group of volunteers shape how AI systems discuss animal welfare, just by editing Wikipedia? We show that they can. Wikipedia appears in nearly every major language model training dataset and is weighted more heavily than web-crawled text. The Pro-Animal Wikipedians (PAW), a group of advocates who add source... | cs.CL cs.AI cs.CY | 2026-06-26T00:00:00 | Jasmine Brazilek, Maria Navas, Alexa Gnauck | no_new_dataset | false | 0.891778 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.25006 | Scalable Peptide Design via Memory-Efficient Equivariant Transformer | Target-specific peptide design requires sequence and structure co-design under full atom geometric constraints. Latent generative frameworks offer an effective route for this problem by compressing fine grained atomic structures into block level latent representations and performing conditional generation in a compact ... | cs.LG | 2026-06-26T00:00:00 | Rui Jiao, Xiangzhe Kong, Yinjun Jia, Yijia Zhang, Ziyi Yang, Yang Liu and Jianzhu Ma | no_new_dataset | false | 0.945915 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.25832 | MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources | Achieving strong optimization generalization across diverse optimization problems while requiring limited training resources remains a challenging problem for optimization-oriented large language models (LLMs). Existing approaches typically rely on large-scale supervised datasets, costly reasoning annotations, and expe... | cs.LG cs.AI | 2026-06-26T00:00:00 | Ke Zhao, Zixiang Di, Hong Qian, Xiang Shu, Yaolin Wen, Qitao Shi, Bingdong Li, Xingyu Lu, Xiangfeng Wang, Jun Zhou, Ke Tang, Yang Yu | no_new_dataset | false | 0.952707 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.25996 | Autodata: An agentic data scientist to create high quality synthetic data | We introduce Autodata, a general method that enables AI agents to act as data scientists who build high quality training and evaluation data. We show how to train (meta-optimize) such a data scientist agent, so that it learns to create even stronger data. We describe the overall formulation, and a specific practical im... | cs.AI cs.CL cs.LG | 2026-06-26T00:00:00 | Ilia Kulikov, Chenxi Whitehouse, Tianhao Wu, Yixin Nie, Swarnadeep Saha, Eryk Helenowski, Weizhe Yuan, Olga Golovneva, Jack Lanchantin, Yoram Bachrach, Jakob Foerster, Xian Li, Han Fang, Sainbayar Sukhbaatar, Jason Weston | no_new_dataset | false | 0.905912 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26099 | Benchmarking Open-Weight Foundation Models for Global AI Technical Governance | Large language models (LLMs) are increasingly deployed in artificial intelligence (AI) governance analysis across national and international organisations. There is, however, growing evidence that such models produce significantly less accurate responses for countries that are underrepresented in their training data-a ... | cs.CY cs.AI | 2026-06-26T00:00:00 | Jason Hung | new_dataset | true | 0.955401 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26101 | Know2Guess: A Contamination-Aware Multi-Zone Benchmark for Knowledge-Boundary Evaluation in Large Language Models | Reliable evaluation of large language models should separate supported answering from unsupported guessing without conflating either with data contamination, prompt idiosyncrasy, or generic refusal behavior. We present a contamination-aware, multi-zone benchmark for measuring the transition from answerable knowledge to... | cs.CL cs.AI | 2026-06-26T00:00:00 | Renwei Meng, Bowen Zhang, Jian Wang, Xican Wang, Haoyi Wu, Xuanyan Qiu, and Shengan Yang | new_dataset | true | 0.970531 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26102 | Helpfulness Hurts: Domain-Dependent Degradation of Mid-Trained Compassion Values Under Post-Training | Standard post-training pipelines apply supervised fine-tuning (SFT) and reinforcement learning (RL) to make language models helpful, but these processes may inadvertently degrade values instilled during pre-training. We investigate whether the domain of post-training data differentially affects the retention of animal ... | cs.CL cs.AI cs.CY | 2026-06-26T00:00:00 | Jasmine Brazilek, Juliana Seawell | no_new_dataset | false | 0.805479 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26103 | Investigating LLM's Problem Solving Capability -- a Study on Statics Questions | Large Language Models (LLMs) have rapidly influenced many aspects of society, particularly education, due to their demonstrated ability to complete assignments and examinations across a wide range of subjects. Although prior studies have examined the educational impact of LLMs, much of the existing work relies on publi... | cs.CL cs.AI | 2026-06-26T00:00:00 | Tanner Culleton and Hung-Fu Chang | new_dataset | true | 0.95255 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26107 | Low Resource Multimodal Translation of Nepali Spoken Words into Emotion-Conditioned Sign Language Avatars | Sign language communication systems that integrate emotional expression remain underexplored, particularly for low-resource languages. This pilot study presents NEST-V1 (Nepali Emotion and Speech Transformer - Version 1), a proof-of-concept multimodal framework that demonstrates the feasibility of generating emotion-c... | cs.CL cs.AI | 2026-06-26T00:00:00 | Jatin Bhusal and Salma Tamang | no_new_dataset | false | 0.829837 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26108 | Where Larger Models Excel: The Primacy of Constraint-Guided Reasoning | Larger language models consistently outperform smaller ones on reasoning benchmarks, yet the reasoning differences underlying this gap remain underexplored. Across benchmarks in mathematics, physics, chemistry, and programming, we observe stable performance gaps: averaged over datasets, Qwen3-32B outperforms Qwen3-8B b... | cs.CL | 2026-06-26T00:00:00 | Guan-Yi Lin, Hen-Hsen Huang | no_new_dataset | false | 0.944968 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26130 | Thinking Like a Scientist? A Structural Study of LLM-Generated Research Methods | Large Language Models (LLMs) are increasingly used to guide research methodology, yet their default methodological tendencies under minimal prompting remain unclear. Here, we prompt GPT-5.1, Gemini 3 Pro, and DeepSeek-V3.2 with an LLM-extracted research question from each of 1,000 recent arXiv computer-science papers a... | cs.CL cs.AI cs.DL | 2026-06-26T00:00:00 | Francesca Carlon, Brecht Verbeken, Vincent Ginis, Andres Algaba | no_new_dataset | false | 0.817338 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26151 | Unsupervised Memory-Enhanced Video Transformers: Obstacle Detection for Autonomous Agricultural Rover | While autonomous rovers have become indispensable to precision farming, achieving consistent operational safety remains a critical challenge. Conventional safety sensors, such as LiDAR, fail to detect obstacles positioned below the plant canopy, posing a significant risk. While camera-based supervised learning methods ... | cs.RO cs.AI | 2026-06-26T00:00:00 | Th\'eo Biardeau (XLIM-ASALI, UFR SFA (Poitiers)), Anne-Sophie Capelle-Laiz\'e (UP, XLIM-ASALI, XLIM-ASALI), Salwan Alwan, David Helbert (UFR SFA (Poitiers), XLIM-ASALI, LabCom I3M (Poitiers)) | no_new_dataset | false | 0.890574 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26165 | Predicting Fruit Quality with a Hybrid Machine Learning and Image Processing Approach | Fruit spoilage is a significant issue in agriculture, leading to substantial economic losses. Addressing this, our study introduces a hybrid approach combining image processing and deep learning to assess fruit freshness. We developed an image processing algorithm that quantifies spoilage on a scale from 0 (fully fresh... | cs.CV | 2026-06-26T00:00:00 | Amir Reza Hashemi, Shahram Amiri | no_new_dataset | false | 0.944468 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26168 | Implementation of reinforcement learning in chemical reaction networks: application to phototaxis as curiosity-driven exploration | Living systems navigate environments using noisy and incomplete sensory signals. In unicellular algae, phototaxis is often modeled as a mechanistic run--tumble process driven by stimulus--response rules. However, such descriptions overlook how organisms actively sample their environment to reduce sensory ambiguity. Fro... | cs.LG q-bio.QM | 2026-06-26T00:00:00 | Ruyi Tang, Gr\'egoire Sergeant-Perthuis (LCQB-AG), David Colliaux | no_new_dataset | false | 0.94643 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26169 | Neural Architecture Search for Generative Adversarial Networks: A Comprehensive Review and Critical Analysis | Neural Architecture Search (NAS) has emerged as a pivotal technique in optimizing the design of Generative Adversarial Networks (GANs), automating the search for effective architectures while addressing the challenges inherent in manual design. This paper provides a comprehensive review of NAS methods applied to GANs, ... | cs.LG cs.AI | 2026-06-26T00:00:00 | Abrar Alotaibi, Moataz Ahmed | no_new_dataset | false | 0.976102 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26171 | LCG: Long-Context Consistent Image Generation with Sparse Relational Attention | Recent image generation models achieve impressive quality in single-image synthesis, but often fail to maintain consistency across sequential outputs, as required in comics, storyboards, and visual narratives. We propose Long-Context Generation (LCG), a framework for long-context multi-image text-to-image generation, t... | cs.CV cs.AI | 2026-06-26T00:00:00 | Zihao Wang, Yijia Xu, Haoze Zheng, Xuran Ma, Haokun Gui, and Harry Yang | new_dataset | true | 0.977465 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26176 | Toward Mitigating Process-Induced Performance Degradation in 3.5D Heterogeneous Packages via Pre-Silicon Firmware Co-Optimization | This paper presents a pre-silicon analysis of XRM-SSD V24/V7.0, a physics-aware predictive firmware scheduling layer for Intel's 3.5D heterogeneous integrated packages (Foveros Direct 3D + PowerVia + EMIB-T + UCIe + HBM5). Using detailed thermal-electrical co-simulation over a 90,000-step LLM inference dataset, we show... | cs.AR | 2026-06-26T00:00:00 | Chi Fei Chung (Dollarchip Technology Inc.), Nikolai Nedovodin (STARGA Inc.) | no_new_dataset | false | 0.805611 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26179 | KG-TRACE: A Neuro-Symbolic Framework for Mechanistic Grounding in Antimicrobial Resistance Prediction | While WGS-based AMR prediction has reached high accuracy, existing models lack a mechanism to ground neural attributions in established biological pathways. We present KG-TRACE, a novel neuro-symbolic framework that integrates the WHO mutation knowledge graph (KG) as a structured biological constraint on a neural genom... | cs.LG cs.AI q-bio.QM | 2026-06-26T00:00:00 | Naman Garg, Sarika Jain, Sourav Yadav, Bharat K. Bhargava, Ghanapriya Singh, Abhishek Srivastava, Parimal Kar | no_new_dataset | false | 0.739974 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26192 | Federated Hash Projected Latent Factor Learning | Hash Learning (HL) is an efficient representation learning approach that maps real-valued data into compact binary representations. Traditional HL methods typically require users to upload personal data to a central server, which is incompatible with increasingly stringent data security regulations. Federated Learning ... | cs.LG cs.CR | 2026-06-26T00:00:00 | Jialan He | no_new_dataset | false | 0.969032 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26195 | Soroll-IA: A Weakly Labeled Audio Dataset for Real-World Industrial Port Monitoring | Soroll-IA is a weakly labeled environmental audio dataset recorded in a real-world industrial port environment in Valencia (Spain) using two fixed sensing nodes. The dataset comprises approximately 22 hours of audio segmented into 7,396 clips and covers 26 sound event classes representative of industrial port acoustic ... | cs.SD | 2026-06-26T00:00:00 | Javier Naranjo-Alcazar, Jordi Grau-Haro, Ruben Ribes-Serrano, Marta Garcia-Ballesteros, Pedro Zuccarello | new_dataset | true | 0.969195 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26201 | OmniContact: Chaining Meta-Skills via Contact Flow for Generalizable Humanoid Loco-Manipulation | Learning long-horizon humanoid loco-manipulation poses a dual challenge: it requires not only the robust execution of meta-skills but also their seamless, closed-loop chaining equipped with autonomous recovery. Existing approaches remain limited: explicit humanoid-object interaction representations offer precision but ... | cs.RO | 2026-06-26T00:00:00 | Runyi Yu, Xiaoyi Lin, Ji Ma, Yinhuai Wang, Koukou Luo, Jiahao Ji, Huayi Wang, Wenjia Wang, Runhan Zhang, Ping Tan, Ting Wu, Ruoli Dai, Qifeng Chen, and Lei Han | new_dataset | true | 0.968443 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26204 | Topology-Informed Neural Networks for Flood Detection in Optical and Synthetic Aperture Radar Imagery | Floods frequently impact regions around the world. Rapid and accurate flood detection is crucial for emergency response and timely mitigation of human and economic loss. The expanding availability of satellite data and advances in artificial intelligence have enhanced monitoring of environmental hazards, but many flood... | cs.LG | 2026-06-26T00:00:00 | Sophia Li, Max Zhao, Raghu G. Raj, Tianyu Chen | new_dataset | true | 0.951033 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26207 | The Role of Input Dimensionality in the Emergence and Targeted Control of Adversarial Examples | Several theoretical works have tried to explain the adversarial vulnerability of deep neural networks through properties of high-dimensional geometry. However, the assumptions underlying these works are rarely examined empirically, and systematic evidence remains limited. In this work, we present a systematic study of ... | stat.ML cs.CR cs.LG | 2026-06-26T00:00:00 | Nasrin Malekzadeh Goradel, Niccolo Pancino, Yaser Gholizade Atani, Benedetta Tondi, Giovanni Bellettini, Mauro Barni | no_new_dataset | false | 0.955921 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26211 | Data Facts: A Metadata Schema for Structured Data Exchange in the NANDini Multi-Agent Ecosystem | NANDini (Networked Agents Natural Distillation of Interconnected Nodal Intelligence) envisions an automated ecosystem where intelligent agents independently create, process, and exchange data to drive decisions at scale. Realizing this vision requires infrastructure beyond agent discovery and communication: agents must... | cs.CR | 2026-06-26T00:00:00 | Jin Gao, Maria Gorskikh, Pradyumna Chari, Brittany Box, Mukul Kemla, Pratik Behera, Abhishek Mehta, Ramesh Raskar | new_dataset | true | 0.852629 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26257 | Dataset Usage Inference without Shadow Models or Held-out Data | How much of my data was used to train a machine learning model? Dataset Usage Inference (DUI) aims to answer this by estimating what fraction of a dataset contributed to a model's training. However, existing DUI methods rely on assumptions that rarely hold in practice: they require training expensive shadow models to i... | cs.LG | 2026-06-26T00:00:00 | Wojciech {\L}apacz, Stanis{\l}aw Pawlak, Jan Dubi\'nski, Franziska Boenisch, and Adam Dziedzic | no_new_dataset | false | 0.950233 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26260 | A multi-task spatiotemporal deep neural network for predicting penetration depth and morphology in laser welding | In laser penetration welding, the assessment of penetration state and weld seam morphology plays a crucial role in determining the weld quality. This paper presents a comprehensive introduction of the innovative multi-task deep learning model that has the capability to predict penetration state, depth, and weld seam mor... | cs.CV cs.AI | 2026-06-26T00:00:00 | Sen Li, Haichao Cui, Chendong Shao, Yaqi Wang, Xinhua Tang | new_dataset | true | 0.890045 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26285 | TEMPO-Diffusion: Temporally Exposed Malicious Poisoning of Diffusion Models | Noise-based backdoor attacks on diffusion models typically rely on input-time trigger injection, untargeted activation, and out-of-distribution target generation. Such assumptions reduce both the stealthiness and the practical relevance of these attacks. In this work, we present TEMPO-Diffusion, a targeted backdoor fra... | cs.CR cs.AI | 2026-06-26T00:00:00 | William Aiken, Paula Branco, Guy-Vincent Jourdan, Iosif-Viorel Onut | new_dataset | true | 0.969982 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26289 | Augmentation with Dilution: A Large-Scale Empirical Study of Human Contributor Ecosystems After AI Coding Agent Adoption | AI coding agents are penetrating open-source software development at an unprecedented pace, yet existing research predominantly treats human contributors as a static backdrop rather than as the subject of inquiry. This paper presents the first large-scale empirical study that takes the human contributor ecosystem as it... | cs.SE | 2026-06-26T00:00:00 | Weixing Zhang, Bowen Jiang, Anne Koziolek | no_new_dataset | false | 0.572945 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26294 | The Red Queen G\"odel Machine: Co-Evolving Agents and Their Evaluators | Self-improving agents are state-of-the-art (SOTA) on agentic coding benchmarks and have recently been extended to general domains. However, their search methods generally assume a stationary evaluation criterion: a fixed verifier, benchmark, or labeled dataset that remains valid as the agent improves. This ignores a ce... | cs.LG cs.AI cs.MA cs.NE | 2026-06-26T00:00:00 | Alex Iacob, Andrej Jovanovi\'c, William F. Shen, Daniel Burkhardt, Meghdad Kurmanji, Nurbek Tastan, Lorenzo Sani, Niccol\`o Alberto Elia Venanzi, Ambroise Odonnat, Zeyu Cao, Bill Marino, Xinchi Qiu, and Nicholas D. Lane | no_new_dataset | false | 0.928897 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26295 | Beyond Aesthetics: Quantifying Information Loss in Turbid Scenes | Visibility in underwater environments degrades rapidly under turbid conditions, yet the effects on computer-vision models remain unclear. This issue is compounded by reliance on synthetic turbidity datasets, which may misrepresent real-world information loss. To address this gap, we introduce the Turbid Underwater Base... | cs.CV | 2026-06-26T00:00:00 | Vasiliki Ismiroglou, Stefan H. Bengtson, Tasos Benos, Thomas B. Moeslund, Malte Pedersen | new_dataset | true | 0.968034 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26312 | Tailor Made Embeddings for Quantum Machine Learning | Autoencoders transformed classical machine learning by solving the curse of dimensionality, enabling principled weight initialization and learning compact, structured representations. In this work, we extend this paradigm to quantum machine learning by introducing a variational autoencoder framework that learns task-sp... | quant-ph cs.CV cs.LG | 2026-06-26T00:00:00 | Aldo Lamarre and Dominik \v{S}afr\'anek | no_new_dataset | false | 0.951024 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26317 | Parametric Generalized Adaptive Moment Features (PG-AMF) for Bearing Fault Diagnosis and Machine Health Monitoring | Accurate fault diagnosis of rolling element bearings in rotating machinery is considered essential for ensuring industrial safety and enabling predictive maintenance. Conventional statistical feature-based methods rely on predefined descriptors, whose diagnostic sensitivity is constrained by fixed configurations and li... | eess.SP cs.AI | 2026-06-26T00:00:00 | Rajeev Kumar | no_new_dataset | false | 0.948911 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26337 | EMA-FS: Accelerating GBDT Training via Gain-Informed Feature Screening | Gradient Boosted Decision Trees (GBDT), exemplified by LightGBM, spend a dominant fraction of training time -- typically 65-70% -- constructing per-feature histograms. Existing approaches such as random feature subsampling (feature_fraction) discard features without regard for their predictive utility. We propose EMA-b... | cs.LG | 2026-06-26T00:00:00 | Yan Song | no_new_dataset | false | 0.954801 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26347 | Health feature extraction from battery energy storage system field fault data | Health monitoring methods are critical for lithium-ion battery modules connected to the grid to prevent faults that can lead to catastrophic events. However, assessing the health of cells in modules from their operational data presents challenges including variable operating conditions, which directly confound health f... | eess.SY cs.SY | 2026-06-26T00:00:00 | Clement Wong, Andrew Weng, Xin Hui Ooi, Zhiwen Wan, Jeesoon Choi, Seung Yoon Yang, Heejun Jin, Jason Siegel, Anna Stefanopoulou | new_dataset | true | 0.612508 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26363 | Bayesian Changepoint Detection for Smart Sensing of Battery Degradation: Cycle-Level Health Indicators and PyMC Implementation | Reliable detection of the onset of accelerated degradation is central to safe and cost-efficient operation of lithium-ion batteries. This paper presents a Bayesian single-changepoint model applied to a simple but physically meaningful cycle-level health indicator (HI), defined as the ratio of charge time to discharge t... | eess.SY cs.SY | 2026-06-26T00:00:00 | Waldemar Bauer, Anna Jarosz-Kozyro and Jerzy Baranowski | no_new_dataset | false | 0.938239 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26377 | Verifying Intent and Harm: A Unified Defense Against LLM-Generated Threats | Large language models (LLMs) are increasingly deployed in interactive applications, yet they remain vulnerable to adversarial interactions that induce harmful, deceptive, or policy-violating outputs. Existing defenses typically analyze either user prompts or generated outputs, but not both. However, many real-world att... | cs.CR | 2026-06-26T00:00:00 | Poojitha Thota, Yun Lei, Santhosh Thangaraj, Siddhartha Reddy Jonnalagadda, Shirin Nilizadeh | no_new_dataset | false | 0.940374 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26379 | Layer-Specific Prompt Fusion Discovery via Differentiable Search in Vision Foundation Models | Visual prompt tuning has emerged as a parameter-efficient fine-tuning approach for adapting large-scale Vision Transformers (ViTs) to downstream tasks. As its learnable prompts are applied in input and feature spaces, prior to jointly going through attention in transformer layers, the most commonly used scheme for fusi... | cs.CV | 2026-06-26T00:00:00 | Xi Xiao, Xingjian Li, Yunbei Zhang, Cheng Han, Tianming Liu, Tianyang Wang, Runmin Jiang, Jihun Hamm, Xiao Wang, Min Xu | no_new_dataset | false | 0.962961 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26382 | Charting the Growth of Social-Physical HRI (spHRI): A Systematic Review Pipeline Augmented by Small Language Models | Social-physical human-robot interaction (spHRI) has grown rapidly across robotics, human-computer interaction, human-robot interaction, and haptics. Yet, fragmented terminology and inconsistent methodologies make systematic synthesis difficult. To support scalable review practices, we evaluated the extent to which smal... | cs.CL cs.AI cs.DL cs.HC cs.RO | 2026-06-26T00:00:00 | Mayumi Mohan, Ju-Hung Chen, and Alexis E. Block | no_new_dataset | false | 0.926668 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26398 | DinoLink: A Token-Centric Representation Compression Framework for Bandwidth-Constrained Collaborative V2X Perception | High-precision remote perception is often hindered by the severe bandwidth constraints of Vehicle-to-Everything (V2X) networks. We propose \textit{DinoLink}, a token-centric compression framework that replaces raw pixel streaming with discrete semantic communication for vehicle-cloud collaborative inference. DinoLink e... | cs.CV | 2026-06-26T00:00:00 | Tianle Zhu, Haohua Que, Handong Yao, Hongyi Xu and Zhipeng Bao | no_new_dataset | false | 0.952268 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26416 | Methane-Plume Segmentation From Hyperspectral Satellite Imagery Via Multimodal Deep Learning | Efficient detection of methane plumes is crucial for understanding and mitigating global warming, as accurately identifying and segmenting them in earth observation imagery remain essential for large-scale monitoring. In this work, we propose a multimodal deep learning model that integrates a feature-guided methane enh... | cs.CV | 2026-06-26T00:00:00 | Brayan Quintero, Jeferson Acevedo, Samuel Traslavi\~na, Hoover Rueda-Chac\'on | no_new_dataset | false | 0.954043 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26422 | Estimating Uncertainty in Classifier Performance with Applications to Large Language Models and Nested Data | Researchers increasingly use text classification--supervised models or large language models--to measure constructs from natural language, providing metrics such as recall and precision as evidence of their validity. Yet, though these metrics are point estimates subject to sampling variation, measures of uncertainty ar... | cs.AI | 2026-06-26T00:00:00 | Kylie Anglin | no_new_dataset | false | 0.951272 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26432 | Embedding Foundation Model Predictions in Discrete-Choice Models with Structural Guarantees | Tabular foundation models achieve strong accuracy on choice prediction tasks, but their predictions often violate the economic logic those tasks require: raising a price can increase predicted demand, implied willingness-to-pay estimates are frequently negative or implausible, and unavailable alternatives receive nonze... | cs.LG econ.EM | 2026-06-26T00:00:00 | Yingshuo Wang, Xian Sun, Yanhang Li, Zhichao Fan, Zexin Zhuang | no_new_dataset | false | 0.94731 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26442 | AXLE: A Cloud Infrastructure for Lean 4 Theorem Proving Utilities | We present AXLE (Axiom Lean Engine), a cloud service for Lean 4 proof manipulation, extraction, and verification. Recent progress in AI for mathematics -- reinforcement learning pipelines, agentic proving workflows, dataset curation -- demands Lean 4 tooling that scales to millions of requests while remaining correct a... | cs.LO cs.AI | 2026-06-26T00:00:00 | Jimmy Xin, Alex Schneidman, Chris Cummins, Karun Ram, Srihari Ganesh, Jannis Limperg | no_new_dataset | false | 0.887967 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26443 | WatchAct: A Benchmark for Behavior-Grounded Robot Manipulation | A robot working alongside people must reason about what they have done, in what order, and with what intent. Video carries the spatial layouts, object histories, and gestures that language leaves underspecified, yet today's manipulation benchmarks pair an instruction with a single current image, offering no way to eval... | cs.RO cs.AI cs.CV | 2026-06-26T00:00:00 | Baiqi Li, Ce Zhang, Yu Fang, Yue Yang, Shangzhe Li, Mingyu Ding, Gedas Bertasius | new_dataset | true | 0.972998 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26451 | Listening Like a Judge: A Music-Aware Framework for Automatic Singing Performance Evaluation | Automatic singing quality assessment (SQA) requires evaluating lyrical correctness and musical fidelity while handling expressive variations. However, existing systems largely rely on either acoustic cues or lyric transcriptions exclusively, limiting holistic performance evaluation. Furthermore, their integration is no... | cs.SD cs.LG | 2026-06-26T00:00:00 | Neelam Saini, Sourav Ghosh | no_new_dataset | false | 0.912851 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26452 | AnySimLite: A Lightweight Few-Shot Similarity Encoder for On-Device Speech-Adjacent Classification | To minimize privacy concerns and inference latency on edge devices like smartphones, lightweight on-device models remain important for end-user applications. Many of these applications involve natural language classification, but deploying multiple specialized models creates a memory footprint challenge. We investigate... | cs.CL cs.SD | 2026-06-26T00:00:00 | Sourav Ghosh, Yash Bhatia, Keshav Goyal, Sahil Singh Bagri, Mohamed Akram Ulla Shariff, Saravana Balaji Shanmugam | no_new_dataset | false | 0.937087 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26455 | Active Adversarial Perturbation-driven Associative Memory Retrieval for RGB-Event Visual Object Tracking | RGB-Event tracking improves localization robustness by fusing RGB appearance textures and dense temporal motion cues from event sensors. While this multi-modal scheme broadens tracking applicability, real-world scenes suffer diverse structured signal degradations that hinder traditional multi-modal fusion. In harsh env... | cs.CV cs.AI cs.LG | 2026-06-26T00:00:00 | Xiao Wang, Xufeng Lou, Zikang Yan, Lan Chen, Sibao Chen, Yaowei Wang, Yonghong Tian, Jin Tang | no_new_dataset | false | 0.951387 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |
2606.26458 | MKG-RAG-Bench: Benchmarking Retrieval in Multimodal Knowledge Graph-Augmented Generation | Retrieval-augmented generation (RAG) over knowledge graphs has emerged as a promising approach for grounding large language models, yet existing benchmarks largely overlook the challenges of retrieval in multimodal knowledge graph RAG (MKG-RAG). In practice, retrieval is a critical bottleneck: multimodal knowledge is h... | cs.AI | 2026-06-26T00:00:00 | Xiaochen Wang, Bao Hoang, Han Liu, Ting Wang, Fenglong Ma | new_dataset | true | 0.971777 | 2026-06-30T02:08:35.223196 | librarian-bots/arxiv-new-datasets-modernbert-v4 | null | null |