Dataset: librarian-bots/arxiv-cs-papers-classified

Modalities: Text

Split: train (231k rows)

Columns (type, observed range as reported by the viewer):
id: large_string, lengths 9 to 16
title: large_string, lengths 5 to 245
abstract: large_string, lengths 83 to 4.03k
categories: large_string, lengths 5 to 108
update_date: timestamp[ms], 2007-05-23 00:00:00 to 2026-06-26 00:00:00
authors: large_string, lengths 5 to 24.7k
classification_label: large_string, 2 classes
is_new_dataset: bool, 2 classes
confidence_score: float64, 0.5 to 0.98
classification_date: large_string (date), 2026-06-30 02:08:35 to 2026-06-30 02:08:35
model_version: large_string, 1 class
embedding: large list
embedding_model: large_string, 0 classes

id	title	abstract	categories	update_date	authors	classification_label	is_new_dataset	confidence_score	classification_date	model_version	embedding	embedding_model
2310.11714	Consistent Distributed Ranking of Generative Models via Kernel Distances	Ranking generative models based on the fidelity and diversity of their outputs is required to identify the best generator in a group of candidate generative AI models. To rank a group of models in a conventional centralized setting, a standard score is commonly evaluated for each involved model. The selection and desig...	cs.LG	2026-06-26T00:00:00	Zixiao Wang, Farzan Farnia, Zhenghao Lin, Yunheng Shen, Bei Yu	no_new_dataset	false	0.965777	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2409.16395	HELIOT: LLM-Based CDSS for Adverse Drug Reaction Management	Medication errors significantly threaten patient safety, leading to adverse drug events and substantial economic burdens on healthcare systems. Clinical Decision Support Systems (CDSSs) aimed at mitigating these errors often face limitations when processing unstructured clinical data, including reliance on static datab...	cs.AI	2026-06-26T00:00:00	Gabriele De Vito, Filomena Ferrucci, Athanasios Angelakis	new_dataset	true	0.964089	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2412.09959	Towards Consistent and Efficient Dataset Distillation via Diffusion-Driven Selection	Dataset distillation provides an effective approach to reduce memory and computational costs by optimizing a compact dataset that achieves performance comparable to the full original. However, for large-scale datasets and complex deep networks (e.g., ImageNet-1K with ResNet-101), the vast optimization space hinders dis...	cs.CV	2026-06-26T00:00:00	Xinhao Zhong, Shuoyang Sun, Zhaoyang Xu, Xulin Gu, Bin Chen, Min Zhang, Yaowei Wang	no_new_dataset	false	0.961972	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2501.07526	Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization	Distributed-memory implementations of numerical optimization algorithm, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where communication is more expensive than computation, the scalability and performance of th...	cs.DC stat.ML	2026-06-26T00:00:00	Aditya Devarakonda, Ramakrishnan Kannan	no_new_dataset	false	0.961576	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2501.13955	Guided Persona-based AI Surveys: Can we replicate personal mobility preferences at scale using LLMs?	This study explores the potential of Large Language Models (LLMs) to generate artificial surveys, with a focus on personal mobility preferences in Germany. By leveraging LLMs for synthetic data creation, we aim to address the limitations of traditional survey methods, such as high costs, inefficiency and scalability ch...	cs.CL cs.AI cs.CY	2026-06-26T00:00:00	Ioannis Tzachristas, Santhanakrishnan Narayanan and Constantinos Antoniou	no_new_dataset	false	0.615637	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2501.19274	GO: The Great Outdoors Multimodal Dataset	The Great Outdoors (GO) dataset is a multi-modal annotated data resource aimed at advancing ground robotics research in unstructured environments. Existing off-road datasets often lack sensor diversity and exclude vital modalities like thermal and radar that are critical for operation in degraded conditions (e.g., low ...	cs.RO	2026-06-26T00:00:00	Peng Jiang, Kasi Viswanath, Akhil Nagariya, George Chustz, Maggie Wigness, Philip Osteen, Timothy Overbye, Christian Ellis, Long Quang, Jia Huang, Srikanth Saripalli	new_dataset	true	0.970379	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2502.06890	LLMs for Drug-Drug Interaction Prediction: A Comprehensive Comparison	The increasing volume of drug combinations in modern therapeutic regimens needs reliable methods for predicting drug-drug interactions (DDIs). While Large Language Models (LLMs) have revolutionized various domains, their potential in pharmaceutical research, particularly in DDI prediction, remains largely unexplored. T...	cs.LG cs.AI q-bio.QM	2026-06-26T00:00:00	Gabriele De Vito, Filomena Ferrucci, Athanasios Angelakis	no_new_dataset	false	0.950446	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2504.13432	Circular Quasiconformal Deturbulence: Geometry-Based Restoration from Multiple Turbulent Frames	Imaging through inhomogeneous media often results in severe distortions, posing significant challenges to downstream image-processing tasks. The lack of clean paired images makes supervised learning impractical, motivating unsupervised restoration approaches. In this work, we propose the Circular Quasi-Conformal Deturb...	cs.CV	2026-06-26T00:00:00	Chu Chen, Han Zhang, Lok Ming Lui	no_new_dataset	false	0.963591	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2505.20178	No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference	Prediction-Powered Inference (PPI) is a popular strategy for combining gold-standard and possibly noisy pseudo-labels to perform statistical estimation. Prior work has shown an asymptotic \enquote{free lunch} for PPI++, an adaptive form of PPI, showing that the \textit{asymptotic} variance of PPI++ is always less than ...	stat.ML cs.LG	2026-06-26T00:00:00	Pranav Mani, Peng Xu, Zachary C. Lipton, Michael Oberst	no_new_dataset	false	0.963184	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2507.03122	Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings	This study investigates the feasibility and performance of federated learning (FL) for multi-label ICD code classification using clinical notes from the MIMIC-IV dataset. Unlike previous approaches that rely on centralized training or fine-tuned large language models, we propose a lightweight and scalable pipeline comb...	cs.IR cs.CL cs.LG	2026-06-26T00:00:00	Binbin Xu, G\'erard Dray	no_new_dataset	false	0.96394	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2508.08005	Learning to Select Maximum Clique Algorithms: From Traditional Machine Learning to a Dual-Channel Hybrid Neural Architecture	The Maximum Clique Problem (MCP) is an NP-hard problem with wide-ranging applications in fields such as bioinformatics, network science, and social computing, yet no single algorithm consistently outperforms all others across diverse graph instances. This underscores the critical need for instance-aware algorithm selec...	cs.LG cs.AI	2026-06-26T00:00:00	Xiang Li, Shanshan Wang, Chenglong Xiao	new_dataset	true	0.966797	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2508.17916	EndoUFM: Utilizing Foundation Models for Monocular depth estimation of endoscopic images	Depth estimation is a foundational component for 3D reconstruction in minimally invasive endoscopic surgeries. However, existing monocular depth estimation techniques often exhibit limited performance to the varying illumination and complex textures of the surgical environment. While applying foundation models offers a...	cs.CV	2026-06-26T00:00:00	Xinning Yao, Bo Liu, Bojian Li, Jingjing Wang, Jinghua Yue, Fugen Zhou	no_new_dataset	false	0.968252	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2508.21221	Uncertainty-Aware Ankle Exoskeleton Control	Lower limb exoskeletons show promise to assist human movement, but their utility is limited by controllers designed for discrete, predefined actions in controlled environments, restricting their real-world applicability. We present an uncertainty-aware control framework that enables ankle exoskeletons to operate safely...	cs.RO	2026-06-26T00:00:00	Fatima Mumtaza Tourk, Bishoy Galoaa, Sanat Shajan, Aaron J. Young, Michael Everett, Max K. Shepherd	no_new_dataset	false	0.943242	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2509.09960	Limited Reference, Reliable Generation: A Two-Component Framework for Tabular Data Generation in Low-Data Regimes	Synthetic tabular data generation is increasingly essential in machine learning, supporting downstream applications when real-world, high-quality tabular data is insufficient. Existing tabular generation approaches, such as generative adversarial networks (GANs) and fine-tuned Large Language Models (LLMs), typically re...	cs.LG cs.AI	2026-06-26T00:00:00	Mingxuan Jiang, Keyang Chen, Yongxin Wang, Yongsheng Zhao, Ziyue Dai, Yicun Liu, Zeping Li, Qiuyang Zhang, Hongyi Nie, Hongbin Zhu, Sen Liu, Guangnan Ye, and Hongfeng Chai	no_new_dataset	false	0.95743	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2510.00586	Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors	Existing data poisoning attacks on retrieval-augmented generation (RAG) systems scale poorly because they require costly optimization of poisoned documents for each target phrase. We introduce Eyes-on-Me, a modular attack that decomposes an adversarial document into reusable Attention Attractors and **Focus Regions...	cs.LG cs.CL cs.CR	2026-06-26T00:00:00	Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang, Yun-Nung Chen	no_new_dataset	false	0.92902	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2510.02809	Relevance-Aware Thresholding in Online Conformal Prediction for Time Series	Uncertainty quantification has received considerable interest in recent works in Machine Learning. In particular, Conformal Prediction (CP) gains ground in this field. For the case of time series, Online Conformal Prediction (OCP) becomes an option to address the problem of data distribution shift over time. Indeed, th...	cs.LG cs.AI	2026-06-26T00:00:00	Th\'eo Dupuy and Binbin Xu and St\'ephane Perrey and Jacky Montmain and Abdelhak Imoussaten	no_new_dataset	false	0.970307	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2510.17459	Estimating Orbital Parameters of Direct Imaging Exoplanet Using Neural Network	In this work, we propose a flow-matching Markov chain Monte Carlo (FM-MCMC) algorithm for estimating the orbital parameters of exoplanetary systems, especially for those only one exoplanet is involved. Compared to traditional methods that rely on random sampling within the Bayesian framework, our approach first leverag...	astro-ph.EP astro-ph.GA cs.LG	2026-06-26T00:00:00	Bo Liang, Hanlin Song, Chang Liu, Tianyu Zhao, Yuxiang Xu, Zihao Xiao, Manjia Liang, Minghui Du, Wei-Liang Qian, Li-e Qiang, Peng Xu, Ziren Luo	no_new_dataset	false	0.966104	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2510.20769	CSU-PCAST: A Dual-Branch Transformer Framework for medium-range ensemble Precipitation Forecasting	Accurate medium-range precipitation forecasting is essential for hydrometeorological risk management but remains challenging for both numerical weather prediction (NWP) systems and data-driven models. We present CSU-PCAST, a deep learning-based ensemble forecasting framework for global precipitation prediction. The mod...	physics.ao-ph cs.LG	2026-06-26T00:00:00	Tianyi Xiong, Haonan Chen, Kelly Mahoney, Jingyin Tang, Tim Smith and Janice Bytheway	no_new_dataset	false	0.950369	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2511.18254	UniFlow: Zero-Shot LiDAR Scene Flow for Autonomous Vehicles	LiDAR scene flow is the task of estimating per-point 3D motion between consecutive point clouds. Recent methods achieve centimeter-level accuracy on popular autonomous vehicle (AV) datasets, but are typically only trained and evaluated on a single sensor. In this paper, we aim to learn general motion priors that transf...	cs.CV	2026-06-26T00:00:00	Siyi Li, Qingwen Zhang, Ishan Khatri, Kyle Vedder, Eric Eaton, Deva Ramanan, Neehar Peri	no_new_dataset	false	0.833481	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2512.02652	Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training	Existing methods for expressive music performance rendering, a conditional generation task that aims to generate a human-like performance from a symbolic score, rely on supervised learning over small labeled datasets, which limits scaling of both data volume and model size, despite the availability of vast unlabeled mu...	cs.SD cs.AI cs.MM	2026-06-26T00:00:00	Hong-Jie You, Jie-Jing Shao, Xiao-Wen Yang, Lin-Han Jia, Lan-Zhe Guo, Yu-Feng Li	no_new_dataset	false	0.864372	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2512.04890	Equivariant symmetry-aware head pose estimation for fetal MRI	We present E(3)-Pose, a novel fast pose estimation method that jointly and explicitly models rotation equivariance and object symmetry. Our work is motivated by the challenging problem of accounting for fetal head motion during a diagnostic MRI scan. We aim to enable automatic adaptive prescription of diagnostic 2D MRI...	cs.CV	2026-06-26T00:00:00	Ramya Muthukrishnan, Borjan Gagoski, Aryn Lee, P. Ellen Grant, Elfar Adalsteinsson, Benjamin Billot, Polina Golland	no_new_dataset	false	0.95187	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2512.06401	LLMCFG-TGen: Using LLM-Generated Control Flow Graphs to Automatically Create Test Cases from Use Cases	Appropriate test-case generation is critical in software testing and significantly impacts testing quality. Requirements-Based Test Generation (RBTG) derives test cases from software requirements to verify whether system behavior aligns with user needs and expectations. Requirements are often documented in Natural Lang...	cs.SE	2026-06-26T00:00:00	Zhenzhen Yang, Chenhui Cui, Tao Li, Rubing Huang, Nan Niu, Dave Towey, Shikai Guo	no_new_dataset	false	0.922819	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2601.01084	A UAV-Based Multispectral and RGB Dataset for Multi-Stage Paddy Crop Monitoring in Indian Agricultural Fields	We present a large-scale unmanned aerial vehicle (UAV)-based RGB and multispectral image dataset collected over paddy fields in the Vijayawada region, Andhra Pradesh, India, covering nursery to harvesting stages. We used a 20-megapixel RGB camera and a 5-megapixel four-band multispectral camera capturing red, green, re...	cs.CV eess.IV	2026-06-26T00:00:00	Adari Rama Sukanya, Puvvula Roopesh Naga Sri Sai, Bodduru Neshika, Rimalapudi Sarvendranath	new_dataset	true	0.965152	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2601.01701	Digital Twin-Driven Communication-Efficient Federated Anomaly Detection for Industrial IoT	Anomaly detection is increasingly becoming crucial for maintaining the safety, reliability, and efficiency of industrial systems. Recently, with the advent of digital twins and data-driven decision-making, several statistical and machine-learning methods have been proposed. However, these methods face several challenge...	cs.LG cs.AI	2026-06-26T00:00:00	Mohammed Ayalew Belay, Adil Rasheed, Pierluigi Salvo Rossi	no_new_dataset	false	0.965397	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2601.04390	SciFig: Towards Automating Editable Figure Generation for Scientific Papers	High-quality methodology figures are central to scientific communication, yet they remain difficult and time-consuming to create. Such figures must distill a method's components and information flow into a clear, revisable diagram as the paper evolves. Existing methodology diagram automation systems typically face a tr...	cs.AI	2026-06-26T00:00:00	Siyuan Huang, Yifan Zhou, Yutong Gao, Zi Yin, Juyang Bai, Xinxin Liu, Rama Chellappa, Chun Pong Lau, Cheng Peng, Sayan Nag, Shraman Pramanick	new_dataset	true	0.971241	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2601.12062	Learning Language-Driven Sequence-Level Modal-Invariant Representations for Video-Based Visible-Infrared Person Re-Identification	The core of video-based visible-infrared person re-identification (VVI-ReID) lies in learning sequence-level modal-invariant representations across different modalities. Recent research tends to use modality-shared language prompts generated by CLIP to guide the learning of modal-invariant representations. Despite achi...	cs.CV	2026-06-26T00:00:00	Xiaomei Yang, Antai Liu, Xizhan Gao, Fa Zhu, Sijie Niu, and Giancarlo Fortino	no_new_dataset	false	0.963188	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2601.13632	Resilient Routing: Risk-Aware Dynamic Routing in Smart Logistics via Spatiotemporal Graph Learning	With the rapid development of the e-commerce industry, the logistics network is experiencing unprecedented pressure. The traditional static routing strategy most time cannot tolerate the traffic congestion and fluctuating retail demand. In this paper, we propose a Risk-Aware Dynamic Routing(RADR) framework which integr...	cs.AI	2026-06-26T00:00:00	Zhiming Xue, Sichen Zhao, Yalun Qi, Xianling Zeng, Zihan Yu	no_new_dataset	false	0.964021	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2602.13939	Adaptive Automatic Model Selection for Demand Forecasting under Heterogeneous Demand Patterns	Demand forecasting is critical for inventory planning, procurement, replenishment, production, and capacity decisions in heterogeneous supply chains. However, selecting the most appropriate model for each demand series remains challenging because performance varies across datasets, demand structures, horizons, and eval...	cs.LG cs.AI	2026-06-26T00:00:00	Adolfo Gonz\'alez, V\'ictor Parada	no_new_dataset	false	0.968194	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2602.16220	SEMixer: Semantics Enhanced MLP-Mixer for Multiscale Mixing and Long-term Time Series Forecasting	Modeling multiscale patterns is crucial for long-term time series forecasting (TSF). However, redundancy and noise in time series, together with semantic gaps between non-adjacent scales, make the efficient alignment and integration of multi-scale temporal dependencies challenging. To address this, we propose SEMixer, ...	cs.LG	2026-06-26T00:00:00	Xu Zhang, Qitong Wang, Peng Wang, Wei Wang	no_new_dataset	false	0.967578	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2602.18446	ReportLogic: Evaluating Logical Quality in Deep Research Reports	Users increasingly rely on Large Language Models (LLMs) for Deep Research, using them to synthesize diverse sources into structured reports that support understanding and action. In this context, the practical reliability of such reports hinges on logical quality: whether the report's claims and arguments are explicitl...	cs.CL cs.AI	2026-06-26T00:00:00	Jujia Zhao, Zhaoxin Huan, Zihan Wang, Xiaolu Zhang, Jun Zhou, Suzan Verberne, and Zhaochun Ren	new_dataset	true	0.976244	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2602.18900	PrivacyBench: Privacy Isn't Free in Hybrid Privacy-Preserving Vision Systems	Privacy preserving machine learning deployments in sensitive deep learning applications; from medical imaging to autonomous systems; increasingly require combining multiple techniques. Yet, practitioners lack systematic guidance to assess the synergistic and non-additive interactions of these hybrid configurations, rel...	cs.CR cs.CV	2026-06-26T00:00:00	Nnaemeka Obiefuna and Samuel Oyeneye and Similoluwa Odunaiya and Iremide Oyelaja and Steven Kolawole	no_new_dataset	false	0.927117	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2603.01195	VisNec: Measuring and Leveraging Visual Necessity for Multimodal Instruction Tuning	The effectiveness of multimodal instruction tuning depends not only on dataset scale, but critically on whether training samples genuinely require visual reasoning. However, existing instruction datasets often contain a substantial portion of visually redundant samples (solvable from text alone), as well as multimodall...	cs.CV cs.AI	2026-06-26T00:00:00	Mingkang Dong, Hongyi Cai, Jie Li, Sifan Zhou, Bin Ren, Kunyu Peng, Yuqian Fu	no_new_dataset	false	0.769431	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2603.01461	UltraStar: Semantic-Aware Star Graph Modeling for Echocardiography Navigation	Echocardiography is critical for diagnosing cardiovascular diseases, yet the shortage of skilled sonographers hinders timely patient care, due to high operational difficulties. Consequently, research on automated probe navigation has significant clinical potential. To achieve robust navigation, it is essential to lever...	cs.CV	2026-06-26T00:00:00	Teng Wang, Haojun Jiang, Chenxi Li, Diwen Wang, Yihang Tang, Zhenguo Sun, Yujiao Deng, Shiji Song, Gao Huang	no_new_dataset	false	0.725539	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2603.24991	Towards Video Anomaly Detection from Event Streams: A Baseline and Benchmark Datasets	Event-based vision, characterized by low redundancy, focus on dynamic motion, and inherent privacy-preserving properties, naturally fits the demands of video anomaly detection (VAD). However, the absence of dedicated event-stream anomaly detection datasets and effective modeling strategies has significantly hindered pr...	cs.CV	2026-06-26T00:00:00	Peng Wu, Yuting Yan, Guansong Pang, Yujia Sun, Qingsen Yan, Peng Wang, Yanning Zhang	new_dataset	true	0.963291	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2604.05920	Reference Energies for Non-Relativistic Core Ionization Potentials	Deep-lying core electrons carry highly localized, site-specific information that forms the basis of X-ray photoelectron spectroscopy. Accurately predicting their associated core ionization potentials (IPs) is a demanding theoretical task, requiring a balanced treatment of strong orbital relaxation, electron correlation...	physics.chem-ph cond-mat.mtrl-sci nucl-th	2026-06-26T00:00:00	Antoine Marie and Loris Burth and Pierre-Fran\c{c}ois Loos	new_dataset	true	0.95753	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2604.08448	AfriVoices-KE: A Multilingual Speech Dataset for Kenyan Languages	AfriVoices-KE is a large-scale multilingual speech dataset comprising approximately 3,000 hours of audio across five Kenyan languages: Dholuo, Kikuyu, Kalenjin, Maasai, and Somali. The dataset includes 750 hours of scripted speech and 2,250 hours of spontaneous speech, collected from 4,777 native speakers across divers...	cs.CL	2026-06-26T00:00:00	Lilian Wanzare, Cynthia Amol, Ezekiel Maina, Nelson Odhiambo, Hope Kerubo, Leila Misula, Vivian Oloo, Rennish Mboya, Edwin Onkoba, Edward Ombui, Joseph Muguro, Ciira wa Maina, Andrew Kipkebut, Alfred Omondi Otom, Ian Ndung'u Kang'ethe, Angela Wambui Kanyi, Brian Gichana Omwenga	new_dataset	true	0.969665	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2604.17420	TransXion: A High-Fidelity Graph Benchmark for Realistic Anti-Money Laundering	Money laundering poses severe risks to global financial systems, driving the widespread adoption of machine learning for transaction monitoring. However, progress remains stifled by the lack of realistic benchmarks. Existing transaction-graph datasets suffer from two pervasive limitations: (i) they provide sparse node-...	cs.LG cs.AI cs.SI	2026-06-26T00:00:00	Keyang Chen, Mingxuan Jiang, Yongsheng Zhao, Zeping Li, Zaiyuan Chen, Weiqi Luo, Zhixin Li, Sen Liu, Yinan Jing, Guangnan Ye, Xihong Wu, Hongfeng Chai	new_dataset	true	0.969492	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2605.03680	Real Image Denoising with Knowledge Distillation for High-Performance Mobile NPUs	While deep-learning-based image restoration has achieved unprecedented fidelity, deployment on mobile Neural Processing Units (NPUs) remains bottlenecked by operator incompatibility and memory-access overhead. We propose an NPU-aware hardware-algorithm co-design approach for real-world image denoising on mobile NPUs. O...	cs.CV cs.LG	2026-06-26T00:00:00	Faraz Kayani, Sarmad Kayani, Asad Ahmed, Radu Timofte, and Dmitry Ignatov	new_dataset	true	0.945709	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2605.13693	StayStill: a large-scale 3D idle animation dataset	Idle animations are essential for virtual characters, as they convey realistic behaviour during inactive states. While automatic animation generation has been widely studied, limited attention has been given to idle motion due to the absence of dedicated training datasets. We introduce StayStill, a large-scale dataset ...	cs.GR	2026-06-26T00:00:00	Eneko Atxa Landa, Igor Rodriguez, Elena Lazkano, Taras Kucherenko	new_dataset	true	0.973324	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2605.24417	LLMTabBench: Evaluating LLMs on Binary Tabular Classification From Zero to Few Shots	Supervised classification on tabular data remains a central machine learning task, but its dependence on large labeled datasets limits its applicability in data-scarce settings. Few-shot methods such as TabPFN achieve strong performance through large-scale synthetic pretraining, yet still require labeled context exampl...	cs.LG	2026-06-26T00:00:00	Daria Grushina, Kseniia Kuvshinova, Alina Kostromina, Aziz Temirkhanov, Mile Mitrovic, Dmitry Simakov	new_dataset	true	0.965517	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2605.24696	CALIBURN: Operationally Calibrated Streaming Intrusion Detection with Regime-Dependent Conformal Risk Control	Streaming intrusion detection systems must process flows continuously under bounded memory, yet most leave alerting-threshold selection as a post-hoc tuning problem incompatible with production, where operators commit in advance to alert budgets, misclassification costs, and Service Level Objectives. We present CALIBUR...	cs.CR cs.LG	2026-06-26T00:00:00	Michel A. Youssef	no_new_dataset	false	0.868595	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.00827	Beyond Independent Manipulation: Individual Fairness-aware Strategic Classification with Peer Imitation	Strategic classification (SC) investigates scenarios where agents manipulate their features to obtain favorable decisions from predictive models. Existing fairness-aware SC approaches primarily focus on group fairness and typically assume that agents respond independently. However, when individual fairness is required,...	cs.LG cs.AI	2026-06-26T00:00:00	Xinpeng Lv, Chunyuan Zheng, Yunxin Mao, Renzhe Xu, Jinxuan Yang, Yuanlong Chen, Wangrong Huang, Shaowu Yang, Wenjing Yang, Xinwang Liu, Peng Cui, Haotian Wang	no_new_dataset	false	0.956471	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.03549	How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration	Hyperparameter optimization (HPO) for Random Forest faces a specific difficulty in tuning the number of trees: the predictive score typically improves monotonically with ensemble size, so standard methods such as Tree-structured Parzen Estimator (TPE) and Hyperband require a predefined search range and often drive the ...	cs.LG math.PR	2026-06-26T00:00:00	Vadim Porvatov, Andrey Dukhovny, Andrey Lange	no_new_dataset	false	0.963651	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.06408	MODIS Thermal Infrared Sounding (MOTIS): Estimating Tropical Cyclone Central Pressure from Warm-Core Anomalies	This study presents a novel framework for estimating the central sea-level pressure ($P_\mathrm{c}$) of tropical cyclones (TCs) using infrared radiometers. We leverage the long-overlooked combination of high spatial resolution and sounding capability of the Moderate Resolution Imaging Spectroradiometer (MODIS) to measu...	physics.ao-ph	2026-06-26T00:00:00	Jinghuai Yao, Chi Yan Kwok, Puyuan Du, Yubo Wang, and Derrick Herndon	new_dataset	true	0.958641	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.12716	Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review	The integration of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) into scientific peer-review workflows introduces novel and significant risks for adversarial manipulation, especially given the multimodal nature of scientific papers where figures, not just text, convey core evidence. This creates a significan...	cs.CL	2026-06-26T00:00:00	Xinyu Zhao, Rana Muhammad Shahroz Khan, Zhen Xu, Zhen Tan, Tianlong Chen	new_dataset	true	0.976679	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.13042	Augmentation techniques for video surveillance in the visible and thermal spectral range	In intelligent video surveillance, cameras record image sequences during day and night. Commonly, this demands different sensors. To achieve a better performance it is not unusual to combine them. We focus on the case that a long-wave infrared camera records continuously and in addition to this, another camera records ...	cs.AI cs.CV	2026-06-26T00:00:00	Vanessa Buhrmester, Ann-Kristin Grosselfinger, David Munch, and Michael Arens	no_new_dataset	false	0.876493	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.14668	When to Write and When to Suppress: Route-Specialized Dual Adapters for Memory-Assisted Knowledge Editing	Knowledge editing systems must update selected facts while preserving nearby but irrelevant behavior. This paper studies this problem in a memory-assisted setting where an edit memory is retrieved at inference time and a parameter-efficient adapter corrects the model's object preference. We argue that the central desig...	cs.LG	2026-06-26T00:00:00	Baijia Zhang, Yining Huang	no_new_dataset	false	0.961054	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.16325	Attention-Based Prototype Calibration for Multi-Rater Few-Shot Medical Image Segmentation	Few-shot medical image segmentation methods typically assume a single ground-truth annotation, overlooking systematic variability across expert raters commonly observed in clinical datasets. We propose an attention-based prototype calibration framework for few-shot multi-rater segmentation that models rater-specific de...	cs.CV	2026-06-26T00:00:00	Truong Vu, Minh Khoi Ho, Yutong Xie	no_new_dataset	false	0.959122	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.21097	GRAG: Generic Response-Augmented Generation Framework for Personalized Conversational Systems	Deploying highly capable personalized conversational agents in resource-constrained or privacy-sensitive environments remains a significant challenge. We identify a fundamental bottleneck in the existing approaches: current training paradigms treat personalization and grounding as a single monolithic learning problem. ...	cs.CL cs.LG	2026-06-26T00:00:00	Junfeng Liu, Christopher T. Symons, Ranga Raju Vatsavai	no_new_dataset	false	0.954078	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.21649	EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory	Existing embedding models are inherently static: they encode text segments in isolation, ignoring their surrounding context and temporal order. This paper introduces EvoEmbedding, a novel embedding model that generates evolvable representations for retrieval. It is tailored for long-context scenarios, where information...	cs.CL	2026-06-26T00:00:00	Chang Nie, Chaoyou Fu, Junlan Feng, Caifeng Shan	new_dataset	true	0.970956	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.22076	Learning Cross-View Semantic Priors for Single-Reference Unseen Object Pose Estimation	Single-reference unseen object 6D pose estimation reduces object onboarding by estimating poses of arbitrary novel objects from only one reference view. Recent correspondence-based pipelines have achieved robust performance with vision foundation model (VFM) features. However, they typically treat these features as int...	cs.CV	2026-06-26T00:00:00	Jiahong Chen, Jinghao Wang, Ziwen Wang, Zi Wang, Banglei Guan and Qifeng Yu	no_new_dataset	false	0.944891	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.22537	NegAS: Negative Label Guided Attention and Scoring for Out-of-Distribution Object Detection with Vision-Language Models	Out-of-Distribution (OOD) detection is essential for ensuring the robustness and reliability of object detection systems deployed in safety-critical applications. While prior research has mainly focused on uni-modal detectors or vision-language model (VLM) based classifiers, the potential of VLM-based object detectors ...	cs.CV	2026-06-26T00:00:00	Yingjie Zhang, Shuai Li, Peng Wang	no_new_dataset	false	0.952479	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.24890	Small edits, large models: How Wikipedia advocacy shapes LLM values	Can a small group of volunteers shape how AI systems discuss animal welfare, just by editing Wikipedia? We show that they can. Wikipedia appears in nearly every major language model training dataset and is weighted more heavily than web-crawled text. The Pro-Animal Wikipedians (PAW), a group of advocates who add source...	cs.CL cs.AI cs.CY	2026-06-26T00:00:00	Jasmine Brazilek, Maria Navas, Alexa Gnauck	no_new_dataset	false	0.891778	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.25006	Scalable Peptide Design via Memory-Efficient Equivariant Transformer	Target-specific peptide design requires sequence and structure co-design under full atom geometric constraints. Latent generative frameworks offer an effective route for this problem by compressing fine grained atomic structures into block level latent representations and performing conditional generation in a compact ...	cs.LG	2026-06-26T00:00:00	Rui Jiao, Xiangzhe Kong, Yinjun Jia, Yijia Zhang, Ziyi Yang, Yang Liu and Jianzhu Ma	no_new_dataset	false	0.945915	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.25832	MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources	Achieving strong optimization generalization across diverse optimization problems while requiring limited training resources remains a challenging problem for optimization-oriented large language models (LLMs). Existing approaches typically rely on large-scale supervised datasets, costly reasoning annotations, and expe...	cs.LG cs.AI	2026-06-26T00:00:00	Ke Zhao, Zixiang Di, Hong Qian, Xiang Shu, Yaolin Wen, Qitao Shi, Bingdong Li, Xingyu Lu, Xiangfeng Wang, Jun Zhou, Ke Tang, Yang Yu	no_new_dataset	false	0.952707	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.25996	Autodata: An agentic data scientist to create high quality synthetic data	We introduce Autodata, a general method that enables AI agents to act as data scientists who build high quality training and evaluation data. We show how to train (meta-optimize) such a data scientist agent, so that it learns to create even stronger data. We describe the overall formulation, and a specific practical im...	cs.AI cs.CL cs.LG	2026-06-26T00:00:00	Ilia Kulikov, Chenxi Whitehouse, Tianhao Wu, Yixin Nie, Swarnadeep Saha, Eryk Helenowski, Weizhe Yuan, Olga Golovneva, Jack Lanchantin, Yoram Bachrach, Jakob Foerster, Xian Li, Han Fang, Sainbayar Sukhbaatar, Jason Weston	no_new_dataset	false	0.905912	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26099	Benchmarking Open-Weight Foundation Models for Global AI Technical Governance	Large language models (LLMs) are increasingly deployed in artificial intelligence (AI) governance analysis across national and international organisations. There is, however, growing evidence that such models produce significantly less accurate responses for countries that are underrepresented in their training data-a ...	cs.CY cs.AI	2026-06-26T00:00:00	Jason Hung	new_dataset	true	0.955401	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26101	Know2Guess: A Contamination-Aware Multi-Zone Benchmark for Knowledge-Boundary Evaluation in Large Language Models	Reliable evaluation of large language models should separate supported answering from unsupported guessing without conflating either with data contamination, prompt idiosyncrasy, or generic refusal behavior. We present a contamination-aware, multi-zone benchmark for measuring the transition from answerable knowledge to...	cs.CL cs.AI	2026-06-26T00:00:00	Renwei Meng, Bowen Zhang, Jian Wang, Xican Wang, Haoyi Wu, Xuanyan Qiu, and Shengan Yang	new_dataset	true	0.970531	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26102	Helpfulness Hurts: Domain-Dependent Degradation of Mid-Trained Compassion Values Under Post-Training	Standard post-training pipelines apply supervised fine-tuning (SFT) and reinforcement learning (RL) to make language models helpful, but these processes may inadvertently degrade values instilled during pre-training. We investigate whether the domain of post-training data differentially affects the retention of animal ...	cs.CL cs.AI cs.CY	2026-06-26T00:00:00	Jasmine Brazilek, Juliana Seawell	no_new_dataset	false	0.805479	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26103	Investigating LLM's Problem Solving Capability -- a Study on Statics Questions	Large Language Models (LLMs) have rapidly influenced many aspects of society, particularly education, due to their demonstrated ability to complete assignments and examinations across a wide range of subjects. Although prior studies have examined the educational impact of LLMs, much of the existing work relies on publi...	cs.CL cs.AI	2026-06-26T00:00:00	Tanner Culleton and Hung-Fu Chang	new_dataset	true	0.95255	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26107	Low Resource Multimodal Translation of Nepali Spoken Words into Emotion-Conditioned Sign Language Avatars	Sign language communication systems, that integrate emotional expression remain underexplored, particularly for low-resource languages. This pilot study presents NEST-V1 (Nepali Emotion and Speech Transformer - Version 1), a proof-of-concept multimodal framework that demonstrates the feasibility of generating emotion-c...	cs.CL cs.AI	2026-06-26T00:00:00	Jatin Bhusal and Salma Tamang	no_new_dataset	false	0.829837	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26108	Where Larger Models Excel: The Primacy of Constraint-Guided Reasoning	Larger language models consistently outperform smaller ones on reasoning benchmarks, yet the reasoning differences underlying this gap remain underexplored. Across benchmarks in mathematics, physics, chemistry, and programming, we observe stable performance gaps: averaged over datasets, Qwen3-32B outperforms Qwen3-8B b...	cs.CL	2026-06-26T00:00:00	Guan-Yi Lin, Hen-Hsen Huang	no_new_dataset	false	0.944968	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26130	Thinking Like a Scientist? A Structural Study of LLM-Generated Research Methods	Large Language Models (LLMs) are increasingly used to guide research methodology, yet their default methodological tendencies under minimal prompting remain unclear. Here, we prompt GPT-5.1, Gemini 3 Pro, and DeepSeek-V3.2 with an LLM-extracted research question from each of 1,000 recent arXiv computer-science papers a...	cs.CL cs.AI cs.DL	2026-06-26T00:00:00	Francesca Carlon, Brecht Verbeken, Vincent Ginis, Andres Algaba	no_new_dataset	false	0.817338	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26151	Unsupervised Memory-Enhanced Video Transformers: Obstacle Detection for Autonomous Agricultural Rover	While autonomous rovers have become indispensable to precision farming, achieving consistent operational safety remains a critical challenge. Conventional safety sensors, such as LiDAR, fail to detect obstacles positioned below the plant canopy, posing a significant risk. While camera-based supervised learning methods ...	cs.RO cs.AI	2026-06-26T00:00:00	Th\'eo Biardeau (XLIM-ASALI, UFR SFA (Poitiers)), Anne-Sophie Capelle-Laiz\'e (UP, XLIM-ASALI, XLIM-ASALI), Salwan Alwan, David Helbert (UFR SFA (Poitiers), XLIM-ASALI, LabCom I3M (Poitiers))	no_new_dataset	false	0.890574	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26165	Predicting Fruit Quality with a Hybrid Machine Learning and Image Processing Approach	Fruit spoilage is a significant issue in agriculture, leading to substantial economic losses. Addressing this, our study introduces a hybrid approach combining image processing and deep learning to assess fruit freshness. We developed an image processing algorithm that quantifies spoilage on a scale from 0 (fully fresh...	cs.CV	2026-06-26T00:00:00	Amir Reza Hashemi, Shahram Amiri	no_new_dataset	false	0.944468	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26168	Implementation of reinforcement learning in chemical reaction networks: application to phototaxis as curiosity-driven exploration	Living systems navigate environments using noisy and incomplete sensory signals. In unicellular algae, phototaxis is often modeled as a mechanistic run--tumble process driven by stimulus--response rules. However, such descriptions overlook how organisms actively sample their environment to reduce sensory ambiguity. Fro...	cs.LG q-bio.QM	2026-06-26T00:00:00	Ruyi Tang, Gr\'egoire Sergeant-Perthuis (LCQB-AG), David Colliaux	no_new_dataset	false	0.94643	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26169	Neural Architecture Search for Generative Adversarial Networks: A Comprehensive Review and Critical Analysis	Neural Architecture Search (NAS) has emerged as a pivotal technique in optimizing the design of Generative Adversarial Networks (GANs), automating the search for effective architectures while addressing the challenges inherent in manual design. This paper provides a comprehensive review of NAS methods applied to GANs, ...	cs.LG cs.AI	2026-06-26T00:00:00	Abrar Alotaibi, Moataz Ahmed	no_new_dataset	false	0.976102	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26171	LCG: Long-Context Consistent Image Generation with Sparse Relational Attention	Recent image generation models achieve impressive quality in single-image synthesis, but often fail to maintain consistency across sequential outputs, as required in comics, storyboards, and visual narratives. We propose Long-Context Generation (LCG), a framework for long-context multi-image text-to-image generation, t...	cs.CV cs.AI	2026-06-26T00:00:00	Zihao Wang, Yijia Xu, Haoze Zheng, Xuran Ma, Haokun Gui, and Harry Yang	new_dataset	true	0.977465	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26176	Toward Mitigating Process-Induced Performance Degradation in 3.5D Heterogeneous Packages via Pre-Silicon Firmware Co-Optimization	This paper presents a pre-silicon analysis of XRM-SSD V24/V7.0, a physics-aware predictive firmware scheduling layer for Intel's 3.5D heterogeneous integrated packages (Foveros Direct 3D + PowerVia + EMIB-T + UCIe + HBM5). Using detailed thermal-electrical co-simulation over a 90,000-step LLM inference dataset, we show...	cs.AR	2026-06-26T00:00:00	Chi Fei Chung (Dollarchip Technology Inc.), Nikolai Nedovodin (STARGA Inc.)	no_new_dataset	false	0.805611	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26179	KG-TRACE: A Neuro-Symbolic Framework for Mechanistic Grounding in Antimicrobial Resistance Prediction	While WGS-based AMR prediction has reached high accuracy, existing models lack a mechanism to ground neural attributions in established biological pathways. We present KG-TRACE, a novel neuro-symbolic framework that integrates the WHO mutation knowledge graph (KG) as a structured biological constraint on a neural genom...	cs.LG cs.AI q-bio.QM	2026-06-26T00:00:00	Naman Garg, Sarika Jain, Sourav Yadav, Bharat K. Bhargava, Ghanapriya Singh, Abhishek Srivastava, Parimal Kar	no_new_dataset	false	0.739974	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26192	Federated Hash Projected Latent Factor Learning	Hash Learning (HL) is an efficient representation learning approach that maps real-valued data into compact binary representations. Traditional HL methods typically require users to upload personal data to a central server, which is incompatible with increasingly stringent data security regulations. Federated Learning ...	cs.LG cs.CR	2026-06-26T00:00:00	Jialan He	no_new_dataset	false	0.969032	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26195	Soroll-IA: A Weakly Labeled Audio Dataset for Real-World Industrial Port Monitoring	Soroll-IA is a weakly labeled environmental audio dataset recorded in a real-world industrial port environment in Valencia (Spain) using two fixed sensing nodes. The dataset comprises approximately 22 hours of audio segmented into 7,396 clips and covers 26 sound event classes representative of industrial port acoustic ...	cs.SD	2026-06-26T00:00:00	Javier Naranjo-Alcazar, Jordi Grau-Haro, Ruben Ribes-Serrano, Marta Garcia-Ballesteros, Pedro Zuccarello	new_dataset	true	0.969195	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26201	OmniContact: Chaining Meta-Skills via Contact Flow for Generalizable Humanoid Loco-Manipulation	Learning long-horizon humanoid loco-manipulation poses a dual challenge: it requires not only the robust execution of meta-skills but also their seamless, closed-loop chaining equipped with autonomous recovery. Existing approaches remain limited: explicit humanoid-object interaction representations offer precision but ...	cs.RO	2026-06-26T00:00:00	Runyi Yu, Xiaoyi Lin, Ji Ma, Yinhuai Wang, Koukou Luo, Jiahao Ji, Huayi Wang, Wenjia Wang, Runhan Zhang, Ping Tan, Ting Wu, Ruoli Dai, Qifeng Chen, and Lei Han	new_dataset	true	0.968443	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26204	Topology-Informed Neural Networks for Flood Detection in Optical and Synthetic Aperture Radar Imagery	Floods frequently impact regions around the world. Rapid and accurate flood detection is crucial for emergency response and timely mitigation of human and economic loss. The expanding availability of satellite data and advances in artificial intelligence have enhanced monitoring of environmental hazards, but many flood...	cs.LG	2026-06-26T00:00:00	Sophia Li, Max Zhao, Raghu G. Raj, Tianyu Chen	new_dataset	true	0.951033	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26207	The Role of Input Dimensionality in the Emergence and Targeted Control of Adversarial Examples	Several theoretical works have tried to explain the adversarial vulnerability of deep neural networks through properties of high-dimensional geometry. However, the assumptions underlying these works are rarely examined empirically, and systematic evidence remains limited. In this work, we present a systematic study of ...	stat.ML cs.CR cs.LG	2026-06-26T00:00:00	Nasrin Malekzadeh Goradel, Niccolo Pancino, Yaser Gholizade Atani, Benedetta Tondi, Giovanni Bellettini, Mauro Barni	no_new_dataset	false	0.955921	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26211	Data Facts: A Metadata Schema for Structured Data Exchange in the NANDini Multi-Agent Ecosystem	NANDini (Networked Agents Natural Distillation of Interconnected Nodal Intelligence) envisions an automated ecosystem where intelligent agents independently create, process, and exchange data to drive decisions at scale. Realizing this vision requires infrastructure beyond agent discovery and communication: agents must...	cs.CR	2026-06-26T00:00:00	Jin Gao, Maria Gorskikh, Pradyumna Chari, Brittany Box, Mukul Kemla, Pratik Behera, Abhishek Mehta, Ramesh Raskar	new_dataset	true	0.852629	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26257	Dataset Usage Inference without Shadow Models or Held-out Data	How much of my data was used to train a machine learning model? Dataset Usage Inference (DUI) aims to answer this by estimating what fraction of a dataset contributed to a model's training. However, existing DUI methods rely on assumptions that rarely hold in practice: they require training expensive shadow models to i...	cs.LG	2026-06-26T00:00:00	Wojciech {\L}apacz, Stanis{\l}aw Pawlak, Jan Dubi\'nski, Franziska Boenisch, and Adam Dziedzic	no_new_dataset	false	0.950233	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26260	A multi-task spatiotemporal deep neural network for predicting penetration depth and morphology in laser welding	In laser penetration welding, the assessment of penetration state and weld seam morphology plays a crucial role in determining the weld quality. This paper presents a comprehensive introduction of the innovative muti-task deep learning model that has the capability to predict penetration state, depth, and weld seam mor...	cs.CV cs.AI	2026-06-26T00:00:00	Sen Li, Haichao Cui, Chendong Shao, Yaqi Wang, Xinhua Tang	new_dataset	true	0.890045	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26285	TEMPO-Diffusion: Temporally Exposed Malicious Poisoning of Diffusion Models	Noise-based backdoor attacks on diffusion models typically rely on input-time trigger injection, untargeted activation, and out-of-distribution target generation. Such assumptions reduce both the stealthiness and the practical relevance of these attacks. In this work, we present TEMPO-Diffusion, a targeted backdoor fra...	cs.CR cs.AI	2026-06-26T00:00:00	William Aiken, Paula Branco, Guy-Vincent Jourdan, Iosif-Viorel Onut	new_dataset	true	0.969982	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26289	Augmentation with Dilution: A Large-Scale Empirical Study of Human Contributor Ecosystems After AI Coding Agent Adoption	AI coding agents are penetrating open-source software development at an unprecedented pace, yet existing research predominantly treats human contributors as a static backdrop rather than as the subject of inquiry. This paper presents the first large-scale empirical study that takes the human contributor ecosystem as it...	cs.SE	2026-06-26T00:00:00	Weixing Zhang, Bowen Jiang, Anne Koziolek	no_new_dataset	false	0.572945	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26294	The Red Queen G\"odel Machine: Co-Evolving Agents and Their Evaluators	Self-improving agents are state-of-the-art (SOTA) on agentic coding benchmarks and have recently been extended to general domains. However, their search methods generally assume a stationary evaluation criterion: a fixed verifier, benchmark, or labeled dataset that remains valid as the agent improves. This ignores a ce...	cs.LG cs.AI cs.MA cs.NE	2026-06-26T00:00:00	Alex Iacob, Andrej Jovanovi\'c, William F. Shen, Daniel Burkhardt, Meghdad Kurmanji, Nurbek Tastan, Lorenzo Sani, Niccol\`o Alberto Elia Venanzi, Ambroise Odonnat, Zeyu Cao, Bill Marino, Xinchi Qiu, and Nicholas D. Lane	no_new_dataset	false	0.928897	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26295	Beyond Aesthetics: Quantifying Information Loss in Turbid Scenes	Visibility in underwater environments degrades rapidly under turbid conditions, yet the effects on computer-vision models remain unclear. This issue is compounded by reliance on synthetic turbidity datasets, which may misrepresent real-world information loss. To address this gap, we introduce the Turbid Underwater Base...	cs.CV	2026-06-26T00:00:00	Vasiliki Ismiroglou, Stefan H. Bengtson, Tasos Benos, Thomas B. Moeslund, Malte Pedersen	new_dataset	true	0.968034	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26312	Tailor Made Embeddings for Quantum Machine Learning	Autoencoders transformed classical machine learning by solving the curse of dimensionality, enabling principled weight initialization and learning compact, structured representations. In this work, we extend this paradigm to quantum machine learning by introducing a variational autoencoder framework that learns task-sp...	quant-ph cs.CV cs.LG	2026-06-26T00:00:00	Aldo Lamarre and Dominik \v{S}afr\'anek	no_new_dataset	false	0.951024	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26317	Parametric Generalized Adaptive Moment Features (PG-AMF) for Bearing Fault Diagnosis and Machine Health Monitoring	Accurate fault diagnosis of rolling element bearings in rotating machinery is considered essential for ensuring industrial safety and enabling predictive maintenance. Conventional statistical feature-based methods rely on predefined descriptors, whose diagnostic sensitivity is constrained by fixed configurations and li...	eess.SP cs.AI	2026-06-26T00:00:00	Rajeev Kumar	no_new_dataset	false	0.948911	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26337	EMA-FS: Accelerating GBDT Training via Gain-Informed Feature Screening	Gradient Boosted Decision Trees (GBDT), exemplified by LightGBM, spend a dominant fraction of training time -- typically 65-70% -- constructing per-feature histograms. Existing approaches such as random feature subsampling (feature_fraction) discard features without regard for their predictive utility. We propose EMA-b...	cs.LG	2026-06-26T00:00:00	Yan Song	no_new_dataset	false	0.954801	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26347	Health feature extraction from battery energy storage system field fault data	Health monitoring methods are critical for lithium-ion battery modules connected to the grid to prevent faults that can lead to catastrophic events. However, assessing the health of cells in modules from their operational data presents challenges including variable operating conditions, which directly confound health f...	eess.SY cs.SY	2026-06-26T00:00:00	Clement Wong, Andrew Weng, Xin Hui Ooi, Zhiwen Wan, Jeesoon Choi, Seung Yoon Yang, Heejun Jin, Jason Siegel, Anna Stefanopoulou	new_dataset	true	0.612508	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26363	Bayesian Changepoint Detection for Smart Sensing of Battery Degradation: Cycle-Level Health Indicators and PyMC Implementation	Reliable detection of the onset of accelerated degradation is central to safe and cost-efficient operation of lithium-ion batteries. This paper presents a Bayesian single-changepoint model applied to a simple but physically meaningful cycle-level health indicator (HI), defined as the ratio of charge time to discharge t...	eess.SY cs.SY	2026-06-26T00:00:00	Waldemar Bauer, Anna Jarosz-Kozyro and Jerzy Baranowski	no_new_dataset	false	0.938239	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26377	Verifying Intent and Harm: A Unified Defense Against LLM-Generated Threats	Large language models (LLMs) are increasingly deployed in interactive applications, yet they remain vulnerable to adversarial interactions that induce harmful, deceptive, or policy-violating outputs. Existing defenses typically analyze either user prompts or generated outputs, but not both. However, many real-world att...	cs.CR	2026-06-26T00:00:00	Poojitha Thota, Yun Lei, Santhosh Thangaraj, Siddhartha Reddy Jonnalagadda, Shirin Nilizadeh	no_new_dataset	false	0.940374	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26379	Layer-Specific Prompt Fusion Discovery via Differentiable Search in Vision Foundation Models	Visual prompt tuning has emerged as a parameter-efficient fine-tuning approach for adapting large-scale Vision Transformers (ViTs) to downstream tasks. As its learnable prompts are applied in input and feature spaces, prior to jointly going through attention in transformer layers, the most commonly used scheme for fusi...	cs.CV	2026-06-26T00:00:00	Xi Xiao, Xingjian Li, Yunbei Zhang, Cheng Han, Tianming Liu, Tianyang Wang, Runmin Jiang, Jihun Hamm, Xiao Wang, Min Xu	no_new_dataset	false	0.962961	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26382	Charting the Growth of Social-Physical HRI (spHRI): A Systematic Review Pipeline Augmented by Small Language Models	Social-physical human-robot interaction (spHRI) has grown rapidly across robotics, human-computer interaction, human-robot interaction, and haptics. Yet, fragmented terminology and inconsistent methodologies make systematic synthesis difficult. To support scalable review practices, we evaluated the extent to which smal...	cs.CL cs.AI cs.DL cs.HC cs.RO	2026-06-26T00:00:00	Mayumi Mohan, Ju-Hung Chen, and Alexis E. Block	no_new_dataset	false	0.926668	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26398	DinoLink: A Token-Centric Representation Compression Framework for Bandwidth-Constrained Collaborative V2X Perception	High-precision remote perception is often hindered by the severe bandwidth constraints of Vehicle-to-Everything (V2X) networks. We propose \textit{DinoLink}, a token-centric compression framework that replaces raw pixel streaming with discrete semantic communication for vehicle-cloud collaborative inference. DinoLink e...	cs.CV	2026-06-26T00:00:00	Tianle Zhu, Haohua Que, Handong Yao, Hongyi Xu and Zhipeng Bao	no_new_dataset	false	0.952268	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26416	Methane-Plume Segmentation From Hyperspectral Satellite Imagery Via Multimodal Deep Learning	Efficient detection of methane plumes is crucial for understanding and mitigating global warming, as accurately identifying and segmenting them in earth observation imagery remain essential for large-scale monitoring. In this work, we propose a multimodal deep learning model that integrates a feature-guided methane enh...	cs.CV	2026-06-26T00:00:00	Brayan Quintero, Jeferson Acevedo, Samuel Traslavi\~na, Hoover Rueda-Chac\'on	no_new_dataset	false	0.954043	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26422	Estimating Uncertainty in Classifier Performance with Applications to Large Language Models and Nested Data	Researchers increasingly use text classification--supervised models or large language models--to measure constructs from natural language, providing metrics such as recall and precision as evidence of their validity. Yet, though these metrics are point estimates subject to sampling variation, measures of uncertainty ar...	cs.AI	2026-06-26T00:00:00	Kylie Anglin	no_new_dataset	false	0.951272	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26432	Embedding Foundation Model Predictions in Discrete-Choice Models with Structural Guarantees	Tabular foundation models achieve strong accuracy on choice prediction tasks, but their predictions often violate the economic logic those tasks require: raising a price can increase predicted demand, implied willingness-to-pay estimates are frequently negative or implausible, and unavailable alternatives receive nonze...	cs.LG econ.EM	2026-06-26T00:00:00	Yingshuo Wang, Xian Sun, Yanhang Li, Zhichao Fan, Zexin Zhuang	no_new_dataset	false	0.94731	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null
2606.26442	AXLE: A Cloud Infrastructure for Lean 4 Theorem Proving Utilities	We present AXLE (Axiom Lean Engine), a cloud service for Lean 4 proof manipulation, extraction, and verification. Recent progress in AI for mathematics -- reinforcement learning pipelines, agentic proving workflows, dataset curation -- demands Lean 4 tooling that scales to millions of requests while remaining correct a...	cs.LO cs.AI	2026-06-26T00:00:00	Jimmy Xin, Alex Schneidman, Chris Cummins, Karun Ram, Srihari Ganesh, Jannis Limperg	no_new_dataset	false	0.887967	2026-06-30T02:08:35.223196	librarian-bots/arxiv-new-datasets-modernbert-v4	null	null

Downloads last month: 139