Dataset Viewer
id: large_string, lengths 9 to 16
title: large_string, lengths 5 to 245
abstract: large_string, lengths 83 to 4.03k
categories: large_string, lengths 5 to 108
update_date: timestamp[ms], 2007-05-23 00:00:00 to 2026-06-26 00:00:00
authors: large_string, lengths 5 to 24.7k
classification_label: large_string, 2 distinct values
is_new_dataset: bool, 2 classes
confidence_score: float64, 0.5 to 0.98
classification_date: large_string (date), 2026-06-30 02:08:35 to 2026-06-30 02:08:35
model_version: large_string, 1 distinct value
embedding: large list (lengths not listed)
embedding_model: large_string, 0 distinct values
2310.11714
Consistent Distributed Ranking of Generative Models via Kernel Distances
Ranking generative models based on the fidelity and diversity of their outputs is required to identify the best generator in a group of candidate generative AI models. To rank a group of models in a conventional centralized setting, a standard score is commonly evaluated for each involved model. The selection and desig...
cs.LG
2026-06-26T00:00:00
Zixiao Wang, Farzan Farnia, Zhenghao Lin, Yunheng Shen, Bei Yu
no_new_dataset
false
0.965777
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2409.16395
HELIOT: LLM-Based CDSS for Adverse Drug Reaction Management
Medication errors significantly threaten patient safety, leading to adverse drug events and substantial economic burdens on healthcare systems. Clinical Decision Support Systems (CDSSs) aimed at mitigating these errors often face limitations when processing unstructured clinical data, including reliance on static datab...
cs.AI
2026-06-26T00:00:00
Gabriele De Vito, Filomena Ferrucci, Athanasios Angelakis
new_dataset
true
0.964089
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2412.09959
Towards Consistent and Efficient Dataset Distillation via Diffusion-Driven Selection
Dataset distillation provides an effective approach to reduce memory and computational costs by optimizing a compact dataset that achieves performance comparable to the full original. However, for large-scale datasets and complex deep networks (e.g., ImageNet-1K with ResNet-101), the vast optimization space hinders dis...
cs.CV
2026-06-26T00:00:00
Xinhao Zhong, Shuoyang Sun, Zhaoyang Xu, Xulin Gu, Bin Chen, Min Zhang, Yaowei Wang
no_new_dataset
false
0.961972
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2501.07526
Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization
Distributed-memory implementations of numerical optimization algorithms, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where communication is more expensive than computation, the scalability and performance of th...
cs.DC stat.ML
2026-06-26T00:00:00
Aditya Devarakonda, Ramakrishnan Kannan
no_new_dataset
false
0.961576
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2501.13955
Guided Persona-based AI Surveys: Can we replicate personal mobility preferences at scale using LLMs?
This study explores the potential of Large Language Models (LLMs) to generate artificial surveys, with a focus on personal mobility preferences in Germany. By leveraging LLMs for synthetic data creation, we aim to address the limitations of traditional survey methods, such as high costs, inefficiency and scalability ch...
cs.CL cs.AI cs.CY
2026-06-26T00:00:00
Ioannis Tzachristas, Santhanakrishnan Narayanan and Constantinos Antoniou
no_new_dataset
false
0.615637
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2501.19274
GO: The Great Outdoors Multimodal Dataset
The Great Outdoors (GO) dataset is a multi-modal annotated data resource aimed at advancing ground robotics research in unstructured environments. Existing off-road datasets often lack sensor diversity and exclude vital modalities like thermal and radar that are critical for operation in degraded conditions (e.g., low ...
cs.RO
2026-06-26T00:00:00
Peng Jiang, Kasi Viswanath, Akhil Nagariya, George Chustz, Maggie Wigness, Philip Osteen, Timothy Overbye, Christian Ellis, Long Quang, Jia Huang, Srikanth Saripalli
new_dataset
true
0.970379
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2502.06890
LLMs for Drug-Drug Interaction Prediction: A Comprehensive Comparison
The increasing volume of drug combinations in modern therapeutic regimens needs reliable methods for predicting drug-drug interactions (DDIs). While Large Language Models (LLMs) have revolutionized various domains, their potential in pharmaceutical research, particularly in DDI prediction, remains largely unexplored. T...
cs.LG cs.AI q-bio.QM
2026-06-26T00:00:00
Gabriele De Vito, Filomena Ferrucci, Athanasios Angelakis
no_new_dataset
false
0.950446
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2504.13432
Circular Quasiconformal Deturbulence: Geometry-Based Restoration from Multiple Turbulent Frames
Imaging through inhomogeneous media often results in severe distortions, posing significant challenges to downstream image-processing tasks. The lack of clean paired images makes supervised learning impractical, motivating unsupervised restoration approaches. In this work, we propose the Circular Quasi-Conformal Deturb...
cs.CV
2026-06-26T00:00:00
Chu Chen, Han Zhang, Lok Ming Lui
no_new_dataset
false
0.963591
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2505.20178
No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference
Prediction-Powered Inference (PPI) is a popular strategy for combining gold-standard and possibly noisy pseudo-labels to perform statistical estimation. Prior work has shown an asymptotic "free lunch" for PPI++, an adaptive form of PPI, showing that the asymptotic variance of PPI++ is always less than ...
stat.ML cs.LG
2026-06-26T00:00:00
Pranav Mani, Peng Xu, Zachary C. Lipton, Michael Oberst
no_new_dataset
false
0.963184
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2507.03122
Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings
This study investigates the feasibility and performance of federated learning (FL) for multi-label ICD code classification using clinical notes from the MIMIC-IV dataset. Unlike previous approaches that rely on centralized training or fine-tuned large language models, we propose a lightweight and scalable pipeline comb...
cs.IR cs.CL cs.LG
2026-06-26T00:00:00
Binbin Xu, Gérard Dray
no_new_dataset
false
0.96394
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2508.08005
Learning to Select Maximum Clique Algorithms: From Traditional Machine Learning to a Dual-Channel Hybrid Neural Architecture
The Maximum Clique Problem (MCP) is an NP-hard problem with wide-ranging applications in fields such as bioinformatics, network science, and social computing, yet no single algorithm consistently outperforms all others across diverse graph instances. This underscores the critical need for instance-aware algorithm selec...
cs.LG cs.AI
2026-06-26T00:00:00
Xiang Li, Shanshan Wang, Chenglong Xiao
new_dataset
true
0.966797
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2508.17916
EndoUFM: Utilizing Foundation Models for Monocular depth estimation of endoscopic images
Depth estimation is a foundational component for 3D reconstruction in minimally invasive endoscopic surgeries. However, existing monocular depth estimation techniques often exhibit limited performance under the varying illumination and complex textures of the surgical environment. While applying foundation models offers a...
cs.CV
2026-06-26T00:00:00
Xinning Yao, Bo Liu, Bojian Li, Jingjing Wang, Jinghua Yue, Fugen Zhou
no_new_dataset
false
0.968252
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2508.21221
Uncertainty-Aware Ankle Exoskeleton Control
Lower limb exoskeletons show promise to assist human movement, but their utility is limited by controllers designed for discrete, predefined actions in controlled environments, restricting their real-world applicability. We present an uncertainty-aware control framework that enables ankle exoskeletons to operate safely...
cs.RO
2026-06-26T00:00:00
Fatima Mumtaza Tourk, Bishoy Galoaa, Sanat Shajan, Aaron J. Young, Michael Everett, Max K. Shepherd
no_new_dataset
false
0.943242
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2509.09960
Limited Reference, Reliable Generation: A Two-Component Framework for Tabular Data Generation in Low-Data Regimes
Synthetic tabular data generation is increasingly essential in machine learning, supporting downstream applications when real-world, high-quality tabular data is insufficient. Existing tabular generation approaches, such as generative adversarial networks (GANs) and fine-tuned Large Language Models (LLMs), typically re...
cs.LG cs.AI
2026-06-26T00:00:00
Mingxuan Jiang, Keyang Chen, Yongxin Wang, Yongsheng Zhao, Ziyue Dai, Yicun Liu, Zeping Li, Qiuyang Zhang, Hongyi Nie, Hongbin Zhu, Sen Liu, Guangnan Ye, and Hongfeng Chai
no_new_dataset
false
0.95743
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2510.00586
Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors
Existing data poisoning attacks on retrieval-augmented generation (RAG) systems scale poorly because they require costly optimization of poisoned documents for each target phrase. We introduce Eyes-on-Me, a modular attack that decomposes an adversarial document into reusable **Attention Attractors** and **Focus Regions...
cs.LG cs.CL cs.CR
2026-06-26T00:00:00
Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang, Yun-Nung Chen
no_new_dataset
false
0.92902
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2510.02809
Relevance-Aware Thresholding in Online Conformal Prediction for Time Series
Uncertainty quantification has received considerable interest in recent works in Machine Learning. In particular, Conformal Prediction (CP) gains ground in this field. For the case of time series, Online Conformal Prediction (OCP) becomes an option to address the problem of data distribution shift over time. Indeed, th...
cs.LG cs.AI
2026-06-26T00:00:00
Théo Dupuy and Binbin Xu and Stéphane Perrey and Jacky Montmain and Abdelhak Imoussaten
no_new_dataset
false
0.970307
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2510.17459
Estimating Orbital Parameters of Direct Imaging Exoplanet Using Neural Network
In this work, we propose a flow-matching Markov chain Monte Carlo (FM-MCMC) algorithm for estimating the orbital parameters of exoplanetary systems, especially those in which only one exoplanet is involved. Compared to traditional methods that rely on random sampling within the Bayesian framework, our approach first leverag...
astro-ph.EP astro-ph.GA cs.LG
2026-06-26T00:00:00
Bo Liang, Hanlin Song, Chang Liu, Tianyu Zhao, Yuxiang Xu, Zihao Xiao, Manjia Liang, Minghui Du, Wei-Liang Qian, Li-e Qiang, Peng Xu, Ziren Luo
no_new_dataset
false
0.966104
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2510.20769
CSU-PCAST: A Dual-Branch Transformer Framework for medium-range ensemble Precipitation Forecasting
Accurate medium-range precipitation forecasting is essential for hydrometeorological risk management but remains challenging for both numerical weather prediction (NWP) systems and data-driven models. We present CSU-PCAST, a deep learning-based ensemble forecasting framework for global precipitation prediction. The mod...
physics.ao-ph cs.LG
2026-06-26T00:00:00
Tianyi Xiong, Haonan Chen, Kelly Mahoney, Jingyin Tang, Tim Smith and Janice Bytheway
no_new_dataset
false
0.950369
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2511.18254
UniFlow: Zero-Shot LiDAR Scene Flow for Autonomous Vehicles
LiDAR scene flow is the task of estimating per-point 3D motion between consecutive point clouds. Recent methods achieve centimeter-level accuracy on popular autonomous vehicle (AV) datasets, but are typically only trained and evaluated on a single sensor. In this paper, we aim to learn general motion priors that transf...
cs.CV
2026-06-26T00:00:00
Siyi Li, Qingwen Zhang, Ishan Khatri, Kyle Vedder, Eric Eaton, Deva Ramanan, Neehar Peri
no_new_dataset
false
0.833481
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2512.02652
Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training
Existing methods for expressive music performance rendering, a conditional generation task that aims to generate a human-like performance from a symbolic score, rely on supervised learning over small labeled datasets, which limits scaling of both data volume and model size, despite the availability of vast unlabeled mu...
cs.SD cs.AI cs.MM
2026-06-26T00:00:00
Hong-Jie You, Jie-Jing Shao, Xiao-Wen Yang, Lin-Han Jia, Lan-Zhe Guo, Yu-Feng Li
no_new_dataset
false
0.864372
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2512.04890
Equivariant symmetry-aware head pose estimation for fetal MRI
We present E(3)-Pose, a novel fast pose estimation method that jointly and explicitly models rotation equivariance and object symmetry. Our work is motivated by the challenging problem of accounting for fetal head motion during a diagnostic MRI scan. We aim to enable automatic adaptive prescription of diagnostic 2D MRI...
cs.CV
2026-06-26T00:00:00
Ramya Muthukrishnan, Borjan Gagoski, Aryn Lee, P. Ellen Grant, Elfar Adalsteinsson, Benjamin Billot, Polina Golland
no_new_dataset
false
0.95187
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2512.06401
LLMCFG-TGen: Using LLM-Generated Control Flow Graphs to Automatically Create Test Cases from Use Cases
Appropriate test-case generation is critical in software testing and significantly impacts testing quality. Requirements-Based Test Generation (RBTG) derives test cases from software requirements to verify whether system behavior aligns with user needs and expectations. Requirements are often documented in Natural Lang...
cs.SE
2026-06-26T00:00:00
Zhenzhen Yang, Chenhui Cui, Tao Li, Rubing Huang, Nan Niu, Dave Towey, Shikai Guo
no_new_dataset
false
0.922819
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2601.01084
A UAV-Based Multispectral and RGB Dataset for Multi-Stage Paddy Crop Monitoring in Indian Agricultural Fields
We present a large-scale unmanned aerial vehicle (UAV)-based RGB and multispectral image dataset collected over paddy fields in the Vijayawada region, Andhra Pradesh, India, covering nursery to harvesting stages. We used a 20-megapixel RGB camera and a 5-megapixel four-band multispectral camera capturing red, green, re...
cs.CV eess.IV
2026-06-26T00:00:00
Adari Rama Sukanya, Puvvula Roopesh Naga Sri Sai, Bodduru Neshika, Rimalapudi Sarvendranath
new_dataset
true
0.965152
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2601.01701
Digital Twin-Driven Communication-Efficient Federated Anomaly Detection for Industrial IoT
Anomaly detection is increasingly becoming crucial for maintaining the safety, reliability, and efficiency of industrial systems. Recently, with the advent of digital twins and data-driven decision-making, several statistical and machine-learning methods have been proposed. However, these methods face several challenge...
cs.LG cs.AI
2026-06-26T00:00:00
Mohammed Ayalew Belay, Adil Rasheed, Pierluigi Salvo Rossi
no_new_dataset
false
0.965397
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2601.04390
SciFig: Towards Automating Editable Figure Generation for Scientific Papers
High-quality methodology figures are central to scientific communication, yet they remain difficult and time-consuming to create. Such figures must distill a method's components and information flow into a clear, revisable diagram as the paper evolves. Existing methodology diagram automation systems typically face a tr...
cs.AI
2026-06-26T00:00:00
Siyuan Huang, Yifan Zhou, Yutong Gao, Zi Yin, Juyang Bai, Xinxin Liu, Rama Chellappa, Chun Pong Lau, Cheng Peng, Sayan Nag, Shraman Pramanick
new_dataset
true
0.971241
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2601.12062
Learning Language-Driven Sequence-Level Modal-Invariant Representations for Video-Based Visible-Infrared Person Re-Identification
The core of video-based visible-infrared person re-identification (VVI-ReID) lies in learning sequence-level modal-invariant representations across different modalities. Recent research tends to use modality-shared language prompts generated by CLIP to guide the learning of modal-invariant representations. Despite achi...
cs.CV
2026-06-26T00:00:00
Xiaomei Yang, Antai Liu, Xizhan Gao, Fa Zhu, Sijie Niu, and Giancarlo Fortino
no_new_dataset
false
0.963188
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2601.13632
Resilient Routing: Risk-Aware Dynamic Routing in Smart Logistics via Spatiotemporal Graph Learning
With the rapid development of the e-commerce industry, the logistics network is experiencing unprecedented pressure. The traditional static routing strategy often cannot tolerate traffic congestion and fluctuating retail demand. In this paper, we propose a Risk-Aware Dynamic Routing (RADR) framework which integr...
cs.AI
2026-06-26T00:00:00
Zhiming Xue, Sichen Zhao, Yalun Qi, Xianling Zeng, Zihan Yu
no_new_dataset
false
0.964021
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2602.13939
Adaptive Automatic Model Selection for Demand Forecasting under Heterogeneous Demand Patterns
Demand forecasting is critical for inventory planning, procurement, replenishment, production, and capacity decisions in heterogeneous supply chains. However, selecting the most appropriate model for each demand series remains challenging because performance varies across datasets, demand structures, horizons, and eval...
cs.LG cs.AI
2026-06-26T00:00:00
Adolfo González, Víctor Parada
no_new_dataset
false
0.968194
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2602.16220
SEMixer: Semantics Enhanced MLP-Mixer for Multiscale Mixing and Long-term Time Series Forecasting
Modeling multiscale patterns is crucial for long-term time series forecasting (TSF). However, redundancy and noise in time series, together with semantic gaps between non-adjacent scales, make the efficient alignment and integration of multi-scale temporal dependencies challenging. To address this, we propose SEMixer, ...
cs.LG
2026-06-26T00:00:00
Xu Zhang, Qitong Wang, Peng Wang, Wei Wang
no_new_dataset
false
0.967578
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2602.18446
ReportLogic: Evaluating Logical Quality in Deep Research Reports
Users increasingly rely on Large Language Models (LLMs) for Deep Research, using them to synthesize diverse sources into structured reports that support understanding and action. In this context, the practical reliability of such reports hinges on logical quality: whether the report's claims and arguments are explicitl...
cs.CL cs.AI
2026-06-26T00:00:00
Jujia Zhao, Zhaoxin Huan, Zihan Wang, Xiaolu Zhang, Jun Zhou, Suzan Verberne, and Zhaochun Ren
new_dataset
true
0.976244
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2602.18900
PrivacyBench: Privacy Isn't Free in Hybrid Privacy-Preserving Vision Systems
Privacy-preserving machine learning deployments in sensitive deep learning applications, from medical imaging to autonomous systems, increasingly require combining multiple techniques. Yet, practitioners lack systematic guidance to assess the synergistic and non-additive interactions of these hybrid configurations, rel...
cs.CR cs.CV
2026-06-26T00:00:00
Nnaemeka Obiefuna and Samuel Oyeneye and Similoluwa Odunaiya and Iremide Oyelaja and Steven Kolawole
no_new_dataset
false
0.927117
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2603.01195
VisNec: Measuring and Leveraging Visual Necessity for Multimodal Instruction Tuning
The effectiveness of multimodal instruction tuning depends not only on dataset scale, but critically on whether training samples genuinely require visual reasoning. However, existing instruction datasets often contain a substantial portion of visually redundant samples (solvable from text alone), as well as multimodall...
cs.CV cs.AI
2026-06-26T00:00:00
Mingkang Dong, Hongyi Cai, Jie Li, Sifan Zhou, Bin Ren, Kunyu Peng, Yuqian Fu
no_new_dataset
false
0.769431
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2603.01461
UltraStar: Semantic-Aware Star Graph Modeling for Echocardiography Navigation
Echocardiography is critical for diagnosing cardiovascular diseases, yet the shortage of skilled sonographers hinders timely patient care, due to high operational difficulties. Consequently, research on automated probe navigation has significant clinical potential. To achieve robust navigation, it is essential to lever...
cs.CV
2026-06-26T00:00:00
Teng Wang, Haojun Jiang, Chenxi Li, Diwen Wang, Yihang Tang, Zhenguo Sun, Yujiao Deng, Shiji Song, Gao Huang
no_new_dataset
false
0.725539
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2603.24991
Towards Video Anomaly Detection from Event Streams: A Baseline and Benchmark Datasets
Event-based vision, characterized by low redundancy, focus on dynamic motion, and inherent privacy-preserving properties, naturally fits the demands of video anomaly detection (VAD). However, the absence of dedicated event-stream anomaly detection datasets and effective modeling strategies has significantly hindered pr...
cs.CV
2026-06-26T00:00:00
Peng Wu, Yuting Yan, Guansong Pang, Yujia Sun, Qingsen Yan, Peng Wang, Yanning Zhang
new_dataset
true
0.963291
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2604.05920
Reference Energies for Non-Relativistic Core Ionization Potentials
Deep-lying core electrons carry highly localized, site-specific information that forms the basis of X-ray photoelectron spectroscopy. Accurately predicting their associated core ionization potentials (IPs) is a demanding theoretical task, requiring a balanced treatment of strong orbital relaxation, electron correlation...
physics.chem-ph cond-mat.mtrl-sci nucl-th
2026-06-26T00:00:00
Antoine Marie and Loris Burth and Pierre-François Loos
new_dataset
true
0.95753
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2604.08448
AfriVoices-KE: A Multilingual Speech Dataset for Kenyan Languages
AfriVoices-KE is a large-scale multilingual speech dataset comprising approximately 3,000 hours of audio across five Kenyan languages: Dholuo, Kikuyu, Kalenjin, Maasai, and Somali. The dataset includes 750 hours of scripted speech and 2,250 hours of spontaneous speech, collected from 4,777 native speakers across divers...
cs.CL
2026-06-26T00:00:00
Lilian Wanzare, Cynthia Amol, Ezekiel Maina, Nelson Odhiambo, Hope Kerubo, Leila Misula, Vivian Oloo, Rennish Mboya, Edwin Onkoba, Edward Ombui, Joseph Muguro, Ciira wa Maina, Andrew Kipkebut, Alfred Omondi Otom, Ian Ndung'u Kang'ethe, Angela Wambui Kanyi, Brian Gichana Omwenga
new_dataset
true
0.969665
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2604.17420
TransXion: A High-Fidelity Graph Benchmark for Realistic Anti-Money Laundering
Money laundering poses severe risks to global financial systems, driving the widespread adoption of machine learning for transaction monitoring. However, progress remains stifled by the lack of realistic benchmarks. Existing transaction-graph datasets suffer from two pervasive limitations: (i) they provide sparse node-...
cs.LG cs.AI cs.SI
2026-06-26T00:00:00
Keyang Chen, Mingxuan Jiang, Yongsheng Zhao, Zeping Li, Zaiyuan Chen, Weiqi Luo, Zhixin Li, Sen Liu, Yinan Jing, Guangnan Ye, Xihong Wu, Hongfeng Chai
new_dataset
true
0.969492
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2605.03680
Real Image Denoising with Knowledge Distillation for High-Performance Mobile NPUs
While deep-learning-based image restoration has achieved unprecedented fidelity, deployment on mobile Neural Processing Units (NPUs) remains bottlenecked by operator incompatibility and memory-access overhead. We propose an NPU-aware hardware-algorithm co-design approach for real-world image denoising on mobile NPUs. O...
cs.CV cs.LG
2026-06-26T00:00:00
Faraz Kayani, Sarmad Kayani, Asad Ahmed, Radu Timofte, and Dmitry Ignatov
new_dataset
true
0.945709
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2605.13693
StayStill: a large-scale 3D idle animation dataset
Idle animations are essential for virtual characters, as they convey realistic behaviour during inactive states. While automatic animation generation has been widely studied, limited attention has been given to idle motion due to the absence of dedicated training datasets. We introduce StayStill, a large-scale dataset ...
cs.GR
2026-06-26T00:00:00
Eneko Atxa Landa, Igor Rodriguez, Elena Lazkano, Taras Kucherenko
new_dataset
true
0.973324
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2605.24417
LLMTabBench: Evaluating LLMs on Binary Tabular Classification From Zero to Few Shots
Supervised classification on tabular data remains a central machine learning task, but its dependence on large labeled datasets limits its applicability in data-scarce settings. Few-shot methods such as TabPFN achieve strong performance through large-scale synthetic pretraining, yet still require labeled context exampl...
cs.LG
2026-06-26T00:00:00
Daria Grushina, Kseniia Kuvshinova, Alina Kostromina, Aziz Temirkhanov, Mile Mitrovic, Dmitry Simakov
new_dataset
true
0.965517
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2605.24696
CALIBURN: Operationally Calibrated Streaming Intrusion Detection with Regime-Dependent Conformal Risk Control
Streaming intrusion detection systems must process flows continuously under bounded memory, yet most leave alerting-threshold selection as a post-hoc tuning problem incompatible with production, where operators commit in advance to alert budgets, misclassification costs, and Service Level Objectives. We present CALIBUR...
cs.CR cs.LG
2026-06-26T00:00:00
Michel A. Youssef
no_new_dataset
false
0.868595
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.00827
Beyond Independent Manipulation: Individual Fairness-aware Strategic Classification with Peer Imitation
Strategic classification (SC) investigates scenarios where agents manipulate their features to obtain favorable decisions from predictive models. Existing fairness-aware SC approaches primarily focus on group fairness and typically assume that agents respond independently. However, when individual fairness is required,...
cs.LG cs.AI
2026-06-26T00:00:00
Xinpeng Lv, Chunyuan Zheng, Yunxin Mao, Renzhe Xu, Jinxuan Yang, Yuanlong Chen, Wangrong Huang, Shaowu Yang, Wenjing Yang, Xinwang Liu, Peng Cui, Haotian Wang
no_new_dataset
false
0.956471
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.03549
How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration
Hyperparameter optimization (HPO) for Random Forest faces a specific difficulty in tuning the number of trees: the predictive score typically improves monotonically with ensemble size, so standard methods such as Tree-structured Parzen Estimator (TPE) and Hyperband require a predefined search range and often drive the ...
cs.LG math.PR
2026-06-26T00:00:00
Vadim Porvatov, Andrey Dukhovny, Andrey Lange
no_new_dataset
false
0.963651
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.06408
MODIS Thermal Infrared Sounding (MOTIS): Estimating Tropical Cyclone Central Pressure from Warm-Core Anomalies
This study presents a novel framework for estimating the central sea-level pressure ($P_\mathrm{c}$) of tropical cyclones (TCs) using infrared radiometers. We leverage the long-overlooked combination of high spatial resolution and sounding capability of the Moderate Resolution Imaging Spectroradiometer (MODIS) to measu...
physics.ao-ph
2026-06-26T00:00:00
Jinghuai Yao, Chi Yan Kwok, Puyuan Du, Yubo Wang, and Derrick Herndon
new_dataset
true
0.958641
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.12716
Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review
The integration of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) into scientific peer-review workflows introduces novel and significant risks for adversarial manipulation, especially given the multimodal nature of scientific papers where figures, not just text, convey core evidence. This creates a significan...
cs.CL
2026-06-26T00:00:00
Xinyu Zhao, Rana Muhammad Shahroz Khan, Zhen Xu, Zhen Tan, Tianlong Chen
new_dataset
true
0.976679
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.13042
Augmentation techniques for video surveillance in the visible and thermal spectral range
In intelligent video surveillance, cameras record image sequences during day and night. Commonly, this demands different sensors. To achieve a better performance it is not unusual to combine them. We focus on the case that a long-wave infrared camera records continuously and in addition to this, another camera records ...
cs.AI cs.CV
2026-06-26T00:00:00
Vanessa Buhrmester, Ann-Kristin Grosselfinger, David Munch, and Michael Arens
no_new_dataset
false
0.876493
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.14668
When to Write and When to Suppress: Route-Specialized Dual Adapters for Memory-Assisted Knowledge Editing
Knowledge editing systems must update selected facts while preserving nearby but irrelevant behavior. This paper studies this problem in a memory-assisted setting where an edit memory is retrieved at inference time and a parameter-efficient adapter corrects the model's object preference. We argue that the central desig...
cs.LG
2026-06-26T00:00:00
Baijia Zhang, Yining Huang
no_new_dataset
false
0.961054
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.16325
Attention-Based Prototype Calibration for Multi-Rater Few-Shot Medical Image Segmentation
Few-shot medical image segmentation methods typically assume a single ground-truth annotation, overlooking systematic variability across expert raters commonly observed in clinical datasets. We propose an attention-based prototype calibration framework for few-shot multi-rater segmentation that models rater-specific de...
cs.CV
2026-06-26T00:00:00
Truong Vu, Minh Khoi Ho, Yutong Xie
no_new_dataset
false
0.959122
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.21097
GRAG: Generic Response-Augmented Generation Framework for Personalized Conversational Systems
Deploying highly capable personalized conversational agents in resource-constrained or privacy-sensitive environments remains a significant challenge. We identify a fundamental bottleneck in the existing approaches: current training paradigms treat personalization and grounding as a single monolithic learning problem. ...
cs.CL cs.LG
2026-06-26T00:00:00
Junfeng Liu, Christopher T. Symons, Ranga Raju Vatsavai
no_new_dataset
false
0.954078
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.21649
EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory
Existing embedding models are inherently static: they encode text segments in isolation, ignoring their surrounding context and temporal order. This paper introduces EvoEmbedding, a novel embedding model that generates evolvable representations for retrieval. It is tailored for long-context scenarios, where information...
cs.CL
2026-06-26T00:00:00
Chang Nie, Chaoyou Fu, Junlan Feng, Caifeng Shan
new_dataset
true
0.970956
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.22076
Learning Cross-View Semantic Priors for Single-Reference Unseen Object Pose Estimation
Single-reference unseen object 6D pose estimation reduces object onboarding by estimating poses of arbitrary novel objects from only one reference view. Recent correspondence-based pipelines have achieved robust performance with vision foundation model (VFM) features. However, they typically treat these features as int...
cs.CV
2026-06-26T00:00:00
Jiahong Chen, Jinghao Wang, Ziwen Wang, Zi Wang, Banglei Guan and Qifeng Yu
no_new_dataset
false
0.944891
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.22537
NegAS: Negative Label Guided Attention and Scoring for Out-of-Distribution Object Detection with Vision-Language Models
Out-of-Distribution (OOD) detection is essential for ensuring the robustness and reliability of object detection systems deployed in safety-critical applications. While prior research has mainly focused on uni-modal detectors or vision-language model (VLM) based classifiers, the potential of VLM-based object detectors ...
cs.CV
2026-06-26T00:00:00
Yingjie Zhang, Shuai Li, Peng Wang
no_new_dataset
false
0.952479
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.24890
Small edits, large models: How Wikipedia advocacy shapes LLM values
Can a small group of volunteers shape how AI systems discuss animal welfare, just by editing Wikipedia? We show that they can. Wikipedia appears in nearly every major language model training dataset and is weighted more heavily than web-crawled text. The Pro-Animal Wikipedians (PAW), a group of advocates who add source...
cs.CL cs.AI cs.CY
2026-06-26T00:00:00
Jasmine Brazilek, Maria Navas, Alexa Gnauck
no_new_dataset
false
0.891778
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.25006
Scalable Peptide Design via Memory-Efficient Equivariant Transformer
Target-specific peptide design requires sequence and structure co-design under full atom geometric constraints. Latent generative frameworks offer an effective route for this problem by compressing fine grained atomic structures into block level latent representations and performing conditional generation in a compact ...
cs.LG
2026-06-26T00:00:00
Rui Jiao, Xiangzhe Kong, Yinjun Jia, Yijia Zhang, Ziyi Yang, Yang Liu and Jianzhu Ma
no_new_dataset
false
0.945915
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.25832
MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources
Achieving strong optimization generalization across diverse optimization problems while requiring limited training resources remains a challenging problem for optimization-oriented large language models (LLMs). Existing approaches typically rely on large-scale supervised datasets, costly reasoning annotations, and expe...
cs.LG cs.AI
2026-06-26T00:00:00
Ke Zhao, Zixiang Di, Hong Qian, Xiang Shu, Yaolin Wen, Qitao Shi, Bingdong Li, Xingyu Lu, Xiangfeng Wang, Jun Zhou, Ke Tang, Yang Yu
no_new_dataset
false
0.952707
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.25996
Autodata: An agentic data scientist to create high quality synthetic data
We introduce Autodata, a general method that enables AI agents to act as data scientists who build high quality training and evaluation data. We show how to train (meta-optimize) such a data scientist agent, so that it learns to create even stronger data. We describe the overall formulation, and a specific practical im...
cs.AI cs.CL cs.LG
2026-06-26T00:00:00
Ilia Kulikov, Chenxi Whitehouse, Tianhao Wu, Yixin Nie, Swarnadeep Saha, Eryk Helenowski, Weizhe Yuan, Olga Golovneva, Jack Lanchantin, Yoram Bachrach, Jakob Foerster, Xian Li, Han Fang, Sainbayar Sukhbaatar, Jason Weston
no_new_dataset
false
0.905912
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26099
Benchmarking Open-Weight Foundation Models for Global AI Technical Governance
Large language models (LLMs) are increasingly deployed in artificial intelligence (AI) governance analysis across national and international organisations. There is, however, growing evidence that such models produce significantly less accurate responses for countries that are underrepresented in their training data, a ...
cs.CY cs.AI
2026-06-26T00:00:00
Jason Hung
new_dataset
true
0.955401
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26101
Know2Guess: A Contamination-Aware Multi-Zone Benchmark for Knowledge-Boundary Evaluation in Large Language Models
Reliable evaluation of large language models should separate supported answering from unsupported guessing without conflating either with data contamination, prompt idiosyncrasy, or generic refusal behavior. We present a contamination-aware, multi-zone benchmark for measuring the transition from answerable knowledge to...
cs.CL cs.AI
2026-06-26T00:00:00
Renwei Meng, Bowen Zhang, Jian Wang, Xican Wang, Haoyi Wu, Xuanyan Qiu, and Shengan Yang
new_dataset
true
0.970531
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26102
Helpfulness Hurts: Domain-Dependent Degradation of Mid-Trained Compassion Values Under Post-Training
Standard post-training pipelines apply supervised fine-tuning (SFT) and reinforcement learning (RL) to make language models helpful, but these processes may inadvertently degrade values instilled during pre-training. We investigate whether the domain of post-training data differentially affects the retention of animal ...
cs.CL cs.AI cs.CY
2026-06-26T00:00:00
Jasmine Brazilek, Juliana Seawell
no_new_dataset
false
0.805479
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26103
Investigating LLM's Problem Solving Capability -- a Study on Statics Questions
Large Language Models (LLMs) have rapidly influenced many aspects of society, particularly education, due to their demonstrated ability to complete assignments and examinations across a wide range of subjects. Although prior studies have examined the educational impact of LLMs, much of the existing work relies on publi...
cs.CL cs.AI
2026-06-26T00:00:00
Tanner Culleton and Hung-Fu Chang
new_dataset
true
0.95255
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26107
Low Resource Multimodal Translation of Nepali Spoken Words into Emotion-Conditioned Sign Language Avatars
Sign language communication systems that integrate emotional expression remain underexplored, particularly for low-resource languages. This pilot study presents NEST-V1 (Nepali Emotion and Speech Transformer - Version 1), a proof-of-concept multimodal framework that demonstrates the feasibility of generating emotion-c...
cs.CL cs.AI
2026-06-26T00:00:00
Jatin Bhusal and Salma Tamang
no_new_dataset
false
0.829837
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26108
Where Larger Models Excel: The Primacy of Constraint-Guided Reasoning
Larger language models consistently outperform smaller ones on reasoning benchmarks, yet the reasoning differences underlying this gap remain underexplored. Across benchmarks in mathematics, physics, chemistry, and programming, we observe stable performance gaps: averaged over datasets, Qwen3-32B outperforms Qwen3-8B b...
cs.CL
2026-06-26T00:00:00
Guan-Yi Lin, Hen-Hsen Huang
no_new_dataset
false
0.944968
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26130
Thinking Like a Scientist? A Structural Study of LLM-Generated Research Methods
Large Language Models (LLMs) are increasingly used to guide research methodology, yet their default methodological tendencies under minimal prompting remain unclear. Here, we prompt GPT-5.1, Gemini 3 Pro, and DeepSeek-V3.2 with an LLM-extracted research question from each of 1,000 recent arXiv computer-science papers a...
cs.CL cs.AI cs.DL
2026-06-26T00:00:00
Francesca Carlon, Brecht Verbeken, Vincent Ginis, Andres Algaba
no_new_dataset
false
0.817338
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26151
Unsupervised Memory-Enhanced Video Transformers: Obstacle Detection for Autonomous Agricultural Rover
While autonomous rovers have become indispensable to precision farming, achieving consistent operational safety remains a critical challenge. Conventional safety sensors, such as LiDAR, fail to detect obstacles positioned below the plant canopy, posing a significant risk. While camera-based supervised learning methods ...
cs.RO cs.AI
2026-06-26T00:00:00
Théo Biardeau (XLIM-ASALI, UFR SFA (Poitiers)), Anne-Sophie Capelle-Laizé (UP, XLIM-ASALI, XLIM-ASALI), Salwan Alwan, David Helbert (UFR SFA (Poitiers), XLIM-ASALI, LabCom I3M (Poitiers))
no_new_dataset
false
0.890574
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26165
Predicting Fruit Quality with a Hybrid Machine Learning and Image Processing Approach
Fruit spoilage is a significant issue in agriculture, leading to substantial economic losses. Addressing this, our study introduces a hybrid approach combining image processing and deep learning to assess fruit freshness. We developed an image processing algorithm that quantifies spoilage on a scale from 0 (fully fresh...
cs.CV
2026-06-26T00:00:00
Amir Reza Hashemi, Shahram Amiri
no_new_dataset
false
0.944468
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26168
Implementation of reinforcement learning in chemical reaction networks: application to phototaxis as curiosity-driven exploration
Living systems navigate environments using noisy and incomplete sensory signals. In unicellular algae, phototaxis is often modeled as a mechanistic run–tumble process driven by stimulus–response rules. However, such descriptions overlook how organisms actively sample their environment to reduce sensory ambiguity. Fro...
cs.LG q-bio.QM
2026-06-26T00:00:00
Ruyi Tang, Grégoire Sergeant-Perthuis (LCQB-AG), David Colliaux
no_new_dataset
false
0.94643
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26169
Neural Architecture Search for Generative Adversarial Networks: A Comprehensive Review and Critical Analysis
Neural Architecture Search (NAS) has emerged as a pivotal technique in optimizing the design of Generative Adversarial Networks (GANs), automating the search for effective architectures while addressing the challenges inherent in manual design. This paper provides a comprehensive review of NAS methods applied to GANs, ...
cs.LG cs.AI
2026-06-26T00:00:00
Abrar Alotaibi, Moataz Ahmed
no_new_dataset
false
0.976102
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26171
LCG: Long-Context Consistent Image Generation with Sparse Relational Attention
Recent image generation models achieve impressive quality in single-image synthesis, but often fail to maintain consistency across sequential outputs, as required in comics, storyboards, and visual narratives. We propose Long-Context Generation (LCG), a framework for long-context multi-image text-to-image generation, t...
cs.CV cs.AI
2026-06-26T00:00:00
Zihao Wang, Yijia Xu, Haoze Zheng, Xuran Ma, Haokun Gui, and Harry Yang
new_dataset
true
0.977465
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26176
Toward Mitigating Process-Induced Performance Degradation in 3.5D Heterogeneous Packages via Pre-Silicon Firmware Co-Optimization
This paper presents a pre-silicon analysis of XRM-SSD V24/V7.0, a physics-aware predictive firmware scheduling layer for Intel's 3.5D heterogeneous integrated packages (Foveros Direct 3D + PowerVia + EMIB-T + UCIe + HBM5). Using detailed thermal-electrical co-simulation over a 90,000-step LLM inference dataset, we show...
cs.AR
2026-06-26T00:00:00
Chi Fei Chung (Dollarchip Technology Inc.), Nikolai Nedovodin (STARGA Inc.)
no_new_dataset
false
0.805611
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26179
KG-TRACE: A Neuro-Symbolic Framework for Mechanistic Grounding in Antimicrobial Resistance Prediction
While WGS-based AMR prediction has reached high accuracy, existing models lack a mechanism to ground neural attributions in established biological pathways. We present KG-TRACE, a novel neuro-symbolic framework that integrates the WHO mutation knowledge graph (KG) as a structured biological constraint on a neural genom...
cs.LG cs.AI q-bio.QM
2026-06-26T00:00:00
Naman Garg, Sarika Jain, Sourav Yadav, Bharat K. Bhargava, Ghanapriya Singh, Abhishek Srivastava, Parimal Kar
no_new_dataset
false
0.739974
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26192
Federated Hash Projected Latent Factor Learning
Hash Learning (HL) is an efficient representation learning approach that maps real-valued data into compact binary representations. Traditional HL methods typically require users to upload personal data to a central server, which is incompatible with increasingly stringent data security regulations. Federated Learning ...
cs.LG cs.CR
2026-06-26T00:00:00
Jialan He
no_new_dataset
false
0.969032
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26195
Soroll-IA: A Weakly Labeled Audio Dataset for Real-World Industrial Port Monitoring
Soroll-IA is a weakly labeled environmental audio dataset recorded in a real-world industrial port environment in Valencia (Spain) using two fixed sensing nodes. The dataset comprises approximately 22 hours of audio segmented into 7,396 clips and covers 26 sound event classes representative of industrial port acoustic ...
cs.SD
2026-06-26T00:00:00
Javier Naranjo-Alcazar, Jordi Grau-Haro, Ruben Ribes-Serrano, Marta Garcia-Ballesteros, Pedro Zuccarello
new_dataset
true
0.969195
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26201
OmniContact: Chaining Meta-Skills via Contact Flow for Generalizable Humanoid Loco-Manipulation
Learning long-horizon humanoid loco-manipulation poses a dual challenge: it requires not only the robust execution of meta-skills but also their seamless, closed-loop chaining equipped with autonomous recovery. Existing approaches remain limited: explicit humanoid-object interaction representations offer precision but ...
cs.RO
2026-06-26T00:00:00
Runyi Yu, Xiaoyi Lin, Ji Ma, Yinhuai Wang, Koukou Luo, Jiahao Ji, Huayi Wang, Wenjia Wang, Runhan Zhang, Ping Tan, Ting Wu, Ruoli Dai, Qifeng Chen, and Lei Han
new_dataset
true
0.968443
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26204
Topology-Informed Neural Networks for Flood Detection in Optical and Synthetic Aperture Radar Imagery
Floods frequently impact regions around the world. Rapid and accurate flood detection is crucial for emergency response and timely mitigation of human and economic loss. The expanding availability of satellite data and advances in artificial intelligence have enhanced monitoring of environmental hazards, but many flood...
cs.LG
2026-06-26T00:00:00
Sophia Li, Max Zhao, Raghu G. Raj, Tianyu Chen
new_dataset
true
0.951033
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26207
The Role of Input Dimensionality in the Emergence and Targeted Control of Adversarial Examples
Several theoretical works have tried to explain the adversarial vulnerability of deep neural networks through properties of high-dimensional geometry. However, the assumptions underlying these works are rarely examined empirically, and systematic evidence remains limited. In this work, we present a systematic study of ...
stat.ML cs.CR cs.LG
2026-06-26T00:00:00
Nasrin Malekzadeh Goradel, Niccolo Pancino, Yaser Gholizade Atani, Benedetta Tondi, Giovanni Bellettini, Mauro Barni
no_new_dataset
false
0.955921
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26211
Data Facts: A Metadata Schema for Structured Data Exchange in the NANDini Multi-Agent Ecosystem
NANDini (Networked Agents Natural Distillation of Interconnected Nodal Intelligence) envisions an automated ecosystem where intelligent agents independently create, process, and exchange data to drive decisions at scale. Realizing this vision requires infrastructure beyond agent discovery and communication: agents must...
cs.CR
2026-06-26T00:00:00
Jin Gao, Maria Gorskikh, Pradyumna Chari, Brittany Box, Mukul Kemla, Pratik Behera, Abhishek Mehta, Ramesh Raskar
new_dataset
true
0.852629
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26257
Dataset Usage Inference without Shadow Models or Held-out Data
How much of my data was used to train a machine learning model? Dataset Usage Inference (DUI) aims to answer this by estimating what fraction of a dataset contributed to a model's training. However, existing DUI methods rely on assumptions that rarely hold in practice: they require training expensive shadow models to i...
cs.LG
2026-06-26T00:00:00
Wojciech Łapacz, Stanisław Pawlak, Jan Dubiński, Franziska Boenisch, and Adam Dziedzic
no_new_dataset
false
0.950233
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26260
A multi-task spatiotemporal deep neural network for predicting penetration depth and morphology in laser welding
In laser penetration welding, the assessment of penetration state and weld seam morphology plays a crucial role in determining the weld quality. This paper presents a comprehensive introduction to an innovative multi-task deep learning model that can predict penetration state, depth, and weld seam mor...
cs.CV cs.AI
2026-06-26T00:00:00
Sen Li, Haichao Cui, Chendong Shao, Yaqi Wang, Xinhua Tang
new_dataset
true
0.890045
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26285
TEMPO-Diffusion: Temporally Exposed Malicious Poisoning of Diffusion Models
Noise-based backdoor attacks on diffusion models typically rely on input-time trigger injection, untargeted activation, and out-of-distribution target generation. Such assumptions reduce both the stealthiness and the practical relevance of these attacks. In this work, we present TEMPO-Diffusion, a targeted backdoor fra...
cs.CR cs.AI
2026-06-26T00:00:00
William Aiken, Paula Branco, Guy-Vincent Jourdan, Iosif-Viorel Onut
new_dataset
true
0.969982
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26289
Augmentation with Dilution: A Large-Scale Empirical Study of Human Contributor Ecosystems After AI Coding Agent Adoption
AI coding agents are penetrating open-source software development at an unprecedented pace, yet existing research predominantly treats human contributors as a static backdrop rather than as the subject of inquiry. This paper presents the first large-scale empirical study that takes the human contributor ecosystem as it...
cs.SE
2026-06-26T00:00:00
Weixing Zhang, Bowen Jiang, Anne Koziolek
no_new_dataset
false
0.572945
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26294
The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators
Self-improving agents are state-of-the-art (SOTA) on agentic coding benchmarks and have recently been extended to general domains. However, their search methods generally assume a stationary evaluation criterion: a fixed verifier, benchmark, or labeled dataset that remains valid as the agent improves. This ignores a ce...
cs.LG cs.AI cs.MA cs.NE
2026-06-26T00:00:00
Alex Iacob, Andrej Jovanović, William F. Shen, Daniel Burkhardt, Meghdad Kurmanji, Nurbek Tastan, Lorenzo Sani, Niccolò Alberto Elia Venanzi, Ambroise Odonnat, Zeyu Cao, Bill Marino, Xinchi Qiu, and Nicholas D. Lane
no_new_dataset
false
0.928897
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26295
Beyond Aesthetics: Quantifying Information Loss in Turbid Scenes
Visibility in underwater environments degrades rapidly under turbid conditions, yet the effects on computer-vision models remain unclear. This issue is compounded by reliance on synthetic turbidity datasets, which may misrepresent real-world information loss. To address this gap, we introduce the Turbid Underwater Base...
cs.CV
2026-06-26T00:00:00
Vasiliki Ismiroglou, Stefan H. Bengtson, Tasos Benos, Thomas B. Moeslund, Malte Pedersen
new_dataset
true
0.968034
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26312
Tailor Made Embeddings for Quantum Machine Learning
Autoencoders transformed classical machine learning by solving the curse of dimensionality, enabling principled weight initialization and learning compact, structured representations. In this work, we extend this paradigm to quantum machine learning by introducing a variational autoencoder framework that learns task-sp...
quant-ph cs.CV cs.LG
2026-06-26T00:00:00
Aldo Lamarre and Dominik Šafránek
no_new_dataset
false
0.951024
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26317
Parametric Generalized Adaptive Moment Features (PG-AMF) for Bearing Fault Diagnosis and Machine Health Monitoring
Accurate fault diagnosis of rolling element bearings in rotating machinery is considered essential for ensuring industrial safety and enabling predictive maintenance. Conventional statistical feature-based methods rely on predefined descriptors, whose diagnostic sensitivity is constrained by fixed configurations and li...
eess.SP cs.AI
2026-06-26T00:00:00
Rajeev Kumar
no_new_dataset
false
0.948911
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26337
EMA-FS: Accelerating GBDT Training via Gain-Informed Feature Screening
Gradient Boosted Decision Trees (GBDT), exemplified by LightGBM, spend a dominant fraction of training time (typically 65-70%) constructing per-feature histograms. Existing approaches such as random feature subsampling (feature_fraction) discard features without regard for their predictive utility. We propose EMA-b...
cs.LG
2026-06-26T00:00:00
Yan Song
no_new_dataset
false
0.954801
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26347
Health feature extraction from battery energy storage system field fault data
Health monitoring methods are critical for lithium-ion battery modules connected to the grid to prevent faults that can lead to catastrophic events. However, assessing the health of cells in modules from their operational data presents challenges including variable operating conditions, which directly confound health f...
eess.SY cs.SY
2026-06-26T00:00:00
Clement Wong, Andrew Weng, Xin Hui Ooi, Zhiwen Wan, Jeesoon Choi, Seung Yoon Yang, Heejun Jin, Jason Siegel, Anna Stefanopoulou
new_dataset
true
0.612508
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26363
Bayesian Changepoint Detection for Smart Sensing of Battery Degradation: Cycle-Level Health Indicators and PyMC Implementation
Reliable detection of the onset of accelerated degradation is central to safe and cost-efficient operation of lithium-ion batteries. This paper presents a Bayesian single-changepoint model applied to a simple but physically meaningful cycle-level health indicator (HI), defined as the ratio of charge time to discharge t...
eess.SY cs.SY
2026-06-26T00:00:00
Waldemar Bauer, Anna Jarosz-Kozyro and Jerzy Baranowski
no_new_dataset
false
0.938239
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26377
Verifying Intent and Harm: A Unified Defense Against LLM-Generated Threats
Large language models (LLMs) are increasingly deployed in interactive applications, yet they remain vulnerable to adversarial interactions that induce harmful, deceptive, or policy-violating outputs. Existing defenses typically analyze either user prompts or generated outputs, but not both. However, many real-world att...
cs.CR
2026-06-26T00:00:00
Poojitha Thota, Yun Lei, Santhosh Thangaraj, Siddhartha Reddy Jonnalagadda, Shirin Nilizadeh
no_new_dataset
false
0.940374
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26379
Layer-Specific Prompt Fusion Discovery via Differentiable Search in Vision Foundation Models
Visual prompt tuning has emerged as a parameter-efficient fine-tuning approach for adapting large-scale Vision Transformers (ViTs) to downstream tasks. As its learnable prompts are applied in input and feature spaces, prior to jointly going through attention in transformer layers, the most commonly used scheme for fusi...
cs.CV
2026-06-26T00:00:00
Xi Xiao, Xingjian Li, Yunbei Zhang, Cheng Han, Tianming Liu, Tianyang Wang, Runmin Jiang, Jihun Hamm, Xiao Wang, Min Xu
no_new_dataset
false
0.962961
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26382
Charting the Growth of Social-Physical HRI (spHRI): A Systematic Review Pipeline Augmented by Small Language Models
Social-physical human-robot interaction (spHRI) has grown rapidly across robotics, human-computer interaction, human-robot interaction, and haptics. Yet, fragmented terminology and inconsistent methodologies make systematic synthesis difficult. To support scalable review practices, we evaluated the extent to which smal...
cs.CL cs.AI cs.DL cs.HC cs.RO
2026-06-26T00:00:00
Mayumi Mohan, Ju-Hung Chen, and Alexis E. Block
no_new_dataset
false
0.926668
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26398
DinoLink: A Token-Centric Representation Compression Framework for Bandwidth-Constrained Collaborative V2X Perception
High-precision remote perception is often hindered by the severe bandwidth constraints of Vehicle-to-Everything (V2X) networks. We propose DinoLink, a token-centric compression framework that replaces raw pixel streaming with discrete semantic communication for vehicle-cloud collaborative inference. DinoLink e...
cs.CV
2026-06-26T00:00:00
Tianle Zhu, Haohua Que, Handong Yao, Hongyi Xu and Zhipeng Bao
no_new_dataset
false
0.952268
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26416
Methane-Plume Segmentation From Hyperspectral Satellite Imagery Via Multimodal Deep Learning
Efficient detection of methane plumes is crucial for understanding and mitigating global warming, as accurately identifying and segmenting them in Earth observation imagery remains essential for large-scale monitoring. In this work, we propose a multimodal deep learning model that integrates a feature-guided methane enh...
cs.CV
2026-06-26T00:00:00
Brayan Quintero, Jeferson Acevedo, Samuel Traslavi\~na, Hoover Rueda-Chac\'on
no_new_dataset
false
0.954043
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26422
Estimating Uncertainty in Classifier Performance with Applications to Large Language Models and Nested Data
Researchers increasingly use text classification (supervised models or large language models) to measure constructs from natural language, providing metrics such as recall and precision as evidence of their validity. Yet, though these metrics are point estimates subject to sampling variation, measures of uncertainty ar...
cs.AI
2026-06-26T00:00:00
Kylie Anglin
no_new_dataset
false
0.951272
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26432
Embedding Foundation Model Predictions in Discrete-Choice Models with Structural Guarantees
Tabular foundation models achieve strong accuracy on choice prediction tasks, but their predictions often violate the economic logic those tasks require: raising a price can increase predicted demand, implied willingness-to-pay estimates are frequently negative or implausible, and unavailable alternatives receive nonze...
cs.LG econ.EM
2026-06-26T00:00:00
Yingshuo Wang, Xian Sun, Yanhang Li, Zhichao Fan, Zexin Zhuang
no_new_dataset
false
0.94731
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null
2606.26442
AXLE: A Cloud Infrastructure for Lean 4 Theorem Proving Utilities
We present AXLE (Axiom Lean Engine), a cloud service for Lean 4 proof manipulation, extraction, and verification. Recent progress in AI for mathematics (reinforcement learning pipelines, agentic proving workflows, dataset curation) demands Lean 4 tooling that scales to millions of requests while remaining correct a...
cs.LO cs.AI
2026-06-26T00:00:00
Jimmy Xin, Alex Schneidman, Chris Cummins, Karun Ram, Srihari Ganesh, Jannis Limperg
no_new_dataset
false
0.887967
2026-06-30T02:08:35.223196
librarian-bots/arxiv-new-datasets-modernbert-v4
null
null