AI4Research: A Survey of Artificial Intelligence for Scientific Research

LARG, Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology (1); School of Computer Science and Engineering, Central South University (2); The University of Hong Kong (3); Independent Researcher (4); Fudan University (5); Chinese University of Hong Kong (6); ByteDance Seed (China) (7)

Abstract

Recent advancements in artificial intelligence (AI), particularly in large language models (LLMs) such as OpenAI-o1 and DeepSeek-R1, have demonstrated remarkable capabilities in complex domains such as logical reasoning and experimental coding. Motivated by these advancements, numerous studies have explored the application of AI in the innovation process, particularly in the context of scientific research. These AI technologies primarily aim to develop systems that can autonomously conduct research processes across a wide range of scientific disciplines. Despite these significant strides, a comprehensive survey on AI for Research (AI4Research) remains absent, which hampers our understanding and impedes further development in this field. To address this gap, we present a comprehensive survey and offer a unified perspective on AI4Research. Specifically, the main contributions of our work are as follows: (1) Systematic taxonomy: We first introduce a systematic taxonomy to classify six mainstream tasks in AI4Research. (2) New frontiers: Then, we identify key research gaps and highlight promising future directions, focusing on the rigor and scalability of automated experiments, as well as the societal impact. (3) Abundant resources: Finally, we compile a wealth of open-source resources, including relevant papers, data corpora, and leaderboards. We hope our work will provide the research community with quick access to these resources and stimulate innovative breakthroughs in AI4Research.

Paper List

1. AI for Scientific Comprehension

1.1 Textual Scientific Comprehension

  • Open-retrieval conversational question answering, Qu et al., Other Source Badge
  • A non-factoid question-answering taxonomy, Bolotova et al., Other Source Badge
  • How Well Do Large Language Models Extract Keywords? A Systematic Evaluation on Scientific Corpora, Mansour et al., PDF Badge

1.1.1 Semi-Automatic Scientific Comprehension

  • Scholarchemqa: Unveiling the power of language models in chemical research question answering, Chen et al., arXiv Badge
  • Evaluating and Training Long-Context Large Language Models for Question Answering on Scientific Papers, Hilgert et al., PDF Badge
  • Are plain language summaries more readable than scientific abstracts? Evidence from six biomedical and life sciences journals, Wen et al., Other Source Badge
Human-Guided Scientific Comprehension
  • Clam: Selective clarification for ambiguous questions with generative language models, Kuhn et al., arXiv Badge
  • Clarify when necessary: Resolving ambiguity through interaction with lms, Zhang et al., arXiv Badge
  • Empowering language models with active inquiry for deeper understanding, Pang et al., arXiv Badge
  • Iqa-eval: Automatic evaluation of human-model interactive question answering, Li et al., NeurIPS Badge
  • The ai scientist-v2: Workshop-level automated scientific discovery via agentic tree search, Yamada et al., arXiv Badge
  • Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation, Yang et al., arXiv Badge
Tool-Augmented Scientific Comprehension
  • CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding, Wright et al., ACL Findings Badge
  • Scienceqa: A novel resource for question answering on scholarly articles, Saikh et al., Other Source Badge
  • Human and technological infrastructures of fact-checking, Juneja et al., Other Source Badge
  • Paperqa: Retrieval-augmented generative agent for scientific research, Lala et al., arXiv Badge
  • Efficacy analysis of online artificial intelligence fact-checking tools, Hartley et al., Other Source Badge
  • Language agents achieve superhuman synthesis of scientific knowledge, Skarlinski et al., arXiv Badge
  • Graphusion: a RAG framework for Knowledge Graph Construction with a global perspective, Yang et al., arXiv Badge
  • SciAgent: Tool-augmented Language Models for Scientific Reasoning, Ma et al., PDF Badge
  • Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks, Gosmar et al., arXiv Badge
  • MedBioLM: Optimizing Medical and Biological QA with Fine-Tuned Large Language Models and Retrieval-Augmented Generation, Kim et al., arXiv Badge
  • Towards reasoning era: A survey of long chain-of-thought for reasoning large language models, Chen et al., arXiv Badge
  • Self-Critique Guided Iterative Reasoning for Multi-hop Question Answering, Chu et al., arXiv Badge
  • CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models, Zhang et al., arXiv Badge
Self-guided Scientific Comprehension
  • Boolq: Exploring the surprising difficulty of natural yes/no questions, Clark et al., arXiv Badge
  • SciBERT: A Pretrained Language Model for Scientific Text, Beltagy et al., PDF Badge
  • CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice, Raza et al., BMC Bioinformatics Badge
  • Quaser: Question answering with scalable extractive rationalization, Ghoshal et al., Other Source Badge
  • Spaceqa: Answering questions about the design of space missions and space craft concepts, Garcia-Silva et al., Other Source Badge
  • What if: Generating code to answer simulation questions in chemistry texts, Peretz et al., Other Source Badge
  • Biomedlm: A 2.7 b parameter language model trained on biomedical text, Bolton et al., arXiv Badge
  • Scifibench: Benchmarking large multimodal models for scientific figure interpretation, Roberts et al., arXiv Badge
  • Scholarchemqa: Unveiling the power of language models in chemical research question answering, Chen et al., arXiv Badge
  • Mmsci: A dataset for graduate-level multi-discipline multimodal scientific understanding, Li et al., arXiv Badge
  • Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models, Li et al., PDF Badge
  • What are the essential factors in crafting effective long context multi-hop instruction datasets? insights and best practices, Chen et al., arXiv Badge
  • Fine-Tuning Large Language Models for Scientific Text Classification: A Comparative Study, Rostam et al., Other Source Badge
  • L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?, Tang et al., arXiv Badge
  • Toward expert-level medical question answering with large language models, Singhal et al., Nature Medicine Badge
  • A comprehensive survey on long context language modeling, Liu et al., arXiv Badge
  • A survey on transformer context extension: Approaches and evaluation, Liu et al., arXiv Badge
  • Are plain language summaries more readable than scientific abstracts? Evidence from six biomedical and life sciences journals, Wen et al., Public Understanding of Science Badge

1.1.2 Full-Automatic Scientific Comprehension

Summarization-guided Automatic Scientific Comprehension
  • Straight from the scientist's mouth—plain language summaries promote laypeople's comprehension and knowledge acquisition when reading about individual research findings in psychology, Kerwer et al., Collabra: Psychology Badge
  • Hierarchical attention graph for scientific document summarization in global and local level, Zhao et al., arXiv Badge
  • Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?, Fonseca et al., PDF Badge
  • Autonomous LLM-Driven Research—from Data to Human-Verifiable Research Papers, Ifargan et al., NEJM AI Badge
Self-Questioning & Self-Reflection Automatic Scientific Comprehension
  • Large language models can self-improve, Huang et al., arXiv Badge
  • Selfcheck: Using llms to zero-shot check their own step-by-step reasoning, Miao et al., arXiv Badge
  • Enabling Language Models to Implicitly Learn Self-Improvement, Wang et al., arXiv Badge
  • Sciglm: Training scientific language models with self-reflective instruction annotation and tuning, Zhang et al., arXiv Badge
  • Generating Multiple Choice Questions from Scientific Literature via Large Language Models, Luo et al., Other Source Badge
  • SciQAG: A Framework for Auto-Generated Science Question Answering Dataset with Fine-grained Evaluation, Wan et al., arXiv Badge
  • Recursive introspection: Teaching language model agents how to self-improve, Qu et al., NeurIPS Badge
  • Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models, Song et al., arXiv Badge
  • FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights, Yu et al., arXiv Badge
  • Introspective Growth: Automatically Advancing LLM Expertise in Technology Judgment, Wu et al., arXiv Badge

1.2 Table & Chart Scientific Comprehension

  • How well do large language models understand tables in materials science?, Circi et al., PDF Badge
  • ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models, Newman et al., PDF Badge
  • Sciverse: Unveiling the knowledge comprehension and visual reasoning of lmms on multi-modal scientific problems, Guo et al., arXiv Badge

1.2.1 Table Understanding

  • A survey on table-and-text hybridqa: Concepts, methods, challenges and future directions, Wang et al., arXiv Badge
  • Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding, Wang et al., PDF Badge
  • Improving demonstration diversity by human-free fusing for text-to-SQL, Wang et al., arXiv Badge
  • Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study, Sui et al., PDF Badge
  • Multimodal Table Understanding, Zheng et al., ACL Badge
  • Tree-of-Table: Unleashing the Power of LLMs for Enhanced Large-Scale Table Understanding, Ji et al., arXiv Badge
  • Tablemaster: A recipe to advance table understanding with language models, Cao et al., arXiv Badge
  • A survey of table reasoning with large language models, Zhang et al., Frontiers of Computer Science Badge
  • The Mighty ToRR: A Benchmark for Table Reasoning and Robustness, Ashury-Tahan et al., arXiv Badge
  • Tablebench: A comprehensive and complex benchmark for table question answering, Wu et al., AAAI Badge

1.2.2 Chart Understanding

  • Chartassisstant: A universal chart multimodal language model via chart-to-table pre-training and multitask instruction tuning, Meng et al., arXiv Badge
  • SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers, Pramanick et al., PDF Badge
  • ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning, Masry et al., PDF Badge
  • ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning, Meng et al., PDF Badge
  • SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark, Liang et al., PDF Badge
  • Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models, Li et al., PDF Badge
  • SynChart: Synthesizing Charts from Language Models, Liu et al., arXiv Badge
  • NovaChart: A Large-scale Dataset towards Chart Understanding and Generation of Multimodal Large Language Models, Hu et al., PDF Badge
  • ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild, Masry et al., PDF Badge
  • ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding, Huang et al., arXiv Badge
  • Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework, Yang et al., arXiv Badge

2. AI for Academic Survey

  • Pre-writing: The stage of discovery in the writing process, Rohman et al., Other Source Badge
  • Paper recommender systems: a literature survey, Beel et al., Other Source Badge
  • A Review on Personalized Academic Paper Recommendation, Li et al., Other Source Badge
  • Insights into relevant knowledge extraction techniques: a comprehensive review, Shahid et al., Other Source Badge
  • A survey on rag meeting llms: Towards retrieval-augmented large language models, Fan et al., Other Source Badge
Semantic-Guided Retrieval
  • Scientific paper recommendation: A survey, Bai et al., IEEE Access Badge
  • SPLADE v2: Sparse lexical and expansion model for information retrieval, Formal et al., arXiv Badge
  • Scientific paper recommendation systems: a literature review of recent publications, Kreutz et al., Other Source Badge
  • Clinical Trial Retrieval via Multi-grained Similarity Learning, Luo et al., Other Source Badge
  • Related Work and Citation Text Generation: A Survey, Li et al., PDF Badge
  • MIR: Methodology Inspiration Retrieval for Scientific Research Problems, Garikaparthi et al., arXiv Badge
Graph-Guided Retrieval
  • From who you know to what you read: Augmenting scientific recommendations with implicit social networks, Kang et al., Other Source Badge
  • Comlittee: Literature discovery with personal elected author committees, Kang et al., Other Source Badge
  • Citationsum: Citation-aware graph contrastive learning for scientific paper summarization, Luo et al., Other Source Badge
  • Explaining relationships among research papers, Li et al., arXiv Badge
  • KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction, Boylan et al., arXiv Badge
  • An academic recommender system on large citation data based on clustering, graph modeling and deep learning, Stergiopoulos et al., Knowledge and Information Systems Badge
  • ArZiGo: A recommendation system for scientific articles, Pinedo et al., Information Systems Badge
  • Graphusion: a RAG framework for Knowledge Graph Construction with a global perspective, Yang et al., arXiv Badge
  • Taxonomy Tree Generation from Citation Graph, Hu et al., arXiv Badge
  • Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model, Ye et al., NeurIPS Badge
  • Docs2KG: A Human-LLM Collaborative Approach to Unified Knowledge Graph Construction from Heterogeneous Documents, Sun et al., Other Source Badge
LLM-Augmented Retrieval
  • Paperweaver: Enriching topical paper alerts by contextualizing recommended papers with user-collected papers, Lee et al., Other Source Badge
  • Dynamic Multi-Agent Orchestration and Retrieval for Multi-Source Question-Answer Systems using Large Language Models, Seabra et al., arXiv Badge
  • Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG, Singh et al., arXiv Badge
  • PaSa: An LLM Agent for Comprehensive Academic Paper Search, He et al., arXiv Badge
  • CuriousLLM: Elevating multi-document question answering with llm-enhanced knowledge graph reasoning, Yang et al., ACL Badge
  • Introducing Deep Research, OpenAI, PDF Badge
  • LitLLMs, LLMs for Literature Review: Are we there yet?, Agarwal et al., PDF Badge
  • Select, Read, and Write: A Multi-Agent Framework of Full-Text-based Related Work Generation, Liu et al., arXiv Badge
  • GPT-4o Search Preview, OpenAI, PDF Badge
  • WebDancer: Towards Autonomous Information Seeking Agency, Wu et al., arXiv Badge
  • Iterative self-incentivization empowers large language models as agentic searchers, Shi et al., arXiv Badge
  • Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework, Yang et al., arXiv Badge
  • DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents, Du et al., arXiv Badge
  • AcademicBrowse: Benchmarking Academic Browse Ability of LLMs, Zhou et al., arXiv Badge
  • A Review on Personalized Academic Paper Recommendation, Li et al., Comput. Inf. Sci. Badge
  • Insights into relevant knowledge extraction techniques: a comprehensive review, Shahid et al., The Journal of Supercomputing Badge

2.2 Overview Report Generation

  • Towards automated related work summarization, Hoang et al., Other Source Badge

2.2.1 Research Roadmap Mapping

  • Hierarchical catalogue generation for literature review: a benchmark, Zhu et al., arXiv Badge
  • Assisting in writing wikipedia-like articles from scratch with large language models, Shao et al., arXiv Badge
  • Chime: Llm-assisted hierarchical organization of scientific studies for literature review support, Hsu et al., arXiv Badge
  • Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature, Katz et al., arXiv Badge
  • Understanding Survey Paper Taxonomy about Large Language Models via Graph Representation Learning, Zhuang et al., PDF Badge
  • Artificial intelligence for literature reviews: Opportunities and challenges, Bolanos et al., Artificial Intelligence Review Badge
  • Taxonomy Tree Generation from Citation Graph, Hu et al., arXiv Badge
  • LLMs for Literature Review: Are we there yet?, Agarwal et al., arXiv Badge
  • Autosurvey: Large language models can automatically write surveys, Wang et al., NeurIPS Badge
  • SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing, Yan et al., arXiv Badge
  • Towards reasoning era: A survey of long chain-of-thought for reasoning large language models, Chen et al., arXiv Badge
  • Ai2 Scholar QA: Organized Literature Synthesis with Attribution, Singh et al., arXiv Badge
  • Towards automated related work summarization, Hoang et al., Other Source Badge
  • Capturing relations between scientific papers: An abstractive model for related work section generation, Chen et al., Other Source Badge
  • Target-aware abstractive related work generation with contrastive learning, Chen et al., Other Source Badge
  • The use of a large language model to create plain language summaries of evidence reviews in healthcare: A feasibility study, Ovelman et al., Other Source Badge
  • Related Work and Citation Text Generation: A Survey, Li et al., PDF Badge
  • 376 Using a large language model to create lay summaries of clinical study descriptions, Kaiser et al., Other Source Badge
  • Select, Read, and Write: A Multi-Agent Framework of Full-Text-based Related Work Generation, Liu et al., arXiv Badge
Extractive Related Work
  • Towards automated related work summarization, Hoang et al., Other Source Badge
  • Automatic generation of related work sections in scientific papers: an optimization approach, Hu et al., EMNLP Badge
  • Neural related work summarization with a joint context-driven attention mechanism, Wang et al., arXiv Badge
  • Automatic generation of related work through summarizing citations, Chen et al., Other Source Badge
  • Toc-rwg: Explore the combination of topic model and citation information for automatic related work generation, Wang et al., IEEE Access Badge
  • Automatic Related Work Section Generation by Sentence Extraction and Reordering, Deng et al., Other Source Badge
Generative Related Work
  • Neural related work summarization with a joint context-driven attention mechanism, Wang et al., arXiv Badge
  • Automated lay language summarization of biomedical scientific reviews, Guo et al., AAAI Badge
  • BACO: A background knowledge-and content-based framework for citing sentence generation, Ge et al., ACL Badge
  • Capturing relations between scientific papers: An abstractive model for related work section generation, Chen et al., Other Source Badge
  • Target-aware abstractive related work generation with contrastive learning, Chen et al., Other Source Badge
  • Multi-document scientific summarization from a knowledge graph-centric view, Wang et al., arXiv Badge
  • Controllable citation sentence generation with language models, Gu et al., arXiv Badge
  • Causal intervention for abstractive related work generation, Liu et al., arXiv Badge
  • Cited text spans for citation text generation, Li et al., arXiv Badge
  • Towards a unified framework for reference retrieval and related work generation, Shi et al., EMNLP Findings Badge
  • Explaining relationships among research papers, Li et al., arXiv Badge
  • Shallow synthesis of knowledge in gpt-generated texts: A case study in automatic related work composition, Martin-Boyle et al., arXiv Badge
  • Related work and citation text generation: A survey, Li et al., arXiv Badge
  • RST-LoRA: A Discourse-Aware Low-Rank Adaptation for Long Document Abstractive Summarization, Pu et al., NAACL Badge
  • Reinforced Subject-Aware Graph Neural Network for Related Work Generation, Yu et al., Other Source Badge
  • Disentangling Instructive Information from Ranked Multiple Candidates for Multi-Document Scientific Summarization, Wang et al., Other Source Badge
  • Toward Related Work Generation with Structure and Novelty Statement, Nishimura et al., Other Source Badge
  • Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization, Pratapa et al., arXiv Badge
  • Ask, Retrieve, Summarize: A Modular Pipeline for Scientific Literature Summarization, Achkar et al., arXiv Badge

2.2.3 Document-level Survey Generation

  • Analyzing the past to prepare for the future: Writing a literature review, Webster et al., MIS Quarterly Badge
  • Hierarchical catalogue generation for literature review: a benchmark, Zhu et al., arXiv Badge
  • Bio-sieve: exploring instruction tuning large language models for systematic review automation, Robinson et al., arXiv Badge
  • Litllm: A toolkit for scientific literature review, Agarwal et al., arXiv Badge
  • Assisting in writing wikipedia-like articles from scratch with large language models, Shao et al., arXiv Badge
  • Artificial intelligence for literature reviews: Opportunities and challenges, Bolanos et al., Artificial Intelligence Review Badge
  • Language agents achieve superhuman synthesis of scientific knowledge, Skarlinski et al., arXiv Badge
  • Instruct Large Language Models to Generate Scientific Literature Survey Step by Step, Lai et al., NLPCC Badge
  • Openscholar: Synthesizing scientific literature with retrieval-augmented lms, Asai et al., arXiv Badge
  • Intelligent summaries: Will Artificial Intelligence mark the finale for biomedical literature reviews?, Galli et al., Learned Publishing Badge
  • Autosurvey: Large language models can automatically write surveys, Wang et al., NeurIPS Badge
  • LAG: LLM agents for Leaderboard Auto Generation on Demanding, Wu et al., arXiv Badge
  • SurveyX: Academic Survey Automation via Large Language Models, Liang et al., arXiv Badge
  • Automating research synthesis with domain-specific large language model fine-tuning, Susnjak et al., Other Source Badge
  • SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing, Yan et al., arXiv Badge
  • Pre-writing: The stage of discovery in the writing process, Rohman et al., College Composition & Communication Badge

3. AI for Scientific Discovery

  • Scientific discovery in the age of artificial intelligence, Wang et al., Nature Badge
  • Beyond Benchmarking: Automated Capability Discovery via Model Self-Exploration, Lu et al., Other Source Badge
  • AIRUS: a simple workflow for AI-assisted exploration of scientific data, Harris et al., Other Source Badge
  • On the Rise of New Mathematical Spaces and Towards AI-Driven Scientific Discovery, Raeini et al., Other Source Badge
  • From Reasoning to Learning: A Survey on Hypothesis Discovery and Rule Learning with Large Language Models, He et al., arXiv Badge
  • AI-Driven Discovery: The Transformative Impact of Machine Learning on Research and Development, Roy et al., Other Source Badge

3.1 Idea Mining

  • Can Large Language Models Unlock Novel Scientific Research Ideas?, Kumar et al., arXiv Badge
  • Can llms generate novel research ideas? a large-scale human study with 100+ nlp researchers, Si et al., arXiv Badge
  • LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research, Gu et al., arXiv Badge
  • Large language models for causal hypothesis generation in science, Cohrs et al., Other Source Badge
  • Futuregen: Llm-rag approach to generate the future work of scientific article, Azher et al., arXiv Badge
  • ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition, Liu et al., arXiv Badge
  • Sparks of science: Hypothesis generation using structured paper data, O'Neill et al., arXiv Badge
  • Spark: A System for Scientifically Creative Idea Generation, Sanyal et al., arXiv Badge
  • CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature, Sternlicht et al., arXiv Badge
  • Cognitio Emergens: Agency, Dimensions, and Dynamics in Human-AI Knowledge Co-Creation, Lin et al., arXiv Badge

3.1.1 Idea Mining from Internal Knowledge

  • Ideas are dimes a dozen: Large language models for idea generation in innovation, Girotra et al., Other Source Badge
  • Prompting Diverse Ideas: Increasing AI Idea Variance, Meincke et al., Other Source Badge
  • Using Large Language Models for Idea Generation in Innovation, Meincke et al., Other Source Badge
  • Can llms generate novel research ideas? a large-scale human study with 100+ nlp researchers, Si et al., arXiv Badge
  • Can Large Language Models Unlock Novel Scientific Research Ideas?, Kumar et al., arXiv Badge
  • ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model, Chen et al., arXiv Badge
  • Structuring Scientific Innovation: A Framework for Modeling and Discovering Impactful Knowledge Combinations, Chen et al., arXiv Badge
  • Improving Research Idea Generation Through Data: An Empirical Investigation in Social Science, Liu et al., arXiv Badge
  • Enhance Innovation by Boosting Idea Generation with Large Language Models, Haarmann et al., INFORMS Journal on Computing Badge

3.1.2 Idea Mining from External Signal

Idea Mining from External Knowledge
  • Literature based discovery: models, methods, and trends, Henry et al., Journal of Biomedical Informatics Badge
  • Predicting the Future of AI with AI: High-quality link prediction in an exponentially growing knowledge network, Krenn et al., arXiv Badge
  • A survey of large language models, Zhao et al., arXiv Badge
  • Large language models meet nlp: A survey, Qin et al., arXiv Badge
  • Position: data-driven discovery with large generative models, Majumder et al., ICML Badge
  • Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models, Gu et al., arXiv Badge
  • Interesting scientific idea generation using knowledge graphs and llms: Evaluations with 100 research group leaders, Gu et al., arXiv Badge
  • Scimon: Scientific inspiration machines optimized for novelty, Wang et al., ACL Badge
  • Accelerating scientific discovery with generative knowledge extraction, graph-based representation, and multimodal intelligent graph reasoning, Buehler et al., Other Source Badge
  • Literature meets data: A synergistic approach to hypothesis generation, Liu et al., arXiv Badge
  • Chain of ideas: Revolutionizing research via novel idea development with llm agents, Li et al., arXiv Badge
  • SciPIP: An LLM-based Scientific Paper Idea Proposer, Wang et al., arXiv Badge
  • LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research, Gu et al., arXiv Badge
  • Learning to Generate Research Idea with Dynamic Control, Li et al., arXiv Badge
  • Graph of AI Ideas: Leveraging Knowledge Graphs and LLMs for AI Research Idea Generation, Gao et al., arXiv Badge
  • Sparks of science: Hypothesis generation using structured paper data, O'Neill et al., arXiv Badge
Idea Mining from External Environment Feedback
  • gpt-researcher, Assafelovic et al., PDF Badge
  • Mlagentbench: Evaluating language agents on machine learning experimentation, Huang et al., arXiv Badge
  • Researchagent: Iterative research idea generation over scientific literature with large language models, Baek et al., arXiv Badge
  • Augmenting large language models with chemistry tools, M. Bran et al., Nature Badge
  • MatPilot: an LLM-enabled AI Materials Scientist under the Framework of Human-Machine Collaboration, Ni et al., arXiv Badge
  • The virtual lab: AI agents design new SARS-CoV-2 nanobodies with experimental validation, Swanson et al., bioRxiv Badge
  • Agent laboratory: Using llm agents as research assistants, Schmidgall et al., arXiv Badge
  • LUMI-lab: a Foundation Model-Driven Autonomous Platform Enabling Discovery of New Ionizable Lipid Designs for mRNA Delivery, Cui et al., bioRxiv Badge
  • Towards an AI co-scientist, Gottweis et al., arXiv Badge
  • Zochi Technical Report, AI et al., PDF Badge
  • AgentRxiv: Towards Collaborative Autonomous Research, Schmidgall et al., arXiv Badge
  • Carl Technical Report, Institute et al., PDF Badge
  • Ideasynth: Iterative research idea development through evolving and composing idea facets with literature-grounded feedback, Pu et al., Other Source Badge

3.1.3 Idea Mining from Team Discussion

AI-AI Collaboration
  • Large language models for automated open-domain scientific hypotheses discovery, Yang et al., arXiv Badge
  • Exploring collaboration mechanisms for llm agents: A social psychology view, Zhang et al., arXiv Badge
  • Acceleron: A tool to accelerate research ideation, Nigam et al., arXiv Badge
  • Hypothesis generation with large language models, Zhou et al., arXiv Badge
  • Researchagent: Iterative research idea generation over scientific literature with large language models, Baek et al., arXiv Badge
  • Llm and simulation as bilevel optimizers: A new paradigm to advance physical scientific discovery, Ma et al., arXiv Badge
  • The ai scientist: Towards fully automated open-ended scientific discovery, Lu et al., arXiv Badge
  • Sciagents: Automating scientific discovery through multi-agent intelligent graph reasoning, Ghafarollahi et al., arXiv Badge
  • Two heads are better than one: A multi-agent system has the potential to improve scientific idea generation, Su et al., arXiv Badge
  • Chain of ideas: Revolutionizing research via novel idea development with llm agents, Li et al., arXiv Badge
  • Nova: An iterative planning and search approach to enhance novelty and diversity of llm generated ideas, Hu et al., arXiv Badge
  • The virtual lab: AI agents design new SARS-CoV-2 nanobodies with experimental validation, Swanson et al., bioRxiv Badge
  • AIGS: Generating Science from AI-Powered Automated Falsification, Liu et al., arXiv Badge
  • Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses, Yang et al., Other Source Badge
  • Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback, Yuan et al., arXiv Badge
  • Multi-Novelty: Improve the Diversity and Novelty of Contents Generated by Large Language Models via inference-time Multi-Views Brainstorming, Lagzian et al., arXiv Badge
  • Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation, Sinha et al., arXiv Badge
  • PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration, Pu et al., arXiv Badge
Human-AI Collaboration
  • An Interactive Co-Pilot for Accelerated Research Ideation, Nigam et al., PDF Badge
  • Scideator: Human-LLM Scientific Idea Generation Grounded in Research-Paper Facet Recombination, Radensky et al., arXiv Badge
  • MatPilot: an LLM-enabled AI Materials Scientist under the Framework of Human-Machine Collaboration, Ni et al., arXiv Badge
  • IRIS: Interactive Research Ideation System for Accelerating Scientific Discovery, Garikaparthi et al., arXiv Badge
  • Human creativity in the age of llms: Randomized experiments on divergent and convergent thinking, Kumar et al., Other Source Badge
  • Can Large Language Models Unlock Novel Scientific Research Ideas?, Kumar et al., arXiv Badge
  • Can llms generate novel research ideas? a large-scale human study with 100+ nlp researchers, Si et al., arXiv Badge
  • LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research, Gu et al., arXiv Badge
  • Large language models for causal hypothesis generation in science, Cohrs et al., Other Source Badge
  • Futuregen: Llm-rag approach to generate the future work of scientific article, Azher et al., arXiv Badge
  • ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition, Liu et al., arXiv Badge
  • Sparks of science: Hypothesis generation using structured paper data, O'Neill et al., arXiv Badge
  • Spark: A System for Scientifically Creative Idea Generation, Sanyal et al., arXiv Badge
  • CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature, Sternlicht et al., arXiv Badge
  • Cognitio Emergens: Agency, Dimensions, and Dynamics in Human-AI Knowledge Co-Creation, Lin et al., arXiv Badge

3.2 Novelty & Significance Assessment

  • Does writing with language models reduce content diversity?, Padmakumar et al., arXiv Badge
  • Greater variability in judgements of the value of novel ideas, Johnson et al., Nature Badge
  • How AI ideas affect the creativity, diversity, and evolution of human ideas: evidence from a large, dynamic experiment, Ashkinaze et al., arXiv Badge
  • A content-based novelty measure for scholarly publications: A proof of concept, Wang et al., Other Source Badge
  • Art or artifice? large language models and the false promise of creativity, Chakrabarty et al., Other Source Badge
  • How ai processing delays foster creativity: Exploring research question co-creation with an llm-based agent, Liu et al., Other Source Badge
  • Homogenization effects of large language models on human creative ideation, Anderson et al., Other Source Badge
  • Shared imagination: Llms hallucinate alike, Zhou et al., arXiv Badge
  • Can llms generate novel research ideas? a large-scale human study with 100+ nlp researchers, Si et al., arXiv Badge
  • Supporting Assessment of Novelty of Design Problems Using Concept of Problem SAPPhIRE, Singh et al., arXiv Badge
  • Semi-Supervised Classification With Novelty Detection Using Support Vector Machines and Linear Discriminant Analysis, Dove et al., Other Source Badge
  • Can AI Examine Novelty of Patents?: Novelty Evaluation Based on the Correspondence between Patent Claim and Prior Art, Ikoma et al., arXiv Badge
  • How do Humans and Language Models Reason About Creativity? A Comparative Analysis, Laverghetta Jr et al., arXiv Badge
  • Grapheval: A lightweight graph-based llm framework for idea evaluation, Feng et al., arXiv Badge
  • SCI-IDEA: Context-Aware Scientific Ideation Using Token and Sentence Embeddings, Keya et al., arXiv Badge
  • Enabling ai scientists to recognize innovation: A domain-agnostic algorithm for assessing novelty, Wang et al., arXiv Badge
  • SC4ANM: Identifying optimal section combinations for automated novelty prediction in academic papers, Wu et al., Expert Systems with Applications Badge

3.3 Theory Analysis

3.3.1 Scientific Claim Formalization

  • LF: a foundational higher-order-logic, Goodsell et al., arXiv Badge
  • Natural Language Hypotheses in Scientific Papers and How to Tame Them: Suggested Steps for Formalizing Complex Scientific Claims, Heger et al., Other Source Badge
  • Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning, Yan et al., arXiv Badge
  • Sciclaimhunt: A large dataset for evidence-based scientific claim verification, Kumar et al., arXiv Badge
  • Towards Effective Extraction and Evaluation of Factual Claims, Metropolitansky et al., arXiv Badge
  • NSF-SciFy: Mining the NSF Awards Database for Scientific Claims, Rao et al., arXiv Badge
  • Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks, Ganguly et al., arXiv Badge
  • Valsci: an open-source, self-hostable literature review utility for automated large-batch scientific claim verification using large language models, Edelman et al., BMC bioinformatics Badge

3.3.2 Scientific Evidence Collection

  • MultiVerS: Improving scientific claim verification with weak supervision and full-document context, Wadden et al., arXiv Badge
  • Missing counter-evidence renders NLP fact-checking unrealistic for misinformation, Glockner et al., arXiv Badge
  • Investigating zero-and few-shot generalization in fact verification, Pan et al., arXiv Badge
  • Comparing knowledge sources for open-domain scientific claim verification, Vladika et al., arXiv Badge
  • Understanding Fine-grained Distortions in Reports of Scientific Findings, Wührl et al., arXiv Badge
  • Improving health question answering with reliable and time-aware evidence retrieval, Vladika et al., arXiv Badge
  • Zero-shot scientific claim verification using LLMs and citation text, Alvarez et al., Other Source Badge
  • Grounding fallacies misrepresenting scientific publications in evidence, Glockner et al., arXiv Badge
  • Can foundation models actively gather information in interactive environments to test hypotheses?, Ke et al., arXiv Badge
  • LLM-based Corroborating and Refuting Evidence Retrieval for Scientific Claim Verification, Wang et al., arXiv Badge
  • SciClaims: An End-to-End Generative System for Biomedical Claim Analysis, Ortega et al., arXiv Badge

3.3.3 Scientific Verification Analysis

  • Proofver: Natural logic theorem proving for fact verification, Krishna et al., TACL Badge
  • The state of human-centered NLP technology for fact-checking, Das et al., Information processing & management Badge
  • aedFaCT: Scientific Fact-Checking Made Easier via Semi-Automatic Discovery of Relevant Expert Opinions, Altuncu et al., arXiv Badge
  • FactKG: Fact verification via reasoning on knowledge graphs, Kim et al., arXiv Badge
  • Fact-checking complex claims with program-guided reasoning, Pan et al., arXiv Badge
  • Prompt to be consistent is better than self-consistent? few-shot and zero-shot fact verification with pre-trained language models, Zeng et al., arXiv Badge
  • Unsupervised Pretraining for Fact Verification by Language Model Distillation, Bazaga et al., arXiv Badge
  • Towards llm-based fact verification on news claims with a hierarchical step-by-step prompting method, Zhang et al., arXiv Badge
  • Characterizing and Verifying Scientific Claims: Qualitative Causal Structure is All You Need, Wu et al., EMNLP Badge
  • Can Large Language Models Detect Misinformation in Scientific News Reporting?, Cao et al., arXiv Badge
  • What makes medical claims (un) verifiable? analyzing entity and relation properties for fact verification, Wührl et al., arXiv Badge
  • ClaimVer: Explainable claim-level verification and evidence attribution of text through knowledge graphs, Dammu et al., arXiv Badge
  • Generating fact checking explanations, Atanasova et al., Other Source Badge
  • MAGIC: Multi-Argument Generation with Self-Refinement for Domain Generalization in Automatic Fact-Checking, Kao et al., COLING Badge
  • Robust Claim Verification Through Fact Detection, Jafari et al., arXiv Badge
  • Automated justification production for claim veracity in fact checking: A survey on architectures and approaches, Eldifrawi et al., arXiv Badge
  • Enhancing natural language inference performance with knowledge graph for COVID-19 automated fact-checking in Indonesian language, Muharram et al., arXiv Badge
  • Augmenting the Veracity and Explanations of Complex Fact Checking via Iterative Self-Revision with LLMs, Zhang et al., arXiv Badge
  • DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts, Braun et al., arXiv Badge
  • TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding, Ku et al., arXiv Badge
  • Explainable Biomedical Claim Verification with Large Language Models, Liang et al., arXiv Badge
  • Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?, GX-Chen et al., arXiv Badge

3.3.4 Theorem Proving

  • Generative language modeling for automated theorem proving, Polu et al., arXiv Badge
  • Draft, sketch, and prove: Guiding formal theorem provers with informal proofs, Jiang et al., arXiv Badge
  • Hypertree proof search for neural theorem proving, Lample et al., NeurIPS Badge
  • Thor: Wielding hammers to integrate language models and automated theorem provers, Jiang et al., NeurIPS Badge
  • Decomposing the enigma: Subgoal-based demonstration learning for formal theorem proving, Zhao et al., arXiv Badge
  • Dt-solver: Automated theorem proving with dynamic-tree sampling guided by proof-level value function, Wang et al., ACL Badge
  • Lego-prover: Neural theorem proving with growing libraries, Wang et al., arXiv Badge
  • Baldur: Whole-proof generation and repair with large language models, First et al., Other Source Badge
  • Mustard: Mastering uniform synthesis of theorem and proof data, Huang et al., arXiv Badge
  • A survey on deep learning for theorem proving, Li et al., arXiv Badge
  • Towards large language models as copilots for theorem proving in lean, Song et al., arXiv Badge
  • Proving theorems recursively, Wang et al., arXiv Badge
  • Deepseek-prover: Advancing theorem proving in llms through large-scale synthetic data, Xin et al., arXiv Badge
  • Lean-star: Learning to interleave thinking and proving, Lin et al., arXiv Badge
  • Data for mathematical copilots: Better ways of presenting proofs for machine learning, Frieder et al., arXiv Badge
  • Deep Active Learning based Experimental Design to Uncover Synergistic Genetic Interactions for Host Targeted Therapeutics, Zhu et al., arXiv Badge
  • Discovering Symbolic Differential Equations with Symmetry Invariants, Yang et al., arXiv Badge

3.4 Scientific Experiment Conduction

  • Toward machine learning optimization of experimental design, Baydin et al., Other Source Badge
  • AI-assisted design of experiments at the frontiers of computation: methods and new perspectives, Vischia et al., arXiv Badge
  • AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research, Chen et al., arXiv Badge
  • EXP-Bench: Can AI Conduct AI Research Experiments?, Kon et al., arXiv Badge
  • AI Scientists Fail Without Strong Implementation Capability, Zhu et al., arXiv Badge

3.4.1 Experiment Design

  • Augmenting large language models with chemistry tools, M. Bran et al., Nature Badge
  • Sciagents: Automating scientific discovery through multi-agent intelligent graph reasoning, Ghafarollahi et al., arXiv Badge
  • MatPilot: an LLM-enabled AI Materials Scientist under the Framework of Human-Machine Collaboration, Ni et al., arXiv Badge
  • AI-assisted design of experiments at the frontiers of computation: methods and new perspectives, Vischia et al., arXiv Badge
  • LUMI-lab: a Foundation Model-Driven Autonomous Platform Enabling Discovery of New Ionizable Lipid Designs for mRNA Delivery, Cui et al., Other Source Badge
  • Towards an AI co-scientist, Gottweis et al., arXiv Badge
Semi-Automatic Experiment Design
  • AI-assisted inverse design of sequence-ordered high intrinsic thermal conductivity polymers, Huang et al., Materials Today Physics Badge
  • Augmenting large language models with chemistry tools, M. Bran et al., Nature Badge
  • Meta-Designing Quantum Experiments with Language Models, Arlt et al., arXiv Badge
  • MatPilot: an LLM-enabled AI Materials Scientist under the Framework of Human-Machine Collaboration, Ni et al., arXiv Badge
  • The application of artificial intelligence-assisted technology in cultural and creative product design, Liang et al., Scientific Reports Badge
  • A Human-LLM Note-Taking System with Case-Based Reasoning as Framework for Scientific Discovery, Craig et al., PDF Badge
Full-Automatic Experiment Design
  • Researchagent: Iterative research idea generation over scientific literature with large language models, Baek et al., arXiv Badge
  • Biodiscoveryagent: An ai agent for designing genetic perturbation experiments, Roohani et al., arXiv Badge
  • The ai scientist: Towards fully automated open-ended scientific discovery, Lu et al., arXiv Badge
  • The virtual lab: AI agents design new SARS-CoV-2 nanobodies with experimental validation, Swanson et al., bioRxiv Badge
  • Large Language Model Assisted Experiment Design with Generative Human-Behavior Agents, Liu et al., Other Source Badge
  • Agent laboratory: Using llm agents as research assistants, Schmidgall et al., arXiv Badge
  • Carl Technical Report, Institute et al., PDF Badge
  • Zochi Technical Report, AI et al., PDF Badge
  • AgentRxiv: Towards Collaborative Autonomous Research, Schmidgall et al., arXiv Badge
  • Robin: A multi-agent system for automating scientific discovery, Ghareeb et al., arXiv Badge
  • Augmenting large language models with chemistry tools, M. Bran et al., Nature Badge
  • Sciagents: Automating scientific discovery through multi-agent intelligent graph reasoning, Ghafarollahi et al., arXiv Badge
  • MatPilot: an LLM-enabled AI Materials Scientist under the Framework of Human-Machine Collaboration, Ni et al., arXiv Badge
  • AI-assisted design of experiments at the frontiers of computation: methods and new perspectives, Vischia et al., arXiv Badge
  • LUMI-lab: a Foundation Model-Driven Autonomous Platform Enabling Discovery of New Ionizable Lipid Designs for mRNA Delivery, Cui et al., bioRxiv Badge
  • Towards an AI co-scientist, Gottweis et al., arXiv Badge

3.4.2 Pre-Experiment Estimation

Evaluative Prediction
  • DeepCRE: Transforming Drug R&D via AI-Driven Cross-drug Response Evaluation, Wu et al., arXiv Badge
  • Physical formula enhanced multi-task learning for pharmacokinetics prediction, Li et al., arXiv Badge
  • MASSW: A new dataset and benchmark tasks for ai-assisted scientific workflows, Zhang et al., arXiv Badge
  • Unimatch: Universal matching from atom to task for few-shot drug discovery, Li et al., arXiv Badge
  • LUMI-lab: a Foundation Model-Driven Autonomous Platform Enabling Discovery of New Ionizable Lipid Designs for mRNA Delivery, Cui et al., bioRxiv Badge
  • Predicting Empirical AI Research Outcomes with Language Models, Wen et al., arXiv Badge
  • Large language models surpass human experts in predicting neuroscience results, Luo et al., Nature Badge
Exploratory Forecasting
  • Automatic chemical design using a data-driven continuous representation of molecules, Gómez-Bombarelli et al., ACS central science Badge
  • MolGAN: An implicit generative model for small molecular graphs, De Cao et al., arXiv Badge
  • Google DeepMind's AI Dreamed Up 380,000 New Materials. The Next Challenge Is Making Them, Barber et al., PDF Badge
  • Augmenting large language models with chemistry tools, M. Bran et al., Nature Badge
  • MASSW: A new dataset and benchmark tasks for ai-assisted scientific workflows, Zhang et al., arXiv Badge
  • The virtual lab: AI agents design new SARS-CoV-2 nanobodies with experimental validation, Swanson et al., bioRxiv Badge
  • Towards an AI co-scientist, Gottweis et al., arXiv Badge
  • FlavorDiffusion: Modeling Food-Chemical Interactions with Diffusion, Seo et al., PDF Badge
  • MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback, Liu et al., arXiv Badge

3.4.3 Experiment Management

  • Transforming science labs into automated factories of discovery, Angelopoulos et al., Other Source Badge
  • Development of an Automated Workflow for Screening the Assembly and Host–Guest Behavior of Metal-Organic Cages Towards Accelerated Discovery, Basford et al., Other Source Badge
  • AI Driven Experiment Calibration and Control, Britton et al., Other Source Badge
  • Agents for self-driving laboratories applied to quantum computing, Cao et al., arXiv Badge
  • Intelligent experiments through real-time AI: Fast Data Processing and Autonomous Detector Control for sPHENIX and future EIC detectors, Kvapil et al., arXiv Badge
  • Artificial intelligence meets laboratory automation in discovery and synthesis of metal–organic frameworks: A review, Zhao et al., Other Source Badge
  • Agents for Change: Artificial Intelligent Workflows for Quantitative Clinical Pharmacology and Translational Sciences, Shahin et al., Other Source Badge
  • Science acceleration and accessibility with self-driving labs, Canty et al., Nature Communications Badge
  • Accelerating drug discovery with Artificial: a whole-lab orchestration and scheduling system for self-driving labs, Fehlis et al., arXiv Badge
  • Uncovering Bottlenecks and Optimizing Scientific Lab Workflows with Cycle Time Reduction Agents, Fehlis et al., arXiv Badge
  • Perspective on Utilizing Foundation Models for Laboratory Automation in Materials Research, Hatakeyama-Sato et al., arXiv Badge
Open-Loop Management
  • The future of self-driving laboratories: from human in the loop interactive AI to gamification, Hysmith et al., Digital Discovery Badge
  • Self-driving labs are the new AI asset, Axios et al., PDF Badge
  • DeepMind and BioNTech build AI lab assistants for scientific research, Times et al., PDF Badge
  • Autonomous platform for solution processing of electronic polymers, Wang et al., Nature Badge
  • Machine learning-led semi-automated medium optimization reveals salt as key for flaviolin production in Pseudomonas putida, Zournas et al., Communications Biology Badge
Closed-Loop Management
  • Functional genomic hypothesis generation and experimentation by a robot scientist, King et al., Nature Badge
  • Self-driving laboratory for accelerated discovery of thin-film materials, MacLeod et al., Science Advances Badge
  • Self-driving laboratories for chemistry and materials science, Tom et al., Chemical Reviews Badge
  • Autonomous platform for solution processing of electronic polymers, Wang et al., Nature Badge
  • Self-driving laboratory platform for many-objective self-optimisation of polymer nanoparticle synthesis with cloud-integrated machine learning and orthogonal online analytics, Knox et al., Polymer Chemistry Badge
  • Transforming science labs into automated factories of discovery, Angelopoulos et al., Science Robotics Badge
  • Development of an Automated Workflow for Screening the Assembly and Host–Guest Behavior of Metal-Organic Cages Towards Accelerated Discovery, Basford et al., Angewandte Chemie International Edition Badge
  • AI Driven Experiment Calibration and Control, Britton et al., Other Source Badge
  • Agents for self-driving laboratories applied to quantum computing, Cao et al., arXiv Badge
  • Intelligent experiments through real-time AI: Fast Data Processing and Autonomous Detector Control for sPHENIX and future EIC detectors, Kvapil et al., arXiv Badge
  • Artificial intelligence meets laboratory automation in discovery and synthesis of metal–organic frameworks: A review, Zhao et al., Other Source Badge
  • Agents for Change: Artificial Intelligent Workflows for Quantitative Clinical Pharmacology and Translational Sciences, Shahin et al., Clinical and Translational Science Badge
  • Science acceleration and accessibility with self-driving labs, Canty et al., Nature Communications Badge
  • Accelerating drug discovery with Artificial: a whole-lab orchestration and scheduling system for self-driving labs, Fehlis et al., arXiv Badge
  • Uncovering Bottlenecks and Optimizing Scientific Lab Workflows with Cycle Time Reduction Agents, Fehlis et al., arXiv Badge
  • Perspective on Utilizing Foundation Models for Laboratory Automation in Materials Research, Hatakeyama-Sato et al., arXiv Badge

3.4.4 Experimental Conduction

Automated Machine Learning Experiment Conduction
  • AIDE: Human-Level Performance on Data Science Competitions, Dominik et al., PDF Badge
  • Automl-gpt: Automatic machine learning with gpt, Zhang et al., arXiv Badge
  • Automl in the age of large language models: Current challenges, future opportunities and risks, Tornede et al., arXiv Badge
  • Opendevin: An open platform for ai software developers as generalist agents, Wang et al., arXiv Badge
  • Mlr-copilot: Autonomous machine learning research based on large language models agents, Li et al., arXiv Badge
  • Autokaggle: A multi-agent framework for autonomous data science competitions, Li et al., arXiv Badge
  • Large language models orchestrating structured reasoning achieve kaggle grandmaster level, Grosnit et al., arXiv Badge
  • MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?, Zhang et al., arXiv Badge
  • AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage, Zhao et al., arXiv Badge
  • Variable Extraction for Model Recovery in Scientific Literature, Liu et al., PDF Badge
  • AlphaEvolve: A coding agent for scientific and algorithmic discovery, Novikov et al., Google DeepMind Badge
Real-world Experimental Simulation & Conduction
  • Large language models can self-improve, Huang et al., arXiv Badge
  • Mlcopilot: Unleashing the power of large language models in solving machine learning tasks, Zhang et al., arXiv Badge
  • Training socially aligned language models in simulated human society, Liu et al., arXiv Badge
  • Toolllm: Facilitating large language models to master 16000+ real-world apis, Qin et al., arXiv Badge
  • An autonomous laboratory for the accelerated synthesis of novel materials, Szymanski et al., Nature Badge
  • Autonomous chemical research with large language models, Boiko et al., Nature Badge
  • Reflexion: Language agents with verbal reinforcement learning, Shinn et al., NeurIPS Badge
  • Toolkengpt: Augmenting frozen language models with massive tools via tool embeddings, Hao et al., NeurIPS Badge
  • Toolformer: Language models can teach themselves to use tools, Schick et al., NeurIPS Badge
  • scGPT: toward building a foundation model for single-cell multi-omics using generative AI, Cui et al., Nature Badge
  • Large language model agent for hyper-parameter optimization, Liu et al., arXiv Badge
  • MechAgents: Large language model multi-agent collaborations can solve mechanics problems, generate new data, and integrate knowledge, Ni et al., Extreme Mechanics Letters Badge
  • Researchagent: Iterative research idea generation over scientific literature with large language models, Baek et al., arXiv Badge
  • Automated social science: Language models as scientist and subjects, Manning et al., Other Source Badge
  • Crispr-gpt: An llm agent for automated design of gene-editing experiments, Huang et al., arXiv Badge
  • Position: LLMs can’t plan, but can help planning in LLM-modulo frameworks, Kambhampati et al., ICML Badge
  • Augmenting large language models with chemistry tools, M. Bran et al., Nature Badge
  • Mlr-copilot: Autonomous machine learning research based on large language models agents, Li et al., arXiv Badge
  • The ai scientist: Towards fully automated open-ended scientific discovery, Lu et al., arXiv Badge
  • Sciagents: Automating scientific discovery through multi-agent intelligent graph reasoning, Ghafarollahi et al., arXiv Badge
  • Wrong-of-thought: An integrated reasoning framework with multi-perspective verification and wrong information, Zhang et al., arXiv Badge
  • Simulating Tabular Datasets through LLMs to Rapidly Explore Hypotheses about Real-World Entities, Zabaleta et al., arXiv Badge
  • An automatic end-to-end chemical synthesis development platform powered by large language models, Ruan et al., Nature Badge
  • MatPilot: an LLM-enabled AI Materials Scientist under the Framework of Human-Machine Collaboration, Ni et al., arXiv Badge
  • Towards LLM-Driven Multi-Agent Pipeline for Drug Discovery: Neurodegenerative Diseases Case Study, Solovev et al., Other Source Badge
  • From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents, Mou et al., arXiv Badge
  • On Evaluating LLMs' Capabilities as Functional Approximators: A Bayesian Evaluation Framework, Siddiqui et al., PDF Badge
  • PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents, Lee et al., arXiv Badge
  • Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback, Yuan et al., arXiv Badge
  • DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective, Peng et al., arXiv Badge
  • Simulating cooperative prosocial behavior with multi-agent LLMs: Evidence and mechanisms for AI agents to inform policy decisions, Sreedhar et al., Other Source Badge
  • Reinforcing clinical decision support through multi-agent systems and ethical ai governance, Chen et al., arXiv Badge
  • OpenFOAMGPT 2.0: end-to-end, trustworthy automation for computational fluid dynamics, Feng et al., arXiv Badge
  • Researchcodeagent: An llm multi-agent system for automated codification of research methodologies, Gandhi et al., arXiv Badge
  • The ai scientist-v2: Workshop-level automated scientific discovery via agentic tree search, Yamada et al., arXiv Badge
  • MooseAgent: A LLM Based Multi-agent Framework for Automating Moose Simulation, Zhang et al., arXiv Badge
  • Owl: Optimized workforce learning for general multi-agent assistance in real-world task automation, Hu et al., arXiv Badge

3.4.5 Experimental Analysis

Automated Evaluation Metrics
  • Eight years of AutoML: categorisation, review and trends, Barbudo et al., Knowledge and Information Systems Badge
  • Efficient bayesian learning curve extrapolation using prior-data fitted networks, Adriaensen et al., NeurIPS Badge
  • Automated machine learning: past, present and future, Baratchi et al., Artificial intelligence review Badge
Theoretical Consistency Analysis
  • Variable Extraction for Model Recovery in Scientific Literature, Liu et al., PDF Badge
  • AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage, Zhao et al., arXiv Badge
Exploratory Analysis
  • HeLM: Highlighted Evidence augmented Language Model for Enhanced Table-to-Text Generation, Bian et al., arXiv Badge
  • Table meets llm: Can large language models understand structured table data? a benchmark and empirical study, Sui et al., Other Source Badge
  • Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Generator-Validator Fine-tuning, Xing et al., arXiv Badge
  • LLM Based Exploratory Data Analysis Using BigQuery Data Canvas, Chaudhuri et al., PDF Badge
  • Toward machine learning optimization of experimental design, Baydin et al., Nuclear Physics News Badge
  • AI-assisted design of experiments at the frontiers of computation: methods and new perspectives, Vischia et al., arXiv Badge
  • AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research, Chen et al., arXiv Badge
  • EXP-Bench: Can AI Conduct AI Research Experiments?, Kon et al., arXiv Badge
  • AI Scientists Fail Without Strong Implementation Capability, Zhu et al., arXiv Badge

3.5 Full-Automatic Discovery

  • The ai scientist: Towards fully automated open-ended scientific discovery, Lu et al., arXiv Badge
  • Aviary: training language agents on challenging scientific tasks, Narayanan et al., arXiv Badge
  • Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback, Yuan et al., arXiv Badge
  • Autonomous Microscopy Experiments through Large Language Model Agents, Mandal et al., arXiv Badge
  • Agent laboratory: Using llm agents as research assistants, Schmidgall et al., arXiv Badge
  • Curie: Toward rigorous and automated scientific experimentation with ai agents, Kon et al., arXiv Badge
  • DORA AI Scientist: Multi-agent Virtual Research Team for Scientific Exploration Discovery and Automated Report Generation, Naumov et al., bioRxiv Badge
  • Carl Technical Report, Institute et al., PDF Badge
  • AgentRxiv: Towards Collaborative Autonomous Research, Schmidgall et al., arXiv Badge
  • Zochi Technical Report, AI et al., PDF Badge
  • NovelSeek: When Agent Becomes the Scientist – Building Closed-Loop System from Hypothesis to Verification, Team et al., arXiv Badge
  • AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists, Li et al., arXiv Badge
  • VISION: A modular AI assistant for natural human-instrument interaction at scientific user facilities, Mathur et al., Other Source Badge
  • Scientific discovery in the age of artificial intelligence, Wang et al., Nature Badge
  • Beyond Benchmarking: Automated Capability Discovery via Model Self-Exploration, Lu et al., Other Source Badge
  • AIRUS: a simple workflow for AI-assisted exploration of scientific data, Harris et al., bioRxiv Badge
  • On the Rise of New Mathematical Spaces and Towards AI-Driven Scientific Discovery, Raeini et al., SSRN Badge
  • From Reasoning to Learning: A Survey on Hypothesis Discovery and Rule Learning with Large Language Models, He et al., arXiv Badge
  • AI-Driven Discovery: The Transformative Impact of Machine Learning on Research and Development, Roy et al., Other Source Badge

4. AI for Academic Writing

  • Using artificial intelligence in academic writing and research: An essential productivity tool, Khalifa et al., Other Source Badge
  • Human-LLM Coevolution: Evidence from Academic Writing, Geng et al., arXiv Badge
  • Large language models penetration in scholarly writing and peer review, Zhou et al., arXiv Badge
  • And Plato met ChatGPT: an ethical reflection on the use of chatbots in scientific research writing, with a particular focus on the social sciences, Calderon et al., Other Source Badge

4.1 Semi-Automatic Academic Writing

4.1.1 Assistance During Manuscript Preparation

Title Formulation and Optimization
  • Personalized Graph-Based Retrieval for Large Language Models, Au et al., arXiv Badge
  • Generating Accurate and Engaging Research Paper Titles Using NLP Techniques, Bikku et al., Other Source Badge
  • MoDeST: A dataset for Multi Domain Scientific Title Generation, Bölücü et al., Knowledge-Based Systems Badge
  • Can pre-trained language models generate titles for research papers?, Rehman et al., Other Source Badge
Overall Logical Structure Guidance
  • LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models, Sun et al., CoLM Badge
  • LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts, Hashemi et al., PDF Badge

4.1.2 Assistance During Manuscript Writing

  • Enhancing academic writing skills and motivation: assessing the efficacy of ChatGPT in AI-assisted language learning for EFL students, Song et al., Frontiers in Psychology Badge
  • Human-AI collaboration patterns in AI-assisted academic writing, Nguyen et al., Studies in Higher Education Badge
  • Patterns and Purposes: A Cross-Journal Analysis of AI Tool Usage in Academic Writing, Xu et al., arXiv Badge
  • Divergent LLM adoption and heterogeneous convergence paths in research writing, Lin et al., arXiv Badge
  • Artificial intelligence-assisted academic writing: recommendations for ethical use, Cheng et al., Advances in Simulation Badge
Drawing Figures and Charts
  • Text2Chart: A multi-staged chart generator from natural language text, Rashid et al., Other Source Badge
  • ChartReader: A unified framework for chart derendering and comprehension without heuristic rules, Cheng et al., ICCV Badge
  • FigGen: Text to scientific figure generation, Rodriguez et al., arXiv Badge
  • AutomaTikZ: Text-guided synthesis of scientific vector graphics with TikZ, Belouadi et al., arXiv Badge
  • SciCapenter: Supporting caption composition for scientific figures with machine-generated captions and ratings, Hsu et al., Other Source Badge
  • ChartFormer: A large vision language model for converting chart images into tactile accessible SVGs, Moured et al., Other Source Badge
  • Figuring out Figures: Using Textual References to Caption Scientific Figures, Cao et al., arXiv Badge
  • The AI Scientist: Towards fully automated open-ended scientific discovery, Lu et al., arXiv Badge
  • AiSciVision: A Framework for Specializing Large Multimodal Models in Scientific Image Classification, Hogan et al., arXiv Badge
  • ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?, Zhang et al., arXiv Badge
  • ChartCoder: Advancing multimodal large language model for chart-to-code generation, Zhao et al., arXiv Badge
  • Understanding How Paper Writers Use AI-Generated Captions in Figure Caption Writing, Yin et al., arXiv Badge
  • Multi-LLM Collaborative Caption Generation in Scientific Documents, Kim et al., arXiv Badge
  • TikZero: Zero-Shot Text-Guided Graphics Program Synthesis, Belouadi et al., arXiv Badge
  • Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning, Zhang et al., arXiv Badge
  • StarVector: Generating scalable vector graphics code from images and text, Rodriguez et al., Other Source Badge
  • The AI Scientist-v2: Workshop-level automated scientific discovery via agentic tree search, Yamada et al., arXiv Badge
  • How to Create Accurate Scientific Illustrations with AI in 2025, Team et al., PDF Badge
Formula Transcription
  • Towards Semantic Markup of Mathematical Documents via User Interaction, Vrečar et al., Other Source Badge
  • Automated LaTeX Code Generation from Handwritten Math Expressions Using Vision Transformer, Sundararaj et al., arXiv Badge
  • LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement, Jiang et al., AAAI Badge
Citation Recommendation & Integration
  • Chronological citation recommendation with time preference, Ma et al., Scientometrics Badge
  • When large language models meet citation: A survey, Zhang et al., arXiv Badge
  • Directed Criteria Citation Recommendation and Ranking Through Link Prediction, Watson et al., arXiv Badge
  • ILCiteR: Evidence-grounded Interpretable Local Citation Recommendation, Roy et al., arXiv Badge
  • CiteBART: Learning to Generate Citations for Local Citation Recommendation, Çelik et al., arXiv Badge
  • Benchmark for Evaluation and Analysis of Citation Recommendation Models, Maharjan et al., arXiv Badge
  • PaSa: An LLM Agent for Comprehensive Academic Paper Search, He et al., arXiv Badge
  • ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations, Wang et al., arXiv Badge
  • How deep do large language models internalize scientific literature and citation practices?, Algaba et al., arXiv Badge
  • SCIRGC: Multi-Granularity Citation Recommendation and Citation Sentence Preference Alignment, Li et al., arXiv Badge
  • Towards AI-assisted Academic Writing, Liebling et al., PDF Badge

4.1.3 Assistance After Manuscript Completion

Grammar Correction
  • CSED: A Chinese semantic error diagnosis corpus, Sun et al., arXiv Badge
  • Neural Automated Writing Evaluation with Corrective Feedback, Wang et al., arXiv Badge
  • LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction, Wang et al., arXiv Badge
  • Improving Grammatical Error Correction via Contextual Data Augmentation, Wang et al., arXiv Badge
  • How Paperpal Enhances English Writing Quality and Improves Productivity for Japanese Academics, George et al., Other Source Badge
  • Transforming hematological research documentation with large language models: an approach to scientific writing and data analysis, Yang et al., Blood Research Badge
  • The usage of a transformer based and artificial intelligence driven multidimensional feedback system in English writing instruction, Zheng et al., Scientific Reports Badge
Expression & Logical Revision
  • Learning to split and rephrase from Wikipedia edit history, Botha et al., arXiv Badge
  • WikiAtomicEdits: A multilingual corpus of Wikipedia edits for modeling language and discourse, Faruqui et al., arXiv Badge
  • Diamonds in the rough: Generating fluent sentences from early-stage drafts for academic writing assistance, Ito et al., arXiv Badge
  • Text editing by command, Faltings et al., arXiv Badge
  • Wordcraft: A human-AI collaborative editor for story writing, Coenen et al., arXiv Badge
  • Machine-in-the-loop rewriting for creative image captioning, Padmakumar et al., arXiv Badge
  • Read, revise, repeat: A system demonstration for human-in-the-loop iterative text revision, Du et al., arXiv Badge
  • CoAuthor: Designing a human-AI collaborative writing dataset for exploring language model capabilities, Lee et al., Other Source Badge
  • Sparks: Inspiration for science writing using language models, Gero et al., Other Source Badge
  • Techniques for supercharging academic writing with generative AI, Lin et al., Nature Badge
  • OverleafCopilot: Empowering academic writing in Overleaf with large language models, Wen et al., arXiv Badge
  • Augmenting the author: Exploring the potential of AI collaboration in academic writing, Tu et al., arXiv Badge
  • Step-Back Profiling: Distilling User History for Personalized Scientific Writing, Tang et al., arXiv Badge
  • Closing the Loop: Learning to Generate Writing Feedback via Language Model Simulated Student Revisions, Nair et al., arXiv Badge
  • Enhancing Chinese Essay Discourse Logic Evaluation Through Optimized Fine-Tuning of Large Language Models, Song et al., NLPCC Badge
  • Cocoa: Co-Planning and Co-Execution with AI Agents, Feng et al., arXiv Badge
  • Prototypical Human-AI Collaboration Behaviors from LLM-Assisted Writing in the Wild, Mysore et al., arXiv Badge
  • XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic Paper Revision, Chen et al., arXiv Badge
  • The usage of a transformer based and artificial intelligence driven multidimensional feedback system in English writing instruction, Zheng et al., Scientific Reports Badge
  • Autonomous LLM-Driven Research—from Data to Human-Verifiable Research Papers, Ifargan et al., NEJM AI Badge

4.2 Full-Automatic Academic Writing

  • The AI Scientist: Towards fully automated open-ended scientific discovery, Lu et al., arXiv Badge
  • Agent Laboratory: Using LLM agents as research assistants, Schmidgall et al., arXiv Badge
  • ScholaWrite: A Dataset of End-to-End Scholarly Writing Process, Wang et al., arXiv Badge
  • Beyond outlining: Heterogeneous recursive planning for adaptive long-form writing with language models, Xiong et al., arXiv Badge
  • AgentRxiv: Towards Collaborative Autonomous Research, Schmidgall et al., arXiv Badge
  • Zochi Technical Report, AI et al., PDF Badge
  • Carl Technical Report, Institute et al., PDF Badge
  • The AI Scientist-v2: Workshop-level automated scientific discovery via agentic tree search, Yamada et al., arXiv Badge

5. AI for Academic Peer Reviewing

  • Can we automate scientific reviewing?, Yuan et al., Other Source Badge
  • ReviewerGPT? An exploratory study on using large language models for paper reviewing, Liu et al., arXiv Badge
  • Unveiling the sentinels: Assessing AI performance in cybersecurity peer review, Niu et al., arXiv Badge
  • Automated scholarly paper review: Concepts, technologies, and challenges, Lin et al., Information Fusion Badge
  • What Can Natural Language Processing Do for Peer Review?, Kuznetsov et al., arXiv Badge
  • Artificial intelligence to support publishing and peer review: A summary and review, Kousha et al., Learned Publishing Badge
  • Large language models for automated scholarly paper review: A survey, Zhuang et al., arXiv Badge
  • Evaluating the predictive capacity of ChatGPT for academic peer review outcomes across multiple platforms, Thelwall et al., Scientometrics Badge
  • A framework for reviewing the results of automated conversion of structured organic synthesis procedures from the literature, Machi et al., Digital Discovery Badge

5.1 Pre-Review

5.1.1 Desk-Review

  • How to Make Peer Review Recommendations and Decisions, Society et al., Other Source Badge
  • Helping editors find reviewers, Tedford et al., Other Source Badge
  • Snapp: Springer Nature's next-generation peer review system, Nature et al., Other Source Badge
  • Matching papers and reviewers at large conferences, Leyton-Brown et al., Artificial Intelligence Badge
  • Streamlining the review process: AI-generated annotations in research manuscripts, Díaz et al., arXiv Badge
  • Artificial intelligence in peer review: enhancing efficiency while preserving integrity, Doskaliuk et al., Other Source Badge
  • Enhancing Academic Decision-Making: A Pilot Study of AI-Supported Journal Selection in Higher Education, Farber et al., Innovative Higher Education Badge

5.1.2 Reviewer Matching

  • A framework for optimizing paper matching, Charlin et al., UAI Badge
  • The Toronto paper matching system: an automated paper-reviewer assignment system, Charlin et al., Other Source Badge
  • Pistis: A conflict of interest declaration and detection system for peer review management, Wu et al., Other Source Badge
  • An automated conflict of interest based greedy approach for conference paper assignment system, Pradhan et al., Journal of Informetrics Badge
  • Matching papers and reviewers at large conferences, Leyton-Brown et al., Artificial Intelligence Badge
  • Autonomous Machine Learning-Based Peer Reviewer Selection System, Aitymbetov et al., COLING Badge
  • Automated Research Review Support Using Machine Learning, Large Language Models, and Natural Language Processing, Pendyala et al., Electronics Badge
  • Peer review expert group recommendation: A multi-subject coverage-based approach, Fu et al., Expert Systems with Applications Badge

5.2 In-Review

5.2.1 Peer-Review

Score Prediction
  • ALL-IN-ONE: Multi-Task Learning BERT Models for Evaluating Peer Assessments, Jia et al., Other Source Badge
  • The quality assist: A technology-assisted peer review based on citation functions to predict the paper quality, Basuki et al., IEEE Access Badge
  • Exploiting labeled and unlabeled data via transformer fine-tuning for peer-review score prediction, Muangkammuen et al., EMNLP Findings Badge
  • RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance, Couto et al., arXiv Badge
Comment Generation
  • KID-Review: knowledge-guided scientific review generation with oracle pre-training, Yuan et al., AAAI Badge
  • GPT4 is slightly helpful for peer-review assistance: A pilot study, Robertson et al., arXiv Badge
  • MARG: Multi-agent review generation for scientific papers, D'Arcy et al., arXiv Badge
  • Peer review as a multi-turn and long-context dialogue with role-based interactions, Tan et al., arXiv Badge
  • AgentReview: Exploring peer review dynamics with LLM agents, Jin et al., arXiv Badge
  • Can large language models provide useful feedback on research papers? A large-scale empirical analysis, Liang et al., NEJM AI Badge
  • Automated Focused Feedback Generation for Scientific Writing Assistance, Chamoun et al., PDF Badge
  • The AI Scientist: Towards fully automated open-ended scientific discovery, Lu et al., arXiv Badge
  • SEAGraph: Unveiling the Whole Story of Paper Review Comments, Yu et al., arXiv Badge
  • The AI Scientist-v2: Workshop-level automated scientific discovery via agentic tree search, Yamada et al., arXiv Badge
Unified Generation
  • A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications, Kang et al., NAACL Badge
  • Peerassist: leveraging on paper-review interactions to predict peer review decisions, Bharti et al., Other Source Badge
  • MARG: Multi-agent review generation for scientific papers, D'Arcy et al., arXiv Badge
  • Peer review as a multi-turn and long-context dialogue with role-based interactions, Tan et al., arXiv Badge
  • Automated review generation method based on large language models, Wu et al., arXiv Badge
  • AI-Driven review systems: evaluating LLMs in scalable and bias-aware academic reviews, Tyser et al., arXiv Badge
  • MAMORX: Multi-agent multi-modal scientific review generation with external knowledge, Taechoyotin et al., Other Source Badge
  • CycleResearcher: Improving automated research via automated review, Weng et al., arXiv Badge
  • OpenReviewer: A Specialized Large Language Model for Generating Critical Scientific Paper Reviews, Idahl et al., arXiv Badge
  • The role of large language models in the peer-review process: opportunities and challenges for medical journal reviewers and editors, Lee et al., Other Source Badge
  • PiCO: Peer Review in LLMs based on Consistency Optimization, Ning et al., PDF Badge
  • Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM Reviews, Shin et al., arXiv Badge
  • ReviewEval: An evaluation framework for AI-generated reviews, Kirtani et al., arXiv Badge
  • Automatically Evaluating the Paper Reviewing Capability of Large Language Models, Shin et al., arXiv Badge
  • DeepReview: Improving LLM-based paper review with human-like deep thinking process, Zhu et al., arXiv Badge
  • ReviewAgents: Bridging the gap between human and AI-generated paper reviews, Gao et al., arXiv Badge
  • Reviewing Scientific Papers for Critical Problems With Reasoning LLMs: Baseline Approaches and Automatic Evaluation, Zhang et al., arXiv Badge
  • REMOR: Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning, Taechoyotin et al., arXiv Badge
  • TreeReview: A Dynamic Tree of Questions Framework for Deep and Efficient LLM-based Scientific Peer Review, Chang et al., arXiv Badge
  • PaperEval: A universal, quantitative, and explainable paper evaluation method powered by a multi-agent system, Huang et al., Information Processing & Management Badge

5.2.2 Meta-Review

  • Summarizing multiple documents with conversational structure for meta-review generation, Li et al., arXiv Badge
  • Meta-review generation with checklist-guided iterative introspection, Zeng et al., arXiv Badge
  • When Reviewers Lock Horn: Finding Disagreement in Scientific Peer Reviews, Kumar et al., arXiv Badge
  • A sentiment consolidation framework for meta-review generation, Li et al., arXiv Badge
  • Prompting LLMs to Compose Meta-Review Drafts from Peer-Review Narratives of Scholarly Manuscripts, Santu et al., arXiv Badge
  • Towards automated meta-review generation via an NLP/ML pipeline in different stages of the scholarly peer review process, Kumar et al., Other Source Badge
  • MetaWriter: Exploring the potential and perils of AI writing support in scientific peer review, Sun et al., Other Source Badge
  • GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews, Darrin et al., arXiv Badge
  • PeerArg: Argumentative Peer Review with LLMs, Sukpanichnant et al., arXiv Badge
  • Bridging Social Psychology and LLM Reasoning: Conflict-Aware Meta-Review Generation via Cognitive Alignment, Chen et al., arXiv Badge
  • LLMs as Meta-Reviewers' Assistants: A Case Study, Hossain et al., PDF Badge

5.3 Post-Review

5.3.1 Influence Analysis

  • Popular and/or prestigious? Measures of scholarly esteem, Ding et al., Information Processing & Management Badge
  • Measuring academic influence: Not all citations are equal, Zhu et al., Other Source Badge
  • An overview of Microsoft Academic Service (MAS) and applications, Sinha et al., Other Source Badge
  • Factors affecting number of citations: a comprehensive review of the literature, Tahamtan et al., Scientometrics Badge
  • Relative citation ratio (RCR): a new metric that uses citation rates to measure influence at the article level, Hutchins et al., PLoS Biology Badge
  • HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction, Hao et al., arXiv Badge
  • From Words to Worth: Newborn Article Impact Prediction with LLM, Zhao et al., AAAI Badge
  • Large language models surpass human experts in predicting neuroscience results, Luo et al., Nature Badge

5.3.2 Promotion Enhancement

  • From complexity to clarity: How AI enhances perceptions of scientists and the public's understanding of science, Markowitz et al., PNAS Nexus Badge
  • Automatic Evaluation Metrics for Artificially Generated Scientific Research, Höpner et al., arXiv Badge
  • Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation, Park et al., arXiv Badge
  • P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark, Sun et al., arXiv Badge

6. Application

6.1 AI for Natural Science Research

6.1.1 AI for Physics Research

  • Colloquium: Machine learning in nuclear physics, Boehnlein et al., Reviews of Modern Physics Badge
  • Toward the end-to-end optimization of particle physics instruments with differentiable programming, Dorigo et al., Reviews in Physics Badge
  • AI meets physics: a comprehensive survey, Jiao et al., Artificial Intelligence Review Badge
  • Artificial intelligence for partial differential equations in computational mechanics: A review, Wang et al., arXiv Badge
  • When physics meets machine learning: A survey of physics-informed machine learning, Meng et al., Other Source Badge
Physical World Simulation
  • Interaction networks for learning about objects, relations and physics, Battaglia et al., NeurIPS Badge
  • End-to-end differentiable physics for learning and control, de Avila Belbute-Peres et al., NeurIPS Badge
  • Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Raissi et al., Journal of Computational Physics Badge
  • Hamiltonian neural networks, Greydanus et al., NeurIPS Badge
  • Lagrangian neural networks, Cranmer et al., arXiv Badge
  • Physics-informed neural networks and extensions, Raissi et al., arXiv Badge
Automated Law Discovery
  • LLM-SR: Scientific Equation Discovery via Programming with Large Language Models, Shojaee et al., PDF Badge
  • LLM-Feynman: Leveraging Large Language Models for Universal Scientific Formula and Theory Discovery, Song et al., arXiv Badge
  • AI-Newton: A Concept-Driven Physical Law Discovery System without Prior Physical Knowledge, Fang et al., arXiv Badge
  • MLLM-based Discovery of Intrinsic Coordinates and Governing Equations from High-Dimensional Data, Li et al., arXiv Badge
  • LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models, Shojaee et al., PDF Badge
  • DrSR: LLM based Scientific Equation Discovery with Dual Reasoning from Data and Experience, Wang et al., arXiv Badge

6.1.2 AI for Biology & Medical Research

  • Can GPT-4V(ision) serve medical applications? Case studies on GPT-4V for multimodal medical diagnosis, Wu et al., arXiv Badge
  • Advancing multimodal medical capabilities of Gemini, Yang et al., arXiv Badge
  • A survey of generative AI for de novo drug design: new frontiers in molecule and protein generation, Tang et al., Briefings in Bioinformatics Badge
  • Large language models in plant biology, Lam et al., Trends in Plant Science Badge
  • The virtual lab: AI agents design new SARS-CoV-2 nanobodies with experimental validation, Swanson et al., bioRxiv Badge
  • A Fuzzy Logic-Based Approach to Predict Human Interaction by Functional Near-Infrared Spectroscopy, Jiang et al., Other Source Badge
  • Human-AI Teaming Using Large Language Models: Boosting Brain-Computer Interfacing (BCI) and Brain Research, Kapitonova et al., arXiv Badge
  • From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine, Buess et al., arXiv Badge
  • A survey of LLM-based agents in medicine: How far are we from Baymax?, Wang et al., arXiv Badge
  • Large language model for knowledge synthesis and AI-enhanced biomanufacturing, Li et al., Trends in Biotechnology Badge
  • Advancing drug discovery and development through GPT models: a review on challenges, innovations and future prospects, Othman et al., Intelligence-Based Medicine Badge
  • Large Language Models for Zero-shot Inference of Causal Structures in Biology, Newsham et al., arXiv Badge
  • Transforming hematological research documentation with large language models: an approach to scientific writing and data analysis, Yang et al., Blood Research Badge
  • SpatialAgent: An autonomous AI agent for spatial biology, Wang et al., bioRxiv Badge
  • A Human-LLM Note-Taking System with Case-Based Reasoning as Framework for Scientific Discovery, Craig et al., PDF Badge
  • AI-assisted Drug Re-purposing for Human Liver Fibrosis, Guan et al., bioRxiv Badge
  • Biomni: A General-Purpose Biomedical AI Agent, Huang et al., bioRxiv Badge
  • Autonomous LLM-Driven Research—from Data to Human-Verifiable Research Papers, Ifargan et al., NEJM AI Badge
Protein Discovery
  • Improved protein structure prediction using potentials from deep learning, Senior et al., Nature Badge
  • Highly accurate protein structure prediction with AlphaFold, Jumper et al., Nature Badge
  • Leveraging biomolecule and natural language through multi-modal learning: A survey, Pei et al., arXiv Badge
  • ProtAgents: protein discovery via large language model multi-agent collaborations combining physics and machine learning, Ghafarollahi et al., Digital Discovery Badge
  • Accurate structure prediction of biomolecular interactions with AlphaFold 3, Abramson et al., Nature Badge
  • Automating exploratory proteomics research via language models, Ding et al., arXiv Badge
  • Sparks: Multi-Agent Artificial Intelligence Model Discovers Protein Design Principles, Ghafarollahi et al., arXiv Badge
  • Enhancing Chemical Reaction and Retrosynthesis Prediction with Large Language Model and Dual-task Learning, Lin et al., arXiv Badge
Cell & Gene Modeling
  • GenePT: a simple but effective foundation model for genes and cells built from ChatGPT, Chen et al., bioRxiv Badge
  • BioDiscoveryAgent: An AI agent for designing genetic perturbation experiments, Roohani et al., arXiv Badge
  • CellAgent: An LLM-driven multi-agent framework for automated single-cell data analysis, Xiao et al., arXiv Badge
  • Toward a foundation model of causal cell and tissue biology with a Perturbation Cell and Tissue Atlas, Rood et al., Cell Badge
  • General-purpose pre-trained large cellular models for single-cell transcriptomics, Bian et al., National Science Review Badge
  • ML-GAP: machine learning-enhanced genomic analysis pipeline using autoencoders and data augmentation, Agraz et al., Frontiers in Genetics Badge
  • LLM4GRN: Discovering Causal Gene Regulatory Networks with LLMs--Evaluation through Synthetic Data Generation, Afonja et al., arXiv Badge
  • Autonomous Robotic System with Optical Coherence Tomography Guidance for Vascular Anastomosis, Haworth et al., arXiv Badge
  • How to build the virtual cell with artificial intelligence: Priorities and opportunities, Bunne et al., Cell Badge
  • Efficient Fine-Tuning of Single-Cell Foundation Models Enables Zero-Shot Molecular Perturbation Prediction, Maleki et al., arXiv Badge
  • NeuroDISK: An AI Approach to Automate Continuous Inquiry-Driven Discoveries in Neuroimaging Genetics, Garijo et al., bioRxiv Badge
  • The rise of agentic AI teammates in medicine, Zou et al., The Lancet Badge
  • Transformers and genome language models, Consens et al., Nature Badge
Drug Discovery
  • A deep learning approach to antibiotic discovery, Stokes et al., Cell Badge
  • Artificial intelligence to deep learning: machine intelligence approach for drug discovery, Gupta et al., Molecular Diversity Badge
  • HGTDR: Advancing drug repurposing with heterogeneous graph transformers, Gharizadeh et al., Bioinformatics Badge
  • A survey of generative AI for de novo drug design: new frontiers in molecule and protein generation, Tang et al., Briefings in Bioinformatics Badge
  • A data science roadmap for open science organizations engaged in early-stage drug discovery, Edfeldt et al., Nature Communications Badge
  • DrugCLIP: Contrastive drug-disease interaction for drug repurposing, Lu et al., arXiv Badge
  • Current strategies to address data scarcity in artificial intelligence-based drug discovery: A comprehensive review, Gangwal et al., Other Source Badge
  • A foundation model for clinician-centered drug repurposing, Huang et al., Nature Medicine Badge
  • DrugAgent: Automating AI-aided drug discovery programming through LLM multi-agent collaboration, Liu et al., arXiv Badge
  • Towards LLM-Driven Multi-Agent Pipeline for Drug Discovery: Neurodegenerative Diseases Case Study, Solovev et al., Other Source Badge
  • A Deep Subgrouping Framework for Precision Drug Repurposing via Emulating Clinical Trials on Real-world Patient Data, Lee et al., arXiv Badge
  • Hallucinations Can Improve Large Language Models in Drug Discovery, Yuan et al., arXiv Badge
  • RAG-Enhanced Collaborative LLM Agents for Drug Discovery, Lee et al., arXiv Badge
  • LUMI-lab: a Foundation Model-Driven Autonomous Platform Enabling Discovery of New Ionizable Lipid Designs for mRNA Delivery, Cui et al., bioRxiv Badge
  • Advancing drug discovery and development through GPT models: a review on challenges, innovations and future prospects, Othman et al., Intelligence-Based Medicine Badge
  • DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery, Li et al., arXiv Badge
  • AI-assisted Drug Re-purposing for Human Liver Fibrosis, Guan et al., bioRxiv Badge
Clinical Diagnosis
  • Large language models encode clinical knowledge, Singhal et al., PDF Badge
  • Can GPT-4V(ision) serve medical applications? Case studies on GPT-4V for multimodal medical diagnosis, Wu et al., arXiv Badge
  • Advancing clinical decision support: The role of artificial intelligence across six domains, Khalifa et al., Other Source Badge
  • AI Hospital: Benchmarking large language models in a multi-agent medical interaction simulator, Fan et al., arXiv Badge
  • Agent Hospital: A simulacrum of hospital with evolvable medical agents, Li et al., arXiv Badge
  • Autonomous Robotic System with Optical Coherence Tomography Guidance for Vascular Anastomosis, Haworth et al., arXiv Badge
  • PIORS: Personalized intelligent outpatient reception based on large language model with multi-agents medical scenario simulation, Bao et al., arXiv Badge
  • Towards an AI co-scientist, Gottweis et al., arXiv Badge
  • Generative Artificial Intelligence in Anatomic Pathology, Brodsky et al., Other Source Badge
  • ClinicalGPT-R1: Pushing reasoning capability of generalist disease diagnosis with large language model, Lan et al., arXiv Badge
  • A Human-LLM Note-Taking System with Case-Based Reasoning as Framework for Scientific Discovery, Craig et al., PDF Badge
  • PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions, Kyung et al., arXiv Badge
  • MedSyn: Enhancing Diagnostics with Human-AI Collaboration, Sayin et al., arXiv Badge
  • Can gpt-4v (ision) serve medical applications? case studies on gpt-4v for multimodal medical diagnosis, Wu et al., arXiv Badge
  • Advancing multimodal medical capabilities of Gemini, Yang et al., arXiv Badge
  • A survey of generative AI for de novo drug design: new frontiers in molecule and protein generation, Tang et al., Briefings in Bioinformatics Badge
  • Large language models in plant biology, Lam et al., Trends in Plant Science Badge
  • The virtual lab: AI agents design new SARS-CoV-2 nanobodies with experimental validation, Swanson et al., bioRxiv Badge
  • A Fuzzy Logic-Based Approach to Predict Human Interaction by Functional Near-Infrared Spectroscopy, Jiang et al., Other Source Badge
  • Human-AI Teaming Using Large Language Models: Boosting Brain-Computer Interfacing (BCI) and Brain Research, Kapitonova et al., arXiv Badge
  • From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine, Buess et al., arXiv Badge
  • A survey of LLM-based agents in medicine: How far are we from Baymax?, Wang et al., arXiv Badge
  • Large language model for knowledge synthesis and AI-enhanced biomanufacturing, Li et al., Trends in Biotechnology Badge
  • Advancing drug discovery and development through GPT models: a review on challenges, innovations and future prospects, Othman et al., Intelligence-Based Medicine Badge
  • Large Language Models for Zero-shot Inference of Causal Structures in Biology, Newsham et al., arXiv Badge
  • Transforming hematological research documentation with large language models: an approach to scientific writing and data analysis, Yang et al., Blood research Badge
  • SpatialAgent: An autonomous AI agent for spatial biology, Wang et al., bioRxiv Badge
  • A Human-LLM Note-Taking System with Case-Based Reasoning as Framework for Scientific Discovery, Craig et al., PDF Badge
  • AI-assisted Drug Re-purposing for Human Liver Fibrosis, Guan et al., bioRxiv Badge
  • Biomni: A General-Purpose Biomedical AI Agent, Huang et al., bioRxiv Badge
  • Autonomous LLM-Driven Research—from Data to Human-Verifiable Research Papers, Ifargan et al., NEJM AI Badge

6.1.3 AI for Chemistry & Materials Research

Automatic Analysis
  • Graph networks as a universal machine learning framework for molecules and crystals, Chen et al., Chemistry of Materials Badge
  • An autonomous laboratory for the accelerated synthesis of novel materials, Szymanski et al., Nature Badge
  • Accelerating the Discovery of Abiotic Vesicles with AI-Guided Automated Experimentation, Ekosso et al., Langmuir Badge
  • Sequential closed-loop Bayesian optimization as a guide for organic molecular metallophotocatalyst formulation discovery, Li et al., Nature Badge
  • High-throughput robotic collection, imaging, and machine learning analysis of salt patterns: composition and concentration from dried droplet photos, Batista et al., Digital Discovery Badge
  • Adaptive representation of molecules and materials in Bayesian optimization, Rajabi-Kochi et al., Chemical Science Badge
  • FlavorDiffusion: Modeling Food-Chemical Interactions with Diffusion, Seo et al., PDF Badge
Automatic Discovery
  • ChatGPT-Assisted Rational Design for Iterative Performance Optimization of Perovskite Solar Cells, Zhang et al., Available at SSRN 5127472 Badge
  • Machine learning for molecular and materials science, Butler et al., Nature Badge
  • Scaling deep learning for materials discovery, Merchant et al., Nature Badge
  • Experimental discovery of novel ammonia synthesis catalysts via active learning, Jayarathna et al., Other Source Badge
  • A sober look at LLMs for material discovery: Are they actually good for Bayesian optimization over molecules?, Kristiadi et al., arXiv Badge
  • BatGPT-Chem: A Foundation Large Model For Chemical Engineering, Yang et al., Other Source Badge
  • AI-assisted inverse design of sequence-ordered high intrinsic thermal conductivity polymers, Huang et al., Materials Today Physics Badge
  • Real-time experiment-theory closed-loop interaction for autonomous materials science, Liang et al., arXiv Badge
  • Autonomous mobile robots for exploratory synthetic chemistry, Dai et al., Nature Badge
  • Machine Learning-Aided Inverse Design and Discovery of Novel Polymeric Materials for Membrane Separation, Dangayach et al., Environmental Science & Technology Badge
  • ORGANA: a robotic assistant for automated chemistry experimentation and characterization, Darvish et al., Matter Badge
  • Adaptive AI decision interface for autonomous electronic material discovery, Dai et al., arXiv Badge
Full Human-AI Collaboration Process Management
  • Automated synthesis of oxygen-producing catalysts from Martian meteorites by a robotic AI chemist, Zhu et al., Nature Badge
  • ChemReasoner: Heuristic search over a large language model's knowledge space using quantum-chemical feedback, Sprueill et al., arXiv Badge
  • Efficient evolutionary search over chemical space with large language models, Wang et al., arXiv Badge
  • MatPilot: an LLM-enabled AI Materials Scientist under the Framework of Human-Machine Collaboration, Ni et al., arXiv Badge
  • Autonomous Microscopy Experiments through Large Language Model Agents, Mandal et al., arXiv Badge
  • Automated Retrosynthesis Planning of Macromolecules Using Large Language Models and Knowledge Graphs, Ma et al., Macromolecular Rapid Communications Badge
  • A multiagent-driven robotic AI chemist enabling autonomous chemical research on demand, Song et al., Other Source Badge
  • Agentic Assistant for Material Scientists, Feng et al., Other Source Badge
  • Physics-informed, dual-objective optimization of high-entropy-alloy nanozymes by a robotic AI chemist, Luo et al., Matter Badge
  • Intelligent, Personalized Scientific Assistant via Large Language Models for Solid-State Battery Research, Leng et al., ACS Materials Letters Badge
  • PriM: Principle-inspired material discovery through multi-agent collaboration, Lai et al., arXiv Badge
  • Accelerating materials discovery using artificial intelligence, high performance computing and robotics, Pyzer-Knapp et al., npj Computational Materials Badge
  • Accelerating materials language processing with large language models, Choi et al., Communications Materials Badge
  • Augmenting large language models with chemistry tools, M. Bran et al., Nature Badge
  • Nano & AI: A Nobel Partnership, Chen et al., ACS nano Badge
  • Simulating 500 million years of evolution with a language model, Hayes et al., Science Badge
  • AI4Materials: Transforming the Landscape of Materials Science and Engineering, Jiang et al., Review of Materials Research Badge
  • Cross-disciplinary perspectives on the potential for artificial intelligence across chemistry, Mroz et al., Chemical Society Reviews Badge
  • Empowering Generalist Material Intelligence with Large Language Models, Yuan et al., Advanced Materials Badge
  • From Literature to Lab: Hardware-Independent Autonomous Chemical Synthesis with Reinforcement Learning, Wu et al., Other Source Badge

6.2 AI for Applied Science and Engineering Research

6.2.1 AI for Robotics and Control Research

  • The AI CUDA engineer: Agentic CUDA kernel discovery, optimization and composition, Lange et al., Other Source Badge
  • Generative Machine Learning in Adaptive Control of Dynamic Manufacturing Processes: A Review, Lee et al., arXiv Badge
Autonomous Design & Optimization
  • Towards industry-ready additive manufacturing: AI-enabled closed-loop control for 3D melt electrowriting, Mieszczanek et al., Communications Engineering Badge
  • Closed-loop transfer enables artificial intelligence to yield chemical knowledge, Angello et al., Nature Badge
  • Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation, Bu et al., NeurIPS Badge
  • Real-time experiment-theory closed-loop interaction for autonomous materials science, Liang et al., arXiv Badge
  • AI-Driven Robotics for Free-Space Optics, Uddin et al., arXiv Badge
End-to-End Vision-Based Control
  • End-to-end training of deep visuomotor policies, Levine et al., Other Source Badge
  • Domain randomization for transferring deep neural networks from simulation to the real world, Tobin et al., Other Source Badge
  • Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, Levine et al., Other Source Badge
  • Scalable deep reinforcement learning for vision-based robotic manipulation, Kalashnikov et al., Other Source Badge
Sim-to-Real Robustness & Safety
  • Real-world humanoid locomotion with reinforcement learning, Radosavovic et al., Science Robotics Badge
  • Improving generalization of robot locomotion policies via Sharpness-Aware Reinforcement Learning, Bochem et al., arXiv Badge
  • Robustness Evaluation of Offline Reinforcement Learning for Robot Control Against Action Perturbations, Ayabe et al., arXiv Badge
  • Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms, Yang et al., arXiv Badge
  • Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning, Guerrier et al., arXiv Badge
Multi-Task & Multi-Agent Control Frameworks
  • Value Iteration for Learning Concurrently Executable Robotic Control Tasks, Tahmid et al., arXiv Badge
  • NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification, Team et al., arXiv Badge

6.2.2 AI for Software Engineering

Code Generation
  • Evaluating large language models trained on code, Chen et al., arXiv Badge
  • CodeGen: An open large language model for code with multi-turn program synthesis, Nijkamp et al., arXiv Badge
  • StarCoder: may the source be with you!, Li et al., arXiv Badge
  • Code Llama: Open foundation models for code, Roziere et al., arXiv Badge
  • DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence, Guo et al., arXiv Badge
  • StarCoder 2 and The Stack v2: The next generation, Lozhkov et al., arXiv Badge
  • MLDebugging: Towards Benchmarking Code Debugging Across Multi-Library Scenarios, Huang et al., arXiv Badge
  • Seed-Coder: Let the Code Model Curate Data for Itself, Zhang et al., arXiv Badge
End-to-End Software Development
  • Application of large language models to software engineering tasks: Opportunities, risks, and implications, Ozkaya et al., IEEE Software Badge
  • ChatDev: Communicative agents for software development, Qian et al., arXiv Badge
  • Large language models for software engineering: Survey and open problems, Fan et al., Other Source Badge
  • Experiential co-learning of software-developing agents, Qian et al., arXiv Badge
  • RepoExec: Evaluate code generation with a repository-level executable benchmark, Le Hai et al., arXiv e-prints Badge
  • SWE-bench: Can Language Models Resolve Real-world GitHub Issues?, Jimenez et al., PDF Badge
  • HyperAgent: Generalist software engineering agents to solve coding tasks at scale, Phan et al., arXiv Badge
  • Explainable automated debugging via large language model-driven scientific debugging, Kang et al., Empirical Software Engineering Badge

6.3 AI for Social Science Research

6.3.1 AI for Sociology Research

  • Ethnography and Machine Learning: Synergies and New Directions, Li et al., arXiv Badge
  • Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence, Karjus et al., Other Source Badge
  • Agent-Enhanced Large Language Models for Researching Political Institutions, Loffredo et al., arXiv Badge
  • Reimagining urban science: Scaling causal inference with large language models, Xia et al., arXiv Badge
AI-Assisted Experimental and Interview Studies
  • Automated social science: Language models as scientist and subjects, Manning et al., Other Source Badge
  • Step Further Towards Automated Social Science: An AI-Powered Interview Platform, Liu et al., Available at SSRN Badge
Large-Scale Simulation of Social Phenomena
  • RAISE: A New Method to Develop Experimental Stimuli for Advertising Research with Image Generative Artificial Intelligence, Zamudio et al., Journal of Advertising Badge
  • Cultural evolution in populations of Large Language Models, Perez et al., arXiv Badge
  • Economic Anthropology in the Era of Generative Artificial Intelligence, Sheldon et al., arXiv Badge
  • Malinowski in the Age of AI: Can large language models create a text game based on an anthropological classic?, Hoffmann et al., arXiv Badge
  • AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making, Huang et al., arXiv Badge
  • ResearchTown: Simulator of Human Research Community, Yu et al., arXiv Badge
  • Simulating cooperative prosocial behavior with multi-agent LLMs: Evidence and mechanisms for AI agents to inform policy decisions, Sreedhar et al., Other Source Badge
  • Predicting Field Experiments with Large Language Models, Chen et al., arXiv Badge
  • Language Models Surface the Unwritten Code of Science and Society, Bao et al., arXiv Badge
Potential Risks Discussion
  • Automated social science: Language models as scientist and subjects, Manning et al., Other Source Badge
  • ChatGPT as research scientist: probing GPT’s capabilities as a research librarian, research ethicist, data generator, and data predictor, Lehr et al., Other Source Badge
  • Predicting Results of Social Science Experiments Using Large Language Models, Luke et al., PDF Badge

6.3.2 AI for Psychology Research

  • Automating psychological hypothesis generation with AI: when large language models meet causal graph, Tong et al., Other Source Badge
  • Can Large Language Models Understand You Better? An MBTI Personality Detection Dataset Aligned with Population Traits, Li et al., arXiv Badge
Experiment Workflow Automation and Simulation
  • Using cognitive psychology to understand GPT-3, Binz et al., PDF Badge
  • Can AI language models replace human participants?, Dillion et al., PDF Badge
  • The emergence of economic rationality of GPT, Chen et al., Other Source Badge
  • AI-experiments in education: An AI-driven randomized controlled trial for higher education research, Cingillioglu et al., Education and Information Technologies Badge
  • RAISE: A New Method to Develop Experimental Stimuli for Advertising Research with Image Generative Artificial Intelligence, Zamudio et al., Journal of Advertising Badge
  • Frontiers: Can Large Language Models Capture Human Preferences?, Goli et al., PDF Badge
  • Testing theory of mind in large language models and humans, Strachan et al., PDF Badge
  • Do large language models show decision heuristics similar to humans? A case study using GPT-3.5, Suri et al., Other Source Badge
  • Towards a client-centered assessment of LLM therapists by client simulation, Wang et al., arXiv Badge
  • Interactive agents: Simulating counselor-client psychological counseling via role-playing LLM-to-LLM interactions, Qiu et al., arXiv Badge
  • Can AI Replace Human Subjects? A Large-Scale Replication of Psychological Experiments with LLMs, Cui et al., arXiv Badge
Human-AI Trust and Safety Design
  • MMSD2.0: Towards a reliable multi-modal sarcasm detection system, Qin et al., arXiv Badge
  • Developing trustworthy artificial intelligence: insights from research on interpersonal, human-automation, and human-AI trust, Li et al., Frontiers in Psychology Badge
  • From Lived Experience to Insight: Unpacking the Psychological Risks of Using AI Conversational Agents, Chandra et al., arXiv Badge
Psychological Interventions
  • Using cognitive psychology to understand GPT-3, Binz et al., PDF Badge
  • Can AI language models replace human participants?, Dillion et al., PDF Badge
  • Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT, Hagendorff et al., PDF Badge
  • Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study, Deiner et al., PDF Badge
  • Crafting clarity: Leveraging large language models to decode consumer reviews, Praveen et al., PDF Badge
  • ChatGPT for Textual Analysis? How to Use Generative LLMs in Accounting Research, de Kok et al., PDF Badge
  • The use of artificial intelligence in psychotherapy: development of intelligent therapeutic systems, Spytska et al., BMC psychology Badge
  • Randomized trial of a generative AI chatbot for mental health treatment, Heinz et al., NEJM AI Badge
  • Large language models as mental health resources: Patterns of use in the United States, Rousmaniere et al., Other Source Badge
  • Large Language Models Pass the Turing Test, Jones et al., arXiv Badge
  • Experiential Narratives in Marketing: A Comparison of Generative AI and Human Content, Wen et al., PDF Badge

7. Future and Frontiers

7.1 Interdisciplinary AI Models

  • Artificial intelligence in cancer research: learning at different levels of data granularity, Cirillo et al., Molecular oncology Badge
  • Generating full length Wikipedia biographies: The impact of gender bias on the retrieval-based generation of women biographies, Fan et al., arXiv Badge
  • Contrastive knowledge integrated graph neural networks for Chinese medical text classification, Lan et al., Other Source Badge
  • Heterogeneous federated learning: State-of-the-art and research challenges, Ye et al., ACM Computing Surveys Badge
  • A comprehensive survey of cross-domain policy transfer for embodied agents, Niu et al., arXiv Badge
  • Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models, Gu et al., arXiv Badge
  • A survey of trustworthy representation learning across domains, Zhu et al., Other Source Badge
  • BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science, Lin et al., arXiv Badge
  • Knowledge transfer for cross-domain reinforcement learning: a systematic review, Serrano et al., IEEE Access Badge
  • Accelerating scientific discovery with generative knowledge extraction, graph-based representation, and multimodal intelligent graph reasoning, Buehler et al., Other Source Badge
  • Heterogeneous data integration: Challenges and opportunities, Putrama et al., Data in Brief Badge
  • A comprehensive survey of foundation models in medicine, Khan et al., Other Source Badge
  • Foundation models and intelligent decision-making: Progress, challenges, and perspectives, Huang et al., The Innovation Badge

7.2 Ethics and Safety in AI4Research

  • Causal learning for socially responsible AI, Cheng et al., arXiv Badge
  • Artificial intelligence and ethics: a comprehensive review of bias mitigation, transparency, and accountability in AI Systems, Mensah et al., Preprint, November Badge
  • Fairness and bias in artificial intelligence: A brief survey of sources, impacts, and mitigation strategies, Ferrara et al., Sci Badge
  • AXOLOTL: fairness through assisted self-debiasing of large language model outputs, Ebrahimi et al., arXiv Badge
  • Policy advice and best practices on bias and fairness in AI, Alvarez et al., Ethics and Information Technology Badge
  • Automated Peer-Reviewer Assignment can be Manipulated to Secure Reviews from Colluders, Hsieh et al., Other Source Badge
  • Mitigating bias in artificial intelligence: Fair data generation via causal models for transparent and explainable decision-making, González-Sendino et al., Future Generation Computer Systems Badge
  • Enhancing peer review efficiency: A mixed-methods analysis of artificial intelligence-assisted reviewer selection across academic disciplines, Farber et al., Learned Publishing Badge
  • Beyond principlism: practical strategies for ethical AI use in research practices, Lin et al., AI and Ethics Badge
  • SciTrust: Evaluating the Trustworthiness of Large Language Models for Science, Herron et al., Other Source Badge
  • Are we there yet? revealing the risks of utilizing large language models in scholarly peer review, Ye et al., arXiv Badge
  • Vulnerability of Text-Matching in ML/AI Conference Reviewer Assignments to Collusions, Raghunathan et al., arXiv Badge
  • How human–AI feedback loops alter human perceptual, emotional and social judgements, Glickman et al., Nature Badge
  • The hidden dimensions of LLM alignment: A multi-dimensional safety analysis, Pan et al., arXiv Badge
  • Responsible AI in biotechnology: balancing discovery, innovation and biosecurity risks, Wheeler et al., Other Source Badge
  • All that glitters is not novel: Plagiarism in AI generated research, Gupta et al., arXiv Badge
  • Detecting LLM-written peer reviews, Rao et al., arXiv Badge
  • Ethical and bias considerations in artificial intelligence/machine learning, Hanna et al., Modern Pathology Badge
  • Automation Bias in AI-assisted Medical Decision-making under Time Pressure in Computational Pathology, Rosbach et al., Other Source Badge
  • Considering the Ethics of Large Machine Learning Models in the Chemical Sciences, Spotte-Smith et al., Other Source Badge
  • Generative artificial intelligence for academic research: evidence from guidance issued for researchers by higher education institutions in the United States, Ganguly et al., AI and Ethics Badge
  • Artificial intelligence and dichotomania, McShane et al., Judgment and Decision Making Badge
  • The Plagiarism Singularity Conjecture, Ranga et al., ACL Badge
  • Toward Reliable Biomedical Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models, Xiong et al., arXiv Badge
  • BiasFilter: An Inference-Time Debiasing Framework for Large Language Models, Cheng et al., arXiv Badge
  • SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents, Zhu et al., arXiv Badge
  • OpenReview Should be Protected and Leveraged as a Community Asset for Research in the Era of Large Language Models, Sun et al., arXiv Badge

7.3 AI for Collaborative Research

  • A hybrid approach to privacy-preserving federated learning, Truex et al., Other Source Badge
  • A review of applications in federated learning, Li et al., Computers & Industrial Engineering Badge
  • A survey on federated learning, Zhang et al., Knowledge-Based Systems Badge
  • A systematic review of federated learning: Challenges, aggregation methods, and development tools, Guendouzi et al., Other Source Badge
  • Federated learning and data privacy: A review of challenges and opportunities, Myakala et al., Other Source Badge
  • Designing collaborative intelligence systems for employee-AI service co-production, Blaurock et al., Journal of Service Research Badge
  • Collaborative Intelligence: A scoping review of current applications, Schleiger et al., Applied Artificial Intelligence Badge
  • Deconstructing Human-AI Collaboration: Agency, Interaction, and Adaptation, Holter et al., Other Source Badge
  • The AI Scientist: Towards fully automated open-ended scientific discovery, Lu et al., arXiv Badge
  • Human-AI collaboration is not very collaborative yet: A taxonomy of interaction patterns in AI-assisted decision making from a systematic review, Gomez et al., Frontiers in Computer Science Badge
  • Text2World: Benchmarking large language models for symbolic world model generation, Hu et al., arXiv Badge
  • Distributed cross-learning for equitable federated models-privacy-preserving prediction on data from five California hospitals, Kuo et al., Nature Communications Badge
  • Multi-agent risks from advanced AI, Hammond et al., arXiv Badge
  • Simulating cooperative prosocial behavior with multi-agent LLMs: Evidence and mechanisms for AI agents to inform policy decisions, Sreedhar et al., Other Source Badge
  • Accelerating drug discovery with Artificial: a whole-lab orchestration and scheduling system for self-driving labs, Fehlis et al., arXiv Badge
  • 34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery, Zimmermann et al., arXiv Badge
  • DrugPilot: LLM-based Parameterized Reasoning Agent for Drug Discovery, Li et al., arXiv Badge
  • The role of agentic AI in shaping a smart future: A systematic review, Hosseini et al., Array Badge

7.4 Explainability and Transparency of AI4Research

  • On gradient-like explanation under a black-box setting: when black-box explanations become as good as white-box, Cai et al., arXiv Badge
  • Explainable and interpretable artificial intelligence in medicine: a systematic bibliometric review, Frasca et al., Discover Artificial Intelligence Badge
  • Towards uncovering how large language model works: An explainability perspective, Zhao et al., arXiv Badge
  • Mechanistic Interpretability for AI Safety -- A Review, Bereska et al., arXiv Badge
  • A practical review of mechanistic interpretability for transformer-based language models, Rai et al., arXiv Badge
  • Interpreting black-box models: a review on explainable artificial intelligence, Hassija et al., Cognitive Computation Badge
  • Unlocking the capabilities of thought: A reasoning boundary framework to quantify and optimize chain-of-thought, Chen et al., NeurIPS Badge
  • Explainable AI reloaded: Challenging the xai status quo in the era of large language models, Ehsan et al., Other Source Badge
  • Beyond principlism: practical strategies for ethical AI use in research practices, Lin et al., AI and Ethics Badge
  • ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model, Chen et al., arXiv Badge
  • RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning, Chen et al., arXiv Badge

7.5 AI for Dynamic and Real‑Time Optimized Scientific Experimentation

  • Tree-planner: Efficient close-loop task planning with large language models, Hu et al., arXiv Badge
  • Review of low-cost self-driving laboratories in chemistry and materials science: the “frugal twin” concept, Lo et al., Digital Discovery Badge
  • Self-driving laboratories for chemistry and materials science, Tom et al., Chemical Reviews Badge
  • HiAgent: Hierarchical working memory management for solving long-horizon agent tasks with large language model, Hu et al., arXiv Badge
  • Real-time experiment-theory closed-loop interaction for autonomous materials science, Liang et al., arXiv Badge
  • AutoSciLab: A Self-Driving Laboratory For Interpretable Scientific Discovery, Desai et al., AAAI Badge
  • Adaptive AI decision interface for autonomous electronic material discovery, Dai et al., arXiv Badge
  • Science acceleration and accessibility with self-driving labs, Canty et al., Nature Communications Badge

7.6 Multimodal Integration in AI4Research

  • Look, read and enrich-learning from scientific figures and their captions, Gomez-Perez et al., Other Source Badge
  • UNITER: Universal image-text representation learning, Chen et al., ECCV Badge
  • T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering, Wang et al., arXiv Badge
  • FigCaps-HF: A figure-to-caption generative framework and benchmark with human feedback, Singh et al., arXiv Badge
  • M3CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought, Chen et al., arXiv Badge
  • Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models, Shi et al., arXiv Badge
  • S3 Agent: Unlocking the power of VLLM for zero-shot multi-modal sarcasm detection, Wang et al., Other Source Badge
  • VLM4Bio: A benchmark dataset to evaluate pretrained vision-language models for trait discovery from biological images, Maruf et al., NeurIPS Badge
  • Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs, Wang et al., arXiv Badge
  • What factors affect multi-modal in-context learning? an in-depth exploration, Qin et al., arXiv Badge
  • BigDocs: An open dataset for training multimodal models on document and code tasks, Rodriguez et al., ICLR Badge
  • InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback, Zhao et al., arXiv Badge
  • MERMaid: Universal multimodal mining of chemical reactions from PDFs using vision-language models, Leong et al., Other Source Badge
  • CoMT: A novel benchmark for chain of multi-modal thought on large vision-language models, Cheng et al., AAAI Badge
  • Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought, Cheng et al., arXiv Badge
  • HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights, Gokdemir et al., arXiv Badge

7.7 Multilingual Integration in AI4Research

  • Languages are still a major barrier to global science, Amano et al., PLoS biology Badge
  • Unsupervised cross-lingual representation learning at scale, Conneau et al., arXiv Badge
  • SimAlign: High quality word alignments without parallel training data using static and contextualized embeddings, Sabet et al., arXiv Badge
  • Ten tips for overcoming language barriers in science, Amano et al., Nature Badge
  • Improving low-resource languages in pre-trained multilingual language models, Hangya et al., EMNLP Badge
  • HIT-SCIR at MMNLU-22: Consistency regularization for multilingual spoken language understanding, Zheng et al., arXiv Badge
  • Crosslingual capabilities and knowledge barriers in multilingual large language models, Chua et al., arXiv Badge
  • AutoCAP: Towards automatic cross-lingual alignment planning for zero-shot chain-of-thought, Zhang et al., arXiv Badge
  • Rule-based, neural and LLM back-translation: Comparative insights from a variant of Ladin, Frontull et al., arXiv Badge
  • A survey of multilingual large language models, Qin et al., Patterns Badge
  • A smack of all neighbouring languages: How multilingual is scholarly communication?, Pradier et al., arXiv Badge
  • X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System, Wang et al., arXiv Badge
  • AI-powered platform for scientific discovery, Trifonov et al., Other Source Badge
  • Hypothesis generation with large language models, Zhou et al., arXiv Badge
  • Artificial intelligence and scientific discovery: A model of prioritized search, Agrawal et al., Research Policy Badge
  • A comprehensive survey of scientific large language models and their applications in scientific discovery, Zhang et al., arXiv Badge
  • Artificial intelligence for literature reviews: Opportunities and challenges, Bolanos et al., Artificial Intelligence Review Badge
  • Creativity in AI: Progresses and Challenges, Ismayilzada et al., arXiv Badge
  • LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions, Liao et al., arXiv Badge
  • Towards scientific discovery with generative ai: Progress, opportunities, and challenges, Reddy et al., AAAI Badge
  • LLM4SR: A Survey on Large Language Models for Scientific Research, Luo et al., arXiv Badge
  • Large language models for automated scholarly paper review: A survey, Zhuang et al., arXiv Badge
  • Large Physics Models: Towards a collaborative approach with Large Language Models and Foundation Models, Barman et al., arXiv Badge
  • Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation, Eger et al., arXiv Badge
  • Unlocking the Potential of AI Researchers in Scientific Discovery: What Is Missing?, Yu et al., arXiv Badge
  • A review of llm-assisted ideation, Li et al., arXiv Badge
  • Towards scientific intelligence: A survey of llm-based scientific agents, Ren et al., arXiv Badge
  • Agentichypothesis: A survey on hypothesis generation using llm systems, Bazgir et al., Other Source Badge
  • Agentic ai for scientific discovery: A survey of progress, challenges, and future directions, Gridach et al., arXiv Badge
  • A Survey on Hypothesis Generation for Scientific Discovery in the Era of Large Language Models, Alkan et al., arXiv Badge
  • Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery, Zhang et al., arXiv Badge
  • Scientific hypothesis generation and validation: Methods, datasets, and future directions, Kulkarni et al., arXiv Badge
  • AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research, Chen et al., arXiv Badge
  • Towards Agentic AI for Science: Hypothesis Generation, Comprehension, Quantification, and Validation, Huang et al., Other Source Badge
  • Position: The AI Conference Peer Review Crisis Demands Author Feedback and Reviewer Rewards, Kim et al., arXiv Badge
  • From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery, Zheng et al., arXiv Badge
  • AI Scientists Fail Without Strong Implementation Capability, Zhu et al., arXiv Badge

9. Resources

9.1 AI for Scientific Comprehension

9.1.1 Textual Scientific Comprehension

  • Pubmedqa: A dataset for biomedical research question answering, Jin et al., arXiv Badge
  • Medmcqa: A large-scale multi-subject multi-choice dataset for medical domain question answering, Pal et al., Other Source Badge
  • CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice, Raza et al., BMC bioinformatics Badge
  • Scienceqa: A novel resource for question answering on scholarly articles, Saikh et al., Other Source Badge
  • Clam: Selective clarification for ambiguous questions with generative language models, Kuhn et al., arXiv Badge
  • BioASQ-QA: A manually curated corpus for Biomedical Question Answering, Krithara et al., Scientific Data Badge
  • The sciqa scientific question answering benchmark for scholarly knowledge, Auer et al., Scientific Reports Badge
  • Theoremqa: A theorem-driven question answering dataset, Chen et al., arXiv Badge
  • Scibench: Evaluating college-level scientific problem-solving abilities of large language models, Wang et al., arXiv Badge
  • What if: Generating code to answer simulation questions in chemistry texts, Peretz et al., Other Source Badge
  • Enabling Language Models to Implicitly Learn Self-Improvement, Wang et al., arXiv Badge
  • Paperqa: Retrieval-augmented generative agent for scientific research, Lála et al., arXiv Badge
  • Sciglm: Training scientific language models with self-reflective instruction annotation and tuning, Zhang et al., arXiv Badge
  • Generating Multiple Choice Questions from Scientific Literature via Large Language Models, Luo et al., Other Source Badge
  • Biomedlm: A 2.7B parameter language model trained on biomedical text, Bolton et al., arXiv Badge
  • SciQAG: A Framework for Auto-Generated Science Question Answering Dataset with Fine-grained Evaluation, Wan et al., arXiv Badge
  • M3CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought, Chen et al., arXiv Badge
  • Scifibench: Benchmarking large multimodal models for scientific figure interpretation, Roberts et al., arXiv Badge
  • Sciknoweval: Evaluating multi-level scientific knowledge of large language models, Feng et al., arXiv Badge
  • BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science, Lin et al., arXiv Badge
  • Scholarchemqa: Unveiling the power of language models in chemical research question answering, Chen et al., arXiv Badge
  • Mmsci: A dataset for graduate-level multi-discipline multimodal scientific understanding, Li et al., arXiv Badge
  • SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers, Pramanick et al., PDF Badge
  • Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models, Li et al., PDF Badge
  • SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark, Liang et al., PDF Badge
  • Language agents achieve superhuman synthesis of scientific knowledge, Skarlinski et al., arXiv Badge
  • Fine-Tuning Large Language Models for Scientific Text Classification: A Comparative Study, Rostam et al., Other Source Badge
  • Graphusion: a RAG framework for Knowledge Graph Construction with a global perspective, Yang et al., arXiv Badge
  • M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models, Li et al., PDF Badge
  • SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers, Singh et al., arXiv Badge
  • SciAgent: Tool-augmented Language Models for Scientific Reasoning, Ma et al., PDF Badge
  • SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature, Wadden et al., Other Source Badge
  • PaSa: An LLM Agent for Comprehensive Academic Paper Search, He et al., arXiv Badge
  • BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning, Zhao et al., arXiv Badge
  • AutoPaperBench: An MLLM-Based Framework for Automatic Generation of Paper Understanding Evaluation Benchmarks, Kim et al., Electronics Badge
  • FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights, Yu et al., arXiv Badge
  • SciCUEval: A Comprehensive Dataset for Evaluating Scientific Context Understanding in Large Language Models, Yu et al., arXiv Badge
  • EarthSE: A Benchmark Evaluating Earth Scientific Exploration Capability for Large Language Models, Xu et al., arXiv Badge
  • Scaling Physical Reasoning with the PHYSICS Dataset, Zheng et al., arXiv Badge

9.1.2 Table & Chart Scientific Comprehension

  • ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning, Masry et al., PDF Badge
  • Chartx & chartvlm: A versatile benchmark and foundation model for complicated chart reasoning, Xia et al., arXiv Badge
  • Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study, Sui et al., PDF Badge
  • NovaChart: A Large-scale Dataset towards Chart Understanding and Generation of Multimodal Large Language Models, Hu et al., PDF Badge
  • CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs, Wang et al., PDF Badge
  • The Mighty ToRR: A Benchmark for Table Reasoning and Robustness, Ashury-Tahan et al., arXiv Badge
  • Tablebench: A comprehensive and complex benchmark for table question answering, Wu et al., AAAI Badge

9.2 AI for Academic Survey

  • Ms2: Multi-document summarization of medical studies, DeYoung et al., arXiv Badge
  • Generating (factual?) narrative summaries of rcts: Experiments with neural multi-document summarization, Wallace et al., Other Source Badge
  • Overview of MSLR2022: A shared task on multi-document summarization for literature reviews, Wang et al., Other Source Badge
  • Generating a structured summary of numerous academic papers: Dataset and method, Liu et al., arXiv Badge
  • SciReviewGen: a large-scale dataset for automatic literature review generation, Kasanishi et al., arXiv Badge
  • SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section, Fernandes et al., Other Source Badge
  • OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining, Zhang et al., arXiv Badge
  • OARelatedWork: A Large-Scale Dataset of Related Work Sections with Full-texts from Open Access Sources, Docekal et al., arXiv Badge
  • Autosurvey: Large language models can automatically write surveys, Wang et al., NeurIPS Badge
  • SurveyX: Academic Survey Automation via Large Language Models, Liang et al., arXiv Badge
  • SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing, Yan et al., arXiv Badge
  • Browsecomp: A simple yet challenging benchmark for browsing agents, Wei et al., arXiv Badge
  • LLM×MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources, Wang et al., arXiv Badge
  • AcademicBrowse: Benchmarking Academic Browse Ability of LLMs, Zhou et al., arXiv Badge

9.3 AI for Scientific Discovery

Idea Mining
  • OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining, Zhang et al., arXiv Badge
  • Can Large Language Models Unlock Novel Scientific Research Ideas?, Kumar et al., arXiv Badge
  • LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context, Ruan et al., arXiv Badge
  • Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses, Yang et al., Other Source Badge
  • Learning to Generate Research Idea with Dynamic Control, Li et al., arXiv Badge
  • Structuring Scientific Innovation: A Framework for Modeling and Discovering Impactful Knowledge Combinations, Chen et al., arXiv Badge
  • ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition, Liu et al., arXiv Badge
  • Ai idea bench 2025: Ai research idea generation benchmark, Qiu et al., arXiv Badge
  • Sparks of science: Hypothesis generation using structured paper data, O'Neill et al., arXiv Badge
  • Spark: A System for Scientifically Creative Idea Generation, Sanyal et al., arXiv Badge
  • Improving Research Idea Generation Through Data: An Empirical Investigation in Social Science, Liu et al., arXiv Badge
  • CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature, Sternlicht et al., arXiv Badge
Novelty & Significance Assessment
  • Blade: Benchmarking language model agents for data-driven science, Gu et al., arXiv Badge
  • Empowering AI as Autonomous Researchers: Evaluating LLMs in Generating Novel Research Ideas through Automated Metrics, Dasgupta et al., Other Source Badge
  • LLMs Tackle Meta-Analysis: Automating Scientific Hypothesis Generation with Statistical Rigor, Lin et al., Other Source Badge
  • A Hierarchical Framework for Measuring Scientific Paper Innovation via Large Language Models, Tan et al., arXiv Badge
  • Hypobench: Towards systematic and principled benchmarking for hypothesis generation, Liu et al., arXiv Badge
  • Evaluating and Enhancing Large Language Models for Novelty Assessment in Scholarly Publications, Lin et al., PDF Badge
  • Harnessing Large Language Models for Scientific Novelty Detection, Liu et al., arXiv Badge
Theory Analysis
  • Minif2f: a cross-system benchmark for formal olympiad-level mathematics, Zheng et al., arXiv Badge
  • FactKG: Fact verification via reasoning on knowledge graphs, Kim et al., arXiv Badge
  • Investigating zero-and few-shot generalization in fact verification, Pan et al., arXiv Badge
  • Fimo: A challenge formal dataset for automated theorem proving, Liu et al., arXiv Badge
  • Can Large Language Models Detect Misinformation in Scientific News Reporting?, Cao et al., arXiv Badge
  • Mustard: Mastering uniform synthesis of theorem and proof data, Huang et al., arXiv Badge
  • MAGIC: Multi-Argument Generation with Self-Refinement for Domain Generalization in Automatic Fact-Checking, Kao et al., COLING Badge
  • Zero-shot scientific claim verification using LLMs and citation text, Alvarez et al., Other Source Badge
  • Grounding fallacies misrepresenting scientific publications in evidence, Glockner et al., arXiv Badge
  • Augmenting the Veracity and Explanations of Complex Fact Checking via Iterative Self-Revision with LLMs, Zhang et al., arXiv Badge
  • DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts, Braun et al., arXiv Badge
  • TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding, Ku et al., arXiv Badge
  • BioDSA-1K: Benchmarking Data Science Agents for Biomedical Research, Wang et al., arXiv Badge
Experiment Design
  • Benchmarking compound activity prediction for real-world drug discovery applications, Tian et al., Communications Chemistry Badge
  • A bioactivity foundation model using pairwise meta-learning, Feng et al., Nature Badge
  • BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning, Liu et al., arXiv Badge
  • LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation, Zhang et al., arXiv Badge
Experiment Conduction
  • Mlagentbench: Evaluating language agents on machine learning experimentation, Huang et al., arXiv Badge
  • Infiagent-dabench: Evaluating agents on data analysis tasks, Hu et al., arXiv Badge
  • DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?, Jing et al., arXiv Badge
  • Mle-bench: Evaluating machine learning agents on machine learning engineering, Chan et al., arXiv Badge
  • Mlgym: A new framework and benchmark for advancing ai research agents, Nathani et al., arXiv Badge
  • MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?, Zhang et al., arXiv Badge
  • Scireplicate-bench: Benchmarking llms in agent-driven algorithmic reproduction from research papers, Xiang et al., arXiv Badge
  • Can AI Agents Design and Implement Drug Discovery Pipelines?, Smbatyan et al., arXiv Badge
  • EXP-Bench: Can AI Conduct AI Research Experiments?, Kon et al., arXiv Badge
  • Scienceboard: Evaluating multimodal autonomous agents in realistic scientific workflows, Sun et al., arXiv Badge
  • AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage, Zhao et al., arXiv Badge
  • MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research, Chen et al., arXiv Badge
  • Autobio: A simulation and benchmark for robotic automation in digital biology laboratory, Lan et al., arXiv Badge
  • ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code, Hua et al., arXiv Badge
Experimental Analysis
  • Microvqa: A multimodal reasoning benchmark for microscopy-based scientific research, Burgess et al., Other Source Badge
Full Automatic Discovery
  • Ds-agent: Automated data science by empowering large language models with case-based reasoning, Guo et al., arXiv Badge
  • Discoverybench: Towards data-driven discovery with large language models, Majumder et al., arXiv Badge
  • Blade: Benchmarking language model agents for data-driven science, Gu et al., arXiv Badge
  • Scienceagentbench: Toward rigorous assessment of language agents for data-driven scientific discovery, Chen et al., arXiv Badge
  • DISCOVERYWORLD: A virtual environment for developing and evaluating automated scientific discovery agents, Jansen et al., NeurIPS Badge
  • Curie: Toward rigorous and automated scientific experimentation with ai agents, Kon et al., arXiv Badge
  • A vision for auto research with llm agents, Liu et al., arXiv Badge
  • Can AI Agents Design and Implement Drug Discovery Pipelines?, Smbatyan et al., arXiv Badge
  • Llm-srbench: A new benchmark for scientific equation discovery with large language models, Shojaee et al., arXiv Badge
  • Towards llm agents for earth observation, Kao et al., arXiv Badge
  • Benchmarking AI scientists in omics data-driven biological research, Luo et al., arXiv Badge
  • ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition, Liu et al., arXiv Badge

9.4 AI for Academic Writing

9.4.1 Semi-Automatic Academic Writing

Assistance During Manuscript Preparation
  • LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts, Hashemi et al., PDF Badge
  • MoDeST: A dataset for Multi Domain Scientific Title Generation, Bölücü et al., PDF Badge
Assistance During Manuscript Writing
  • CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding, Wright et al., ACL Findings Badge
  • Figgen: Text to scientific figure generation, Rodriguez et al., arXiv Badge
  • Scicapenter: Supporting caption composition for scientific figures with machine-generated captions and ratings, Hsu et al., Other Source Badge
  • Figuring out Figures: Using Textual References to Caption Scientific Figures, Cao et al., arXiv Badge
  • CiteBART: Learning to Generate Citations for Local Citation Recommendation, Çelik et al., arXiv Badge
  • TikZero: Zero-Shot Text-Guided Graphics Program Synthesis, Belouadi et al., arXiv Badge
  • Futuregen: Llm-rag approach to generate the future work of scientific article, Azher et al., arXiv Badge
  • ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations, Wang et al., arXiv Badge
  • XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic Paper Revision, Chen et al., arXiv Badge
Assistance After Manuscript Completion
  • WikiAtomicEdits: A multilingual corpus of Wikipedia edits for modeling language and discourse, Faruqui et al., arXiv Badge
  • Learning to split and rephrase from Wikipedia edit history, Botha et al., arXiv Badge
  • Diamonds in the rough: Generating fluent sentences from early-stage drafts for academic writing assistance, Ito et al., arXiv Badge
  • Neural Automated Writing Evaluation with Corrective Feedback, Wang et al., arXiv Badge
  • AAAR-1.0: Assessing AI's Potential to Assist Research, Lou et al., arXiv Badge
  • Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers, Pang et al., arXiv Badge
  • The usage of a transformer based and artificial intelligence driven multidimensional feedback system in english writing instruction, Zheng et al., Scientific Reports Badge

9.5 AI for Academic Peer Reviewing

  • A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications, Kang et al., NAACL Badge
  • Citetracked: A longitudinal dataset of peer reviews and citations, Plank et al., Other Source Badge
  • COMPARE: a taxonomy and dataset of comparison discussions in peer reviews, Singh et al., Other Source Badge
  • Peer review analyze: A novel benchmark resource for computational analysis of peer reviews, Ghosal et al., Plos one Badge
  • Reviewergpt? an exploratory study on using large language models for paper reviewing, Liu et al., arXiv Badge
  • NLPeer: A Unified Resource for the Computational Study of Peer Review, Dycke et al., PDF Badge
  • Moprd: A multidisciplinary open peer review dataset, Lin et al., Neural Computing and Applications Badge
  • The Open Review-Based (ORB) dataset: Towards Automatic Assessment of Scientific Papers and Experiment Proposals in High-Energy Physics, Szumega et al., arXiv Badge
  • Pre: A peer review based large language model evaluator, Chu et al., arXiv Badge
  • Is LLM a reliable reviewer? A comprehensive evaluation of LLM on automatic paper reviewing tasks, Zhou et al., COLING Badge
  • PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews, Bharti et al., Language Resources and Evaluation Badge
  • RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance, Couto et al., arXiv Badge
  • Peer review as a multi-turn and long-context dialogue with role-based interactions, Tan et al., arXiv Badge
  • MASSW: A new dataset and benchmark tasks for ai-assisted scientific workflows, Zhang et al., arXiv Badge
  • Scientific opinion summarization: Paper meta-review generation dataset, methods, and evaluation, Zeng et al., IJCAI Badge
  • Can large language models provide useful feedback on research papers? A large-scale empirical analysis, Liang et al., NEJM AI Badge
  • An Analysis of Tasks and Datasets in Peer Reviewing, Staudinger et al., Other Source Badge
  • PeerArg: Argumentative Peer Review with LLMs, Sukpanichnant et al., arXiv Badge
  • Enhancing peer review efficiency: A mixed-methods analysis of artificial intelligence-assisted reviewer selection across academic disciplines, Farber et al., Learned Publishing Badge
  • Automatic Large Language Model Evaluation via Peer Review, Chu et al., Other Source Badge
  • AAAR-1.0: Assessing AI's Potential to Assist Research, Lou et al., arXiv Badge
  • Is your paper being reviewed by an llm? investigating ai text detectability in peer review, Yu et al., arXiv Badge
  • WithdrarXiv: A Large-Scale Dataset for Retraction Study, Rao et al., arXiv Badge
  • OpenReviewer: A Specialized Large Language Model for Generating Critical Scientific Paper Reviews, Idahl et al., arXiv Badge
  • Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM Reviews, Shin et al., arXiv Badge
  • PeerQA: A Scientific Question Answering Dataset from Peer Reviews, Baumgärtner et al., ACL Badge
  • Revieweval: An evaluation framework for ai-generated reviews, Kirtani et al., arXiv Badge
  • LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews, Purkayastha et al., arXiv Badge
  • When AI co-scientists fail: SPOT-a benchmark for automated verification of scientific research, Son et al., arXiv Badge
  • Re2: A Consistency-ensured Dataset for Full-stage Peer Review and Multi-turn Rebuttal Discussions, Zhang et al., arXiv Badge
  • PaperEval: A universal, quantitative, and explainable paper evaluation method powered by a multi-agent system, Huang et al., Information Processing & Management Badge
