1. AI for Scientific Comprehension
1.1 Textual Scientific Comprehension
- Open-retrieval conversational question answering, Qu et al.,
- A non-factoid question-answering taxonomy, Bolotova et al.,
- How Well Do Large Language Models Extract Keywords? A Systematic
Evaluation on Scientific Corpora, Mansour et
al.,
1.1.1 Semi-Automatic Scientific Comprehension
- Scholarchemqa: Unveiling the power of language models in chemical
research question answering, Chen et al.,
- Evaluating and Training Long-Context Large Language Models for
Question Answering on Scientific Papers, Hilgert
et al.,
- Are plain language summaries more readable than scientific abstracts?
Evidence from six biomedical and life sciences
journals, Wen et al.,
- Clam: Selective clarification for ambiguous questions with generative
language models, Kuhn et al.,
- Clarify when necessary: Resolving ambiguity through interaction with
lms, Zhang et al.,
- Empowering language models with active inquiry for deeper
understanding, Pang et al.,
- Iqa-eval: Automatic evaluation of human-model interactive question
answering, Li et al.,
- The ai scientist-v2: Workshop-level automated scientific discovery via
agentic tree search, Yamada et al.,
- Truly Assessing Fluid Intelligence of Large Language Models through
Dynamic Reasoning Evaluation, Yang et al.,
- CiteWorth: Cite-Worthiness Detection for Improved Scientific Document
Understanding, Wright et al.,
- Scienceqa: A novel resource for question answering on scholarly
articles, Saikh et al.,
- Human and technological infrastructures of fact-checking,
Juneja et al.,
- Paperqa: Retrieval-augmented generative agent for scientific
research, Lala et al.,
- Efficacy analysis of online artificial intelligence fact-checking
tools, Hartley et al.,
- Language agents achieve superhuman synthesis of scientific
knowledge, Skarlinski et al.,
- Graphusion: a RAG framework for Knowledge Graph Construction with a
global perspective, Yang et al.,
- SciAgent: Tool-augmented Language Models for Scientific
Reasoning, Ma et al.,
- Hallucination Mitigation using Agentic AI Natural Language-Based
Frameworks, Gosmar et al.,
- MedBioLM: Optimizing Medical and Biological QA with Fine-Tuned Large
Language Models and Retrieval-Augmented
Generation, Kim et al.,
- Towards reasoning era: A survey of long chain-of-thought for reasoning
large language models, Chen et al.,
- Self-Critique Guided Iterative Reasoning for Multi-hop Question
Answering, Chu et al.,
- CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal
Hallucinations Detection in Large Language
Models, Zhang et al.,
- Boolq: Exploring the surprising difficulty of natural yes/no
questions, Clark et al.,
- SciBERT: A Pretrained Language Model for Scientific Text,
Beltagy et al.,
- CoQUAD: a COVID-19 question answering dataset system, facilitating
research, benchmarking, and practice, Raza et
al.,
- Quaser: Question answering with scalable extractive
rationalization, Ghoshal et al.,
- Spaceqa: Answering questions about the design of space missions and
space craft concepts, Garcia-Silva et al.,
- What if: Generating code to answer simulation questions in chemistry
texts, Peretz et al.,
- Biomedlm: A 2.7B parameter language model trained on biomedical
text, Bolton et al.,
- Scifibench: Benchmarking large multimodal models for scientific figure
interpretation, Roberts et al.,
- Mmsci: A dataset for graduate-level multi-discipline multimodal
scientific understanding, Li et al.,
- Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of
Large Vision-Language Models, Li et al.,
- What are the essential factors in crafting effective long context
multi-hop instruction datasets? insights and best
practices, Chen et al.,
- Fine-Tuning Large Language Models for Scientific Text Classification:
A Comparative Study, Rostam et al.,
- L-CiteEval: Do Long-Context Models Truly Leverage Context for
Responding?, Tang et al.,
- Toward expert-level medical question answering with large language
models, Singhal et al.,
- A comprehensive survey on long context language modeling, Liu
et al.,
- A survey on transformer context extension: Approaches and
evaluation, Liu et al.,
1.1.2 Full-Automatic Scientific Comprehension
Summarization-guided Automatic Scientific Comprehension
- Straight from the scientist's mouth—plain language summaries promote
laypeople's comprehension and knowledge acquisition when
reading about individual research findings in
psychology, Kerwer et al.,
- Hierarchical attention graph for scientific document summarization in
global and local level, Zhao et al.,
- Can Large Language Model Summarizers Adapt to Diverse Scientific
Communication Goals?, Fonseca et al.,
- Autonomous LLM-Driven Research—from Data to Human-Verifiable Research
Papers, Ifargan et al.,
- Large language models can self-improve, Huang et al.,
- Selfcheck: Using llms to zero-shot check their own step-by-step
reasoning, Miao et al.,
- Enabling Language Models to Implicitly Learn Self-Improvement,
Wang et al.,
- Sciglm: Training scientific language models with self-reflective
instruction annotation and tuning, Zhang et al.,
- Generating Multiple Choice Questions from Scientific Literature via
Large Language Models, Luo et al.,
- SciQAG: A Framework for Auto-Generated Science Question Answering
Dataset with Fine-grained Evaluation, Wan et
al.,
- Recursive introspection: Teaching language model agents how to
self-improve, Qu et al.,
- Mind the Gap: Examining the Self-Improvement Capabilities of Large
Language Models, Song et al.,
- FRAME: Feedback-Refined Agent Methodology for Enhancing Medical
Research Insights, Yu et al.,
- Introspective Growth: Automatically Advancing LLM Expertise in
Technology Judgment, Wu et al.,
1.2 Table & Chart Scientific Comprehension
- How well do large language models understand tables in materials
science?, Circi et al.,
- ArxivDIGESTables: Synthesizing Scientific Literature into Tables using
Language Models, Newman et al.,
- Sciverse: Unveiling the knowledge comprehension and visual reasoning
of lmms on multi-modal scientific problems, Guo
et al.,
1.2.1 Table Understanding
- A survey on table-and-text hybridqa: Concepts, methods, challenges and
future directions, Wang et al.,
- Chain-of-Table: Evolving Tables in the Reasoning Chain for Table
Understanding, Wang et al.,
- Improving demonstration diversity by human-free fusing for
text-to-SQL, Wang et al.,
- Table Meets LLM: Can Large Language Models Understand Structured Table
Data? A Benchmark and Empirical Study, Sui et
al.,
- Multimodal Table Understanding, Zheng et al.,
- Tree-of-Table: Unleashing the Power of LLMs for Enhanced Large-Scale
Table Understanding, Ji et al.,
- Tablemaster: A recipe to advance table understanding with language
models, Cao et al.,
- A survey of table reasoning with large language models, Zhang
et al.,
- The Mighty ToRR: A Benchmark for Table Reasoning and
Robustness, Ashury-Tahan et al.,
- Tablebench: A comprehensive and complex benchmark for table question
answering, Wu et al.,
1.2.2 Chart Understanding
- SPIQA: A Dataset for Multimodal Question Answering on Scientific
Papers, Pramanick et al.,
- ChartInstruct: Instruction Tuning for Chart Comprehension and
Reasoning, Masry et al.,
- ChartAssistant: A Universal Chart Multimodal Language Model via
Chart-to-Table Pre-training and Multitask Instruction
Tuning, Meng et al.,
- SceMQA: A Scientific College Entrance Level Multimodal Question
Answering Benchmark, Liang et al.,
- Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of
Large Vision-Language Models, Li et al.,
- SynChart: Synthesizing Charts from Language Models, Liu et
al.,
- NovaChart: A Large-scale Dataset towards Chart Understanding and
Generation of Multimodal Large Language Models,
Hu et al.,
- ChartGemma: Visual Instruction-tuning for Chart Reasoning in the
Wild, Masry et al.,
- ChartSketcher: Reasoning with Multimodal Feedback and Reflection for
Chart Understanding, Huang et al.,
- Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports
From Scratch with Agentic Framework, Yang et
al.,
2. AI for Academic Survey
- Pre-writing: The stage of discovery in the writing process,
Rohman et al.,
2.1 Related Work Retrieval
- Paper recommender systems: a literature survey, Beel et al.,
- A Review on Personalized Academic Paper Recommendation., Li et
al.,
- Insights into relevant knowledge extraction techniques: a
comprehensive review, Shahid et al.,
- A survey on rag meeting llms: Towards retrieval-augmented large
language models, Fan et al.,
- Scientific paper recommendation: A survey, Bai et al.,
- SPLADE v2: Sparse lexical and expansion model for information
retrieval, Formal et al.,
- Scientific paper recommendation systems: a literature review of recent
publications, Kreutz et al.,
- Clinical Trial Retrieval via Multi-grained Similarity
Learning, Luo et al.,
- Related Work and Citation Text Generation: A Survey, Li et
al.,
- MIR: Methodology Inspiration Retrieval for Scientific Research
Problems, Garikaparthi et al.,
- From who you know to what you read: Augmenting scientific
recommendations with implicit social networks,
Kang et al.,
- Comlittee: Literature discovery with personal elected author
committees, Kang et al.,
- Citationsum: Citation-aware graph contrastive learning for scientific
paper summarization, Luo et al.,
- Explaining relationships among research papers, Li et al.,
- KGValidator: A Framework for Automatic Validation of Knowledge Graph
Construction, Boylan et al.,
- An academic recommender system on large citation data based on
clustering, graph modeling and deep learning,
Stergiopoulos et al.,
- ArZiGo: A recommendation system for scientific articles,
Pinedo et al.,
- Graphusion: a RAG framework for Knowledge Graph Construction with a
global perspective, Yang et al.,
- Taxonomy Tree Generation from Citation Graph, Hu et al.,
- Construction and Application of Materials Knowledge Graph in
Multidisciplinary Materials Science via Large Language
Model, Ye et al.,
- Docs2KG: A Human-LLM Collaborative Approach to Unified Knowledge Graph
Construction from Heterogeneous Documents, Sun
et al.,
- Paperweaver: Enriching topical paper alerts by contextualizing
recommended papers with user-collected papers,
Lee et al.,
- Dynamic Multi-Agent Orchestration and Retrieval for Multi-Source
Question-Answer Systems using Large Language
Models, Seabra et al.,
- Agentic Retrieval-Augmented Generation: A Survey on Agentic
RAG, Singh et al.,
- PaSa: An LLM Agent for Comprehensive Academic Paper Search, He
et al.,
- CuriousLLM: Elevating multi-document question answering with
llm-enhanced knowledge graph reasoning, Yang et
al.,
- Introducing Deep Research, OpenAI et al.,
- LitLLMs, LLMs for Literature Review: Are we there yet?,
Agarwal et al.,
- Select, Read, and Write: A Multi-Agent Framework of Full-Text-based
Related Work Generation, Liu et al.,
- GPT-4o Search Preview, OpenAI et al.,
- WebDancer: Towards Autonomous Information Seeking Agency, Wu
et al.,
- Iterative self-incentivization empowers large language models as
agentic searchers, Shi et al.,
- Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports
From Scratch with Agentic Framework, Yang et
al.,
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research
Agents, Du et al.,
- AcademicBrowse: Benchmarking Academic Browse Ability of LLMs,
Zhou et al.,
2.2 Overview Report Generation
- Towards automated related work summarization, Hoang et al.,
2.2.1 Research Roadmap Mapping
- Hierarchical catalogue generation for literature review: a
benchmark, Zhu et al.,
- Assisting in writing wikipedia-like articles from scratch with large
language models, Shao et al.,
- Chime: Llm-assisted hierarchical organization of scientific studies
for literature review support, Hsu et al.,
- Knowledge Navigator: LLM-guided Browsing Framework for Exploratory
Search in Scientific Literature, Katz et al.,
- Understanding Survey Paper Taxonomy about Large Language Models via
Graph Representation Learning, Zhuang et al.,
- Artificial intelligence for literature reviews: Opportunities and
challenges, Bolanos et al.,
- Taxonomy Tree Generation from Citation Graph, Hu et al.,
- LLMs for Literature Review: Are we there yet?, Agarwal et al.,
- Autosurvey: Large language models can automatically write
surveys, Wang et al.,
- SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and
Multi-dimensional Evaluation for Automated Survey
Writing, Yan et al.,
- Towards reasoning era: A survey of long chain-of-thought for reasoning
large language models, Chen et al.,
- Ai2 Scholar QA: Organized Literature Synthesis with
Attribution, Singh et al.,
2.2.2 Section-level Related Work Generation
- Towards automated related work summarization, Hoang et al.,
- Capturing relations between scientific papers: An abstractive model
for related work section generation, Chen et
al.,
- Target-aware abstractive related work generation with contrastive
learning, Chen et al.,
- The use of a large language model to create plain language summaries
of evidence reviews in healthcare: A feasibility
study, Ovelman et al.,
- Related Work and Citation Text Generation: A Survey, Li et
al.,
- 376 Using a large language model to create lay summaries of clinical
study descriptions, Kaiser et al.,
- Select, Read, and Write: A Multi-Agent Framework of Full-Text-based
Related Work Generation, Liu et al.,
- Automatic generation of related work sections in scientific papers: an
optimization approach, Hu et al.,
- Neural related work summarization with a joint context-driven
attention mechanism, Wang et al.,
- Automatic generation of related work through summarizing
citations, Chen et al.,
- Toc-rwg: Explore the combination of topic model and citation
information for automatic related work
generation, Wang et al.,
- Automatic Related Work Section Generation by Sentence Extraction and
Reordering., Deng et al.,
- Automated lay language summarization of biomedical scientific
reviews, Guo et al.,
- BACO: A background knowledge-and content-based framework for citing
sentence generation, Ge et al.,
- Multi-document scientific summarization from a knowledge graph-centric
view, Wang et al.,
- Controllable citation sentence generation with language
models, Gu et al.,
- Causal intervention for abstractive related work generation,
Liu et al.,
- Cited text spans for citation text generation, Li et al.,
- Towards a unified framework for reference retrieval and related work
generation, Shi et al.,
- Explaining relationships among research papers, Li et al.,
- Shallow synthesis of knowledge in gpt-generated texts: A case study in
automatic related work composition, Martin-Boyle
et al.,
- RST-LoRA: A Discourse-Aware Low-Rank Adaptation for Long Document
Abstractive Summarization, Pu et al.,
- Reinforced Subject-Aware Graph Neural Network for Related Work
Generation, Yu et al.,
- Disentangling Instructive Information from Ranked Multiple Candidates
for Multi-Document Scientific Summarization,
Wang et al.,
- Toward Related Work Generation with Structure and Novelty
Statement, Nishimura et al.,
- Estimating Optimal Context Length for Hybrid Retrieval-augmented
Multi-document Summarization, Pratapa et al.,
- Ask, Retrieve, Summarize: A Modular Pipeline for Scientific Literature
Summarization, Achkar et al.,
2.2.3 Document-level Survey Generation
- Analyzing the past to prepare for the future: Writing a literature
review, Webster et al.,
- Hierarchical catalogue generation for literature review: a
benchmark, Zhu et al.,
- Bio-sieve: exploring instruction tuning large language models for
systematic review automation, Robinson et al.,
- Litllm: A toolkit for scientific literature review, Agarwal et
al.,
- Assisting in writing wikipedia-like articles from scratch with large
language models, Shao et al.,
- Artificial intelligence for literature reviews: Opportunities and
challenges, Bolanos et al.,
- Language agents achieve superhuman synthesis of scientific
knowledge, Skarlinski et al.,
- Instruct Large Language Models to Generate Scientific Literature
Survey Step by Step, Lai et al.,
- Openscholar: Synthesizing scientific literature with
retrieval-augmented lms, Asai et al.,
- Intelligent summaries: Will Artificial Intelligence mark the finale
for biomedical literature reviews?, Galli et
al.,
- Autosurvey: Large language models can automatically write
surveys, Wang et al.,
- LAG: LLM agents for Leaderboard Auto Generation on Demanding,
Wu et al.,
- SurveyX: Academic Survey Automation via Large Language Models,
Liang et al.,
- Automating research synthesis with domain-specific large language
model fine-tuning, Susnjak et al.,
- SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and
Multi-dimensional Evaluation for Automated Survey
Writing, Yan et al.,
3. AI for Scientific Discovery
- Scientific discovery in the age of artificial intelligence,
Wang et al.,
- Beyond Benchmarking: Automated Capability Discovery via Model
Self-Exploration, Lu et al.,
- AIRUS: a simple workflow for AI-assisted exploration of scientific
data, Harris et al.,
- On the Rise of New Mathematical Spaces and Towards AI-Driven
Scientific Discovery, Raeini et al.,
- From Reasoning to Learning: A Survey on Hypothesis Discovery and Rule
Learning with Large Language Models, He et al.,
- AI-Driven Discovery: The Transformative Impact of Machine Learning on
Research and Development, Roy et al.,
3.1 Idea Mining
- Can Large Language Models Unlock Novel Scientific Research
Ideas?, Kumar et al.,
- Can llms generate novel research ideas? a large-scale human study with
100+ nlp researchers, Si et al.,
- LLMs can realize combinatorial creativity: generating creative ideas
via LLMs for scientific research, Gu et al.,
- Large language models for causal hypothesis generation in
science, Cohrs et al.,
- Futuregen: Llm-rag approach to generate the future work of scientific
article, Azher et al.,
- ResearchBench: Benchmarking LLMs in Scientific Discovery via
Inspiration-Based Task Decomposition, Liu et
al.,
- Sparks of science: Hypothesis generation using structured paper
data, O'Neill et al.,
- Spark: A System for Scientifically Creative Idea Generation,
Sanyal et al.,
- CHIMERA: A Knowledge Base of Idea Recombination in Scientific
Literature, Sternlicht et al.,
- Cognitio Emergens: Agency, Dimensions, and Dynamics in Human-AI
Knowledge Co-Creation, Lin et al.,
3.1.1 Idea Mining from Internal Knowledge
- Ideas are dimes a dozen: Large language models for idea generation in
innovation, Girotra et al.,
- Prompting Diverse Ideas: Increasing AI Idea Variance, Meincke
et al.,
- Using Large Language Models for Idea Generation in Innovation,
Meincke et al.,
- Can llms generate novel research ideas? a large-scale human study with
100+ nlp researchers, Si et al.,
- Can Large Language Models Unlock Novel Scientific Research
Ideas?, Kumar et al.,
- ECM: A Unified Electronic Circuit Model for Explaining the Emergence
of In-Context Learning and Chain-of-Thought in Large
Language Model, Chen et al.,
- Structuring Scientific Innovation: A Framework for Modeling and
Discovering Impactful Knowledge Combinations,
Chen et al.,
- Improving Research Idea Generation Through Data: An Empirical
Investigation in Social Science, Liu et al.,
- Enhance Innovation by Boosting Idea Generation with Large Language
Models, Haarmann et al.,
3.1.2 Idea Mining from External Signal
Idea Mining from External Knowledge
- Literature based discovery: models, methods, and trends, Henry
et al.,
- Predicting the Future of AI with AI: High-quality link prediction in
an exponentially growing knowledge network,
Krenn et al.,
- A survey of large language models, Zhao et al.,
- Large language models meet nlp: A survey, Qin et al.,
- Position: data-driven discovery with large generative models,
Majumder et al.,
- Generation and human-expert evaluation of interesting research ideas
using knowledge graphs and large language
models, Gu et al.,
- Interesting scientific idea generation using knowledge graphs and
llms: Evaluations with 100 research group
leaders, Gu et al.,
- Scimon: Scientific inspiration machines optimized for novelty,
Wang et al.,
- Accelerating scientific discovery with generative knowledge
extraction, graph-based representation, and multimodal
intelligent graph reasoning, Buehler et al.,
- Literature meets data: A synergistic approach to hypothesis
generation, Liu et al.,
- Chain of ideas: Revolutionizing research via novel idea development
with llm agents, Li et al.,
- SciPIP: An LLM-based Scientific Paper Idea Proposer, Wang et
al.,
- LLMs can realize combinatorial creativity: generating creative ideas
via LLMs for scientific research, Gu et al.,
- Learning to Generate Research Idea with Dynamic Control, Li et
al.,
- Graph of AI Ideas: Leveraging Knowledge Graphs and LLMs for AI
Research Idea Generation, Gao et al.,
- Sparks of science: Hypothesis generation using structured paper
data, O'Neill et al.,
- gpt-researcher, Assafelovic et al.,
- Mlagentbench: Evaluating language agents on machine learning
experimentation, Huang et al.,
- Researchagent: Iterative research idea generation over scientific
literature with large language models, Baek et
al.,
- Augmenting large language models with chemistry tools, M. Bran
et al.,
- MatPilot: an LLM-enabled AI Materials Scientist under the Framework of
Human-Machine Collaboration, Ni et al.,
- The virtual lab: AI agents design new SARS-CoV-2 nanobodies with
experimental validation, Swanson et al.,
- Agent laboratory: Using llm agents as research assistants,
Schmidgall et al.,
- LUMI-lab: a Foundation Model-Driven Autonomous Platform Enabling
Discovery of New Ionizable Lipid Designs for mRNA
Delivery, Cui et al.,
- Towards an AI co-scientist, Gottweis et al.,
- Zochi Technical Report, AI et al.,
- AgentRxiv: Towards Collaborative Autonomous Research,
Schmidgall et al.,
- Carl Technical Report, Institute et al.,
- Ideasynth: Iterative research idea development through evolving and
composing idea facets with literature-grounded
feedback, Pu et al.,
3.1.3 Idea Mining from Team discussion
AI-AI Collaboration
- Large language models for automated open-domain scientific hypotheses
discovery, Yang et al.,
- Exploring collaboration mechanisms for llm agents: A social psychology
view, Zhang et al.,
- Acceleron: A tool to accelerate research ideation, Nigam et
al.,
- Hypothesis generation with large language models, Zhou et al.,
- Researchagent: Iterative research idea generation over scientific
literature with large language models, Baek et
al.,
- Llm and simulation as bilevel optimizers: A new paradigm to advance
physical scientific discovery, Ma et al.,
- The ai scientist: Towards fully automated open-ended scientific
discovery, Lu et al.,
- Sciagents: Automating scientific discovery through multi-agent
intelligent graph reasoning, Ghafarollahi et
al.,
- Two heads are better than one: A multi-agent system has the potential
to improve scientific idea generation, Su et
al.,
- Chain of ideas: Revolutionizing research via novel idea development
with llm agents, Li et al.,
- Nova: An iterative planning and search approach to enhance novelty and
diversity of llm generated ideas, Hu et al.,
- The virtual lab: AI agents design new SARS-CoV-2 nanobodies with
experimental validation, Swanson et al.,
- AIGS: Generating Science from AI-Powered Automated
Falsification, Liu et al.,
- Large Language Models for Rediscovering Unseen Chemistry Scientific
Hypotheses, Yang et al.,
- Dolphin: Closed-loop Open-ended Auto-research through Thinking,
Practice, and Feedback, Yuan et al.,
- Multi-Novelty: Improve the Diversity and Novelty of Contents Generated
by Large Language Models via inference-time Multi-Views
Brainstorming, Lagzian et al.,
- Can Language Models Falsify? Evaluating Algorithmic Reasoning with
Counterexample Creation, Sinha et al.,
- PiFlow: Principle-aware Scientific Discovery with Multi-Agent
Collaboration, Pu et al.,
- An Interactive Co-Pilot for Accelerated Research Ideation,
Nigam et al.,
- Scideator: Human-LLM Scientific Idea Generation Grounded in
Research-Paper Facet Recombination, Radensky et
al.,
- MatPilot: an LLM-enabled AI Materials Scientist under the Framework of
Human-Machine Collaboration, Ni et al.,
- IRIS: Interactive Research Ideation System for Accelerating Scientific
Discovery, Garikaparthi et al.,
- Human creativity in the age of llms: Randomized experiments on
divergent and convergent thinking, Kumar et al.,
3.2 Novelty & Significance Assessment
- Does writing with language models reduce content diversity?,
Padmakumar et al.,
- Greater variability in judgements of the value of novel ideas,
Johnson et al.,
- How AI ideas affect the creativity, diversity, and evolution of human
ideas: evidence from a large, dynamic
experiment, Ashkinaze et al.,
- A content-based novelty measure for scholarly publications: A proof of
concept, Wang et al.,
- Art or artifice? large language models and the false promise of
creativity, Chakrabarty et al.,
- How ai processing delays foster creativity: Exploring research
question co-creation with an llm-based agent,
Liu et al.,
- Homogenization effects of large language models on human creative
ideation, Anderson et al.,
- Shared imagination: Llms hallucinate alike, Zhou et al.,
- Can llms generate novel research ideas? a large-scale human study with
100+ nlp researchers, Si et al.,
- Supporting Assessment of Novelty of Design Problems Using Concept of
Problem SAPPhIRE, Singh et al.,
- Semi-Supervised Classification With Novelty Detection Using Support
Vector Machines and Linear Discriminant
Analysis, Dove et al.,
- Can AI Examine Novelty of Patents?: Novelty Evaluation Based on the
Correspondence between Patent Claim and Prior
Art, Ikoma et al.,
- How do Humans and Language Models Reason About Creativity? A
Comparative Analysis, Laverghetta Jr et al.,
- Grapheval: A lightweight graph-based llm framework for idea
evaluation, Feng et al.,
- SCI-IDEA: Context-Aware Scientific Ideation Using Token and Sentence
Embeddings, Keya et al.,
- Enabling ai scientists to recognize innovation: A domain-agnostic
algorithm for assessing novelty, Wang et al.,
- SC4ANM: Identifying optimal section combinations for automated novelty
prediction in academic papers, Wu et al.,
3.3 Theory Analysis
3.3.1 Scientific Claim Formalization
- LF: a foundational higher-order logic, Goodsell et al.,
- Natural Language Hypotheses in Scientific Papers and How to Tame Them:
Suggested Steps for Formalizing Complex Scientific
Claims, Heger et al.,
- Position: Multimodal Large Language Models Can Significantly Advance
Scientific Reasoning, Yan et al.,
- Sciclaimhunt: A large dataset for evidence-based scientific claim
verification, Kumar et al.,
- Towards Effective Extraction and Evaluation of Factual Claims,
Metropolitansky et al.,
- NSF-SciFy: Mining the NSF Awards Database for Scientific
Claims, Rao et al.,
- Grammars of Formal Uncertainty: When to Trust LLMs in Automated
Reasoning Tasks, Ganguly et al.,
- Valsci: an open-source, self-hostable literature review utility for
automated large-batch scientific claim verification
using large language models, Edelman et al.,
3.3.2 Scientific Evidence Collection
- MultiVerS: Improving scientific claim verification with weak
supervision and full-document context, Wadden et
al.,
- Missing counter-evidence renders NLP fact-checking unrealistic for
misinformation, Glockner et al.,
- Investigating zero-and few-shot generalization in fact
verification, Pan et al.,
- Comparing knowledge sources for open-domain scientific claim
verification, Vladika et al.,
- Understanding Fine-grained Distortions in Reports of Scientific
Findings, Wührl et al.,
- Improving health question answering with reliable and time-aware
evidence retrieval, Vladika et al.,
- Zero-shot scientific claim verification using LLMs and citation
text, Alvarez et al.,
- Grounding fallacies misrepresenting scientific publications in
evidence, Glockner et al.,
- Can foundation models actively gather information in interactive
environments to test hypotheses?, Ke et al.,
- LLM-based Corroborating and Refuting Evidence Retrieval for Scientific
Claim Verification, Wang et al.,
- SciClaims: An End-to-End Generative System for Biomedical Claim
Analysis, Ortega et al.,
3.3.3 Scientific Verification Analysis
- Proofver: Natural logic theorem proving for fact verification,
Krishna et al.,
- The state of human-centered NLP technology for fact-checking,
Das et al.,
- aedFaCT: Scientific Fact-Checking Made Easier via Semi-Automatic
Discovery of Relevant Expert Opinions, Altuncu
et al.,
- FactKG: Fact verification via reasoning on knowledge graphs,
Kim et al.,
- Fact-checking complex claims with program-guided reasoning,
Pan et al.,
- Prompt to be consistent is better than self-consistent? few-shot and
zero-shot fact verification with pre-trained language
models, Zeng et al.,
- Unsupervised Pretraining for Fact Verification by Language Model
Distillation, Bazaga et al.,
- Towards llm-based fact verification on news claims with a hierarchical
step-by-step prompting method, Zhang et al.,
- Characterizing and Verifying Scientific Claims: Qualitative Causal
Structure is All You Need, Wu et al.,
- Can Large Language Models Detect Misinformation in Scientific News
Reporting?, Cao et al.,
- What makes medical claims (un)verifiable? analyzing entity and
relation properties for fact verification,
W{\"u}hrl et al.,
- ClaimVer: Explainable claim-level verification and evidence
attribution of text through knowledge graphs,
Dammu et al.,
- Generating fact checking explanations, Atanasova et al.,
- MAGIC: Multi-Argument Generation with Self-Refinement for Domain
Generalization in Automatic Fact-Checking, Kao
et al.,
- Robust Claim Verification Through Fact Detection, Jafari et
al.,
- Automated justification production for claim veracity in fact
checking: A survey on architectures and
approaches, Eldifrawi et al.,
- Enhancing natural language inference performance with knowledge graph
for COVID-19 automated fact-checking in Indonesian
language, Muharram et al.,
- Augmenting the Veracity and Explanations of Complex Fact Checking via
Iterative Self-Revision with LLMs, Zhang et al.,
- DEFAME: Dynamic Evidence-based FAct-checking with Multimodal
Experts, Braun et al.,
- TheoremExplainAgent: Towards Video-based Multimodal Explanations for
LLM Theorem Understanding, Ku et al.,
- Explainable Biomedical Claim Verification with Large Language
Models, Liang et al.,
- Language Agents Mirror Human Causal Reasoning Biases. How Can We Help
Them Think Like Scientists?, GX-Chen et al.,
3.3.4 Theorem Proving
- Generative language modeling for automated theorem proving,
Polu et al.,
- Draft, sketch, and prove: Guiding formal theorem provers with informal
proofs, Jiang et al.,
- Hypertree proof search for neural theorem proving, Lample et
al.,
- Thor: Wielding hammers to integrate language models and automated
theorem provers, Jiang et al.,
- Decomposing the enigma: Subgoal-based demonstration learning for
formal theorem proving, Zhao et al.,
- Dt-solver: Automated theorem proving with dynamic-tree sampling guided
by proof-level value function, Wang et al.,
- Lego-prover: Neural theorem proving with growing libraries,
Wang et al.,
- Baldur: Whole-proof generation and repair with large language
models, First et al.,
- Mustard: Mastering uniform synthesis of theorem and proof
data, Huang et al.,
- A survey on deep learning for theorem proving, Li et al.,
- Towards large language models as copilots for theorem proving in
lean, Song et al.,
- Proving theorems recursively, Wang et al.,
- Deepseek-prover: Advancing theorem proving in llms through large-scale
synthetic data, Xin et al.,
- Lean-star: Learning to interleave thinking and proving, Lin et
al.,
- Data for mathematical copilots: Better ways of presenting proofs for
machine learning, Frieder et al.,
- Deep Active Learning based Experimental Design to Uncover Synergistic
Genetic Interactions for Host Targeted
Therapeutics, Zhu et al.,
- Discovering Symbolic Differential Equations with Symmetry
Invariants, Yang et al.,
3.4 Scientific Experiment Conduction
- Toward machine learning optimization of experimental design,
Baydin et al.,
- AI-assisted design of experiments at the frontiers of computation:
methods and new perspectives, Vischia et al.,
- AI-Driven Automation Can Become the Foundation of Next-Era Science of
Science Research, Chen et al.,
- EXP-Bench: Can AI Conduct AI Research Experiments?, Kon et
al.,
- AI Scientists Fail Without Strong Implementation Capability,
Zhu et al.,
3.4.1 Experiment Design
- Augmenting large language models with chemistry tools, M. Bran
et al.,
- Sciagents: Automating scientific discovery through multi-agent
intelligent graph reasoning, Ghafarollahi et
al.,
- MatPilot: an LLM-enabled AI Materials Scientist under the Framework of
Human-Machine Collaboration, Ni et al.,
- AI-assisted design of experiments at the frontiers of computation:
methods and new perspectives, Vischia et al.,
- LUMI-lab: a Foundation Model-Driven Autonomous Platform Enabling
Discovery of New Ionizable Lipid Designs for mRNA
Delivery, Cui et al.,
- Towards an AI co-scientist, Gottweis et al.,
- AI-assisted inverse design of sequence-ordered high intrinsic thermal
conductivity polymers, Huang et al.,
- Meta-Designing Quantum Experiments with Language Models, Arlt
et al.,
- The application of artificial intelligence-assisted technology in
cultural and creative product design, Liang et
al.,
- A Human-LLM Note-Taking System with Case-Based Reasoning as Framework
for Scientific Discovery, Craig et al.,
- Researchagent: Iterative research idea generation over scientific
literature with large language models, Baek et
al.,
- Biodiscoveryagent: An ai agent for designing genetic perturbation
experiments, Roohani et al.,
- The ai scientist: Towards fully automated open-ended scientific
discovery, Lu et al.,
- The virtual lab: AI agents design new SARS-CoV-2 nanobodies with
experimental validation, Swanson et al.,
- Large Language Model Assisted Experiment Design with Generative
Human-Behavior Agents, Liu et al.,
- Agent laboratory: Using llm agents as research assistants,
Schmidgall et al.,
- Carl Technical Report, Institute et al.,
- Zochi Technical Report, AI et al.,
- AgentRxiv: Towards Collaborative Autonomous Research,
Schmidgall et al.,
- Robin: A multi-agent system for automating scientific
discovery, Ghareeb et al.,
3.4.2 Pre-Experiment Estimation
Evaluative Prediction
- DeepCRE: Transforming Drug R&D via AI-Driven Cross-drug Response
Evaluation, Wu et al.,
- Physical formula enhanced multi-task learning for pharmacokinetics
prediction, Li et al.,
- MASSW: A new dataset and benchmark tasks for ai-assisted scientific
workflows, Zhang et al.,
- Unimatch: Universal matching from atom to task for few-shot drug
discovery, Li et al.,
- LUMI-lab: a Foundation Model-Driven Autonomous Platform Enabling
Discovery of New Ionizable Lipid Designs for mRNA
Delivery, Cui et al.,
- Predicting Empirical AI Research Outcomes with Language
Models, Wen et al.,
- Large language models surpass human experts in predicting neuroscience
results, Luo et al.,
- Automatic chemical design using a data-driven continuous
representation of molecules,
Gómez-Bombarelli et al.,
- MolGAN: An implicit generative model for small molecular
graphs, De Cao et al.,
- Google DeepMind's AI Dreamed Up 380,000 New Materials. The Next
Challenge Is Making Them, Barber et al.,
- Augmenting large language models with chemistry tools, M. Bran
et al.,
- The virtual lab: AI agents design new SARS-CoV-2 nanobodies with
experimental validation, Swanson et al.,
- Towards an AI co-scientist, Gottweis et al.,
- FlavorDiffusion: Modeling Food-Chemical Interactions with
Diffusion, Seo et al.,
- MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated
Experimental Feedback, Liu et al.,
3.4.3 Experiment Management
- Transforming science labs into automated factories of
discovery, Angelopoulos et al.,
- Development of an Automated Workflow for Screening the Assembly and
Host-Guest Behavior of Metal-Organic Cages Towards
Accelerated Discovery, Basford et al.,
- AI Driven Experiment Calibration and Control, Britton et al.,
- Agents for self-driving laboratories applied to quantum
computing, Cao et al.,
- Intelligent experiments through real-time AI: Fast Data Processing and
Autonomous Detector Control for sPHENIX and future EIC
detectors, Kvapil et al.,
- Artificial intelligence meets laboratory automation in discovery and
synthesis of metal-organic frameworks: A
review, Zhao et al.,
- Agents for Change: Artificial Intelligent Workflows for Quantitative
Clinical Pharmacology and Translational
Sciences, Shahin et al.,
- Science acceleration and accessibility with self-driving labs,
Canty et al.,
- Accelerating drug discovery with Artificial: a whole-lab orchestration
and scheduling system for self-driving labs,
Fehlis et al.,
- Uncovering Bottlenecks and Optimizing Scientific Lab Workflows with
Cycle Time Reduction Agents, Fehlis et al.,
- Perspective on Utilizing Foundation Models for Laboratory Automation
in Materials Research, Hatakeyama-Sato et al.,
- The future of self-driving laboratories: from human in the loop
interactive AI to gamification, Hysmith et al.,
- Self-driving labs are the new AI asset, Axios et al.,
- DeepMind and BioNTech build AI lab assistants for scientific
research, Times et al.,
- Autonomous platform for solution processing of electronic
polymers, Wang et al.,
- Machine learning-led semi-automated medium optimization reveals salt
as key for flaviolin production in Pseudomonas
putida, Zournas et al.,
- Functional genomic hypothesis generation and experimentation by a
robot scientist, King et al.,
- Self-driving laboratory for accelerated discovery of thin-film
materials, MacLeod et al.,
- Self-driving laboratories for chemistry and materials science,
Tom et al.,
- Self-driving laboratory platform for many-objective self-optimisation
of polymer nanoparticle synthesis with cloud-integrated
machine learning and orthogonal online
analytics, Knox et al.,
3.4.4 Experimental Conduction
Automated Machine Learning Experiment Conduction
- AIDE: Human-Level Performance on Data Science Competitions,
Dominik et al.,
- Automl-gpt: Automatic machine learning with gpt, Zhang et al.,
- Automl in the age of large language models: Current challenges, future
opportunities and risks, Tornede et al.,
- Opendevin: An open platform for ai software developers as generalist
agents, Wang et al.,
- Mlr-copilot: Autonomous machine learning research based on large
language models agents, Li et al.,
- Autokaggle: A multi-agent framework for autonomous data science
competitions, Li et al.,
- Large language models orchestrating structured reasoning achieve
kaggle grandmaster level, Grosnit et al.,
- MLRC-Bench: Can Language Agents Solve Machine Learning Research
Challenges?, Zhang et al.,
- AutoReproduce: Automatic AI Experiment Reproduction with Paper
Lineage, Zhao et al.,
- Variable Extraction for Model Recovery in Scientific
Literature, Liu et al.,
- AlphaEvolve: A coding agent for scientific and algorithmic
discovery, Novikov et al.,
- Large language models can self-improve, Huang et al.,
- Mlcopilot: Unleashing the power of large language models in solving
machine learning tasks, Zhang et al.,
- Training socially aligned language models in simulated human
society, Liu et al.,
- Toolllm: Facilitating large language models to master 16000+
real-world apis, Qin et al.,
- An autonomous laboratory for the accelerated synthesis of novel
materials, Szymanski et al.,
- Autonomous chemical research with large language models, Boiko
et al.,
- Reflexion: Language agents with verbal reinforcement learning,
Shinn et al.,
- Toolkengpt: Augmenting frozen language models with massive tools via
tool embeddings, Hao et al.,
- Toolformer: Language models can teach themselves to use tools,
Schick et al.,
- scGPT: toward building a foundation model for single-cell multi-omics
using generative AI, Cui et al.,
- Large language model agent for hyper-parameter optimization,
Liu et al.,
- MechAgents: Large language model multi-agent collaborations can solve
mechanics problems, generate new data, and integrate
knowledge, Ni et al.,
- Researchagent: Iterative research idea generation over scientific
literature with large language models, Baek et
al.,
- Automated social science: Language models as scientist and
subjects, Manning et al.,
- Crispr-gpt: An llm agent for automated design of gene-editing
experiments, Huang et al.,
- Position: LLMs can’t plan, but can help planning in LLM-modulo
frameworks, Kambhampati et al.,
- Augmenting large language models with chemistry tools, M. Bran
et al.,
- Mlr-copilot: Autonomous machine learning research based on large
language models agents, Li et al.,
- The ai scientist: Towards fully automated open-ended scientific
discovery, Lu et al.,
- Sciagents: Automating scientific discovery through multi-agent
intelligent graph reasoning, Ghafarollahi et
al.,
- Wrong-of-thought: An integrated reasoning framework with
multi-perspective verification and wrong
information, Zhang et al.,
- Simulating Tabular Datasets through LLMs to Rapidly Explore Hypotheses
about Real-World Entities, Zabaleta et al.,
- An automatic end-to-end chemical synthesis development platform
powered by large language models, Ruan et al.,
- MatPilot: an LLM-enabled AI Materials Scientist under the Framework of
Human-Machine Collaboration, Ni et al.,
- Towards LLM-Driven Multi-Agent Pipeline for Drug Discovery:
Neurodegenerative Diseases Case Study, Solovev
et al.,
- From Individual to Society: A Survey on Social Simulation Driven by
Large Language Model-based Agents, Mou et al.,
- On Evaluating LLMs' Capabilities as Functional Approximators: A
Bayesian Evaluation Framework, Siddiqui et al.,
- PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of
Psychiatric Assessment Conversational Agents,
Lee et al.,
- Dolphin: Closed-loop Open-ended Auto-research through Thinking,
Practice, and Feedback, Yuan et al.,
- DLPO: Towards a Robust, Efficient, and Generalizable Prompt
Optimization Framework from a Deep-Learning
Perspective, Peng et al.,
- Simulating cooperative prosocial behavior with multi-agent LLMs:
Evidence and mechanisms for AI agents to inform policy
decisions, Sreedhar et al.,
- Reinforcing clinical decision support through multi-agent systems and
ethical ai governance, Chen et al.,
- OpenFOAMGPT 2.0: end-to-end, trustworthy automation for computational
fluid dynamics, Feng et al.,
- Researchcodeagent: An llm multi-agent system for automated
codification of research methodologies, Gandhi
et al.,
- The ai scientist-v2: Workshop-level automated scientific discovery via
agentic tree search, Yamada et al.,
- MooseAgent: A LLM Based Multi-agent Framework for Automating Moose
Simulation, Zhang et al.,
- Owl: Optimized workforce learning for general multi-agent assistance
in real-world task automation, Hu et al.,
3.4.5 Experimental Analysis
Automated Evaluation Metrics
- Eight years of AutoML: categorisation, review and trends,
Barbudo et al.,
- Efficient bayesian learning curve extrapolation using prior-data
fitted networks, Adriaensen et al.,
- Automated machine learning: past, present and future, Baratchi
et al.,
- Variable Extraction for Model Recovery in Scientific
Literature, Liu et al.,
- AutoReproduce: Automatic AI Experiment Reproduction with Paper
Lineage, Zhao et al.,
- HeLM: Highlighted Evidence augmented Language Model for Enhanced
Table-to-Text Generation, Bian et al.,
- Table meets llm: Can large language models understand structured table
data? a benchmark and empirical study, Sui et
al.,
- Table-LLM-Specialist: Language Model Specialists for Tables using
Iterative Generator-Validator Fine-tuning, Xing
et al.,
- LLM Based Exploratory Data Analysis Using BigQuery Data
Canvas, Chaudhuri et al.,
3.5 Full-Automatic Discovery
- The ai scientist: Towards fully automated open-ended scientific
discovery, Lu et al.,
- Aviary: training language agents on challenging scientific
tasks, Narayanan et al.,
- Dolphin: Closed-loop Open-ended Auto-research through Thinking,
Practice, and Feedback, Yuan et al.,
- Autonomous Microscopy Experiments through Large Language Model
Agents, Mandal et al.,
- Agent laboratory: Using llm agents as research assistants,
Schmidgall et al.,
- Curie: Toward rigorous and automated scientific experimentation with
ai agents, Kon et al.,
- DORA AI Scientist: Multi-agent Virtual Research Team for Scientific
Exploration Discovery and Automated Report
Generation, Naumov et al.,
- Carl Technical Report, Institute et al.,
- AgentRxiv: Towards Collaborative Autonomous Research,
Schmidgall et al.,
- Zochi Technical Report, AI et al.,
- NovelSeek: When Agent Becomes the Scientist–Building Closed-Loop
System from Hypothesis to Verification, Team et
al.,
- AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open
Co-Scientists, Li et al.,
- VISION: A modular AI assistant for natural human-instrument
interaction at scientific user facilities,
Mathur et al.,
4. AI for Academic Writing
- Using artificial intelligence in academic writing and research: An
essential productivity tool, Khalifa et al.,
- Human-LLM Coevolution: Evidence from Academic Writing, Geng et
al.,
- Large language models penetration in scholarly writing and peer
review, Zhou et al.,
- And Plato met ChatGPT: an ethical reflection on the use of chatbots in
scientific research writing, with a particular focus on
the social sciences, Calderon et al.,
4.1 Semi-Automatic Academic Writing
4.1.1 Assistance During Manuscript Preparation
Title Formulation and Optimization
- Personalized Graph-Based Retrieval for Large Language Models,
Au et al.,
- Generating Accurate and Engaging Research Paper Titles Using NLP
Techniques, Bikku et al.,
- MoDeST: A dataset for Multi Domain Scientific Title
Generation, Bölücü et al.,
- Can pre-trained language models generate titles for research
papers?, Rehman et al.,
- LalaEval: A Holistic Human Evaluation Framework for Domain-Specific
Large Language Models, Sun et al.,
- LLM-Rubric: A Multidimensional, Calibrated Approach to Automated
Evaluation of Natural Language Texts, Hashemi et
al.,
4.1.2 Assistance During Manuscript Writing
- Enhancing academic writing skills and motivation: assessing the
efficacy of ChatGPT in AI-assisted language learning for
EFL students, Song et al.,
- Human-AI collaboration patterns in AI-assisted academic
writing, Nguyen et al.,
- Patterns and Purposes: A Cross-Journal Analysis of AI Tool Usage in
Academic Writing, Xu et al.,
- Divergent llm adoption and heterogeneous convergence paths in research
writing, Lin et al.,
- Artificial intelligence-assisted academic writing: recommendations for
ethical use, Cheng et al.,
- Text2chart: A multi-staged chart generator from natural language
text, Rashid et al.,
- ChartReader: A unified framework for chart derendering and
comprehension without heuristic rules, Cheng et
al.,
- Figgen: Text to scientific figure generation, Rodriguez et
al.,
- Automatikz: Text-guided synthesis of scientific vector graphics with
tikz, Belouadi et al.,
- Scicapenter: Supporting caption composition for scientific figures
with machine-generated captions and ratings, Hsu
et al.,
- ChartFormer: A large vision language model for converting chart images
into tactile accessible SVGs, Moured et al.,
- Figuring out Figures: Using Textual References to Caption Scientific
Figures, Cao et al.,
- The ai scientist: Towards fully automated open-ended scientific
discovery, Lu et al.,
- AiSciVision: A Framework for Specializing Large Multimodal Models in
Scientific Image Classification, Hogan et al.,
- ScImage: How Good Are Multimodal Large Language Models at Scientific
Text-to-Image Generation?, Zhang et al.,
- Chartcoder: Advancing multimodal large language model for
chart-to-code generation, Zhao et al.,
- Understanding How Paper Writers Use AI-Generated Captions in Figure
Caption Writing, Yin et al.,
- Multi-LLM Collaborative Caption Generation in Scientific
Documents, Kim et al.,
- TikZero: Zero-Shot Text-Guided Graphics Program Synthesis,
Belouadi et al.,
- Enhancing Chart-to-Code Generation in Multimodal Large Language Models
via Iterative Dual Preference Learning, Zhang et
al.,
- StarVector: Generating scalable vector graphics code from images and
text, Rodriguez et al.,
- The ai scientist-v2: Workshop-level automated scientific discovery via
agentic tree search, Yamada et al.,
- How to Create Accurate Scientific Illustrations with AI in
2025, Team et al.,
- Towards Semantic Markup of Mathematical Documents via User
Interaction, Vrečar et al.,
- Automated LaTeX Code Generation from Handwritten Math Expressions
Using Vision Transformer, Sundararaj et al.,
- LATTE: Improving Latex Recognition for Tables and Formulae with
Iterative Refinement, Jiang et al.,
- Chronological citation recommendation with time preference, Ma
et al.,
- When large language models meet citation: A survey, Zhang et
al.,
- Directed Criteria Citation Recommendation and Ranking Through Link
Prediction, Watson et al.,
- ILCiteR: Evidence-grounded Interpretable Local Citation
Recommendation, Roy et al.,
- CiteBART: Learning to Generate Citations for Local Citation
Recommendation, Çelik et al.,
- Benchmark for Evaluation and Analysis of Citation Recommendation
Models, Maharjan et al.,
- PaSa: An LLM Agent for Comprehensive Academic Paper Search, He
et al.,
- ScholarCopilot: Training Large Language Models for Academic Writing
with Accurate Citations, Wang et al.,
- How deep do large language models internalize scientific literature
and citation practices?, Algaba et al.,
- SCIRGC: Multi-Granularity Citation Recommendation and Citation
Sentence Preference Alignment, Li et al.,
- Towards AI-assisted Academic Writing, Liebling et al.,
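To ground the citation-recommendation entries above, the following is a deliberately minimal retrieval sketch: rank candidate papers against a citation context by TF-IDF cosine similarity. The candidate texts and context are invented, and the approach is far simpler than the learned retrievers, link-prediction models, and generative recommenders cited in this list.

```python
# Toy local citation recommendation via TF-IDF similarity (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

candidates = {  # hypothetical candidate papers and their abstracts
    "paperA": "physics informed neural networks solve partial differential equations",
    "paperB": "large language models generate scientific figure captions",
    "paperC": "retrieval augmented generation for scientific question answering",
}
context = "we follow prior work on retrieval augmented generation for science QA [CITATION]"

vec = TfidfVectorizer().fit(list(candidates.values()) + [context])
sims = cosine_similarity(vec.transform([context]),
                         vec.transform(list(candidates.values())))[0]
ranking = sorted(zip(candidates, sims), key=lambda kv: kv[1], reverse=True)
print(ranking)  # the highest-similarity candidate is the suggested citation
```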
4.1.3 Assistance After Manuscript Completion
Grammar Correction
- Csed: A Chinese semantic error diagnosis corpus, Sun et al.,
- Neural Automated Writing Evaluation with Corrective Feedback,
Wang et al.,
- LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical
Error Correction, Wang et al.,
- Improving Grammatical Error Correction via Contextual Data
Augmentation, Wang et al.,
- How Paperpal Enhances English Writing Quality and Improves
Productivity for Japanese Academics, George et
al.,
- Transforming hematological research documentation with large language
models: an approach to scientific writing and data
analysis, Yang et al.,
- The usage of a transformer based and artificial intelligence driven
multidimensional feedback system in english writing
instruction, Zheng et al.,
- Learning to split and rephrase from Wikipedia edit history,
Botha et al.,
- WikiAtomicEdits: A multilingual corpus of Wikipedia edits for modeling
language and discourse, Faruqui et al.,
- Diamonds in the rough: Generating fluent sentences from early-stage
drafts for academic writing assistance, Ito et
al.,
- Text editing by command, Faltings et al.,
- Wordcraft: A human-AI collaborative editor for story writing,
Coenen et al.,
- Machine-in-the-loop rewriting for creative image captioning,
Padmakumar et al.,
- Read, revise, repeat: A system demonstration for human-in-the-loop
iterative text revision, Du et al.,
- Coauthor: Designing a human-ai collaborative writing dataset for
exploring language model capabilities, Lee et
al.,
- Sparks: Inspiration for science writing using language models,
Gero et al.,
- Techniques for supercharging academic writing with generative
AI, Lin et al.,
- Overleafcopilot: Empowering academic writing in overleaf with large
language models, Wen et al.,
- Augmenting the author: Exploring the potential of AI collaboration in
academic writing, Tu et al.,
- Step-Back Profiling: Distilling User History for Personalized
Scientific Writing, Tang et al.,
- Closing the Loop: Learning to Generate Writing Feedback via Language
Model Simulated Student Revisions, Nair et al.,
- Enhancing Chinese Essay Discourse Logic Evaluation Through Optimized
Fine-Tuning of Large Language Models, Song et
al.,
- Cocoa: Co-Planning and Co-Execution with AI Agents, Feng et
al.,
- Prototypical Human-AI Collaboration Behaviors from LLM-Assisted
Writing in the Wild, Mysore et al.,
- XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic
Paper Revision, Chen et al.,
- The usage of a transformer based and artificial intelligence driven
multidimensional feedback system in english writing
instruction, Zheng et al.,
- Autonomous LLM-Driven Research—from Data to Human-Verifiable Research
Papers, Ifargan et al.,
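As a small companion to the revision and feedback systems listed above, the sketch below only visualizes word-level edits between a draft and its revised form using the standard library; producing the revision itself is what the cited models do. The example sentences are invented.

```python
# Show word-level edit operations between a draft and a revised sentence.
import difflib

draft   = "This experiments shows that the model perform good on unseen data ."
revised = "These experiments show that the model performs well on unseen data."

a, b = draft.split(), revised.split()
for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
    if op != "equal":
        print(op, " ".join(a[i1:i2]), "->", " ".join(b[j1:j2]))
```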
4.2 Full-Automatic Academic Writing
- The ai scientist: Towards fully automated open-ended scientific
discovery, Lu et al.,
- Agent laboratory: Using llm agents as research assistants,
Schmidgall et al.,
- ScholaWrite: A Dataset of End-to-End Scholarly Writing
Process, Wang et al.,
- Beyond outlining: Heterogeneous recursive planning for adaptive
long-form writing with language models, Xiong et
al.,
- AgentRxiv: Towards Collaborative Autonomous Research,
Schmidgall et al.,
- Zochi Technical Report, AI et al.,
- Carl Technical Report, Institute et al.,
- The ai scientist-v2: Workshop-level automated scientific discovery via
agentic tree search, Yamada et al.,
- Using artificial intelligence in academic writing and research: An
essential productivity tool, Khalifa et al.,
- Human-LLM Coevolution: Evidence from Academic Writing, Geng et
al.,
- Large language models penetration in scholarly writing and peer
review, Zhou et al.,
- And Plato met ChatGPT: an ethical reflection on the use of chatbots in
scientific research writing, with a particular focus on
the social sciences, Calderon et al.,
5. AI for Academic Peer Reviewing
- Can we automate scientific reviewing?, Yuan et al.,
- Reviewergpt? an exploratory study on using large language models for
paper reviewing, Liu et al.,
- Unveiling the sentinels: Assessing ai performance in cybersecurity
peer review, Niu et al.,
- Automated scholarly paper review: Concepts, technologies, and
challenges, Lin et al.,
- What Can Natural Language Processing Do for Peer Review?,
Kuznetsov et al.,
- Artificial intelligence to support publishing and peer review: A
summary and review, Kousha et al.,
- Large language models for automated scholarly paper review: A
survey, Zhuang et al.,
- Evaluating the predictive capacity of ChatGPT for academic peer review
outcomes across multiple platforms, Thelwall et
al.,
- A framework for reviewing the results of automated conversion of
structured organic synthesis procedures from the
literature, Machi et al.,
5.1 Pre-Review
5.1.1 Desk-Review
- How to Make Peer Review Recommendations and Decisions, Society
et al.,
- Helping editors find reviewers, Tedford et al.,
- Snapp: Springer Nature's next-generation peer review system,
Nature et al.,
- Matching papers and reviewers at large conferences,
Leyton-Brown et al.,
- Streamlining the review process: AI-generated annotations in research
manuscripts, Díaz et al.,
- Artificial intelligence in peer review: enhancing efficiency while
preserving integrity, Doskaliuk et al.,
- Enhancing Academic Decision-Making: A Pilot Study of AI-Supported
Journal Selection in Higher Education, Farber et
al.,
5.1.2 Reviewer Matching
- A framework for optimizing paper matching, Charlin et al.,
- The Toronto paper matching system: an automated paper-reviewer
assignment system, Charlin et al.,
- Pistis: A conflict of interest declaration and detection system for
peer review management, Wu et al.,
- An automated conflict of interest based greedy approach for conference
paper assignment system, Pradhan et al.,
- Matching papers and reviewers at large conferences,
Leyton-Brown et al.,
- Autonomous Machine Learning-Based Peer Reviewer Selection
System, Aitymbetov et al.,
- Automated Research Review Support Using Machine Learning, Large
Language Models, and Natural Language
Processing, Pendyala et al.,
- Peer review expert group recommendation: A multi-subject
coverage-based approach, Fu et al.,
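The reviewer-matching systems above are typically framed as an assignment problem: given an affinity score between every paper and reviewer, choose an assignment that maximizes total affinity under load and conflict constraints. Below is a minimal sketch of the one-reviewer-per-paper case using a standard linear-assignment solver; the affinity matrix is made up, and real systems such as those cited add reviewer quotas and conflict-of-interest handling.

```python
# Toy paper-reviewer assignment via the Hungarian algorithm (illustrative only).
import numpy as np
from scipy.optimize import linear_sum_assignment

# affinity[i, j]: estimated fit of reviewer j for paper i (toy numbers)
affinity = np.array([
    [0.9, 0.2, 0.4],
    [0.3, 0.8, 0.1],
    [0.5, 0.6, 0.7],
])

# linear_sum_assignment minimizes cost, so negate affinities to maximize them
papers, reviewers = linear_sum_assignment(-affinity)
for p, r in zip(papers, reviewers):
    print(f"paper {p} -> reviewer {r} (affinity {affinity[p, r]:.1f})")
```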
5.2 In-Review
5.2.1 Peer-Review
Score Prediction
- ALL-IN-ONE: Multi-Task Learning BERT Models for Evaluating Peer Assessments, Jia et al.,
- The quality assist: A technology-assisted peer review based on
citation functions to predict the paper quality,
Basuki et al.,
- Exploiting labeled and unlabeled data via transformer fine-tuning for
peer-review score prediction, Muangkammuen et
al.,
- RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper
Relevance, Couto et al.,
- Kid-review: knowledge-guided scientific review generation with oracle
pre-training, Yuan et al.,
- Gpt4 is slightly helpful for peer-review assistance: A pilot
study, Robertson et al.,
- Marg: Multi-agent review generation for scientific papers,
D'Arcy et al.,
- Peer review as a multi-turn and long-context dialogue with role-based
interactions, Tan et al.,
- Agentreview: Exploring peer review dynamics with llm agents,
Jin et al.,
- Can large language models provide useful feedback on research papers?
A large-scale empirical analysis, Liang et al.,
- Automated Focused Feedback Generation for Scientific Writing
Assistance, Chamoun et al.,
- The ai scientist: Towards fully automated open-ended scientific
discovery, Lu et al.,
- SEAGraph: Unveiling the Whole Story of Paper Review Comments,
Yu et al.,
- The ai scientist-v2: Workshop-level automated scientific discovery via
agentic tree search, Yamada et al.,
- A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP
Applications, Kang et al.,
- Peerassist: leveraging on paper-review interactions to predict peer
review decisions, Bharti et al.,
- Marg: Multi-agent review generation for scientific papers,
D'Arcy et al.,
- Peer review as a multi-turn and long-context dialogue with role-based
interactions, Tan et al.,
- Automated review generation method based on large language
models, Wu et al.,
- AI-Driven review systems: evaluating LLMs in scalable and bias-aware
academic reviews, Tyser et al.,
- MAMORX: Multi-agent multi-modal scientific review generation with
external knowledge, Taechoyotin et al.,
- Cycleresearcher: Improving automated research via automated
review, Weng et al.,
- OpenReviewer: A Specialized Large Language Model for Generating
Critical Scientific Paper Reviews, Idahl et al.,
- The role of large language models in the peer-review process:
opportunities and challenges for medical journal
reviewers and editors, Lee et al.,
- PiCO: Peer Review in LLMs based on Consistency Optimization,
Ning et al.,
- Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM
Reviews, Shin et al.,
- Revieweval: An evaluation framework for ai-generated reviews,
Kirtani et al.,
- Automatically Evaluating the Paper Reviewing Capability of Large
Language Models, Shin et al.,
- Deepreview: Improving llm-based paper review with human-like deep
thinking process, Zhu et al.,
- Reviewagents: Bridging the gap between human and ai-generated paper
reviews, Gao et al.,
- Reviewing Scientific Papers for Critical Problems With Reasoning LLMs:
Baseline Approaches and Automatic Evaluation,
Zhang et al.,
- REMOR: Automated Peer Review Generation with LLM Reasoning and
Multi-Objective Reinforcement Learning,
Taechoyotin et al.,
- TreeReview: A Dynamic Tree of Questions Framework for Deep and
Efficient LLM-based Scientific Peer Review,
Chang et al.,
- PaperEval: A universal, quantitative, and explainable paper evaluation
method powered by a multi-agent system, Huang et
al.,
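For the score-prediction entries at the start of this subsection, a common recipe is to fine-tune a pretrained encoder with a single regression head on (paper text, review score) pairs. The sketch below is a minimal, illustrative version of that recipe; the checkpoint choice, toy texts, and scores are assumptions rather than details of any cited system.

```python
# Minimal review-score regression sketch (illustrative; head is freshly initialized).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "allenai/scibert_scivocab_uncased"  # any encoder checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=1, problem_type="regression"
)

texts = ["We propose a new benchmark ...", "This paper studies ..."]  # toy papers
scores = torch.tensor([[6.0], [4.5]])                                 # toy overall scores

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=scores)   # MSE loss for the regression head
out.loss.backward()                   # one illustrative training step
```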
5.2.2 Meta-Review
- Summarizing multiple documents with conversational structure for
meta-review generation, Li et al.,
- Meta-review generation with checklist-guided iterative
introspection, Zeng et al.,
- When Reviewers Lock Horn: Finding Disagreement in Scientific Peer
Reviews, Kumar et al.,
- A sentiment consolidation framework for meta-review
generation, Li et al.,
- Prompting LLMs to Compose Meta-Review Drafts from Peer-Review
Narratives of Scholarly Manuscripts, Santu et
al.,
- Towards automated meta-review generation via an NLP/ML pipeline in
different stages of the scholarly peer review
process, Kumar et al.,
- Metawriter: Exploring the potential and perils of ai writing support
in scientific peer review, Sun et al.,
- GLIMPSE: Pragmatically Informative Multi-Document Summarization for
Scholarly Reviews, Darrin et al.,
- PeerArg: Argumentative Peer Review with LLMs, Sukpanichnant et
al.,
- Bridging Social Psychology and LLM Reasoning: Conflict-Aware
Meta-Review Generation via Cognitive Alignment,
Chen et al.,
- LLMs as Meta-Reviewers' Assistants: A Case Study, Hossain et
al.,
5.3 Post-Review
5.3.1 Influence Analysis
- Popular and/or prestigious? Measures of scholarly esteem, Ding
et al.,
- Measuring academic influence: Not all citations are equal, Zhu
et al.,
- An overview of microsoft academic service (mas) and
applications, Sinha et al.,
- Factors affecting number of citations: a comprehensive review of the
literature, Tahamtan et al.,
- Relative citation ratio (RCR): a new metric that uses citation rates
to measure influence at the article level,
Hutchins et al.,
- HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific
Citation Prediction, Hao et al.,
- From Words to Worth: Newborn Article Impact Prediction with
LLM, Zhao et al.,
- Large language models surpass human experts in predicting neuroscience
results, Luo et al.,
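As background for the influence-analysis entries above, the h-index is the classic citation-count-based esteem measure; the cited works note that such raw counts weight all citations equally, which motivates article-level alternatives like the Relative Citation Ratio. A small implementation:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

print(h_index([10, 8, 5, 4, 3]))  # -> 4
```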
5.3.2 Promotion Enhancement
- From complexity to clarity: How AI enhances perceptions of scientists
and the public's understanding of science,
Markowitz et al.,
- Automatic Evaluation Metrics for Artificially Generated Scientific
Research, Höpner et al.,
- Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with
Iterative Feedback Loop for Improved Scientific
Short-form Generation, Park et al.,
- P2P: Automated Paper-to-Poster Generation and Fine-Grained
Benchmark, Sun et al.,
6. Application
6.1 AI for Natural Science Research
6.1.1 AI for Physics Research
- Colloquium: Machine learning in nuclear physics, Boehnlein et
al.,
- Toward the end-to-end optimization of particle physics instruments
with differentiable programming, Dorigo et al.,
- AI meets physics: a comprehensive survey, Jiao et al.,
- Artificial intelligence for partial differential equations in
computational mechanics: A review, Wang et al.,
- When physics meets machine learning: A survey of physics-informed
machine learning, Meng et al.,
- Interaction networks for learning about objects, relations and
physics, Battaglia et al.,
- End-to-end differentiable physics for learning and control, de
Avila Belbute-Peres et al.,
- Physics-informed neural networks: A deep learning framework for
solving forward and inverse problems involving nonlinear
partial differential equations, Raissi et al.,
- Hamiltonian neural networks, Greydanus et al.,
- Lagrangian neural networks, Cranmer et al.,
- Physics-informed neural networks and extensions, Raissi et
al.,
- LLM-SR: Scientific Equation Discovery via Programming with Large
Language Models, Shojaee et al.,
- LLM-Feynman: Leveraging Large Language Models for Universal Scientific
Formula and Theory Discovery, Song et al.,
- AI-Newton: A Concept-Driven Physical Law Discovery System without
Prior Physical Knowledge, Fang et al.,
- MLLM-based Discovery of Intrinsic Coordinates and Governing Equations
from High-Dimensional Data, Li et al.,
- LLM-SRBench: A New Benchmark for Scientific Equation Discovery with
Large Language Models, Shojaee et al.,
- DrSR: LLM based Scientific Equation Discovery with Dual Reasoning from
Data and Experience, Wang et al.,
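As a pointer for the physics-informed entries above (Raissi et al. and follow-ups), the core idea is to train a network $u_\theta$ that both fits observed data and drives a differential-operator residual to zero at collocation points; schematically,

$$\mathcal{L}(\theta)=\frac{1}{N_d}\sum_{i=1}^{N_d}\bigl|u_\theta(x_i,t_i)-u_i\bigr|^2+\frac{1}{N_r}\sum_{j=1}^{N_r}\bigl|\mathcal{N}[u_\theta](x_j,t_j)\bigr|^2,$$

where $\mathcal{N}[\cdot]$ denotes the residual of the governing PDE; exact loss weightings and collocation-sampling schemes vary across the works listed above.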
6.1.2 AI for Biology & Medical Research
- Can gpt-4v (ision) serve medical applications? case studies on gpt-4v
for multimodal medical diagnosis, Wu et al.,
- Advancing multimodal medical capabilities of Gemini, Yang et
al.,
- A survey of generative AI for de novo drug design: new frontiers in
molecule and protein generation, Tang et al.,
- Large language models in plant biology, Lam et al.,
- The virtual lab: AI agents design new SARS-CoV-2 nanobodies with
experimental validation, Swanson et al.,
- A Fuzzy Logic-Based Approach to Predict Human Interaction by
Functional Near-Infrared Spectroscopy, Jiang et
al.,
- Human-AI Teaming Using Large Language Models: Boosting Brain-Computer
Interfacing (BCI) and Brain Research, Kapitonova
et al.,
- From large language models to multimodal AI: A scoping review on the
potential of generative AI in medicine, Buess et
al.,
- A survey of llm-based agents in medicine: How far are we from
baymax?, Wang et al.,
- Large language model for knowledge synthesis and AI-enhanced
biomanufacturing, Li et al.,
- Advancing drug discovery and development through GPT models: a review
on challenges, innovations and future prospects,
Othman et al.,
- Large Language Models for Zero-shot Inference of Causal Structures in
Biology, Newsham et al.,
- Transforming hematological research documentation with large language
models: an approach to scientific writing and data
analysis, Yang et al.,
- SpatialAgent: An autonomous AI agent for spatial biology, Wang
et al.,
- A Human-LLM Note-Taking System with Case-Based Reasoning as Framework
for Scientific Discovery, Craig et al.,
- AI-assisted Drug Re-purposing for Human Liver Fibrosis, Guan
et al.,
- Biomni: A General-Purpose Biomedical AI Agent, Huang et al.,
- Autonomous LLM-Driven Research—from Data to Human-Verifiable Research
Papers, Ifargan et al.,
- Improved protein structure prediction using potentials from deep
learning, Senior et al.,
- Highly accurate protein structure prediction with AlphaFold,
Jumper et al.,
- Leveraging biomolecule and natural language through multi-modal
learning: A survey, Pei et al.,
- ProtAgents: protein discovery via large language model multi-agent
collaborations combining physics and machine
learning, Ghafarollahi et al.,
- Accurate structure prediction of biomolecular interactions with
AlphaFold 3, Abramson et al.,
- Automating exploratory proteomics research via language
models, Ding et al.,
- Sparks: Multi-Agent Artificial Intelligence Model Discovers Protein
Design Principles, Ghafarollahi et al.,
- Enhancing Chemical Reaction and Retrosynthesis Prediction with Large
Language Model and Dual-task Learning, Lin et
al.,
- GenePT: a simple but effective foundation model for genes and cells
built from ChatGPT, Chen et al.,
- Biodiscoveryagent: An ai agent for designing genetic perturbation
experiments, Roohani et al.,
- Cellagent: An llm-driven multi-agent framework for automated
single-cell data analysis, Xiao et al.,
- Toward a foundation model of causal cell and tissue biology with a
Perturbation Cell and Tissue Atlas, Rood et al.,
- General-purpose pre-trained large cellular models for single-cell
transcriptomics, Bian et al.,
- ML-GAP: machine learning-enhanced genomic analysis pipeline using
autoencoders and data augmentation, Agraz et
al.,
- LLM4GRN: Discovering Causal Gene Regulatory Networks with
LLMs--Evaluation through Synthetic Data
Generation, Afonja et al.,
- Autonomous Robotic System with Optical Coherence Tomography Guidance
for Vascular Anastomosis, Haworth et al.,
- How to build the virtual cell with artificial intelligence: Priorities
and opportunities, Bunne et al.,
- Efficient Fine-Tuning of Single-Cell Foundation Models Enables
Zero-Shot Molecular Perturbation Prediction,
Maleki et al.,
- NeuroDISK: An AI Approach to Automate Continuous Inquiry-Driven
Discoveries in Neuroimaging Genetics, Garijo et
al.,
- The rise of agentic AI teammates in medicine, Zou et al.,
- Transformers and genome language models, Consens et al.,
- A deep learning approach to antibiotic discovery, Stokes et
al.,
- Artificial intelligence to deep learning: machine intelligence
approach for drug discovery, Gupta et al.,
- HGTDR: Advancing drug repurposing with heterogeneous graph
transformers, Gharizadeh et al.,
- A survey of generative AI for de novo drug design: new frontiers in
molecule and protein generation, Tang et al.,
- A data science roadmap for open science organizations engaged in
early-stage drug discovery, Edfeldt et al.,
- Drugclip: Contrastive drug-disease interaction for drug
repurposing, Lu et al.,
- Current strategies to address data scarcity in artificial
intelligence-based drug discovery: A comprehensive
review, Gangwal et al.,
- A foundation model for clinician-centered drug repurposing,
Huang et al.,
- Drugagent: Automating ai-aided drug discovery programming through llm
multi-agent collaboration, Liu et al.,
- Towards LLM-Driven Multi-Agent Pipeline for Drug Discovery:
Neurodegenerative Diseases Case Study, Solovev
et al.,
- A Deep Subgrouping Framework for Precision Drug Repurposing via
Emulating Clinical Trials on Real-world Patient
Data, Lee et al.,
- Hallucinations Can Improve Large Language Models in Drug
Discovery, Yuan et al.,
- RAG-Enhanced Collaborative LLM Agents for Drug Discovery, Lee
et al.,
- LUMI-lab: a Foundation Model-Driven Autonomous Platform Enabling
Discovery of New Ionizable Lipid Designs for mRNA
Delivery, Cui et al.,
- Advancing drug discovery and development through GPT models: a review
on challenges, innovations and future prospects,
Othman et al.,
- DrugPilot: LLM-based Parameterized Reasoning Agent for Drug
Discovery, Li et al.,
- AI-assisted Drug Re-purposing for Human Liver Fibrosis, Guan
et al.,
- Large language models encode clinical knowledge, Singhal et
al.,
- Can gpt-4v (ision) serve medical applications? case studies on gpt-4v
for multimodal medical diagnosis, Wu et al.,
- Advancing clinical decision support: The role of artificial
intelligence across six domains, Khalifa et al.,
- Ai hospital: Benchmarking large language models in a multi-agent
medical interaction simulator, Fan et al.,
- Agent hospital: A simulacrum of hospital with evolvable medical
agents, Li et al.,
- Autonomous Robotic System with Optical Coherence Tomography Guidance
for Vascular Anastomosis, Haworth et al.,
- Piors: Personalized intelligent outpatient reception based on large
language model with multi-agents medical scenario
simulation, Bao et al.,
- Towards an AI co-scientist, Gottweis et al.,
- Generative Artificial Intelligence in Anatomic Pathology,
Brodsky et al.,
- Clinicalgpt-r1: Pushing reasoning capability of generalist disease
diagnosis with large language model, Lan et al.,
- A Human-LLM Note-Taking System with Case-Based Reasoning as Framework
for Scientific Discovery, Craig et al.,
- PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient
Interactions, Kyung et al.,
- MedSyn: Enhancing Diagnostics with Human-AI Collaboration,
Sayin et al.,
6.1.3 AI for Chemistry & Materials Research
- Accelerating materials discovery using artificial intelligence, high
performance computing and robotics, Pyzer-Knapp
et al.,
- Accelerating materials language processing with large language
models, Choi et al.,
- Augmenting large language models with chemistry tools, M. Bran
et al.,
- Nano & AI: A Nobel Partnership, Chen et al.,
- Simulating 500 million years of evolution with a language
model, Hayes et al.,
- AI4Materials: Transforming the Landscape of Materials Science and Engineering, Jiang et al.,
- Cross-disciplinary perspectives on the potential for artificial
intelligence across chemistry, Mroz et al.,
- Empowering Generalist Material Intelligence with Large Language
Models, Yuan et al.,
- From Literature to Lab: Hardware-Independent Autonomous Chemical
Synthesis with Reinforcement Learning, Wu et
al.,
- Graph networks as a universal machine learning framework for molecules
and crystals, Chen et al.,
- An autonomous laboratory for the accelerated synthesis of novel
materials, Szymanski et al.,
- Accelerating the Discovery of Abiotic Vesicles with AI-Guided
Automated Experimentation, Ekosso et al.,
- Sequential closed-loop Bayesian optimization as a guide for organic
molecular metallophotocatalyst formulation
discovery, Li et al.,
- High-throughput robotic collection, imaging, and machine learning
analysis of salt patterns: composition and concentration
from dried droplet photos, Batista et al.,
- Adaptive representation of molecules and materials in Bayesian
optimization, Rajabi-Kochi et al.,
- FlavorDiffusion: Modeling Food-Chemical Interactions with
Diffusion, Seo et al.,
- Chatgpt-Assisted Rational Design for Iterative Performance
Optimization of Perovskite Solar Cells, Zhang et
al.,
- Machine learning for molecular and materials science, Butler
et al.,
- Scaling deep learning for materials discovery, Merchant et
al.,
- Experimental discovery of novel ammonia synthesis catalysts via active
learning, Jayarathna et al.,
- A sober look at LLMs for material discovery: Are they actually good
for Bayesian optimization over molecules?,
Kristiadi et al.,
- BatGPT-Chem: A Foundation Large Model For Chemical
Engineering, Yang et al.,
- AI-assisted inverse design of sequence-ordered high intrinsic thermal
conductivity polymers, Huang et al.,
- Real-time experiment-theory closed-loop interaction for autonomous
materials science, Liang et al.,
- Autonomous mobile robots for exploratory synthetic chemistry,
Dai et al.,
- Machine Learning-Aided Inverse Design and Discovery of Novel Polymeric
Materials for Membrane Separation, Dangayach et
al.,
- ORGANA: a robotic assistant for automated chemistry experimentation
and characterization, Darvish et al.,
- Adaptive AI decision interface for autonomous electronic material
discovery, Dai et al.,
- Automated synthesis of oxygen-producing catalysts from Martian
meteorites by a robotic AI chemist, Zhu et al.,
- ChemReasoner: Heuristic search over a large language model's knowledge
space using quantum-chemical feedback, Sprueill
et al.,
- Efficient evolutionary search over chemical space with large language
models, Wang et al.,
- MatPilot: an LLM-enabled AI Materials Scientist under the Framework of
Human-Machine Collaboration, Ni et al.,
- Autonomous Microscopy Experiments through Large Language Model
Agents, Mandal et al.,
- Automated Retrosynthesis Planning of Macromolecules Using Large
Language Models and Knowledge Graphs, Ma et al.,
- A multiagent-driven robotic ai chemist enabling autonomous chemical
research on demand, Song et al.,
- Agentic Assistant for Material Scientists, Feng et al.,
- Physics-informed, dual-objective optimization of high-entropy-alloy
nanozymes by a robotic AI chemist, Luo et al.,
- Intelligent, Personalized Scientific Assistant via Large Language
Models for Solid-State Battery Research, Leng et
al.,
- Prim: Principle-inspired material discovery through multi-agent
collaboration, Lai et al.,
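Many of the autonomous-laboratory entries above share a propose, run, update loop. The sketch below is a deliberately stripped-down, purely illustrative version of such a loop (epsilon-greedy search over a toy one-dimensional response surface); the cited systems replace the selection rule with Bayesian optimization and the toy objective with robotic experiments.

```python
# Toy closed-loop experiment-selection sketch (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 21)        # e.g., candidate compositions

def run_experiment(x):                        # hidden response surface + noise (toy)
    return -(x - 0.63) ** 2 + 0.05 * rng.normal()

observed = {}
for step in range(15):
    if observed and rng.random() > 0.3:       # exploit: perturb the best composition so far
        best_x = max(observed, key=observed.get)
        x = float(np.clip(best_x + rng.normal(scale=0.05), 0.0, 1.0))
    else:                                     # explore: sample a fresh candidate
        x = float(rng.choice(candidates))
    observed[x] = run_experiment(x)

print(max(observed, key=observed.get))        # best composition found by the loop
```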
6.2 AI for Applied Science and Engineering Research
6.2.1 AI for Robotics and Control Research
- The AI CUDA engineer: Agentic CUDA kernel discovery, optimization and
composition, Lange et al.,
- Generative Machine Learning in Adaptive Control of Dynamic
Manufacturing Processes: A Review, Lee et al.,
- Towards industry-ready additive manufacturing: AI-enabled closed-loop
control for 3D melt electrowriting, Mieszczanek
et al.,
- Closed-loop transfer enables artificial intelligence to yield chemical
knowledge, Angello et al.,
- Closed-Loop Visuomotor Control with Generative Expectation for Robotic
Manipulation, Bu et al.,
- Real-time experiment-theory closed-loop interaction for autonomous
materials science, Liang et al.,
- AI-Driven Robotics for Free-Space Optics, Uddin et al.,
- End-to-end training of deep visuomotor policies, Levine et
al.,
- Domain randomization for transferring deep neural networks from
simulation to the real world, Tobin et al.,
- Learning hand-eye coordination for robotic grasping with deep learning
and large-scale data collection, Levine et al.,
- Scalable deep reinforcement learning for vision-based robotic
manipulation, Kalashnikov et al.,
- Real-world humanoid locomotion with reinforcement learning,
Radosavovic et al.,
- Improving generalization of robot locomotion policies via
Sharpness-Aware Reinforcement Learning, Bochem
et al.,
- Robustness Evaluation of Offline Reinforcement Learning for Robot
Control Against Action Perturbations, Ayabe et
al.,
- Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual
Servoing of Soft Continuum Arms, Yang et al.,
- Guided by Guardrails: Control Barrier Functions as Safety Instructors
for Robotic Learning, Guerrier et al.,
- Value Iteration for Learning Concurrently Executable Robotic Control
Tasks, Tahmid et al.,
- NovelSeek: When Agent Becomes the Scientist--Building Closed-Loop
System from Hypothesis to Verification, Team et
al.,
6.2.2 AI for Software Engineering
Code Generation
- Evaluating large language models trained on code, Chen et al.,
- Codegen: An open large language model for code with multi-turn program
synthesis, Nijkamp et al.,
- Starcoder: may the source be with you!, Li et al.,
- Code llama: Open foundation models for code, Roziere et al.,
- DeepSeek-Coder: When the Large Language Model Meets Programming--The
Rise of Code Intelligence, Guo et al.,
- Starcoder 2 and the stack v2: The next generation, Lozhkov et
al.,
- MLDebugging: Towards Benchmarking Code Debugging Across Multi-Library
Scenarios, Huang et al.,
- Seed-Coder: Let the Code Model Curate Data for Itself, Zhang
et al.,
- Application of large language models to software engineering tasks:
Opportunities, risks, and implications, Ozkaya
et al.,
- Chatdev: Communicative agents for software development, Qian
et al.,
- Large language models for software engineering: Survey and open
problems, Fan et al.,
- Experiential co-learning of software-developing agents, Qian
et al.,
- Repoexec: Evaluate code generation with a repository-level executable
benchmark, Le Hai et al.,
- SWE-bench: Can Language Models Resolve Real-world Github
Issues?, Jimenez et al.,
- Hyperagent: Generalist software engineering agents to solve coding
tasks at scale, Phan et al.,
- Explainable automated debugging via large language model-driven
scientific debugging, Kang et al.,
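For the code-generation entries above, the standard functional-correctness metric is pass@k from "Evaluating large language models trained on code" (Chen et al.): sample n completions per problem, count the c that pass the unit tests, and estimate the probability that at least one of k samples would pass. The unbiased estimator can be implemented as follows (toy numbers in the example):

```python
import numpy as np

def pass_at_k(n, c, k):
    """Unbiased pass@k: 1 - C(n-c, k) / C(n, k), computed stably."""
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

print(pass_at_k(n=20, c=3, k=5))  # toy numbers
```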
6.3 AI for Social Science Research
6.3.1 AI for Sociology Research
- Ethnography and Machine Learning: Synergies and New
Directions, Li et al.,
- Machine-assisted quantitizing designs: augmenting humanities and
social sciences with artificial intelligence,
Karjus et al.,
- Agent-Enhanced Large Language Models for Researching Political
Institutions, Loffredo et al.,
- Reimagining urban science: Scaling causal inference with large
language models, Xia et al.,
- Automated social science: Language models as scientist and
subjects, Manning et al.,
- Step Further Towards Automated Social Science: An AI-Powered Interview
Platform, Liu et al.,
- RAISE: A New Method to Develop Experimental Stimuli for Advertising
Research with Image Generative Artificial
Intelligence, Zamudio et al.,
- Cultural evolution in populations of Large Language Models,
Perez et al.,
- Economic Anthropology in the Era of Generative Artificial
Intelligence, Sheldon et al.,
- Malinowski in the Age of AI: Can large language models create a text
game based on an anthropological classic?,
Hoffmann et al.,
- AdaSociety: An Adaptive Environment with Social Structures for
Multi-Agent Decision-Making, Huang et al.,
- ResearchTown: Simulator of Human Research Community, Yu et
al.,
- Simulating cooperative prosocial behavior with multi-agent LLMs:
Evidence and mechanisms for AI agents to inform policy
decisions, Sreedhar et al.,
- Predicting Field Experiments with Large Language Models, Chen
et al.,
- Language Models Surface the Unwritten Code of Science and
Society, Bao et al.,
- Automated social science: Language models as scientist and
subjects, Manning et al.,
- ChatGPT as research scientist: probing GPT’s capabilities as a
research librarian, research ethicist, data generator,
and data predictor, Lehr et al.,
- Predicting Results of Social Science Experiments Using Large Language
Models, Luke et al.,
6.3.2 AI for Psychology Research
- Automating psychological hypothesis generation with AI: when large
language models meet causal graph, Tong et al.,
- Can Large Language Models Understand You Better? An MBTI Personality
Detection Dataset Aligned with Population
Traits, Li et al.,
- Using cognitive psychology to understand GPT-3, Binz et al.,
- Can AI language models replace human participants?, Dillion et
al.,
- The emergence of economic rationality of GPT, Chen et al.,
- AI-experiments in education: An AI-driven randomized controlled trial
for higher education research, Cingillioglu et
al.,
- RAISE: A New Method to Develop Experimental Stimuli for Advertising
Research with Image Generative Artificial
Intelligence, Zamudio et al.,
- Frontiers: Can Large Language Models Capture Human
Preferences?, Goli et al.,
- Testing theory of mind in large language models and humans,
Strachan et al.,
- Do large language models show decision heuristics similar to humans? A
case study using GPT-3.5., Suri et al.,
- Towards a client-centered assessment of llm therapists by client
simulation, Wang et al.,
- Interactive agents: Simulating counselor-client psychological
counseling via role-playing llm-to-llm
interactions, Qiu et al.,
- Can AI Replace Human Subjects? A Large-Scale Replication of
Psychological Experiments with LLMs, Cui et al.,
- MMSD2.0: Towards a reliable multi-modal sarcasm detection system, Qin et al.,
- Developing trustworthy artificial intelligence: insights from research
on interpersonal, human-automation, and human-AI
trust, Li et al.,
- From Lived Experience to Insight: Unpacking the Psychological Risks of
Using AI Conversational Agents, Chandra et al.,
- Using cognitive psychology to understand GPT-3, Binz et al.,
- Can AI language models replace human participants?, Dillion et
al.,
- Human-like intuitive behavior and reasoning biases emerged in large
language models but disappeared in ChatGPT,
Hagendorff et al.,
- Large Language Models Can Enable Inductive Thematic Analysis of a
Social Media Corpus in a Single Prompt: Human Validation
Study, Deiner et al.,
- Crafting clarity: Leveraging large language models to decode consumer
reviews, Praveen et al.,
- ChatGPT for Textual Analysis? How to Use Generative LLMs in Accounting
Research, de Kok et al.,
- The use of artificial intelligence in psychotherapy: development of
intelligent therapeutic systems, Spytska et al.,
- Randomized trial of a generative ai chatbot for mental health
treatment, Heinz et al.,
- Large language models as mental health resources: Patterns of use in
the united states, Rousmaniere et al.,
- Large Language Models Pass the Turing Test, Jones et al.,
- Experiential Narratives in Marketing: A Comparison of Generative AI
and Human Content, Wen et al.,
7. Future and Frontiers
7.1 Interdisciplinary AI Models
- Artificial intelligence in cancer research: learning at different
levels of data granularity, Cirillo et al.,
- Generating full length wikipedia biographies: The impact of gender
bias on the retrieval-based generation of women
biographies, Fan et al.,
- Contrastive knowledge integrated graph neural networks for Chinese
medical text classification, Lan et al.,
- Heterogeneous federated learning: State-of-the-art and research
challenges, Ye et al.,
- A comprehensive survey of cross-domain policy transfer for embodied
agents, Niu et al.,
- Generation and human-expert evaluation of interesting research ideas
using knowledge graphs and large language
models, Gu et al.,
- A survey of trustworthy representation learning across
domains, Zhu et al.,
- BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for
Biomedical Science, Lin et al.,
- Knowledge transfer for cross-domain reinforcement learning: a
systematic review, Serrano et al.,
- Accelerating scientific discovery with generative knowledge
extraction, graph-based representation, and multimodal
intelligent graph reasoning, Buehler et al.,
- Heterogeneous data integration: Challenges and opportunities,
Putrama et al.,
- A comprehensive survey of foundation models in medicine, Khan
et al.,
- Foundation models and intelligent decision-making: Progress,
challenges, and perspectives, Huang et al.,
7.2 Ethics and Safety in AI4Research
- Causal learning for socially responsible AI, Cheng et al.,
- Artificial intelligence and ethics: a comprehensive review of bias
mitigation, transparency, and accountability in AI
Systems, Mensah et al.,
- Fairness and bias in artificial intelligence: A brief survey of
sources, impacts, and mitigation strategies,
Ferrara et al.,
- AXOLOTL: fairness through assisted self-debiasing of large language
model outputs, Ebrahimi et al.,
- Policy advice and best practices on bias and fairness in AI,
Alvarez et al.,
- Automated Peer-Reviewer Assignment can be Manipulated to Secure
Reviews from Colluders, Hsieh et al.,
- Mitigating bias in artificial intelligence: Fair data generation via
causal models for transparent and explainable
decision-making, González-Sendino et al.,
- Enhancing peer review efficiency: A mixed-methods analysis of
artificial intelligence-assisted reviewer selection
across academic disciplines, Farber et al.,
- Beyond principlism: practical strategies for ethical AI use in
research practices, Lin et al.,
- SciTrust: Evaluating the Trustworthiness of Large Language Models for
Science, Herron et al.,
- Are we there yet? revealing the risks of utilizing large language
models in scholarly peer review, Ye et al.,
- Vulnerability of Text-Matching in ML/AI Conference Reviewer
Assignments to Collusions, Raghunathan et al.,
- How human--AI feedback loops alter human perceptual, emotional and
social judgements, Glickman et al.,
- The hidden dimensions of llm alignment: A multi-dimensional safety
analysis, Pan et al.,
- Responsible AI in biotechnology: balancing discovery, innovation and
biosecurity risks, Wheeler et al.,
- All that glitters is not novel: Plagiarism in ai generated
research, Gupta et al.,
- Detecting llm-written peer reviews, Rao et al.,
- Ethical and bias considerations in artificial intelligence/machine
learning, Hanna et al.,
- Automation Bias in AI-assisted Medical Decision-making under Time
Pressure in Computational Pathology, Rosbach et
al.,
- Considering the Ethics of Large Machine Learning Models in the
Chemical Sciences, Spotte-Smith et al.,
- Generative artificial intelligence for academic research: evidence
from guidance issued for researchers by higher education
institutions in the United States, Ganguly et
al.,
- Artificial intelligence and dichotomania, McShane et al.,
- The Plagiarism Singularity Conjecture, Ranga et al.,
- Toward Reliable Biomedical Hypothesis Generation: Evaluating
Truthfulness and Hallucination in Large Language
Models, Xiong et al.,
- BiasFilter: An Inference-Time Debiasing Framework for Large Language
Models, Cheng et al.,
- SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM
Agents, Zhu et al.,
- OpenReview Should be Protected and Leveraged as a Community Asset for
Research in the Era of Large Language Models,
Sun et al.,
7.3 AI for Collaborative Research
- A hybrid approach to privacy-preserving federated learning,
Truex et al.,
- A review of applications in federated learning, Li et al.,
- A survey on federated learning, Zhang et al.,
- A systematic review of federated learning: Challenges, aggregation
methods, and development tools, Guendouzi et
al.,
- Federated learning and data privacy: A review of challenges and
opportunities, Myakala et al.,
- Designing collaborative intelligence systems for employee-AI service
co-production, Blaurock et al.,
- Collaborative Intelligence: A scoping review of current
applications, Schleiger et al.,
- Deconstructing Human-AI Collaboration: Agency, Interaction, and
Adaptation, Holter et al.,
- The ai scientist: Towards fully automated open-ended scientific
discovery, Lu et al.,
- Human-AI collaboration is not very collaborative yet: A taxonomy of
interaction patterns in AI-assisted decision making from
a systematic review, Gomez et al.,
- Text2world: Benchmarking large language models for symbolic world
model generation, Hu et al.,
- Distributed cross-learning for equitable federated
models-privacy-preserving prediction on data from five
California hospitals, Kuo et al.,
- Multi-agent risks from advanced ai, Hammond et al.,
- Simulating cooperative prosocial behavior with multi-agent LLMs:
Evidence and mechanisms for AI agents to inform policy
decisions, Sreedhar et al.,
- Accelerating drug discovery with Artificial: a whole-lab orchestration
and scheduling system for self-driving labs,
Fehlis et al.,
- 34 Examples of LLM Applications in Materials Science and Chemistry:
Towards Automation, Assistants, Agents, and Accelerated
Scientific Discovery, Zimmermann et al.,
- DrugPilot: LLM-based Parameterized Reasoning Agent for Drug
Discovery, Li et al.,
- The role of agentic ai in shaping a smart future: A systematic
review, Hosseini et al.,
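Several entries above concern federated and privacy-preserving collaboration, where institutions exchange model updates rather than raw data. The sketch below is a minimal FedAvg-style loop on a toy linear-regression task (all data and clients are invented); production systems layer on the secure aggregation, differential privacy, and heterogeneity handling discussed in the cited surveys.

```python
# Minimal FedAvg-style sketch: only parameters leave each client, never raw data.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local gradient-descent steps on a linear-regression loss."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(global_w, clients, rounds=10):
    for _ in range(rounds):
        updates = [local_update(global_w, X, y) for X, y in clients]
        sizes = [len(y) for _, y in clients]
        global_w = np.average(updates, axis=0, weights=sizes)  # weighted model average
    return global_w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=50)))

print(fedavg(np.zeros(2), clients))  # should approach [2.0, -1.0]
```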
7.4 Explainability and Transparency of AI4Research
- On gradient-like explanation under a black-box setting: when black-box
explanations become as good as white-box, Cai et
al.,
- Explainable and interpretable artificial intelligence in medicine: a
systematic bibliometric review, Frasca et al.,
- Towards uncovering how large language model works: An explainability
perspective, Zhao et al.,
- Mechanistic Interpretability for AI Safety--A Review, Bereska
et al.,
- A practical review of mechanistic interpretability for
transformer-based language models, Rai et al.,
- Interpreting black-box models: a review on explainable artificial
intelligence, Hassija et al.,
- Unlocking the capabilities of thought: A reasoning boundary framework
to quantify and optimize chain-of-thought, Chen
et al.,
- Explainable AI reloaded: Challenging the xai status quo in the era of
large language models, Ehsan et al.,
- Beyond principlism: practical strategies for ethical AI use in
research practices, Lin et al.,
- ECM: A Unified Electronic Circuit Model for Explaining the Emergence
of In-Context Learning and Chain-of-Thought in Large
Language Model, Chen et al.,
- RBF++: Quantifying and Optimizing Reasoning Boundaries across
Measurable and Unmeasurable Capabilities for
Chain-of-Thought Reasoning, Chen et al.,
7.5 AI for Dynamic and Real-Time Optimized Scientific Experimentation
- Tree-planner: Efficient close-loop task planning with large language
models, Hu et al.,
- Review of low-cost self-driving laboratories in chemistry and
materials science: the “frugal twin” concept, Lo
et al.,
- Self-driving laboratories for chemistry and materials science,
Tom et al.,
- Hiagent: Hierarchical working memory management for solving
long-horizon agent tasks with large language
model, Hu et al.,
- Real-time experiment-theory closed-loop interaction for autonomous
materials science, Liang et al.,
- AutoSciLab: A Self-Driving Laboratory For Interpretable Scientific
Discovery, Desai et al.,
- Adaptive AI decision interface for autonomous electronic material
discovery, Dai et al.,
- Science acceleration and accessibility with self-driving labs,
Canty et al.,
7.6 Multimodal Integration in AI4Research
- Look, read and enrich-learning from scientific figures and their
captions, Gomez-Perez et al.,
- Uniter: Universal image-text representation learning, Chen et
al.,
- T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large
Language Model Signals for Science Question
Answering, Wang et al.,
- Figcaps-hf: A figure-to-caption generative framework and benchmark
with human feedback, Singh et al.,
- M3CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought, Chen et al.,
- Every Part Matters: Integrity Verification of Scientific Figures Based
on Multimodal Large Language Models, Shi et al.,
- S3 agent: Unlocking the power of VLLM for zero-shot multi-modal
sarcasm detection, Wang et al.,
- Vlm4bio: A benchmark dataset to evaluate pretrained vision-language
models for trait discovery from biological
images, Maruf et al.,
- Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation
with LLMs, Wang et al.,
- What factors affect multi-modal in-context learning? an in-depth
exploration, Qin et al.,
- Bigdocs: An open dataset for training multimodal models on document
and code tasks, Rodriguez et al.,
- InterFeedback: Unveiling Interactive Intelligence of Large Multimodal
Models via Human Feedback, Zhao et al.,
- MERMaid: Universal multimodal mining of chemical reactions from PDFs
using vision-language models, Leong et al.,
- Comt: A novel benchmark for chain of multi-modal thought on large
vision-language models, Cheng et al.,
- Visual Thoughts: A Unified Perspective of Understanding Multimodal
Chain-of-Thought, Cheng et al.,
- HiPerRAG: High-Performance Retrieval Augmented Generation for
Scientific Insights, Gokdemir et al.,
7.7 Multilingual Integration in AI4Research
- Languages are still a major barrier to global science, Amano
et al.,
- Unsupervised cross-lingual representation learning at scale,
Conneau et al.,
- SimAlign: High quality word alignments without parallel training data
using static and contextualized embeddings,
Sabet et al.,
- Ten tips for overcoming language barriers in science, Amano et
al.,
- Improving low-resource languages in pre-trained multilingual language
models, Hangya et al.,
- Hit-scir at mmnlu-22: Consistency regularization for multilingual
spoken language understanding, Zheng et al.,
- Crosslingual capabilities and knowledge barriers in multilingual large
language models, Chua et al.,
- AutoCAP: Towards automatic cross-lingual alignment planning for
zero-shot chain-of-thought, Zhang et al.,
- Rule-based, neural and LLM back-translation: Comparative insights from
a variant of Ladin, Frontull et al.,
- A survey of multilingual large language models, Qin et al.,
- A smack of all neighbouring languages: How multilingual is scholarly
communication?, Pradier et al.,
- X-WebAgentBench: A Multilingual Interactive Web Benchmark for
Evaluating Global Agentic System, Wang et al.,
8. Related Materials
- AI-powered platform for scientific discovery, Trifonov et al.,
- Hypothesis generation with large language models, Zhou et al.,
- Artificial intelligence and scientific discovery: A model of
prioritized search, Agrawal et al.,
- A comprehensive survey of scientific large language models and their
applications in scientific discovery, Zhang et
al.,
- Artificial intelligence for literature reviews: Opportunities and
challenges, Bolanos et al.,
- Creativity in AI: Progresses and Challenges, Ismayilzada et
al.,
- LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and
Perceptions, Liao et al.,
- Towards scientific discovery with generative ai: Progress,
opportunities, and challenges, Reddy et al.,
- LLM4SR: A Survey on Large Language Models for Scientific
Research, Luo et al.,
- Large language models for automated scholarly paper review: A
survey, Zhuang et al.,
- Large Physics Models: Towards a collaborative approach with Large
Language Models and Foundation Models, Barman et
al.,
- Transforming Science with Large Language Models: A Survey on
AI-assisted Scientific Discovery, Experimentation,
Content Generation, and Evaluation, Eger et al.,
- Unlocking the Potential of AI Researchers in Scientific Discovery:
What Is Missing?, Yu et al.,
- A review of llm-assisted ideation, Li et al.,
- Towards scientific intelligence: A survey of llm-based scientific
agents, Ren et al.,
- Agentichypothesis: A survey on hypothesis generation using llm
systems, Bazgir et al.,
- Agentic ai for scientific discovery: A survey of progress, challenges,
and future directions, Gridach et al.,
- A Survey on Hypothesis Generation for Scientific Discovery in the Era
of Large Language Models, Alkan et al.,
- Advancing the Scientific Method with Large Language Models: From
Hypothesis to Discovery, Zhang et al.,
- Scientific hypothesis generation and validation: Methods, datasets,
and future directions, Kulkarni et al.,
- AI-Driven Automation Can Become the Foundation of Next-Era Science of
Science Research, Chen et al.,
- Towards Agentic AI for Science: Hypothesis Generation, Comprehension,
Quantification, and Validation, Huang et al.,
- Position: The AI Conference Peer Review Crisis Demands Author Feedback
and Reviewer Rewards, Kim et al.,
- From Automation to Autonomy: A Survey on Large Language Models in
Scientific Discovery, Zheng et al.,
- AI Scientists Fail Without Strong Implementation Capability,
Zhu et al.,
9. Resources
9.1 AI for Scientific Comprehension
9.1.1 Textual Scientific Comprehension
- Pubmedqa: A dataset for biomedical research question
answering, Jin et al.,
- Medmcqa: A large-scale multi-subject multi-choice dataset for medical
domain question answering, Pal et al.,
- CoQUAD: a COVID-19 question answering dataset system, facilitating
research, benchmarking, and practice, Raza et
al.,
- Scienceqa: A novel resource for question answering on scholarly
articles, Saikh et al.,
- Clam: Selective clarification for ambiguous questions with generative
language models, Kuhn et al.,
- BioASQ-QA: A manually curated corpus for Biomedical Question
Answering, Krithara et al.,
- The sciqa scientific question answering benchmark for scholarly
knowledge, Auer et al.,
- Theoremqa: A theorem-driven question answering dataset, Chen
et al.,
- Scibench: Evaluating college-level scientific problem-solving
abilities of large language models, Wang et al.,
- What if: Generating code to answer simulation questions in chemistry
texts, Peretz et al.,
- Enabling Language Models to Implicitly Learn Self-Improvement,
Wang et al.,
- Paperqa: Retrieval-augmented generative agent for scientific
research, Lála et al.,
- Sciglm: Training scientific language models with self-reflective
instruction annotation and tuning, Zhang et al.,
- Generating Multiple Choice Questions from Scientific Literature via
Large Language Models, Luo et al.,
- Biomedlm: A 2.7B parameter language model trained on biomedical text, Bolton et al.,
- SciQAG: A Framework for Auto-Generated Science Question Answering
Dataset with Fine-grained Evaluation, Wan et
al.,
- M3CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought, Chen et al.,
- Scifibench: Benchmarking large multimodal models for scientific figure
interpretation, Roberts et al.,
- Sciknoweval: Evaluating multi-level scientific knowledge of large
language models, Feng et al.,
- BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for
Biomedical Science, Lin et al.,
- Scholarchemqa: Unveiling the power of language models in chemical
research question answering, Chen et al.,
- Mmsci: A dataset for graduate-level multi-discipline multimodal
scientific understanding, Li et al.,
- SPIQA: A Dataset for Multimodal Question Answering on Scientific
Papers, Pramanick et al.,
- Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of
Large Vision-Language Models, Li et al.,
- SceMQA: A Scientific College Entrance Level Multimodal Question
Answering Benchmark, Liang et al.,
- Language agents achieve superhuman synthesis of scientific
knowledge, Skarlinski et al.,
- Fine-Tuning Large Language Models for Scientific Text Classification:
A Comparative Study, Rostam et al.,
- Graphusion: a RAG framework for Knowledge Graph Construction with a
global perspective, Yang et al.,
- M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for
Evaluating Foundation Models, Li et al.,
- SciDQA: A Deep Reading Comprehension Dataset over Scientific
Papers, Singh et al.,
- SciAgent: Tool-augmented Language Models for Scientific
Reasoning, Ma et al.,
- SciRIFF: A Resource to Enhance Language Model Instruction-Following
over Scientific Literature, Wadden et al.,
- PaSa: An LLM Agent for Comprehensive Academic Paper Search, He
et al.,
- BioMaze: Benchmarking and Enhancing Large Language Models for
Biological Pathway Reasoning, Zhao et al.,
- AutoPaperBench: An MLLM-Based Framework for Automatic Generation of
Paper Understanding Evaluation Benchmarks, Kim
et al.,
- FRAME: Feedback-Refined Agent Methodology for Enhancing Medical
Research Insights, Yu et al.,
- SciCUEval: A Comprehensive Dataset for Evaluating Scientific Context
Understanding in Large Language Models, Yu et
al.,
- EarthSE: A Benchmark Evaluating Earth Scientific Exploration
Capability for Large Language Models, Xu et al.,
- Scaling Physical Reasoning with the PHYSICS Dataset, Zheng et
al.,
9.1.2 Table & Chart Scientific Comprehension
- ChartQA: A Benchmark for Question Answering about Charts with Visual
and Logical Reasoning, Masry et al.,
- Chartx & chartvlm: A versatile benchmark and foundation model for
complicated chart reasoning, Xia et al.,
- Table Meets LLM: Can Large Language Models Understand Structured Table
Data? A Benchmark and Empirical Study, Sui et
al.,
- NovaChart: A Large-scale Dataset towards Chart Understanding and
Generation of Multimodal Large Language Models,
Hu et al.,
- CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal
LLMs, Wang et al.,
- The Mighty ToRR: A Benchmark for Table Reasoning and
Robustness, Ashury-Tahan et al.,
- Tablebench: A comprehensive and complex benchmark for table question
answering, Wu et al.,
9.2 AI for Academic Survey
- Ms2: Multi-document summarization of medical studies, DeYoung
et al.,
- Generating (factual?) narrative summaries of rcts: Experiments with
neural multi-document summarization, Wallace et
al.,
- Overview of MSLR2022: A shared task on multi-document summarization
for literature reviews, Wang et al.,
- Generating a structured summary of numerous academic papers: Dataset
and method, Liu et al.,
- SciReviewGen: a large-scale dataset for automatic literature review
generation, Kasanishi et al.,
- SurveySum: A Dataset for Summarizing Multiple Scientific Articles into
a Survey Section, Fernandes et al.,
- OAG-Bench: A Human-Curated Benchmark for Academic Graph
Mining, Zhang et al.,
- OARelatedWork: A Large-Scale Dataset of Related Work Sections with
Full-texts from Open Access Sources, Docekal et
al.,
- Autosurvey: Large language models can automatically write
surveys, Wang et al.,
- SurveyX: Academic Survey Automation via Large Language Models,
Liang et al.,
- SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and
Multi-dimensional Evaluation for Automated Survey
Writing, Yan et al.,
- Browsecomp: A simple yet challenging benchmark for browsing
agents, Wei et al.,
- LLM×MapReduce-V2: Entropy-Driven Convolutional
Test-Time Scaling for Generating Long-Form Articles from
Extremely Long Resources, Wang et al.,
- AcademicBrowse: Benchmarking Academic Browse Ability of LLMs,
Zhou et al.,
9.3 AI for Scientific Discovery
Idea Mining
- OAG-Bench: A Human-Curated Benchmark for Academic Graph
Mining, Zhang et al.,
- Can Large Language Models Unlock Novel Scientific Research
Ideas?, Kumar et al.,
- LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea
Generation with Minimal Context, Ruan et al.,
- Large Language Models for Rediscovering Unseen Chemistry Scientific
Hypotheses, Yang et al.,
- Learning to Generate Research Idea with Dynamic Control, Li et
al.,
- Structuring Scientific Innovation: A Framework for Modeling and
Discovering Impactful Knowledge Combinations,
Chen et al.,
- ResearchBench: Benchmarking LLMs in Scientific Discovery via
Inspiration-Based Task Decomposition, Liu et
al.,
- AI Idea Bench 2025: AI research idea generation benchmark, Qiu
et al.,
- Sparks of science: Hypothesis generation using structured paper
data, O'Neill et al.,
- Spark: A System for Scientifically Creative Idea Generation,
Sanyal et al.,
- Improving Research Idea Generation Through Data: An Empirical
Investigation in Social Science, Liu et al.,
- CHIMERA: A Knowledge Base of Idea Recombination in Scientific
Literature, Sternlicht et al.,
- Blade: Benchmarking language model agents for data-driven
science, Gu et al.,
- Empowering AI as Autonomous Researchers: Evaluating LLMs in Generating
Novel Research Ideas through Automated Metrics,
Dasgupta et al.,
- LLMs Tackle Meta-Analysis: Automating Scientific Hypothesis Generation
with Statistical Rigor, Lin et al.,
- A Hierarchical Framework for Measuring Scientific Paper Innovation via
Large Language Models, Tan et al.,
- Hypobench: Towards systematic and principled benchmarking for
hypothesis generation, Liu et al.,
- Evaluating and Enhancing Large Language Models for Novelty Assessment
in Scholarly Publications, Lin et al.,
- Harnessing Large Language Models for Scientific Novelty
Detection, Liu et al.,
- Minif2f: a cross-system benchmark for formal olympiad-level
mathematics, Zheng et al.,
- FactKG: Fact verification via reasoning on knowledge graphs,
Kim et al.,
- Investigating zero- and few-shot generalization in fact
verification, Pan et al.,
- Fimo: A challenge formal dataset for automated theorem
proving, Liu et al.,
- Can Large Language Models Detect Misinformation in Scientific News
Reporting?, Cao et al.,
- Mustard: Mastering uniform synthesis of theorem and proof
data, Huang et al.,
- MAGIC: Multi-Argument Generation with Self-Refinement for Domain
Generalization in Automatic Fact-Checking, Kao
et al.,
- Zero-shot scientific claim verification using LLMs and citation
text, Alvarez et al.,
- Grounding fallacies misrepresenting scientific publications in
evidence, Glockner et al.,
- Augmenting the Veracity and Explanations of Complex Fact Checking via
Iterative Self-Revision with LLMs, Zhang et al.,
- DEFAME: Dynamic Evidence-based FAct-checking with Multimodal
Experts, Braun et al.,
- TheoremExplainAgent: Towards Video-based Multimodal Explanations for
LLM Theorem Understanding, Ku et al.,
- BioDSA-1K: Benchmarking Data Science Agents for Biomedical
Research, Wang et al.,
- Benchmarking compound activity prediction for real-world drug
discovery applications, Tian et al.,
- A bioactivity foundation model using pairwise meta-learning,
Feng et al.,
- BioProBench: Comprehensive Dataset and Benchmark in Biological
Protocol Understanding and Reasoning, Liu et
al.,
- LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with
Physician Validation, Zhang et al.,
- Mlagentbench: Evaluating language agents on machine learning
experimentation, Huang et al.,
- Infiagent-dabench: Evaluating agents on data analysis tasks,
Hu et al.,
- DSBench: How Far Are Data Science Agents to Becoming Data Science
Experts?, Jing et al.,
- Mle-bench: Evaluating machine learning agents on machine learning
engineering, Chan et al.,
- Mlgym: A new framework and benchmark for advancing ai research
agents, Nathani et al.,
- MLRC-Bench: Can Language Agents Solve Machine Learning Research
Challenges?, Zhang et al.,
- Scireplicate-bench: Benchmarking llms in agent-driven algorithmic
reproduction from research papers, Xiang et al.,
- Can AI Agents Design and Implement Drug Discovery Pipelines?,
Smbatyan et al.,
- EXP-Bench: Can AI Conduct AI Research Experiments?, Kon et
al.,
- Scienceboard: Evaluating multimodal autonomous agents in realistic
scientific workflows, Sun et al.,
- AutoReproduce: Automatic AI Experiment Reproduction with Paper
Lineage, Zhao et al.,
- MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning
Research, Chen et al.,
- Autobio: A simulation and benchmark for robotic automation in digital
biology laboratory, Lan et al.,
- ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine
Learning Research Code, Hua et al.,
- Microvqa: A multimodal reasoning benchmark for microscopy-based
scientific research, Burgess et al.,
- Ds-agent: Automated data science by empowering large language models
with case-based reasoning, Guo et al.,
- Discoverybench: Towards data-driven discovery with large language
models, Majumder et al.,
- Blade: Benchmarking language model agents for data-driven
science, Gu et al.,
- Scienceagentbench: Toward rigorous assessment of language agents for
data-driven scientific discovery, Chen et al.,
- DISCOVERYWORLD: A virtual environment for developing and evaluating
automated scientific discovery agents, Jansen et
al.,
- Curie: Toward rigorous and automated scientific experimentation with
ai agents, Kon et al.,
- A vision for auto research with llm agents, Liu et al.,
- Can AI Agents Design and Implement Drug Discovery Pipelines?,
Smbatyan et al.,
- Llm-srbench: A new benchmark for scientific equation discovery with
large language models, Shojaee et al.,
- Towards llm agents for earth observation, Kao et al.,
- Benchmarking AI scientists in omics data-driven biological
research, Luo et al.,
- ResearchBench: Benchmarking LLMs in Scientific Discovery via
Inspiration-Based Task Decomposition, Liu et
al.,
9.4 AI for Academic Writing
9.4.1 Semi-Automatic Academic Writing
Assistance During Manuscript Preparation.
- LLM-Rubric: A Multidimensional, Calibrated Approach to Automated
Evaluation of Natural Language Texts, Hashemi et
al.,
- MoDeST: A dataset for Multi Domain Scientific Title
Generation, Bölücü et al.,
- CiteWorth: Cite-Worthiness Detection for Improved Scientific Document
Understanding, Wright et al.,
- Figgen: Text to scientific figure generation, Rodriguez et
al.,
- Scicapenter: Supporting caption composition for scientific figures
with machine-generated captions and ratings, Hsu
et al.,
- Figuring out Figures: Using Textual References to Caption Scientific
Figures, Cao et al.,
- CiteBART: Learning to Generate Citations for Local Citation
Recommendation, Çelik et al.,
- TikZero: Zero-Shot Text-Guided Graphics Program Synthesis,
Belouadi et al.,
- Futuregen: Llm-rag approach to generate the future work of scientific
article, Azher et al.,
- ScholarCopilot: Training Large Language Models for Academic Writing
with Accurate Citations, Wang et al.,
- XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic
Paper Revision, Chen et al.,
- WikiAtomicEdits: A multilingual corpus of Wikipedia edits for modeling
language and discourse, Faruqui et al.,
- Learning to split and rephrase from Wikipedia edit history,
Botha et al.,
- Diamonds in the rough: Generating fluent sentences from early-stage
drafts for academic writing assistance, Ito et
al.,
- Neural Automated Writing Evaluation with Corrective Feedback,
Wang et al.,
- AAAR-1.0: Assessing AI's Potential to Assist Research, Lou et
al.,
- Paper2Poster: Towards Multimodal Poster Automation from Scientific
Papers, Pang et al.,
- The usage of a transformer based and artificial intelligence driven
multidimensional feedback system in English writing
instruction, Zheng et al.,
9.5 AI for Academic Peer Reviewing
- A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP
Applications, Kang et al.,
- Citetracked: A longitudinal dataset of peer reviews and
citations, Plank et al.,
- COMPARE: a taxonomy and dataset of comparison discussions in peer
reviews, Singh et al.,
- Peer review analyze: A novel benchmark resource for computational
analysis of peer reviews, Ghosal et al.,
- Reviewergpt? an exploratory study on using large language models for
paper reviewing, Liu et al.,
- NLPeer: A Unified Resource for the Computational Study of Peer
Review, Dycke et al.,
- Moprd: A multidisciplinary open peer review dataset, Lin et
al.,
- The Open Review-Based (ORB) dataset: Towards Automatic Assessment of
Scientific Papers and Experiment Proposals in
High-Energy Physics, Szumega et al.,
- Pre: A peer review based large language model evaluator, Chu
et al.,
- Is LLM a reliable reviewer? A comprehensive evaluation of LLM on
automatic paper reviewing tasks, Zhou et al.,
- PolitePEER: does peer review hurt? A dataset to gauge politeness
intensity in the peer reviews, Bharti et al.,
- RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper
Relevance, Couto et al.,
- Peer review as a multi-turn and long-context dialogue with role-based
interactions, Tan et al.,
- MASSW: A new dataset and benchmark tasks for ai-assisted scientific
workflows, Zhang et al.,
- Scientific opinion summarization: Paper meta-review generation
dataset, methods, and evaluation, Zeng et al.,
- Can large language models provide useful feedback on research papers?
A large-scale empirical analysis, Liang et al.,
- An Analysis of Tasks and Datasets in Peer Reviewing,
Staudinger et al.,
- PeerArg: Argumentative Peer Review with LLMs, Sukpanichnant et
al.,
- Enhancing peer review efficiency: A mixed-methods analysis of
artificial intelligence-assisted reviewer selection
across academic disciplines, Farber et al.,
- Automatic Large Language Model Evaluation via Peer Review, Chu
et al.,
- AAAR-1.0: Assessing AI's Potential to Assist Research, Lou et
al.,
- Is your paper being reviewed by an llm? investigating ai text
detectability in peer review, Yu et al.,
- WithdrarXiv: A Large-Scale Dataset for Retraction Study, Rao
et al.,
- OpenReviewer: A Specialized Large Language Model for Generating
Critical Scientific Paper Reviews, Idahl et al.,
- Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM
Reviews, Shin et al.,
- PeerQA: A Scientific Question Answering Dataset from Peer
Reviews, Baumgärtner et al.,
- Revieweval: An evaluation framework for ai-generated reviews,
Kirtani et al.,
- LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer
Reviews, Purkayastha et al.,
- When AI co-scientists fail: SPOT-a benchmark for automated
verification of scientific research, Son et al.,
- Re2: A Consistency-ensured Dataset for Full-stage Peer
Review and Multi-turn Rebuttal Discussions,
Zhang et al.,
- PaperEval: A universal, quantitative, and explainable paper evaluation
method powered by a multi-agent system, Huang et
al.,