Collection of papers and resources on how to unlock the reasoning ability of Large Language Models.
Also check out the Awesome-Multimodal-Reasoning collection!
Large Language Models have revolutionized the NLP landscape, showing improved performance and sample efficiency over smaller models. However, increasing model size alone has not proved sufficient for high performance on challenging reasoning tasks, such as solving arithmetic or commonsense problems. We present a collection of papers and resources on how to unlock these abilities.
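To make concrete what "prompting for reasoning" looks like in the papers collected below, here is a minimal sketch of few-shot chain-of-thought prompt construction: worked exemplars with step-by-step rationales are prepended to the query. The exemplar text and function name are illustrative, not taken from any specific paper.

```python
# Minimal sketch of few-shot chain-of-thought (CoT) prompt construction.
# The exemplar below is illustrative; real use cases need task-specific ones.
EXEMPLARS = [
    {
        "question": "Roger has 5 tennis balls. He buys 2 cans with 3 balls "
                    "each. How many tennis balls does he have now?",
        "rationale": "Roger started with 5 balls. 2 cans of 3 balls is 6 "
                     "balls. 5 + 6 = 11.",
        "answer": "11",
    },
]

def build_cot_prompt(question: str) -> str:
    """Prepend worked exemplars (question + rationale) to the new query."""
    parts = [
        f"Q: {ex['question']}\nA: {ex['rationale']} The answer is {ex['answer']}."
        for ex in EXEMPLARS
    ]
    parts.append(f"Q: {question}\nA:")  # the model continues from here
    return "\n\n".join(parts)

prompt = build_cot_prompt(
    "If there are 3 cars and each car has 4 wheels, how many wheels are there?"
)
print(prompt)
```

Because the exemplar demonstrates intermediate steps ("5 + 6 = 11") before the final answer, the model is nudged to produce a rationale for the new question rather than answering directly.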
- **Reasoning with Language Model Prompting: A Survey.** *Preprint.* Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Huajun Chen. [Paper] [Code], 2022.12
- **Towards Reasoning in Large Language Models: A Survey.** *Preprint.* Jie Huang, Kevin Chen-Chuan Chang. [Paper], 2022.12
- **Can language models learn from explanations in context?** *EMNLP 2022.* Andrew K. Lampinen, Ishita Dasgupta, Stephanie C. Y. Chan, Kory Matthewson, Michael Henry Tessler, Antonia Creswell, James L. McClelland, Jane X. Wang, Felix Hill. [Paper], 2022.4
- **Emergent Abilities of Large Language Models.** *TMLR 2022.* Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus. [Paper] [Blog], 2022.6
- **Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them.** *Preprint.* Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei. [Paper] [Code], 2022.10
- **Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters.** *Preprint.* Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun. [Paper] [Code], 2022.12
- **On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning.** *Preprint.* Omar Shaikh, Hongxin Zhang, William Held, Michael Bernstein, Diyi Yang. [Paper], 2022.12
- **Dissociating language and thought in large language models: a cognitive perspective.** *Preprint.* Kyle Mahowald, Anna A. Ivanova, Idan A. Blank, Nancy Kanwisher, Joshua B. Tenenbaum, Evelina Fedorenko. [Paper], 2023.1
- **Large Language Models Can Be Easily Distracted by Irrelevant Context.** *Preprint.* Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed Chi, Nathanael Schärli, Denny Zhou. [Paper], 2023.1
- **A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity.** *Preprint.* Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung. [Paper], 2023.2
- **Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.** *Preprint.* Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman. [Paper] [Code], 2023.5
- **Faith and Fate: Limits of Transformers on Compositionality.** *Preprint.* Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi. [Paper], 2023.5
- **Chain of Thought Prompting Elicits Reasoning in Large Language Models.** *NeurIPS 2022.* Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou. [Paper] [Blog], 2022.1
- **Self-consistency improves chain of thought reasoning in language models.** *ICLR 2023.* Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou. [Paper], 2022.3
- **Iteratively Prompt Pre-trained Language Models for Chain of Thought.** *EMNLP 2022.* Boshi Wang, Xiang Deng, Huan Sun. [Paper], 2022.3
- **Least-to-most prompting enables complex reasoning in large language models.** *ICLR 2023.* Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi. [Paper], 2022.5
- **Large Language Models are Zero-Shot Reasoners.** *NeurIPS 2022.* Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa. [Paper], 2022.5
- **On the Advance of Making Language Models Better Reasoners.** *Preprint.* Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, Weizhu Chen. [Paper], 2022.6
- **Large Language Models Still Can't Plan.** *NeurIPS 2022.* Karthik Valmeekam, Alberto Olmo, Sarath Sreedharan, Subbarao Kambhampati. [Paper] [Code], 2022.6
- **Solving Quantitative Reasoning Problems with Language Models.** *NeurIPS 2022.* Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra. [Paper] [Blog], 2022.6
- **Rationale-Augmented Ensembles in Language Models.** *Preprint.* Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Denny Zhou. [Paper], 2022.7
- **Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning.** *ICLR 2023.* Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, Ashwin Kalyan. [Project] [Paper] [Code], 2022.9
- **Ask Me Anything: A simple strategy for prompting language models.** *Preprint.* Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré. [Paper] [Code], 2022.10
- **Language Models are Multilingual Chain-of-Thought Reasoners.** *ICLR 2023.* Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei. [Paper], 2022.10
- **Measuring and Narrowing the Compositionality Gap in Language Models.** *Preprint.* Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis. [Paper], 2022.10
- **Automatic Chain of Thought Prompting in Large Language Models.** *Preprint.* Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola. [Paper] [Code], 2022.10
- **ReAct: Synergizing Reasoning and Acting in Language Models.** *NeurIPS 2022 Workshop FMDM.* Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao. [Project] [Paper] [Code] [Blog], 2022.10
- **Reflection of Thought: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems.** *Preprint.* Fan Zhou, Haoyu Dong, Qian Liu, Zhoujun Cheng, Shi Han, Dongmei Zhang. [Paper], 2022.10
- **Mind's Eye: Grounded language model reasoning through simulation.** *ICLR 2023.* Ruibo Liu, Jason Wei, Shixiang Shane Gu, Te-Yen Wu, Soroush Vosoughi, Claire Cui, Denny Zhou, Andrew M. Dai. [Paper], 2022.10
- **Language Models of Code are Few-Shot Commonsense Learners.** *EMNLP 2022.* Aman Madaan, Shuyan Zhou, Uri Alon, Yiming Yang, Graham Neubig. [Paper] [Code], 2022.10
- **Large Language Models Can Self-Improve.** *Preprint.* Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han. [Paper], 2022.10
- **Retrieval Augmentation for Commonsense Reasoning: A Unified Approach.** *EMNLP 2022.* Wenhao Yu, Chenguang Zhu, Zhihan Zhang, Shuohang Wang, Zhuosheng Zhang, Yuwei Fang, Meng Jiang. [Paper] [Code], 2022.10
- **PAL: Program-aided Language Models.** *Preprint.* Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig. [Project] [Paper] [Code], 2022.11
- **Unsupervised Explanation Generation via Correct Instantiations.** *AAAI 2023.* Sijie Cheng, Zhiyong Wu, Jiangjie Chen, Zhixing Li, Yang Liu, Lingpeng Kong. [Paper], 2022.11
- **Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks.** *Preprint.* Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen. [Paper] [Code], 2022.11
- **Complementary Explanations for Effective In-Context Learning.** *Preprint.* Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, Ves Stoyanov, Greg Durrett, Ramakanth Pasunuru. [Paper], 2022.11
- **MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation.** *Preprint.* Swarnadeep Saha, Xinyan Velocity Yu, Mohit Bansal, Ramakanth Pasunuru, Asli Celikyilmaz. [Paper], 2022.12
- **Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model.** *Preprint.* Parishad BehnamGhader, Santiago Miret, Siva Reddy. [Paper] [Code], 2022.12
- **Large Language Models are reasoners with Self-Verification.** *Preprint.* Yixuan Weng, Minjun Zhu, Shizhu He, Kang Liu, Jun Zhao. [Paper] [Code], 2022.12
- **Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions.** *Preprint.* Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal. [Paper] [Code], 2022.12
- **Language Models as Inductive Reasoners.** *Preprint.* Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, Furu Wei. [Paper], 2022.12
- **LAMBADA: Backward Chaining for Automated Reasoning in Natural Language.** *Preprint.* Seyed Mehran Kazemi, Najoung Kim, Deepti Bhatia, Xin Xu, Deepak Ramachandran. [Paper], 2022.12
- **Rethinking with Retrieval: Faithful Large Language Model Inference.** *Preprint.* Hangfeng He, Hongming Zhang, Dan Roth. [Paper], 2023.1
- **Specializing Smaller Language Models towards Multi-Step Reasoning.** *Preprint.* Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot. [Paper], 2023.1
- **Faithful Chain-of-Thought Reasoning.** *Preprint.* Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki, Chris Callison-Burch. [Paper], 2023.1
- **Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning.** *Preprint.* Yunhu Ye, Binyuan Hui, Min Yang, Binhua Li, Fei Huang, Yongbin Li. [Paper], 2023.1
- **Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models.** *Preprint.* Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen. [Paper], 2023.2
- **Multimodal Chain-of-Thought Reasoning in Language Models.** *Preprint.* Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola. [Paper] [Code], 2023.2
- **Active Prompting with Chain-of-Thought for Large Language Models.** *Preprint.* Shizhe Diao, Pengcheng Wang, Yong Lin, Tong Zhang. [Paper] [Code], 2023.2
- **Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data.** *Preprint.* KaShun Shum, Shizhe Diao, Tong Zhang. [Paper] [Code], 2023.2
- **Language Is Not All You Need: Aligning Perception with Language Models.** *Preprint.* Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei. [Paper] [Code], 2023.2
- **ART: Automatic multi-step reasoning and tool-use for large language models.** *Preprint.* Bhargavi Paranjape, Scott Lundberg, Sameer Singh, Hannaneh Hajishirzi, Luke Zettlemoyer, Marco Tulio Ribeiro. [Paper], 2023.3
- **REFINER: Reasoning Feedback on Intermediate Representations.** *Preprint.* Debjit Paul, Mete Ismayilzada, Maxime Peyrard, Beatriz Borges, Antoine Bosselut, Robert West, Boi Faltings. [Project] [Paper] [Code], 2023.4
- **Tree of Thoughts: Deliberate Problem Solving with Large Language Models.** *Preprint.* Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan. [Paper] [Code], 2023.5
- **Reasoning Implicit Sentiment with Chain-of-Thought Prompting.** *ACL 2023.* Hao Fei, Bobo Li, Qian Liu, Lidong Bing, Fei Li, Tat-Seng Chua. [Paper] [Code], 2023.5
- **LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond.** *Preprint.* Philippe Laban, Wojciech Kryściński, Divyansh Agarwal, Alexander R. Fabbri, Caiming Xiong, Shafiq Joty, Chien-Sheng Wu. [Paper], 2023.5
- **Reasoning with Language Model is Planning with World Model.** *Preprint.* Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, Zhiting Hu. [Paper], 2023.5
- **Scaling Instruction-Finetuned Language Models.** *Preprint.* Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei. [Paper], 2022.10
- **Distilling Multi-Step Reasoning Capabilities of Large Language Models into Smaller Models via Semantic Decompositions.** *Preprint.* Kumar Shridhar, Alessandro Stolfo, Mrinmaya Sachan. [Paper], 2022.12
- **Teaching Small Language Models to Reason.** *Preprint.* Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn. [Paper], 2022.12
- **Large Language Models Are Reasoning Teachers.** *Preprint.* Namgyu Ho, Laura Schmid, Se-Young Yun. [Paper] [Code], 2022.12
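Several of the techniques listed above share a simple decoding-side idea. Self-consistency (Wang et al., 2022), for instance, samples multiple reasoning paths and takes a majority vote over their final answers. A minimal, model-free sketch of the voting step; the hard-coded answers stand in for answers parsed from real sampled completions:

```python
from collections import Counter

def self_consistency_vote(answers):
    """Majority vote over final answers extracted from sampled reasoning paths."""
    counts = Counter(a for a in answers if a is not None)  # drop unparseable samples
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Stand-ins for answers parsed from several sampled chain-of-thought completions.
sampled = ["11", "11", "12", "11", None]
print(self_consistency_vote(sampled))  # prints the majority answer: 11
```

The vote marginalizes over the sampled rationales: paths that reason differently but converge on the same answer reinforce each other, while one-off errors are outvoted.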
| Reasoning Ability | Benchmarks |
| --- | --- |
| Arithmetic | GSM8K / SVAMP / ASDiv / AQuA / MAWPS / AddSub / MultiArith / SingleEq / SingleOp / Lila |
| Commonsense | CommonsenseQA / StrategyQA / ARC / BoolQ / HotpotQA / OpenBookQA / PIQA |
| Symbolic | CoinFlip / LastLetterConcatenation / ReverseList |
| Logical | ReClor / LogiQA / ProofWriter |
| Other | BIG-bench / AGIEval / ALERT / CONDAQA / SCAN / WikiWhy |
Note: Although there is no official version for the Symbolic Reasoning benchmarks, you can generate your own here!
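Because the symbolic-reasoning tasks in the table are procedurally defined, generating your own instances is straightforward. A sketch of two such generators; the question phrasing here is illustrative and may differ from the wording used in the original chain-of-thought evaluations:

```python
import random

def last_letter_concat(words):
    """Last Letter Concatenation: take the last letter of each word and join them."""
    question = (f"Take the last letters of the words in \"{' '.join(words)}\" "
                "and concatenate them.")
    answer = "".join(w[-1] for w in words)
    return question, answer

def coin_flip(n_people, rng):
    """Coin Flip: a coin starts heads up; each person either flips it or not."""
    heads = True
    steps = []
    for i in range(n_people):
        flips = rng.random() < 0.5
        steps.append(f"Person {i + 1} {'flips' if flips else 'does not flip'} the coin.")
        heads ^= flips  # each flip toggles the coin's state
    question = "A coin is heads up. " + " ".join(steps) + " Is the coin still heads up?"
    return question, "yes" if heads else "no"

rng = random.Random(0)  # seed for reproducible instances
print(last_letter_concat(["Elon", "Musk"]))
print(coin_flip(3, rng))
```

Each generator returns a `(question, answer)` pair, so arbitrarily many labeled instances can be produced for evaluation.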
- **Chain-of-Thought Hub**: Benchmarks LLM reasoning performance with chain-of-thought prompting.
- **ThoughtSource**: Central, open resource for data and tools related to chain-of-thought reasoning in large language models.
- **CoTEVer**: Chain-of-thought prompting annotation toolkit for explanation verification.
- **AgentChain**: Chains LLMs together for reasoning and orchestrates multiple large models to accomplish complex tasks.
- **Cascades**: Python library enabling complex compositions of language models, such as scratchpads, chain of thought, tool use, selection-inference, and more.
- **LogiTorch**: PyTorch-based library for logical reasoning on natural language.
- **Promptify**: Solves NLP problems with LLMs and easily generates prompts for different NLP tasks with popular generative models such as GPT and PaLM.
- **MiniChain**: Tiny library for large language models.
- **LlamaIndex**: Provides a central interface to connect your LLMs with external data.
- **EasyInstruct**: Easy-to-use package for instructing large language models (LLMs) such as GPT-3 in research experiments.
- **Awesome-Multimodal-Reasoning**: Collection of papers and resources on multimodal reasoning, including vision-language models, multimodal chain of thought, visual inference, and others.
- **Chain-of-ThoughtsPapers**: A trend that started from "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models".
- **LM-reasoning**: Collection of papers and resources on reasoning in large language models.
- **Prompt4ReasoningPapers**: Repository for the paper "Reasoning with Language Model Prompting: A Survey".
- **ReasoningNLP**: Paper list on reasoning in NLP.
- **Instruction-Tuning-Papers**: Reading list on instruction tuning.
- **Deep-Reasoning-Papers**: Recent papers on neural-symbolic reasoning, logical reasoning, visual reasoning, planning, and other topics connecting deep learning and reasoning.
- **Awesome-LLM**: Curated list of large language models.
- Add a new paper or update an existing one, considering which category the work belongs to.
- Use the same format as existing entries to describe the work.
- Add the abstract link of the paper (the `/abs/` format if it is an arXiv publication).
Don't worry if you do something wrong, it will be fixed for you!