Research Highlights

AI Passes Wall Street's Toughest Exam

Srikanth Jagabathula

Overview: In the paper "Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III," NYU Stern Professor Srikanth Jagabathula and co-authors Pranam Shetty (Rochester Institute of Technology), Abhisek Upadhayaya (NYU Tandon School of Engineering), Parth Mitesh Shah, Shilpi Nayak, and Anna Joo Fee (GoodFin, Inc.) explore whether today's top artificial intelligence (AI) models are smart enough to work in high-stakes finance.

Why study this now: Financial institutions are increasingly adopting AI for research, education, and client services. As these models become more embedded in companies' operations, it is important to verify that they can meet the strict standards of the industry before entrusting them with significant financial decisions. The industry's gold standard is the Chartered Financial Analyst (CFA) exam. Previous studies have shown that AI models can readily pass CFA Levels I and II but have struggled with the more analytical Level III exam.

What the authors found: Examining both reasoning and non-reasoning AI models with three different prompting strategies, the researchers found that:

  • Advanced AI models, without access to the Internet or practice examples in the prompt, can now score well above the passing grade on the CFA Level III exam, something they could not do just 18 months ago.
  • Prompting strategy matters: using a "chain-of-thought" prompt, which asks the model to reason step by step before answering, resulted in higher scores on the exam (a minimal example of this style appears after this list).
  • While most models performed similarly on multiple-choice questions, scores varied significantly across models on the essay questions, highlighting that nuanced, open-ended analysis remains the frontier for these systems.
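
For readers unfamiliar with the technique, the sketch below contrasts a direct prompt with a chain-of-thought prompt for a CFA-style essay question. It is a minimal, hypothetical illustration of the general prompting style, not the authors' evaluation framework; the question text and prompt wording are invented for this example.

    # Minimal sketch contrasting a direct prompt with a "chain-of-thought"
    # prompt. The question and wording are hypothetical, for illustration only.

    QUESTION = (
        "A client with a 10-year horizon and low liquidity needs holds 80% "
        "bonds / 20% equities. Recommend and justify a change to the "
        "strategic asset allocation. (Essay-style, CFA Level III format.)"
    )

    # Direct prompt: ask the model for an answer immediately.
    direct_prompt = f"{QUESTION}\n\nAnswer:"

    # Chain-of-thought prompt: instruct the model to reason step by step
    # before committing to a final answer.
    cot_prompt = (
        f"{QUESTION}\n\n"
        "Think step by step: first restate the client's constraints, then "
        "evaluate the current allocation against them, then derive a "
        "recommendation. End with 'Final answer:' followed by your response."
    )

    if __name__ == "__main__":
        for name, prompt in [("direct", direct_prompt),
                             ("chain-of-thought", cot_prompt)]:
            print(f"--- {name} ---\n{prompt}\n")

Either string would then be sent to a model; the chain-of-thought version simply gives the model room to work through the analysis before answering, which is the behavior the researchers found improved exam scores.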

What this changes: The gap between human experts and top-tier AI is closing, but a human may still grasp context that an AI model misses. This research shows companies what AI tools can (and cannot) do when it comes to high-stakes financial reasoning, and it highlights the importance of knowing how to interact with these tools effectively.

Key insight: “These results demonstrate significant progress in the financial reasoning capabilities of [large language models] LLMs,” note the researchers. “Our open source evaluation framework provides a foundation for continued research into making LLMs more capable and cost-effective tools for financial professionals, while maintaining the high standards necessary for responsible deployment in this critical domain.”