Hi, I’m Soren Dunn. I’m currently a second-year master’s student at the University of Illinois at Urbana-Champaign, supervised by Professor Lingming Zhang. I received my bachelor’s degree from the University of Chicago in 2023, triple majoring in Data Science, Statistics, and Chemistry.
My research goal is to improve both the benchmarks used to evaluate model reasoning and the rigor of the methods used to evaluate models on them, in order to sharpen current efforts to build model systems capable of real-world problem solving. I am particularly focused on conducting reasoning evaluations for language models in the coding domain.
News
- February 26th, 2025: Agentless Lite doubles the state of the art on SWE-bench Multimodal from 12.19% to 25.34% at a quarter of the cost, without even requiring a runtime environment!
- February 14th, 2025: I released Agentless Lite - a generalized, lightweight adaptation of the Agentless scaffold which is competitive with SOTA agents while only requiring sampling from a single RAG prompt.
- January 31st, 2025: Agentless was used by both DeepSeek and OpenAI to evaluate their new reasoning models (r1 and o3-mini) on SWE-bench
- Dec 2nd, 2024: Integrated Agentless with Claude 3.5 Sonnet to achieve a 40.7% solve rate on SWE-bench Lite and a 50.8% solve rate on SWE-bench Verified
- Oct 28th, 2024: Released OpenAutoCoder-Agentless 1.5, which increases Agentless performance from 27.3% to 32.00% on SWE-bench Lite
- September 26, 2024: MedCalc-Bench was accepted as an oral presentation for the NeurIPS 2024 Datasets and Benchmark Track
- September 12, 2024: Agentless was used by OpenAI as their scaffold of choice for evaluating GPT-4o, o1-mini, and o1-preview’s model autonomy as part of their preparedness framework
- July 1st, 2024: Released OpenAutoCoder-Agentless 1.0! Agentless is currently the best open-source approach on SWE-bench Lite, with 82 fixes (27.3%) at an average cost of $0.34 per issue.
Education
- Data Science (B.S.) with Honors, Statistics (B.A.), Chemistry (B.A.), University of Chicago, October 2019 - June 2023
- Masters of Statistics, University of Illinois at Urbana-Champaign, August 2023 - June 2025
Publications
[NeurIPS 2024 Benchmark Track Oral] Nikhil Khandekar*, Qiao Jin*, Guangzhi Xiong*, Soren Dunn, Serina S Applebaum, Zain Anwar, Maame Sarfo-Gyamfi, Conrad W Safranek, Abid Anwar, Andrew Jiaxing Zhang, Aidan Gilson, Maxwell B Singer, Amisha D Dave, R. Andrew Taylor, Aidong Zhang, Qingyu Chen, Zhiyong Lu. MedCalc-Bench: Evaluating Large Language Models for Medical Calculations, June, 2024 [paper] [code]
[Preprint] Chunqiu Steven Xia*, Yinlin Deng*, Soren Dunn, Lingming Zhang. Agentless: Demystifying LLM-based Software Engineering Agents, July, 2024 [paper] [code]
[Journal of the American Chemical Society] Xuanyu Feng, Yang Song, Justin Chen, Ziwan Xu, Soren Dunn, Wenbin Lin. Rational Construction of an Artificial Binuclear Copper Monooxygenase in a Metal–Organic Framework, January, 2021 [paper]