Publications

You can also find my articles on my Google Scholar profile.

MedCalc-Bench

Published in NeurIPS 2024 Datasets and Benchmarks Track (Oral), 2024

We propose MedCalc-Bench, a first-of-its-kind dataset for evaluating the medical calculation capability of LLMs. MedCalc-Bench contains an evaluation set of over 1,000 manually reviewed instances from 55 different medical calculation tasks. Each instance consists of a patient note, a question asking for a specific medical value to be computed, a ground truth answer, and a step-by-step explanation showing how the answer is obtained.

Recommended citation: Khandekar, N., Jin, Q., Xiong, G., Dunn, S., Applebaum, S. S., Anwar, Z., Sarfo-Gyamfi, M., Safranek, C. W., Anwar, A. A., Zhang, A., Gilson, A., Singer, M. B., Dave, A., Taylor, A., Zhang, A., Chen, Q., & Lu, Z. (2024). MedCalc-Bench: Evaluating Large Language Models for Medical Calculations. arXiv preprint arXiv:2406.12036. https://arxiv.org/abs/2406.12036
Download Paper

Agentless: Demystifying LLM-based Software Engineering Agents

Under review, 2024

Do we really have to employ complex autonomous software agents? To answer this question, we build Agentless, an agentless approach to automatically solving software development problems.

Recommended citation: Xia, C. S., Deng, Y., Dunn, S., & Zhang, L. (2024). Agentless: Demystifying LLM-based Software Engineering Agents. arXiv preprint arXiv:2407.01489. https://arxiv.org/abs/2407.01489
Download Paper