"Measuring and Improving the Efficiency of Python Code Generated by LLM" by Ramya Jonnala and Sai Kiran Chillimuntha

Measuring and Improving the Efficiency of Python Code Generated by LLMs Using CoT Prompting and Fine-Tuning

Document Type

Conference Proceeding

Publication Date

4-2025

Abstract

With the advancement of AI technologies, Large Language Models (LLMs) have improved programming automation. However, LLMs often produce code with unnecessary logic, hallucinated content, and errors caused by ambiguous prompts. This research measures the efficiency of Python code generated by the GPT-4o-Mini, GPT-3.5-Turbo, and GPT-4-Turbo models using metrics such as execution time, memory usage, and maximum memory usage, while maintaining problem-solving correctness. Using the EffiBench dataset on Google’s Vertex AI Workbench with different machine configurations, the study applies the seed parameter for consistency and optimization techniques such as Chain-of-Thought (CoT) prompting and fine-tuning of GPT-4o-Mini. The results show that CoT prompting improves efficiency metrics for GPT-4o-Mini and GPT-3.5-Turbo, but not for GPT-4-Turbo. GPT-4o-Mini was selected for fine-tuning because of its stronger results with CoT prompting and its cost-effectiveness, but fine-tuning compromised both accuracy and efficiency. Overall, high-CPU machine configurations, together with GPT-4o-Mini and CoT prompting, improve the efficiency and correctness of LLM-generated code in resource-intensive scenarios.
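
The abstract reports execution time, memory usage, and maximum memory usage as efficiency metrics. The sketch below shows one way such measurements can be collected in Python using the standard library; it is not the study's actual benchmarking harness, and the `profile_solution` function, the assumed `solve` entry point, and the test-input format are illustrative assumptions only.

```python
import time
import tracemalloc


def profile_solution(solution_code: str, test_input):
    """Run one generated solution and record its execution time and memory usage.

    `solution_code` is a hypothetical string containing a generated Python
    solution that defines a `solve` function; `test_input` stands in for one
    benchmark test case.
    """
    namespace: dict = {}
    tracemalloc.start()
    start = time.perf_counter()

    exec(solution_code, namespace)            # define the generated function
    result = namespace["solve"](test_input)   # assumed entry-point name

    elapsed = time.perf_counter() - start
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "result": result,
        "execution_time_s": elapsed,
        "memory_usage_bytes": current,
        "max_memory_usage_bytes": peak,
    }
```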
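
The study also mentions using the seed parameter for consistency together with Chain-of-Thought prompting. A minimal sketch of how a GPT-4o-Mini request with a fixed seed and a CoT-style instruction can be issued through the OpenAI Python SDK is shown below; the prompt wording, seed value, and `generate_solution` helper are assumptions, not the prompt or code used in the study.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative CoT-style instruction; the study's actual prompt may differ.
COT_INSTRUCTION = (
    "Think step by step about the time and space complexity of your approach, "
    "then write an efficient Python solution."
)


def generate_solution(problem_statement: str) -> str:
    """Request a solution from GPT-4o-Mini with a fixed seed for reproducibility."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        seed=42,          # seed parameter for best-effort deterministic sampling
        temperature=0,
        messages=[
            {"role": "system", "content": COT_INSTRUCTION},
            {"role": "user", "content": problem_statement},
        ],
    )
    return response.choices[0].message.content
```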

Comments

1:00-2:00 p.m.

BLH 262

Studies in Mathematical, Physical & Engineering

Walter Den, Moderator
