Do Automatic Comment Generation Techniques Fall Short?
Exploring the Influence of Method Dependencies on Code Understanding
Published in EASE 2025 (Category A conference)
Paper Link: https://arxiv.org/abs/2504.19459
Method-level comments are critical for improving code comprehension and supporting software maintenance. With advancements in large language models (LLMs), automated comment generation has become a major research focus. However, existing approaches often overlook method dependencies, where one method relies on or calls others, affecting comment quality and code understandability. This study investigates the prevalence and impact of dependent methods in software projects and introduces a dependency-aware approach for method-level comment generation. Analyzing a dataset of 10 popular Java GitHub projects, we found that dependent methods account for 69.25% of all methods and exhibit higher engagement and change proneness compared to independent methods. Across 448K dependent and 199K independent methods, we observed that state-of-the-art fine-tuned models (e.g., CodeT5+, CodeBERT) struggle to generate comprehensive comments for dependent methods, a trend also reflected in LLM-based approaches like ASAP. To address this, we propose HelpCOM, a novel dependency-aware technique that incorporates helper method information to improve comment clarity, comprehensiveness, and relevance. Experiments show that HelpCOM outperforms baseline methods by 5.6% to 50.4% across syntactic (e.g., BLEU), semantic (e.g., SentenceBERT), and LLM-based evaluation metrics. A survey of 156 software practitioners further confirms that HelpCOM significantly improves the comprehensibility of code involving dependent methods, highlighting its potential to enhance documentation, maintainability, and developer productivity in large-scale systems.
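To make the idea of dependency-aware comment generation concrete, here is a minimal Python sketch of how helper-method context could be folded into a comment-generation prompt. This is an illustration, not the paper's actual HelpCOM pipeline: the function names (`extract_callees`, `build_dependency_aware_prompt`) and the regex-based callee detection are assumptions made for brevity; a real system would extract the call graph from a Java AST.

```python
import re

def extract_callees(method_source: str, project_methods: dict[str, str]) -> dict[str, str]:
    """Naive callee detection: any identifier followed by '(' that matches a
    known project method name. Real call-graph extraction would use a Java
    parser, not regexes; this is only for illustration."""
    called = set(re.findall(r"\b([A-Za-z_]\w*)\s*\(", method_source))
    return {name: body for name, body in project_methods.items() if name in called}

def build_dependency_aware_prompt(focal_method: str, helpers: dict[str, str]) -> str:
    """Compose a prompt that, unlike a method-only prompt, also shows the
    helper (callee) methods the focal method depends on."""
    helper_context = "\n\n".join(
        f"// Helper method: {name}\n{body}" for name, body in helpers.items()
    )
    return (
        "Write a concise Javadoc comment for the focal method below.\n\n"
        f"Focal method:\n{focal_method}\n\n"
        f"Methods it depends on:\n{helper_context or '(none)'}"
    )

# Hypothetical usage: project_methods maps method names to their source code.
project_methods = {
    "normalize": "double normalize(double v) { return v / max; }",
    "clamp": "double clamp(double v) { return Math.min(1.0, Math.max(0.0, v)); }",
}
focal = "double score(double v) { return clamp(normalize(v)); }"
print(build_dependency_aware_prompt(focal, extract_callees(focal, project_methods)))
```

The intuition the sketch captures is the abstract's core finding: for the 69.25% of methods that depend on others, the focal method body alone often does not contain enough information to describe what the method actually does, so supplying callee context gives the model (fine-tuned or LLM-based) a fairer chance at a comprehensive comment.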
Are Large Language Models a Threat to Programming Platforms? An Exploratory Study
Published in ESEM 2024 (Category A conference)
Paper Link: https://dl.acm.org/doi/10.1145/3674805.3686689
Background: Competitive programming platforms such as LeetCode, Codeforces, and HackerRank provide challenges to evaluate programming skills. Technical recruiters frequently use these platforms as a criterion for screening resumes. With the recent advent of advanced Large Language Models (LLMs) like ChatGPT, Gemini, and Meta AI, there is a need to assess their problem-solving ability on these platforms. Aims: This study aims to assess LLMs’ capability to solve diverse programming challenges across platforms and difficulty levels, providing insights into their performance in real-time and offline scenarios, comparing them to human programmers, and identifying potential threats to established norms in programming platforms. Method: This study utilized 98 problems from LeetCode and 126 from Codeforces, covering 15 categories and varying difficulty levels. We then participated in nine online contests on Codeforces and LeetCode. Finally, two certification tests were attempted on HackerRank to gain insights into LLMs’ real-time performance. Prompts were used to guide the LLMs in solving problems, and iterative feedback mechanisms were employed. We also examined possible correlations among the LLMs’ performance across these scenarios. Results: LLMs generally achieved higher success rates on LeetCode (e.g., ChatGPT at 71.43%) but faced challenges on Codeforces. While they excelled in HackerRank certifications, they struggled in virtual contests, especially on Codeforces. Despite diverse performance trends, ChatGPT performed consistently well across categories, yet all LLMs struggled with harder problems and problems with lower acceptance rates. On LeetCode archive problems, LLMs generally outperformed human users in time efficiency and memory usage, but they showed only moderate performance in live contests, particularly in the harder Codeforces contests, compared to humans. Conclusions: While not necessarily a threat, the performance of LLMs on programming platforms is a cause for concern. With the prospect of more efficient models emerging, programming platforms need to address this issue promptly.
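The Method section mentions prompting with iterative feedback; the sketch below shows one plausible shape of such a harness, assuming solutions are requested in Python and checked against sample tests. It is not the study's exact protocol: `query_llm` is a stand-in for whichever model API is used, and the retry count and timeout are arbitrary choices for illustration.

```python
import subprocess
import tempfile
from pathlib import Path

def query_llm(prompt: str) -> str:
    """Stand-in for a real model call (ChatGPT, Gemini, Meta AI, etc.)."""
    raise NotImplementedError

def run_solution(code: str, stdin: str) -> str:
    """Execute a candidate Python solution on one test input and capture stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        ["python", path], input=stdin, capture_output=True, text=True, timeout=10
    )
    Path(path).unlink()
    return result.stdout.strip()

def solve_with_feedback(statement: str, tests: list[tuple[str, str]],
                        max_rounds: int = 3) -> str | None:
    """Ask the model for a solution, check it on sample tests, and feed the
    first failing case back into the prompt, up to max_rounds attempts."""
    prompt = f"Solve this problem in Python. Read stdin, write stdout.\n\n{statement}"
    for _ in range(max_rounds):
        code = query_llm(prompt)
        failures = [
            (inp, expected, got)
            for inp, expected in tests
            if (got := run_solution(code, inp)) != expected.strip()
        ]
        if not failures:
            return code  # all sample tests pass; submit this attempt
        inp, expected, got = failures[0]
        prompt += (
            f"\n\nYour last solution failed. Input:\n{inp}\n"
            f"Expected:\n{expected}\nGot:\n{got}\nPlease fix the code."
        )
    return None  # no passing solution within the retry budget
```

A loop of this kind also explains the gap the Results report between archive problems and live contests: offline, the model gets several feedback rounds against sample tests, whereas contest settings bound both attempts and time.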