Lessons learned from using a deep tree-based model for software defect prediction in practice

Conference Paper


Abstract


  • Defects are common in software systems and cause many problems for software users. Different methods have been developed to make early predictions about the most likely defective modules in large codebases. Most focus on designing features (e.g. complexity metrics) that correlate with potentially defective code. Those approaches, however, do not sufficiently capture the syntax and multiple levels of semantics of source code, a potentially important capability for building accurate prediction models. In this paper, we report on our experience of deploying a new deep learning tree-based defect prediction model in practice. This model is built upon the tree-structured Long Short-Term Memory network, which directly matches the Abstract Syntax Tree representation of source code. We discuss a number of lessons learned from developing the model and evaluating it on two datasets, one from open source projects contributed by our industry partner Samsung and the other from the public PROMISE repository.

UOW Authors


  •   Dam, Hoa
  •   Pham, Trang (external author)
  •   Ng, Shien Wee (external author)
  •   Tran, Truyen (external author)
  •   Grundy, John (external author)
  •   Ghose, Aditya
  •   Kim, Taeksu (external author)
  •   Kim, Chul (external author)

Publication Date


  • 2019

Citation


  • Dam, H. Khanh., Pham, T., Ng, S., Tran, T., Grundy, J., Ghose, A., Kim, T. & Kim, C. (2019). Lessons learned from using a deep tree-based model for software defect prediction in practice. 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) (pp. 46-57). United States: IEEE.

Scopus Eid


  • 2-s2.0-85072330101

Start Page


  • 46

End Page


  • 57

Place Of Publication


  • United States
