
What empirical evidence do we have for or against a correlation between fault density and LOC?

Submitted by: @import:stackexchange-cs · Mar 10, 2026



Problem

LOC = lines of code

KLOC = Thousand lines of code

Fault (or defect) density = number of reported bugs per line of code.

Software artifact = function, class, module
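To make the density definition concrete, here is a minimal Python sketch. It assumes density is expressed per KLOC (a common convention; the definition above says per line), and the function name `fault_density_per_kloc` and the numbers are made up for illustration.

```python
def fault_density_per_kloc(faults: int, loc: int) -> float:
    """Return reported faults per thousand lines of code (KLOC)."""
    if loc <= 0:
        raise ValueError("loc must be positive")
    return faults / (loc / 1000)


# Hypothetical module: 12 reported bugs in 4,000 lines of code.
print(fault_density_per_kloc(12, 4000))
```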

Reading research papers on fault density and fault prediction, I find it hard to get an overview, because there are many studies and they use many different techniques, both statistics and machine learning.

The rationale behind a correlation between fault density and LOC is that a software artifact with more LOC is more complex, which makes it harder for the programmer to change the code; changeability decreases, which leads to more bugs. Note that this is not a question about a correlation between fault density and complexity, even though such a connection may exist too.

I guess what I'm looking for is a study or a book that gives an overview of current research on this topic. :)

Here's one study claiming there is such a correlation: https://www.gwern.net/docs/dual-n-back/1997-hatton.pdf

Here's another one claiming there isn't: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.452.8933&rep=rep1&type=pdf

Edit: Google Scholar has an option to search only for review articles, so I'll dig into those. Will post anything interesting.

Edit 2: One review article: https://media.neliti.com/media/publications/90270-EN-a-systematic-literature-review-of-softwa.pdf

However, even though various defect prediction methods have been proposed, but none has been proven to be consistently accurate (Challagulla et al., 2005) (Lessmann et al., 2008). The accurate and reliable classification algorithm to build a better prediction model is an open issue in software defect prediction. There is a need for an accurate defect prediction framework which has to be more robust to noise and other problems associated with on datasets

And another: https://romisatriawahono.net/lecture/rm/survey/software%20engineering/Software%20Fault%20Defect%20Prediction/Radjenovic%20-%

Solution

Not an easy question to answer, because LOC can measure the size of many different things:

A large function

A large class (or module in FP)

A large library

A large file

A large code-base

In some of these cases (function and class), LOC can be a proxy for complexity [citation needed]. You can also normalize complexity by size.
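One simple way to normalize complexity by size, assumed here for illustration, is to divide a function's McCabe cyclomatic complexity by its LOC. The `FunctionMetrics` type, the function names and the numbers are hypothetical; the point is that the longer function can score higher on raw complexity while the shorter one is denser.

```python
from dataclasses import dataclass


@dataclass
class FunctionMetrics:
    name: str
    loc: int         # lines of code in the function
    cyclomatic: int  # McCabe cyclomatic complexity of the function


def complexity_per_loc(fn: FunctionMetrics) -> float:
    """Cyclomatic complexity divided by LOC (a size-normalized complexity)."""
    return fn.cyclomatic / fn.loc


# Invented example: the longer function has higher raw complexity,
# but the shorter one packs more decision points per line.
funcs = [FunctionMetrics("parse", 200, 40), FunctionMetrics("render", 50, 20)]
for f in funcs:
    print(f.name, f.cyclomatic, complexity_per_loc(f))
```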

The paper "Software fault prediction metrics: A systematic literature review" contains a section on LOC in fault prediction research, which I will quote fully below. The conclusion is in the last sentence:

the overall effectiveness of the LOC metric was estimated as moderate

The comparison to other metrics is the interesting part, however.

(A description of the different metrics is available in Appendix C.)

The first metrics are complexity measures that apply to all languages. The middle group consists of metrics applicable only to object-oriented code. The last group consists of metrics that relate to the development process and to changes in the code.

It's interesting to note the effectiveness of the metrics in the pre-release and post-release scenarios: LOC is noted as being more predictive before the release.

Full section:

4.3.1. RQ2.1: Are size metrics useful for fault prediction?

The simplest, the easiest to extract and the most frequently used metric, i.e. LOC, is still being discussed to this day. There are many studies investigating the relationship between lines of code and number of faults. The simplest studies have ranked the modules according to their size to find out whether a small number of large modules are responsible for a large proportion of faults. E.g. in Zhang [151] three versions of Eclipse were used to investigate pre-release and post-release ranking ability of LOC at the package level. This study showed that 20% of the largest modules were responsible for 51–63% of the defects.
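The ranking analysis described above (what share of faults sits in the largest 20% of modules) can be sketched in a few lines of Python. The function name `top_fraction_fault_share` and the module data are invented for illustration; real studies work from actual defect-tracking data.

```python
def top_fraction_fault_share(modules, fraction=0.2):
    """Share of all faults found in the largest `fraction` of modules.

    `modules` is a list of (loc, faults) pairs; modules are ranked by LOC.
    """
    ranked = sorted(modules, key=lambda m: m[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))  # size of the top slice
    total_faults = sum(faults for _, faults in ranked)
    if total_faults == 0:
        raise ValueError("no faults recorded")
    return sum(faults for _, faults in ranked[:k]) / total_faults


# Invented data: (LOC, reported faults) for ten modules.
modules = [
    (1000, 25), (800, 10), (500, 5), (300, 3), (200, 2),
    (100, 2), (100, 1), (50, 1), (50, 1), (20, 0),
]
share = top_fraction_fault_share(modules)
print(f"Largest 20% of modules hold {share:.0%} of the faults")
```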

Ostrand et al. [130] used the negative binomial regression model on two large industrial systems. In the simple model, using only LOC, the percentage of faults, contained in the 20% of the files that were the largest in terms of the number of lines of code, was on average 73% and 74% for the two systems. In a richer model, where other metrics were used, the top-20% of files ordered by fault count contained, on average, 59% of the lines of code and 83% of the faults. The top-20% of files contained many large files, because the model predicted a large number of faults in large files. In analyzing which files were likely to contain the largest number of faults relative to their size, they used the model’s predicted number of faults and the size of each file to compute a predicted fault density. The top-20% of files contained, on average, only 25% of the lines of code and 62% of the faults. Sorting files by predicted fault density was not as effective as sorting files according to fault count at finding large numbers of faults, but it does result in considerably less code in the end.
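A hypothetical sketch of the comparison described in that paragraph: order modules once by predicted fault count and once by predicted fault density (predicted faults / LOC), then measure how much code and how many actual faults land in the top 20%. The predicted values are invented numbers, not output of the negative binomial model Ostrand et al. used, and `top_fraction_shares` is a made-up helper name.

```python
def top_fraction_shares(modules, key, fraction=0.2):
    """Sort modules by `key` (descending) and return the (LOC share,
    actual-fault share) of the top `fraction` of modules."""
    ranked = sorted(modules, key=key, reverse=True)
    k = max(1, round(fraction * len(ranked)))
    top = ranked[:k]
    total_loc = sum(m["loc"] for m in modules)
    total_faults = sum(m["actual"] for m in modules)
    return (
        sum(m["loc"] for m in top) / total_loc,
        sum(m["actual"] for m in top) / total_faults,
    )


# Invented data: (LOC, predicted faults, actual faults) per module.
rows = [
    (2000, 20, 18), (1500, 15, 16), (200, 6, 5), (100, 4, 4), (1000, 8, 9),
    (500, 3, 3), (300, 2, 2), (400, 2, 1), (600, 3, 2), (400, 1, 0),
]
modules = [{"loc": s, "predicted": p, "actual": a} for s, p, a in rows]

by_count = top_fraction_shares(modules, key=lambda m: m["predicted"])
by_density = top_fraction_shares(modules, key=lambda m: m["predicted"] / m["loc"])
print(f"by predicted count:   {by_count[0]:.0%} of LOC, {by_count[1]:.0%} of faults")
print(f"by predicted density: {by_density[0]:.0%} of LOC, {by_density[1]:.0%} of faults")
```

With this toy data, the density ordering selects far less code but also catches fewer faults, the same direction of trade-off the quoted paragraph reports; the magnitudes are artifacts of the invented numbers.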

Fenton and Ohlsson [84] investigated, among many hypotheses, the Pareto principle [28] and the relationship between size metrics and the number of faults. They used a graphical technique called the Alberg diagram [127] and two versions of a telecommunication software. As independent variables LOC, McCabe’s cyclomatic complexity and SigFF metrics were used. In pre-release 20% of the modules were responsible for nearly 60% of the faults and contained just 30% of the code. A replicated study by Andersson and Runeson [59] found an even larger proportion of faults, in a smaller proportion of the modules. This result is also in agreement with [130,151].
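As we understand the Alberg diagram, modules are sorted in decreasing order of some predictor (here LOC) and the cumulative percentage of faults is plotted against the percentage of modules; the curve obtained by sorting on actual fault count is the best achievable ordering and serves as the reference. A minimal sketch that computes the points for both orderings (plotting omitted; `alberg_curve` and the data are hypothetical):

```python
def alberg_curve(modules, key):
    """Return Alberg-diagram points (% of modules, cumulative % of faults).

    `modules` is a list of (loc, faults) pairs, sorted by `key` in
    decreasing order before accumulating faults.
    """
    ranked = sorted(modules, key=key, reverse=True)
    total = sum(faults for _, faults in ranked)
    points, running = [(0.0, 0.0)], 0
    for i, (_, faults) in enumerate(ranked, start=1):
        running += faults
        points.append((100 * i / len(ranked), 100 * running / total))
    return points


# Invented data: (LOC, faults) for five modules.
modules = [(1000, 10), (800, 2), (400, 6), (200, 1), (100, 1)]
by_size = alberg_curve(modules, key=lambda m: m[0])     # predictor: LOC
by_optimal = alberg_curve(modules, key=lambda m: m[1])  # ideal: actual faults
for (x, y_size), (_, y_opt) in zip(by_size, by_optimal):
    print(f"{x:5.1f}% of modules: {y_size:5.1f}% (by LOC) vs {y_opt:5.1f}% (ideal)")
```

The closer the LOC curve stays to the ideal curve, the better LOC is at ranking the most fault-prone modules.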

Fenton and Ohlsson also tested the hypothesis of whether size metrics (such as LOC) are good predictors of pre-release and post-release faults in a module and whether they are good predictors of a module’s pre-release and post-release fault density. They showed that size metrics (such as LOC) are moderate predictors of the number of pre-release faults in a module, although they do not predict the number of post-release failures in a module, nor can they predict a module’s fault density. Even though the hypothesis was rejected, the authors concluded that LOC is quite good at ranking the most fault-prone modules. Andersson and Runeson, on the other hand, got varying results. The first two projects did not indicate any particularly strong ranking ability for LOC. However, in the third project, 20% of the largest modules were responsible for 57% of all the faults.

Koru et al. [35,104] reported that defect proneness increases with size, but at a slower rate. This makes smaller modules proportionally more problematic compared with larger ones.
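The statement that defect proneness grows with size "at a slower rate" can be made concrete with a sublinear power law. This functional form is an assumption chosen for illustration, not necessarily the model Koru et al. fitted: with expected defects D as a function of size S and an exponent b below 1, density D/S falls as S grows. With b = 1/2, quadrupling a module's size doubles its expected defects but halves its defect density.

```latex
\[
  D(S) = a\,S^{b}, \qquad a > 0,\; 0 < b < 1
\]
\[
  \frac{D(S)}{S} = a\,S^{b-1}, \qquad
  \frac{d}{dS}\!\left(\frac{D(S)}{S}\right) = a\,(b-1)\,S^{b-2} < 0
\]
\[
  b = \tfrac{1}{2}:\quad \frac{D(4S)}{D(S)} = 4^{1/2} = 2, \qquad
  \frac{D(4S)/(4S)}{D(S)/S} = 4^{-1/2} = \tfrac{1}{2}
\]
```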

In [34,59], it is noted that relating size metrics to fault density may be misleading, as there will always be a negative correlation between size metrics and fault density, because of the functional relationship between the variables [47]. However, no studies using fault density as dependent variable were excluded from the review because we wanted to represent the enti
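The functional-relationship argument in that last paragraph can be checked with a small simulation: draw sizes and fault counts independently, so that by construction there is no real link between them, then correlate LOC with fault density (faults / LOC). The data are synthetic and the `pearson` helper is hand-written (Python 3.10+ also offers `statistics.correlation`); this is a sketch of the artifact under that independence assumption, not a reproduction of any cited analysis.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(42)  # fixed seed so the run is reproducible
# Synthetic data: sizes and fault counts drawn INDEPENDENTLY of each other.
loc = [rng.randint(50, 5000) for _ in range(1000)]
faults = [rng.randint(1, 20) for _ in range(1000)]
density = [f / size for f, size in zip(faults, loc)]

print(f"corr(LOC, faults)  = {pearson(loc, faults):+.2f}")   # near zero
print(f"corr(LOC, density) = {pearson(loc, density):+.2f}")  # clearly negative
```

Even though faults here do not depend on size at all, dividing by LOC alone produces a clearly negative correlation between size and density.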

Context

StackExchange Computer Science Q#144527, answer score: 2
