Pull Request Coverage: How to Measure What Really Matters in a Pull Request

Lior Froimovich

Dec 3, 2025

Summary

Pull Request Coverage offers a clearer and more accurate way to measure the quality of code being introduced in a PR. Unlike global project coverage, which can hide newly introduced untested logic, PR Coverage focuses directly on the code that changed and how thoroughly it’s tested.

This article introduces two complementary methods for evaluating PR coverage:

Weighted PR Coverage: evaluates coverage across all touched files, weighting each file by its number of statements.
Function-Level Coverage: measures coverage only for the functions whose logic actually changed in the PR.

Together, these approaches give teams a practical framework for understanding the real test quality of incremental code in modern CI/CD pipelines.

Introduction

Engineering teams often rely on overall project coverage as a proxy for test quality, but this metric breaks down when evaluating new code in a pull request.

A repository might show 90% overall coverage, yet a developer could introduce hundreds of new, untested lines, and in a large codebase the metric would barely move. This creates a clear blind spot when reviewing incremental changes.

This blog post introduces a practical method to measure the real quality of code added or modified in a PR. It begins with file-level weighted PR coverage, and then extends the approach with function-level diff coverage to provide a more precise view of how thoroughly the changed logic is tested.

Pull Request Weighted Coverage Definition

1. File-Level PR Coverage

Simple averages treat all files equally, but they shouldn’t.

A file with 1 uncovered statement should not weigh the same as a file with 500 well-tested statements. Weighted PR coverage addresses this by giving larger, more logic-heavy files a proportionally higher impact on the final metric.

Weighted PR Coverage Formula
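
A formulation consistent with the description above, in which each touched file is weighted by its statement count, can be written as follows. The symbols are notation assumed here for illustration: c_i is the coverage of file i, s_i is its number of statements, and n is the number of files touched by the PR.

```latex
\[
\text{Weighted PR Coverage} = \frac{\sum_{i=1}^{n} c_i \, s_i}{\sum_{i=1}^{n} s_i}
\]
```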

Example

A PR touches three files:

File	Coverage	Statements
user.ts	100%	10
auth.ts	50%	20
api.ts	75%	30
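
The arithmetic for this table can be sketched in a few lines of TypeScript. The `FileCoverage` shape and the function names are hypothetical and used only for illustration; they are not taken from any specific coverage tool.

```typescript
// Hypothetical shape for one entry of a per-file coverage report.
interface FileCoverage {
  file: string;
  coveragePct: number; // coverage of the file, 0-100
  statements: number;  // number of executable statements in the file
}

// Weighted PR coverage: sum(coverage_i * statements_i) / sum(statements_i).
function weightedPrCoverage(files: FileCoverage[]): number {
  const totalStatements = files.reduce((sum, f) => sum + f.statements, 0);
  if (totalStatements === 0) return 0; // no executable statements touched
  const weighted = files.reduce((sum, f) => sum + f.coveragePct * f.statements, 0);
  return weighted / totalStatements;
}

// Unweighted mean of the per-file percentages, shown only for comparison.
function simpleAverage(files: FileCoverage[]): number {
  if (files.length === 0) return 0;
  return files.reduce((sum, f) => sum + f.coveragePct, 0) / files.length;
}

const touchedFiles: FileCoverage[] = [
  { file: "user.ts", coveragePct: 100, statements: 10 },
  { file: "auth.ts", coveragePct: 50, statements: 20 },
  { file: "api.ts", coveragePct: 75, statements: 30 },
];

console.log(weightedPrCoverage(touchedFiles).toFixed(1)); // "70.8"
console.log(simpleAverage(touchedFiles).toFixed(1));      // "75.0"
```

Here the weighted result (4250 / 60, about 70.8%) is lower than the simple average (75%), because the larger auth.ts and api.ts carry more weight than the small, fully covered user.ts.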

Weighted coverage reflects the true testing effort because files with more logic and more potential risk contribute more to the result.

Why It Works

Prioritizes testing where it matters
Makes coverage drops difficult to hide
Helps CI catch meaningful quality regressions
Aligns with real-world software risk

2. Function-Level PR Coverage

Why Go Deeper?

Sometimes a file is marked as “changed” because of formatting, imports, or comments, even though the underlying logic wasn’t touched. A large file may also contain many functions, of which only one was modified. In both cases, file-level granularity can dramatically overstate how much code truly changed.

To address this, we calculate coverage only for the functions that were added or changed in the PR.

Coverage of Changed/New Functions Only

Detect new or modified functions
Count statements within those functions
Measure covered vs. total statements
Compute coverage as a percentage
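
One possible way to approach the first step is to intersect the changed line numbers from the diff with each function's line range. The sketch below assumes the function ranges have already been extracted by a parser and the changed lines by a diff tool; `FunctionRange` and `findChangedFunctions` are hypothetical names. Note that a pure line-overlap check would still count a comment-only edit inside a function body as a change, so a stricter implementation would likely need to compare syntax trees.

```typescript
// Hypothetical description of a function's location in the new version of a file.
interface FunctionRange {
  name: string;
  startLine: number; // first line of the function, inclusive
  endLine: number;   // last line of the function, inclusive
}

// Returns the functions whose line range contains at least one changed line.
function findChangedFunctions(
  functions: FunctionRange[],
  changedLines: number[],
): FunctionRange[] {
  return functions.filter((fn) =>
    changedLines.some((line) => line >= fn.startLine && line <= fn.endLine),
  );
}

// Illustrative input: three functions, with edits on line 3 (an import) and line 27.
const functionsInFile: FunctionRange[] = [
  { name: "addUser", startLine: 10, endLine: 20 },
  { name: "deleteUser", startLine: 22, endLine: 35 },
  { name: "listUsers", startLine: 37, endLine: 50 },
];

const changedFunctions = findChangedFunctions(functionsInFile, [3, 27]);
console.log(changedFunctions.map((fn) => fn.name)); // [ 'deleteUser' ]
```

The edit on line 3 falls outside every function range, so it does not contribute to the function-level metric.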

Example

A PR adds two functions:

addUser() – 5 statements
deleteUser() – 7 statements

Tests cover 9 of 12 statements → 75% function-level coverage
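
The same calculation as a minimal sketch. The per-function split of covered statements (5 of 5 for addUser, 4 of 7 for deleteUser) is assumed for illustration; the example above states only the total of 9 of 12. The type and function names are hypothetical.

```typescript
// Hypothetical statement counts for functions added or changed in the PR.
interface FunctionCoverage {
  name: string;
  totalStatements: number;
  coveredStatements: number;
}

// Function-level PR coverage: covered / total statements, as a percentage,
// counted only over the new or modified functions.
function functionLevelCoverage(fns: FunctionCoverage[]): number {
  const total = fns.reduce((sum, fn) => sum + fn.totalStatements, 0);
  if (total === 0) return 0; // no statements in changed functions
  const covered = fns.reduce((sum, fn) => sum + fn.coveredStatements, 0);
  return (covered / total) * 100;
}

const newFunctions: FunctionCoverage[] = [
  { name: "addUser", totalStatements: 5, coveredStatements: 5 },    // assumed split
  { name: "deleteUser", totalStatements: 7, coveredStatements: 4 }, // assumed split
];

console.log(functionLevelCoverage(newFunctions)); // 75
```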

This provides a far more precise measurement of how thoroughly the actual logical changes are tested.

The Tradeoff Between the Two Approaches

Approach	What It Measures	Strengths	Limitations
Weighted PR Coverage	Coverage of all touched files, weighted by size and number of statements	Improves quality of the entire area touched; gradually increases project-wide coverage; helps surface legacy debt	Can be noisy when small edits hit large low-coverage files; may trigger tests unrelated to the changed logic
Function-Level Coverage	Coverage of only the functions whose logic changed	Focused, precise, and fair; high accountability for new work	Does not raise coverage of neighboring functions; low-coverage areas remain low until later changes
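
The "noisy" limitation can be made concrete with hypothetical numbers. Suppose a PR adds a fully tested 20-statement file and also makes a comment-only edit in a 980-statement legacy file with 20% coverage. All figures and names below are invented for illustration.

```typescript
// All figures are hypothetical and chosen only to illustrate the tradeoff.
interface TouchedFile {
  path: string;
  statements: number; // executable statements in the file
  covered: number;    // statements executed by tests
}

function coveragePct(covered: number, total: number): number {
  return total === 0 ? 0 : (covered / total) * 100;
}

const prFiles: TouchedFile[] = [
  { path: "newFeature.ts", statements: 20, covered: 20 }, // new, fully tested
  { path: "legacy.ts", statements: 980, covered: 196 },   // comment-only edit, 20% covered
];

// File-level weighted view: every statement in every touched file counts.
// Summing covered and total statements is equivalent to weighting each
// file's coverage by its statement count.
const fileLevelPct = coveragePct(
  prFiles.reduce((sum, f) => sum + f.covered, 0),
  prFiles.reduce((sum, f) => sum + f.statements, 0),
);

// Function-level view: only changed logic counts, here the 20 new statements.
const functionLevelPct = coveragePct(20, 20);

console.log(fileLevelPct.toFixed(1));     // "21.6"
console.log(functionLevelPct.toFixed(1)); // "100.0"
```

The file-level number is dominated by the legacy file and says little about the new work, while the function-level number says nothing about the legacy debt the PR touched.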

In short:

File-level coverage improves the surrounding area (“the neighborhood”) around the change.
Function-level coverage evaluates how thoroughly the change itself is tested.

How We Use Pull Request Coverage at Early

At Early, PR Coverage isn’t just a concept; it is built directly into how our agents work.

When an agent generates tests for a pull request, it also evaluates the quality of those tests using the same metrics described in this article. We surface both file-level and function-level coverage for each PR, so developers can immediately see:

how well the changed code is tested
where coverage increased
and where gaps still exist

This gives teams a clear, visual understanding of the quality impact of every PR the agent touches, making test generation measurable, trustworthy, and grounded in the actual logic that changed.

Closing thoughts

Weighted PR Coverage provides a balanced view across all touched files.
Function-Level PR Coverage zooms into the exact logic that changed.
Combined, they measure both the breadth and depth of test quality in a pull request.
Teams can rely on them, rather than on potentially misleading global coverage, to enforce real quality in CI/CD pipelines.
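
As a rough sketch of what such enforcement might look like in a CI step, the check below compares both metrics against thresholds. The threshold values and the names `CoverageGate` and `evaluatePrCoverage` are assumptions made for illustration and do not describe Early's implementation.

```typescript
// Hypothetical thresholds; real values are a team decision.
interface CoverageGate {
  minFunctionLevelPct: number; // required coverage of changed functions
  minWeightedPct: number;      // required weighted coverage of touched files
}

interface GateResult {
  passed: boolean;
  failures: string[];
}

// Compares both PR coverage metrics against the gate and collects failures.
function evaluatePrCoverage(
  weightedPct: number,
  functionLevelPct: number,
  gate: CoverageGate,
): GateResult {
  const failures: string[] = [];
  if (functionLevelPct < gate.minFunctionLevelPct) {
    failures.push(
      `function-level coverage ${functionLevelPct.toFixed(1)}% is below ${gate.minFunctionLevelPct}%`,
    );
  }
  if (weightedPct < gate.minWeightedPct) {
    failures.push(
      `weighted coverage ${weightedPct.toFixed(1)}% is below ${gate.minWeightedPct}%`,
    );
  }
  return { passed: failures.length === 0, failures };
}

const gateResult = evaluatePrCoverage(70.8, 75, {
  minFunctionLevelPct: 80,
  minWeightedPct: 70,
});

console.log(gateResult.passed);   // false
console.log(gateResult.failures); // one entry: function-level coverage is below 80%
```

In a pipeline, a failed gate would typically end the step with a non-zero exit code or post a warning on the PR, depending on whether the team wants to block or only flag the change.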

FAQ

What is Pull Request Coverage?
Pull Request Coverage is a metric that measures how well the code introduced or modified in a pull request is tested, focusing only on the changed logic rather than the entire project.

Why is global code coverage misleading for PRs?
Global coverage reflects historical test quality and can hide newly introduced untested code. A PR can add hundreds of untested lines without noticeably changing the overall metric.

What is Weighted PR Coverage?
Weighted PR Coverage calculates coverage across all touched files, weighted by the number of executable statements in each file, giving more influence to logic-heavy files.

What is Function-Level PR Coverage?
Function-Level Coverage measures only the coverage of functions whose logic changed in the PR, providing a precise evaluation of new or modified behavior.

Which approach should engineering teams use?
Both. Weighted PR Coverage improves the surrounding area touched by the PR, while Function-Level Coverage evaluates the quality of the specific logic that changed. Together, they provide a complete picture.

How does PR Coverage integrate with CI/CD?
PR Coverage can run automatically during PR validation in CI/CD pipelines, blocking or flagging changes that introduce insufficiently tested logic.

How does PR Coverage relate to AI-generated code?
As AI tools generate more code, PR Coverage helps verify that the logic introduced by both humans and AI is tested, making the quality of generated code measurable.

Does PR Coverage replace traditional code coverage?
No. Global coverage still reflects long-term test health, while PR Coverage focuses on the quality of incremental changes. Together they form a complete testing strategy.

These answers summarize how PR Coverage provides a practical and precise way to evaluate test quality in modern development workflows.

For teams looking to evaluate code quality beyond individual pull requests, see how these metrics fit into Early Quality Score (EQS), our unified benchmark for real test quality.

Try Early’s test-generation agents and see how PR Coverage becomes measurable, actionable, and effortless.
