Pull Request Coverage: How to Measure What Really Matters in a Pull Request

Lior Froimovich
Dec 3, 2025

Summary

Pull Request Coverage offers a clearer and more accurate way to measure the quality of code being introduced in a PR. Unlike global project coverage, which can hide newly introduced untested logic, PR Coverage focuses directly on the code that changed and how thoroughly it’s tested.

This article introduces two complementary methods for evaluating PR coverage:

  • Weighted PR Coverage: evaluates coverage across all touched files, weighted by the number of statements in each file.
  • Function-Level Coverage: measures coverage only for the functions whose logic actually changed in the PR.

Together, these approaches give teams a practical framework for understanding the real test quality of incremental code in modern CI/CD pipelines.

Introduction

Engineering teams often rely on overall project coverage as a proxy for test quality, but this metric breaks down when evaluating new code in a pull request.

A repository might show 90% overall coverage, yet a developer could introduce hundreds of new, untested lines, and the metric would barely move. This creates a clear blind spot when reviewing incremental changes.
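
To put numbers on that blind spot, here is a minimal sketch. The repository size (100,000 statements) and the PR size (300 untested statements) are illustrative assumptions, not figures from a real project.

```typescript
// Illustrative numbers only: a large repository at 90% statement coverage.
const repoStatements = 100000;
const repoCovered = 90000;

// Hypothetical PR: 300 new statements, none of them tested.
const prStatements = 300;
const prCovered = 0;

// Coverage as a percentage; an empty set is treated as fully covered.
function coveragePercent(covered: number, total: number): number {
  return total === 0 ? 100 : (covered * 100) / total;
}

const globalBefore = coveragePercent(repoCovered, repoStatements);
const globalAfter = coveragePercent(
  repoCovered + prCovered,
  repoStatements + prStatements,
);
const prOnly = coveragePercent(prCovered, prStatements);

console.log(`Global before: ${globalBefore.toFixed(2)}%`); // 90.00%
console.log(`Global after:  ${globalAfter.toFixed(2)}%`); // 89.73%
console.log(`PR only:       ${prOnly.toFixed(2)}%`); // 0.00%
```

Under these assumptions, global coverage drops by less than 0.3 percentage points while the PR itself is 0% covered.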

This blog post introduces a practical method to measure the real quality of code added or modified in a PR. It begins with file-level weighted PR coverage, and then extends the approach with function-level diff coverage to provide a more precise view of how thoroughly the changed logic is tested.

Pull Request Weighted Coverage Definition

1. File-Level PR Coverage

Simple averages treat all files equally, but they shouldn’t.

A file with 1 uncovered statement should not weigh the same as a file with 500 well-tested statements. Weighted PR coverage addresses this by giving larger, more logic-heavy files a proportionally higher impact on the final metric.

Weighted PR Coverage Formula

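\[
\text{Weighted PR Coverage} = \frac{\sum_{i=1}^{n} C_i \cdot S_i}{\sum_{i=1}^{n} S_i}
\]

where C_i is the statement coverage of touched file i, S_i is the number of statements in that file, and n is the number of files touched by the PR.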

Example

A PR touches three files:

File      Coverage   Statements
user.ts   100%       10
auth.ts   50%        20
api.ts    75%        30

Weighted coverage reflects the true testing effort because files with more logic and more potential risk contribute more to the result.

\[
\frac{100 \cdot 10 + 50 \cdot 20 + 75 \cdot 30}{10 + 20 + 30} = \frac{4250}{60} \approx 70.8\%
\]
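
The three-file example can be reproduced with a short sketch. The type and function names are hypothetical, chosen for illustration; the simple average is included only to show how the two numbers differ.

```typescript
interface FileCoverage {
  file: string;
  coveragePct: number; // statement coverage of the file, 0-100
  statements: number; // number of statements in the file
}

// Weighted PR coverage: sum(coverage_i * statements_i) / sum(statements_i).
function weightedPrCoverage(files: FileCoverage[]): number {
  const totalStatements = files.reduce((sum, f) => sum + f.statements, 0);
  if (totalStatements === 0) return 100; // assumption: nothing to cover counts as covered
  const weightedSum = files.reduce(
    (sum, f) => sum + f.coveragePct * f.statements,
    0,
  );
  return weightedSum / totalStatements;
}

// Unweighted mean of per-file coverage, shown only for comparison.
function simpleAverage(files: FileCoverage[]): number {
  if (files.length === 0) return 100;
  return files.reduce((sum, f) => sum + f.coveragePct, 0) / files.length;
}

const touchedFiles: FileCoverage[] = [
  { file: "user.ts", coveragePct: 100, statements: 10 },
  { file: "auth.ts", coveragePct: 50, statements: 20 },
  { file: "api.ts", coveragePct: 75, statements: 30 },
];

console.log(weightedPrCoverage(touchedFiles).toFixed(2)); // 70.83
console.log(simpleAverage(touchedFiles).toFixed(2)); // 75.00
```

The simple average reports 75%, while the weighted figure is about 70.8% because the larger, less-covered files pull the result down.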

Why It Works

  • Prioritizes testing where it matters
  • Makes coverage drops difficult to hide
  • Helps CI catch meaningful quality regressions
  • Aligns with real-world software risk

2. Function-Level PR Coverage

Why Go Deeper?

Sometimes a file is marked as “changed” because of formatting, imports, or comments, even though the underlying logic wasn’t touched. A large file may also contain many functions, of which only one was modified. In these cases, file-level granularity can dramatically overstate how much code truly changed.

To address this, we calculate coverage only for the functions that were added or changed in the PR.

Coverage of Changed/New Functions Only

  • Detect new or modified functions
  • Count statements within those functions
  • Measure covered vs. total statements
  • Compute coverage as a percentage
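
The first step, detecting new or modified functions, can be approximated by intersecting each function's line range with the lines the diff touched. This is a minimal sketch, assuming the function ranges come from a parser and the changed line numbers from the diff; both inputs and all names are hypothetical. A purely line-based check like this would still flag a function in which only a comment or formatting changed, so filtering those out would need an extra comparison of the parsed code, which is not shown here.

```typescript
interface FunctionRange {
  name: string;
  startLine: number; // inclusive
  endLine: number; // inclusive
}

// Returns the functions whose line range contains at least one changed line.
function findChangedFunctions(
  functions: FunctionRange[],
  changedLines: number[],
): FunctionRange[] {
  return functions.filter((fn) =>
    changedLines.some((line) => line >= fn.startLine && line <= fn.endLine),
  );
}

// Hypothetical layout of a single file.
const functionsInFile: FunctionRange[] = [
  { name: "addUser", startLine: 10, endLine: 20 },
  { name: "deleteUser", startLine: 22, endLine: 35 },
  { name: "listUsers", startLine: 37, endLine: 50 },
];

// Lines 1-2 sit outside any function (e.g. imports); 12 and 30 fall inside functions.
const changed = findChangedFunctions(functionsInFile, [1, 2, 12, 30]);
console.log(changed.map((fn) => fn.name).join(", ")); // addUser, deleteUser
```

Import-only edits on lines 1 and 2 match no function, so they contribute nothing to the function-level metric.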

Example

A PR adds two functions:

  • addUser() – 5 statements
  • deleteUser() – 7 statements

Tests cover 9 of 12 statements → 75% function-level coverage
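
The same arithmetic as a sketch. The article gives only the totals (9 of 12 statements covered), so the per-function split below is an assumption made for illustration, and the names are hypothetical.

```typescript
interface FunctionCoverage {
  name: string;
  statements: number;
  coveredStatements: number;
}

// Coverage over changed or new functions only: covered / total statements.
function functionLevelCoverage(changed: FunctionCoverage[]): number {
  const total = changed.reduce((sum, fn) => sum + fn.statements, 0);
  if (total === 0) return 100; // assumption: no changed logic counts as covered
  const covered = changed.reduce((sum, fn) => sum + fn.coveredStatements, 0);
  return (covered * 100) / total;
}

const changedFunctions: FunctionCoverage[] = [
  // Assumed split: the source states only that 9 of 12 statements are covered.
  { name: "addUser", statements: 5, coveredStatements: 5 },
  { name: "deleteUser", statements: 7, coveredStatements: 4 },
];

console.log(functionLevelCoverage(changedFunctions)); // 75
```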

This provides a far more precise measurement of how thoroughly the actual logical changes are tested.

The Tradeoff Between the Two Approaches

Weighted PR Coverage
What it measures: coverage of all touched files, weighted by size and number of statements.
Strengths:
  • Improves quality of the entire area touched
  • Gradually increases project-wide coverage
  • Helps surface legacy debt
Limitations:
  • Can be noisy when small edits hit large low-coverage files
  • May trigger tests unrelated to the changed logic

Function-Level Coverage
What it measures: coverage of only the functions whose logic changed.
Strengths:
  • Focused, precise, and fair
  • High accountability for new work
Limitations:
  • Does not raise coverage of neighboring functions
  • Low-coverage areas remain low until later changes

In short:

  • File-level coverage improves the surrounding area (“the neighborhood”) around the change.
  • Function-level coverage evaluates the intent and correctness of the change itself.
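
The tradeoff shows up clearly when a small, well-tested edit lands in a large, poorly covered file. The sketch below computes both metrics for one such PR; the file, the function, and all numbers are hypothetical.

```typescript
interface ChangedFunction {
  name: string;
  statements: number;
  coveredStatements: number;
}

interface TouchedFile {
  file: string;
  statements: number;
  coveredStatements: number;
  changedFunctions: ChangedFunction[];
}

function toPct(covered: number, total: number): number {
  return total === 0 ? 100 : (covered * 100) / total;
}

// Summing covered and total statements across files is equivalent to
// weighting each file's coverage by its statement count.
function fileLevelCoverage(files: TouchedFile[]): number {
  const total = files.reduce((sum, f) => sum + f.statements, 0);
  const covered = files.reduce((sum, f) => sum + f.coveredStatements, 0);
  return toPct(covered, total);
}

// Considers only the statements inside changed or new functions.
function changedFunctionCoverage(files: TouchedFile[]): number {
  let total = 0;
  let covered = 0;
  for (const f of files) {
    for (const fn of f.changedFunctions) {
      total += fn.statements;
      covered += fn.coveredStatements;
    }
  }
  return toPct(covered, total);
}

const pr: TouchedFile[] = [
  {
    file: "legacy.ts", // hypothetical large file with 20% coverage
    statements: 500,
    coveredStatements: 100,
    changedFunctions: [
      { name: "formatName", statements: 10, coveredStatements: 10 },
    ],
  },
];

console.log(fileLevelCoverage(pr)); // 20
console.log(changedFunctionCoverage(pr)); // 100
```

The file-level number reflects the state of the neighborhood (20%), while the function-level number reflects the change itself (100%).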

How We Use Pull Request Coverage at Early

At Early, PR Coverage isn’t just a concept; it is built directly into how our agents work.

When an agent generates tests for a pull request, it also evaluates the quality of those tests using the same metrics described in this article. We surface both file-level and function-level coverage for each PR, so developers can immediately see:

  • how well the changed code is tested
  • where coverage increased
  • and where gaps still exist

This gives teams a clear, visual understanding of the quality impact of every PR the agent touches, making test generation measurable, trustworthy, and grounded in the actual logic that changed.

Closing Thoughts

  • Weighted PR Coverage provides a balanced view across all touched files.
  • Function-Level PR Coverage zooms into the exact logic that changed.
  • Combined, they measure both the breadth and depth of test quality in a pull request.
  • Teams can rely on them, rather than on misleading global coverage, to enforce real quality in CI/CD pipelines.
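
One way to enforce both metrics in a pipeline is a gate that compares them against minimum thresholds. This is a sketch, not a description of how any particular CI system or Early's agents implement it; the thresholds and all names are hypothetical.

```typescript
interface PrCoverageReport {
  weightedPct: number;
  functionLevelPct: number;
}

interface GateResult {
  passed: boolean;
  failures: string[];
}

// Compares both PR metrics against minimums and collects every violation.
function evaluateGate(
  report: PrCoverageReport,
  minimums: PrCoverageReport,
): GateResult {
  const failures: string[] = [];
  if (report.weightedPct < minimums.weightedPct) {
    failures.push(
      `Weighted PR coverage ${report.weightedPct}% is below ${minimums.weightedPct}%`,
    );
  }
  if (report.functionLevelPct < minimums.functionLevelPct) {
    failures.push(
      `Function-level PR coverage ${report.functionLevelPct}% is below ${minimums.functionLevelPct}%`,
    );
  }
  return { passed: failures.length === 0, failures };
}

const gateResult = evaluateGate(
  { weightedPct: 70.83, functionLevelPct: 75 },
  { weightedPct: 70, functionLevelPct: 80 }, // hypothetical thresholds
);

console.log(
  gateResult.passed ? "PR coverage gate passed" : gateResult.failures.join("\n"),
);
// Function-level PR coverage 75% is below 80%
```

A CI script would typically turn a failed result into a non-zero exit code, or a warning comment on the PR if the team prefers flagging over blocking.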

FAQ

  • What is Pull Request Coverage?
    Pull Request Coverage is a metric that measures how well the code introduced or modified in a pull request is tested, focusing only on the changed logic rather than the entire project.
  • Why is global code coverage misleading for PRs?
    Global coverage reflects historical test quality and can hide newly introduced untested code. A PR can add hundreds of untested lines while barely changing the overall metric.
  • What is Weighted PR Coverage?
    Weighted PR Coverage calculates coverage across all touched files, weighted by the number of executable statements in each file, giving more influence to logic-heavy files.
  • What is Function-Level PR Coverage?
    Function-Level Coverage measures only the coverage of functions whose logic changed in the PR, providing a precise evaluation of new or modified behavior.
  • Which approach should engineering teams use?
    Both. Weighted PR Coverage improves the surrounding area touched by the PR, while Function-Level Coverage evaluates the quality of the specific logic that changed. Together, they provide a complete picture.
  • How does PR Coverage integrate with CI/CD?
    PR Coverage can run automatically during PR validation in CI/CD pipelines, blocking or flagging changes that introduce insufficiently tested logic.
  • How does PR Coverage relate to AI-generated code?
    As AI tools generate more code, PR Coverage ensures that the logic introduced by both humans and AI is properly tested, making the quality of generated code measurable and trustworthy.
  • Does PR Coverage replace traditional code coverage?
    No. Global coverage still reflects long-term test health, while PR Coverage focuses on the quality of incremental changes. Together they form a complete testing strategy.

These answers summarize how PR Coverage provides a practical and precise way to evaluate test quality in modern development workflows.

For teams looking to evaluate code quality beyond individual pull requests, see how these metrics fit into Early Quality Score (EQS), our unified benchmark for real test quality.

Try Early’s test-generation agents and see how PR Coverage becomes measurable, actionable, and effortless.

👉 Book a demo