A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://python.github.io/peps/pep-0770/ below:

PEP 770 – Improving measurability of Python packages with Software Bill-of-Materials

PEP 770 – Improving measurability of Python packages with Software Bill-of-Materials
Author:
Seth Larson <seth at python.org>
Sponsor:
Brett Cannon <brett at python.org>
PEP-Delegate:
Brett Cannon <brett at python.org>
Discussions-To:
Discourse thread
Status:
Accepted
Type:
Standards Track
Topic:
Packaging
Created:
02-Jan-2025
Post-History:
05-Nov-2024, 06-Jan-2025
Resolution:
11-Apr-2025
Table of Contents Abstract

Almost all Python packages today are accurately measurable by software composition analysis (SCA) tools. For projects that are not accurately measurable, there is no existing mechanism to annotate a Python package with composition data to improve measurability.

Software Bill-of-Materials (SBOM) is a technology-and-ecosystem-agnostic method for describing software composition, provenance, heritage, and more. SBOMs are used as inputs for SCA tools, such as scanners for vulnerabilities and licenses, and have been gaining traction in global software regulations and frameworks.

This PEP proposes using SBOM documents included in Python packages as a means to improve automated software measurability for Python packages.

Motivation Measurability and Phantom Dependencies

Python packages are particularly affected by the “phantom dependency” problem, where software components that aren’t written in Python are included in Python packages for many reasons, such as ease of installation and compatibility with standards:

These software components can’t be described using Python package metadata and thus are likely to be missed by software composition analysis (SCA) software which can mean vulnerable software components aren’t reported accurately.

For example, the Python package Pillow includes 16 shared object libraries in the wheel that were bundled by auditwheel as a part of the build. None of those shared object libraries are detected when using common SCA tools like Syft and Grype. If an SBOM document is included annotating all the included shared libraries then SCA tools can identify the included software reliably.

Build Tools, Environment, and Reproducibility

Going beyond the runtime dependencies of a package: SBOMs can also record the tools and environments used to build a package. Recording the exact tools and versions used to build a package is often required to establish build reproducibility. Build reproducibility is a property of software that can be used to detect incorrectly or maliciously modified software components when compared to their upstream sources. Without a recorded list of build tools and versions it can become difficult to impossible for a third-party to verify build reproducibility.

Regulations

SBOMs are required by recent software security regulations, like the Secure Software Development Framework (SSDF) and the Cyber Resilience Act (CRA). Due to their inclusion in these regulations, the demand for SBOM documents of open source projects is expected to be high. One goal is to minimize the demands on open source project maintainers by enabling open source users that need SBOMs to self-serve using existing tooling.

Another goal is to enable contributions from users who need SBOMs to annotate projects they depend on with SBOM information. Today there is no mechanism to propagate the results of those contributions for a Python package so there is no incentive for users to contribute this type of work.

Rationale Using SBOM standards instead of Core Metadata fields

Attempting to add every field offered by SBOM standards into Python package Core Metadata would result in an explosion of new Core Metadata fields, including the need to keep up-to-date as SBOM standards continue to evolve to suit new needs in that space.

Instead, this proposal delegates SBOM-specific metadata to SBOM documents that are included in Python packages into a named directory under dist-info.

This standard also doesn’t aim to replace Core Metadata with SBOMs, instead focusing on the SBOM information being supplemental to Core Metadata. Included SBOMs only contain information about dependencies included in the package archive or information about the top-level software in the package that can’t be encoded into Core Metadata but is relevant for the SBOM use-case (“software identifiers”, “purpose”, “support level”, etc).

Zero-or-more opaque SBOM documents

Rather than requiring at most one included SBOM document per Python package, this PEP proposes that one or more SBOM documents may be included in a Python package. This means that code attempting to annotate a Python package with SBOM data may do so without being concerned about corrupting data already contained within other SBOM documents.

Additionally, this PEP treats SBOM document data opaquely instead relying on final end-users of the SBOM data to process the contained SBOM data. This choice acknowledges that SBOM standards are an active area of development where there is not yet (and may never be) a single definitive SBOM standard and that SBOM standards can continue to evolve independent of Python packaging standards. Already tools that consume SBOM documents support a multitude of SBOM standards to handle this reality.

These decisions mean this PEP is capable of supporting any SBOM standard and does not favor one over the other, instead deferring the decision to producing projects and tools and consuming user tooling.

Adding data to Python packages without new metadata versions

The rollout of a new metadata version and field requires that many different projects and teams need to adopt the metadata version in sequence to avoid widespread breakage. This effect usually means a substantial delay in how quickly users and tools can start using new packaging features.

For example, a single metadata version bump requires updates to PyPI, various pyproject.toml parsing and schema projects, the packaging library, wait for releases, then pip and other installers need to bundle the changes to packaging and release, then build backends can begin emitting the new metadata version, again wait for releases, and only then can projects begin using the new features. Even with this careful approach it’s not guaranteed that tools won’t break on new metadata versions and fields.

To avoid this delay, simplify overall how to include SBOMs, and to give flexibility to build backends and tools, this PEP proposes using a subdirectory under .dist-info to safely add data to a Python package while avoiding the need for new metadata fields and versions. This mechanism allows build backends and tools to begin using the feature described in this PEP immediately after acceptance without the head-of-line blocking on other projects adopting the PEP.

Storing files in the .dist-info or .data directory

There are two top-level directories in binary distributions where files beyond the software itself can be stored: .dist-info and .data. This specification chose to use the .dist-info directory for storing subdirectories and files.

Firstly, the .data directory has no corresponding location in the installed package, compared to .dist-info which does preserve the link between the binary distribution to the installed package in an environment. The .data directory instead has all its contents merged between all installed packages in an environment which can lead to collisions between similarly named files.

Secondly, subdirectories under the .data directory require new definitions to the Python sysconfig module. This means defining additional directories require waiting for a change to Python and using the directory requires waiting for adoption of the new Python version by users. Subdirectories under .dist-info don’t have these requirements, they can be used by any user, build backend, and installer immediately after a new subdirectory name is registered regardless of Python or metadata version.

What are the differences between PEP 770 and PEP 725?

PEP 725 (“Specifying external dependencies in pyproject.toml”) is a different PEP with some similarities to PEP 770, such as attempting to describe non-Python software within Python packaging metadata. This section aims to show how these two PEPs are tracking different information and serving different use-cases:

Specification

The changes necessary to implement this PEP include:

In addition to the above, an informational PEP will be created for tools consuming included SBOM documents and other Python package metadata to generate complete SBOM documents for Python packages.

Reserving the .dist-info/sboms directory

This PEP introduces a new registry of reserved subdirectory names allowed in the .dist-info directory for the distribution archive and installed project s project types. Future additions to this registry will be made through the PEP process. The initial values in this registry are:

See Backwards Compatibility for a complete methodology for avoiding backwards incompatibilities with selecting this directory name.

SBOM files in project formats

A few additions will be made to the existing specifications.

Built distributions (wheels)
The wheel specification will be updated to add the new registry of reserved directory names and to reflect that if the .dist-info/sboms subdirectory is specified that the directory contains SBOM files.
Installed projects
The Recording Installed Projects specification will be updated to reflect that if the .dist-info/sboms subdirectory is specified that the directory contains SBOM files and that any files in this directory MUST be copied from wheels by install tools.
SBOM data interoperability

This PEP treats data contained within SBOM documents as opaque, recognizing that SBOM standards are an active area of development. However, there are some considerations for SBOM data producers that when followed will improve the interoperability and usability of SBOM data made available in Python packages:

PyPI and other indices MAY validate the contents of SBOM documents specified by this PEP, but MUST NOT validate or reject data for unknown SBOM standards, versions, or fields.

Backwards Compatibility Reserved .dist-info/sboms subdirectory

The new reserved .dist-info/sboms subdirectory represents a new reservation that wasn’t previously documented, thus has the potential to break assumptions being made by already existing tools.

To check what .dist-info subdirectory names are in use today a query across all files in package archives on PyPI was executed:

SELECT (
  regexp_extract(archive_path, '.*\.dist-info/([^/]+)/', 1) AS dirname,
  COUNT(DISTINCT project_name) AS projects
)
FROM '*.parquet'
WHERE archive_path LIKE '%.dist-info/%/%'
GROUP BY dirname ORDER BY projects DESC;

Note that this only includes records for files and thus won’t return results for empty directories. Empty directories being pervasively used and somehow load-bearing is unlikely, so is an accepted risk of using this method. This query yielded the following results:

Subdirectory Unique Projects licenses 22,026 license_files 1,828 LICENSES 170 .ipynb_checkpoints 85 license 18 .wex 9 dist 8 include 6 build 5 tmp 4 src 3 calmjs_artifacts 3 .idea 2

Not shown above are around ~50 other subdirectory names that are used in a single project. From these results we can see:

As a result of this query we can see there are already some projects placing directories under .dist-info, so we can’t require that build frontends raise errors for unregistered subdirectories. Instead the recommendation is that build frontends MAY warn the user or raise an error in this scenario.

Security Implications

SBOM documents are only as useful as the information encoded in them. If an SBOM document contains incorrect information then this can result in incorrect downstream analysis by SCA tools. For this reason, it’s important for tools including SBOM data into Python packages to be confident in the information they are recording. SBOMs are capable of recording “known unknowns” in addition to known data. This practice is recommended when not certain about the data being recorded to allow for further analysis by users.

Because SBOM documents can encode information about the original system where a Python package is built (for example, the operating system name and version, less commonly the names of paths). This information has the potential to “leak” through the Python package to installers via SBOMs. If this information is sensitive, then that could represent a security risk.

How to Teach This

Most typical users of Python and Python packages won’t need to know the details of this standard. The details of this standard are most important to either maintainers of Python packages and developers of SCA tools such as SBOM generation tools and vulnerability scanners.

What do Python package maintainers need to know?

Python package metadata can already describe the top-level software included in a package archive, but what if a package archive contains other software components beyond the top-level software? For example, the Python wheel for “Pillow” contains a handful of other software libraries bundled inside, like libjpeg, libpng, libwebp, and so on. This scenario is where this PEP is most useful, for adding metadata about bundled software to a Python package.

Some build tools may be able to automatically annotate bundled dependencies. Typically tools can automatically annotate bundled dependencies when those dependencies come from a “packaging ecosystem” (such as PyPI, Linux distros, Crates.io, NPM, etc).

What do SBOM tool authors need to know?

Developers of SBOM generation tooling will need to know about the existence of this PEP and that Python packages may begin publishing SBOM documents within package archives. This information needs to be included as a part of generating an SBOM document for a particular Python package or Python environment.

A follow-up informational PEP will be authored to describe how to transform Python packaging metadata, including the mechanism described in this PEP, into an SBOM document describing Python packages. Once the informational PEP is complete, tracking issues will be opened specifically linking to the informational PEP to spur the adoption of PEP 770 by SBOM tools.

A benchmark is being created to compare the outputs of different SBOM tools when run with various Python packaging inputs (package archive, installed package, environment, container image) is being created to track the progress of different SBOM generation tools. This benchmark will inform where tools have gaps in support of this PEP and Python packages.

What do users of SBOM documents need to know?

Many users of this PEP won’t know of its existence, instead their software composition analysis tools, SBOM tools, or vulnerability scanners will simply begin giving more comprehensive information after an upgrade. For users that are interested in the sources of this new information, the “tool” field of SBOM metadata already provides linkages to the projects generating their SBOMs.

For users who need SBOM documents describing their open source dependencies the first step should always be “create them yourself”. Using the benchmarks above a list of tools that are known to be accurate for Python packages can be documented and recommended to users. For projects which require additional manual SBOM annotation: tips for contributing this data and tools for maintaining the data can be recommended.

Note that SBOM documents can vary across different Python package archives due to variance in dependencies, Python version, platform, architecture, etc. For this reason users SHOULD only use the SBOM documents contained within the actual downloaded and installed Python package archive and not assume that the SBOM documents are the same for all archives in a given package release.

Reference Implementation

Auditwheel fork which generates CycloneDX SBOM documents to include in wheels describing bundled shared library files. These SBOM documents worked as expected for the Syft and Grype SBOM and vulnerability scanners.

Rejected Ideas Requiring a single SBOM standard

There is no universally accepted SBOM standard and this area is still rapidly evolving (for example, SPDX released a new major version of their standard in April 2024). Most discussion and development around SBOMs today focuses on two SBOM standards: CycloneDX and SPDX.

To avoid locking the Python ecosystem into a specific standard ahead of when a clear winner emerges this PEP treats SBOM documents as opaque and only makes recommendations to promote compatibility with downstream consumers of SBOM document data.

None of the decisions in this PEP restrict a future PEP to select a single SBOM standard. Tools that use SBOM data today already need to support multiple formats to handle this situation, so a future standard that updates to require only one standard would have no effect on downstream SBOM tools.

Using metadata fields to specify SBOM files in archives

A previous iteration of this specification used an Sbom-File metadata field to specify an SBOM file within a source or binary distribution archive. This would make the implementation similar to PEP 639 which uses the License-File field to enumerate license files in archives.

The primary issue with this approach is that SBOM files can originate from both static and dynamic sources: like versioned source code, the build backend, or from tools adding SBOM files after the build has completed (like auditwheel).

Metadata fields must either be static or dynamic, not both. This is in direct conflict with the best-case scenario for SBOM data: that SBOM files are added automatically by tools during the build of a Python package without user-involvement or knowledge. Compare this situation to license files which are almost always static.

The 639-style approach was ultimately dropped in favor of defining SBOMs simply by their presence in the .dist-info/sboms directory. This approach allows build backends and tools to add their own SBOM data without the static/dynamic conflict.

A future PEP will define the process for statically defining SBOM files to be added to the .dist-info/sboms directory.

References Acknowledgements

Thanks to Karolina Surma for authoring and leading PEP 639 to acceptance. This PEP’s initial design was heavily inspired by PEP 639 and adopts a similar approach of using a subdirectory under .dist-info to store files.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4