Showing content from https://sethmlarson.dev/visualizing-the-python-package-sbom-data-flow below:

Visualizing the Python package SBOM data flow

This critical role would not be possible without funding from the Alpha-Omega project.

TLDR: Skip intro, take me to the visualization!

I'm working on improving measurability of Python packages by allowing Software Bill-of-Materials documents (SBOM) to be included in Python packages so that projects and build tools can record information about a package for downstream use.

This is a cross-functional project where I need input from Python projects and Python packaging tools (build backends, build tools, and installers), but also from folks completely outside the Python community, like SBOM tooling maintainers. With projects like this, it can be difficult to "see the forest for the trees". When you're reviewing the packaging PEP, it can be difficult to imagine how the new standard will be used, or by whom. This article is meant to help visualize the end-to-end data flow.

How SBOM data will be included in Python packages

In short, the proposal is:

End-to-end SBOM data flow

There are two Python packages being shown, Package A on the left and Package B on the right. Package A depends on Package B. Package A is a pure-Python package with no bundled dependencies. Package B uses binary extensions and uses auditwheel to bundle shared libraries.

[Figure: data-flow diagram. Package A (source forge, source code A, build backend, METADATA) and Package B (source forge, source code B, build backend, auditwheel, METADATA) are uploaded to the Python Package Index, installed into a Python environment, and scanned by an SBOM generator that produces an Operational SBOM (OBOM). Numbered markers 1 to 6 correspond to the stages described below, and Package A is linked to Package B by a DEPENDS_ON relationship.]
How SBOM data flows from Python package source code, build, to an SBOM generation tool

Stage 1: If the Python project bundles third-party software in their own source code then the project may specify one or more SBOM documents through project.sbom-files in pyproject.toml. Build backends copy these documents into source distributions and wheels.

Stage 2: If the Python build backend pulls in dependencies while building a wheel (as Maturin does with Cargo), those dependencies can be recorded in another SBOM document in the wheel.

Stage 3: If a tool that modifies wheels by adding dependencies is used (like auditwheel) then that tool can record modifications in an SBOM document. At this point there are three separate SBOM documents included in the Package B archive.

Stage 4: Archives are uploaded to an index like PyPI. The index can do some validation of included SBOM documents, if any.

Stage 5: Installers download and install the Python package archives. The SBOM files are placed into the .dist-info/sboms/ directory in the Python environment and referenced in package metadata.
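To make the Stage 5 layout concrete, here is a minimal sketch that walks a `site-packages` directory and lists whatever files sit under each package's `.dist-info/sboms/` directory. The directory name comes from the proposal. The function name `find_installed_sboms` is hypothetical, and the sketch only inspects the filesystem rather than the package metadata that references these files.

```python
from pathlib import Path


def find_installed_sboms(site_packages: Path) -> dict[str, list[Path]]:
    """Map each *.dist-info directory name to the SBOM files it contains.

    Packages without an sboms/ directory are omitted from the result.
    """
    found: dict[str, list[Path]] = {}
    for dist_info in sorted(site_packages.glob("*.dist-info")):
        sbom_dir = dist_info / "sboms"
        if not sbom_dir.is_dir():
            continue
        files = sorted(p for p in sbom_dir.rglob("*") if p.is_file())
        if files:
            found[dist_info.name] = files
    return found
```

For example, `find_installed_sboms(Path(sysconfig.get_paths()["purelib"]))` would scan the current environment's pure-Python install location.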

Stage 6: SBOM generation tools scan the Python environment and, using existing Python package metadata together with the new per-package SBOM documents, stitch together an Operational SBOM (OBOM) detailing the Python environment.
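The Stage 6 scan can be sketched as follows. This is not how any particular SBOM generator works, and the output is a plain list of dictionaries rather than a real OBOM in CycloneDX or SPDX form. The function name `describe_environment` and the output keys are hypothetical. It reads each package's `METADATA` file (an RFC 822 style document) for the name, version, and `Requires-Dist` entries, then records which SBOM documents ship alongside it.

```python
from email import message_from_string
from pathlib import Path


def describe_environment(site_packages: Path) -> list[dict]:
    """Collect per-package metadata plus references to bundled SBOM files."""
    components = []
    for dist_info in sorted(site_packages.glob("*.dist-info")):
        metadata_file = dist_info / "METADATA"
        if not metadata_file.is_file():
            continue
        metadata = message_from_string(metadata_file.read_text(encoding="utf-8"))

        # Reference the documents instead of merging their contents;
        # a real tool would parse them according to their SBOM standard.
        sbom_dir = dist_info / "sboms"
        sbom_documents = []
        if sbom_dir.is_dir():
            sbom_documents = sorted(
                p.relative_to(site_packages).as_posix()
                for p in sbom_dir.rglob("*")
                if p.is_file()
            )

        components.append(
            {
                "name": metadata["Name"],
                "version": metadata["Version"],
                "requires": metadata.get_all("Requires-Dist") or [],
                "sbom_documents": sbom_documents,
            }
        )
    return components
```

Keeping the SBOM documents as references mirrors the design described in this post: intermediate steps never merge documents, and only the final tool interprets them.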

Who does what?

The plan is to allow each "actor" in the system adding SBOM data to a Python package to create their own SBOM document inside the Python package.

This means they can choose any SBOM standard (although we'll recommend sticking to a well-known one like CycloneDX or SPDX) and that intermediate tools won't need to "merge" SBOM data together. Avoiding this merging is extremely important, because cross-standard SBOM data merges are a very hard problem. This problem is deferred to SBOM generation tools, which already need to support multiple SBOM standards.
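Because each document may use a different standard, a consuming tool first has to work out which one it is looking at. The sketch below shows one simple way to do that for JSON documents, relying on the top-level `bomFormat` field of CycloneDX JSON and the top-level `spdxVersion` field of SPDX 2.x JSON. The function name `detect_sbom_standard` is hypothetical, and other serializations (XML, tag-value, SPDX 3 JSON-LD) are deliberately left as "unknown".

```python
import json


def detect_sbom_standard(document_text: str) -> str:
    """Return "CycloneDX", "SPDX", or "unknown" for a JSON SBOM document."""
    try:
        data = json.loads(document_text)
    except json.JSONDecodeError:
        return "unknown"
    if not isinstance(data, dict):
        return "unknown"
    # CycloneDX JSON documents carry a fixed top-level marker.
    if data.get("bomFormat") == "CycloneDX":
        return "CycloneDX"
    # SPDX 2.x JSON documents carry a version string such as "SPDX-2.3".
    if str(data.get("spdxVersion", "")).startswith("SPDX-"):
        return "SPDX"
    return "unknown"
```

A generation tool could dispatch each file in `.dist-info/sboms/` to a standard-specific parser based on this result, which keeps cross-standard handling in one place.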

My hope is that the most difficult part of this work (manually annotating a package if automatic tools can't) will enable a new type of contribution from users of Python packages: providing SBOM data. Previously, there was no standardized method for propagating SBOM data through Python packages, which discouraged this type of contribution.

If you're interested in having your use-case covered or you have concerns about the approach, please open a GitHub issue on the project tracker.

That's all for this post! 👋 If you're interested in more you can read the last report.

Wow, you made it to the end! Let me know what you thought on Mastodon or Bluesky. Get notified of new posts by subscribing to the RSS feed or email newsletter.

Want more content now? This blog's archive has 130 ready-to-read articles. I also curate a list of links I find on the internet.
