
Showing content from https://github.com/pycompression/xopen below:

pycompression/xopen: Open compressed files in Python

This Python module provides an xopen function that works like Python’s built-in open function but also transparently deals with compressed files. xopen selects the most efficient method for reading or writing a compressed file.

Supported compression formats are gzip (.gz), bzip2 (.bz2), xz (.xz) and Zstandard (.zst).

Open a file for reading:

from xopen import xopen

with xopen("file.txt.gz") as f:
    content = f.read()

Write to a file in binary mode, set the compression level and avoid using an external process:

from xopen import xopen

with xopen("file.txt.xz", mode="wb", threads=0, compresslevel=3) as f:
    f.write(b"Hello")

The xopen module offers a single function named xopen with the following signature:

xopen(
  filename: str | bytes | os.PathLike,
  mode: Literal["r", "w", "a", "rt", "rb", "wt", "wb", "at", "ab"] = "r",
  compresslevel: Optional[int] = None,
  threads: Optional[int] = None,
  *,
  encoding: str = "utf-8",
  errors: Optional[str] = None,
  newline: Optional[str] = None,
  format: Optional[str] = None,
) -> IO

The function opens the file using a function suitable for the detected file format and returns an open file-like object.

When writing, the file format is chosen based on the file name extension: .gz, .bz2, .xz, .zst. This can be overridden with format. If the extension is not recognized, no compression is used.

When reading and a file name extension is available, the format is detected from the extension. When reading and no file name extension is available, the format is detected from the file signature (magic number, see https://en.wikipedia.org/wiki/File_format#Magic_number).
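Signature-based detection can be sketched with the standard library alone. This is only an illustration of the idea, not xopen's actual code: the names SIGNATURES and detect_format are hypothetical, and the byte prefixes are the publicly documented magic numbers of the four formats.

```python
import bz2
import gzip
import lzma
from typing import Optional

# Documented leading bytes ("magic numbers") of each format.
# The dictionary and function names are hypothetical, chosen for this sketch.
SIGNATURES = {
    b"\x1f\x8b": "gz",
    b"BZh": "bz2",
    b"\xfd7zXZ\x00": "xz",
    b"\x28\xb5\x2f\xfd": "zst",
}


def detect_format(header: bytes) -> Optional[str]:
    """Return a format name for the first bytes of a file, or None."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


# Compress a small payload with each stdlib codec and inspect the result.
print(detect_format(gzip.compress(b"Hello")))
print(detect_format(bz2.compress(b"Hello")))
print(detect_format(lzma.compress(b"Hello")))
print(detect_format(b"Hello"))
```

A result of None corresponds to the uncompressed case: the bytes match none of the known signatures.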

filename (str, bytes, or os.PathLike): Name of the file to open.

If set to "-", standard output (in mode "w") or standard input (in mode "r") is returned.

mode, encoding, errors, newline: These parameters have the same meaning as in Python’s built-in open function except that the default encoding is always UTF-8 instead of the preferred locale encoding. encoding, errors and newline are only used when opening a file in text mode.

compresslevel: The compression level for writing to gzip, xz and Zstandard files. If set to None, a default depending on the format is used: gzip: 1, xz: 6, Zstandard: 3.

This parameter is ignored for other compression formats.

format: Override the autodetection of the input or output format. Possible values are: "gz", "xz", "bz2", "zst".

threads: Set the number of additional threads spawned for compression or decompression. May be ignored if the backend does not support threads.

If threads is None (the default), as many threads as available CPU cores are used, but not more than four.

xopen tries to offload the (de)compression to other threads to free up the main Python thread for the application. This is done either by running an external program in a subprocess or by using a library that supports threads.

Set threads to 0 to force xopen to use only the main Python thread.
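The default described above (one thread per available core, capped at four) can be written as a small helper. This is a sketch under assumptions: resolve_threads is a hypothetical name, and os.cpu_count() stands in for however xopen actually counts available cores.

```python
import os
from typing import Optional


def resolve_threads(threads: Optional[int], max_default: int = 4) -> int:
    """Turn the threads argument into a concrete number.

    None means "pick a default": the core count, but at most max_default.
    Any explicit value, including 0 (main thread only), is passed through.
    """
    if threads is None:
        # Assumption: os.cpu_count() approximates "available CPU cores".
        available = os.cpu_count() or 1
        return min(available, max_default)
    return threads


print(resolve_threads(None))  # between 1 and 4, depending on the machine
print(resolve_threads(0))     # explicit request for the main thread only
```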

Opening of gzip files is delegated to one of these programs or libraries:

For xz files, a pipe to the xz program is used because it has built-in support for multithreaded compression.

For bz2 files, pbzip2 (parallel bzip2) is used.

xopen falls back to Python’s built-in functions (gzip.open, lzma.open, bz2.open) if none of the other methods can be used.
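The fallback logic can be pictured as follows. This is a simplified sketch, not xopen's implementation: choose_method and both dictionaries are hypothetical, only the xz and pbzip2 programs named above are considered, and the function merely reports which method would be picked.

```python
import bz2
import gzip
import lzma
import shutil
from typing import Optional

# External programs named in the text above (hypothetical lookup table).
EXTERNAL_PROGRAMS = {"xz": "xz", "bz2": "pbzip2"}

# Python's built-in openers, used when nothing else is available.
FALLBACK_OPENERS = {"gz": gzip.open, "xz": lzma.open, "bz2": bz2.open}


def choose_method(fmt: str, threads: Optional[int]) -> str:
    """Describe how a file of the given format would be opened."""
    program = EXTERNAL_PROGRAMS.get(fmt)
    # threads=0 forces the main Python thread, so no subprocess is used.
    if threads != 0 and program is not None and shutil.which(program):
        return f"pipe:{program}"
    opener = FALLBACK_OPENERS[fmt]
    return f"builtin:{opener.__module__}.open"


print(choose_method("xz", threads=0))
print(choose_method("bz2", threads=None))  # depends on whether pbzip2 is installed
```

With threads=0 the result never depends on the installed programs, which is what makes that setting useful for predictable behavior.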

xopen writes gzip files in a reproducible manner.

Normally, gzip files contain a timestamp in the file header, which means that compressing the same data at different times results in different output files. xopen disables this for all of the supported gzip compression backends. For example, when using an external process, it sets the command-line option --no-name (same as -n).

Note that different gzip compression backends typically do not produce identical output, so reproducibility is not guaranteed when the execution environment changes from one xopen() invocation to the next. This includes the CPU architecture, because igzip adjusts its algorithm depending on it.

bzip2 and xz compression methods do not store timestamps in the file headers, so output from them is also reproducible.
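The effect of the gzip header timestamp can be reproduced with Python's built-in gzip module. This sketch shows the general technique only, not xopen's own code path: gzip_reproducible is a hypothetical helper that fixes the modification time to 0 and stores no file name.

```python
import gzip
import io


def gzip_reproducible(data: bytes, compresslevel: int = 1) -> bytes:
    """Compress data so that the header carries no timestamp or file name."""
    buffer = io.BytesIO()
    with gzip.GzipFile(
        filename="",  # empty name: no FNAME field in the header
        mode="wb",
        fileobj=buffer,
        compresslevel=compresslevel,
        mtime=0,      # fixed timestamp instead of the current time
    ) as f:
        f.write(data)
    return buffer.getvalue()


first = gzip_reproducible(b"Hello")
second = gzip_reproducible(b"Hello")
print(first == second)  # identical output regardless of when it was created
print(first[4:8])       # the four MTIME bytes of the gzip header
```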

Optional Zstandard support

For reading and writing Zstandard (.zst) files, either the zstd command-line program or the Python zstandard package needs to be installed.

To ensure that you get the correct zstandard version, you can specify the zstd extra for xopen, that is, install it using pip install xopen[zstd].

The name xopen was taken from the C function of the same name in the utils.h file that is part of BWA.

Some ideas were taken from the canopener project. If you also want to open S3 files, you may want to use that module instead.

@kyleabeauchamp contributed support for appending to files before this repository was created.

