Showing content from https://rapidfuzz.github.io/RapidFuzz/Usage/process.html below:

rapidfuzz.process - RapidFuzz 3.13.0 documentation

rapidfuzz.process

cdist
rapidfuzz.process.cdist(queries, choices, *, scorer=<cyfunction ratio>, processor=None, score_cutoff=None, score_hint=None, score_multiplier=1, dtype=None, workers=1, **kwargs)

Compute distance/similarity between each pair of the two collections of inputs.

Parameters:
  • queries (Collection[Sequence[Hashable]]) – list of all query strings

  • choices (Collection[Sequence[Hashable]]) – list of all strings the queries should be compared with

  • scorer (Callable, optional) – Optional callable that is used to calculate the matching score between the query and each choice. This can be any of the scorers included in RapidFuzz (scorers that calculate either the edit distance or the normalized edit distance), or a custom function that returns a normalized edit distance. fuzz.ratio is used by default.

  • processor (Callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (Any, optional) – Optional argument for a score threshold to be passed to the scorer. Default is None, which deactivates this behaviour.

  • score_hint (Any, optional) – Optional argument for an expected score to be passed to the scorer. This is used to select a faster implementation. Default is None, which deactivates this behaviour.

  • score_multiplier (Any, optional) – Optional argument to multiply the calculated score with. This is applied as the final step, so e.g. score_cutoff is applied on the unmodified score. This is mostly useful to map from a floating point range to an integer to reduce the memory usage. Default is 1, which deactivates this behaviour.

  • dtype (data-type, optional) –

    The desired data-type for the result array. Depending on the scorer type the following dtypes are supported:

    • similarity:
      - np.float32, np.float64
      - np.uint8 -> stores fixed point representation of the result scaled to a range 0-100

    • distance:
      - np.int8, np.int16, np.int32, np.int64

    If not given, then the type will be np.float32 for similarities and np.int32 for distances.

  • workers (int, optional) – The calculation is subdivided into workers sections and evaluated in parallel. Supply -1 to use all available CPU cores. This argument is only available for scorers using the RapidFuzz C-API so far, since it releases the Python GIL.

  • scorer_kwargs (dict[str, Any], optional) – any other named parameters are passed to the scorer. This can be used to pass e.g. weights to Levenshtein.distance

Returns:

Returns a matrix of the requested dtype containing the distance/similarity between each pair of elements from the two collections of inputs.

Return type:

ndarray

cpdist
rapidfuzz.process.cpdist(queries, choices, *, scorer=<cyfunction ratio>, processor=None, score_cutoff=None, score_hint=None, score_multiplier=1, dtype=None, workers=1, **kwargs)

Compute the pairwise distance/similarity between corresponding elements of the queries & choices.

Parameters:
  • queries (Collection[Sequence[Hashable]]) – list of strings used to compute the distance/similarity.

  • choices (Collection[Sequence[Hashable]]) – list of strings the queries should be compared with. Must be the same length as the queries.

  • scorer (Callable, optional) – Optional callable that is used to calculate the matching score between the query and each choice. This can be any of the scorers included in RapidFuzz (scorers that calculate either the edit distance or the normalized edit distance), or a custom function that returns a normalized edit distance. fuzz.ratio is used by default.

  • processor (Callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

  • score_cutoff (Any, optional) – Optional argument for a score threshold to be passed to the scorer. Default is None, which deactivates this behaviour.

  • score_hint (Any, optional) – Optional argument for an expected score to be passed to the scorer. This is used to select a faster implementation. Default is None, which deactivates this behaviour.

  • score_multiplier (Any, optional) – Optional argument to multiply the calculated score with. This is applied as the final step, so e.g. score_cutoff is applied on the unmodified score. This is mostly useful to map from a floating point range to an integer to reduce the memory usage. Default is 1, which deactivates this behaviour.

  • dtype (data-type, optional) –

    The desired data-type for the result array. Depending on the scorer type the following dtypes are supported:

    • similarity:
      - np.float32, np.float64
      - np.uint8 -> stores fixed point representation of the result scaled to a range 0-100

    • distance:
      - np.int8, np.int16, np.int32, np.int64

    If not given, then the type will be np.float32 for similarities and np.int32 for distances.

  • workers (int, optional) – The calculation is subdivided into workers sections and evaluated in parallel. Supply -1 to use all available CPU cores. This argument is only available for scorers using the RapidFuzz C-API so far, since it releases the Python GIL.

  • scorer_kwargs (dict[str, Any], optional) – any other named parameters are passed to the scorer. This can be used to pass e.g. weights to Levenshtein.distance

Returns:

Returns a matrix of size (n x 1) of the requested dtype containing the distance/similarity between corresponding elements of the two collections of inputs.

Return type:

ndarray

