Showing content from https://www.geeksforgeeks.org/mongodb/substrcp-aggregation-operator-in-mongodb/ below:

MongoDB $substrCP Operator - GeeksforGeeks

Last Updated : 15 Jul, 2025

The $substrCP operator in MongoDB extracts a substring from a string inside the aggregation framework, measuring both the start position and the length in Unicode code points. Unlike byte-based operators such as $substrBytes, $substrCP counts each code point as one unit no matter how many bytes it occupies in UTF-8. This gives accurate results for both ASCII and non-ASCII characters and makes the operator well suited to multilingual text and text with special characters.

What is $substrCP Operator in MongoDB?

The $substrCP operator is used in the aggregation pipeline to extract a substring from a given string expression. It takes a zero-based code point index (where to start) and a code point count (how many code points to take). Because positions are measured in code points rather than bytes, a multibyte character is never split in the middle, which makes the operator reliable for strings that contain non-ASCII characters.

For example, { $substrCP: [ "geeksforgeeks", 0, 5 ] } returns "geeks": extraction starts at index 0 and takes 5 characters from there.
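The index-and-count behavior can be tried outside the database. The following is a minimal plain-JavaScript sketch that mimics what $substrCP does for string input. The helper name substrCP is hypothetical and is not part of MongoDB or its drivers; it relies on the fact that Array.from splits a string into code points rather than UTF-16 code units.

```javascript
// Hypothetical local helper that mimics $substrCP, for illustration only.
// Array.from iterates a string by Unicode code point, so each array
// element is one code point even when it needs two UTF-16 code units.
function substrCP(str, index, count) {
  return Array.from(str).slice(index, index + count).join("");
}

console.log(substrCP("geeksforgeeks", 0, 5)); // geeks

// "\u00e9" is the precomposed letter é (one code point, two UTF-8 bytes).
console.log(substrCP("caf\u00e9t\u00e9ria", 3, 4)); // étér
```

This sketch does no argument validation; the real operator requires a non-negative index and count.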

Why Use $substrCP?

Works seamlessly with non-ASCII characters (e.g., Chinese, Arabic, emojis, etc.).
Uses Unicode code points instead of byte positions, ensuring accuracy.
Supports multibyte character sets effectively.
Enhances data processing in multilingual applications.
Helps extract portions of text fields for analysis, filtering, and transformations.
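To see why byte positions and code point positions diverge, the lengths of a few strings can be compared locally. This is a small sketch in plain JavaScript, not MongoDB code; the function name lengths is made up for the example. The two numbers it reports correspond roughly to what $strLenBytes and $strLenCP return for the same string.

```javascript
// TextEncoder is a standard global in Node.js and browsers; it encodes to UTF-8.
const utf8 = new TextEncoder();

// Returns both lengths so the gap between bytes and code points is visible.
function lengths(str) {
  return {
    bytes: utf8.encode(str).length,     // comparable to $strLenBytes
    codePoints: Array.from(str).length  // comparable to $strLenCP
  };
}

console.log(lengths("geeks")); // { bytes: 5, codePoints: 5 }
console.log(lengths("寿司"));   // { bytes: 6, codePoints: 2 }
console.log(lengths("😀"));    // { bytes: 4, codePoints: 1 }
```

For pure ASCII text the two counts agree, which is why byte-based slicing only starts to misbehave once non-ASCII characters appear.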

Syntax:

{ $substrCP: [ <your string expression>, <code point index>, <code point count> ] }

Key Terms:

string expression: Any valid expression that resolves to a string. It may contain letters, digits, and special characters, and it is the string from which the substring is extracted.
code point index: A non-negative integer giving the zero-based position at which the substring starts.
code point count: A non-negative integer giving the number of code points (characters) to take, starting from the code point index.

Examples of MongoDB $substrCP Operator

To understand the MongoDB $substrCP operator, we need a collection on which to run queries.

Database: geeksforgeeks
Collection: articles
Documents: three documents that contain the details of the articles in the form of field-value pairs.

Example 1: Using $substrCP operator

We have an articles collection with a publishedon field storing publication dates in YYYYMMDD format. We need to extract the publication month and publication year separately.

Query:

db.articles.aggregate([
  {
    $project: {
      articlename: 1,
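      // First 4 code points of YYYYMMDD: the year.
      publicationyear: { $substrCP: [ "$publishedon", 0, 4 ] },
      // Everything after the year: start at index 4, length computed dynamically.
      publicationmonth: {
        $substrCP: [
          "$publishedon",
          4,
          { $subtract: [ { $strLenCP: "$publishedon" }, 4 ] }
        ]
      }
    }
  }
])

Output:

Explanation:

"publicationyear" extracts the first 4 characters of publishedon, which hold the year.
"publicationmonth" extracts the remaining characters starting at index 4; $subtract and $strLenCP compute how many characters are left. For a full YYYYMMDD value this remainder is the month followed by the day (MMDD), so pass a count of 2 instead to isolate the month.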
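The arithmetic behind the two $substrCP calls in this query can be traced locally. The sketch below is plain JavaScript rather than a pipeline, and the sample value "20200315" is an assumption chosen to match the stated YYYYMMDD format; it is not taken from the articles collection.

```javascript
// Hypothetical sample value in YYYYMMDD form, not taken from the collection.
const publishedon = "20200315";

// One array element per code point, which is the unit $substrCP counts in.
const codePoints = Array.from(publishedon);

// Same slice as $substrCP: [ "$publishedon", 0, 4 ]
const firstFour = codePoints.slice(0, 4).join("");

// Same slice as index 4 with a count of ($strLenCP minus 4)
const remainingCount = codePoints.length - 4;
const remainder = codePoints.slice(4, 4 + remainingCount).join("");

console.log(firstFour); // 2020
console.log(remainder); // 0315
```

With an eight-character value the remainder is four characters long (month plus day); a fixed count of 2 would return only the month.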

Example 2: Single-Byte Character Set

Suppose we have a collection articles in the geeksforgeeks database with documents containing an articlename field. We want to create a new field shortName with only the first 10 characters of each article's name. This is useful for displaying short previews of article titles.

Query:

db.articles.aggregate([
  {
    $project: {
      articlename: 1,
      shortName: {
        $substrCP: ["$articlename", 0, 10]
      }
    }
  }
]);

Output:

{
  "articlename": "Deep learning in R Programming",
  "shortName": "Deep learn"
}

Explanation:

$substrCP extracts a substring starting from index 0 (first character) and taking 10 characters from articlename.
The resulting shortName contains the first 10 characters, which can be used as a preview or snippet of the full title.
This approach ensures correct handling of Unicode characters, preventing any corruption in case of multibyte characters.
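A preview usually reads better with a trailing "..." when the title was actually cut. The pipeline below is one possible sketch of that idea, built only from standard aggregation operators ($cond, $gt, $strLenCP, $concat, $substrCP). The names MAX_LEN and previewPipeline are made up for this example, and the sketch assumes every document has a string articlename, since $strLenCP raises an error on null or missing values.

```javascript
// Hypothetical limit for the preview length, in code points.
const MAX_LEN = 10;

const previewPipeline = [
  {
    $project: {
      articlename: 1,
      shortName: {
        $cond: [
          // if: the title is longer than the limit
          { $gt: [ { $strLenCP: "$articlename" }, MAX_LEN ] },
          // then: truncate and append an ellipsis
          { $concat: [ { $substrCP: [ "$articlename", 0, MAX_LEN ] }, "..." ] },
          // else: keep the title unchanged
          "$articlename"
        ]
      }
    }
  }
];

// In mongosh, this would be run as: db.articles.aggregate(previewPipeline)
```

Titles at or under the limit pass through untouched, so short names do not pick up a misleading ellipsis.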

Example 3: Handling Multibyte Character Set

Suppose another document in the articles collection has an articlename that contains multibyte characters.

Query:

db.articles.aggregate([
  {
    $project: {
      shortName: {
        $substrCP: ["$articlename", 0, 15]
      }
    }
  }
]);

Output:

{ "shortName": "Social Media AP" }

Explanation: $substrCP counts code points rather than bytes, so the first 15 characters are returned intact even when some of them occupy more than one byte. A multibyte character is never cut in half, which prevents corrupted output.
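The risk with byte offsets is that an offset can land inside a multibyte character; the byte-based $substrBytes operator reports an error in that situation. The following plain-JavaScript sketch checks whether a byte offset falls on a character boundary of the UTF-8 encoding. The function names and the sample title are assumptions made for the illustration and do not come from the articles collection.

```javascript
const encoder = new TextEncoder();

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
function isContinuationByte(b) {
  return (b & 0xc0) === 0x80;
}

// Reports whether a byte offset sits on a character boundary in UTF-8.
function isCharBoundary(str, byteIndex) {
  const bytes = encoder.encode(str);
  if (byteIndex < 0 || byteIndex > bytes.length) return false;
  if (byteIndex === bytes.length) return true; // end of string
  return !isContinuationByte(bytes[byteIndex]);
}

// Hypothetical title: two 3-byte characters followed by ASCII text.
const title = "寿司 guide";

console.log(isCharBoundary(title, 3)); // true: start of the second character
console.log(isCharBoundary(title, 4)); // false: inside the second character

// Slicing by code point never needs this check.
console.log(Array.from(title).slice(0, 2).join("")); // 寿司
```

Counting in code points removes the need to reason about boundaries at all, which is the practical advantage of $substrCP for text like this.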

Important Points About MongoDB $substrCP Operator

The $substrCP operator is used in the aggregation pipeline to extract a substring from a given string expression.
It uses the Unicode code point index and count to determine the substring.
The $substrCP operator is useful when working with strings that contain non-ASCII characters, as it handles the Unicode code points correctly.
Both the index and the count can be expressions rather than literals, as long as they resolve to non-negative integers; Example 1 computes the count with $subtract and $strLenCP.

Conclusion

In MongoDB, the $substrCP operator extracts substrings accurately by working in Unicode code points. It handles both single-byte and multibyte character sets, which makes it a sound choice for applications that manage diverse textual data. Whether we are dealing with date parsing, text truncation, or multilingual support, $substrCP lets us manipulate string data within the aggregation framework without corrupting characters.
