Whether you are growing a nascent app or already running a high-traffic service, you can benefit from this guide's insights and recommendations on how to scale smoothly with FCM. These concepts and practices can help you avoid negative impacts when you need to send large volumes of messages.
Key terms and concepts
Message Request: An FCM message request; used interchangeably with "request", "message", or "query".
Requests-per-second (RPS): A metric to describe the rate of incoming requests to FCM; used interchangeably with Queries-per-second (QPS).
Quota Tokens, Token Buckets, and Refills: When sending messages against the FCM HTTP v1 API, each request consumes an allotted Quota Token in a given time window. This window, called a "Token Bucket", refills to full at the end of the time window. For example: the HTTP v1 API allots 600K Quota Tokens for each 1-minute Token Bucket, which refills to full at the end of each 1-minute window.
Server-side Throttling: When traffic volume exceeds the FCM service's capacity, requests beyond serving capacity are rejected to rate-limit ingress flow. 429 error responses with retry-after headers may be returned to indicate that you should wait a given time period before retrying the request.
Client-side Throttling: When clients observe request failures, high latency, or 429 errors, they should voluntarily rate-limit egress flow to avoid exacerbating congestion.
Exponential backoff: When retrying errors, add exponentially increasing time delays. For example: 1s, 2s, 4s, 8s, 16s, 32s, and so forth.
Jittering: Avoiding retrying requests at exact intervals. With jittering, you vary the retry delays through a random process to distribute them uniformly over time (for example: 0.9s, 2.3s, 4.1s, 8.5s, 17.9s, 34.7s).
Retry amplification: When failed requests are retried without exponential backoff/jittering, they often accumulate and add to ongoing traffic load, potentially "amplifying" and exacerbating traffic congestion problems.
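To make the quota vocabulary concrete, here is a minimal Python sketch of a bucket that refills to full at the end of each window, as described in the definition above. The class name `FixedWindowBucket` and its method are hypothetical, and this is only a toy model of the concept, not FCM's actual enforcement logic. The 600K-per-minute default is the example figure quoted above.

```python
class FixedWindowBucket:
    """Toy model of a quota bucket that refills to full at each new window."""

    def __init__(self, capacity: int = 600_000, window_seconds: float = 60.0):
        self.capacity = capacity
        self.window_seconds = window_seconds
        self.tokens = capacity
        self.window_index = 0  # index of the window the bucket was last filled for

    def try_consume(self, now: float) -> bool:
        """Spend one token at time `now` (in seconds); False means throttled."""
        index = int(now // self.window_seconds)
        if index != self.window_index:
            # A new window has started, so the bucket refills to full.
            self.window_index = index
            self.tokens = self.capacity
        if self.tokens == 0:
            return False
        self.tokens -= 1
        return True


# Tiny capacity so the behaviour is easy to follow.
bucket = FixedWindowBucket(capacity=3, window_seconds=60.0)
results = [bucket.try_consume(t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
print(results)  # [True, True, True, False, True]

# Average sustained rate implied by 600K tokens per 1-minute window.
print(600_000 / 60)  # 10000.0 requests per second
```

The fourth request is rejected because the bucket is empty, and the fifth succeeds because it falls in the next window. Spending every token in the first second of a window is exactly the front-loading pattern this guide advises against.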
The problem: traffic spikes
FCM processes millions of requests per second (RPS). The biggest contributor to systemic congestion, latency problems, and outages is traffic spikes.
What is spiky traffic?
There are several different types of traffic spikes.
On-the-hour spikes: FCM receives more than double its usual traffic during the first 30 seconds to 2 minutes of each hour. Similar, albeit smaller, spikes are also observed at the quarter-hour and half-hour marks (examples: 00:15, 00:30, 00:45).
Retry amplification: Retrying failed or timed-out requests without exponential backoff can accumulate into repeating waves of traffic on top of existing traffic crests.
Abrupt traffic pattern changes: Directing new traffic to FCM or moving traffic to FCM across regions without smoothing factors such as gradual ramp-up can cause spikes.
Front-loading quota token usage: Exhausting all quota tokens at the start of quota windows instead of spreading out the requests evenly across the quota windows will create on-off oscillations that are difficult and expensive to load-balance.
Tip: Intra-minute volatility can cause confusion: 429 errors may be served while the customer appears to be below quota. This happens because monitoring and quota enforcement are not time-aligned.
Special events: Traffic spikes during holidays (New Year's Eve) or sports events (FIFA World Cup).
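The tip above about intra-minute volatility can be illustrated with a small worked example. The sketch below assumes, purely for illustration, that monitoring aggregates sends by calendar minute while quota windows are shifted by 30 seconds. The real offset between the two is not stated in this guide, and the function name and traffic numbers are hypothetical.

```python
QUOTA_PER_WINDOW = 600_000  # example figure from the key terms
WINDOW = 60                 # window length in seconds


def totals_per_window(sends_per_second, offset=0):
    """Sum per-second send counts into 60-second windows shifted by `offset`."""
    totals = {}
    for second, count in sends_per_second.items():
        window = (second - offset) // WINDOW
        totals[window] = totals.get(window, 0) + count
    return totals


# Hypothetical burst: 20,000 RPS for 40 seconds, from t=40s to t=79s.
sends = {t: 20_000 for t in range(40, 80)}

monitoring = totals_per_window(sends, offset=0)    # calendar-minute view
enforcement = totals_per_window(sends, offset=30)  # assumed shifted quota view

print(monitoring)   # {0: 400000, 1: 400000}
print(enforcement)  # {0: 800000}
print(max(enforcement.values()) > QUOTA_PER_WINDOW)  # True
```

Each calendar minute shows 400K requests, comfortably under 600K, yet a single shifted window sees all 800K. Spreading the same volume evenly removes the discrepancy.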
Remedy traffic spikes by "flattening the curve"
This section describes strategies to smooth out traffic spikes where possible, that is, to "flatten the curve."
Use FCM only for appropriate use cases
There are some use cases where using FCM to deliver a notification is not necessary or appropriate.
For example, for calendar event notifications, you can schedule a local task in your app to display a notification at the appropriate time instead of sending it from your app server. In that case, use FCM messages only for calendar syncs.
Avoid spikes
One scaling anti-pattern is to send FCM notifications as quickly as your systems will allow, instead of throttling on your own servers. Consider the following:
Wherever possible, avoid strategies that immediately exhaust your FCM send quota, only to repeat the pattern as soon as your token bucket refills. This access pattern creates load-balancing problems for FCM and its dependent systems.
Ramp up traffic as gradually as possible. At minimum, ramp from 0 to the maximum RPS across a 60-second window. Prefer longer windows for higher RPS.
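A gradual ramp can be expressed as a target rate that grows linearly with elapsed time. The following is a minimal sketch under that assumption. The function name `ramp_target_rps` and the 6,000 RPS ceiling are hypothetical, and a linear shape is only one reasonable choice.

```python
def ramp_target_rps(elapsed_s: float, start_rps: float, end_rps: float,
                    ramp_s: float) -> float:
    """Target RPS `elapsed_s` seconds into a linear ramp lasting `ramp_s` seconds."""
    if ramp_s <= 0:
        raise ValueError("ramp_s must be positive")
    # Clamp so the target holds at start_rps before the ramp and end_rps after it.
    fraction = min(max(elapsed_s / ramp_s, 0.0), 1.0)
    return start_rps + (end_rps - start_rps) * fraction


# Ramp from 0 to a hypothetical 6,000 RPS ceiling over the 60-second minimum.
schedule = [ramp_target_rps(t, 0, 6_000, 60) for t in (0, 15, 30, 60, 90)]
print(schedule)  # [0.0, 1500.0, 3000.0, 6000.0, 6000.0]
```

A sender would recompute the target every second or so and feed it to whatever rate limiter it already uses. For higher ceilings, passing a larger `ramp_s` gives the longer window recommended above.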
Avoid "on-the-hour" traffic
Where possible, avoid sending messages within a 2-minute window of each of the :00, :15, :30, and :45 minute marks.
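One way to apply this is to check a planned send time against the quarter-hour marks and push it out of the busy window. The sketch below assumes the avoided window is the 2 minutes starting at each mark. The guide's wording could also be read as covering time before the mark, so treat `avoid_minutes` and the function name `next_quiet_time` as hypothetical choices.

```python
from datetime import datetime, timedelta


def next_quiet_time(when: datetime, avoid_minutes: int = 2) -> datetime:
    """Return `when`, or the end of the avoided window if `when` falls inside it.

    Assumption: the avoided window is the `avoid_minutes` minutes that start
    at each :00, :15, :30, and :45 mark.
    """
    minutes_past_mark = when.minute % 15
    if minutes_past_mark < avoid_minutes:
        mark = when.replace(minute=when.minute - minutes_past_mark,
                            second=0, microsecond=0)
        return mark + timedelta(minutes=avoid_minutes)
    return when


print(next_quiet_time(datetime(2024, 1, 1, 9, 0, 30)))   # 2024-01-01 09:02:00
print(next_quiet_time(datetime(2024, 1, 1, 9, 16, 59)))  # 2024-01-01 09:17:00
print(next_quiet_time(datetime(2024, 1, 1, 9, 17, 0)))   # 2024-01-01 09:17:00
```

If many jobs are deferred this way, they all land on the same second just after the window, so adding a random offset to the returned time is probably worthwhile.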
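Implement server-side throttling
Implement throttling on your own app servers to monitor and manage the flow of traffic to FCM. From FCM's perspective, your servers are its clients, so this is the client-side throttling defined in the key terms.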
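As a rough illustration of throttling on the sending side, the sketch below spaces sends evenly so the outgoing rate stays at or below a chosen ceiling. The `Pacer` class is hypothetical and deliberately simple: it is single-threaded, and the injectable `clock` exists only to make the behaviour easy to demonstrate.

```python
import time


class Pacer:
    """Spaces sends evenly so the outgoing rate never exceeds `max_rps`."""

    def __init__(self, max_rps: float, clock=time.monotonic):
        if max_rps <= 0:
            raise ValueError("max_rps must be positive")
        self.interval = 1.0 / max_rps
        self.clock = clock
        self.next_slot = clock()

    def reserve(self) -> float:
        """Reserve the next send slot and return the seconds to wait for it."""
        now = self.clock()
        slot = max(self.next_slot, now)  # never schedule a slot in the past
        self.next_slot = slot + self.interval
        return slot - now


# Demonstration with a frozen fake clock, so the waits are deterministic.
fake_now = [100.0]
pacer = Pacer(max_rps=4, clock=lambda: fake_now[0])
waits = [pacer.reserve() for _ in range(3)]
print(waits)  # [0.0, 0.25, 0.5]
```

In a real sender, each worker would call `time.sleep(pacer.reserve())` before issuing a request. Recording the reserved waits alongside response codes also gives you the kind of production graphs mentioned in the monitoring tip.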
Tip: Monitoring is indispensable for investigating and debugging FCM scaling issues. Your production graphs help Firebase Support contextualize time, correlation, errors, and magnitude (absolute and relative).
Handling retries
While FCM strives to be highly available, some requests will occasionally time out or fail. The reasons vary, but the following best practices optimize retry behavior to deliver messages as soon as possible while minimizing the retries' contribution to traffic congestion.
Timeouts
Set a timeout of at least 10 seconds on send requests before retrying. Most of FCM's internal Remote Procedure Calls use a 10-second timeout.
Errors
To avoid retry amplification, implement exponential backoff with jittering when retrying requests. The Firebase Admin SDK, for example, implements exponential backoff.
Here are some more recommended settings:
If a request is continually retried with exponential backoff and is still failing 60 minutes later, it is either miscategorized as a retryable error, or FCM is experiencing an outage where retries may be inadvertently exacerbating the situation.
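The retry guidance above can be sketched as a small helper that combines exponential backoff, jittering, and an overall retry budget. Everything named here is hypothetical: `RetryableError`, `backoff_delays`, and `send_with_retries` are illustrative, the ±20% jitter range is an assumption, and the 3,600-second budget simply mirrors the 60-minute remark above. The `send` callable is expected to apply its own request timeout.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (a 429 or a timeout)."""


def backoff_delays(base_s=1.0, factor=2.0, jitter=0.2,
                   max_elapsed_s=3600.0, rng=random):
    """Yield jittered, exponentially growing delays within a total budget."""
    elapsed, delay = 0.0, base_s
    while True:
        # Jittering: scale each delay by a random factor around 1.0.
        jittered = delay * rng.uniform(1.0 - jitter, 1.0 + jitter)
        if elapsed + jittered > max_elapsed_s:
            return  # the next wait would exceed the budget, so stop
        yield jittered
        elapsed += jittered
        delay *= factor


def send_with_retries(send, sleep=time.sleep, **backoff_options):
    """Call `send()` until it succeeds or the retry budget is spent."""
    delays = backoff_delays(**backoff_options)
    while True:
        try:
            return send()
        except RetryableError:
            delay = next(delays, None)
            if delay is None:
                raise  # budget exhausted: surface the error instead of retrying
            sleep(delay)
```

The budget here counts only time spent waiting, not time spent inside `send`. When a 429 response carries a retry-after header, honoring that value instead of the computed delay is likely the better choice, since the header states how long to wait before retrying.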
Tip: Implementing a geographically distributed server topology improves redundancy and outcomes for both initial and backoff traffic.
Create rollout and rollback plans, and make gradual changes
When making large-scale traffic changes, such as increasing traffic to FCM or shifting traffic across regions or networks, designing a rollout/rollback plan and implementing gradual changes will protect your users, your service, and FCM.
Here is a hypothetical scenario for migrating 500,000 RPS globally from the FCM Legacy HTTP API to the FCM HTTP v1 API:
| Week | Step | Gradual Ramp-up Strategy |
|------|------|--------------------------|
| 0 | 1% ramp-up | Ramp-up smoothly from 0 to 5,000 RPS to FCM HTTP v1 over the course of an hour. |
| 1 | 5% ramp-up | Ramp-up smoothly from 5,000 to 25,000 RPS over 2 hours. |
| 2 | 10% ramp-up | Ramp-up smoothly from 25,000 to 50,000 RPS over 2 hours. |
| 3 | 25% ramp-up | Ramp-up from 50,000 to 125,000 RPS over 3 hours. |
| 4 | 50% ramp-up | Ramp-up from 125,000 to 250,000 RPS over 6 hours. |
| 5 | 75% ramp-up | Ramp-up from 250,000 to 375,000 RPS over 6 hours. |
| 6 | 100% ramp-up | Ramp-up from 375,000 to 500,000 RPS over 6 hours. |

Hypothetical rollback plan:
Contact FCM through Firebase Support if any of the following apply: