SQL window functions are essential for advanced data analysis and database management. It is a type of function that allows us to perform calculations across a specific set of rows related to the current row. These calculations happen within a defined window of data and they are particularly useful for aggregates, rankings and cumulative totals without modifying the dataset.
The OVER clause is a key for defining this window. It partitions the data into different sets (using the PARTITION BY clause) and orders them (using the ORDER BY clause). These windows enable functions like SUM(), AVG(), ROW_NUMBER(), RANK() and DENSE_RANK() to be applied in an organized manner.
Syntax
SELECT column_name1,
window_function(column_name2)
OVER([PARTITION BY column_name1] [ORDER BY column_name3]) AS new_column
FROM table_name;
Key Terms
SQL window functions can be categorized into two primary types: aggregate window functions and ranking window functions. These two types serve different purposes but share a common ability to perform calculations over a defined set of rows while retaining the original data. The employee table contains details about employees, such as their name, age, department and salary.
Employees Table Employees Table 1. Aggregate Window FunctionAggregate window functions calculate aggregates over a window of rows while retaining individual rows. Common aggregate functions include:
SELECT Name, Age, Department, Salary,
AVG(Salary) OVER( PARTITION BY Department) AS Avg_Salary
FROM employee
Output
AVG() FunctionExplanation:
These functions provide rankings of rows within a partition based on specific criteria. Common ranking functions include:
The RANK() function assigns ranks to rows within a partition, with the same rank given to rows with identical values. If two rows share the same rank, the next rank is skipped.
Example: UsingRANK()
to Rank Employees by Salary
SELECT Name, Department, Salary,
RANK() OVER(PARTITION BY Department ORDER BY Salary DESC) AS emp_rank
FROM employee;
Output
RANK() FunctionExplanation:
Rows with the same salary (e.g., Ramesh and Suresh) are assigned the same rank. The next rank is skipped (e.g., rank 2) due to duplicate ranks.
DENSE_RANK() FunctionIt assigns rank to each row within partition. Just like rank function first row is assigned rank 1 and rows having same value have same rank. The difference between RANK() and DENSE_RANK() is that in DENSE_RANK(), for the next rank after two same rank, consecutive integer is used, no rank is skipped.
Example:SELECT Name, Department, Salary,
DENSE_RANK() OVER(PARTITION BY Department ORDER BY Salary DESC) AS emp_dense_rank
FROM employee;
Output
Name Department Salary emp_dense_rank Ramesh Finance 50,000 1 Suresh Finance 50,000 1 Ram Finance 20,000 2 Deep Sales 30,000 1 Pradeep Sales 20,000 2Explanation: The DENSE_RANK()
function works similarly to RANK()
, but it doesn't skip rank numbers when there are ties. For example, if two employees have the same salary, both will receive rank 1, and the next employee will receive rank 2.
ROW_NUMBER() gives each row a unique number. It numbers rows from one to the total rows. The rows are put into groups based on their values. Each group is called a partition. In each partition, rows get numbers one after another. No two rows have the same number in a partition.
Example: Using ROW_NUMBER() for Unique Row Numbers
SELECT Name, Department, Salary,
ROW_NUMBER() OVER(PARTITION BY Department ORDER BY Salary DESC) AS emp_row_no
FROM employee;
Output
Name Department Salary emp_row_no Ramesh Finance 50,000 1 Suresh Finance 50,000 2 Ram Finance 20,000 3 Deep Sales 30,000 1 Pradeep Sales 20,000 2Explanation: ROW_NUMBER()
assigns a unique number to each employee based on their salary within the department. No two rows will have the same row number.
Window functions are extremely versatile and can be used in a variety of practical scenarios. Below are some examples of how these functions can be applied in real-world data analysis.
Example 1: Calculating Running TotalsWe want to calculate a running total of sales for each day without resetting the total, every time a new day starts.
SELECT Date, Sales,
SUM(Sales) OVER(ORDER BY Date) AS Running_Total
FROM sales_data;
Explanation: This query calculates the cumulative total of sales for each day, ordered by date.
Example 2: Finding Top N Values in Each CategoryWe need to retrieve the top 3 employees in each department based on their salary.
WITH RankedEmployees AS (
SELECT Name, Department, Salary,
RANK() OVER(PARTITION BY Department ORDER BY Salary DESC) AS emp_rank
FROM employee
)
SELECT Name, Department, Salary
FROM RankedEmployees
WHERE emp_rank <= 3;
Explanation: This query retrieves the top 3 employees per department based on salary. It uses RANK()
to rank employees within each department and filters for the top 3
While SQL window functions are incredibly powerful, there are some common pitfalls and challenges that users may encounter:
PARTITION BY
clause is used correctly. If no partition is defined, the entire result set is treated as a single window.ORDER BY
clause within the window function determines the order of calculations. Always verify that it aligns with the logic of your calculation.SQL window functions are a crucial feature for advanced data analysis and provide flexibility when working with partitioned data. By mastering the OVER, PARTITION BY and ORDER BY clause, we can perform complex calculations like aggregate calculations, ranking and cumulative totals while preserving the row-level data. The combination of window functions with ORDER BY and PARTITION BY provides a flexible approach for data manipulation across different types of datasets.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4