SQL (Structured Query Language) is the standard language for managing and manipulating relational databases. It is widely used for querying, updating, and managing data in databases such as PostgreSQL, MySQL, SQL Server, and others.
Used to specify the columns that you want to retrieve from a database table.
Specifies the table from which to retrieve the data.
Used to assign an alias to a column or table for simplicity or readability.
Used to return only distinct (different) values in the specified column.
SELECT DISTINCT name
FROM employees;
This query returns unique employee names from the employees table.
The SELECT
and FROM
clauses are used to display data without storing the results. If you want to store the results in a new table, you can use CREATE VIEW
:
CREATE VIEW view_name AS
SELECT column_name
FROM table_name;
This query creates a view (a virtual table) based on the specified columns from the original table.
The COUNT()
function is used to count the number of rows or records that contain values in a specified column.
SELECT COUNT(column_name) AS alias_name
FROM table_name;
This counts the number of rows in the specified column.
To count the number of distinct (unique) values in a column:
SELECT COUNT(DISTINCT column_name)
FROM table_name;
Limits the number of rows returned by a query:
SELECT column_name
FROM table_name
LIMIT 10;
The COALESCE
function in SQL is used to return the first non-null expression among its arguments. When you use COALESCE(fieldname, 0)
, it checks the value of fieldname
:
- If
fieldname
is notNULL
, it returns its value. - If
fieldname
isNULL
, it returns0
(the second argument).
This is particularly useful when you want to ensure that you don't have NULL
values in your result set, and instead, you have a default value like 0
.
SELECT COALESCE(total_sales, 0) AS adjusted_sales
FROM sales_data;
In this example:
- If
total_sales
isNULL
,adjusted_sales
will be0
. - If
total_sales
has a value,adjusted_sales
will have that value.
Used to filter data based on specified conditions:
SELECT column_name
FROM table_name
WHERE condition;
The order of execution is as follows:
- FROM
- WHERE
- SELECT
- LIMIT
Used to add multiple conditions in a query:
SELECT column_name
FROM table_name
WHERE condition1 AND condition2;
Used to filter text based on patterns:
%
matches zero, one, or many characters._
matches a single character.
SELECT column_name
FROM table_name
WHERE column_name LIKE 'A%'; -- Finds values that start with 'A'
Represents empty values. Used in filtering:
SELECT column_name
FROM table_name
WHERE column_name IS NULL;
AVG()
- calculates the average of a set of values.SUM()
- sums up a set of values.
MIN()
- finds the minimum value.MAX()
- finds the maximum value.COUNT()
- counts the number of values.
SELECT AVG(salary)
FROM employees;
Used to round numbers to a specified number of decimal places.
SELECT ROUND(123.56655, 2);
-- Result: 123.57
SELECT ROUND(123456, -3);
-- Result: 123000
Sorts the results by the specified column in ascending (ASC) or descending (DESC) order.
SELECT column_name
FROM table_name
ORDER BY column_name ASC;
Used to group data by a specific column. For example, grouping people by the certification they have obtained.
SELECT certificat, COUNT(name) AS numOfName
FROM table_name
GROUP BY certificat;
Result:
certificat | numOfName |
---|---|
cnna | 3 |
sql | 2 |
git | 1 |
The HAVING
clause is used to filter grouped data after aggregation. It is similar to WHERE
but used after GROUP BY
.
SELECT certificat, COUNT(name) AS numOfName
FROM table_name
GROUP BY certificat
HAVING COUNT(name) > 1;
Result:
certificat | numOfName |
---|---|
cnna | 3 |
sql | 2 |
Description: Selects only the matching rows from both tables based on the specified condition.
Example:
SELECT customers.customer_id, orders.order_id
FROM customers
INNER JOIN orders
ON customers.customer_id = orders.customer_id;
Resulting Table:
customer_id | order_id |
---|---|
1 | 101 |
2 | 102 |
Description: When joining on two identical column names, we can use the USING
command followed by the shared column name.
Example:
SELECT customers.customer_id, orders.order_id
FROM customers
INNER JOIN orders USING (customer_id);
Resulting Table:
customer_id | order_id |
---|---|
1 | 101 |
2 | 102 |
Description: Returns all rows from the left table and the matching rows from the right table. The result is NULL
from the right side if there is no match.
Example:
SELECT customers.customer_id, orders.order_id
FROM customers
LEFT JOIN orders
ON customers.customer_id = orders.customer_id;
Resulting Table:
customer_id | order_id |
---|---|
1 | 101 |
2 | 102 |
3 | NULL |
Description: Returns all rows from the right table and the matching rows from the left table. The result is NULL
from the left side if there is no match.
Example:
SELECT customers.customer_id, orders.order_id
FROM customers
RIGHT JOIN orders
ON customers.customer_id = orders.customer_id;
Resulting Table:
customer_id | order_id |
---|---|
1 | 101 |
2 | 102 |
NULL | 103 |
Description: Returns all rows from both tables. It returns NULL
on the side where there is no match.
Example:
SELECT customers.customer_id, orders.order_id
FROM customers
FULL JOIN orders
ON customers.customer_id = orders.customer_id;
Resulting Table:
customer_id | order_id |
---|---|
1 | 101 |
2 | 102 |
3 | NULL |
NULL | 103 |
Description: Returns the Cartesian product of the two tables. All possible combinations of rows from both tables are included.
Example:
SELECT customers.customer_id, orders.order_id
FROM customers
CROSS JOIN orders;
Resulting Table:
customer_id | order_id |
---|---|
1 | 101 |
1 | 102 |
1 | 103 |
2 | 101 |
2 | 102 |
2 | 103 |
Description: A join where a table is joined with itself to combine rows with other rows from the same table.
Example:
SELECT A.employee_id, B.manager_id
FROM employees A
INNER JOIN employees B
ON A.manager_id = B.employee_id;
Resulting Table:
employee_id | manager_id |
---|---|
2 | 1 |
3 | 1 |
4 | 2 |
- Description: Combines the result sets of two or more queries and removes duplicate rows.
- Usage: When you want to merge the results from multiple queries while excluding duplicates.
Consider two tables, Employees
and Managers
:
EmployeeID | FirstName | LastName | DepartmentID |
---|---|---|---|
1 | John | Doe | 10 |
2 | Jane | Smith | 20 |
3 | Jim | Brown | 30 |
4 | Jake | White | 40 |
ManagerID | FirstName | LastName | DepartmentID |
---|---|---|---|
1 | John | Doe | 10 |
2 | Jane | Smith | 20 |
5 | Susan | Green | 50 |
6 | Tom | Black | 60 |
SELECT FirstName, LastName FROM Employees
UNION
SELECT FirstName, LastName FROM Managers;
Result: The output would be a list of unique FirstName
and LastName
values from both tables:
FirstName | LastName |
---|---|
John | Doe |
Jane | Smith |
Jim | Brown |
Jake | White |
Susan | Green |
Tom | Black |
- Description: Combines the result sets of two or more queries, including all duplicate rows.
- Usage: When you want to merge results from multiple queries and include all rows, even duplicates.
SELECT FirstName, LastName FROM Employees
UNION ALL
SELECT FirstName, LastName FROM Managers;
Result: The output would be a list of all FirstName
and LastName
values, including duplicates:
FirstName | LastName |
---|---|
John | Doe |
Jane | Smith |
Jim | Brown |
Jake | White |
John | Doe |
Jane | Smith |
Susan | Green |
Tom | Black |
- Description: Returns only the rows that are common between the result sets of two queries.
- Usage: When you want to find common rows between multiple queries.
SELECT FirstName, LastName FROM Employees
INTERSECT
SELECT FirstName, LastName FROM Managers;
Result: The output would be a list of FirstName
and LastName
values that appear in both tables:
FirstName | LastName |
---|---|
John | Doe |
Jane | Smith |
- Description: Returns rows from the first query that do not appear in the result set of the second query.
- Usage: When you want to find rows that exist in one query but not in the other.
SELECT FirstName, LastName FROM Employees
EXCEPT
SELECT FirstName, LastName FROM Managers;
Result: The output would be a list of FirstName
and LastName
values from the Employees
table that are not in the Managers
table:
FirstName | LastName |
---|---|
Jim | Brown |
Jake | White |
- INTERSECT: Returns only the common values between two queries based on the selected columns.
- INNER JOIN: Joins two tables based on a specific condition, allowing you to return additional columns that are not part of the intersection.
SELECT Employees.FirstName, Employees.LastName
FROM Employees
INNER JOIN Managers ON Employees.FirstName = Managers.FirstName AND Employees.LastName = Managers.LastName;
Result: This would return similar results as INTERSECT
, but could also include additional columns from either table:
FirstName | LastName |
---|---|
John | Doe |
Jane | Smith |
-
Semi-Joins: Returns rows from the first table where one or more matches are found in the second table.
Example: Find all employees who have managed projects.
SELECT DISTINCT EmployeeID FROM Employees e WHERE EXISTS ( SELECT 1 FROM Projects p WHERE p.ManagerID = e.EmployeeID );
-
Anti-Joins: Returns rows from the first table where no matches are found in the second table.
Example: Find all employees who have not managed any projects.
SELECT DISTINCT EmployeeID FROM Employees e WHERE NOT EXISTS ( SELECT 1 FROM Projects p WHERE p.ManagerID = e.EmployeeID );
-
SQL allows multiple conditions in the
WHERE
clause using logical operators likeAND
,OR
, andNOT
.Example: Find employees who work in the "Sales" department and have a salary greater than $50,000.
SELECT * FROM Employees WHERE Department = 'Sales' AND Salary > 50000;
-
A semi-join checks for the existence of a relationship between two tables, returning rows from the first table where a corresponding match is found in the second table.
Example: List all departments that have employees.
SELECT DepartmentName FROM Departments d WHERE EXISTS ( SELECT 1 FROM Employees e WHERE e.DepartmentID = d.DepartmentID );
-
Anti-joins are useful in diagnosing issues like finding records in one table that do not have corresponding records in another.
Example: Find all orders that have not been shipped.
SELECT OrderID FROM Orders o WHERE NOT EXISTS ( SELECT 1 FROM Shipments s WHERE s.OrderID = o.OrderID );
-
Subqueries within the
WHERE
clause filter results based on another query’s output.Example in WHERE: List employees whose salary is greater than the average salary in their department.
SELECT EmployeeID, FirstName, LastName FROM Employees e WHERE Salary > ( SELECT AVG(Salary) FROM Employees WHERE DepartmentID = e.DepartmentID );
-
Subqueries in the
SELECT
clause allow calculating additional information related to each row.Example in SELECT: Show the total number of employees in each department.
SELECT DepartmentName, (SELECT COUNT(*) FROM Employees e WHERE e.DepartmentID = d.DepartmentID) AS EmployeeCount FROM Departments d;
-
A subquery within the
WHERE
clause enables dynamic filtering based on another query's result.Example: Retrieve all employees who have the highest salary in their department.
SELECT EmployeeID, FirstName, LastName FROM Employees e WHERE Salary = ( SELECT MAX(Salary) FROM Employees WHERE DepartmentID = e.DepartmentID );
-
A subquery in the
SELECT
clause calculates additional metrics related to the main query.Example: Display each employee’s name along with the number of projects they have managed.
SELECT FirstName, LastName, (SELECT COUNT(*) FROM Projects p WHERE p.ManagerID = e.EmployeeID) AS ProjectCount FROM Employees e;
-
A subquery in the
FROM
clause acts like a temporary table that can be joined with other tables.Example: Find the average salary by department and compare it against the company’s average salary.
SELECT DepartmentID, AVG(Salary) AS AvgDeptSalary FROM (SELECT DepartmentID, Salary FROM Employees) AS DeptSalaries GROUP BY DepartmentID;
-
Combining multiple subqueries in the
FROM
clause for complex operations.Example: Combine average salary calculations with other metrics in the same query.
SELECT d.DepartmentName, ds.AvgDeptSalary, COUNT(e.EmployeeID) AS EmployeeCount FROM Departments d JOIN (SELECT DepartmentID, AVG(Salary) AS AvgDeptSalary FROM Employees GROUP BY DepartmentID) ds ON d.DepartmentID = ds.DepartmentID JOIN Employees e ON e.DepartmentID = d.DepartmentID GROUP BY d.DepartmentName, ds.AvgDeptSalary;
Example:
We have two tables: "Employees" and "Projects". We want to extract employees who managed projects.
Employees Table:
EmployeeID | Name | DepartmentID |
---|---|---|
1 | Alice | 10 |
2 | Bob | 20 |
3 | Charlie | 10 |
4 | Dave | 30 |
Projects Table:
ProjectID | Name | ManagerID |
---|---|---|
101 | Project A | 1 |
102 | Project B | 3 |
Query:
SELECT DISTINCT EmployeeID, Name
FROM Employees e
WHERE EXISTS (
SELECT 1
FROM Projects p
WHERE p.ManagerID = e.EmployeeID
);
Result:
EmployeeID | Name |
---|---|
1 | Alice |
3 | Charlie |
Example:
We want to extract employees who did not manage any projects.
Query:
SELECT DISTINCT EmployeeID, Name
FROM Employees e
WHERE NOT EXISTS (
SELECT 1
FROM Projects p
WHERE p.ManagerID = e.EmployeeID
);
Result:
EmployeeID | Name |
---|---|
2 | Bob |
4 | Dave |
Example:
We have an "Employees" table and want to extract employees who work in the "Sales" department and have a salary greater than 50,000.
Employees Table:
EmployeeID | Name | Department | Salary |
---|---|---|---|
1 | Alice | Sales | 60000 |
2 | Bob | HR | 45000 |
3 | Charlie | Sales | 55000 |
4 | Dave | Sales | 48000 |
Query:
SELECT *
FROM Employees
WHERE Department = 'Sales'
AND Salary > 50000;
Result:
EmployeeID | Name | Department | Salary |
---|---|---|---|
1 | Alice | Sales | 60000 |
3 | Charlie | Sales | 55000 |
Example:
We want to extract all departments that have employees.
Departments Table:
DepartmentID | DepartmentName |
---|---|
10 | Sales |
20 | HR |
30 | IT |
Employees Table:
EmployeeID | Name | DepartmentID |
---|---|---|
1 | Alice | 10 |
2 | Bob | 20 |
3 | Charlie | 10 |
Query:
SELECT DepartmentName
FROM Departments d
WHERE EXISTS (
SELECT 1
FROM Employees e
WHERE e.DepartmentID = d.DepartmentID
);
Result:
DepartmentName |
---|
Sales |
HR |
Example:
We have an "Orders" table and a "Shipments" table. We want to extract orders that have not been shipped yet.
Orders Table:
OrderID | CustomerID | OrderDate |
---|---|---|
1 | 101 | 2024-08-01 |
2 | 102 | 2024-08-02 |
3 | 103 | 2024-08-03 |
Shipments Table:
ShipmentID | OrderID | ShipmentDate |
---|---|---|
201 | 1 | 2024-08-05 |
202 | 2 | 2024-08-06 |
Query:
SELECT OrderID
FROM Orders o
WHERE NOT EXISTS (
SELECT 1
FROM Shipments s
WHERE s.OrderID = o.OrderID
);
Result:
OrderID |
---|
3 |
Example:
We want to extract employees whose salary is higher than the average salary in their department.
Employees Table:
EmployeeID | Name | DepartmentID | Salary |
---|---|---|---|
1 | Alice | 10 | 60000 |
2 | Bob | 20 | 50000 |
3 | Charlie | 10 | 55000 |
4 | Dave | 20 | 47000 |
Query:
SELECT EmployeeID, Name
FROM Employees e
WHERE Salary > (
SELECT AVG(Salary)
FROM Employees
WHERE DepartmentID = e.DepartmentID
);
Result:
EmployeeID | Name |
---|---|
1 | Alice |
2 | Bob |
Example:
We want to display the number of employees in each department.
Departments Table:
DepartmentID | DepartmentName |
---|---|
10 | Sales |
20 | HR |
Query:
SELECT DepartmentName,
(SELECT COUNT(*)
FROM Employees e
WHERE e.DepartmentID = d.DepartmentID) AS EmployeeCount
FROM Departments d;
Result:
DepartmentName | EmployeeCount |
---|---|
Sales | 2 |
HR | 2 |
Example:
We want to extract all employees who have the highest salary in their department.
Employees Table:
EmployeeID | Name | DepartmentID | Salary |
---|---|---|---|
1 | Alice | 10 | 60000 |
2 | Bob | 20 | 50000 |
3 | Charlie | 10 | 55000 |
4 | Dave | 20 | 47000 |
Query:
SELECT EmployeeID, Name
FROM Employees e
WHERE Salary = (
SELECT MAX(Salary)
FROM Employees
WHERE DepartmentID = e.DepartmentID
);
Result:
EmployeeID | Name |
---|---|
1 | Alice |
2 | Bob |
Example:
We want to display each employee's name with the number of projects they managed.
Employees Table:
EmployeeID | Name | DepartmentID |
---|---|---|
1 | Alice | 10 |
2 | Bob | 20 |
3 | Charlie | 10 |
4 | Dave | 20 |
Projects Table:
ProjectID | Name | ManagerID |
---|---|---|
101 | Project A | 1 |
102 | Project B | 3 |
Query:
SELECT Name,
(SELECT COUNT(*)
FROM Projects p
WHERE p.ManagerID = e.EmployeeID) AS ProjectCount
FROM Employees e;
Result:
Name | ProjectCount |
---|---|
Alice | 1 |
Bob | 0 |
Charlie | 1 |
Dave | 0 |
Example:
We want to extract the average salary for each department.
Employees Table:
EmployeeID | Name | DepartmentID | Salary |
---|---|---|---|
1 | Alice | 10 | 60000 |
2 | Bob | 20 | 50000 |
3 | Charlie | 10 | 55000 |
4 | Dave | 20 | 47000 |
Query:
SELECT DepartmentID, AVG(Salary) AS AvgDeptSalary
FROM (SELECT DepartmentID, Salary
FROM Employees) AS DeptSalaries
GROUP BY DepartmentID;
Result:
DepartmentID | AvgDeptSalary |
---|---|
10 | 57500 |
20 | 48500 |
Example:
We want to extract the department with the highest average salary.
Query:
SELECT DepartmentID
FROM (SELECT DepartmentID, AVG(Salary) AS AvgSalary
FROM Employees
GROUP BY DepartmentID) AS DeptAvgSalaries
ORDER BY AvgSalary DESC
LIMIT 1;
Result:
DepartmentID |
---|
10 |
The CASE
statement in SQL allows you to perform conditional logic within your queries, similar to an "if-else" statement in programming languages. It is often used to create new calculated fields or to modify existing values based on certain conditions.
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
ELSE result3
END AS new_column_name
You can also use CASE
statements inside aggregate functions like AVG()
, SUM()
, etc., to calculate conditional aggregates.
Suppose you have a Sales
table:
OrderID | Customer | Product | Quantity | Price |
---|---|---|---|---|
1 | Alice | Laptop | 2 | 1000 |
2 | Bob | Smartphone | 5 | 300 |
3 | Alice | Tablet | 1 | 400 |
4 | John | Laptop | 1 | 950 |
You want to create a new column that shows the "Order Type" based on the product:
SELECT
OrderID,
Customer,
Product,
Quantity,
Price,
CASE
WHEN Product = 'Laptop' THEN 'High Value'
WHEN Product = 'Smartphone' THEN 'Medium Value'
ELSE 'Low Value'
END AS OrderType
FROM Sales;
If you want to find the average price for only the "Laptop" orders:
SELECT
AVG(
CASE
WHEN Product = 'Laptop' THEN Price
ELSE NULL
END
) AS AvgLaptopPrice
FROM Sales;
A simple subquery (or nested query) is a query within another query. The subquery is executed once, and its result is used by the outer query. A simple subquery does not reference columns from the outer query.
Example:
Find the customers who have placed an order with the maximum price:
SELECT Customer
FROM Sales
WHERE Price = (SELECT MAX(Price) FROM Sales);
A correlated subquery is a subquery that references a column from the outer query. It is evaluated once for each row processed by the outer query.
Example:
Find all customers who have made more than one purchase:
SELECT DISTINCT Customer
FROM Sales s1
WHERE 1 < (
SELECT COUNT(*)
FROM Sales s2
WHERE s2.Customer = s1.Customer
);
A Common Table Expression (CTE) is a temporary result set that you can reference within a SELECT
, INSERT
, UPDATE
, or DELETE
statement. CTEs make queries more readable and easier to maintain, especially when you have complex queries or multiple layers of nested subqueries.
WITH cte_name AS (
SELECT ...
FROM ...
)
SELECT ...
FROM cte_name;
Suppose we want to find the total sales by each customer:
WITH CustomerSales AS (
SELECT Customer, SUM(Quantity * Price) AS TotalSales
FROM Sales
GROUP BY Customer
)
SELECT *
FROM CustomerSales;
You can define multiple CTEs by separating them with commas:
WITH
ProductSales AS (
SELECT Product, SUM(Quantity) AS TotalQuantity
FROM Sales
GROUP BY Product
),
HighValueOrders AS (
SELECT OrderID, Customer, Product
FROM Sales
WHERE Product = 'Laptop'
)
SELECT p.Product, p.TotalQuantity, h.Customer
FROM ProductSales p
JOIN HighValueOrders h ON p.Product = h.Product;
-
CASE Statements:
- Create derived columns based on conditional logic.
- Conditional aggregation (e.g., counting or summing only rows that meet certain criteria).
-
Subqueries:
- Simple Subqueries: Used to filter results or perform calculations where the results are not dependent on the outer query.
- Correlated Subqueries: Useful when the subquery needs information from the outer query to execute correctly (e.g., row-by-row comparisons).
-
CTEs:
- Breaking down complex queries into more manageable parts.
- Improving query readability and maintainability.
- Recursion (e.g., hierarchical or tree-like data).
Window functions are a powerful feature in SQL that allow you to perform calculations across a set of table rows that are related to the current row. They are used extensively for analytics and reporting.
This function assigns a unique sequential integer to rows within a partition of a result set, starting at 1.
-
Usage: Useful for generating row numbers or rankings.
-
Syntax:
ROW_NUMBER() OVER (PARTITION BY column ORDER BY column)
-
Example: Given a table
sales
:SaleID Employee Amount 1 John 500 2 Jane 700 3 John 400 4 Jane 600 We want to assign row numbers to each employee’s sales:
SELECT SaleID, Employee, Amount, ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Amount DESC) AS RowNum FROM sales;
Result:
SaleID Employee Amount RowNum 2 Jane 700 1 4 Jane 600 2 1 John 500 1 3 John 400 2
This function provides access to a row at a specified physical offset before the current row within a result set.
-
Usage: Useful for comparing a value with a previous row’s value.
-
Syntax:
LAG(column, offset, default_value) OVER (PARTITION BY column ORDER BY column)
-
Example: Continuing with the
sales
table:SELECT SaleID, Employee, Amount, LAG(Amount, 1, 0) OVER (PARTITION BY Employee ORDER BY Amount DESC) AS PreviousSale FROM sales;
Result:
SaleID Employee Amount PreviousSale 2 Jane 700 0 4 Jane 600 700 1 John 500 0 3 John 400 500
This function provides access to a row at a specified physical offset after the current row within a result set.
-
Usage: Useful for comparing a value with a following row’s value.
-
Syntax:
LEAD(column, offset, default_value) OVER (PARTITION BY column ORDER BY column)
-
Example: Continuing with the
sales
table:SELECT SaleID, Employee, Amount, LEAD(Amount, 1, 0) OVER (PARTITION BY Employee ORDER BY Amount DESC) AS NextSale FROM sales;
Result:
SaleID Employee Amount NextSale 2 Jane 700 600 4 Jane 600 0 1 John 500 400 3 John 400 0
This function returns the first value in an ordered set of values.
-
Usage: Useful when you need to retrieve the earliest value in a set.
-
Syntax:
FIRST_VALUE(column) OVER (PARTITION BY column ORDER BY column)
-
Example:
SELECT SaleID, Employee, Amount, FIRST_VALUE(Amount) OVER (PARTITION BY Employee ORDER BY Amount DESC) AS FirstSale FROM sales;
Result:
SaleID Employee Amount FirstSale 2 Jane 700 700 4 Jane 600 700 1 John 500 500 3 John 400 500
This function returns the last value in an ordered set of values.
-
Usage: Useful for retrieving the most recent value in a set.
-
Syntax:
LAST_VALUE(column) OVER (PARTITION BY column ORDER BY column)
-
Example:
SELECT SaleID, Employee, Amount, LAST_VALUE(Amount) OVER (PARTITION BY Employee ORDER BY Amount ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastSale FROM sales;
Result:
SaleID Employee Amount LastSale 2 Jane 700 600 4 Jane 600 600 1 John 500 400 3 John 400 400
This function assigns a rank to each row within a partition of a result set. The rank starts at 1 and continues based on the ordering, but it leaves gaps in rank when there are ties.
-
Usage: Useful for ranking data with gaps for tied values.
-
Syntax:
RANK() OVER (PARTITION BY column ORDER BY column)
-
Example:
SELECT SaleID, Employee, Amount, RANK() OVER (PARTITION BY Employee ORDER BY Amount DESC) AS Rank FROM sales;
Result:
SaleID Employee Amount Rank 2 Jane 700 1 4 Jane 600 2 1 John 500 1 3 John 400 2
Similar to RANK()
, but without gaps in the rank values.
-
Usage: Useful for ranking data without gaps for tied values.
-
Syntax:
DENSE_RANK() OVER (PARTITION BY column ORDER BY column)
-
Example:
SELECT SaleID, Employee, Amount, DENSE_RANK() OVER (PARTITION BY Employee ORDER BY Amount DESC) AS DenseRank FROM sales;
Result:
SaleID Employee Amount DenseRank 2 Jane 700 1 4 Jane 600 2 1 John 500 1 3 John 400 2
This function distributes rows of an ordered partition into a specified number of roughly equal groups.
-
Usage: Useful for dividing data into buckets (e.g., quartiles).
-
Syntax:
NTILE(num_buckets) OVER (PARTITION BY column ORDER BY column)
-
Example:
SELECT SaleID, Employee, Amount, NTILE(2) OVER (PARTITION BY Employee ORDER BY Amount DESC) AS NTile FROM sales;
Result:
SaleID Employee Amount NTile 2 Jane 700 1 4 Jane 600 2 1 John 500 1 3 John 400 2
These aggregate functions can also be used with window functions to calculate cumulative totals or maximums over a window of data.
-
Usage: Useful for cumulative totals or tracking maximum values over a series of rows.
-
Example:
SELECT SaleID, Employee, Amount, SUM(Amount) OVER (PARTITION BY Employee ORDER BY SaleID) AS CumulativeSum, MAX(Amount) OVER (PARTITION BY Employee ORDER BY SaleID) AS MaxAmount FROM sales;
Result:
SaleID Employee Amount CumulativeSum MaxAmount 1 John 500 500 500 3 John 400 900 500 2 Jane 700 700 700 4 Jane 600 1300 700
RANGE and ROWS are used to define a frame of rows in the result set to which the window function is applied.
-
RANGE BETWEEN: Defines a window based on the value range.
-
ROWS BETWEEN: Defines a window based on the number of rows.
-
Example:
SELECT SaleID, Employee, Amount, SUM(Amount) OVER (ORDER BY SaleID ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS RowSum, SUM(Amount) OVER (ORDER BY SaleID RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RangeSum FROM sales;
Result:
SaleID Employee Amount RowSum RangeSum 1 John 500 500 500 3 John 400 900 900 2 Jane 700 700 700 4 Jane 600 1300 1300
In this example, RowSum
considers only the previous row and the current row, while RangeSum
considers all rows from the start to the current row.