Question: How to return the 2 largest cities for each country?
Assume we have the following table definition and data:
CREATE TABLE cities ( city VARCHAR(80), country VARCHAR(80), population INT ); INSERT INTO cities VALUES ('New York', 'United States', 8175133); INSERT INTO cities VALUES ('Los Angeles', 'United States', 3792621); INSERT INTO cities VALUES ('Chicago', 'United States', 2695598); INSERT INTO cities VALUES ('Paris', 'France', 2181000); INSERT INTO cities VALUES ('Marseille', 'France', 808000); INSERT INTO cities VALUES ('Lyon', 'France', 422000); INSERT INTO cities VALUES ('London', 'United Kingdom', 7825300); INSERT INTO cities VALUES ('Birmingham', 'United Kingdom', 1016800); INSERT INTO cities VALUES ('Leeds', 'United Kingdom', 770800);
To get the 2 largest cities for each country, you can use the following query in MySQL:
SELECT city, country, population FROM (SELECT city, country, population, @country_rank := IF(@current_country = country, @country_rank + 1, 1) AS country_rank, @current_country := country FROM cities ORDER BY country, population DESC ) ranked WHERE country_rank <= 2;
The query returns:
city | country | population |
Paris | France | 2181000 |
Marseille | France | 808000 |
London | United Kingdom | 7825300 |
Birmingham | United Kingdom | 1016800 |
New York | United States | 8175133 |
Los Angeles | United States | 3792621 |
Explanation of the MySQL query:
Currently MySQL does not support ROW_NUMBER() function that can assign a sequence number within a group, but as a workaround we can use MySQL session variables.
These variables do not require declaration, and can be used in a query to do calculations and to store intermediate results.
@current_country := country |
This code is executed for each row and stores the value of country column to @current_country variable.
@country_rank := IF(@current_country = country, @country_rank + 1, 1) |
In this code, if @current_country is the same we increment rank, otherwise set it to 1. For the first row @current_country is NULL, so rank is also set to 1.
For correct ranking, we need to have ORDER BY country, population DESC
So if we just execute the subquery:
SELECT city, country, population, @country_rank := IF(@current_country = country, @country_rank + 1, 1) AS country_rank, @current_country := country FROM cities ORDER BY country, population DESC
We get the list of cities ranked by population within the country:
city | country | population | country_rank | current_country |
Paris | France | 2181000 | 1 | France |
Marseille | France | 808000 | 2 | France |
Lyon | France | 422000 | 3 | France |
London | United Kingdom | 7825300 | 1 | United Kingdom |
Birmingham | United Kingdom | 1016800 | 2 | United Kingdom |
Leeds | United Kingdom | 770800 | 3 | United Kingdom |
New York | United States | 8175133 | 1 | United States |
Los Angeles | United States | 3792621 | 2 | United States |
Chicago | United States | 2695598 | 3 | United States |
When we have a rank assigned to each city within its country, we can retrieve the required range:
-- Get top 2 for each country SELECT city, country, population FROM (/*subquery above*/) ranked WHERE country_rank <= 2; -- Get the city with 3rd population for each country SELECT city, country, population FROM (/*subquery above*/) ranked WHERE country_rank = 3;
In Oracle, SQL Server and PostgreSQL you can achieve the same functionality using ROW_NUMBER function:
SELECT city, country, population FROM (SELECT city, country, population, ROW_NUMBER() OVER (PARTITION BY country ORDER BY population DESC) as country_rank FROM cities) ranked WHERE country_rank <= 2;
This query works in Oracle, SQL Server and PostgreSQL without any changes and returns:
city | country | population |
Paris | France | 2181000 |
Marseille | France | 808000 |
London | United Kingdom | 7825300 |
Birmingham | United Kingdom | 1016800 |
New York | United States | 8175133 |
Los Angeles | United States | 3792621 |