LogoInterview Master

SQL Learning

SQL HAVING Clause

Understanding the SQL HAVING Clause

The HAVING clause in SQL is specifically designed to filter groups based on a specified condition, typically involving an aggregate function. It works in tandem with the GROUP BY clause, allowing you to apply criteria *after* rows have been grouped and aggregate calculations (like COUNT, SUM, AVG, MIN, MAX) have been performed.

Think of it this way: WHERE filters individual rows before they enter the grouping process, while HAVING filters the final summary rows produced by GROUP BY.

Basic Syntax

SELECT column_name(s), AGGREGATE_FUNCTION(column_name)
FROM table_name
[WHERE condition]       -- Filters rows BEFORE grouping
GROUP BY column_name(s)
HAVING condition_involving_aggregate_function -- Filters groups AFTER grouping
[ORDER BY column_name(s)];

Key HAVING Concepts & Interview Questions

  • What is the fundamental difference between the WHERE clause and the HAVING clause? When must you use HAVING?
  • Can HAVING be used in a query without a GROUP BY clause? (Technically yes in some DBs, acting like WHERE, but it's unconventional and confusing).
  • Provide an example where you would filter groups based on their COUNT or SUM.
  • How does the logical processing order of SQL clauses (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY) explain the difference between WHERE and HAVING?
  • Can you use columns that are *not* aggregated and *not* in the GROUP BY clause within the HAVING condition? (Generally no, similar to the SELECT list rule).

HAVING vs. WHERE: The Crucial Difference

WHERE filters ROWS, HAVING filters GROUPS

  • WHERE Clause: Operates on individual rows. Filters data before any grouping or aggregation occurs. You cannot use aggregate functions (like COUNT(*), SUM(views)) directly in the WHERE clause because these functions operate on groups, which haven't been formed yet.
  • HAVING Clause: Operates on the summary rows created by GROUP BY. Filters data after grouping and aggregation have occurred. Its primary purpose is to filter based on the results of aggregate functions.

Example: Finding Frequently Interacting Users

Let's find users who have made more than 5 interactions (likes, comments, etc.).

-- Correct use of HAVING
SELECT
  user_id,
  COUNT(*) AS interaction_count
FROM Interactions
GROUP BY user_id
HAVING COUNT(*) > 5 -- Filter groups based on the aggregated count
ORDER BY interaction_count DESC;

-- Incorrect use - WHERE cannot use aggregates
-- SELECT user_id, COUNT(*) FROM Interactions WHERE COUNT(*) > 5 GROUP BY user_id; -- ERROR!

We must use HAVING COUNT(*) > 5 because the condition depends on the result of the COUNT(*) aggregate function, which is only available after the GROUP BY operation.

Using Both WHERE and HAVING

Combine both clauses for powerful, multi-stage filtering.

-- Find users from 'Canada' who created at least 2 videos in 2023
SELECT
  v.user_id,
  COUNT(v.video_id) AS count_2023_videos
FROM Videos v
JOIN Users u ON v.user_id = u.user_id
WHERE u.country = 'Canada' -- Filter for Canadian users first (BEFORE grouping)
  AND STRFTIME('%Y', v.upload_date) = '2023' -- Filter for 2023 videos first (BEFORE grouping)
GROUP BY v.user_id
HAVING COUNT(v.video_id) >= 2 -- Filter groups based on the count (AFTER grouping)
ORDER BY count_2023_videos DESC;

Query Flow:

  1. Join Videos and Users.
  2. WHERE keeps only rows for Canadian users and 2023 videos.
  3. GROUP BY groups the remaining rows by user_id.
  4. COUNT calculates the number of videos in each user group.
  5. HAVING keeps only those user groups where the count is 2 or more.
  6. SELECT returns the user ID and their count.
  7. ORDER BY sorts the final result.

Practical HAVING Examples with TokTuk

Finding Videos with High Average Views

Identify videos whose average interaction timestamp is recent (indicating ongoing engagement).

-- Find videos whose average interaction time is within the last 7 days
SELECT
  video_id,
  COUNT(*) AS interaction_count,
  MAX(timestamp) AS last_interaction -- Using MAX to find the latest timestamp
FROM Interactions
GROUP BY video_id
HAVING MAX(timestamp) >= DATE('now', '-7 days') -- Filter groups where the latest interaction is recent
ORDER BY last_interaction DESC;

This query helps find content that is still actively being engaged with. We group interactions by video and use HAVING to filter for videos where the maximum (latest) interaction timestamp falls within the last week.

Identifying Users with Significant Total Views

-- Find users whose total video views exceed 10,000
SELECT
  v.user_id,
  u.username,
  SUM(v.views) AS total_user_views
FROM Videos v
JOIN Users u ON v.user_id = u.user_id
GROUP BY v.user_id, u.username
HAVING SUM(v.views) > 10000 -- Filter based on the sum of views for the user
ORDER BY total_user_views DESC;

This identifies the platform's "power creators" based on their cumulative view count.

Finding Categories (e.g., Users) Meeting Multiple Criteria

Apply multiple aggregate conditions in the HAVING clause.

-- Find users who have uploaded at least 3 videos AND have an average view count > 1000
SELECT
  v.user_id,
  u.username,
  COUNT(v.video_id) AS video_count,
  AVG(v.views) AS average_views
FROM Videos v
JOIN Users u ON v.user_id = u.user_id
GROUP BY v.user_id, u.username
HAVING COUNT(v.video_id) >= 3 -- Condition 1 on COUNT
   AND AVG(v.views) > 1000    -- Condition 2 on AVG
ORDER BY average_views DESC;

This finds users who are both reasonably prolific (at least 3 videos) and consistently popular (average views > 1000).

Advanced HAVING Scenarios

Filtering Based on Ratios or Expressions of Aggregates

Conditions in HAVING can involve calculations based on multiple aggregate functions.

-- Find videos where the number of likes is more than double the number of comments
SELECT
  video_id,
  COUNT(CASE WHEN interaction_type = 'like' THEN 1 END) AS likes,
  COUNT(CASE WHEN interaction_type = 'comment' THEN 1 END) AS comments
FROM Interactions
GROUP BY video_id
HAVING COUNT(CASE WHEN interaction_type = 'like' THEN 1 END) > 2 * COUNT(CASE WHEN interaction_type = 'comment' THEN 1 END)
   AND COUNT(CASE WHEN interaction_type = 'comment' THEN 1 END) > 0 -- Avoid division by zero issues implicitly
ORDER BY likes DESC;

This uses conditional aggregation (COUNT with CASE) and then filters groups based on the ratio between the calculated like and comment counts.

Using Subqueries within HAVING (Less Common, Use with Care)

While possible, comparing an aggregate result to a value derived from a subquery within HAVING can sometimes be complex or less efficient than using CTEs or joining pre-aggregated results.

-- Find users whose average video views exceed the overall platform average
SELECT
  user_id,
  AVG(views) AS user_avg_views
FROM Videos
GROUP BY user_id
HAVING AVG(views) > (SELECT AVG(views) FROM Videos) -- Subquery in HAVING
ORDER BY user_avg_views DESC;

This compares each user's average view count against the global average calculated in a subquery. While valid, alternative approaches using window functions or CTEs might be preferred for readability or performance in some databases.

Summary

The HAVING clause is the designated tool in SQL for filtering based on the results of aggregate functions applied to groups:

  • It filters groups after the GROUP BY clause has formed them and aggregate functions have been computed.
  • It contrasts with the WHERE clause, which filters individual rows before grouping.
  • Conditions in HAVING typically involve aggregate functions like COUNT, SUM, AVG, MIN, MAX.
  • You can use multiple conditions combined with AND or OR within a single HAVING clause.
  • It allows you to answer questions like "Which categories have a total sum greater than X?" or "Which groups have an average value below Y?".

Correctly understanding and applying the distinction between WHERE and HAVING is crucial for writing accurate and efficient aggregate queries in SQL, and it's a common point of discussion in technical interviews.

Loading SQL editor...