SQL Learning
SQL AVG Function
SQL Learning
SQL AVG Function
Learn how to calculate average values (mean) using SQL's AVG aggregate function to discover central tendencies and insights in your numeric data.
Understanding the SQL AVG Function
The AVG
function is a core SQL aggregate function used to calculate the average (arithmetic mean) value of a set of numbers. It sums the values in a specified numeric column and divides by the count of non-NULL values, providing a crucial measure of central tendency for your data.
Basic Syntax
SELECT AVG(numeric_column_name)
FROM table_name
[WHERE condition];
Like SUM
, AVG
operates exclusively on columns with numeric data types (e.g., INTEGER, FLOAT, DECIMAL).
Common SQL AVG Interview Questions
- Explain precisely how the
AVG
function handles NULL values. - What is the difference between the mean (calculated by
AVG
) and the median? When might median be preferred? - How can you calculate a weighted average in SQL (since
AVG
provides an unweighted mean)? - Provide a real-world example where using
AVG
withGROUP BY
would be insightful. - How would you find rows with values above or below the overall average?
AVG Function Basics
Simple Average Calculation
AVG
calculates the mean by summing the non-NULL values in the column and dividing by the count of those non-NULL values.
-- Calculate the average view count across all TokTuk videos
SELECT AVG(views) AS average_platform_views
FROM Videos;
Result: A single row with the average value of the views
column. If views were [100, 200, NULL, 300], the average would be (100 + 200 + 300) / 3 = 200.
AVG and NULL Value Handling
This is a critical point: AVG
ignores NULL values entirely. It does not treat them as zero or include them in the count when calculating the average.
-- Demonstrating NULL handling
-- Assume a table with values: 10, 20, NULL
SELECT
SUM(value) AS total_sum, -- Result: 30 (10 + 20)
COUNT(value) AS non_null_count, -- Result: 2
AVG(value) AS average_val -- Result: 15 (30 / 2)
FROM SampleData;
If you need to treat NULLs as 0 for the average calculation, you must explicitly use AVG(COALESCE(column_name, 0))
. This would change the average in the example above to (10 + 20 + 0) / 3 = 10.
AVG with DISTINCT
You can use AVG(DISTINCT column_name)
to calculate the average of only the unique non-NULL values in a column.
-- Calculate the average of distinct view counts
SELECT AVG(DISTINCT views) AS avg_of_distinct_views
FROM Videos;
If view counts were [100, 200, 100, 300, NULL], this calculates the average of [100, 200, 300], resulting in (100 + 200 + 300) / 3 = 200.
Formatting AVG Results (ROUND)
Since AVG
often produces results with decimal places (even when averaging integers, depending on the database), using ROUND
, CAST
, or specific formatting functions is common for presentation.
-- Calculate average views, rounded to the nearest whole number
SELECT ROUND(AVG(views)) AS rounded_average_views
FROM Videos;
-- Calculate average views, formatted to 2 decimal places
SELECT ROUND(AVG(views), 2) AS formatted_average_views
FROM Videos;
Mean (AVG) vs. Median
- Mean (
AVG
): Sum of values divided by the count of values. Sensitive to outliers (extreme values). - Median: The middle value when data is sorted. Less sensitive to outliers.
SQL databases typically provide AVG
directly. Calculating the median usually requires window functions like PERCENTILE_CONT
or NTILE
, or more complex self-joins, depending on the specific SQL dialect. For skewed distributions (like income or view counts), the median often gives a better sense of the "typical" value than the mean.
Using AVG with GROUP BY
AVG
becomes particularly insightful when used with GROUP BY
to compute the average value for different segments or categories within your data.
Calculating Averages per Category
Compute the average value for each distinct group.
-- Calculate the average view count per user
SELECT
user_id,
AVG(views) AS average_views_per_user
FROM Videos
GROUP BY user_id -- Group rows by creator
ORDER BY average_views_per_user DESC;
This identifies users whose videos, on average, perform better or worse than others.
Combining AVG with Other Aggregates
Provide richer context by including counts or sums alongside the average.
-- Show username, video count, and average views per user
SELECT
u.username,
COUNT(v.video_id) AS number_of_videos,
AVG(COALESCE(v.views, 0)) AS average_views -- Treat NULL views as 0 if desired
FROM Users u
LEFT JOIN Videos v ON u.user_id = v.user_id
GROUP BY u.user_id, u.username
ORDER BY average_views DESC;
Seeing the video count alongside the average helps interpret the result (e.g., a high average based on only one video is less significant than a high average across many videos).
Filtering Groups Based on AVG (HAVING Clause)
Use the HAVING
clause to select only those groups whose average meets a certain criterion.
-- Find users whose average video views exceed 5000
SELECT
u.username,
AVG(v.views) AS average_views
FROM Users u
JOIN Videos v ON u.user_id = v.user_id
GROUP BY u.user_id, u.username
HAVING AVG(v.views) > 5000 -- Filter groups based on the calculated average
ORDER BY average_views DESC;
HAVING
allows filtering based on the result of the AVG()
aggregate function after the groups have been formed.
Summary
The AVG
function provides the arithmetic mean, a key measure of central tendency in SQL:
- It calculates the sum of non-NULL numeric values and divides by the count of non-NULL values.
- Crucially, it ignores NULLs. Use
COALESCE(column, 0)
if you need to include NULLs as zeros in the average calculation. - Use with
WHERE
to calculate averages for specific data subsets. - Combine with
GROUP BY
to find average values for different categories. - Use with
HAVING
to filter groups based on their calculated average. - Often used with
ROUND
or formatting functions for clearer presentation. - Remember it calculates the mean, which can be skewed by outliers; the median might sometimes be more appropriate but requires different techniques.
AVG
is indispensable for understanding typical values, comparing group performance, and identifying trends within your numeric data.