LogoInterview Master

SQL Learning

SQL AVG Function

Understanding the SQL AVG Function

The AVG function is a core SQL aggregate function used to calculate the average (arithmetic mean) value of a set of numbers. It sums the values in a specified numeric column and divides by the count of non-NULL values, providing a crucial measure of central tendency for your data.

Basic Syntax

SELECT AVG(numeric_column_name)
FROM table_name
[WHERE condition];

Like SUM, AVG operates exclusively on columns with numeric data types (e.g., INTEGER, FLOAT, DECIMAL).

Common SQL AVG Interview Questions

  • Explain precisely how the AVG function handles NULL values.
  • What is the difference between the mean (calculated by AVG) and the median? When might median be preferred?
  • How can you calculate a weighted average in SQL (since AVG provides an unweighted mean)?
  • Provide a real-world example where using AVG with GROUP BY would be insightful.
  • How would you find rows with values above or below the overall average?

AVG Function Basics

Simple Average Calculation

AVG calculates the mean by summing the non-NULL values in the column and dividing by the count of those non-NULL values.

-- Calculate the average view count across all TokTuk videos
SELECT AVG(views) AS average_platform_views
FROM Videos;

Result: A single row with the average value of the views column. If views were [100, 200, NULL, 300], the average would be (100 + 200 + 300) / 3 = 200.

AVG and NULL Value Handling

This is a critical point: AVG ignores NULL values entirely. It does not treat them as zero or include them in the count when calculating the average.

-- Demonstrating NULL handling
-- Assume a table with values: 10, 20, NULL
SELECT
  SUM(value) AS total_sum,   -- Result: 30 (10 + 20)
  COUNT(value) AS non_null_count, -- Result: 2
  AVG(value) AS average_val   -- Result: 15 (30 / 2)
FROM SampleData;

If you need to treat NULLs as 0 for the average calculation, you must explicitly use AVG(COALESCE(column_name, 0)). This would change the average in the example above to (10 + 20 + 0) / 3 = 10.

AVG with DISTINCT

You can use AVG(DISTINCT column_name) to calculate the average of only the unique non-NULL values in a column.

-- Calculate the average of distinct view counts
SELECT AVG(DISTINCT views) AS avg_of_distinct_views
FROM Videos;

If view counts were [100, 200, 100, 300, NULL], this calculates the average of [100, 200, 300], resulting in (100 + 200 + 300) / 3 = 200.

Formatting AVG Results (ROUND)

Since AVG often produces results with decimal places (even when averaging integers, depending on the database), using ROUND, CAST, or specific formatting functions is common for presentation.

-- Calculate average views, rounded to the nearest whole number
SELECT ROUND(AVG(views)) AS rounded_average_views
FROM Videos;

-- Calculate average views, formatted to 2 decimal places
SELECT ROUND(AVG(views), 2) AS formatted_average_views
FROM Videos;

Mean (AVG) vs. Median

  • Mean (AVG): Sum of values divided by the count of values. Sensitive to outliers (extreme values).
  • Median: The middle value when data is sorted. Less sensitive to outliers.

SQL databases typically provide AVG directly. Calculating the median usually requires window functions like PERCENTILE_CONT or NTILE, or more complex self-joins, depending on the specific SQL dialect. For skewed distributions (like income or view counts), the median often gives a better sense of the "typical" value than the mean.

Using AVG with GROUP BY

AVG becomes particularly insightful when used with GROUP BY to compute the average value for different segments or categories within your data.

Calculating Averages per Category

Compute the average value for each distinct group.

-- Calculate the average view count per user
SELECT
  user_id,
  AVG(views) AS average_views_per_user
FROM Videos
GROUP BY user_id -- Group rows by creator
ORDER BY average_views_per_user DESC;

This identifies users whose videos, on average, perform better or worse than others.

Combining AVG with Other Aggregates

Provide richer context by including counts or sums alongside the average.

-- Show username, video count, and average views per user
SELECT
  u.username,
  COUNT(v.video_id) AS number_of_videos,
  AVG(COALESCE(v.views, 0)) AS average_views -- Treat NULL views as 0 if desired
FROM Users u
LEFT JOIN Videos v ON u.user_id = v.user_id
GROUP BY u.user_id, u.username
ORDER BY average_views DESC;

Seeing the video count alongside the average helps interpret the result (e.g., a high average based on only one video is less significant than a high average across many videos).

Filtering Groups Based on AVG (HAVING Clause)

Use the HAVING clause to select only those groups whose average meets a certain criterion.

-- Find users whose average video views exceed 5000
SELECT
  u.username,
  AVG(v.views) AS average_views
FROM Users u
JOIN Videos v ON u.user_id = v.user_id
GROUP BY u.user_id, u.username
HAVING AVG(v.views) > 5000 -- Filter groups based on the calculated average
ORDER BY average_views DESC;

HAVING allows filtering based on the result of the AVG() aggregate function after the groups have been formed.

Summary

The AVG function provides the arithmetic mean, a key measure of central tendency in SQL:

  • It calculates the sum of non-NULL numeric values and divides by the count of non-NULL values.
  • Crucially, it ignores NULLs. Use COALESCE(column, 0) if you need to include NULLs as zeros in the average calculation.
  • Use with WHERE to calculate averages for specific data subsets.
  • Combine with GROUP BY to find average values for different categories.
  • Use with HAVING to filter groups based on their calculated average.
  • Often used with ROUND or formatting functions for clearer presentation.
  • Remember it calculates the mean, which can be skewed by outliers; the median might sometimes be more appropriate but requires different techniques.

AVG is indispensable for understanding typical values, comparing group performance, and identifying trends within your numeric data.

Loading SQL editor...