Skip to main content

4.2.2 Aggregate Functions

Aggregate functions perform a calculation on a set of values and return a single value. Common aggregate functions include:

  • COUNT(): Counts the number of rows in each group.
  • SUM(): Adds up the values in a specific column for each group.
  • AVG(): Calculates the average of values in a specific column for each group.
  • MAX() and MIN(): Determine the highest and lowest values in a specific column for each group, respectively.
  • COUNT(DISTINCT): Counts the number of distinct (unique) values in a column for each group.

SQL aggregations operate similarly to Salesforce, so technically you already have built and deployed aggregate functions if you have ever used Salesforce summary reports or built Salesforce Rollup fields.

  1. To add your first aggregation field, add the following SQL COUNT aggregation function. COUNT(*) after COUNTRY in your SELECT statement as shown below:
SELECT "COUNTRY",COUNT(*) FROM SFDC.STAGE.LEAD

 GROUP BY "COUNTRY";

If you leave the COUNT function without an argument within the parenthesis, it will return an error. The wild card “*” within the COUNT(*) tells the count function to count all records in the query. Do you see anything wrong with the column names in the query result result?

ANSWER

It is missing a human readable column name. When you use aggregate functions you need to declare a field name alias because you are creating this as a new column in your query result. Let’s make the correction to our query and work on the next aggregation:

SELECT "COUNTRY",COUNT(*) as "Lead Count"

FROM SFDC.STAGE.LEAD

GROUP BY "COUNTRY";
  1. Count the number of statuses per country. Add another COUNT() function to count the number of “STATE” records.

COUNT(“STATE”)

Now, the result illustrates that there are states present for the USA but not for other countries. However, my intent was not to merely count the number of records with and without a state. 

4.2.1 Grouping and Summarizing

In SQL, the GROUP BY clause and aggregate functions are fundamental tools for organizing and summarizing data. Understanding these concepts allows you to extract meaningful insights from your data sets.

GROUP BY Clause: The GROUP BY clause in SQL is used to group rows that have the same values in specified columns into summary rows. It’s often used with aggregate functions to perform calculations on each group of data. For example, you can use GROUP BY to gather all records from the same country and then use an aggregate function to count the number of records in each country.

To continue to draw reference to Salesforce, we can look to the report group feature. We are going to rebuild this same view in SQL.

Lab 5: Grouping and Summarizing Data

  1. Open a worksheet in Snowflake and Name it “Lab 6- Aggregating Data”.
  2. Enter the following SELECT statement:
SELECT "COUNTRY" FROM SFDC.STAGE.LEAD;
  1. Click the Run button.

As you can see, the result represents the COUNTRY field for every record in the table. The results of the query are not grouped automatically. Snowflake provides a nice summary analysis of the table on the right and reports there are three other countries in our data set. How do we code SQL to return that same analysis?

That is where the GROUP BY clause will assist you in rolling up COUNTRY so we can also use other aggregation functions to structure and calculate values at the right level of detail. 

  1. Add the following SQL including the GROUP BY clause to your worksheet.

The GROUP BY clause comes after the table name and after your WHERE clause, as I demonstrate in the commented code in the screenshot below.

SELECT "COUNTRY" FROM SFDC.STAGE.LEAD

 GROUP BY “COUNTRY”;


This result is much closer to the Snowflake preview but we are missing the count of records.  That is where aggregation functions come into play.