Clarity on Translating Requirements to Code
This is where we pause for a minute on a step by step coding and take a step back to focus on an important practical application of what we are learning. Gathering requirements for SQL is similar to Salesforce where you need clear definitions and intent. It may require asking “why” multiple times to get clarification.
In this scenario when provided guidance to count the number of states, it would seem logical to follow the same pattern. In a real world requirement gathering scenario, simply asking “why” would have revealed that I want to know how many states exist for each country. Creating new calculated fields on the fly is powerful but also requires a clear dictionary and good discipline to understand how the data will be used.
Re-calculate State count to represent the number of unique States within each country. To calculate distinct values, we are adding a new “DISTINCT” keyword inside of the COUNT function, right before the field that you are calculating distinct values for. Counting distinct values is a big part of aggregating data.
COUNT (DISTINCT "STATE") "State Count"
Our result tells us of the 19 leads, 14 of them reside in unique states.
8. Add a second dimension to group by “STATUS.” You will group by 2 dimensions. COUNTRY and then STATUS. Following some of the patterns you have learned, spend some time experimenting to see if you can add to round out your basic understanding of GROUP BY and AGGREGATION functions.
RESULT:
SELECT "COUNTRY",
"STATUS",
COUNT(*) as "Lead Count",
COUNT (DISTINCT "STATE") "State Count"
FROM SFDC.STAGE.LEAD
GROUP BY "COUNTRY", "STATUS";
How did you do? Hopefully, it looks something like the following result. The solution is simply add the “STATUS” value after the country and then add it to the SELECT statement too. That may not be obvious quite yet, but is a good way to learn how to use GROUP BY correctly.
I have provided a longer list of aggregation functions supported by Snowflake which may differ in some cases from other databases. There is a massive list to experiment with and learn from.
Function | Basic Definition |
---|---|
ANY_VALUE | Returns any value from a group |
AVG | Calculates the average of a set of values |
CORR | Computes the correlation coefficient between two expressions |
COUNT | Counts the number of rows in a group |
COUNT_IF | Counts the number of rows in a group that meet a specific condition |
COVAR_POP | Calculates the population covariance |
COVAR_SAMP | Calculates the sample covariance |
LISTAGG | Concatenates values from a group into a single string |
MAX | Finds the maximum value in a set |
MAX_BY | Returns the value that corresponds to the maximum value in another set |
MEDIAN | Finds the median value |
MIN | Finds the minimum value in a set |
MIN_BY | Returns the value that corresponds to the minimum value in another set |
MODE | Determines the mode (most frequently occurring value) |
PERCENTILE_CONT | Calculates a continuous percentile |
PERCENTILE_DISC | Calculates a discrete percentile |
STDDEV | Computes the sample standard deviation (alias for STDDEV_SAMP) |
STDDEV_POP | Computes the population standard deviation |
STDDEV_SAMP | Computes the sample standard deviation |
SUM | Calculates the sum of values |
VAR_POP | Computes the population variance |
VAR_SAMP | Computes the sample variance |
VARIANCE_POP | Computes the population variance (alias for VAR_POP) |
VARIANCE, VARIANCE_SAMP | Computes the sample variance (alias for VAR_SAMP) |