Skip to main content

5.2 JOIN Labs

Lab Joining 2 Data Sources with a Left Join

What if we wanted to return all leads with and without matched accounts? The following query with a LEFT JOIN will return all Leads that have and have not been converted to an Account.

SELECT L."Name" as "LeadName", A."Name" as "AccountName" 

FROM LEAD AS L 

LEFT JOIN ACCOUNT AS A 

ON L."ConvertedAccountId" = A."Id"

Lab Joining Two Data Sources with Full Outer Join

The following query with a LEFT JOIN will return all Leads that have and have not been converted to an Account and all Accounts that may not be associated with a Lead.

SELECT L."Name" as "LeadName", A."Name" as "AccountName" 

FROM LEAD AS L 

LEFT JOIN ACCOUNT AS A 

ON L."ConvertedAccountId" = A."Id"

After you perform a full outer join between leads and accounts, you may ask yourself three important questions:

  1. How many matches exist?
  2. How many leads exist without a converted account?
  3. How many accounts exist that were not converted from a lead?

We are not going to dig into these questions quite yet as we continue to learn and build up our working knowledge of expressions. We will refer back to this in our final lesson when we dive into a practical application of a full outer join.

Lab Joining Two Data Sources with Right Join

The following query with a RIGHT JOIN will return all Accounts that were converted from leads or exist without a lead, and only Leads that were converted to an Account. 

SELECT L."Name" as "LeadName", A."Name" as "AccountName" 

FROM LEAD AS L 

RIGHT JOIN ACCOUNT AS A 

ON L."ConvertedAccountId" = A."Id"

5.1 JOIN ON Condition

To understand a join, let’s first consider 2 queries:

Query 1 – Lead Query

SELECT "Name" as "LeadName"

FROM LEAD

Query 2 – Account Query

SELECT "Name" as "AccountName" 

FROM ACCOUNT

Table Alias

When working with tables like Leads and Accounts, where both have columns named “Id” and “Name”, aliasing becomes particularly useful to avoid ambiguity. For instance, in a SQL query involving a join between these two tables, you might encounter confusion if you just refer to “Id” or “Name”, since it’s unclear whether you’re referring to the “Id” and “Name” from Leads or Accounts.

To resolve this, you can alias the tables, like L for Leads and A for Accounts. With the table alias defined, you can now call “Id” or “Name” fields, prefixed with the alias to specify the table. If you review the revised queries below, it’s now clear which table’s “Name” you are referencing.

NOTE: Technically you do not need the keyword “AS” when you create an alias for a SQL table or column. It does however help for readability while starting writing SQL. 

Query 1 – Lead Query

SELECT L."Name" as "LeadName"

FROM LEAD AS L

Query 2 – Account Query

SELECT A."Name" as "AccountName" 

FROM ACCOUNT AS A

JON ON Condition

The ON condition in a SQL join specifies the criteria that determine how rows from two tables are matched together. It’s essentially a rule that says, “only link rows where this condition is true.” For example, in the condition ON Lead.ConvertedAccountId = Account.Id, it means that a row from the Lead table will only be joined with a row from the Account table if the ConvertedAccountId field in the Lead table matches the Id field in the Account table. This ensures that the data combined in the join result is relevant and correctly associated, linking each lead to its corresponding account based on these matching IDs.

SELECT L."Name" as "LeadName", A."Name" as "AccountName" 

FROM LEAD AS L 

JOIN ACCOUNT AS A 

ON L."ConvertedAccountId" = A."Id"

4.2.3 Translating Requirements to Code

Clarity on Translating Requirements to Code

This is where we pause for a minute on a step by step coding and take a step back to focus on an important practical application of what we are learning. Gathering requirements for SQL is similar to Salesforce where you need clear definitions and intent. It may require asking “why” multiple times to get clarification. 

In this scenario when provided guidance to count the number of states, it would seem logical to follow the same pattern. In a real world requirement gathering scenario, simply asking “why” would have revealed that I want to know how many states exist for each country. Creating new calculated fields on the fly is powerful but also requires a clear dictionary and good discipline to understand how the data will be used. 

Re-calculate State count to represent the number of unique States within each country. To calculate distinct values, we are adding a new “DISTINCT” keyword inside of the COUNT function, right before the field that you are calculating distinct values for. Counting distinct values is a big part of aggregating data. 

 COUNT (DISTINCT "STATE") "State Count"

 Our result tells us of the 19 leads, 14 of them reside in unique states.

8. Add a second dimension to group by “STATUS.” You will group by 2 dimensions. COUNTRY and then STATUS. Following some of the patterns you have learned, spend some time experimenting to see if you can add to round out your basic understanding of GROUP BY and AGGREGATION functions.


RESULT:

SELECT "COUNTRY",

 "STATUS",

 COUNT(*) as "Lead Count", 

 COUNT (DISTINCT "STATE") "State Count"

 FROM SFDC.STAGE.LEAD

 GROUP BY "COUNTRY", "STATUS";

How did you do? Hopefully, it looks something like the following result. The solution is simply add the “STATUS” value after the country and then add it to the SELECT statement too. That may not be obvious quite yet, but is a good way to learn how to use GROUP BY correctly.

I have provided a longer list of aggregation functions supported by Snowflake which may differ in some cases from other databases. There is a massive list to experiment with and learn from. 

FunctionBasic Definition
ANY_VALUEReturns any value from a group
AVGCalculates the average of a set of values
CORRComputes the correlation coefficient between two expressions
COUNTCounts the number of rows in a group
COUNT_IFCounts the number of rows in a group that meet a specific condition
COVAR_POPCalculates the population covariance
COVAR_SAMPCalculates the sample covariance
LISTAGGConcatenates values from a group into a single string
MAXFinds the maximum value in a set
MAX_BYReturns the value that corresponds to the maximum value in another set
MEDIANFinds the median value
MINFinds the minimum value in a set
MIN_BYReturns the value that corresponds to the minimum value in another set
MODEDetermines the mode (most frequently occurring value)
PERCENTILE_CONTCalculates a continuous percentile
PERCENTILE_DISCCalculates a discrete percentile
STDDEVComputes the sample standard deviation (alias for STDDEV_SAMP)
STDDEV_POPComputes the population standard deviation
STDDEV_SAMPComputes the sample standard deviation
SUMCalculates the sum of values
VAR_POPComputes the population variance
VAR_SAMPComputes the sample variance
VARIANCE_POPComputes the population variance (alias for VAR_POP)
VARIANCE, VARIANCE_SAMPComputes the sample variance (alias for VAR_SAMP)

4.2.2 Aggregate Functions

Aggregate functions perform a calculation on a set of values and return a single value. Common aggregate functions include:

  • COUNT(): Counts the number of rows in each group.
  • SUM(): Adds up the values in a specific column for each group.
  • AVG(): Calculates the average of values in a specific column for each group.
  • MAX() and MIN(): Determine the highest and lowest values in a specific column for each group, respectively.
  • COUNT(DISTINCT): Counts the number of distinct (unique) values in a column for each group.

SQL aggregations operate similarly to Salesforce, so technically you already have built and deployed aggregate functions if you have ever used Salesforce summary reports or built Salesforce Rollup fields.

  1. To add your first aggregation field, add the following SQL COUNT aggregation function. COUNT(*) after COUNTRY in your SELECT statement as shown below:
SELECT "COUNTRY",COUNT(*) FROM SFDC.STAGE.LEAD

 GROUP BY "COUNTRY";

If you leave the COUNT function without an argument within the parenthesis, it will return an error. The wild card “*” within the COUNT(*) tells the count function to count all records in the query. Do you see anything wrong with the column names in the query result result?

ANSWER

It is missing a human readable column name. When you use aggregate functions you need to declare a field name alias because you are creating this as a new column in your query result. Let’s make the correction to our query and work on the next aggregation:

SELECT "COUNTRY",COUNT(*) as "Lead Count"

FROM SFDC.STAGE.LEAD

GROUP BY "COUNTRY";
  1. Count the number of statuses per country. Add another COUNT() function to count the number of “STATE” records.

COUNT(“STATE”)

Now, the result illustrates that there are states present for the USA but not for other countries. However, my intent was not to merely count the number of records with and without a state. 

4.2.1 Grouping and Summarizing

In SQL, the GROUP BY clause and aggregate functions are fundamental tools for organizing and summarizing data. Understanding these concepts allows you to extract meaningful insights from your data sets.

GROUP BY Clause: The GROUP BY clause in SQL is used to group rows that have the same values in specified columns into summary rows. It’s often used with aggregate functions to perform calculations on each group of data. For example, you can use GROUP BY to gather all records from the same country and then use an aggregate function to count the number of records in each country.

To continue to draw reference to Salesforce, we can look to the report group feature. We are going to rebuild this same view in SQL.

Lab 5: Grouping and Summarizing Data

  1. Open a worksheet in Snowflake and Name it “Lab 6- Aggregating Data”.
  2. Enter the following SELECT statement:
SELECT "COUNTRY" FROM SFDC.STAGE.LEAD;
  1. Click the Run button.

As you can see, the result represents the COUNTRY field for every record in the table. The results of the query are not grouped automatically. Snowflake provides a nice summary analysis of the table on the right and reports there are three other countries in our data set. How do we code SQL to return that same analysis?

That is where the GROUP BY clause will assist you in rolling up COUNTRY so we can also use other aggregation functions to structure and calculate values at the right level of detail. 

  1. Add the following SQL including the GROUP BY clause to your worksheet.

The GROUP BY clause comes after the table name and after your WHERE clause, as I demonstrate in the commented code in the screenshot below.

SELECT "COUNTRY" FROM SFDC.STAGE.LEAD

 GROUP BY “COUNTRY”;


This result is much closer to the Snowflake preview but we are missing the count of records.  That is where aggregation functions come into play. 

4.1.1 Filtering Data with Conditions

In this topic, we will learn how to apply the same knowledge and experience filtering data in Salesforce reports to implement a WHERE clause in SQL to constrain data to a smaller subset. Within Salesforce reports you have filter operators that can be applied to text and operators that can be applied to values.

Filter by Text 

In a Salesforce report, if I wanted to filter the leads table to only display leads with a Country equal to ‘USA’, I would enter it like this:

The syntax in SQL places a WHERE clause after the table name. As you notice, we are defining the field name equaling a literal value of ‘USA’ in single quotes.

SELECT * FROM SFDC.STAGE.LEAD

WHERE "COUNTRY" = 'USA';
  1. Open a worksheet in Snowflake and name it “Lab 4 – Filtering Data”.
  2. Copy and paste the following statement containing the WHERE clause for leads with a COUNTRY in ‘USA’. 
  3. As a reminder, we end each query with a semicolon.
SELECT * FROM SFDC.STAGE.LEAD

WHERE "COUNTRY" = 'USA';

Running this query should result in 19 records. Let’s build this together to start lab number 4.

Adding Comments

While working on a worksheet or any standard SQL editor it is common to have one or multiple queries or subqueries that you want to test or combine. To document and provide context to your queries, you can write comments. For single-line comments, you can add 2 dashes “–” which will comment only a single line. More information on comments is available from Snowflake.

  1. Add a comment to the query worksheet that you just created: Filter data by Country = USA.

Your result should look like the following:

4.1.3 Setting an Expression to Compare Two Fields

Expressions with Multiple Conditions

To this point, we have constructed a WHERE clause with a single condition that is being evaluated to filter the resulting data. A condition is a type of expression that is often used in a WHERE clause. In simple terms, an expression is a phrase in SQL that can be evaluated to produce a value. This value can be a number, text, date, time, or any other data type that SQL supports. The way we are currently using the expression NUMBEROFLOCATIONS__C” >3 is considered a condition because it exists after the WHERE clause. There are other places where we could use this same expression like a calculated field that we will explore in later labs.

When dealing with logical operators like AND, OR, and NOT,  multiple conditions are used to define the conditions that rows in a database table must meet to be included in the query results.


There is a lot of new terminology that we just absorbed, so let’s put it into practice by creating a new query with two conditions. We want to know how many records meet both criteria where the country is the USA and the number of locations is greater than three. How do you think this would be represented in the final SQL?

  1. Add a comment: 

–Multiple conditions. USA AND > 3 locations

  1. Attempt to write a SQL query below the comments that filter results where the following conditions are true.
"COUNTRY" = 'USA'

"NUMBEROFLOCATIONS__C" >3

  1. Click your mouse cursor right before the SQL.
  2. Click on the Run button to execute the SQL.

RESULT

If you came up with the following, you are right. If you landed on a different answer resulting in an error, no problem! That is a big part of learning code syntax and SQL.

SELECT * FROM SFDC.STAGE.LEAD

WHERE "NUMBEROFLOCATIONS__C" >3 AND "COUNTRY" = 'USA';

Your result should have 16 records.

TIP – Because we are entering multiple SQL queries in the same worksheet, another way to run only one query is to highlight the code you would like to execute. Simply click and drag to select the code that you would like to run and then hit the same run button. 

In this query, you have two conditions that are evaluated independently and then combined using the AND operator. This means that a record from the LEAD table will only be selected if it satisfies both conditions: the number of locations is greater than three, and the country is the USA. This is an example of how multiple conditions can be combined to refine a query, allowing for more specific and targeted data retrieval in SQL.

Setting an Expression to Compare Two Fields

To this point, we have been manually setting the condition values in our SQL query. What if we want to evaluate how one field is related to another? Let’s say we want to check to see if the Lead Owner is equal to the same user who last modified the lead. Using the following two fields, please try to write the SQL query:  OWNERID and LASTMODIFIEDBYID. 

  1. Add a comment – OWNERID is equal to the LASTMODIFIEDBYID.
  2. Enter below the comment a new SQL query that filters the results where the OWNERID is equal to the LASTMODIFIEDBYID.
  3. Click your mouse cursor right before the SQL.
  4. Click on the Run button to execute the SQL.

Result

If you came up with the following, you are right. Technically you could have swapped the ordering of the two fields. Because we imported the data, technically for all 22 records, we are the lead owner and last person to modify the record.


Recap

We covered a lot of ground in this section to explain in SQL code how we can apply the same filtering concepts available in Salesforce reports. The concepts we learned in building expressions as conditions will serve us well when we start exploring some conditional expression functions next.

4.1.2 Standard Logical Operators

Filter by Value

We will continue our lab in the same worksheet adding additional queries. Let’s explore a WHERE clause to see how many leads have more than three locations using the field “NUMBEROFLOCATIONS__C”. How do you think we would represent this SQL?

  1. Try within your existing workbook to code the SQL for the requirement.
  2. Click your mouse cursor right before the SELECT statement.
  1. Press the Run button.

Correct SQL WHERE clause should look like the following:

SELECT * FROM SFDC.STAGE.LEAD

WHERE "NUMBEROFLOCATIONS__C" >3;

The result should return 19 records.

Standard Logical Operators

Salesforce and Snowflake share the same Logical operators as any programming language.

Salesforce EquivalentSQL Operator ValueExample Syntax
Equals=a = b
Not Equal To!= a !=b
Not Equal To<>a <> b
Less Than<a < b
Greater Than>a > b
Less or Equal<=a <= b
Greater or Equal>=a >= b
Matches both expressions (a and b).ANDa AND b
Matches either expression.ORa OR b
Does not match the expression.NOTNOT a

3.5 Create your Own Select Statement Lab

Select Individual Columns

To select individual columns, you can type the names of the columns wrapped in double quotes. It is important to note that databases vary in their syntax for selecting field names. Snowflake in particular requires double quotes when column names are case-sensitive or have spaces. For the purpose of our training, I will always use double quotes for consistency. Other databases like SQL Server allow you to use brackets [] or double quotes. Let’s explore this scenario together in lab 3:

Lab 3: Query Leads Table

  1. Name your tab Lab 3- Select Id and Name.
  2. Paste the following SQL SELECT statement into the query editor.

SELECT "ID","NAME" FROM LEAD;
  1. Include a semicolon at the end of your SQL Statement – You may notice that this example is terminated by a semi-colon. This is how you designate the end of your SQL statement. This is important particularly if you are including multiple SQL statements inside of one workbook. We will visit that later.
  2. Press run button in upper right to execute your query.

TIP: As a shortcut to clicking “run” you can press CTL + SHIFT + ENTER if you prefer keyboard shortcuts.

3. Add a couple more columns to your data set. As you will notice, Snowflake will attempt to provide fields available to.

Table, Schema, and Table Case Sensitivity

Similar to field names, your database, schema, or table names could be case-sensitive or have spaces. In these scenarios, they will need to be wrapped in double quotes like this. 

SELECT "ID","NAME" FROM “SFDC”.”STAGE”.”LEAD”;

As a general best practice I tend to keep all tables and schemas in all caps. Certainly, you shouldn’t have spaces, but it does happen. For our labs moving forward, I will continue to provide our database, schema, and table names without quotes.

Aligning Query and Analysis to Native Salesforce Reporting

If you have gotten to this point and are thinking to yourself, “that was a lot of work just to look at a table with Salesforce data,” you are correct. At this point, we have created the equivalent of a Salesforce Report vertical table report.

In fact, I created the exact same table in my Salesforce instance to get the same result.

As we dig deeper into SQL and start exploring the breadth and depth of functions and structures we will start to build a clear delineation between standard reporting in Salesforce and the advanced functions and structures using a data warehouse.