
6.4 COALESCE

COALESCE returns the first non-NULL value from a list of expressions. It’s like using a formula in Salesforce that checks multiple fields and returns the first field that has a value. This is particularly useful for data that might be spread across multiple fields, ensuring you get a result even if some fields are empty.

Salesforce, as a system of record for customer data, has numerous scenarios where similarly typed data can exist across multiple fields. Phone, email, and address fields are great examples.


For example, let’s say you want to prepare a list of customers by phone number so you can deliver a call list. COALESCE is a great way to accomplish this task.


Let’s explore a specific SQL case together:

COALESCE(MobilePhone, HomePhone, WorkPhone, 'No Phone Number Available')


In this case, COALESCE will return the first non-NULL value. If all three phone fields are NULL, 'No Phone Number Available' is returned.
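To see the expression above in context, here is a sketch of a call-list query. The column names (including WORKPHONE) are illustrative; adjust them to match the phone fields that actually exist in your Lead table.

```sql
-- Sketch: build a call list with the first available phone number per lead.
-- Column names are assumptions; substitute your own schema's fields.
SELECT
  "FIRSTNAME",
  "LASTNAME",
  COALESCE("MOBILEPHONE", "HOMEPHONE", "WORKPHONE", 'No Phone Number Available')
    AS "Call Number"
FROM SFDC.STAGE.LEAD;
```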

Mastering conditional expressions in SQL, such as CASE, IFF, and COALESCE, equips you with a toolkit for dynamic data manipulation and analysis, akin to the capabilities provided by Salesforce formula fields. 

6.3 IFF

IFF is a shorthand version of the CASE statement, available in some SQL dialects (Snowflake calls it IFF; T-SQL has the similar IIF). It's a straightforward way to write a conditional expression, similar to the "IF" function in Excel or Salesforce formulas. It evaluates a condition and returns one value if the condition is true and another if it's false.

SELECT 

IFF("NUMBEROFLOCATIONS__C" > 1 AND "COUNTRY" = 'USA', 'US Multi-Location', 'Other') "Location Classification"

FROM SFDC.STAGE.LEAD

6.2 Real-world Salesforce Formula vs SQL – Where Should Logic Reside?

We just explored a function that could be implemented inside of Salesforce. So, if you are wondering why we wouldn’t simply implement this kind of CASE statement inside of Salesforce, your line of thinking is correct. Several considerations determine when and where to apply logic as you explore the world of SQL and Snowflake.

  • Use case/application: Data science and analytics are the number one reason data gets manipulated outside of Salesforce. For example, if you wanted to build a Lead Score model, you could engineer a dozen features. Similarly, data cleansing and preparation are typically required for analytics, and that work can be re-used to help with master data management. 
  • Who manages the business definition/logic and change management: This is likely the primary driver for where to apply your expression.
  • Use within business process: The Salesforce core data model is very flexible and powerful, but when over-taxed with formulas for reporting use cases, you can find yourself in a precarious situation as you grow. Generally, I advise pushing analytics, data science, and data blending/data management to the data warehouse. Another red flag suggesting you belong in a data warehouse is when you replicate the same data across multiple objects simply to accommodate reporting edge cases.
  • Frequency of change: When business logic changes frequently, a well-intended Salesforce formula turns into a change management nightmare.

6.1 – CASE

The CASE statement in SQL is like a series of “IF-THEN” statements in Salesforce formula fields. It lets you set conditions and outcomes within your query. For example, you might use CASE to categorize accounts based on their revenue, much like you might use a formula field to categorize leads in Salesforce.

The good news is if you have created a CASE statement within a Salesforce formula, you can re-use that knowledge in SQL.

CASE

  WHEN <condition1> THEN <result1>

  [ WHEN <condition2> THEN <result2> ]

  [ ... ]

  [ ELSE <result3> ]

END

LAB:

Let's take the SQL from an earlier lab and enhance it with a CASE statement:

In a previous lab, we used the WHERE clause to filter our leads by country and number of locations. What if we wanted to classify and label these locations alongside the others instead of filtering our data? A CASE statement is a simple and efficient way to label data when there is a small number of labels.

SELECT *,

CASE

  WHEN "NUMBEROFLOCATIONS__C" > 1 AND "COUNTRY" = 'USA' THEN 'US Multi-Location'

  WHEN "NUMBEROFLOCATIONS__C" =1 AND "COUNTRY" = 'USA' THEN 'US Single Location'

ELSE 'Other' END "Location Classification"

 FROM SFDC.STAGE.LEAD;

5.3 Profiling & Analyzing Joined Data

Now that you have joined two tables together, there is some profiling, analysis, and critical thinking required to ensure your result will accurately reflect your desired output. There are numerous issues that can arise where your results will misrepresent your desired results. In theory, you wanted to understand Leads that are converted. An inner join will ensure only records where L."ConvertedAccountId" and A."Id" match are returned.

What about leads without a ConvertedAccountId value?

Leads that are missing the ConvertedAccountId field are not returned.

What about joining 2 tables containing fields with NULL values?
Snowflake (or any SQL database) will not create a Cartesian product due to NULL values in the join conditions. Instead, it will ignore rows in LEAD where ConvertedAccountId is NULL, because those NULL values won’t find any matching Id in the ACCOUNT table (since NULL does not equal anything, not even another NULL). The result set will only include leads that have a non-null ConvertedAccountId that matches an Id in the ACCOUNT table.
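You can verify this NULL behavior directly. Comparing NULL to anything yields NULL rather than TRUE, which is why those rows never satisfy the join condition:

```sql
-- NULL comparisons never evaluate to TRUE, so joins drop those rows.
SELECT
  NULL = NULL  AS "Null Equals Null",   -- NULL, not TRUE
  NULL = 'USA' AS "Null Equals Value",  -- NULL
  NULL IS NULL AS "Is Null Check";      -- TRUE (use IS NULL to test for NULLs)
```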

How can you determine how many Leads are missing from Accounts?

The simple way to make this determination is to check for the existence of a foreign key:

SELECT COUNT(*) AS "Leads Count" FROM LEAD WHERE "ConvertedAccountId" IS NULL;

This works for simple examples. Over time, as you learn and find better ways to solve problems, you will likely accumulate your own library of SQL scripts for profiling and determining record counts.

How can you determine how many Accounts are missing from the Leads Table?
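The IS NULL check above only finds leads with an empty foreign key. A more general pattern, sketched below under the assumption that LEAD and ACCOUNT follow the structure used in the earlier labs, is an anti-join: outer join the two tables and count the rows with no match on the other side.

```sql
-- Leads whose ConvertedAccountId does not match any Account
SELECT COUNT(*) AS "Leads Missing From Accounts"
FROM LEAD AS L
LEFT JOIN ACCOUNT AS A ON L."ConvertedAccountId" = A."Id"
WHERE L."ConvertedAccountId" IS NOT NULL
  AND A."Id" IS NULL;

-- Accounts that no converted lead points at
SELECT COUNT(*) AS "Accounts Missing From Leads"
FROM ACCOUNT AS A
LEFT JOIN LEAD AS L ON L."ConvertedAccountId" = A."Id"
WHERE L."Id" IS NULL;
```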

5.5 Combining two data sources with UNION

In SQL, a UNION is used to combine and stack the result sets of two or more SELECT queries. When you create a union, each SELECT query must have the same number of columns in the result sets with similar data types. This operation allows you to stack and consolidate data from different queries into a single result, which is particularly useful when dealing with disparate data sources or different activities that need to be analyzed together.

In Salesforce, there isn't a direct equivalent of a union that you would normally work with. However, in the world of reporting and analytics, unions are extremely powerful for connecting and understanding activity data. For example, if you wanted to report total leads created, total opportunities created, and deals closed this week, you would need to run three different reports. Each report would require a different date/time field and filter criteria. You could display these statistics together in a dashboard, but they would remain effectively disconnected. A union can easily combine these three data sets into a single structure that can be loaded, pivoted, and analyzed.

SQL Union Syntax

With this conceptual understanding in place, the union SQL syntax follows the same pattern: each query is written out and combined using the UNION keyword. The basic syntax to combine two SELECT statements is as follows:

SELECT column1, column2, ...
FROM table1
UNION
SELECT column1, column2, ...
FROM table2;

5.6 Enhancing UNION Results

Labeling each query results with a Constant

When you use a union to combine two query results, you lose the context of where each row originated. If you want to pivot and analyze this result, that context is very important. Let's enhance each query with a "SEGMENT" constant. We learned about adding constants in our first lab, so this should be a refresher. 

Add a constant to query 1 called “Segment” with the value ‘New Leads’

Add a constant to query 2 called “Segment” with the value ‘Closed Won Deals’

Your new SQL query should look like this:

SELECT ID,CREATEDDATE,STATUS,'New Leads' AS SEGMENT FROM LEAD
WHERE CREATEDDATE >= '2021-04-01' AND CREATEDDATE <= '2021-04-30'

UNION ALL

SELECT ID,CLOSEDATE,STAGENAME,'Closed Won Deals' AS SEGMENT FROM OPPORTUNITY
WHERE CLOSEDATE >= '2021-04-01' AND CLOSEDATE <= '2021-04-30' AND STAGENAME = 'Closed Won';

Assigning Consistent Column Names
Snowflake will utilize the column names from your first query. The ordering of the columns, NOT the column names, dictates how data is combined within your union. 

It’s a best practice to use aliases to rename your columns for consistency. It ensures you are thoughtfully merging data while retaining the original table column name as context.

SELECT ID,CREATEDDATE as "ACTIVITY DATE",STATUS,'New Leads' AS SEGMENT FROM LEAD
WHERE CREATEDDATE >= '2021-04-01' AND CREATEDDATE <= '2021-04-30'

UNION ALL

SELECT ID,CLOSEDATE as "ACTIVITY DATE",STAGENAME,'Closed Won Deals' AS SEGMENT FROM OPPORTUNITY
WHERE CLOSEDATE >= '2021-04-01' AND CLOSEDATE <= '2021-04-30' AND STAGENAME = 'Closed Won';

Column Data Types

It is important that you align the column data types between tables. For example a VARCHAR (text) column shouldn’t be unioned to an INTEGER (number).
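When types do differ, you can align them explicitly with a cast before the union. This sketch assumes a numeric custom field being combined with a text column:

```sql
-- Cast the numeric column to text so both branches return VARCHAR
SELECT "STATUS" AS "Label" FROM LEAD
UNION ALL
SELECT CAST("NUMBEROFLOCATIONS__C" AS VARCHAR) AS "Label" FROM LEAD;
```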

Column Count Mismatches

If one query returns a different number of columns than the other, Snowflake will reject the union with a column-count error. 
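A common fix for a column-count mismatch, sketched below, is to pad the shorter query with a NULL placeholder so both branches line up:

```sql
-- The LEAD branch returns three columns; pad the OPPORTUNITY branch to match
SELECT ID, CREATEDDATE, STATUS FROM LEAD
UNION ALL
SELECT ID, CLOSEDATE, NULL AS STATUS FROM OPPORTUNITY;
```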

UNION VS UNION ALL

A UNION will automatically remove exact duplicates from your result. A UNION ALL will leave all records in your result. Our lab pulled data from two different tables with distinct stage and status fields, so a duplicate would have been impossible.


To further understand UNION vs UNION ALL, let's say we had these two tables that we wanted to union together.

TableA

ID | Name
1  | Alice
2  | Bob

TableB

ID | Name
2  | Bob
3  | Charlie

Our data has a duplicate record for 2 | Bob. A UNION will remove the duplicate record; a UNION ALL will return both instances of 2 | Bob. If you start using unions to combine data from the same table, the UNION vs UNION ALL distinction becomes even more important for duplicate removal.

Combined Result Set (with UNION)

ID | Name
1  | Alice
2  | Bob
3  | Charlie
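You can reproduce this example directly in Snowflake with inline VALUES clauses, no tables required:

```sql
-- UNION removes the duplicate row (2, 'Bob'): three rows returned
SELECT * FROM (VALUES (1, 'Alice'), (2, 'Bob')) AS TableA(ID, Name)
UNION
SELECT * FROM (VALUES (2, 'Bob'), (3, 'Charlie')) AS TableB(ID, Name);

-- UNION ALL keeps both copies of (2, 'Bob'): four rows returned
SELECT * FROM (VALUES (1, 'Alice'), (2, 'Bob')) AS TableA(ID, Name)
UNION ALL
SELECT * FROM (VALUES (2, 'Bob'), (3, 'Charlie')) AS TableB(ID, Name);
```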

5.7 Extra Credit – Putting Union Data to Work

To this point, we have been writing code and viewing results. To show the value of the data set you just created, I exported the data and pulled it into Excel to pivot and aggregate it. As you can see, in one data set we can view two independent activities together in a funnel. When you look at Salesforce pipeline data, this activity-based view is extremely powerful for understanding in-month activities.

From what you have learned so far, we can quickly get to this same summarized count by month and segment by wrapping our UNION query in a SELECT statement with GROUP BY. Here is an example that you can test and experiment with:

-- Adding the to_char function changes the detailed date format into a month/year text field.

SELECT  to_char("ACTIVITY DATE", 'yyyy-mm') "ACTIVITY DATE" ,SEGMENT,COUNT(ID) "COUNT" FROM

--Wrap the previous lab Union Query in opening and closing parentheses which creates a derived table for you to select from.

(
SELECT ID,CREATEDDATE as "ACTIVITY DATE",STATUS,'New Leads' AS SEGMENT FROM LEAD
WHERE CREATEDDATE >= '2021-04-01' AND CREATEDDATE <= '2021-04-30'

UNION ALL

SELECT ID,CLOSEDATE as "ACTIVITY DATE",STAGENAME,'Closed Won Deals' AS SEGMENT FROM OPPORTUNITY
WHERE CLOSEDATE >= '2021-04-01' AND CLOSEDATE <= '2021-04-30' AND STAGENAME = 'Closed Won'
)

-- The same to_char expression in the GROUP BY groups rows at the month/year level.

GROUP BY to_char("ACTIVITY DATE", 'yyyy-mm'),SEGMENT;

5.4 Normalized vs Denormalized Data

Understanding Normalized vs Denormalized Data

While working in Salesforce, you are working within a relational data model that is "normalized". When you join these tables together, you may be denormalizing your data, which helps with data analysis but, without care, can change the meaning of your data by introducing what would be perceived as "duplicate" records.

Additional Profiling Tips

When you join two Salesforce data sources together, you have to consider one-to-many relationships between objects and the resulting information/data that you are delivering.

A quick way to check for duplicate data introduced by a join is to compare total record counts against distinct record counts by ID.

For example, you can take the previous RIGHT join and perform aggregate calculations with COUNT and COUNT DISTINCT to give you visibility into any duplicate records the join introduced, thus denormalizing your data.

SELECT

  COUNT(*) AS "Total Joined Records",

  COUNT(DISTINCT A."Id") AS "Distinct Account IDs"

FROM LEAD AS L

RIGHT JOIN ACCOUNT AS A ON L."ConvertedAccountId" = A."Id";

Guidance on working with JOINs

Understanding SQL joins is a critical building block for working with Salesforce data, because analysis and analytics output often rely on multiple tables of data and lookups in Salesforce. Whether you're working with Salesforce reports or querying a database directly, knowing how to join tables enables you to retrieve comprehensive datasets that provide valuable insights.

By grasping the concepts of inner joins, left joins, right joins, and full outer joins, you gain control over how data is merged, allowing you to tailor your queries to specific requirements. Moreover, mastering join conditions ensures that the combined data is relevant and accurately reflects the relationships between the tables.

As you continue your SQL journey, remember to consider the implications of different join types on your query results and how they align with your data analysis goals. With practice and proficiency in SQL joins, you’ll be equipped to navigate complex datasets and extract meaningful information to drive informed decision-making.

There are intermediate to advanced concepts for dealing with junction tables, one-to-many relationships, and deduping records that we don't cover in our introductory training, but you should be aware of them. I recommend spending time practicing joins extensively, because they will become a staple of your SQL skills as you advance.
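As a preview of one of those advanced concepts, Snowflake's QUALIFY clause paired with ROW_NUMBER() is a common deduplication pattern. This sketch keeps only the most recently created lead per email address (assuming EMAIL and CREATEDDATE columns exist on your LEAD table):

```sql
-- Keep one row per email: the lead with the latest CREATEDDATE
SELECT *
FROM LEAD
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY "EMAIL"
  ORDER BY "CREATEDDATE" DESC
) = 1;
```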