Skip to main content

5.1.1 Characteristics of a Key

When working with any relational database you have two methods to relate two tables.

Primary Key: Uniquely identifies each row within a table.

Salesforce for example as a business app, has a primary key on every field called “Id.”

Foreign Key: References a primary key in another table, establishing a relationship between tables.

Is ConvertedAccountId a a key in the Lead Table?

  • ConvertedAccountId is not a primary key because it does not uniquely identify each record in the Lead table. It’s possible for multiple leads to point to the same account.
  • ConvertedAccountId acts as a foreign key if it references the primary key (likely AccountId) in the Account table. This makes it a relational link between leads and accounts.

In relational databases, particularly business applications keys are required and enforced to perform CRUD (CREATE, READ, UPDATE and DELETE) operations and to index and performance tune your database. Snowflake by design does not require you to formally define primary keys when you build a table, nor does it enforce any constraints on the data you load into your tables. This course does not cover data warehouse design, but if you are interested you can view the following documents on Snowflake constraints.

While we are on the topic of keys, one realted topic is indexing. Snowflake does not use traditional indexes commonly found in relational database management systems (RDBMS). Snowflake uses its own proprietary micro-partitions, keys, an metadata to auto-optimize query performance.

8.6 Opportunity Query

With our Lead Query created, next, we will create an Opportunity query. Create a SELECT statement for SFDC.STAGE.OPPORTUNITY with a table alias of “O” and include the following fields:

Id, CreatedDate, Revenue, Stage, AccountId, CloseWonDate

If you find that the first attempt does not work perfectly, use the errors to help guide you along the way. Here are some tips:

  1. Did you include commas after the field name?
  2. Did you add the table Alias correctly?
  3. Did you declare you get single vs double quotes correct?
  4. Did you remember to alias your table and fields?

Opportunity Query Result:

SELECT 
O.Id,
O.CreatedDate,
O.ExpectedRevenue,
O.StageName,
O.AccountId,
O.CloseDate,
O.Type
FROM SFDC.STAGE.OPPORTUNITY O;

Revenue Pipeline

NameObject & FIeld OriginDate DimensionDescriptionDefinition
Revenue PipelineOpportunity.RevenueOppotunity.CreatedDateOpportunity RevenueTotal Revenue pipeline (open and closed opportunities)

Our next metric is called “Revenue Pipeline.” The distinction of pipeline indicates that we are measuring revenue at any Opportunity stage. This is a great example where having well-written definitions and a universal understanding of metrics and facts are very important. Because there are no implied constraints or calculations required, you can simply use the Opportunity Revenue field in your query. As it turns out we don’t need to change. As it turns out, the standard field as it stands within the opportunity is sufficient and already included in our query.

Result

O.ExpectedRevenue

New Customer Won Opportunities

NameObject & FIeld OriginDate DimensionDescriptionDefinition
New Customer Won OpportunitiesOpportunity.StageOppotunity.CloseWonDateOpportunity COUNT WHERE Stage = ‘Closed Won’ AND Type = ‘New Customer’Count of won opportunities for new customers

The next metric is “New Customer Won Opportunities” where you get to put your conditional expression skills to use again. This time, you have two conditions. Give it a shot on your own.

  1. Did you declare you get single vs double quotes correctly?
  2. Did you utilize AND / OR to add two conditions?

Result

CASE
WHEN O.StageName = 'Closed Won' AND O.Type = 'New Customer' then 1
ELSE 0 END "New Customer Won Opportunities"

Customers

NameObject & FIeld OriginDate DimensionDescriptionDefinition
Customers Opportunity.Id,
Opportunity.AccountId,Opportunity.Stage
Oppotunity.CloseWonDateDISTINCT Accounts with  WHERE Opportunity Stage = ‘Closed Won’Distinct count of Accounts where Closed-Won Opportunities > 0 

Our final metric is designed to measure the total number of distinct customers in Salesforce. A “Customer” in this case has a very specific definition that requires an Account to have at least 1 closed-won opportunity. Unlike the other metrics, I will share how we achieved this metric in Salesforce. I would need to build an Opportunity report that filters data for Closed-Won deals and then calculate Unique Account IDs.

Creating this metric in SQL is no different. In the spirit of breaking down a problem into its smallest parts, we need to:

  1. Isolate AccountIds that have a Closed-Won opportunity.
  2. Aggregate and COUNT the distinct AccountIds to determine  our “Customers” metric.

At this point, while writing our SQL for an Opportunity we can only perform 1 of the steps. Create a field that returns the AccountIDs for Opportunities that have reached a Closed-Won stage. 

Hints

  1. You have technically built a nearly identical field.
  2. Rather than a 0/1, you want to return the AccountId field.
  3. If the condition is false (meaning that the opportunity was not closed won), you should just return a NULL value.

Result

CASE
WHEN O.StageName = 'Closed Won' THEN O.AccountId
ELSE NULL END "Closed Won AccountIDs",

Or you can use

IFF  (O.StageName = 'Closed Won' ,O.AccountId, NULL)
"Closed Won AccountIDs2"

As a recap for this field, we have not calculated the final metric, but we have created a calculated field that we need to accurately measure “Customers”. We will visit this aggregate metric at the same time that we work on the Qualified Lead to Won Opportunity Ratio.

Final Opportunity Query Result

SELECT
O.Id,
O.CreatedDate,
O.ExpectedRevenue,
O.StageName,
O.AccountId,
O.CloseDate,
O.Type,
CASE
WHEN O.StageName = 'Closed Won' AND O.Type = 'New Customer' then 1
ELSE 0 END "New Customer Won Opportunities",
CASE
WHEN O.StageName = 'Closed Won' THEN O.AccountId
ELSE NULL END "Closed Won AccountIDs",
FROM SFDC.STAGE.OPPORTUNITY O;

9.3 Concluding your Learning

We have concluded the hands-on learning for introduction to SQL. We covered a lot of ground to give you a strong foundation to build from. There are several intermediate to advanced topics that you can explore on your own:

Additional Reading and Traditional Reading and Learning Paths

This course was purposefully built for the Salesforce ecosystem but there are other incredible resources that cover Snowflake and general data engineering.

Other Functions to Expand Your SQL and Data Skills

Together we covered the basics for query and analysis of your Salesforce data. To further expand your arsenal of SQL and data abilities you can go much deeper into the following topics:

Administration: Creating and managing tables, schemas, and databases learning Database Definition Language functions.

Advanced-Data Functions: Create SQL stored procedures and custom user-defined functions with Python or Javascript.

Expanded Data Types: Work with semi-structured data like JSON and XML or unstructured data.

Statistical Functions: Perform advanced analytics and statistical analysis with a large collection of statistical functions

Advanced Data Modeling: Understanding complex data modeling techniques such as star schemas, snowflake schemas, and normalization/denormalization processes can significantly improve how you structure and access data. Window Functions are part of this learning path.

  • Data Warehousing Best Practices: Learning about partitioning strategies, clustering data for performance, and managing large datasets can be crucial for optimizing your Snowflake environment.

Performance Tuning and Optimization: This involves learning how to write efficient SQL queries, use caching effectively, and optimize data storage and retrieval in Snowflake to ensure high performance and cost efficiency.

  • Data Security and Compliance: Understanding roles, access controls, data masking, encryption, and compliance standards within Snowflake is key to maintaining data security and privacy.
  • Data Governance and Quality: Learn about implementing data governance policies, data lineage, cataloging, and ensuring data quality within Snowflake to maintain reliable and accurate data for analysis.
  • Time Travel and Zero-Copy Cloning: Explore Snowflake’s unique features like Time Travel for accessing historical data and Zero-Copy Cloning for duplicating databases, schemas, or tables without additional storage costs. This is particularly important with new Salesforce DataCloud and Snowflake integrations.
  • Continuous Data Protection: Learn about Snowflake’s data protection features that allow for recovery from accidental data changes or deletions, ensuring data resilience and continuity.

Snowpipe: Delve into using Snowpipe for continuous, automated loading of data into Snowflake, enabling near-real-time processing of data streams.

9.2 Best Practices

Nested Queries: Use for small, intermediate calculations or filters.

CTEs: Use for breaking down complex queries into manageable parts, especially when the same subquery logic is needed multiple times within the query.

Derived Tables: Use for creating temporary tables within a query for intermediate results, particularly useful when the derived results are only needed for a single query and not multiple queries.

Views: Use for encapsulating complex logic and creating reusable query interfaces.

Performance Considerations

  • Ensure subqueries and CTEs are optimized to avoid performance issues.
  • View views are running slow and data does not change often, consult with your data team about structuring your view as a “materialized view”. A materialized view runs your SQL as your underlying data changes instead of waiting for an end user to execute the query.

Maintaining Readability and Simplicity

  • Keep queries simple and modular.
  • Use meaningful names for CTEs and views.
  • Document complex logic as comments to avoid understanding and maintenance.
  • Lean on AI bots and co-pilot to review your code, help document logic, and suggest enhancements.

9.1 Abstracting and Reusing Queries in SQL

Nested Queries (Subqueries)

A nested query, or subquery, is a query within another query. We explored our first sub-query as extra credit in our union. can be used to perform intermediate calculations or to filter results based on complex criteria. There are a couple kinds of sub-queries:

Inline Subqueries

These subqueries are used in the SELECT clause to calculate values on the fly as individual fields.

Example:

SELECT EmployeeID, 

       (SELECT MAX(Salary) 

        FROM Salaries 

        WHERE EmployeeID = Employees.EmployeeID) AS MaxSalary

FROM Employees;

Correlated Subqueries

These subqueries reference columns from the outer query and are evaluated once for each row processed by the outer query. Correlated subqueries are very powerful but can be challenging to understand, so I recommend adding comments to explain the outcome like the following:

Example:

–The WHERE EXISTS clause ensures that only employees who have at least one entry in the Salaries table with a matching EmployeeID and a salary greater than 100,000 are included in the result.

SELECT EmployeeID, EmployeeName

FROM Employees E

WHERE EXISTS (SELECT 1 

              FROM Salaries S 

              WHERE S.EmployeeID = E.EmployeeID AND S.Salary > 100000);

Derived Tables

Definition and Purpose

A derived table is a subquery that appears in the FROM clause of an SQL statement. The result set of the subquery is treated as a table, which can be queried by the outer query. Derived tables are defined using a subquery within parentheses in the FROM clause.

SELECT DepartmentID, AvgSalary

FROM (

    SELECT DepartmentID, AVG(Salary) AS AvgSalary

    FROM Salaries

    GROUP BY DepartmentID

) AS AvgSalaryByDept;

Common Table Expressions (CTEs)

A CTE provides a way to create temporary result sets that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. Instead of a Derived table for example, you can isolate and abstract multiple layers of complexity into CTEs. CTEs are defined using the WITH keyword. 

In the following CTE, we create an employees table that we SELECT from.

WITH EmployeeCTE AS (

    SELECT EmployeeID, EmployeeName, DepartmentID

    FROM Employees

    WHERE DepartmentID = 1

)

SELECT EmployeeID, EmployeeName

FROM EmployeeCTE;

Using this method dramatically simplifies complex joins and subqueries. By breaking down complex queries into manageable parts via CTEs it also helps readability and maintainability.

Database Views

Building upon the abstraction concepts for CTEs and Derived Tables, you can permanently save your SQL code for re-usability using Views. You may or may not have permission to create views in your database. A view is a virtual table based on the result set of a SELECT query. Views do not store data themselves but provide a way to simplify complex queries and enhance reusability.

Creating and Using Views

CREATE VIEW ActiveEmployees AS

SELECT EmployeeID, EmployeeName, DepartmentID

FROM Employees

WHERE IsActive = 1;

Using views encapsulates complex SQL logic and provides a consistent and reusable query interface. Because views exist as an object in your database, security and permissions can be granted to the view, thus potentially eliminating the need to give direct access to the source tables.

8.11 Documentation and Delivery

In any data analysis, data science, or data engineering project, clear and thorough documentation is as crucial as the technical execution itself. Proper documentation ensures that your SQL queries, data models, and analytical insights are accessible, understandable, and usable by others, including your future self. This section will cover the importance of documenting SQL queries and their intended outcomes, along with best practices for effective documentation and delivery of your work.

Knowledge Sharing: Documentation acts as a knowledge base, allowing team members to learn from existing work. It helps new team members get up to speed and supports cross-functional collaboration.

Reproducibility and Consistency: Documenting the context, assumptions, and parameters used in your queries ensures that results can be reproduced and verified, which is critical for maintaining data integrity and consistency across reports and analyses.

Maintenance and Scalability: As data projects evolve, queries and data models need to be updated or optimized. Comprehensive documentation makes maintenance easier and supports the scalability of your data infrastructure.

Best Practices for Documenting SQL Queries

Commenting Within Queries: Use comments liberally within your SQL scripts to explain complex logic, the purpose of specific clauses, and the rationale behind choosing certain approaches over others. Comments can be inline or block comments, depending on the complexity of the explanation.

Query Descriptions: At the beginning of each SQL script or stored procedure, include a descriptive header that outlines the script’s purpose, author, creation date, and any modifications along with their dates and reasons.

Data and Metrics Dictionary: Maintain a data dictionary or schema documentation that describes each table and column used in your queries, including data types, constraints, and any relationships between tables. This is especially important in complex databases with numerous tables and fields.

Assumptions and Context: Document any assumptions made during the query development process, such as data cleanliness, uniqueness of key fields, or time zone considerations. Include context about the business logic or analytical goals driving the query’s design.

Result Interpretation: Guide how to interpret the results of your query, especially for complex analyses that might not be immediately intuitive. Include any caveats or limitations of the analysis.

Version Control: Use version control systems like GIT to manage your SQL scripts. This allows for tracking changes over time, understanding the evolution of your queries, and facilitating collaboration among team members.

External Documentation: For complex projects, supplement in-code comments with external documentation, such as Wiki pages, README files, or data analysis reports. These documents can provide broader context, detailed methodology, and analytical insights derived from the queries.

8.9 Pulling Leads and Opportunities into Your Joined Query

The queries from this lab have been re-organized logically before we combine them:

LEADS QUERY

SELECT  
L.Id, 
L.CreatedDate, 
L.ConvertedOpportunityId, 
L.ConvertedAccountId, 
L.Status, 
L.ConvertedDate, 

--CALCULATED FIELDS

'1' as "Lead Count", 
CASE WHEN L.Status = 'Open - Not Contacted' then 0 ELSE 1 END "Qualified Leads" 
FROM 
  SFDC.STAGE.LEAD L;

OPPORTUNITY QUERY

SELECT  
O.Id, 
O.CreatedDate, 
O.ExpectedRevenue, 
O.StageName, 
O.AccountId, 
O.CloseDate, 
O.Type, 
CASE WHEN O.StageName = 'Closed Won' 
AND O.Type = 'New Customer' then 1 ELSE 0 END "New Customer Won Opportunities", 
CASE WHEN O.StageName = 'Closed Won' THEN O.AccountId ELSE NULL END "Closed Won AccountIDs" 
FROM 
  SFDC.STAGE.OPPORTUNITY O;

FULL OUTER JOIN ON LEAD AND OPPORTUNITY

SELECT 
  L.Id as "Lead Id", 
  L.ConvertedAccountId as "Converted Lead Id", 
  O.Id as "Opportunity Id", 
  O.AccountId "AccountId" 
FROM 
  SFDC.STAGE.LEAD L FULL 
  OUTER JOIN SFDC.STAGE.OPPORTUNITY AS O ON L.ConvertedAccountId = O.AccountId;

Step 1: Attempt to merge and append the Lead fields into the full outer Join query.

Click to Reveal Results

Results

RESULT: Your result will appear and run as the following where we have added the calculated lead fields.

SELECT 
  L.Id as "Lead Id", 
  L.ConvertedAccountId as "Converted Lead Id", 
  L.CreatedDate, 
  L.ConvertedOpportunityId, 
  L.Status, 
  L.ConvertedDate, 
  O.Id as "Opportunity Id",   
  O.AccountId "AccountId",
  '1' as "Lead Count", 
  CASE WHEN L.Status = 'Open - Not Contacted' then 0 ELSE 1 END "Qualified Leads" 

FROM 
  SFDC.STAGE.LEAD L FULL 
  FULL OUTER JOIN SFDC.STAGE.OPPORTUNITY AS O ON L.ConvertedAccountId = O.AccountId;

Step 2: Attempt to merge the Opportunity fields into the full outer Join query.

Tips:

  1. You only need to include the Opportunity fields and calculated fields. Do not repeat the SELECT statement.
  2. You may have errors and that is Okay. We will troubleshoot and resolve them together!
Click to Reveal Results

Results

RESULT: Your result should look like the following where you have appended only the Opportunity fields

SELECT 
  L.Id as "Lead Id", 
  L.ConvertedAccountId as "Converted Lead Id", 
  L.CreatedDate, 
  L.ConvertedOpportunityId, 
  L.Status, 
  L.ConvertedDate, 
  '1' as "Lead Count", 
  CASE WHEN L.Status = 'Open - Not Contacted' then 0 ELSE 1 END "Qualified Leads", 
  O.Id as "Opportunity Id", 
  O.AccountId "AccountId", 
  O.CreatedDate, 
  O.ExpectedRevenue, 
  O.StageName, 
  O.CloseDate, 
  O.Type, 
  CASE WHEN O.StageName = 'Closed Won' 
  AND O.Type = 'New Customer' then 1 ELSE 0 END "New Customer Won Opportunities", 
  CASE WHEN O.StageName = 'Closed Won' THEN O.AccountId ELSE NULL END "Closed Won AccountIDs" 
FROM 
  SFDC.STAGE.LEAD L FULL 
  OUTER JOIN SFDC.STAGE.OPPORTUNITY AS O ON L.ConvertedAccountId = O.AccountId;

Resolving SQL Runtime Errors

In this particular example, and in many joined Salesforce tables you will run into conflicting column names. As a result, you will want to get into the habit of aliasing field names that are common across objects.

8.7 Joining Leads and Opportunities

Join on Lead ConvertedOpportunityId

With our two queries defined, it’s time to figure out how we will bring them together via Join. We are going to explore a simpler query to explore our join options in a practical way.

We want to join the Lead and Opportunity tables. To do so, we will first explore joining Lead and Opportunity ON  Lead.ConvertedOpportunityId = OpportunityId.

Before watching the video, trying this exercise on your own.

Click to Reveal Results

Results

SELECT
L.Id,
L.ConvertedOpportunityId,
O.Id  
FROM SFDC.STAGE.LEAD L
LEFT JOIN SFDC.STAGE.OPPORTUNITY O
ON L.ConvertedOpportunityId = O.Id;

Explore How many Converted Opportunities exist?

To determine the total number of opportunities you can count the number of DISTINCT ConvertedOpportunityId records. Snowflake will save you some code by simply clicking on the column and then clicking into the analysis. We see that 4 of 23 records are converted opportunities.

This highlights a real-world consideration for Salesforce in particular that not every Opportunity originates from a Lead unless it is enforced and controlled by your business. Salesforce supports workflows that allow leads to be converted to accounts without an opportunity, and accounts and opportunities to exist without leads. Because this training is designed as a practical application of SQL, let’s first adjust our SQL:

Alter the Join on Lead ConvertedAccountId

Business case: In Salesforce, when you convert a lead to an account, there are many enterprises that do so without an opportunity. To design our query to accommodate this scenario, we are going to modify our query to use the ConvertedAccountId.

Requirement: Instead of joining on ConvertedOpportunityId, lets join on ConvertedAccountId to O.AccountId

Also, let’s alias the “Id” field for the lead as “Lead Id” and the ConvertedAccountId as “Converted Lead Id”

Click to Reveal Results

Results

SELECT
L.Id as "Lead Id",
L.ConvertedAccountId as "Converted Lead Id",
FROM SFDC.STAGE.LEAD L
LEFT JOIN SFDC.STAGE.OPPORTUNITY AS O
ON L.ConvertedAccountId = O.AccountId

As you can see, in our lab data, we get the same result. Our lab data is designed around leads that are converted to opportunities. In a real-world Salesforce org, you could have multiple leads that are convert to an accounts because of lead merging.

Finalizing our Model with a Full Outer Join

Using the ConvertedAccountId, let’s see what happens when we apply a FULL outer Join. This is something we performed earlier in lesson five.

Requirements

  1. Change the query to a full outer join
  2. Add the Opportunity Id field and alias it as “Opportunity Id” so we have visibility to opportunities that did not originate from a lead.
  3. Add the “AccountId” from the Opportunity table.
Click to Reveal Results

Results

If your query looks like the following, you are on track!


SELECT
L.Id as "Lead Id",
L.ConvertedAccountId as "Converted Lead Id",
O.Id as "Opportunity Id",
O.AccountId "AccountId"
FROM SFDC.STAGE.LEAD L
FULL OUTER JOIN SFDC.STAGE.OPPORTUNITY AS O
ON L.ConvertedAccountId = O.AccountId

8.5 Create New Lead Fields with Expressions

To produce some of the metrics in our glossary we will use conditional expressions we learned about in chapter 6.

Lead Count

NameObject & FIeld OriginDate DimensionDescriptionDefinition
Lead CountLead.IdLead.CreatedDateAll LeadsAll leads loaded into Salesforce

Let’s create our first metric which takes us to our very first lab. If you wanted to represent the number ‘1’ for every record and call the field “Lead Count”, how would you format your SQL?

Result: 

'1' as "Lead Count"

Qualified Leads

NameObject & FIeld OriginDate DimensionDescriptionDefinition
Qualified Leads
Lead.LeadStatusLead.CreatedDateLeads WHERE LeadStatus NOT = ‘Open – Not Contacted’Qualified leads are those that are no longer Open-Not Contacted

For our second metric, we want to count “Qualified Leads”, which are represented as leads where the LeadStatus is not “Open – Not Contacted”. We will use the work you performed in lesson 6 to create a conditional expression. There are several approaches to solving this problem. 

Result: 

CASE
WHEN L.Status = 'Open - Not Contacted' then 
ELSE 1 END "Qualified Leads"

You can also use:

IFF (L.Status = 'Open - Not Contacted', 0 ,1) "Qualified Leads2"

Qualified Lead to Won Opportunity Ratio

NameObject & FIeld OriginDate DimensionDescriptionDefinition
Qualified Lead to Opportunity RatioLead.ConvertedOpportunityId
Opportunity.StageName
Lead.CreatedDateWon Opportunities / Total Qualified LeadsRatio of Won Opportunities divided by  Qualified Leads

Our third metric is a ratio which requires a numerator and denominator to successfully create the metric. We have already created the denominator, Total Qualified Leads. To create the numerator, “Won Opportunities” we need data from the Opportunity Object which will come later.

Additionally, the ratio we are looking for is a rollup/ aggregated value, so that will require some additional SQL that we will create together. We will come back and revisit this Lead to Won opportunity ratio metric later in this lesson!

Final Lead Query Result

SELECT 
L.Id,
L.CreatedDate,
L.ConvertedOpportunityId,
L.ConvertedAccountId,
L.Status,
L.ConvertedDate,
'1' as "Lead Count",
CASE
WHEN L.Status = 'Open - Not Contacted' then 0
ELSE 1 END "Qualified Leads",
FROM SFDC.STAGE.LEAD L;

Other query considerations based on previous lessons:

Alias

As you may have noticed, we only have alias fields that originate directly from the Leads table and not newly created fields.

WHERE (filter Data)

For the lead object, we have not defined any constraints that would filter and remove lead records from our result. In other words, we do not need a WHERE clause.

GROUP BY (group and aggregate)

At this stage, the lead record does not need to be aggregated. In other rods, we do not need a GROUP BY clause.