
 About Ryan Goodman

Ryan Goodman has been in the business of data and analytics for 20 years as a practitioner, executive, and technology entrepreneur. Ryan recently created DataTools Pro after 4 years working in small business lending as VP of Analytics and BI, where he implemented an analytics strategy and competency center for the modern data stack, data science, and governance. Drawing on his recent experience as a customer and now running DataTools Pro full time, Ryan writes regularly for Salesforce Ben and PactBub on the topics of Salesforce, Snowflake, analytics, and AI.

Salesforce Document Management And Attachment Export Made Easy

Document Madness

As Salesforce has expanded its reach beyond CRM over the past 15 years, its document management capabilities have kept pace. Many businesses use Salesforce to attach various types of documents to specific records, right out of the box. With built-in support and compatibility with third-party tools like DocuSign, Salesforce makes uploading and retrieving documents simple for end users.

But what if your document library grows and suddenly you find yourself needing to batch-process, transfer, or migrate these files? You might end up down a technical rabbit hole, exploring Salesforce’s data models, APIs, and a plethora of third-party tools, all without finding a straightforward way to extract your documents from Salesforce.com.

We had 4 key requirements that led us to create our own solution:

  1. We needed to select and tag a finite number of Salesforce records for which documents existed.
  2. We needed to select specific file types.
  3. We required only the latest version of each document.
  4. We needed to rename and prefix the files with data from the Salesforce record and organize the files into subfolders.

How Do We Make Exporting Salesforce Documents Easy?

In my mind, simplicity is the level of effort and friction required to reach the desired outcome. In my case, I chose Azure Data Factory because of my experience and success building data pipelines with it. Configuring a document migration in Azure Data Factory is still a few hours of work, but the effort to execute the migration is a single click. This article explores the complexities you need to understand before embarking on a Salesforce document migration.

Understanding Salesforce Attachment vs ContentDocument

When you’re in the trenches of Salesforce’s data architecture, trying to extract documents, you’ll encounter two main objects: Attachment and ContentDocument. These objects function differently when it comes to extraction, and understanding their nuances is crucial for a smooth operation.

Attachments are straightforward but limited. If your documents are stored as Attachments, you’ll likely need to perform a record-by-record extraction via Salesforce’s API, because each Attachment is directly tied to a single Salesforce record. It’s a one-to-one relationship, which keeps the extraction logic simple.
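To make that one-to-one pattern concrete, here is a minimal Python sketch of the record-by-record extraction using the Salesforce REST API and the requests library. The instance URL, access token, parent record IDs, and output folder are placeholders, and result pagination is omitted; this is an illustration, not the pipeline described later in this article.

    import os
    import requests

    INSTANCE_URL = "https://yourInstance.my.salesforce.com"   # placeholder
    API = INSTANCE_URL + "/services/data/v58.0"
    HEADERS = {"Authorization": "Bearer <ACCESS_TOKEN>"}      # placeholder token

    parent_ids = ["0065e00000AbCdE"]  # hypothetical IDs of the records tagged for export

    # 1. List the Attachments tied to each tagged parent record
    soql = ("SELECT Id, Name, ParentId FROM Attachment "
            "WHERE ParentId IN ('" + "','".join(parent_ids) + "')")
    rows = requests.get(API + "/query", headers=HEADERS, params={"q": soql}).json()["records"]

    # 2. Download each Attachment body one record at a time (pagination omitted for brevity)
    os.makedirs("export", exist_ok=True)
    for row in rows:
        body = requests.get(API + "/sobjects/Attachment/" + row["Id"] + "/Body", headers=HEADERS)
        with open(os.path.join("export", row["ParentId"] + "_" + row["Name"]), "wb") as f:
            f.write(body.content)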

In contrast, ContentDocument is part of Salesforce’s more modern and robust Files architecture. It allows for file versioning and can be associated with multiple records via the ContentDocumentLink junction object. However, it’s not all smooth sailing here either; Salesforce restricts your ability to perform bulk queries and downloads of ContentDocument objects. You may have to employ more sophisticated methods, including code or third-party tools.

Overcoming Key Limitations for Managing Salesforce Document Metadata

Query All Files in Salesforce

By default, Salesforce limits your ability to query and extract metadata for all of your documents. To query all files, your Salesforce admin needs to add a permission set. This article explains step by step how to use Query All Files to obtain a complete list of your documents.

With this data, you technically have all of the metadata you need to start downloading files. The problem is that unless you have configured the ParentId to associate your documents with another object, you lose the context of what each document is related to. In other words, you could download your vendor invoice but have no data to tell you which customer or deal that invoice belongs to.
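For illustration, once Query All Files is enabled, a single SOQL query against ContentVersion returns the latest-version metadata for every file. Here is a rough Python sketch; the instance URL, token, and file-type filter are assumptions:

    import requests

    API = "https://yourInstance.my.salesforce.com/services/data/v58.0"  # placeholder
    HEADERS = {"Authorization": "Bearer <ACCESS_TOKEN>"}                # placeholder

    # Latest version of every file, limited to the file types we care about
    soql = ("SELECT Id, ContentDocumentId, Title, FileExtension, ContentSize "
            "FROM ContentVersion "
            "WHERE IsLatest = true AND FileExtension IN ('pdf', 'docx')")

    resp = requests.get(API + "/query", headers=HEADERS, params={"q": soql}).json()
    files = resp["records"]
    # For large libraries, follow resp["nextRecordsUrl"] to page through the full result set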

ContentDocumentLink is the Missing Link

To connect your Salesforce documents to their parent records, Salesforce provides a junction object called ContentDocumentLink. The problem is that Salesforce does not allow you to query and download all of its records in bulk.

To solve this problem, I employed Azure Data Factory to obtain the full list of documents, select only the document records I care about, and then query ContentDocumentLink one document at a time to fill a database table with the results. At this point, to make my life simple, I also appended the data points from the parent Opportunity record that I would ultimately use to rename my files and subfolders.
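The sketch below mimics that loop in plain Python rather than Azure Data Factory: it batches ContentDocumentId values into the filter that ContentDocumentLink requires, keeps only links to Opportunities, and appends the fields used later for renaming. The prefix and subfolder choices are illustrative, not the exact pipeline described above.

    import requests

    API = "https://yourInstance.my.salesforce.com/services/data/v58.0"  # placeholder
    HEADERS = {"Authorization": "Bearer <ACCESS_TOKEN>"}                # placeholder

    def query(soql):
        return requests.get(API + "/query", headers=HEADERS, params={"q": soql}).json()["records"]

    # Documents we care about (latest versions only)
    doc_ids = [r["ContentDocumentId"] for r in
               query("SELECT ContentDocumentId FROM ContentVersion WHERE IsLatest = true")]

    # ContentDocumentLink must be filtered by Id, so query it in batches rather than all at once
    links = []
    for i in range(0, len(doc_ids), 200):
        batch = "','".join(doc_ids[i:i + 200])
        links += query("SELECT ContentDocumentId, LinkedEntityId FROM ContentDocumentLink "
                       "WHERE ContentDocumentId IN ('" + batch + "')")

    # Keep links that point at Opportunities (key prefix 006) and append fields used for renaming
    for link in [l for l in links if l["LinkedEntityId"].startswith("006")]:
        opp = query("SELECT Name, StageName FROM Opportunity "
                    "WHERE Id = '" + link["LinkedEntityId"] + "'")[0]
        link["FilePrefix"] = opp["Name"]       # e.g. '<Opportunity Name>_<file title>'
        link["SubFolder"] = opp["StageName"]   # hypothetical subfolder scheme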

Bulk Download Salesforce Documents

With a database table containing all of my ContentDocumentLink and ContentDocument records, I built an Azure Data Factory flow that used the Salesforce REST API to GET each file one by one and load it into Azure Blob Storage. I could have loaded the files into another storage solution like Google Drive but opted to keep them in Azure.
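If you are not using Azure Data Factory, the same GET-and-load pattern can be scripted. Below is a hedged Python sketch using the requests and azure-storage-blob libraries; the connection string, container name, and the inline documents list (standing in for the metadata table described above) are assumptions.

    import requests
    from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

    API = "https://yourInstance.my.salesforce.com/services/data/v58.0"  # placeholder
    HEADERS = {"Authorization": "Bearer <ACCESS_TOKEN>"}                # placeholder

    blob_service = BlobServiceClient.from_connection_string("<AZURE_STORAGE_CONNECTION_STRING>")
    container = blob_service.get_container_client("salesforce-documents")  # hypothetical container

    # In practice these rows come from the metadata table built earlier; shown inline for clarity
    documents = [{"VersionId": "068XXXXXXXXXXXXXXX", "Title": "Invoice", "FileExtension": "pdf",
                  "FilePrefix": "Acme Corp", "SubFolder": "Closed Won"}]

    for doc in documents:
        # GET the binary content of the latest ContentVersion for this document
        binary = requests.get(API + "/sobjects/ContentVersion/" + doc["VersionId"] + "/VersionData",
                              headers=HEADERS)
        blob_name = "{SubFolder}/{FilePrefix}_{Title}.{FileExtension}".format(**doc)
        container.upload_blob(name=blob_name, data=binary.content, overwrite=True)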

Are you Bulk Exporting Documents and Attachments?

We are always looking for better and faster ways to get data in and out of Salesforce. If you have another third-party tool or process that makes this faster and easier, we would love to work with you!

If you need help offloading the effort to get your documents and attachments out of Salesforce, feel free to book a meeting with us.

Ultimate Salesforce and Snowflake Guide on Salesforce Ben


This week Ryan released a guide to Salesforce and Snowflake on Salesforce Ben, the leading independent Salesforce community and authority on all things Salesforce.

Snowflake and Salesforce are a perfect marriage of cloud business applications and a cloud data platform, turning data into information. Salesforce has built a powerful first-class integration within Salesforce Data Cloud that is the most advanced of any third-party connectivity.

If you are currently using Salesforce Data Cloud or Salesforce Tableau CRM, this article is for you. Additionally, while at Snowflake Summit 2023, we saw some incredible Salesforce Data Cloud enhancements for Snowflake that will be game changing for enterprise customers.

We can’t wait to write about the upcoming zero-copy feature from Salesforce to Snowflake. Included in our article are step-by-step tutorials on how to integrate Salesforce with Snowflake today. Should you have any questions about how these capabilities apply to your enterprise or how Snowflake can advance your Salesforce analytics, we are here to help!

Snowflake and Microsoft Expand their Data and AI Partnership


Snowflake and Microsoft announced at Snowflake Summit 2023 that they are expanding their partnership, promising substantial advancements for data scientists and developers. This enhanced collaboration is set to seamlessly merge Snowflake’s Data Cloud with Microsoft’s Azure ML, extending its capabilities through the potent combination of Azure OpenAI and Microsoft Cognitive Services.

This strategic alliance means that shared Snowflake and Microsoft Azure customers will gain access to the cutting-edge frameworks of Azure ML, a streamlined process for machine learning development right through to production, along with integrated continuous integration and continuous deployment (CI/CD) processes.

But this partnership doesn’t stop there. Snowflake is setting its sights on creating even more meaningful integrations with a host of Microsoft offerings, aiming to elevate the user experience even further. These plans include closer ties with Purview for advanced data governance, Power Apps & Automate for simplified, low code/no code application development, Azure Data Factory for efficient ELT processes, and Power BI for intuitive data visualization, among others.

The end goal? To foster a seamless ecosystem that capitalizes on the synergies between Snowflake and Microsoft’s product suites, unlocking new possibilities and delivering unparalleled value to users.

At DataTools Pro, we couldn’t be more excited to see our favorite data platform, Snowflake, gain new enhancements that make data management easier. Azure balances powerful data management with scalable cost that makes sense for our clients. Additionally, Power BI continues to advance its dominance in business intelligence. We have been working with Snowflake and Microsoft together for years and have built a toolkit that can help you jumpstart Snowflake and Azure integration.

Learn how to use Azure Data Factory and Snowflake Together

We have created free, interactive, step-by-step tutorials to help you get started!

Create a Snowflake Data Source in Azure Data Factory

Create a Data Pipeline to Connect Salesforce to Snowflake

Publish your ADF Pipeline, Data Sets, and Triggers

Create an ADF Scheduled Trigger

VIEW ALL TUTORIALS

Azure Data Factory for Snowflake Articles

More Getting Started Tutorials

Snowflake Warehouse Management with ROI in Mind

Snowflake Cost Management

If you’re new to Snowflake, you might be confused by the term “Warehouse”. Don’t let it fool you: in Snowflake’s context, a Warehouse refers to virtual compute resources rather than a physical storage place. Snowflake Warehouse management for small BI and analytics teams is fairly straightforward if you start off on the right foot.

A majority of Snowflake’s cost is based on warehouse (compute) utilization. Therefore, it’s crucial to be thoughtful about how you design and deploy your Warehouses to optimize your usage and minimize your cost.

Segmentation of Warehouses

One of the key factors in optimizing your Snowflake Warehouses is segmentation by use case and spend category. For instance, our Snowflake instance currently consists of 5 warehouses, each serving a specific purpose. We started with X-Small or Small instances, which can process thousands up to tens of millions of records, and gradually scaled up as needed.

However, over-segmenting and creating too many warehouses is not recommended. This can lead to unnecessary concurrent warehouse instances and significantly increase your spend. Additionally, detailed spend tracking can become very expensive and difficult to manage. Therefore, it’s important to strike a balance between segmentation and cost optimization to achieve the best outcome for your Snowflake usage.
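As a concrete illustration of this kind of segmentation, the sketch below creates one auto-suspending X-Small warehouse per workload through the Snowflake Python connector. The warehouse names, the 60-second suspend window, and the credentials are illustrative assumptions, not a prescription.

    import snowflake.connector  # pip install snowflake-connector-python

    conn = snowflake.connector.connect(account="<ACCOUNT>", user="<USER>", password="<PASSWORD>")
    cur = conn.cursor()

    # One warehouse per use case keeps spend attributable without over-segmenting
    for name in ("WH_ELT", "WH_BI_REPORTING", "WH_ADHOC_ANALYTICS"):
        cur.execute(f"""
            CREATE WAREHOUSE IF NOT EXISTS {name}
              WAREHOUSE_SIZE = 'XSMALL'      -- start small; scale up only when workloads demand it
              AUTO_SUSPEND = 60              -- suspend after 60 idle seconds to stop credit burn
              AUTO_RESUME = TRUE
              INITIALLY_SUSPENDED = TRUE
        """)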

Warehouse Segments and Lessons Learned…

Read more on our Medium Blog

Webinar: Streamlining Data Migration to Salesforce in 2024 using Datameer and Snowflake

Datameer Webinar

View our recorded webinar with Datameer and DataTools Pro to discover how you can optimize your Salesforce data migration process using a reverse ETL process powered by Datameer and Snowflake. The webinar is presented by Datameer and Ryan Goodman, creator of DataTools Pro. Ryan showcases how Datameer has been the secret sauce for accelerating a reverse ETL data stack to effectively prepare, transform, and analyze data.

Whether you’re a Salesforce administrator, a data analyst, or another data professional, this webinar will equip you with practical insights to streamline your data migration. Don’t miss out on this opportunity to learn and ask live questions about how to enhance your data migration practices.

What You Will Learn

During the webinar, we’ll cover real-world examples of successful reverse ETL scenarios for Snowflake, specifically for Salesforce. Additionally, Ryan will share best practices and pitfalls to avoid during a typical Salesforce.com data migration.

On Demand Recording


How my Snowflake Powered Lead Distro Test Turned Out to be Reverse ETL

Snowflake Cloud Data Pipelines for Reverse ETL

A year ago, I worked on a small project to help us improve our data-driven funnel. I learned that what I called “Snowflake to Salesforce analytics sync” had a more buzzworthy name: “reverse ETL.” This article shares some of the lessons learned along the way and some thoughts about where reverse ETL is headed.

Low Level of Effort Solution

All of the data and metrics were already available and calculated in Snowflake for reporting, so the process to push those measurements back into a Salesforce object using Azure Data Factory was quite simple.
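For readers who want to picture the moving parts without Azure Data Factory, here is a hedged Python sketch of the same reverse ETL idea: read the already-calculated metrics from Snowflake and upsert them into a custom Salesforce object by external ID. The Snowflake view, Salesforce object, and field names are hypothetical.

    import requests
    import snowflake.connector

    API = "https://yourInstance.my.salesforce.com/services/data/v58.0"  # placeholder
    HEADERS = {"Authorization": "Bearer <ACCESS_TOKEN>", "Content-Type": "application/json"}

    # Pull the already-calculated funnel metrics out of Snowflake
    conn = snowflake.connector.connect(account="<ACCOUNT>", user="<USER>",
                                       password="<PASSWORD>", database="<DATABASE>")
    rows = conn.cursor().execute(
        "SELECT ACCOUNT_ID, LEAD_SCORE, FUNNEL_STAGE FROM ANALYTICS.FUNNEL_METRICS"  # hypothetical view
    ).fetchall()

    # Upsert each metric row into a custom Salesforce object keyed by an external ID field
    for account_id, lead_score, funnel_stage in rows:
        requests.patch(
            API + "/sobjects/Funnel_Metric__c/Account_Key__c/" + account_id,  # hypothetical object/field
            headers=HEADERS,
            json={"Lead_Score__c": lead_score, "Funnel_Stage__c": funnel_stage},
        )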

The transformation work was prepared using Datameer on top of Snowflake which I had previously written about: Slice Through your Snowflake Data like a Buzzsaw with Datameer

Creating Snowflake UDFs with ChatGPT: A Guide for Analysts

ChatGPT Snowflake UDF Developer Bot

As data analysts, we often find ourselves needing specialized functions in Snowflake. Working in financial services, I relied on specific Excel functions that provide significant value. I had a need, but developer resources were not readily available… until ChatGPT changed everything!

Now, with the help of ChatGPT, even non-developers can prototype, experiment, and contribute powerful capabilities. For Snowflake User Defined Functions (UDFs) in particular, ChatGPT is a game-changing resource for self-paced learning, debugging, and translating concepts and patterns you already know into Snowflake.
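As an example of the kind of UDF ChatGPT can help draft, here is a sketch of an Excel-style PMT (loan payment) function deployed through the Snowflake Python connector. The schema name, credentials, and the decision to implement it as a SQL UDF are assumptions for illustration.

    import snowflake.connector

    conn = snowflake.connector.connect(account="<ACCOUNT>", user="<USER>",
                                       password="<PASSWORD>", database="<DATABASE>")
    cur = conn.cursor()

    # Excel-style PMT(rate, nper, pv): the periodic loan payment, assuming no future value
    cur.execute("""
        CREATE OR REPLACE FUNCTION ANALYTICS.PMT(rate FLOAT, nper FLOAT, pv FLOAT)
        RETURNS FLOAT
        AS $$
            CASE
                WHEN rate = 0 THEN -pv / nper
                ELSE -pv * rate / (1 - POWER(1 + rate, -nper))
            END
        $$
    """)

    # Example: monthly payment on a 250,000 loan at 6% APR over 30 years (roughly -1,498.88)
    print(cur.execute("SELECT ANALYTICS.PMT(0.06/12, 360, 250000)").fetchone()[0])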

Datameer is a Cutting Edge Solution for Snowflake Data Preparation

Slice through your Snowflake data with Datameer

Snowflake has helped democratize the data platform, eliminating layers of technology and administration traditionally required to enable data workers. The next generation of data preparation tools has arrived in recent years and continues to accelerate the process of preparing business-ready data assets.

In my previous role managing data and analytics, I had a big problem. All of my data was staged in Snowflake but data engineering was backlogged with requests and my analysts were stuck between SQL and the last generation of desktop BI data preparation tools. I solved this problem and tripled my team’s throughput with Datameer.

Datameer is a native Snowflake data preparation and analysis solution that does not require extracting any data out of Snowflake. I used it for all of my BI reporting and dashboard projects, allowing me to roll out three times more business-ready data assets in 2 months than in the first 5 months of my Snowflake initiative.

There are a few key features of Datameer that helped us wrangle data faster. Enterprise data is imperfect, so you need the right tools to profile, explore, and understand imperfections while you build.

Data-Driven Join Analysis

One of these features is Join Analysis, which surfaces unmatched rows so I can quickly see which records fall out of the left and right sides of a join. With this feature, I can easily identify records that are missing IDs or recognize that I didn’t fully understand the grain of the data before I joined. The Join Keys analysis feature also identifies duplicate records and highlights which data source is causing duplicates or potential cartesian products as a result of duplicate keys. These features enable me to understand my data both inside and outside of each join, allowing me to move through my data flow more efficiently.

Tutorial: Learn how to join data intelligently with Datameer

Explore and Share Data in One Place

Another useful feature of Datameer is inline no-code Data Exploration. This feature is essential when exploring data and validating it with collaborators. Datameer provides intuitive and fast exploration capabilities so you can create many cuts of data through your data pipeline. You can employ filtering, aggregation, binning, and sorting. It only takes about 5 minutes to master this feature, and it has enough functionality to cover most real-world slicing and dicing. For repeatable or reusable scenarios, the exploration nodes feature enables me to make my exploration view available or deploy it as its own view back to Snowflake for recurring validation.

Tutorial: Explore and share data in Datameer

One Click Field Profiling

Field Exploration is yet another useful feature of Datameer: it prepares a summary profile for each field and provides a visual reference point for quickly identifying outliers, NULLs, distinct values, and unique records. This feature is similar to Snowflake and helps me quickly and efficiently understand my data.

Tutorial: Field Profiling and Exploration

Datameer is No-Code Where You Want It, SQL Coding Where You Need It

Datameer offers a no-code user experience that technically allows you to build and deploy business intelligence views and tables without writing a line of code. Conversely, there are many experienced SQL developers who are more proficient writing SQL than using no-code interfaces. Datameer is the best of both worlds because you can visually abstract your SQL code into a flow and have all Snowflake SQL functions on hand. This way you can still benefit from the aforementioned features while coding. Datameer will generate your SQL as a CTE that runs natively on Snowflake.

Change Tracking and Revisions

In addition to rich metadata and tagging, Datameer offers deployment history and version control natively, allowing you to comment on revisions, restore previous deployments in Snowflake with a single click, and retain full access to the SQL code.

Overall, I am impressed with Datameer’s capabilities and look forward to every release with incremental updates focused on bringing data teams and data analysts together in a practical solution.