As Salesforce has expanded its reach beyond CRM over the past 15 years, its document management capabilities have kept pace. Many businesses use Salesforce's out-of-the-box features to attach various types of documents to specific records. With built-in support and compatibility with third-party tools like DocuSign, Salesforce makes it easy for end users to upload and access documents.
But what if your document library grows and suddenly you find yourself needing to batch-process, transfer, or migrate these files? You might end up down a technical rabbit hole, exploring Salesforce’s data models, APIs, and a plethora of third-party tools, all without finding a straightforward way to extract your documents from Salesforce.com.
We had four key requirements that led us to build our own solution:
- We needed to select and tag a specific set of Salesforce records that had documents attached
- We needed to select specific file types
- We required only the latest version of each document
- We needed to rename and prefix the files with data from the related Salesforce record and organize them into subfolders
How Do We Make Exporting Salesforce Documents Easy?
To me, simplicity comes down to the effort and friction required to reach the outcome I want. I chose Azure Data Factory because of my experience and success building data pipelines with it. Configuring a document migration in Azure Data Factory still takes a few hours of work, but once it is set up, running the migration takes a single click. This article explores the complexities you need to understand before embarking on a Salesforce document migration.
Understanding Salesforce Attachment vs ContentDocument
When you’re in the trenches of Salesforce’s data architecture trying to extract documents, you’ll encounter two main objects: Attachment and ContentDocument. They behave differently when it comes to extraction, and understanding their nuances is crucial for a smooth migration.
Attachments are straightforward but limited. If your documents are stored as Attachments, you’ll likely need to extract them record by record through Salesforce’s API. Each Attachment is tied directly to a single parent record through its ParentId field (a record can have many Attachments, but each Attachment has only one parent), so it is always clear which record a file belongs to, which makes extraction easier.
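For the Attachment case, here is a minimal Python sketch of the two pieces involved: a SOQL query that lists the Attachments on a set of tagged parent records, and the REST path that returns one Attachment’s binary Body. The API version `v59.0` and the helper names are assumptions for illustration, and sending the authenticated requests is left out.

```python
import re

API_VERSION = "v59.0"  # assumption: use an API version your org supports
SF_ID = re.compile(r"[A-Za-z0-9]{15}(?:[A-Za-z0-9]{3})?")  # 15- or 18-character IDs


def soql_id_list(ids):
    """Render Salesforce IDs as a quoted SOQL IN list, e.g. ('001...','001...')."""
    ids = list(ids)
    if not ids:
        raise ValueError("at least one ID is required")
    for record_id in ids:
        # Validate before interpolating so nothing unexpected ends up in the query
        if not SF_ID.fullmatch(record_id):
            raise ValueError(f"not a Salesforce ID: {record_id!r}")
    return "(" + ",".join(f"'{record_id}'" for record_id in ids) + ")"


def attachment_query(parent_ids):
    """SOQL listing the Attachments whose ParentId is one of the tagged records."""
    return ("SELECT Id, Name, ContentType, ParentId FROM Attachment "
            f"WHERE ParentId IN {soql_id_list(parent_ids)}")


def attachment_body_url(instance_url, attachment_id):
    """REST path that returns the raw bytes of a single Attachment's Body."""
    return (f"{instance_url}/services/data/{API_VERSION}"
            f"/sobjects/Attachment/{attachment_id}/Body")
```

Each Body still has to be fetched with its own request, which is why Attachment extraction ends up record by record.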
In contrast, ContentDocument is part of Salesforce’s newer, more robust Files architecture. It supports versioning (each version is stored as a ContentVersion record) and can be associated with multiple records through the ContentDocumentLink junction object. However, it’s not all smooth sailing here either: Salesforce restricts bulk queries and downloads of ContentDocument objects, so you may need more sophisticated methods such as custom code or third-party tools.
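On the Files side, the "latest version only" and "specific file types" requirements can be pushed into a query against ContentVersion. This is a minimal sketch, assuming the IsLatest and FileExtension fields are filterable in your org and that extensions are stored in lowercase; `latest_versions_query` is a hypothetical helper, not part of any Salesforce library.

```python
import re

EXTENSION = re.compile(r"[a-z0-9]{1,20}")  # plain alphanumeric extensions only


def latest_versions_query(extensions):
    """SOQL returning only the latest version of files with the given extensions."""
    cleaned = []
    for ext in extensions:
        ext = ext.strip().lstrip(".").lower()
        # Reject anything that is not a simple extension before building the query
        if not EXTENSION.fullmatch(ext):
            raise ValueError(f"unexpected file extension: {ext!r}")
        cleaned.append(ext)
    cleaned = list(dict.fromkeys(cleaned))  # drop duplicates, keep order
    if not cleaned:
        raise ValueError("at least one extension is required")
    in_list = ",".join(f"'{ext}'" for ext in cleaned)
    return ("SELECT Id, ContentDocumentId, Title, FileExtension, VersionNumber "
            "FROM ContentVersion "
            f"WHERE IsLatest = true AND FileExtension IN ({in_list})")
```

For example, `latest_versions_query([".PDF", "docx"])` produces a query ending in `FileExtension IN ('pdf','docx')`.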
Overcoming Key Limitations for Managing Salesforce Document Metadata
Query All Files in Salesforce
By default, Salesforce limits your ability to query and extract metadata for all of your documents. To query all files, your Salesforce admin needs to grant the Query All Files permission through a permission set. This article explains step by step how to enable Query All Files to obtain a complete list of your documents.
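With Query All Files in place, the metadata can be pulled through the REST query endpoint, which returns results in pages: each response carries a `done` flag and, while more rows remain, a `nextRecordsUrl` to request next. A minimal sketch, assuming API version `v59.0` and a hypothetical `get_json` callable that handles authentication and returns the parsed JSON for a URL:

```python
from urllib.parse import urlencode

API_VERSION = "v59.0"  # assumption: use an API version your org supports


def query_all_records(instance_url, soql, get_json):
    """Run a SOQL query via the REST API and follow nextRecordsUrl until done.

    get_json is a hypothetical callable: it takes a full URL, sends the
    authenticated GET request, and returns the parsed JSON response body.
    """
    url = f"{instance_url}/services/data/{API_VERSION}/query?{urlencode({'q': soql})}"
    records = []
    while True:
        page = get_json(url)
        records.extend(page["records"])
        if page["done"]:
            return records
        # nextRecordsUrl is a path relative to the instance URL
        url = instance_url + page["nextRecordsUrl"]
```

Injecting `get_json` keeps the paging logic testable without a live org; in practice it would wrap an HTTP client that sends the OAuth access token as a Bearer token.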
With this data, you technically have all of the metadata you need to start downloading files. The problem is that, unless you have configured the ParentId to associate your documents with another object, you lose the context of what each document relates to. In other words, you could download a vendor invoice but have no data telling you which customer or deal it belongs to.
ContentDocumentLink Is the Missing Link
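To connect your Salesforce documents to their parent records, Salesforce provides a junction object called ContentDocumentLink. The problem is that Salesforce does not let you query and download all of these records in bulk: every query against ContentDocumentLink must filter on specific ContentDocumentId or LinkedEntityId values.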
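As far as I know, the required filter can be an IN list of several IDs rather than a single ID, so one alternative to querying one document at a time is to batch the ContentDocumentIds. A minimal sketch; the helper name and the default batch size of 200 are assumptions, chosen to keep each statement well under the SOQL length limit:

```python
import re

SF_ID = re.compile(r"[A-Za-z0-9]{15}(?:[A-Za-z0-9]{3})?")  # 15- or 18-character IDs


def content_document_link_queries(document_ids, batch_size=200):
    """Return one ContentDocumentLink SOQL query per batch of ContentDocumentIds.

    Every query filters on ContentDocumentId, which ContentDocumentLink requires.
    batch_size is an assumption; smaller batches keep each statement short.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(dict.fromkeys(document_ids))  # drop duplicates, keep order
    for doc_id in ids:
        if not SF_ID.fullmatch(doc_id):
            raise ValueError(f"not a Salesforce ID: {doc_id!r}")
    queries = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        in_list = ",".join(f"'{doc_id}'" for doc_id in batch)
        queries.append("SELECT ContentDocumentId, LinkedEntityId, ShareType "
                       "FROM ContentDocumentLink "
                       f"WHERE ContentDocumentId IN ({in_list})")
    return queries
```

Fewer round trips is the main benefit; the trade-off is that a failed batch has to be retried as a whole.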
To solve this problem, I used Azure Data Factory to retrieve all of my documents, keep only the document records I cared about, and then query the ContentDocumentLink records one document at a time, filling a database table with the results. At this point, to keep things simple, I also appended the fields from the parent Opportunity record that I would ultimately use to rename my files and name the subfolders.
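The join and renaming step can be sketched in a few lines. This is an illustration rather than the actual pipeline: it assumes the links, latest versions, and tagged Opportunities are already loaded into dictionaries, uses a hypothetical flattened `AccountName` key for the subfolder, and keeps only links whose LinkedEntityId starts with `006`, the standard key prefix for Opportunity IDs (files are also linked to users and libraries).

```python
import re
from dataclasses import dataclass

OPPORTUNITY_PREFIX = "006"  # standard key prefix of Opportunity record IDs


@dataclass(frozen=True)
class ExportRow:
    content_version_id: str
    opportunity_id: str
    target_path: str  # "<subfolder>/<renamed file>"


def safe_segment(text):
    """Replace characters that are awkward in folder or blob names with '_'."""
    cleaned = re.sub(r"[^\w.\- ]+", "_", text).strip()
    return cleaned or "untitled"


def build_export_rows(links, latest_versions, opportunities):
    """Join ContentDocumentLink rows to latest file versions and tagged Opportunities.

    links: dicts with ContentDocumentId and LinkedEntityId
    latest_versions: ContentDocumentId -> dict with Id, Title, FileExtension
    opportunities: Opportunity Id -> dict with Name and a hypothetical
        flattened AccountName key used as the subfolder
    """
    rows = []
    for link in links:
        parent_id = link["LinkedEntityId"]
        # Files are also linked to users and libraries; keep Opportunity links only
        if not parent_id.startswith(OPPORTUNITY_PREFIX):
            continue
        version = latest_versions.get(link["ContentDocumentId"])
        opportunity = opportunities.get(parent_id)
        if version is None or opportunity is None:
            continue  # file type not selected, or Opportunity not tagged for export
        folder = safe_segment(opportunity["AccountName"])
        name = (f"{safe_segment(opportunity['Name'])}_{safe_segment(version['Title'])}"
                f".{safe_segment(version['FileExtension'])}")
        rows.append(ExportRow(version["Id"], parent_id, f"{folder}/{name}"))
    return rows
```

A file titled `Invoice 42/2023` on the "Acme Renewal" Opportunity of account "Acme Corp" would land at `Acme Corp/Acme Renewal_Invoice 42_2023.pdf`.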
Bulk Download Salesforce Documents
With a database table containing all of my ContentDocumentLink and ContentDocument records, I built an Azure Data Factory pipeline that used the Salesforce REST API to GET each file one at a time and load it into Azure Blob Storage. I could have loaded the files into another storage solution, such as Google Drive, but opted to keep them in Azure.
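The one-file-at-a-time download can be sketched as follows. It assumes API version `v59.0`, a hypothetical `fetch_bytes` callable that performs the authenticated GET against ContentVersion’s VersionData endpoint, and a local directory standing in for Azure Blob Storage; the Data Factory pipeline itself is not shown.

```python
from pathlib import Path

API_VERSION = "v59.0"  # assumption: use an API version your org supports


def version_data_url(instance_url, content_version_id):
    """REST path that returns the binary content of one ContentVersion."""
    return (f"{instance_url}/services/data/{API_VERSION}"
            f"/sobjects/ContentVersion/{content_version_id}/VersionData")


def download_all(instance_url, rows, fetch_bytes, target_root):
    """Download files one at a time into target_root, skipping ones already present.

    rows: iterable of (content_version_id, target_path) pairs
    fetch_bytes: hypothetical callable that sends an authenticated GET and
        returns the response body as bytes
    A local directory stands in for Azure Blob Storage in this sketch.
    """
    root = Path(target_root)
    written = []
    for version_id, target_path in rows:
        destination = root / target_path
        if destination.exists():
            continue  # downloaded by an earlier run
        destination.parent.mkdir(parents=True, exist_ok=True)
        partial = destination.with_name(destination.name + ".part")
        partial.write_bytes(fetch_bytes(version_data_url(instance_url, version_id)))
        partial.replace(destination)  # only complete files get the final name
        written.append(destination)
    return written
```

Skipping files that already exist, and writing to a temporary `.part` file before renaming it, means a failed run can simply be restarted without downloading everything again.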
Are You Bulk Exporting Documents and Attachments?
We are always looking for better and faster ways to get data in and out of Salesforce. If you have a third-party tool or process that makes this faster and easier, we would love to work with you!
If you need help offloading the effort to get your documents and attachments out of Salesforce, feel free to book a meeting with us.