The Salesforce Data Management Platform (DMP) allows publishers and marketers to collect and organize all of their consumer web data in one central "big-data" warehouse. Using Salesforce DMP, organizations can consolidate and reconcile their first-party "web-behavior" data, generated by consumers as they engage with media and content, with data from third-party data providers such as eXelate, DataLogix, and Targus. Organizations can also integrate data from their user registration and subscription databases (hereafter referred to as "first-party data") into Salesforce DMP and join that data with web-behavior and third-party data.
User data stored in Salesforce DMP is keyed off a global Krux User ID. The Krux User ID is a third-party cookie and as such is specific to a browser (and, by extension, to the corresponding device). All data that is collected by Salesforce DMP from online environments is automatically keyed off the Krux User ID.
Salesforce DMP supports the ingestion of first-party data available in client systems like registration or subscription databases and other CRM systems. This data is keyed off the client’s first-party User ID. A User Matching process that maps the client first-party User ID to the corresponding Krux User ID for every applicable user facilitates the onboarding and ingestion of this data.
After the user matching process has been set up, clients can send first-party data corresponding to the matched users. Salesforce DMP requires a strict data format for this first-party data, outlined below.
In this document, we describe the user matching process between Salesforce DMP and the client and provide details on the standard data format for first-party data that is supported by Salesforce DMP. The rest of this document is organized as follows: Section 2 describes the user matching process, Section 3 provides details on the first-party data formats and Section 4 describes the steps required to configure and activate first-party datasets in Salesforce DMP.
The Salesforce DMP Control Tag will create a User Attribute for the first-party User ID. That User ID is passed to Salesforce DMP’s data collection servers via the “pixel.gif” beacon call that sends data from the page to Salesforce DMP on every page view. This User Attribute is marked by Salesforce DMP as a special “User-ID” attribute and is not visible in the Attributes report or in the Segment Builder. However, the User Match Report provides details on how many users have been matched over rolling 30- and 90-day windows.
The main output of the First-Party User Match process is a User Match table that maps the client’s first-party User ID to its corresponding Krux User ID for a given browser/device. This User Match table has 2 columns:
- Client User ID: this is the client’s first-party User ID for the user
- Krux User ID: this is the Krux User ID for the user
The User Match table is constructed in 2 steps:
- First, we create a daily user match table that has the following columns:
- Client User ID
- Krux User ID
- Timestamp: this represents the latest time we saw a mapping from the client’s first-party User ID to the corresponding Krux User ID
- We then combine this daily data with the “current” user match table and keep all the unique combinations of Client User ID and Krux User ID. The current table is empty on the first day, and the table for day N is constructed from the “current” table for day N - 1 plus the daily data for day N
The Client User ID will have a one-to-many relationship with the Krux User ID in the User Match table. Note that in general, there may be multiple Krux User IDs mapped to a given Client User ID because the Krux User ID is browser specific whereas the Client User ID will be the same across multiple browsers (assuming that it is indeed a persistent user ID as opposed to a first-party cookie).
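As a sketch, the two-step construction and the one-to-many mapping described above can be modeled like this (the field names and IDs are illustrative, not the actual pipeline code):

```python
# Sketch of the two-step User Match table construction described above.
# Illustrative only -- not the actual Salesforce DMP pipeline code.

def build_daily_match_table(events):
    """events: iterable of (client_user_id, krux_user_id, timestamp).
    Keeps the latest timestamp seen for each (client_id, krux_id) pair."""
    daily = {}
    for client_id, krux_id, ts in events:
        key = (client_id, krux_id)
        if key not in daily or ts > daily[key]:
            daily[key] = ts
    return daily

def merge_match_tables(current, daily):
    """Union of unique (client_id, krux_id) pairs: the 'current' table for
    day N is the current table for day N-1 plus the daily table for day N."""
    return current | set(daily.keys())

# One Client User ID can map to several Krux User IDs (one per browser/device).
events = [
    ("user-42", "krux-aaa", 100),  # user-42 in a laptop browser
    ("user-42", "krux-bbb", 200),  # the same user in a phone browser
    ("user-42", "krux-aaa", 300),  # a later sighting of the first pairing
]
current = merge_match_tables(set(), build_daily_match_table(events))
print(sorted(current))  # two rows: one per browser for the same Client User ID
```

Note how the same Client User ID yields two rows, one per Krux User ID, which is the one-to-many relationship described above.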
First-Party Data Format
Salesforce DMP implements a “schema-on-read” model and as such does not have a fixed schema for its underlying User Data model. This results in the following benefits for our clients:
- There is no “data-mapping” process that maps the client’s data schema to the Salesforce DMP schema. This reduces the time it takes to process and load the data into Salesforce DMP.
- Each client effectively gets a custom schema in Salesforce DMP that is represented as a collection of attributes and attribute values.
- Columns from the client’s data schema are treated as attributes and can be created on the fly. The Salesforce DMP ETL platform automatically identifies new attributes (or columns) and attribute values in the input data and loads them into Salesforce DMP.
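A minimal sketch of what this schema-on-read discovery looks like conceptually (not the actual ETL code; the attribute names are illustrative):

```python
# Sketch of "schema-on-read" attribute discovery: attributes and attribute
# values are picked up from the input rows themselves, with no pre-declared
# schema. Illustrative only -- not the actual Salesforce DMP ETL code.

def discover_schema(rows):
    """rows: iterable of dicts mapping attribute name -> list of values.
    Returns attribute -> set of all values seen across the input."""
    schema = {}
    for row in rows:
        for attribute, values in row.items():
            schema.setdefault(attribute, set()).update(values)
    return schema

rows = [
    {"gender": ["male"], "age": ["18-24"]},
    {"gender": ["female"], "interest": ["fishing", "boating"]},  # new column appears
]
schema = discover_schema(rows)
print(schema)
```

A new column ("interest") appearing in later rows simply becomes a new attribute; nothing has to be declared up front.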
While the Salesforce DMP ETL system can work with almost any data format provided by the client’s systems, the preferred format for first-party data is LZO; Salesforce DMP also accepts CSV and GZIP. The LZO format and some additional requirements are described below:
- Since Salesforce DMP uses Hadoop for big-data processing, Salesforce DMP recommends that data files be compressed using LZO, and each LZO file should have a corresponding index file. Please see https://github.com/tcurdt/lzo-index for more details on LZO compression and how to use it with Hadoop.
- Each line in the data file for first-party data must contain all the data for a given user and should be delimited using the CARET (^) character.
- Each line in the data file should contain 2 columns:
- The first column represents the first-party User ID and must match the client first-party User ID that is used in the user matching process described in section 2
- The second column contains all the data associated with the user and should be in the following format:
- Each row can contain an arbitrary number of attribute value combinations
- It is not mandatory for values to be specified for all attributes for a given user
- If a user has multiple values for an attribute, they can be comma-separated:
Note: In the event your attribute value contains a comma or a semicolon and you want it to appear as such in the UI, you must wrap the attribute value in quotes. Example:
attribute3:attribute_value1,"attribute value, with a comma","attribute value; with a semicolon"
For example, if there are 3 columns called Age Group, Gender, and Interest in the client’s registration database that need to be imported into Salesforce DMP, then the following represents a valid data file that can be ingested by Salesforce DMP:
When Salesforce DMP processes this file, it will create three attributes, called Gender, Age Group, and Interest, in the client’s Salesforce DMP account. It will also automatically create the corresponding attribute values: male and female under Gender; 18-24 and 35-44 under Age Group; and fishing and boating under Interest.
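The sample file itself does not appear in this copy of the document. Based on the attributes and values described above, a sketch of what such a file could look like follows; the user IDs and the semicolon separator between attribute:value pairs are assumptions (the quoting rule above implies commas and semicolons are both significant), not confirmed format details:

```python
# Illustrative reconstruction of the caret-delimited data file described
# above. ASSUMPTIONS: user IDs "u1001"/"u1002" are invented, and the
# semicolon is assumed to separate attribute:value pairs.

lines = [
    "u1001^gender:male;age_group:18-24;interest:fishing,boating",
    "u1002^gender:female;age_group:35-44",
]

def parse_line(line):
    """Split one caret-delimited row into (user_id, {attribute: [values]}).
    Naive split: quoted values containing commas or semicolons (per the
    quoting rule above) would need csv-style parsing instead."""
    user_id, payload = line.split("^", 1)
    attributes = {}
    for pair in payload.split(";"):
        name, values = pair.split(":", 1)
        attributes[name] = values.split(",")
    return user_id, attributes

for line in lines:
    print(parse_line(line))
```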
Salesforce DMP supports full-refresh imports: each file replaces the previously imported first-party data in full.
Configuration and Setup
Your Salesforce DMP account manager will work with you to activate the Control Tag required to set up the user matching process. The account manager will also work with you to set up the secure data transfer process for the first-party data feed.
Salesforce DMP will create a bucket for the client in Krux’s Amazon S3 account and the client can upload the data files to S3 as per a mutually agreed upon schedule.
File Compression: (Specific to MAC users)
- Install Homebrew: http://brew.sh/
- Paste the following line into your computer’s Terminal:
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
- In Terminal, type:
brew install lzop
- Confirm that lzop has been installed by typing lzop at the prompt
- Program details will appear with the available commands and options
- Use lzop to compress your file
- In Terminal, change into the folder that your file lives in (cd /path/to/folder) and hit ‘enter’
- Type lzop [file name] and hit ‘enter’
- The compressed LZO file will appear in the same folder as the original file
Additional First-Party Data Import Instructions
You can send a sample of the data to your Salesforce DMP Implementation team to validate before sending to the Amazon S3 bucket (S3 access information will be provided securely via Box.com).
Data Privacy and Security Considerations
Salesforce DMP does not accept PII, and sending it to us is a breach of the MSA. If you have data keyed to email addresses and are concerned about offline data import user matching, onboarding may be a better option; usage and setup fees apply.
The file should be complete: all fields and active users should be present. If you import a file with “Registration - Age” and “Registration - Cable Provider” values on 10/12, and the next import on 11/12 contains only “Registration - Age”, then the “Registration - Cable Provider” data will be removed from the platform. The full refresh only overwrites the standard import file, not data from any other source (e.g., the website). A full refresh of users means that every user you want Salesforce DMP to be aware of must be included in the file on every update.
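The full-refresh behavior described above can be sketched as a straight replacement; the user ID and the cable-provider value here are invented for illustration:

```python
# Sketch of full-refresh semantics: the latest import file fully replaces
# the previous standard-import data, so any attribute (or user) missing from
# the latest file disappears. User ID and values are invented examples.

import_oct = {
    "u1001": {
        "Registration - Age": ["18-24"],
        "Registration - Cable Provider": ["ExampleCable"],  # invented value
    },
}
import_nov = {
    "u1001": {"Registration - Age": ["18-24"]},  # Cable Provider omitted
}

def full_refresh(_previous, latest):
    """The latest file wins outright; nothing from the previous import survives."""
    return latest

platform_data = full_refresh(import_oct, import_nov)
print(platform_data)  # the Cable Provider data is gone
```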
Attribute Display Names
User-friendly display names for each attribute, and the codes they correspond to, must be sent separately from the import file. The system takes the codes in the file import and automatically creates attributes, so we have to name them separately on the backend.
The date directory should be the day you send over the data, in YYYY-MM-DD format. If you send the data on 10/12 but the date directory is 2015-10-11, the file will not be imported. For each individual attribute, you can view the most recent import date in the Attributes tab to confirm the import worked properly.
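As a sketch, the dated directory name can be generated rather than typed by hand; the dataset prefix here is a placeholder, not your actual S3 path (which your account manager provides):

```python
# Sketch of building the dated upload prefix: the date directory must be the
# day the file is sent, in YYYY-MM-DD format. The "first_party_data" prefix
# is a placeholder, not a real Salesforce DMP path.

from datetime import date

def upload_prefix(send_date, dataset="first_party_data"):
    """Return the dated directory prefix for the given send date."""
    return "{}/{}/".format(dataset, send_date.strftime("%Y-%m-%d"))

print(upload_prefix(date(2015, 10, 12)))  # first_party_data/2015-10-12/
```

Generating the date at send time (e.g. `date.today()`) avoids the mismatch described above, where a stale directory date causes the file to be skipped.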
File Drop Considerations
Omit any headers in the first row of the file.
The name of the file cannot contain spaces or special characters; hyphens (-) or underscores (_) can be used.
Make sure your upload file name is as short and precise as possible. If the name of the file uploaded is too long, it might exceed the max length cap (approximately 32 characters), and AWS will truncate the name and cause a “file not found” error.
- Example file directory format + file name:
- The date should be the day you upload
- File name example =
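The file-name rules above can be checked with a small validator before upload; this is a sketch, with the exact length cap assumed from the approximate 32-character figure given above:

```python
# Sketch of validating an upload file name against the rules above: no
# spaces or special characters (hyphens, underscores, and dots are allowed
# here), and a length cap ASSUMED to be exactly 32 characters -- the real
# AWS limit is only described as approximate above.

import re

MAX_NAME_LENGTH = 32  # assumed exact value for the approximate cap

def valid_upload_name(name):
    """True if the name uses only safe characters and is short enough."""
    return bool(re.fullmatch(r"[A-Za-z0-9._-]+", name)) and len(name) <= MAX_NAME_LENGTH

print(valid_upload_name("registration_data.csv.lzo"))          # short, safe characters
print(valid_upload_name("my registration file.csv"))           # rejected: spaces
print(valid_upload_name("a_very_long_file_name_that_keeps_going.lzo"))  # rejected: too long
```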
Salesforce DMP stores any data collected through online mechanisms (including user matching data) for a variable period as defined in the contractual agreement between Salesforce and the client. First-party imports remove any data not included in subsequent imports (full refresh). For best user match results, files should be uploaded frequently, or daily, even if the data output has no change.
Amazon S3 Bucket Information
Here is a general website regarding connecting to the Amazon S3 bucket via Cyberduck (though, Terminal can also be used): https://trac.cyberduck.io/wiki/help/en/howto/s3#ConnectingtoAmazonS3
- Download https://cyberduck.io
- Once installed, please follow these instructions to test the credentials supplied
- Open Cyberduck and click Open Connection in the top left.
- Select S3 (Amazon Simple Storage Service) from the drop down box at the top (if the server is not showing as s3.amazonaws.com and port is not showing as 443, the wrong connection was selected in the drop down)
- Username: (Paste the Access Key ID supplied)
- Password: (Paste the Secret Access Key supplied)
- Select the arrow by more options and Path:
- Select Connect
- You should see a file called _SUCCESS
- Click on the Action Wheel and select Upload
- The file must be UTF-8 (or US-ASCII) encoded
- The file cannot contain Windows-style line breaks (carriage returns). Here is an article on how to fix line breaks when a file is saved via a PC/Mac vs Linux/Unix; please refer to problem 5. If you would like to upload a test file, your Salesforce DMP representative will view it to verify there are no stray line breaks. Your IT department should be able to assist with formatting in Terminal.
- Please let us know once the file is uploaded. Once it is uploaded, our engineers have to deploy the pipeline; a safe timeline for this is two weeks.
- Moving forward, you can drop the file into the S3 bucket and the Salesforce DMP system will automatically ingest it. The date should be the day you send the file so the system ingests it properly. Please make sure the format is correct.
- Reminder: we only accept full-refresh files. Existing data within Salesforce DMP that is not included in the most recent file will be removed from the system
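The encoding and line-break requirements above can be checked programmatically before upload; a minimal sketch (the sample bytes are invented):

```python
# Sketch of normalizing line endings before upload: files saved on Windows
# (or older Macs) may use \r\n or \r line endings, which should be converted
# to plain \n, and the file must be UTF-8 (or US-ASCII) encoded.

def normalize_newlines(raw_bytes):
    """Decode as UTF-8 and convert CRLF/CR line endings to LF.
    Raises UnicodeDecodeError if the file is not valid UTF-8."""
    text = raw_bytes.decode("utf-8")
    return text.replace("\r\n", "\n").replace("\r", "\n")

# Invented sample: a two-record file saved with Windows-style line endings.
windows_file = b"u1001^gender:male\r\nu1002^gender:female\r\n"
print(repr(normalize_newlines(windows_file)))
```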