Mobile Log Import

The Salesforce Data Management Platform (DMP) allows publishers and marketers to collect and organize all of their consumer data in one central "big-data" warehouse. Using Salesforce DMP, organizations can consolidate and reconcile their data that is generated by consumers as they engage with mobile application content.

Mobile log data stored in Salesforce DMP is keyed off of the IDFA (iOS) or AAID (Android), depending on the mobile OS. The IDFA or AAID acts as a Krux User ID (KUID) within mobile applications and is a unique identifier. All data that Salesforce DMP collects from mobile applications is automatically keyed off the IDFA or AAID.

Mobile Log Data Format

Salesforce DMP implements a "schema-on-read" model and as such does not have a fixed schema for its underlying User Data model. This results in the following benefits for our clients:

  • There is no "data-mapping" process that maps the client's data schema to the Salesforce DMP schema. This reduces the time it takes to process and load the data into Salesforce DMP.
  • Each client effectively gets a custom schema in Salesforce DMP that is represented as a collection of attributes and attribute values.
  • Columns from the client's data schema are treated as attributes and can be created on the fly. The Salesforce DMP ETL platform automatically identifies new attributes (or columns) and attribute values in the input data and loads them into Salesforce DMP.

While the Salesforce DMP ETL system can work with almost any data format provided by the client's systems, LZO is the preferred format for mobile log data; CSV and GZIP are also accepted. The LZO format and some additional requirements are described below:

  • Since Salesforce DMP uses Hadoop for big data processing, Salesforce DMP recommends that data files be compressed using LZO and that each LZO file have a corresponding index file. Please see https://github.com/twitter/hadoop-lzo and https://github.com/tcurdt/lzo-index for more details on LZO compression and how to use it with Hadoop.
  • Each line in the data file for mobile log data must contain all the data for a raw Android (AAID) or iOS (IDFA) device identifier, referred to as the Mobile ID in this document. "Raw" means the Mobile ID's original case must be preserved, which differs between Android and iOS.
  • Delimit the columns below using the CARET (^) character.
  • Each line in the data file may contain up to 6 columns:
    1. The first column contains the Mobile ID (AAID or IDFA) of the device (required)
      • Example: 7D92078A-8247-5BA4-AE5B-76104561E7EC
    2. The second column (_kx_ts) contains the timestamp (required)
      • Example: _kx_ts:1488222389789
    3. The third column (_kx_ip) contains the ip address (required)
      • Example: _kx_ip:192.168.23.24
    4. The fourth column (optional) (_kx_fpid) contains the first-party ID variables
      • Can contain multiple variable names, semicolon delimited
      • Example: _kx_fpid:cid=id1;cid2=id2
    5. The fifth column (optional) (_kx_page) contains the Page Attributes associated with the mobile ID being passed in and should be in the following format:
      • _kx_page:attribute1=attribute_value1;attribute2=attribute_value2
      • Each row can contain an arbitrary number of attribute value combinations
      • It is not mandatory for values to be specified for all attributes for a given ID
      • If an ID is a member of multiple attribute values, the values can be comma separated:
        _kx_page:attribute1=attribute_value1,attribute_value2
    6. The sixth column (optional) (_kx_user) contains the User Attributes associated with the mobile ID being passed in and should be in the following format:
      • _kx_user:attribute1=attribute_value1;attribute2=attribute_value2
      • Each row can contain an arbitrary number of attribute value combinations
      • It is not mandatory for values to be specified for all attributes for a given ID
      • If an ID is a member of multiple attribute values, the values can be comma separated:
        _kx_user:attribute1=attribute_value1,attribute_value2

The overall format will be:

idfa^_kx_ts:value^_kx_ip:value^_kx_fpid:cid=id1;cid2=id2^_kx_page:a1=v1;a2=v3,v4^_kx_user:a3=v3;a4=v4;a5=v5
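As a concrete illustration of the format above, a line can be assembled programmatically. This is a minimal sketch: the `build_log_line` helper and its argument names are illustrative, not part of any Salesforce DMP API.

```python
def build_log_line(mobile_id, ts, ip, fpid=None, page=None, user=None):
    """Assemble one caret-delimited mobile log line (illustrative sketch)."""
    def join_attrs(attrs):
        # attribute1=value1;attribute2=value2; multi-values are comma separated
        return ";".join(f"{k}={','.join(v) if isinstance(v, list) else v}"
                        for k, v in attrs.items())

    columns = [mobile_id, f"_kx_ts:{ts}", f"_kx_ip:{ip}"]  # required columns
    if fpid:
        columns.append("_kx_fpid:" + join_attrs(fpid))
    if page:
        columns.append("_kx_page:" + join_attrs(page))
    if user:
        columns.append("_kx_user:" + join_attrs(user))
    return "^".join(columns)

line = build_log_line(
    "7D92078A-8247-5BA4-AE5B-76104561E7EC", 1488222389789, "192.168.23.24",
    fpid={"cid": "id1"},
    page={"app_section": "maps"},
    user={"interest": ["fishing", "boating"]},
)
```

The three required columns always come first; the optional `_kx_fpid`, `_kx_page`, and `_kx_user` columns are only emitted when data is present.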

For example, if there are 2 page columns called App Section and App Subsection, and 3 user columns called Age Group, Gender, and Interest in the client's mobile logs that need to be imported into Salesforce DMP, then the following represents a valid data file that can be ingested by Salesforce DMP:

6E12078A-8246-4BA4-AE5B-76104861E7FC^_kx_ts:Fri Feb 26 22:01:14 UTC 2016^_kx_ip:10.50.2.24^_kx_fpid:uaId=user123^_kx_page:app_section=maps;app_subsection=activity^_kx_user:gender=male;age=18-24;interest=fishing
8D92078A-8246-4BA4-ZT5B-76104861E7DA^_kx_ts:Fri Feb 26 22:01:14 UTC 2016^_kx_ip:10.23.12.74^_kx_fpid:uaId=user234^_kx_page:app_section=maps^_kx_user:gender=female
6H52078A-3421-8XA7-OE5B-78121861E7DJ^_kx_ts:Fri Feb 26 22:01:14 UTC 2016^_kx_ip:10.70.2.66^_kx_fpid:uaId=user345^_kx_page:app_section=maps;app_subsection=activity^_kx_user:age=35-44
2B52078A-8765-2MA3-OX1Q-42327860A0LI^_kx_ts:Fri Feb 26 22:01:14 UTC 2016^_kx_ip:10.60.11.43^_kx_fpid:uaId=user456^_kx_page:app_section=maps;app_subsection=activity^_kx_user:gender=male;interest=fishing,boating
8D92078A-8246-4BA4-ZT5B-76104861E7DA^_kx_ts:Fri Feb 26 22:01:14 UTC 2016^_kx_ip:10.23.12.74^_kx_fpid:uaId=user678^_kx_page:app_section=maps^_kx_user:gender=male
l6aq7ebc-kll2-82z0-809d-wwmk7sa876em^_kx_ts:Fri Feb 26 22:01:15 UTC 2016^_kx_ip:10.55.2.4^_kx_fpid:uaId=user789^_kx_page:app_section=maps;app_subsection=activity^_kx_user:age=35-44
96bd03b6-defc-4203-83d3-dc1c730801f7^_kx_ts:Fri Feb 26 22:01:15 UTC 2016^_kx_ip:10.31.40.7^_kx_fpid:uaId=user012^_kx_page:app_section=maps;app_subsection=activity^_kx_user:gender=male
5eb10xb2-cmek-1282-93sz-io7cp348a1e1^_kx_ts:Fri Feb 26 22:01:15 UTC 2016^_kx_ip:10.66.71.71^_kx_fpid:uaId=user999^_kx_page:app_section=maps^_kx_user:gender=female

When Salesforce DMP processes this file, it will create 2 page attributes called app_section and app_subsection and 3 user attributes called gender, age, and interest in the client's Salesforce DMP account. It will also automatically create the corresponding attribute values: maps under app_section; activity under app_subsection; male and female under gender; 18-24 and 35-44 under age; and fishing and boating under interest.
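The attribute-extraction step described above can be sketched in a few lines. `parse_log_line` is a hypothetical helper, not Salesforce DMP code, and it assumes the spec-compliant `attribute=value` form for the optional columns.

```python
def parse_log_line(line):
    """Split one caret-delimited mobile log line into its named columns (sketch)."""
    cols = line.split("^")
    record = {"mobile_id": cols[0]}  # first column is always the Mobile ID
    for col in cols[1:]:
        key, _, value = col.partition(":")  # split off the _kx_* prefix
        if key in ("_kx_ts", "_kx_ip"):
            record[key] = value
        else:
            # _kx_fpid / _kx_page / _kx_user: semicolon-delimited attr=value
            # pairs; multiple values for one attribute are comma separated
            record[key] = {
                attr: vals.split(",")
                for attr, _, vals in (p.partition("=") for p in value.split(";"))
            }
    return record

record = parse_log_line(
    "6E12078A-8246-4BA4-AE5B-76104861E7FC^_kx_ts:1488222389789"
    "^_kx_ip:10.50.2.24^_kx_page:app_section=maps;app_subsection=activity"
    "^_kx_user:gender=male"
)
```

Collecting the keys and values seen across all parsed lines yields exactly the set of attributes and attribute values the ETL platform would create on the fly.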

Mobile logs can be loaded in 3 ways:

  1. Full Refresh: The file contains all users and all data for those users.  All data for the attributes in this file will be removed and replaced with the data in the upload.
  2. Incremental Refresh with Cumulative Data:  Each day's data can contain full or partial data for a user that will be used for creating audiences. The values collected for that day will be used in segment calculations in addition to any other values passed in on previous days (typically a rolling 90 day window).
  3. Incremental Refresh with Full Data: Each day's file should contain all data for each user it includes, but does not have to contain all users. For the users passed in, all existing data will be replaced with the data in this file.

File Compression: 

Salesforce DMP's big data processing is Hadoop based, so it is highly preferable to send data files using LZO compression with index files.

If using gzip compression, the file size should not exceed 1 GB.

(Specific to Mac users with data files)

  • Install Homebrew: http://brew.sh/
  • Paste the following line into your computer's Terminal:
    /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
  • In Terminal, type: brew install lzop
  • Confirm that lzop has been installed by typing: lzop
    - Program details will appear with the available commands and options
  • Use lzop to compress your file:
    - Change into the folder that your file lives in and hit 'enter'
    - Type: lzop [file name] and hit 'enter'
  • The compressed LZO file will appear in the same folder as the original file

Additional Instructions

You can send a sample of the data to your Salesforce DMP Solutions Engineering team to validate before uploading to the Amazon S3 bucket (S3 access information will be provided securely by Salesforce DMP).

Data Privacy and Security Considerations

Salesforce DMP does not accept PII, and sending it to us is a breach of the MSA. For more on PII, see our Opt-out Guide.

Attribute Display Names

User-friendly display names for each attribute, along with the codes they correspond to, must be sent separately from the import file. The system takes the codes in the file import and automatically creates attributes, so we have to name them separately on the back end.

For each individual attribute, you can view the most recent date of the import in the Attributes tab to ensure the import worked properly.

File Drop Considerations

This integration requires mobile log files to be sent to Salesforce DMP using Amazon S3. The Client can make the files available on a specified bucket in Krux's Amazon S3 account for ingestion. 

  • The directory structure should be: krux-partners/client-<client>/uploads/mobilelogs/YYYY-MM-DD/
    • Example:  krux-partners/client-abc/uploads/mobilelogs/2017-03-01/abc_mobilelogdata_20170301.csv.gz
  • Data is processed for the "last day".  For example on March 2 we will process data in the March 1 folder (/2017-03-01/)
  • The date in the folder path should be the day the file is uploaded (YYYY-MM-DD)
  • Make sure your upload file name is as short and precise as possible. AWS will truncate the name (approximately 32 characters) and cause errors if it is too long.
  • Files may be zipped or unzipped (multiple files in each daily folder are also accepted)
  • A header row should not be included in the file
  • The name of the file cannot have spaces or special characters. - or _ can be used.
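The naming and path rules above can be captured in a small validation helper before uploading. A sketch: `upload_key` and the strict 32-character cutoff are assumptions based on the guidance above, not an official check.

```python
import re
from datetime import date

def upload_key(client, filename, upload_date=None):
    """Build the daily S3 key for a mobile-log drop and check the naming rules (sketch)."""
    upload_date = upload_date or date.today()  # folder date = day the file is uploaded
    # No spaces or special characters; letters, digits, '.', '-' and '_' only
    if not re.fullmatch(r"[A-Za-z0-9._-]+", filename):
        raise ValueError("file name may only contain letters, digits, '.', '-' and '_'")
    if len(filename) > 32:
        raise ValueError("keep the file name to roughly 32 characters or fewer")
    return (f"krux-partners/client-{client}/uploads/mobilelogs/"
            f"{upload_date:%Y-%m-%d}/{filename}")
```

Running such a check locally before each drop avoids the truncation and ingestion errors described above.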

Amazon S3 Bucket Information

Here is a general website regarding connecting to the Amazon S3 bucket via Cyberduck (though, Terminal can also be used): https://trac.cyberduck.io/wiki/help/en/howto/s3#ConnectingtoAmazonS3

Cyberduck Instructions

      • Download https://cyberduck.io
      • Once installed, please follow these instructions to test your credentials supplied
      • Open Cyberduck and click Open Connection in the top left.
      • Select S3 (Amazon Simple Storage Service) from the drop-down box at the top. (If the server is not showing as s3.amazonaws.com and the port is not showing as 443, the wrong connection type was selected in the drop-down.)
      • Username: (Paste the Access Key ID supplied)
      • Password: (Paste the Secret Access Key supplied)
      • Select the arrow by more options and Path:
         krux-partners/client-yyy/uploads/
      • Select Connect
      • You should see a file called _SUCCESS
      • Click on the Action Wheel and select upload
      • The file must be UTF-8 (or us-ascii) encoded
      • The file must use Unix-style (LF) line endings; files saved on a PC or Mac can end up with Windows-style (CRLF) line breaks that cause errors. If you would like to upload a test file, your Salesforce DMP representative will view it to verify the line endings are correct. Your IT department should be able to assist with formatting in Terminal.
      • Please let us know once the file is uploaded. Once it is uploaded, our engineers have to deploy the pipeline. A safe timeline would be 2 weeks for this to be completed.
      • Moving forward, the file can be dropped to the S3 by you and the Salesforce DMP system will automatically ingest it. The date should be the day you send the file so the system will ingest it properly. Please make sure the format is correct.
      • Reminder: we only accept full refresh files. Existing data within Salesforce DMP that is not included on the most recent file will be removed from the system

Need Help?

If you need help with mobile log imports, don't hesitate to contact the Salesforce DMP Solutions team at kruxhelp@krux.com.
