What types of questions can the Custom Index Tool help answer?
The Custom Index Tool (CIT) is a report in Salesforce Audience Studio that is primarily used to compare two different segments (aka Populations). The CIT enables you to see how certain segments (or features) are represented in one population vs. another population. The CIT is best used when trying to answer questions about how one population differs from another, or what makes a specific high-value segment different from another group of users.
As such, it’s important to remember the results of the Custom Index Report are always going to be relative to the two populations you are comparing: the local population & the reference population. See Key Terminology section below for more definitions.
The CIT is great for answering:
“How do users in segment X differ from users in segment Y?”
Example segment comparisons:
- Online Converters vs. All Site Visitors
- Regular Visitors vs. Occasional Visitors
- Subscribers vs. Non-Subscribers
- Users exposed to Campaign A vs Users exposed to Campaign B
- My Site Visitors vs. General Internet Population (aka large extension segment)
Results from the CIT can be used for the following use cases:
- Informing marketing strategies
- Audience Segmentation
- Advanced Persona Building
Custom Index Tool - Key Terminology:
Local Population (segment of interest):
A single segment selected when submitting the report that represents the population of interest for your analysis.
Reference Population (comparison segment):
This is the segment you are comparing the local population to.
Feature Segments & Feature Sets:
These are segments which represent the things you want to learn about your local and reference populations. For example, gender, age, lifestyle, etc. Best practice is to use EXHAUSTIVE groups of feature segments, aka “Feature Sets.”
Each individual segment is a feature, and each colored group is a feature set.
Index:
A calculated value which compares the relative overlaps of a given feature segment with the local and reference populations.
- A positive number (index) indicates that the feature is overrepresented in the local population when compared to the reference population. This means the feature has a higher likelihood of existing in the local population vs. the reference population.
- A negative number indicates that the feature is underrepresented in the local population when compared to the reference population. This means the feature has a lesser likelihood of existing in the local population vs. the reference population.
For more detail on interpreting indexes, see the “Normalized vs. Denormalized Calculations” section below.
Normalized vs. Denormalized Views:
The denormalized view should only be used to understand the overall coverage of any features with each population.
The normalized view should be used to interpret results and articulate the meaning of each Index as described in this document.
For more detail on Normalized vs. Denormalized see the “Normalized vs. Denormalized Calculations” section below.
How do I submit a CIT report?
To generate a Custom Index Report navigate to: Insights > Manage Reports > Create Report > Population Index
You’ll need to define the following criteria:
- Report Name: What you’d like to title the report. Best practice is to include the names of the populations being compared for easy reference later on.
- Local Population: The segment of interest you’ve identified.
- Reference Population: The segment you’d like to compare the local pop against.
- Feature Population: Pull all feature set segments you’d like into the analysis.
- Note: A maximum of 100 feature segments can be included per job. If you’d like to include more than 100 feature segments, you’ll have to break them into groups and run multiple jobs. If you need to run multiple jobs for the same local & reference populations, make sure each feature set is kept whole within a single report.
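The 100-segment limit and the advice to keep feature sets together can be sketched as a simple packing step. This is only an illustration for planning jobs offline: the function name, the dictionary layout, and the example segment names are hypothetical, and nothing here calls an Audience Studio API.

```python
from typing import Dict, List

MAX_FEATURES_PER_JOB = 100  # limit stated in the note above


def batch_feature_sets(feature_sets: Dict[str, List[str]],
                       limit: int = MAX_FEATURES_PER_JOB) -> List[List[str]]:
    """Pack whole feature sets into jobs of at most `limit` segments each."""
    jobs: List[List[str]] = []
    current: List[str] = []
    for set_name, segments in feature_sets.items():
        if len(segments) > limit:
            raise ValueError(f"Feature set {set_name!r} alone exceeds {limit} segments")
        # Start a new job rather than splitting a feature set across two reports.
        if current and len(current) + len(segments) > limit:
            jobs.append(current)
            current = []
        current.extend(segments)
    if current:
        jobs.append(current)
    return jobs


# Hypothetical feature sets: 2 + 60 + 50 = 112 segments, so more than one job is needed.
example_sets = {
    "Gender": ["Gender_Male", "Gender_Female"],
    "Interest": [f"Interest_{i}" for i in range(60)],
    "Region": [f"Region_{i}" for i in range(50)],
}
example_jobs = batch_feature_sets(example_sets)
print([len(job) for job in example_jobs])
```

The packing is greedy and keeps the original order of the sets, so it does not necessarily produce the fewest possible jobs. It only guarantees that no job exceeds the limit and no feature set is split.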
Once you submit the CIT job, the report will process overnight and become available under Insights > Manage Reports or Insights > Custom Index Tool the following day.
NOTE: You can save time when submitting reports by copying a previous report and modifying it. This is especially useful when you have already defined a set of feature segments in one report and want to reuse it for different Local/Reference populations.
Best practice: Reduce overlap between Local and Reference
In general, having high overlap between your Local and Reference populations can make it harder to get clear insights. Best practice is to create a new segment which removes the Local population from your intended Reference population segment with a NOT rule.
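The logic of the NOT rule is a set difference. The sketch below uses made-up user IDs purely to illustrate the idea; in practice the new Reference segment is built in the segment builder, not in code.

```python
# Hypothetical user IDs standing in for two segments.
local_users = {"u1", "u2", "u3"}
intended_reference = {"u1", "u2", "u3", "u4", "u5", "u6"}

# Share of the intended reference population that also sits in the local population.
overlap_share = len(local_users & intended_reference) / len(intended_reference)

# Equivalent of "intended reference AND NOT local".
reference_users = intended_reference - local_users

print(overlap_share)
print(sorted(reference_users))
```

After the difference is applied, the two populations share no users, so any contrast between them is not diluted by users counted on both sides.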
Feature Segments and Feature Sets
The CIT requires some upfront work of creating sets of feature segments to include in each analysis. The feature segments you build and include when submitting your CIT jobs should address the types of things you want to learn about your Local and Reference populations. Defining and creating good feature sets is an important first step when beginning to work with the CIT.
What features should I include as characteristics in my persona?
When determining what features you’d like to include as part of your persona think about what information is going to be the most valuable. To build a well rounded and multidimensional picture of who your audience is, we recommend creating feature sets that satisfy three core pillars:
- Who they are
- What they do
- How they think
What data can I use to build out feature sets?
You can use 1st-, 2nd- and/or 3rd-party data when building out features, as well as the local and reference populations, depending on the type of insights you would like to generate.
As a best practice, we encourage you to explore what options you have to leverage your 1P data for feature sets.
If I want to use 3rd-Party data, how can I determine which 3PD provider to use?
To gain perspective on which 3PD Providers may be the best option to use when creating 3P segments for analytics, use the Data Providers report under the Insights tab to determine which providers have the greatest overlap with your users.
Once you navigate to the Data Providers report, a report will automatically generate showing the audience overlap between your organization and each 3P Provider active in your instance.
The greater the overlap, the higher the likelihood your users will exist within segments you build using the specific 3P Provider’s attributes.
What are some best practices for building useful feature segments?
When building segments to be used as feature sets in Custom Index Reports, there are a few important rules/guidelines you can follow:
- Use a unique identifier in the naming convention of each feature segment at the set level.
- Having a searchable identifier in the segment name is critical for saving time when generating the report & interpreting the results.
Example: “Gender_Male” and “Gender_Female” instead of just “Male” and “Female”
- Make sure you create an individual segment for each feature within a feature set.
Feature Set = Gender
- For mutually exclusive feature sets (sets where each user falls into only one of the available options, e.g. Age, where each user falls into a single age bracket), make sure the set is exhaustive and that you build one segment for each possible option.
- For all feature sets, use the same data provider for each feature in a set to avoid contradictory overlap between providers. Note: it is okay to use different providers for different feature sets.
- Don’t include features that are explicitly included or excluded within the local population.
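A consistent set-level identifier in segment names also makes it easy to regroup segments by feature set outside the UI. The following is a minimal sketch that assumes the identifier is the text before the first underscore, as in “Gender_Male”; the function name and the Age segment names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_feature_set(segment_names: Iterable[str],
                         sep: str = "_") -> Tuple[Dict[str, List[str]], List[str]]:
    """Group segment names by the prefix before `sep`; collect names lacking a prefix."""
    groups: Dict[str, List[str]] = defaultdict(list)
    unprefixed: List[str] = []
    for name in segment_names:
        prefix, found, rest = name.partition(sep)
        if found and prefix and rest:
            groups[prefix].append(name)
        else:
            # No set-level identifier: flag for renaming.
            unprefixed.append(name)
    return dict(groups), unprefixed


names = ["Gender_Male", "Gender_Female", "Age_18-24", "Age_25-34", "Male"]
feature_sets, needs_renaming = group_by_feature_set(names)
print(feature_sets)
print(needs_renaming)
```

Segments returned in the second list are the ones that would be hard to find or select as a set when generating the report.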
Interpreting CIT Results
See the example file linked below in “Exporting the data” for a full description of each field and a sample output of CIT results.
How do I find the results of my CIT reports?
Once you’ve built your feature sets and submitted your CIT job, you can find the results by navigating to Insights > Custom Index Tool or Insights > Manage Reports and selecting the specific Job Name you used when submitting the report.
Normalized vs. Denormalized calculations
Denormalized indexes are calculated by the CIT, but should rarely, if ever, be used for interpreting results. The reason is that the denormalized overlap % and indexes do not account for differences in the relative sizes of the Local/Reference populations and differences in the coverage of that specific feature set for both audiences.
The normalized view considers the relative distribution of the currently selected feature segments among both populations. Normalized overlap percentages are calculated by dividing the actual % overlap of each feature by the total % overlap for all currently selected features. This gives you a % which represents the relative distribution of each feature segment within the local and reference populations. This is why it’s important to select just one feature set at a time when interpreting results in the normalized view.
Local Population = Purchasers_1x30 (population = 10,000)
Reference Population = AllSiteVisitors_1x30 (population = 100,000)
Abbreviated CIT results below:
The CIT computes the relative Male/Female distribution based on the 8.5% of Purchasers and the 49% of Site Visitors which were covered by this feature set.
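The normalization step described above can be sketched in a few lines. The per-feature percentages below are hypothetical: they were chosen only so that their totals match the 8.5% and 49% coverage figures in this example, and they are not actual CIT output.

```python
from typing import Dict


def normalize(overlap_pct: Dict[str, float]) -> Dict[str, float]:
    """Divide each feature's overlap % by the total overlap % of the selected features."""
    total = sum(overlap_pct.values())
    if total == 0:
        raise ValueError("selected features have no overlap with this population")
    return {feature: pct / total for feature, pct in overlap_pct.items()}


# Hypothetical denormalized overlaps, as a % of each whole population.
local_pct = {"Gender_Male": 5.1, "Gender_Female": 3.4}        # totals 8.5%
reference_pct = {"Gender_Male": 24.5, "Gender_Female": 24.5}  # totals 49%

local_share = normalize(local_pct)
reference_share = normalize(reference_pct)
for feature in local_pct:
    print(feature, round(local_share[feature], 2), round(reference_share[feature], 2))
```

With these made-up inputs, the denormalized figures (5.1% vs. 24.5% for Male) mostly reflect the difference in coverage between the two populations, while the normalized shares (about 60% vs. 50%) describe the Male/Female split among the users the feature set actually covers.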
The normalized view should be used to interpret results and articulate the meaning of each Index as described in this document. The denormalized view should only be used to understand the overall coverage of any features with each population.
How do I interpret the results?
First, select the report you’d like to view. Next, select the first feature set you’d like to view.
Note: When viewing CIT results, it’s important to analyze one feature set at a time because of how the tool calculates the normalized index. For example, when looking at the index results for Age, select only the Age feature segments. Alternatively, you can export the data to Excel, group your feature sets, and calculate the normalized indexes there.
In addition, make sure you are viewing the report in the normalized view.
Once you select the report & feature set you’d like to view, results will be available in two views:
The results of each CIT will display 4 data points for each feature name.
- Percent of Local (The percentage of the entire local population made up by that feature)
- Percent of Reference (The percentage of the entire reference population made up by that feature)
- Index (How much more/less likely a user in the local population is to be part of that feature segment than a user in the reference population)
- Feature Size (Number of users in the feature)
Interpreting index values
Normalized index numbers within the CIT can be read using the following format:
Users in the LOCAL POPULATION were INDEX NUMBER times MORE/LESS likely to be in FEATURE SEGMENT than users in the REFERENCE POPULATION.
For example, if our local population in the Data Table screenshots above was All Converters 1x30 and the reference population was All Site Visitors 1x30, results for ‘Age 36-45’ and ‘Age 46-55’ features would read as follows:
An index value of 1 means that the feature is equally present in both the Local and Reference.
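The reading format above can be expressed as a small helper. This is a sketch under stated assumptions: the sign of the index gives the direction (positive = more likely, negative = less likely), its magnitude gives the multiplier, and magnitudes below 1 do not occur. The function name and the index values used in the example are hypothetical and are not taken from a real report.

```python
def describe_index(local: str, reference: str, feature: str, index: float) -> str:
    """Render a normalized index using the sentence template above."""
    if index == 1:
        return (f"Users in {local} were equally likely to be in {feature} "
                f"as users in {reference}.")
    # Assumption: under-representation is shown with a negative sign rather
    # than a fraction, so a magnitude below 1 is treated as invalid input.
    if abs(index) < 1:
        raise ValueError("expected an index with a magnitude of at least 1")
    direction = "more" if index > 0 else "less"
    return (f"Users in {local} were {abs(index):g} times {direction} likely "
            f"to be in {feature} than users in {reference}.")


# Hypothetical index values for illustration only.
print(describe_index("All Converters 1x30", "All Site Visitors 1x30", "Age 36-45", 1.5))
print(describe_index("All Converters 1x30", "All Site Visitors 1x30", "Age 46-55", -1.3))
```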
NOTE: When building an audience persona, it is best practice to include only features whose index has an absolute value of at least 1.1 (that is, |index| ≥ 1.1). Features with an index below |1.1| are less statistically relevant.
NOTE: In addition to the index value, take into consideration the size of the local/feature overlap. While certain features may be significant from an index perspective, the size of the feature/local overlap may not be large enough to influence your marketing strategy.
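The two notes above can be combined into a simple filter over exported results. This is a minimal sketch: the row keys (`feature`, `index`, `local_overlap`), the sample rows, and the `min_overlap` cutoff are all hypothetical, and the right overlap cutoff depends on your own marketing goals.

```python
from typing import Dict, List

MIN_ABS_INDEX = 1.1  # threshold from the note above


def persona_candidates(rows: List[Dict], min_overlap: int) -> List[Dict]:
    """Keep features with |index| >= 1.1 and at least `min_overlap` users in the local overlap."""
    return [
        row for row in rows
        if abs(row["index"]) >= MIN_ABS_INDEX and row["local_overlap"] >= min_overlap
    ]


# Hypothetical rows for illustration only.
rows = [
    {"feature": "Age_36-45", "index": 1.4, "local_overlap": 2500},
    {"feature": "Age_46-55", "index": -1.2, "local_overlap": 1800},
    {"feature": "Age_18-24", "index": 1.05, "local_overlap": 3000},  # index too close to 1
    {"feature": "Age_65+", "index": 1.9, "local_overlap": 40},       # overlap too small
]
kept = persona_candidates(rows, min_overlap=500)
print([row["feature"] for row in kept])
```

Negative indexes are kept when their magnitude clears the threshold, since an underrepresented feature can be as informative for a persona as an overrepresented one.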
Determining Statistical Significance
Before drawing final conclusions from your CIT index values, it is best practice to confirm that both the local and reference population overlaps (the shared audience between a feature and a population) are large enough to be statistically significant, so that inferences drawn from those overlaps hold at a chosen level of confidence.
This calculator template includes instructions and formulas for validating whether your CIT results are statistically significant at different confidence levels:
Statistical Significance: Helps quantify whether a result, up to a certain level of confidence is due to chance or to a real difference between two distinct populations.
Statistically Significant Overlap Size: The minimally required population overlap that can be used to make statistically significant inferences based on the total local or reference population size, % margin of error and % level of confidence.
For example, for a local population with 100,000 total devices, you would need an overlap with an individual feature of 370 devices to be 95% confident that the results were statistically significant.
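The formula used by the calculator template is not reproduced in this document. As a rough sketch of how a minimum overlap size can be derived from population size, margin of error, and confidence level, the code below uses Cochran's sample-size formula with a finite population correction, which is a common choice but an assumption here. The 5% margin of error and the 0.5 proportion are also assumed defaults, so the output may differ from the template's figures.

```python
import math
from statistics import NormalDist


def required_overlap(population_size: int,
                     confidence: float = 0.95,
                     margin_of_error: float = 0.05,
                     proportion: float = 0.5) -> int:
    """Minimum overlap size via Cochran's formula with finite population correction."""
    # Two-sided z-score for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    # Shrink the requirement for a finite population.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)


for size in (10_000, 100_000):
    print(size, required_overlap(size))
```

Tightening the margin of error or raising the confidence level increases the required overlap, while the total population size has comparatively little effect once it is large.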
Exporting the data
Salesforce Audience Studio UI provides several options for exporting CIT results.
- Confirm that “Select Features” is set to contain the relevant features you’d like to download.
- “Select” the data table to download by clicking on any number in the table, then hit the “Esc” key on your keyboard.
- Select “Download” at the top right corner of the report.
- Choose either Crosstab or Data (more info below).
The crosstab view contains the normalized indexes and is therefore the recommended view for exporting most results into an Excel-friendly format.
Data (Full Data)
For advanced users interested in calculating their own index values, or utilizing CIT results in other ways, the “Full Data” csv download includes some additional data columns. There are a few additional steps:
- Upon seeing the “Download” summary screen appear, select “Data” as the file format
- After seeing a separate “View Data” window, select “Full Data” at the top left corner to move from the “Summary” tab, and then select “Download all rows as a text file” to download the CIT data table in CSV format.
The Data > Full Data download contains only denormalized indexes, regardless of which view was selected at download time. As mentioned above, denormalized indexes are less useful for interpreting results. To get the normalized indexes, use Download > Crosstab.
A sample of the exported CSV file with descriptions of each field can be found here.
Building The Persona
Once you’ve run the CIT for all feature sets you’d like to include in the persona, piece the results together to build an image of what the audience persona looks like.