What is the overlap index?
The overlap index shows how strongly a data provider attribute is represented in the base segment versus the organization’s users.
If the index is positive, the data provider attribute is more prominent in the base segment than among the organization’s users.
Conversely, if the attribute is less prominent in the base segment, the index is negative.
How is the overlap index actually calculated?
The index can be thought of as a ratio of percentages:
The numerator of that ratio represents the percentage of the base segment* with a certain data provider attribute, while the denominator represents the percentage of the organization’s users** that have that data provider attribute.
* This is actually the intersection of the base segment with the organization’s users (the “reference population”)
** This is the organization’s users over the last 30 days
For example, if 30% of the base segment has a data provider attribute, while only 10% of the organization has that data provider attribute, the index is 30% / 10% = 3. The index is positive, confirming that the data provider attribute is more prevalent in the base segment.
Why wouldn't the overlap between the base segment and the organization always be 100%?
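The ratio described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the report's actual implementation; the function name `overlap_index` and the user counts are hypothetical. It returns the raw ratio only.

```python
def overlap_index(base_with_attr, base_total, org_with_attr, org_total):
    """Share of the base segment with the attribute divided by the
    share of the organization's users with the attribute.

    All names and inputs here are illustrative assumptions.
    """
    if base_total <= 0 or org_total <= 0:
        raise ValueError("populations must be positive")
    if org_with_attr <= 0:
        raise ValueError("no organization users have the attribute; ratio is undefined")
    base_share = base_with_attr / base_total  # e.g. 300 / 1,000 = 30%
    org_share = org_with_attr / org_total     # e.g. 1,000 / 10,000 = 10%
    return base_share / org_share


# Hypothetical counts matching the 30% / 10% example above.
index = overlap_index(300, 1_000, 1_000, 10_000)
print(round(index, 2))
```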
Not all users in a segment are necessarily part of that organization’s first-party users. Also, the organization’s users are taken only for the last 30 days, while many segments have a longer lookback.
Are there any specific requirements for the base segment to run the report?
Not really, but there are some caveats:
- If the segment is very small, there may be very few (significant) overlaps.
- If the segment is very similar to the organization’s users, the results may not be interesting.
What other factors might skew the overlap index?
A base segment with a very small population.
How often will the Segment Level Reach report be refreshed?
It is not refreshed.
What are some best practices for using this product?
- First have a look at the Data Provider Report in Console.
- Ideally, select a data provider with high overlap with the organization.
What is included as "active" in an organization? Does it refer to active KUIDs exposed to client media/site within the past 30 days?
Yes, it denotes KUIDs that were active in the past 30 days. Active KUIDs include KUIDs from all data collection methods (e.g. media, site, DT2.0 ingestion, onboarded data).
If you select a base segment with a lookback of 30 days, the overlap between the base segment and the organization should be close to 100%, aside from any timing discrepancies.
Why is the report limited to a 30 day lookback? Is it related to 3P data provider contracts? Processing power?
If populations from beyond the last 30 days were considered, the 3P attributes that “pop” might not be driven by the most recently active users and might not reflect your most up-to-date audiences available for activation.
Are the users included as "Untapped Devices Inside Organization" considered active users? I.e. been seen across client data collection events in the past 30 days?
Are the users included as "Untapped Devices Outside Organization" considered active users? I.e. been seen in the past 30 days across DP universe?
30 days to 400 days, depending on the data provider attribute. Some DP attributes have a longer lookback across the Salesforce Audience Studio universe and will extend past the typical 30 days. These attributes will often have a greater count of untapped devices outside the organization.
Why do some reports populate with ALL negative overlap indices?
The data provider attribute is under-represented in the base segment compared to the organization population. The overlap index works the same way as the Custom Index Tool: we compare the overlap of the attribute in the base segment with the overlap of the attribute in the organization population. One way to check coverage is to look at the overlap between the data provider as a whole and the organization. If this is very low, an attribute from this data provider is very unlikely to have a large overlap with the base segment, which is a subset of the client's population.
Why would a report populate with the same exact indices across every attribute?
There are multiple reasons why this might happen. Start by looking at the base segment. If the base segment is generic, there will be some overlap across all the data provider attributes. Additionally, if your base segment makes up the majority of your organization's active KUIDs over the past 30 days, then not only will the indices be very similar across attributes, but the indices will also be very close to 1.
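To see why the indices converge on 1, here is a small worked example in Python. All counts are made up for illustration: the base segment is assumed to cover 90% of the organization's active users, so each attribute's share in the base segment is nearly the same as its share in the organization.

```python
# Hypothetical populations: the base segment covers 90% of the organization.
org_total = 100_000
base_total = 90_000

# Hypothetical attribute counts: (users in base segment, users in organization).
attribute_counts = {
    "attr_a": (18_500, 20_000),
    "attr_b": (4_400, 5_000),
    "attr_c": (55_000, 61_000),
}

indices = {}
for name, (in_base, in_org) in attribute_counts.items():
    base_share = in_base / base_total
    org_share = in_org / org_total
    indices[name] = base_share / org_share

for name, value in indices.items():
    print(name, round(value, 3))
```

Because the two populations are almost the same, each ratio lands within a few percent of 1, and the report has little to distinguish one attribute from another.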
Can a base segment with 3P data be used when running the report? If not, are there any other restrictions that should be considered?
No 3P/2P segments (including extension segments). No real time segments. Only segments developed from 1P attributes will be considered. Segments that contain onboarded 1st-Party attributes (appearing as 3rd-Party attributes in segment builder) CAN be used as base segment.
If a client uses an onboarded CRM segment as the local segment, will the report be skewed due to the restrictions with the lookback window?
We will filter to get the KUIDs in the segment that have been active in the past 30 days.
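As a rough sketch of this filtering step, assuming each KUID has a known last-seen date: the function name `active_members`, the `last_seen` mapping, and the sample IDs below are hypothetical and only illustrate the idea of restricting a segment to users active in the past 30 days.

```python
from datetime import date, timedelta


def active_members(segment_kuids, last_seen, as_of, lookback_days=30):
    """Keep only the segment's KUIDs seen within the lookback window.

    `last_seen` maps KUID -> date of most recent activity (an assumed input).
    KUIDs with no recorded activity are treated as inactive.
    """
    cutoff = as_of - timedelta(days=lookback_days)
    return {k for k in segment_kuids if k in last_seen and last_seen[k] >= cutoff}


# Hypothetical onboarded CRM segment with three KUIDs.
segment = {"kuid_a", "kuid_b", "kuid_c"}
last_seen = {
    "kuid_a": date(2024, 1, 20),  # inside the 30-day window
    "kuid_b": date(2023, 12, 1),  # outside the window
}
print(active_members(segment, last_seen, as_of=date(2024, 1, 31)))
```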
Is there a minimum population size that we should take into consideration before and after the base segment has been limited to the 30 day lookback?
There is currently no hard "minimum" threshold set. Recommended best practice is a minimum of 5000. If the population is too small it can skew the results.
How many total attributes are meant to be surfaced?
The top 100 highest indexing attributes.
Is there any filtering of data provider features?
Only features with at least 10% overlap with the base segment OR at least 5k user overlap will qualify to show up in the report. Additionally, only features that have untapped devices in the organization of at least 10% of the base segment or 5000 users will qualify. Then, of the qualified features, just the top 100 indexing features get surfaced in the final report.
What does it mean if fewer than 100 attributes show up in the results?
Fewer than 100 attributes met the minimum overlap requirement with the base segment.
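The final selection step can be pictured as a sort followed by a cut-off. The sketch below assumes each feature already carries a `qualified` flag and an `index` value; those field names and the function `top_indexing` are hypothetical.

```python
def top_indexing(features, limit=100):
    """Return up to `limit` qualified features, highest index first.

    Each feature is a dict with assumed keys "name", "index" and "qualified".
    """
    qualified = [f for f in features if f["qualified"]]
    return sorted(qualified, key=lambda f: f["index"], reverse=True)[:limit]


# Hypothetical features; only qualified ones are ranked.
features = [
    {"name": "attr_a", "index": 1.4, "qualified": True},
    {"name": "attr_b", "index": 3.2, "qualified": False},
    {"name": "attr_c", "index": 2.1, "qualified": True},
]
print([f["name"] for f in top_indexing(features, limit=2)])
```

If fewer than `limit` features qualify, the result is simply shorter, which matches the case where a report shows fewer than 100 attributes.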
Are there best practices around how many/which attributes should/can be included in the base segment?
1st-party attributes only. This includes onboarded 1P attributes appearing as 3P in segment builder.
Can you articulate what the index in the results means as a sentence, like we do with the Custom Index Tool?
The overlap index indicates how the data provider attribute is represented in the base segment vs the organization population in the past 30 days. If the overlap index is positive, the data provider attribute is more common in the base segment than in the organization population. If it is negative, the data provider attribute is more common in the organization than in the base segment.
Is there a way to "normalize" the results so we don't see all negative indices for certain jobs?
No, currently the results within the report cannot be "normalized".
How do I know if the data is "reliable" or not "reliable"? I.e. if I'm seeing 100% negative indices, can I assume something is wrong? What steps can a client take to get better results?
Negative indices are to be expected on occasion. This means that the client organization has greater coverage across the negative indexing attributes when compared to the base segment. A ticket should not be opened. Next steps would be to try a different data provider with better coverage by checking the data provider report.
What are the limitations of this report compared to the custom index tool?
SLR is designed to increase reach, whereas CIT offers a means to profile two audiences. As such, the report selection should align with the current goals for the audience.
Common Segment Level Reach uses:
- SLR can be used to identify 2nd-party data providers that may be of interest to your organization
- SLR can be beneficial for clients who have few or no 3rd-party segments built out and don't know where to start. In this instance, the SLR report facilitates quicker time-to-value for clients
- SLR should not be used if you are trying to compare two specific audiences against each other
- SLR should not be used if you want to see the normalized profile results across a specific 3rd-party provider
Custom Index Tool uses:
- CIT can be used when you have two populations and specific attributes (segments already built out) you want to look at in both a normalized and a denormalized capacity
Can onboarded 1st-party data appearing as 3rd-party be used as a base segment?
Does the untapped devices inside of an organization count exclude the base segment devices?
How should I interpret conflicting attributes that index highly, e.g. Male and Female?
Currently, this means the base segment has really high coverage from the DP as a whole. If you are looking to discern normalized coverage for your base segment, please reach out to a CIA analyst who can help you by using overlaps.
If I run two reports on the same day with two different Data Providers, or two different base segments, should I expect the organization count to be identical?
Yes, for the most part; however, some time-synchronization discrepancies may occur.
Is the 10% or 5000 users overlap threshold applied to the base segment, the organization, or both?
There are two thresholds applied:
- Overlap Threshold (we keep data provider attributes that have):
  - At least 10% overlap with the base segment.
  - If not, then we check whether the overlap with the base segment has a population of at least 5000.
- Untapped Devices Threshold:
  - The untapped devices population within the org is at least 10% of the base segment population.
  - If not, we check that the untapped devices within the org have a minimum population of 5000.
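The two thresholds above can be sketched as a pair of checks. This is a minimal illustration under stated assumptions, not the report's code: the function names are hypothetical, the percentage is assumed to be measured against the base segment population in both checks, and an attribute is assumed to need to pass both thresholds.

```python
def passes_threshold(population, base_segment_size, min_percent=10, min_users=5000):
    """True if the population is at least `min_percent` of the base segment,
    or failing that, at least `min_users` in absolute terms."""
    # Integer arithmetic avoids floating-point surprises at the boundary.
    meets_percent = population * 100 >= min_percent * base_segment_size
    return meets_percent or population >= min_users


def qualifies(overlap_with_base, untapped_in_org, base_segment_size):
    """An attribute qualifies only if both thresholds are met (assumption)."""
    return (passes_threshold(overlap_with_base, base_segment_size)
            and passes_threshold(untapped_in_org, base_segment_size))


# Hypothetical base segment of 20,000 users.
print(qualifies(overlap_with_base=2_500, untapped_in_org=3_000, base_segment_size=20_000))
# Hypothetical base segment of 100,000 users: 1,000 overlap misses both 10% and 5,000.
print(qualifies(overlap_with_base=1_000, untapped_in_org=6_000, base_segment_size=100_000))
```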
We discussed the exclusion of 3P data from the base segment. Does this include lookalike segments (as these almost always contain 3P data)?
For now, we allow segments that include LAL in their rule.
** If the filtering conditions mentioned above return 0 attributes, we will just return the data provider’s attributes that have an overlap with the base segment; you will still be able to see some data provider attributes with a small overlap or limited untapped devices.
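The fallback described in this note can be sketched as follows. The function `select_attributes`, the `is_qualified` callback, and the `overlap_with_base` field are hypothetical names used only to illustrate the behavior.

```python
def select_attributes(attributes, is_qualified):
    """Return qualified attributes; if none qualify, fall back to every
    attribute that has any overlap with the base segment.

    `is_qualified` is an assumed callback standing in for the threshold checks.
    """
    qualified = [a for a in attributes if is_qualified(a)]
    if qualified:
        return qualified
    return [a for a in attributes if a["overlap_with_base"] > 0]


# Hypothetical attributes, none of which reach a 5,000-user overlap.
attributes = [
    {"name": "attr_a", "overlap_with_base": 120},
    {"name": "attr_b", "overlap_with_base": 0},
]
result = select_attributes(attributes, lambda a: a["overlap_with_base"] >= 5000)
print([a["name"] for a in result])
```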