Originally posted by IAB on 4/30.
The recently released IAB guide “Defining the Data Stack” provides a framework for both advertisers and publishers to build or enhance their data stack, depending on where they are in their data capabilities journey. As addressed in the guide, choosing the best data sets for your stack is first and foremost an exercise in prioritizing needs. If you want to grow your second- and/or third-party data, you’ll find yourself with a crucial task at hand: evaluating data providers.
The evaluation process can feel overwhelming, but it is essential. Once you select a data source, you associate your brand not only with the quality of the product you are using but also with the integrity of the partner you choose. Not all data is created equal, so whether you are a brand, an agency, or a publisher vetting new data sources, it is paramount to thoroughly assess both vendors and data sets for quality and accountability.
Below are some key questions to ask in your assessment. It is helpful to structure your outreach as a formal request for information (RFI) and ask prospective partners to provide as much detail as possible, so that you can compare and contrast each provider’s offering and score them consistently.
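One way to turn RFI answers into comparable scores is a weighted scorecard. The sketch below is a hypothetical example, not an IAB method: the criterion names loosely mirror the questions in this post, and the weights, the 1 to 5 rating scale, and the vendor names are all assumptions you would replace with your own.

```python
from dataclasses import dataclass

# Hypothetical weights, one per evaluation question. These are illustrative
# assumptions only; set them to reflect your own priorities.
WEIGHTS = {
    "methodology": 0.20,
    "sources": 0.15,
    "accuracy": 0.25,
    "freshness": 0.15,
    "privacy": 0.25,
}


@dataclass
class RfiResponse:
    vendor: str
    scores: dict[str, int]  # criterion name -> rating from 1 (weak) to 5 (strong)


def weighted_score(response, weights=WEIGHTS):
    """Weighted average of one vendor's ratings; fails loudly if a criterion was not scored."""
    missing = set(weights) - set(response.scores)
    if missing:
        raise ValueError(f"{response.vendor} has no score for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(w * response.scores[c] for c, w in weights.items()) / total_weight


def rank_vendors(responses, weights=WEIGHTS):
    """Return (vendor, score) pairs sorted from highest to lowest weighted score."""
    ranked = [(r.vendor, round(weighted_score(r, weights), 2)) for r in responses]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical vendors and ratings, for illustration only.
responses = [
    RfiResponse("Vendor A", {"methodology": 4, "sources": 3, "accuracy": 5, "freshness": 3, "privacy": 4}),
    RfiResponse("Vendor B", {"methodology": 3, "sources": 5, "accuracy": 3, "freshness": 5, "privacy": 2}),
]
print(rank_vendors(responses))
```

With these example ratings, Vendor A ranks first: its strength on accuracy and privacy outweighs Vendor B's edge on sources and freshness. Raising an error on a missing criterion is a deliberate choice, since a vendor that skips a question should not quietly receive a score for it.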
How was the data collected?
Methodology matters. Good data produces accurate and actionable insights, while unreliable data can generate misleading insights — which are often more dangerous than no insights at all. Depending on the type of data you are evaluating, you may find that the data collection methodology, which impacts the overall quality and consistency of the data, varies by vendor.
Understanding the pros and cons of each data collection methodology will yield valuable insights into the quality and scale of the data sets you are evaluating. Throughout the evaluation, compare results across vendors, screen for inconsistencies, and request additional information wherever something is unclear.
As part of your methodology analysis, it is also valuable to understand the original purpose of the data collection (survey data is a good example) in order to screen for potential bias.
What sources were used?
You may find that certain data sets were created by combining multiple data sources, often to achieve scale. In this case, you’d want to understand what “multiple sources” really means. Scale is typically one of the drivers for securing second- and third-party data, but it is important to determine the following:
- Whether you are getting a diverse and large universe representative of the US population — in which case, great!
- Whether the data was collected using different methodologies, which can introduce inconsistencies that require cleansing and manipulation and/or make it impossible to ascertain the origin, quality, and compliance of the data itself.
This is why it is also important to ask, “Who collected the data?” Was it the partner you are evaluating, or was it brokered from other parties? The closer you are to the origin of the data, the more control you’ll have over its quality and integrity.
Is the data accurate?
The term “accuracy” carries different meanings depending on the specific type of data, but in its simplest form, accurate data does what it purports to do. For example, if you are evaluating demographic data, a segment classified as Male 18-34 would have to contain users who fit that demographic in order to be “accurate.” In the location data space, by contrast, an “accurate” representation of “visits to an AMC theater” would count only users who spend 60+ minutes at the movie theater, rather than any data point seen in proximity of the theater regardless of time spent.
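Accuracy has to be weighed together with scale, because you’ll likely want high-quality data with great reach, and stricter accuracy criteria (such as a minimum time spent at a location) typically shrink the qualifying audience. You might find it helpful to set clear benchmarks for your evaluation, based on both your specific needs and industry best practices.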
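To make the movie theater example concrete, the sketch below applies a 60-minute dwell threshold to a handful of location records and compares the result with a looser “seen nearby” definition. The `Visit` record shape and the device IDs are hypothetical; real location data sets will have their own schemas and their own rules for what counts as a single stay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """Hypothetical record: one continuous stay by a device near a venue."""
    device_id: str
    venue: str
    dwell_minutes: float


MIN_DWELL_MINUTES = 60  # the 60+ minute threshold from the movie theater example


def proximity_devices(visits, venue):
    """Loose definition: every device seen near the venue, regardless of time spent."""
    return {v.device_id for v in visits if v.venue == venue}


def qualifying_visitors(visits, venue, min_dwell=MIN_DWELL_MINUTES):
    """Strict definition: devices that stayed at the venue for at least min_dwell minutes."""
    return {v.device_id for v in visits if v.venue == venue and v.dwell_minutes >= min_dwell}


visits = [
    Visit("d1", "AMC", 135),
    Visit("d2", "AMC", 4),    # walked past the theater
    Visit("d3", "AMC", 72),
    Visit("d4", "Mall", 90),  # different venue, ignored
]
loose = proximity_devices(visits, "AMC")
strict = qualifying_visitors(visits, "AMC")
print(f"{len(strict)} of {len(loose)} nearby devices count as theater visits")
```

The gap between the two sets is the accuracy/scale trade-off in miniature: the loose definition reports a larger audience, but part of it never actually went to the movies.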
Is the data fresh?
How long data stays relevant depends on the specific data set. Some data sets, such as certain demographic characteristics, can be considered “static” because they rarely or never change. Other data sets, such as location data, are more “dynamic” because they constantly evolve with user behavior. When evaluating “dynamic” data sets, ask questions such as “When was the data collected?” and “How often is the data refreshed?” The answers help ensure that you get access to the most accurate and relevant information available.
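If a vendor shares collection timestamps, you can check their answers to these freshness questions directly. The sketch below measures what share of records is older than a maximum acceptable age, with a looser limit for “static” data than for “dynamic” data. The 365-day and 30-day limits are assumptions for illustration, not industry standards.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness limits: "static" attributes tolerate older records
# than "dynamic" behavioral data. These values are assumptions, not standards.
MAX_AGE = {
    "static": timedelta(days=365),
    "dynamic": timedelta(days=30),
}


def stale_share(collected_at, kind, now=None):
    """Fraction of records whose collection timestamp is older than MAX_AGE[kind]."""
    if not collected_at:
        return 0.0
    now = now or datetime.now(timezone.utc)
    limit = MAX_AGE[kind]
    stale = sum(1 for ts in collected_at if now - ts > limit)
    return stale / len(collected_at)


# Fixed reference time so the example is reproducible.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
sample = [now - timedelta(days=d) for d in (5, 45, 400)]
print(stale_share(sample, "dynamic", now))  # 2 of 3 records exceed 30 days
print(stale_share(sample, "static", now))   # 1 of 3 records exceeds 365 days
```

The same three records look mostly stale as location data but mostly fresh as demographic data, which is why the acceptable age should depend on the type of data you are buying.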
What is the vendor’s privacy framework?
Does the data contain personally identifiable information (PII)? Did the users from whom the data was collected opt in to its collection? How is the vendor ensuring privacy compliance? These are only a few of the many questions on this important topic. Depending on the specific type of data you are looking for, make sure you clearly understand the vendor’s privacy framework so you can be confident that your partner stands by solid principles.
As I mentioned in a recent post, user privacy has moral and ethical implications, which should be key drivers for all players in the ecosystem. Yet user privacy has also clearly become a business imperative for brands, agencies, and publishers as they identify the data sets and data partners for their stacks. In today’s data-driven landscape, brand safety is no longer just about the environment in which ads run; it is also tied to the origin of the data used. For this reason, brands, agencies, and publishers must understand and screen their partners’ data collection practices to keep themselves in a safe position. Users, too, are asking, rightfully so, for practices that may go beyond existing regulations and grant them the transparency, control, and access to data that they deserve, along with accountability from data providers.
Can I evaluate a data sample?
Trying is believing. If you have the in-house resources, evaluating a data sample is a powerful way to get a better sense of what you should expect after you sign the contract.
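One simple way to evaluate a sample is to check the vendor’s segment against data you already trust, such as your own first-party registration data. The sketch below assumes such a reference set exists (it is hypothetical here) and reports two numbers: the match rate, meaning how much of the vendor’s segment you could verify at all, and accuracy in the sense used above, meaning how many verified users actually fit the segment (what a data scientist would call precision).

```python
from typing import Optional


def evaluate_sample(vendor_segment: set, reference: dict) -> tuple[float, Optional[float]]:
    """
    vendor_segment: user IDs the vendor places in a segment (e.g. Male 18-34).
    reference: user ID -> True/False, whether your own trusted data says the user
               actually fits that segment (a hypothetical first-party "truth" set).

    Returns (match_rate, accuracy):
      match_rate: share of the vendor's segment you could check against the reference
      accuracy:   share of checked users whose segment label was correct (None if none matched)
    """
    if not vendor_segment:
        return 0.0, None
    checked = [uid for uid in vendor_segment if uid in reference]
    match_rate = len(checked) / len(vendor_segment)
    accuracy = sum(reference[uid] for uid in checked) / len(checked) if checked else None
    return match_rate, accuracy


# Hypothetical sample and reference data, for illustration only.
segment = {"u1", "u2", "u3", "u4"}
truth = {"u1": True, "u2": False, "u3": True, "u9": True}
print(evaluate_sample(segment, truth))  # u4 can't be checked; u2 is mislabeled
```

Reporting the two numbers separately matters: a high accuracy figure computed on a small verified slice of the segment says less about the full data set than the same figure on a large one.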