Missing Data and AAP/EEO Compliance

On the face of it, missing data seems more of a mundane nuisance than a problem. This is particularly the case with applicants’ voluntary self-reporting of race and gender data. Based on our experience, however, missing data is one of the most overlooked threats to AAP compliance. Unfortunately, most contractors do not realize this until they are deep in the middle of an OFCCP audit. The goal of this article is to help federal contractors better understand missing data, with a discussion of:

What is Missing Data?
Consequences of Missing Data – Inaccurate AAP/EEO analyses.
Best Practices – How to manage this threat.

Section 1: What is Missing Data?

Typically, analysts consider missing data as simply blank information in a dataset. In practice, however, not all missing data is the same. In AAP/EEO, there are two forms of missing data:

Missing at Random (MAR)
Not Missing at Random (nMAR)

Missing At Random (MAR)

Research suggests that data is rarely missing at random. Although it is difficult to prove either way, analysts should not assume that missing data is MAR without at least a basic investigation.

In practice, MAR data is the least problematic in AAP/EEO analyses. Since data is randomly missing, there is no systemic pattern as to how data is missing and all groups (gender, race) are equally impacted. Consider the following example:

Missing Data	Analysis	Male	Female	Total
No Missing	Count (#)	70	30	100
	Pct (%)	70%	30%
MAR	Count (#)	35	15	50
	Pct (%)	70%	30%

When there is no missing data, we see that there are 100 total applicants with 70 Males (70%) and 30 Females (30%). When data is randomly missing, the impact is equal on Male and Female applicants. We see that the MAR data still consists of 70% Male and 30% Female applicants, only with half as many records.

As this example shows, the main consequence of MAR data is a smaller sample size. Analytically, a smaller sample size lowers statistical power. Defense-oriented practitioners may delight at the inability of an analysis or OFCCP to detect significance, but that is shortsighted and wrong. Section 2 will discuss this problem in more detail.
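The effect of MAR data on precision can be sketched with a few lines of Python. This is only an illustration using the counts from the table above; the function names are hypothetical, and the binomial standard error is used here as a simple stand-in for "statistical power" rather than as the method an auditor would apply.

```python
import math

def group_shares(male: int, female: int) -> tuple[float, float]:
    """Return (male share, female share) among identified applicants."""
    total = male + female
    return male / total, female / total

def share_standard_error(share: float, n: int) -> float:
    """Binomial standard error of a proportion estimated from n records."""
    return math.sqrt(share * (1 - share) / n)

full = group_shares(70, 30)   # no missing data: 100 applicants
mar = group_shares(35, 15)    # half of the records missing at random

se_full = share_standard_error(full[1], 100)
se_mar = share_standard_error(mar[1], 50)

print(f"Female share: {full[1]:.0%} (full) vs {mar[1]:.0%} (MAR)")
print(f"Standard error: {se_full:.3f} (full) vs {se_mar:.3f} (MAR)")
```

The group shares are identical in both cases, but the standard error grows by a factor of about 1.41 (the square root of 2) when half the records are lost, which is why real differences become harder to detect.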

Not Missing At Random (nMAR)

To varying degrees, there are systemic patterns in missing data in the real world. nMAR can cause bias and unpredictably distort analyses. For example, African American applicants may feel inclined to not voluntarily disclose their race because of concerns that it may somehow be linked to their application.

This is not surprising, especially when one considers the research on minority applicants who feel the need to whitewash their resumes. On the flip-side, White applicants may not voluntarily disclose their race given the heightened focus on diversifying the workforce. Depending on the job and the social climate, nMAR can take on different forms – the directionality of this systemic impact is unpredictable. Taken together, missing applicant information is most likely nMAR data to varying degrees.

Understandably, nMAR can have serious consequences in audit situations. The next section details three (3) threats to analytical validity posed by MAR and nMAR data.

Section 2: Inaccurate AAP/EEO Analyses

Researchers have long recognized the problem of missing data. It can artificially bias analyses, which results in misleading findings and conclusions. For contractors, these effects may unnecessarily draw OFCCP attention and scrutiny. When applicant data suffers from missing data, here are three (3) common problems a contractor may encounter:

Artificially significant analysis
Artificially non-significant analysis
One-to-One Applicants to Hires

Problem 1: Artificially Significant Analysis

Significant adverse impact, underutilization, and shortfalls are major pain points for contractors in an audit situation. In the following example, applicant data suffers from nMAR affecting Male applicants. When no data is missing, the selection rates for Male and Female applicants are the same (70%).

However, when there is a systemic pattern in missing data affecting male applicants, we are only able to identify 12 Male applicants who were not hired; the remaining 18 Male applicants are Unknown. As a consequence, in an audit, the observed selection rate for Males (85%) is significantly higher than Females (70%), (SD = 2.00).

Missing Data	Group	Hired: Yes	Hired: No	Statistical Std. Dev.
No Missing	Male	70	30	0.00
	Female	35	15
nMAR	Male	70	12	2.00
	Female	35	15

Note: As a matter of best practice, it is possible to identify demographic data for almost all unknown applicants who are hired.

Clearly, this finding of statistical significance is artificial and due entirely to the problem of nMAR data affecting Male applicants.
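The arithmetic behind this example can be sketched as follows. The counts come from the table above; the function and variable names are hypothetical, and the sketch only reproduces the selection rates, not the significance test.

```python
def selection_rate(hired: int, not_hired: int) -> float:
    """Share of identified applicants in a group who were hired."""
    return hired / (hired + not_hired)

# Counts from the Problem 1 table, as (hired, not hired)
no_missing = {"Male": (70, 30), "Female": (35, 15)}
nmar = {"Male": (70, 12), "Female": (35, 15)}  # 18 Male non-hires are Unknown

for label, table in (("No Missing", no_missing), ("nMAR", nmar)):
    rates = {group: selection_rate(*counts) for group, counts in table.items()}
    gap = rates["Male"] - rates["Female"]
    print(label, {g: f"{r:.1%}" for g, r in rates.items()}, f"gap={gap:.1%}")
```

Because only non-hired Male applicants drop out of the denominator, the observed Male rate rises from 70% to roughly 85% even though no hiring decision has changed.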

Problem 2: Artificially Non-Significant Analysis

On the other hand, nMAR can also produce findings that are artificially non-significant. This can result from two effects: 1) a change in the observed selection rate and/or 2) lower statistical power to detect significance. While defense-oriented practitioners may rejoice, this is, as suggested earlier, short-sighted. Consider the following example where nMAR is affecting Female applicants:

Missing Data	Group	Hired: Yes	Hired: No	Statistical Std. Dev.
No Missing	Male	70	30	4.60
	Female	15	35
nMAR	Male	70	30	1.95
	Female	15	15

Note: As a matter of best practice, it is possible to identify demographic data for almost all unknown applicants who are hired.

When no data is missing, the selection rate for Male applicants (70%) is significantly higher than Female applicants (30%), (SD = 4.60). However, when there is a systemic pattern in missing data affecting Female applicants, we are only able to identify 15 Female applicants who were not hired; the remaining 20 Female applicants are Unknown. As a consequence, the observed selection rate for Males (70%) is not significantly higher than Females (50%), (SD = 1.95).

If these results were obtained as part of an annual AAP, the contractor would likely disregard this particular Job Group. In an audit situation, however, suppose that additional data refinement and cleaning recovers one more Female applicant who was not hired. With the addition of only one Female applicant, the contractor realizes much too late that the Female selection rate is significantly lower than the Male selection rate. In fact, any data cleaning effort that recovers records would only worsen the situation: because the nMAR effect fell on Female non-hires, only the Female non-hire count would increase.

Missing Data	Group	Hired: Yes	Hired: No	Statistical Std. Dev.
nMAR	Male	70	30	2.13
	Female	15	16
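To see why each recovered record makes matters worse, the sketch below steps through the recovery of the 20 Unknown Female non-hires one at a time. It uses the counts from the Problem 2 tables; the constant names are hypothetical, and only the selection rates are computed, not the significance test.

```python
MALE_RATE = 70 / (70 + 30)        # Male selection rate, unaffected by nMAR
FEMALE_HIRED = 15
FEMALE_NON_HIRES_IDENTIFIED = 15  # non-hires known before any data cleaning
FEMALE_NON_HIRES_UNKNOWN = 20     # non-hires hidden by missing data

rates = []
for recovered in range(FEMALE_NON_HIRES_UNKNOWN + 1):
    identified = FEMALE_HIRED + FEMALE_NON_HIRES_IDENTIFIED + recovered
    rates.append(FEMALE_HIRED / identified)

for recovered in (0, 1, 10, 20):
    gap = MALE_RATE - rates[recovered]
    print(f"recovered={recovered:2d}  female rate={rates[recovered]:.1%}  gap={gap:.1%}")
```

The observed Female rate starts at 50% and falls with every recovered record until it reaches the true 30%, so the gap against the Male rate can only widen as the data improves.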

Problem 3: One-to-One Applicants to Hires

It is not uncommon for contractors to find that they have hires but no applicant data. In some instances, the applicant data was not captured, but in most instances, the applicant count was low and no one voluntarily self-identified their demographic information. In such instances, contractors can only “recover” applicant data records. In practice, this means that they can only recover demographic data for individuals that were hired. Here is a good example:

Missing Data	Group	Hired: Yes	Hired: No	Statistical Std. Dev.
No Missing	Male	2	0	1.98
	Female	0	5
nMAR	Male	2	0	0.00
	Female	0	0

Note: As a matter of best practice, it is possible to identify demographic data for almost all unknown applicants who are hired.

In this example, there were two Male hires and no Female hires. If 100% of applicant data is missing, it is possible to “recover” two (2) Male applicants, simply because they were hired and all their demographic data is available from the employee datafile. The five (5) Female applicants are unrecoverable because they were not hired and are not in the employee datafile. Consequently, the analysis shows what is referred to as “One-to-One” hires – every identified applicant was hired, so no comparison between groups is possible.
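The article does not state which test produces the "Statistical Std. Dev." column. As an assumption, the sketch below uses a two-sided Fisher exact test and converts its p-value to a normal-equivalent standard deviation; for this small table that approach gives about 1.98 for the complete data and 0.00 once only the hires can be recovered. The function names are hypothetical.

```python
from math import comb
from statistics import NormalDist

def fisher_two_sided_p(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more likely than the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def p_to_sd(p: float) -> float:
    """Convert a two-sided p-value to a normal-equivalent standard deviation."""
    return abs(NormalDist().inv_cdf(p / 2))

# Rows are (hired, not hired) for Male, then Female
complete = fisher_two_sided_p(2, 0, 0, 5)    # no missing data
one_to_one = fisher_two_sided_p(2, 0, 0, 0)  # only the two hires recovered

print(f"No Missing: p={complete:.4f}, SD={p_to_sd(complete):.2f}")
print(f"One-to-One: p={one_to_one:.4f}, SD={p_to_sd(one_to_one):.2f}")
```

With only hires in the data there is a single possible table, so the test has nothing to compare and the result is uninformative by construction, not evidence of equal treatment.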

In practice, OFCCP is fairly understanding of contractors’ struggles to capture quality applicant data. However, if there are excessive One-to-One hires, it will easily stand out and receive OFCCP scrutiny.

Section 3: Best Practices

Missing information in applicant data is to be expected because self-disclosure of race, gender, and other demographic data is voluntary. Consequently, contractors can only make an effort to ensure their applicant data is as complete as possible. Here are two best practices for contractors:

Require applicants to select “I choose not to disclose” if they do not wish to self-identify.
“Recover” demographic data from employee files among Hired individuals.

Requiring Applicants to Choose

Although it is not possible to force applicants to disclose their race and gender, it is possible to properly capture their refusal to self-identify. A best practice is to require applicants to choose a category of race or gender, or “I choose not to disclose”. This Applicant Tracking System (ATS) setup has two important benefits:

1. If the applicant refuses to select any of the available options (including “I choose not to disclose”), they are unable to move forward, and technically cannot complete the application, thus removing them from the selection process. This effectively eliminates missing data and properly captures individuals who refuse to self-identify as “I choose not to disclose”.

2. This method documents the contractor’s efforts to capture demographic data from applicants. With proper documentation, the risk of compliance violation related to missing data in an audit situation is significantly mitigated. Unfortunately, many contractors settle with conciliation agreements because they lack the documentation to properly explain the extraordinarily high number of applicants with missing demographic information.
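The required-choice setup described above could be enforced with a validation step along these lines. This is a minimal sketch and not the behavior of any particular ATS product; the option list, the function name, and the error message are all hypothetical.

```python
from typing import Optional

DECLINE = "I choose not to disclose"
GENDER_OPTIONS = ("Male", "Female", DECLINE)  # hypothetical option list

def validate_self_id(response: Optional[str], options=GENDER_OPTIONS) -> str:
    """Return the selected option, or raise if the field was left blank."""
    choice = (response or "").strip()
    if choice not in options:
        # A blank or unrecognized answer blocks the application from moving on
        raise ValueError(f'Select an option, or choose "{DECLINE}".')
    return choice

print(validate_self_id("Female"))
print(validate_self_id(DECLINE))
```

Storing the explicit decline value, instead of leaving the field blank, is what lets the contractor later distinguish a documented refusal from data that was never collected.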

Recover Applicant Information

In some instances, missing applicant information can be recovered. An applicant may choose not to self-identify their demographic information during the application stage, but once they become an employee, they are (for the most part) required to disclose their demographics (e.g., race and gender). If an applicant refuses to self-identify during the application process but is hired, it is acceptable to update the applicant data with information from their employee file (e.g., race and gender). This can have a significant impact on the overall results, especially if there is a large number of unknown applicants who are hired.
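A recovery step of this kind might look like the following sketch. The record layout, the field names, and the assumption that applicant and employee records share an applicant_id key are all hypothetical; real ATS and HRIS exports will differ.

```python
def recover_demographics(applicants, employee_file, fields=("gender", "race")):
    """Fill unknown demographics for hired applicants from the employee file."""
    updated = []
    for record in applicants:
        row = dict(record)  # copy so the original applicant data is preserved
        employee = employee_file.get(row["applicant_id"]) if row["hired"] else None
        if employee:
            for field in fields:
                if row.get(field) is None and employee.get(field) is not None:
                    row[field] = employee[field]
        updated.append(row)
    return updated

applicants = [
    {"applicant_id": 1, "hired": True, "gender": None},
    {"applicant_id": 2, "hired": False, "gender": None},
    {"applicant_id": 3, "hired": True, "gender": "Female"},
]
employee_file = {1: {"gender": "Male"}, 3: {"gender": "Female"}}

for row in recover_demographics(applicants, employee_file):
    print(row)
```

Only applicant 1 is updated: applicant 2 was not hired, so no employee record exists, which is the same limitation that produces One-to-One hires when most applicant data is missing.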

Summary and Conclusion

For most contractors, missing data is anything but innocuous. Hopefully, this article helps to heighten contractors’ awareness and understanding of missing data. To begin with, not all missing data is alike. Some is considered missing at random (MAR) and some is considered not missing at random (nMAR). In practice, missing data is generally nMAR, which can bias the data and produce unreliable analytical findings. When a large share of applicant data is missing, One-to-One hires are unavoidable, which can draw OFCCP scrutiny of data reliability. Ultimately, applicant demographic information is not compulsory and OFCCP understands that. Contractors simply need to document their process, so it is clear that appropriate efforts were made to allow applicants to self-identify. Researchers have struggled with missing data for many years, and there are no easy answers. Hopefully this article helps to shed light on an often overlooked, but serious, problem of which many contractors are unaware.

For more information on missing data and general AAP/EEO assistance, please feel free to contact Biddle Consulting Group, Inc. by emailing [email protected] or calling 800-999-0438.
