The Office of Federal Contract Compliance Programs (OFCCP) published multiple FAQs on July 23, 2019 to provide further guidance to federal contractors in their efforts to understand and comply with anti-discrimination regulations. 'Practical Significance in EEO Analysis' attempts to clarify when a statistically significant disparity may or may not be practically significant and, therefore, actionable. While the FAQs indicate that the content applies to employment outcomes in general, the primary focus appears to be on employment selections; the same principles, however, apply to other employment outcomes such as compensation disparities. While discussions of practical versus statistical significance are not new, the issue has become more important as companies gain access to increasingly large applicant pools from which to make hiring decisions. As pool size increases, so does the likelihood that small differences in selection rates (or test scores) will be statistically significant and, therefore, subject to legal challenge. The following article defines practical and statistical significance, explains the practical significance 'tests' described in the OFCCP's FAQs, and provides recommendations for employers.
The Uniform Guidelines on Employee Selection Procedures (UGESP, 1978) provide information on examining the adverse impact of employee selection procedures. The UGESP describes the 4/5ths or 80% rule, which stipulates that a finding that the selection rate for the disfavored group is less than 80% of the selection rate of the favored group is evidence of adverse impact. The 4/5ths rule provides a relatively easy method for employers to evaluate the adverse impact of their selection procedures. However, the UGESP acknowledges that the 4/5ths rule is not a statistical test. Since that time, the proliferation of technology and software has made it easy to calculate the statistical significance of differences in selection rates. With employers tracking and maintaining large applicant datasets over multiple years, and with the OFCCP's movement to request and analyze multiple years of data, the likelihood of finding at least one statistically significant indicator has increased.
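The 4/5ths rule is simple enough to compute directly. The following sketch uses hypothetical applicant counts to illustrate the calculation; the function name and figures are for demonstration only:

```python
def four_fifths_ratio(selected_a, total_a, selected_b, total_b):
    """Ratio of the lower selection rate to the higher selection rate."""
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Hypothetical example: 50 of 100 favored-group applicants selected (50%),
# 30 of 100 disfavored-group applicants selected (30%).
ratio = four_fifths_ratio(50, 100, 30, 100)
print(f"impact ratio = {ratio:.2f}")  # 0.30 / 0.50 = 0.60
print("adverse impact indicated" if ratio < 0.80 else "passes 4/5ths rule")
```

Because 0.60 falls below the 80% threshold, this hypothetical pattern would be evidence of adverse impact under the UGESP.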
Statistical significance indicates that some measured outcome (e.g., a difference in selection rates between males and females) is unlikely to have occurred by chance. If the difference in selection rates between males and females, for example, is too large to attribute to chance, the difference is statistically significant and an indication that the difference can be attributed to gender or something related to gender. In terms of tests or assessments, either the difference in test scores or the difference in pass rates is too large to attribute to chance. Common thresholds for statistical significance include a two (or three) standard deviation difference or a p-value (i.e., probability value) less than .05. While a statistically significant difference seems like a reasonable indicator for further investigation of a selection procedure, statistical significance depends not only on the size of the difference but also on the size of the group under consideration. That is, smaller groups require larger differences to trigger the 2 s.d. or p < .05 threshold, and larger groups require smaller differences to yield a statistically significant result. For example, for pools of 1,000 applicants, selection rate differences of as little as 7.9% will be statistically significant, and for pools of 5,000 applicants, differences of as little as 2.1% will be statistically significant.1
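The sample-size effect can be seen with a standard two-proportion z-test (pooled standard error), one common way to express a disparity in standard deviations. The counts below are hypothetical and chosen only to show that the same 5-point rate difference is not significant in a small pool but is in a large one:

```python
from math import sqrt

def z_statistic(sel_a, n_a, sel_b, n_b):
    """Difference between two selection rates, in standard deviations."""
    p_a, p_b = sel_a / n_a, sel_b / n_b
    p_pool = (sel_a + sel_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Same 50% vs. 45% selection rates, two very different pool sizes:
print(f"n=100 per group:  z = {z_statistic(50, 100, 45, 100):.2f}")      # ≈ 0.71, under 2 s.d.
print(f"n=2500 per group: z = {z_statistic(1250, 2500, 1125, 2500):.2f}")  # ≈ 3.54, over 2 s.d.
```

The rate difference is identical in both cases; only the pool size changes the statistical conclusion.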
Practical significance examines the meaningfulness of an observed difference. If the difference in selection rates is 5%, is that meaningful? If the difference between the expected number who passed a test and the actual number who passed (i.e., the shortfall) is 10 out of 10,000 people who passed the test, is that meaningful? The OFCCP's FAQs specifically identify several practical significance measures, all apparently based on legal precedent, the UGESP, or publications on the topic. In particular, the FAQs mention the impact ratio, the odds ratio, the flip-flop rule, the Apsley v. Boeing ratio, and Cohen's h as example measures.
The impact ratio refers to the disfavored group's selection rate divided by the favored group's rate, with 80% as the standard for evaluation. For example, if 90% of Whites pass a test and 85% of Hispanics pass, the impact ratio is 94.4% (i.e., 85% divided by 90%). Even if the difference in selection rates exceeds 2 standard deviations, an impact ratio greater than 80% suggests that the difference is not practically significant. While the 4/5ths rule examines the ratio of selection rates, other practical significance measures, which the OFCCP referenced in its FAQs, evaluate these same rates using alternative methods. Cohen's h, for example, calculates the difference between the selection rates for the favored and disfavored groups after performing a statistical (arcsine) transformation, which allows for an evaluation of a standardized difference. Cohen's h values greater than .80 are considered large.2 Going a step further, the odds ratio resembles the impact ratio but takes into account both the selection and rejection rates of the groups in question.
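These three ratio-based measures can be computed directly. Using the 90%/85% pass-rate example above, a minimal sketch (Cohen's h uses the arcsine transformation from Cohen, 1988):

```python
from math import asin, sqrt

def impact_ratio(p_disfavored, p_favored):
    """4/5ths-style ratio of selection rates."""
    return p_disfavored / p_favored

def cohens_h(p1, p2):
    """Standardized difference between two rates via arcsine transformation."""
    return abs(2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2)))

def odds_ratio(p1, p2):
    """Odds of selection vs. rejection for one group relative to the other."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

# Example from the text: 90% of Whites vs. 85% of Hispanics pass.
print(f"impact ratio = {impact_ratio(0.85, 0.90):.3f}")  # ≈ 0.944, above 0.80
print(f"Cohen's h    = {cohens_h(0.90, 0.85):.3f}")      # ≈ 0.152, well below 0.80
print(f"odds ratio   = {odds_ratio(0.90, 0.85):.2f}")    # ≈ 1.59
```

All three measures agree here: the 90% vs. 85% disparity is small in practical terms even if a large enough pool would make it statistically significant.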
According to the UGESP, the flip-flop rule examines the result of selecting one additional disfavored group member (and rejecting one additional favored group member). For the example above, suppose the selection of one additional Hispanic applicant and one fewer White applicant results in the selection of 90% of Hispanics and 85% of Whites. Hispanics are now favored, and the flip-flop rule suggests that the original difference was not practically significant. While the UGESP specifically describes the flip-flop rule in the context of the favored group changing, courts have expanded on this concept by considering the actual number of disfavored group members it would take for the difference to fall below the threshold for statistical significance. Courts have ruled that significant disparities were not practically significant when the selection of a small number (which varies by decision) of disfavored group members results in a difference that is not statistically significant.3
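The flip-flop check above can be sketched with hypothetical counts consistent with the text's example: 20 applicants per group, with 18 Whites (90%) and 17 Hispanics (85%) selected. Shifting a single selection flips which group is favored:

```python
def selection_rates(sel_w, n_w, sel_h, n_h):
    """Return (White rate, Hispanic rate) for the given counts."""
    return sel_w / n_w, sel_h / n_h

# Hypothetical counts: 18 of 20 Whites, 17 of 20 Hispanics selected.
before = selection_rates(18, 20, 17, 20)
# Flip-flop: hypothetically select one more Hispanic and one fewer White.
after = selection_rates(17, 20, 18, 20)

print(f"before: Whites {before[0]:.0%}, Hispanics {before[1]:.0%}")
print(f"after:  Whites {after[0]:.0%}, Hispanics {after[1]:.0%}")
if after[1] > after[0]:
    print("Favored group flips -> original disparity not practically significant")
```

Because a one-person change reverses which group is favored, the flip-flop rule would treat the original 90%/85% disparity as not practically significant.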
The Apsley v. Boeing ratio refers to a specific practical significance measure from a specific court ruling. This matter involved a claim of age discrimination in the selection of employees for rehire by Spirit AeroSystems. Spirit acquired several Boeing facilities, laid off all Boeing workers, and then selected among the laid-off workers to fill the positions at the acquired facilities. While the shortfall (i.e., the difference between the actual number of older workers selected and the expected number of selections given the composition of the selection pool) was statistically significant, the overall selection rate of older workers was extremely high at 99%. That is, while the employer selected significantly fewer protected class members (i.e., older workers in this case) than expected in a neutral selection process, the court considered the high overall selection rate, 99% of the expected number of protected class members, and ruled that the disparity was not practically significant.
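An Apsley-style ratio compares the actual number of protected-class selections with the number expected under a neutral process. The figures below are hypothetical, chosen only to illustrate a ratio near 99%; they are not the actual counts from the case:

```python
def apsley_ratio(actual_selected, expected_selected):
    """Actual protected-class selections as a share of the expected number."""
    return actual_selected / expected_selected

# Hypothetical figures: 4,715 older workers expected, 4,669 actually selected.
expected, actual = 4715, 4669
shortfall = expected - actual
ratio = apsley_ratio(actual, expected)
print(f"shortfall = {shortfall}, selection ratio = {ratio:.1%}")  # ≈ 99.0%
```

A shortfall of this size can easily be statistically significant in a pool of thousands, yet the near-99% ratio is the kind of figure the Apsley court treated as lacking practical significance.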
Given the OFCCP's recent FAQs, recommendations from the social sciences, and court decisions, all recognizing the importance of measuring both statistical and practical significance, employers should do just that. While some of the practical measures may be unfamiliar to employers, employers should consider the measures that are easiest to compute and interpret, namely the 4/5ths rule, the actual shortfall, and the number of additional selections of the disfavored group required for the difference to fall below thresholds for statistical significance. When both practical and statistical measures of adverse impact suggest meaningful and statistically significant differences, the OFCCP, other enforcement agencies, or plaintiffs' attorneys have a much stronger case for alleging employment discrimination. Where practical and statistical measures disagree, the employer should look for any anecdotal evidence of discrimination and continue to closely monitor the employment practice in question.
1. Jacobs, R., Murphy, K., & Silva, J. (2012). Unintended Consequences of EEO Enforcement Policies: Being Big is Worse than Being Bad. Journal of Business and Psychology. Advance online publication.
2. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
3. U.S. v. Commonwealth of Virginia (1978) 620 F.2d 1018; Waisome v. Port Authority of New York & New Jersey (1991) 948 F.2d 1370.