Risks (Bias and Bad Data)

Skewed Data Collection

Past Policies Influence Data: Data is rarely collected in a vacuum; it reflects the policies in place when it was gathered. For example, if a city implements a policy to reduce traffic congestion, traffic-flow data collected afterward may show improvement. However, that data cannot show what the baseline looked like before the policy, and it may miss unintended consequences such as increased pollution or traffic displaced to other areas. When a policy shapes the data and that data then informs the next policy, the resulting feedback loop can carry the original policy's assumptions forward and bias later decisions.

Example: In networking decisions, Bartulovic et al. (2017) point out that trace-driven evaluation, in which data logged under an existing decision policy is used to evaluate or design a new one, can be biased. Because the policy in effect during data collection determined which decisions were made, and therefore which outcomes were observed, the data does not fully capture reality, especially for underserved or underrepresented populations.
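To make this bias concrete, here is a minimal Python simulation under entirely hypothetical assumptions: sessions fall into a "good" or "poor" network context, two options A and B have identical true success rates within each context, and the historical logging policy chose B mostly for good-network sessions. The function names (`simulate_logged_data`, `naive_success_rate`, `success_rate_by_context`) and all probabilities are illustrative inventions, not taken from Bartulovic et al. (2017).

```python
import random

# Hypothetical true success probabilities per context. They are the SAME
# for options A and B, so neither option is actually better.
TRUE_SUCCESS = {"good": 0.9, "poor": 0.5}

# Hypothetical historical policy: probability of choosing B in each context.
P_CHOOSE_B = {"good": 0.8, "poor": 0.1}


def simulate_logged_data(n, seed=0):
    """Return (context, action, success) tuples logged under the old policy."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = "good" if rng.random() < 0.7 else "poor"  # 70% good sessions
        action = "B" if rng.random() < P_CHOOSE_B[context] else "A"
        success = rng.random() < TRUE_SUCCESS[context]  # outcome ignores action
        logs.append((context, action, success))
    return logs


def naive_success_rate(logs, action):
    """Average outcome over sessions where `action` was logged (ignores context)."""
    outcomes = [s for _, a, s in logs if a == action]
    return sum(outcomes) / len(outcomes)


def success_rate_by_context(logs, action, context):
    """Average outcome for `action` restricted to a single context."""
    outcomes = [s for c, a, s in logs if a == action and c == context]
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    logs = simulate_logged_data(100_000)
    print(f"naive A: {naive_success_rate(logs, 'A'):.3f}")
    print(f"naive B: {naive_success_rate(logs, 'B'):.3f}")
    for ctx in ("good", "poor"):
        a = success_rate_by_context(logs, "A", ctx)
        b = success_rate_by_context(logs, "B", ctx)
        print(f"{ctx}: A={a:.3f} B={b:.3f}")
```

With these made-up parameters, the expected naive rates are about 0.88 for B and 0.64 for A, so an evaluation driven by the logged traces would favor B even though the two options perform the same within each context. The entire gap comes from which sessions the old policy happened to route to B.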

Underrepresentation of Subpopulations

Insufficient Data for Certain Groups: Data-driven approaches are limited by the availability and coverage of their data. If certain populations or regions are underrepresented, policies derived from that data may not address their needs. For instance, data collection often concentrates on densely populated urban areas, leaving rural areas underrepresented in the data behind transportation or healthcare policies. The resulting policies can serve the majority while neglecting minority or marginalized communities.

Example: Bartulovic et al. (2017) also discuss how data-driven frameworks can overlook specific subpopulations for which little data is available. Policies built on such frameworks may not reflect these groups' needs, leading to inequitable outcomes when the policies are implemented.
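One way to surface this kind of underrepresentation before building policy on a dataset is to compare each group's share of the sample with its share of a reference population. The sketch below is a minimal Python illustration that assumes a hypothetical `coverage_report` helper, made-up population shares, and arbitrary thresholds (`min_count`, `min_ratio`); appropriate thresholds would depend on the analysis at hand.

```python
from collections import Counter


def coverage_report(sample_groups, population_shares, min_count=30, min_ratio=0.5):
    """Compare each group's sample share with its (hypothetical) population share.

    A group is flagged if it has fewer than `min_count` records, or if its
    sample share is less than `min_ratio` times its population share.
    """
    counts = Counter(sample_groups)
    total = len(sample_groups)
    report = {}
    for group, pop_share in population_shares.items():
        n = counts.get(group, 0)
        sample_share = n / total if total else 0.0
        # Representation ratio: 1.0 means the group appears in proportion.
        ratio = sample_share / pop_share if pop_share > 0 else float("inf")
        report[group] = {
            "count": n,
            "sample_share": sample_share,
            "population_share": pop_share,
            "ratio": ratio,
            "flagged": n < min_count or ratio < min_ratio,
        }
    return report


if __name__ == "__main__":
    # Hypothetical dataset dominated by urban records.
    sample = ["urban"] * 700 + ["suburban"] * 250 + ["rural"] * 50
    shares = {"urban": 0.60, "suburban": 0.25, "rural": 0.15}
    for group, row in coverage_report(sample, shares).items():
        print(f"{group}: ratio={row['ratio']:.2f} flagged={row['flagged']}")
```

In this made-up example, rural records are 5% of the sample against 15% of the population, a representation ratio of about 0.33, so only `rural` is flagged. A flag does not fix the gap; it only marks where more data collection or group-specific analysis is needed before the data informs policy.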

See more:

Challenges and Bias in Data-Driven Policy