Harnessing Data for Unbiased Learning of Machines
In Machine Learning, if the input data reflects the stereotypes and biases of the broader society, then the output of the learning algorithm also captures these stereotypes.
Here, we will discuss gender stereotypes in word embeddings.
While word embeddings encode semantic information, they also exhibit hidden biases inherent in the datasets they are trained on, producing associations such as:
father:doctor :: mother:nurse
man:computer programmer :: woman:homemaker
The prejudices and stereotypes in these embeddings reflect biases implicit in the data on which they were trained. The embedding of a word is typically optimized to predict co-occurring words in the corpus.
The approach has two parts: identifying stereotypes and bias removal.
Identifying Stereotypes
Stereotypes are identified in two ways:
- Quantify how words, such as those corresponding to professions, are distributed along the direction between the embeddings of “he” and “she”.
- Generate analogy pairs from the embedding given two seed words, then use crowd-workers to judge whether these analogies reflect stereotypes.
A simple approach to explore how gender stereotypes manifest in an embedding is to quantify which words are closer to “he” versus “she” in the embedding space.
a. Make a list of common profession names.
b. Remove names that are associated with one gender by definition (e.g. waitress, waiter).
c. For each name v in the list, compute its projection onto the gender axis, proj(v) = v · (v(he) - v(she)) / ||v(he) - v(she)|| (see the sketch below).
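As a rough illustration of step (c), here is a minimal NumPy sketch; the function name, the toy vocabulary, and the random vectors are placeholders for a real trained embedding such as word2vec or GloVe.

```python
import numpy as np

def gender_projection(words, vectors):
    """Project each word onto the normalized he-she axis.

    `vectors` is a word -> embedding lookup; positive scores lie toward "he",
    negative scores toward "she", and values near zero are roughly neutral.
    """
    d = vectors["he"] - vectors["she"]
    d = d / np.linalg.norm(d)
    return {w: float(vectors[w] @ d) for w in words}

# Toy usage with random vectors; in practice, load a trained embedding.
rng = np.random.default_rng(0)
toy = {w: rng.normal(size=50) for w in ["he", "she", "doctor", "nurse", "engineer"]}
for w, p in sorted(gender_projection(["doctor", "nurse", "engineer"], toy).items(),
                   key=lambda kv: kv[1]):
    print(f"{p:+.3f}  {w}")
```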
Several professions lie substantially closer to the “he” vector and others closer to the “she” vector, and this pattern is consistent across embeddings, suggesting that the embeddings encode gender stereotypes.
The profession words closest to “he”, closest to “she”, and in between the two are colored red in the plot.
Now, let’s automate the process:
- Generate analogous word pairs from the embedding using “he” and “she” as seed words, and evaluate the degree of stereotype of each pair via crowd-sourcing.
- The desired analogy (he:she :: w1:w2) has the following properties:
> the direction of (w1 - w2) has to align with the he - she direction.
> w1 and w2 should be semantically similar.
- Based on this, given a word embedding E, a score for an analogous pair can be obtained with the following formulation:
S(w1, w2) = cos(d, w1 - w2) if ||w1 - w2|| <= δ, and 0 otherwise,
where d is the gender direction as calculated above and δ is a threshold for semantic similarity. Every generated pair is judged by crowd-workers, and the fraction who rate it as stereotypical quantifies the degree of bias of that analogy (a sketch of the scoring rule follows).
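A minimal NumPy sketch of that scoring rule, assuming the cosine-with-threshold form above; the function name and the delta default are illustrative.

```python
import numpy as np

def score_pair(w1, w2, d, delta=1.0):
    """Score a candidate analogy pair (he:she :: w1:w2).

    w1, w2 are word vectors and d is the gender direction v(he) - v(she).
    The score is cos(d, w1 - w2) when the pair is semantically close
    (||w1 - w2|| <= delta), and 0 otherwise.
    """
    diff = w1 - w2
    dist = np.linalg.norm(diff)
    if dist == 0 or dist > delta:
        return 0.0
    return float(diff @ d / (dist * np.linalg.norm(d)))

# Candidate pairs with the highest scores are then sent to crowd-workers.
```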
Reducing Stereotypes
To reduce these stereotypes while preserving the desirable geometry of the embedding:
Inputs:
- A word embedding stored in a matrix E ∈ R^(n×r), where n is the number of words and r is the dimension of the latent space.
- A matrix B ∈ R^(n_b×r), where each row is a vector representing a direction of stereotype. Here, B = v(he) - v(she), but in general B can contain multiple stereotype directions, covering gender, race, etc.
- A matrix P ∈ R^(n_p×r) whose rows correspond to the set of seed words that we want to de-bias.
- A matrix A whose rows are a background set of word vectors from E. We want the algorithm to preserve their pairwise distances.
The goal is to learn a transformation matrix T ∈ R^(r×r) with the following properties:
- The transformed embedding is free of the stereotypes: every row of PT should be perpendicular to every row of BT.
- The transformed embedding preserves the distances between any two vectors in the matrix A.
We can capture these two objectives as the following semidefinite program:
min_X ||A X A^T - A A^T||_F + λ ||P X B^T||_F, subject to X being positive semidefinite,
where X = TT^T and ||·||_F is the Frobenius norm.
*The first term ensures that the pairwise distances are preserved, and the second term induces the biases to be small on the seed words. The user-specified parameter λ balances the two terms.
After solving for X, T is recovered via SVD: writing X = U Σ U^T, we can take T = U Σ^(1/2), so that TT^T = X.
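As a concrete illustration, here is a minimal sketch of this optimization using NumPy and cvxpy; the function name, the default λ, and the eigendecomposition-based recovery of T are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
import cvxpy as cp

def debias_transform(A, P, B, lam=0.5):
    """Solve the semidefinite program above and recover T.

    A: (n_a, r) background word vectors whose pairwise geometry we preserve.
    P: (n_p, r) seed word vectors to de-bias.
    B: (n_b, r) stereotype direction(s), e.g. v(he) - v(she).
    Returns T of shape (r, r) with T @ T.T equal to the optimal X.
    """
    r = A.shape[1]
    X = cp.Variable((r, r), PSD=True)                  # X = T T^T, constrained PSD
    preserve = cp.norm(A @ X @ A.T - A @ A.T, "fro")   # keep pairwise inner products
    debias = cp.norm(P @ X @ B.T, "fro")               # shrink bias on the seed words
    cp.Problem(cp.Minimize(preserve + lam * debias)).solve()

    # Recover T from X = U diag(s) U^T as T = U diag(sqrt(s)).
    s, U = np.linalg.eigh(X.value)
    return U @ np.diag(np.sqrt(np.clip(s, 0, None)))
```

Under the row convention used above, the de-biased embedding is then obtained as E T.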
To validate the de-biasing algorithm, collect words that are likely to reflect gender stereotypes (e.g. manager, nurse). Use some of them for training, as the rows of the P matrix; the remaining words are held out for testing.
*The blue circles are the 88 gender-stereotype words which form our held-out test set.
*The green crosses are a random sample of background words that were not suggested to have stereotype.
*Most of the stereotype words lie close to the y = 0 line, consistent with them lying near the midpoint between “he” and “she” after de-biasing. In contrast, the background points were substantially less affected by the de-biasing transformation.
Validation
Use variances to quantify this result. For each test word (either gender-stereotypical or background), project it onto the he - she direction.
Then compute the variance of the projections in the original embedding and after the de-biasing transformation. For the gender-stereotype test words, the variance in the original embedding is 0.02 and the variance after the transformation is 0.001.
For the background words, the variance before and after the transformation was 0.005 and 0.0055 respectively.
This demonstrates that the transformation substantially reduced the gender stereotypes while leaving the background words largely unchanged.
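A small sketch of this check, assuming the same row-vector convention as above; the helper name and the `original`/`debiased` lookups are illustrative.

```python
import numpy as np

def projection_variance(words, vectors, v_he, v_she):
    """Variance of the words' projections onto the normalized he-she direction."""
    d = v_he - v_she
    d = d / np.linalg.norm(d)
    return float(np.var([vectors[w] @ d for w in words]))

# Hypothetical usage: `original` and `debiased` map word -> vector, where
# debiased[w] = original[w] @ T for the learned transformation T.
#   projection_variance(test_words, original, v_he, v_she)           # before de-biasing
#   projection_variance(test_words, debiased, v_he @ T, v_she @ T)   # after de-biasing
```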
Verification
Test the transformed embedding on several standard benchmarks that measure whether related words have similar embeddings, as well as how well the embedding performs on analogy tasks.
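For instance, a word-similarity benchmark can be scored as the Spearman correlation between human similarity ratings and cosine similarities in the embedding; a minimal sketch (the `pairs` format and function name are assumptions) is shown below.

```python
import numpy as np
from scipy.stats import spearmanr

def similarity_benchmark(pairs, vectors):
    """Spearman correlation between human ratings and embedding cosine similarity.

    `pairs` is a list of (word1, word2, human_score) tuples, e.g. from WordSim-353;
    `vectors` is a word -> embedding lookup.
    """
    human, model = [], []
    for w1, w2, score in pairs:
        v1, v2 = vectors[w1], vectors[w2]
        human.append(score)
        model.append(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    return spearmanr(human, model).correlation

# Comparable scores before and after the transformation indicate that the
# de-biasing preserved the embedding's overall semantic quality.
```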
Vivek Gupta: https://www.linkedin.com/in/vivekg-/
Follow me on Quora: https://www.quora.com/profile/Vivek-Gupta-1493
Check out my legal space here: https://easylaw.quora.com