Tuesday, October 29, 2024

LLMs are biased in two ways: 1. the datasets they are trained on, 2. the unconscious biases of their creators

we knew that the choice of datasets (e.g. sources such as wikipedia, which is highly biased, and intentionally so via paid 'editors') was already making LLMs extremely biased against all but woke western perspectives. indian knowledge systems, already marginalized, will be rendered completely immaterial by this process.

however, here is proof that the unconscious (and conscious) biases of the developers of a neural network are reflected in the 'weights' that the self-learning network assigns to the various factors it considers.

India cannot simply pick up western LLMs and use them. it needs to build and train its own on specialized datasets. or, even where the western LLMs are 'open-source' (not really), it needs to fine-tune them on narrow topics of interest to indians: for example, real-time machine translation between indian languages.
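one concrete, easily verified way that western-built LLMs structurally disadvantage indian languages (a sketch I am adding as illustration, not something the linked paper claims): most of these models use byte-level tokenizers that start from UTF-8 bytes, and Devanagari characters take 3 bytes each in UTF-8 versus 1 byte for english letters, so indic text begins with roughly a 3x token penalty before any vocabulary merges (which are themselves learned mostly from english data) are applied:

```python
# Illustration: UTF-8 byte cost per character, english vs hindi.
# Byte-level tokenizers pay this cost up front; merges learned from
# mostly-english corpora rarely recover it for Devanagari text.

def utf8_bytes_per_char(text: str) -> float:
    """Average UTF-8 bytes per character for a piece of text."""
    return len(text.encode("utf-8")) / len(text)

english = "hello world"
hindi = "नमस्ते दुनिया"   # "hello world" in hindi (Devanagari script)

print(utf8_bytes_per_char(english))  # 1.0 (ASCII is 1 byte per char)
print(utf8_bytes_per_char(hindi))    # ~2.85 (space is 1 byte, Devanagari chars are 3)
```

this is one reason a tokenizer and vocabulary built for indian scripts, not just fine-tuning, matters for any india-focused model.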

https://arxiv.org/pdf/2410.18417


