Statistics reveal a significant challenge in AI language models, particularly in non-English contexts. Evaluations show that AI systems, primarily trained on English data, struggle with local languages and cultural nuances. For instance, an assessment of ChatGPT’s responses to health-care queries indicated far worse performance in Chinese and Hindi compared to English and Spanish. This issue extends to social media content moderation, where AI fails to detect gender-based violence in diverse regions like India, South Africa, and Brazil. The problem is compounded as these poorly moderated AI models are used to moderate other content, leading to a cycle of errors. In response, there is a growing interest in developing community-driven AI solutions tailored to specific languages and cultural contexts. These include small language models and specialized data sets aimed at improving recognition of local slang, mixed language usage, and reclaimed language.
Source: www.technologyreview.com















