In the United States, only 0.02% of the population, or about 100,000 out of 334.9 million people, are diagnosed with sickle cell disease. This creates a significant class imbalance in datasets used for medical classification problems. When predicting diseases like sickle cell, where abnormal hemoglobin levels (6-11 g/dL) are a strong indicator, models struggle to identify meaningful patterns due to the rarity of the condition. To address this, the Synthetic Minority Oversampling Technique (SMOTE) is employed. SMOTE generates synthetic samples of the minority class to balance the dataset, allowing machine learning models to better learn from and predict rare events.
Source: towardsdatascience.com
