R-Squared of 0.27: The Journey to Deploying Data Science Models

The role of the data scientist is evolving and now demands skills beyond traditional data analysis. A recent study shows that data scientists must now take part in model deployment, a critical step in the data science lifecycle. As an example, a simple linear regression model was built to predict data professionals’ salaries from features such as experience and job title. It achieved an R-squared of only 0.27, meaning it explains about 27% of the variance in salary, but the weak fit does not matter here: the model serves only to illustrate the deployment process. Categorical features were first converted to numerical ones using encoding techniques, and the model was then exposed as a REST API built with FastAPI, so that predictions can be requested through simple HTTP requests. Running the service on multiple systems can still be difficult, because tech stacks and dependencies vary from machine to machine. Docker addresses this by packaging the application and its dependencies into a container that can be distributed and run across different environments, so teams can run the model on their own machines without compatibility issues.
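
The encoding and API steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the original article: the job titles, the coefficients, the `/predict` route, and the function names are all hypothetical placeholders, and the use of one-hot encoding and query parameters is a guess at what "encoding techniques" and "simple HTTP requests" might look like in practice.

```python
# Hypothetical categories; the article's actual job titles are not listed here.
JOB_TITLES = ["data_analyst", "data_engineer", "data_scientist"]

# Placeholder linear-regression parameters for illustration only.
# These are NOT fitted values from the article's model.
INTERCEPT = 50_000.0
# Order: years_experience, then one weight per entry in JOB_TITLES.
COEFFICIENTS = [2_000.0, 0.0, 10_000.0, 15_000.0]


def one_hot(job_title: str) -> list[float]:
    """Turn a categorical job title into a one-hot numeric vector."""
    if job_title not in JOB_TITLES:
        raise ValueError(f"unknown job title: {job_title!r}")
    return [1.0 if job_title == title else 0.0 for title in JOB_TITLES]


def predict_salary(years_experience: float, job_title: str) -> float:
    """Apply the linear model: intercept + dot(coefficients, features)."""
    features = [float(years_experience)] + one_hot(job_title)
    return INTERCEPT + sum(c * x for c, x in zip(COEFFICIENTS, features))


def create_app():
    """Build the FastAPI app that serves predictions over HTTP."""
    # Imported here so the functions above remain usable (and testable)
    # on machines where the web stack is not installed.
    from fastapi import FastAPI, HTTPException

    app = FastAPI(title="Salary model (sketch)")

    @app.get("/predict")
    def predict(years_experience: float, job_title: str):
        # Query parameters are parsed and type-checked by FastAPI.
        try:
            salary = predict_salary(years_experience, job_title)
        except ValueError as exc:
            raise HTTPException(status_code=422, detail=str(exc))
        return {"predicted_salary": salary}

    return app
```

Assuming the file is saved as `main.py`, the service could be started with `uvicorn main:create_app --factory` and queried with a request such as `GET /predict?years_experience=3&job_title=data_engineer`. In a real deployment the placeholder coefficients would be replaced by a model trained and saved beforehand.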

Source: towardsdatascience.com