Demand forecasting in retail involves several steps, from data extraction to model deployment. This guide uses the Darts library in Python to simplify the process. The dataset comes from the Stallion Kaggle competition and covers monthly sales of beer products. Key steps include:
- Data Preprocessing: Handling zero-price transactions and missing values, particularly in the discount column.
- Product Segmentation: Forecasting only high-rotation products to keep costs manageable. A product counts as high-rotation if it records sales in 75% of the year.
- Feature Extraction: Including time-based features like day of the week, month, and moving averages of past sales.
- Model Selection: A baseline model, NaiveMovingAverage, was compared with more advanced models, the Temporal Fusion Transformer (TFT) and the Time-series Dense Encoder (TiDE). TiDE improved on the baseline's Mean Absolute Error (MAE) by 6.11% and also came out ahead when the monetary costs of stockouts and overstocking were taken into account.
- Evaluation: Beyond typical metrics, incorporating domain-specific KPIs like cost savings from avoiding stockouts or overstocking is crucial.
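
The segmentation and evaluation steps above can be illustrated with a minimal plain-Python sketch. It is not the tutorial's code and does not use Darts: the function names, the sample data, and the per-unit stockout and overstock costs are all hypothetical, and the 75% threshold is read here as "non-zero sales in at least 75% of the observed months".

```python
from typing import Dict, List, Sequence


def high_rotation_products(
    monthly_sales: Dict[str, Sequence[float]], min_share: float = 0.75
) -> List[str]:
    """Return products with non-zero sales in at least `min_share` of the months."""
    selected = []
    for product, sales in monthly_sales.items():
        if not sales:
            continue  # no history at all, nothing to forecast
        active_share = sum(1 for units in sales if units > 0) / len(sales)
        if active_share >= min_share:
            selected.append(product)
    return selected


def mean_absolute_error(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Average absolute difference between actual and forecast values."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equally long")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def inventory_cost(
    actual: Sequence[float],
    forecast: Sequence[float],
    stockout_cost_per_unit: float,   # hypothetical cost of one unit of unmet demand
    overstock_cost_per_unit: float,  # hypothetical cost of one unsold unit
) -> float:
    """Monetary cost of forecast errors, penalising shortfalls and surpluses separately."""
    total = 0.0
    for a, f in zip(actual, forecast):
        shortfall = max(a - f, 0.0)  # demand exceeded the forecast: stockout
        surplus = max(f - a, 0.0)    # forecast exceeded demand: overstock
        total += shortfall * stockout_cost_per_unit + surplus * overstock_cost_per_unit
    return total


def relative_improvement(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` lowers an error or cost relative to `baseline`."""
    return (baseline - candidate) / baseline * 100.0


# Hypothetical monthly unit sales for two products over one year.
demo_sales = {
    "sku_a": [10, 12, 0, 9, 11, 10, 8, 0, 12, 13, 9, 10],
    "sku_b": [0, 0, 5, 0, 0, 0, 3, 0, 0, 0, 0, 1],
}
print(high_rotation_products(demo_sales))
```

Reporting `relative_improvement` on both the MAE and the inventory cost makes it visible when a model that wins on one metric loses on the other, which can happen when stockouts and overstock are priced differently.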
The tutorial emphasizes the importance of backtesting and the potential of using cloud services for model deployment and monitoring. It also highlights the need for ongoing model performance checks due to possible data or concept drift.
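
To make the idea of backtesting concrete, here is a minimal sketch of an expanding-window, one-step-ahead backtest with a moving-average baseline. It is written in plain Python for illustration only; the function names and the sample series are hypothetical, and Darts offers its own backtesting utilities that the tutorial would rely on instead.

```python
from typing import List, Sequence


def naive_moving_average(history: Sequence[float], window: int) -> float:
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(history) < window:
        raise ValueError("window must be positive and no longer than the history")
    recent = history[-window:]
    return sum(recent) / window


def backtest_one_step(series: Sequence[float], window: int, start: int) -> List[float]:
    """Forecast series[t] for every t >= start using only the data before t."""
    if start < window:
        raise ValueError("start must leave at least `window` points of history")
    return [naive_moving_average(series[:t], window) for t in range(start, len(series))]


# Hypothetical monthly sales series.
demo_series = [10, 12, 14, 16, 18, 20]
demo_start = 3
demo_forecasts = backtest_one_step(demo_series, window=2, start=demo_start)
demo_actuals = demo_series[demo_start:]

# Error over the backtest period, comparable across candidate models.
demo_mae = sum(abs(a - f) for a, f in zip(demo_actuals, demo_forecasts)) / len(demo_actuals)
print(demo_forecasts, demo_mae)
```

Because each forecast only sees data that precedes it, the resulting error approximates how the model would have performed had it been in production over that period. Re-running the same loop on fresh data is one simple way to watch for the drift mentioned above.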
Source: towardsdatascience.com
