The solution to One of Kaggle's Datasets: https://www.kaggle.com/datasets/yasserh/walmart-dataset
An interactive Jupyter Cell for finding Weekly Sales based on input!
Accuracy: 92.677%
- Data Preprocessing:
- Made numpy array of features from Holiday_flag to Unemployment. Shape: (m, 5)
- Store is a categorical data and cannot be quantized as an ordinal number and hence used one-hot encoding to form a 45 column ndarray for each shop, 1 representing presence and 0 for absence. Shape: (m, 45)
- Made another ndarray which extracted month, year as dates from data
- Concatenated important features, dates, one hot encoded shops and stored as X_train.
- Seperate ndarray for storing output of each examples and stored as y_train. Reshaped it into 2d array for matrix multiplications.
- Training:
- Standardized using z score: X_train and y_train for feature scaling
- Mean Squared Error
- Gradient Descent
- r2 score for accuracy metrics
- Interactive cell for predicting new weekly sales based on respective features.
- Used numpy matrix multiplications for faster computations and cleaner code.