Coffee Shop Dataset Example
A real-world dataset from Kaggle with daily revenue data for a coffee shop, used for testing LSTM on actual financial data.
Dataset Overview
This example uses a Kaggle dataset of daily revenue for a coffee shop. Unlike the synthetic dataset, it contains organic patterns, noise, and irregularities that make prediction more challenging.
Each daily revenue record comes with temporal features such as day of week and month, plus flags for holidays and other special events. This lets us test how well LSTM networks cope with actual financial data and its variability.
Dataset Characteristics
- Time Period: Approximately 2 years of daily data
- Features: Date, day of week, revenue, holidays, special events
- Patterns: Weekly seasonality, monthly trends, holiday effects
- Purpose: Testing LSTM performance on real-world financial data
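The calendar features listed above can be derived from the date itself. The following is a minimal sketch using only the Python standard library; the field names ("date", "revenue"), the holiday set, and the sample values are illustrative assumptions, not the actual columns or contents of the Kaggle file.

```python
from datetime import date

# Hypothetical holiday calendar; the real dataset's holiday flags may differ.
HOLIDAYS = {date(2023, 1, 1), date(2023, 12, 25)}

def add_calendar_features(rows, holidays=HOLIDAYS):
    """Attach day-of-week, month, weekend, and holiday flags to each row.

    `rows` is a list of dicts with an ISO-format "date" string
    and a numeric "revenue" value (assumed field names).
    """
    enriched = []
    for row in rows:
        d = date.fromisoformat(row["date"])
        enriched.append({
            **row,
            "day_of_week": d.weekday(),           # Monday=0 ... Sunday=6
            "month": d.month,
            "is_weekend": int(d.weekday() >= 5),  # Saturday or Sunday
            "is_holiday": int(d in holidays),
        })
    return enriched

# Invented sample rows, purely for illustration.
sample = [
    {"date": "2023-12-23", "revenue": 1840.0},
    {"date": "2023-12-25", "revenue": 410.0},
]
features = add_calendar_features(sample)
```

Integer flags are used here so the features can be fed to a model directly; a one-hot encoding of the weekday is a common alternative.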
Coffee Shop Business Patterns
Coffee shops typically exhibit several business patterns that make them interesting for time series prediction:
- Weekly Patterns: Higher revenue on weekends, with specific peak days (often Saturday)
- Daily Patterns: Morning rush hours, lunch breaks, and after-work peaks (intraday effects that are aggregated away in daily revenue totals)
- Seasonal Effects: Weather impacts (cold weather may increase hot drink sales)
- Holiday Effects: Reduced revenue on major holidays, but potential increases before holidays
- Special Events: Local events can drive unexpected traffic
- Promotional Impact: Marketing campaigns and promotions can cause temporary spikes
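Before modeling, it can help to check whether the weekly pattern described above is actually present in the data. The sketch below averages revenue per weekday. The toy series is generated for illustration and the field names are assumptions, so the numbers say nothing about the real dataset.

```python
from collections import defaultdict
from datetime import date, timedelta

def mean_revenue_by_weekday(rows):
    """Return {weekday_index: mean revenue}, with Monday=0 ... Sunday=6."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        weekday = date.fromisoformat(row["date"]).weekday()
        totals[weekday] += row["revenue"]
        counts[weekday] += 1
    return {wd: totals[wd] / counts[wd] for wd in sorted(totals)}

# Two weeks of made-up data with a Saturday peak.
start = date(2024, 1, 1)  # a Monday
toy_rows = []
for i in range(14):
    d = start + timedelta(days=i)
    revenue = {5: 150.0, 6: 130.0}.get(d.weekday(), 100.0)
    toy_rows.append({"date": d.isoformat(), "revenue": revenue})

weekday_means = mean_revenue_by_weekday(toy_rows)
peak_day = max(weekday_means, key=weekday_means.get)
```

If the per-weekday means turn out to be nearly flat on the real data, the weekly seasonality is weak and the model has less structure to learn from.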
Step-by-Step Process
First, we preprocess the coffee shop data, engineer relevant features, and train an LSTM model tuned for this specific dataset. The approach differs slightly from the one used for the synthetic dataset to account for real-world data characteristics.
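A typical preprocessing pipeline for this kind of model might look like the sketch below: a chronological train/test split, min-max scaling fitted on the training portion only, and sliding windows that pair the previous days with the next day's revenue. The 80/20 split and the 7-day window are assumptions chosen for illustration, not settings taken from the original experiment.

```python
def chronological_split(values, train_fraction=0.8):
    """Split a time-ordered series without shuffling, so the test set is the most recent data."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]

def fit_min_max(train_values):
    """Compute scaling parameters from the training data only, to avoid leaking test information."""
    lo, hi = min(train_values), max(train_values)
    span = (hi - lo) or 1.0  # guard against a constant series
    return lo, span

def scale(values, lo, span):
    return [(v - lo) / span for v in values]

def make_windows(values, window=7):
    """Return (inputs, targets): each input holds `window` consecutive days, the target is the next day."""
    inputs, targets = [], []
    for i in range(len(values) - window):
        inputs.append(values[i:i + window])
        targets.append(values[i + window])
    return inputs, targets

series = [float(v) for v in range(100, 130)]  # 30 toy values standing in for daily revenue
train, test = chronological_split(series)
lo, span = fit_min_max(train)
X_train, y_train = make_windows(scale(train, lo, span), window=7)
```

Test values are scaled with the same `lo` and `span`, so they can fall outside the [0, 1] range when revenue exceeds anything seen during training.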
Then we evaluate the trained model on test data, calculate performance metrics, and visualize predictions against actual values. We also analyze how the model performs on different days of the week and during special events.
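The source does not name the metrics it uses. As one plausible choice, the sketch below computes MAE, RMSE, and MAPE, plus MAE broken down by a group label such as a holiday flag or weekday index. All values are invented for illustration.

```python
import math

def mae(actual, predicted):
    """Mean absolute error, in the same units as revenue."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, skipping days with zero actual revenue."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)

def mae_by_group(actual, predicted, groups):
    """MAE per group label, e.g. weekday index or holiday flag."""
    errors = {}
    for a, p, g in zip(actual, predicted, groups):
        errors.setdefault(g, []).append(abs(a - p))
    return {g: sum(e) / len(e) for g, e in errors.items()}

# Made-up values; the last day is flagged as a holiday.
actual = [100.0, 200.0, 400.0, 50.0]
predicted = [110.0, 190.0, 380.0, 100.0]
is_holiday = [0, 0, 0, 1]

overall_mae = mae(actual, predicted)
overall_rmse = rmse(actual, predicted)
overall_mape = mape(actual, predicted)
holiday_mae = mae_by_group(actual, predicted, is_holiday)
```

Comparing the grouped errors against the overall figure shows whether holidays and special events are where the model struggles most.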
Differences from Synthetic Dataset
Data Characteristics
- Natural Noise: Real-world data contains natural variability and noise that wasn't artificially generated
- Unexpected Events: The dataset includes anomalies and one-off events that were not designed into it
- Missing Patterns: Some expected patterns may be weak or absent in the real data
- External Factors: Revenue may be influenced by external factors not captured in the data
- Data Quality Issues: Potential missing values, outliers, or recording errors
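The data quality issues above are usually handled before windowing. The sketch below shows one simple option: forward-filling missing values and flagging outliers with the interquartile-range rule. Both choices are assumptions made for illustration; the source does not say how gaps or outliers were treated, and the sample values are invented.

```python
import statistics

def fill_missing(values):
    """Forward-fill None entries; leading gaps take the first valid value.

    Raises StopIteration if the series contains no valid value at all.
    """
    last = next(v for v in values if v is not None)
    filled = []
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def flag_outliers_iqr(values, k=1.5):
    """Mark values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]

# Invented series with one gap and one suspicious spike.
raw = [100.0, None, 105.0, 98.0, 102.0, 900.0, 101.0, 99.0]
filled = fill_missing(raw)
flags = flag_outliers_iqr(filled)
```

A flagged value is not necessarily an error: on a real revenue series it may be a genuine special event, so it is worth inspecting flagged days before clipping or removing them.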
Modeling Approach
- Feature Engineering: More emphasis on domain-specific features like holidays and events
- Model Architecture: Simpler architecture with bidirectional LSTM but no self-attention
- Training Strategy: Standard Adam optimizer without the cyclic learning rate schedule
- Evaluation Focus: Greater emphasis on analyzing performance during special events and anomalies
- Practical Interpretation: Focus on business insights rather than just technical performance
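To make the architecture and training choices above concrete, here is a minimal sketch of a bidirectional LSTM regressor trained with plain Adam. PyTorch is an assumption, since the source does not name a framework, and the class name, hidden size, feature count, and learning rate are hypothetical values chosen for illustration.

```python
import torch
from torch import nn

class RevenueBiLSTM(nn.Module):
    """Bidirectional LSTM followed by a linear head; no self-attention layer."""

    def __init__(self, n_features, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_size, 1)

    def forward(self, x):
        # x: (batch, window, n_features)
        _, (h_n, _) = self.lstm(x)
        # h_n[-2] is the final forward state, h_n[-1] the final backward state.
        summary = torch.cat([h_n[-2], h_n[-1]], dim=1)  # (batch, 2 * hidden_size)
        return self.head(summary)                       # (batch, 1)

model = RevenueBiLSTM(n_features=5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # fixed rate, no scheduler
loss_fn = nn.MSELoss()

# One training step on random data, only to show the shapes involved.
x = torch.randn(8, 7, 5)  # 8 windows of 7 days with 5 features each
y = torch.randn(8, 1)     # next-day revenue targets (scaled)
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```

The backward direction only reads within each input window, so it does not leak future revenue into the prediction.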
Expected Outcomes
With this real-world coffee shop dataset, we expect the LSTM model to:
- Capture the weekly revenue patterns (weekday vs. weekend differences)
- Identify the impact of holidays on revenue
- Learn seasonal trends if present in the data
- Show higher prediction error than on the synthetic dataset due to real-world complexity
- Demonstrate both the strengths and limitations of LSTM for real business forecasting
This real-world example shows how LSTM models behave when faced with the complexity and unpredictability of actual business data, giving a more realistic assessment of their practical utility for financial forecasting.