Coffee Shop Dataset Example

A real-world Kaggle dataset of daily coffee shop revenue, used to test an LSTM on actual financial data.

Dataset Overview

For this example, we use a Kaggle dataset containing daily revenue data for a coffee shop. Unlike the synthetic dataset, this data contains natural patterns, noise, and irregularities that make prediction more challenging.

The dataset includes daily revenue records along with temporal features such as day of week and month, and markers for special events such as holidays. This lets us test how well LSTM networks perform on actual financial data rather than on generated series.

Dataset Characteristics

  • Time Period: Approximately 2 years of daily data
  • Features: Date, day of week, revenue, holidays, special events
  • Patterns: Weekly seasonality, monthly trends, holiday effects
  • Purpose: Testing LSTM performance on real-world financial data
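As an illustration of the temporal features listed above, here is a minimal sketch of deriving day of week, month, and a weekend flag from a date column with pandas. The column names (date, revenue, is_holiday) and the sample rows are assumptions made for this example; the actual Kaggle file may use different names and values.

```python
import pandas as pd

# Hypothetical sample rows; the real file and its column names may differ.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-06", "2023-01-07", "2023-01-08", "2023-01-09"]),
    "revenue": [812.0, 1104.5, 990.0, 640.25],
    "is_holiday": [0, 0, 0, 0],
})

def add_calendar_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a sorted copy with day-of-week, month, and weekend columns added."""
    out = frame.sort_values("date").reset_index(drop=True)
    out["day_of_week"] = out["date"].dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = out["date"].dt.month
    out["is_weekend"] = (out["day_of_week"] >= 5).astype(int)
    return out

features = add_calendar_features(df)
print(features)
```

Deriving these columns from the date, rather than trusting precomputed ones, is one way to make sure they stay consistent after rows are reordered or filled in.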

Coffee Shop Business Patterns

Coffee shops typically exhibit several business patterns that make them interesting for time series prediction:

  • Weekly Patterns: Higher revenue on weekends, with specific peak days (often Saturday)
  • Daily Patterns: Morning rush hours, lunch breaks, and after-work peaks (intraday effects that daily revenue totals do not resolve)
  • Seasonal Effects: Weather impacts (cold weather may increase hot drink sales)
  • Holiday Effects: Reduced revenue on major holidays, but potential increases before holidays
  • Special Events: Local events can drive unexpected traffic
  • Promotional Impact: Marketing campaigns and promotions can cause temporary spikes
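Before modeling, it can help to check whether a pattern such as weekly seasonality is actually present. One simple check is the sample autocorrelation at a lag of 7 days: a value close to 1 suggests a strong weekly cycle. The sketch below uses a synthetic stand-in series with invented numbers, not the coffee shop data.

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a positive lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if lag <= 0 or lag >= len(x) or denom == 0.0:
        raise ValueError("lag must be in [1, len(series) - 1] and the series non-constant")
    return float(np.dot(x[:-lag], x[lag:]) / denom)

# Synthetic stand-in: 8 weeks with higher weekend revenue plus noise (invented values).
rng = np.random.default_rng(0)
weekly_shape = np.array([600, 620, 610, 640, 700, 950, 900], dtype=float)
revenue = np.tile(weekly_shape, 8) + rng.normal(0, 20, size=56)

acf_7 = autocorrelation(revenue, 7)  # lag matching the weekly cycle
acf_3 = autocorrelation(revenue, 3)  # lag that cuts across the cycle
print(f"lag 7: {acf_7:.2f}, lag 3: {acf_3:.2f}")
```

On real data the lag-7 value will usually be lower than on a clean synthetic series, since holidays, events, and promotions disturb the cycle.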

Step-by-Step Process

Model Training

This step preprocesses the coffee shop data, engineers relevant features, and trains an LSTM model tuned for this dataset. The approach differs slightly from the one used for the synthetic dataset to account for real-world data characteristics.
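The preprocessing part of this step can be sketched as follows: split the series chronologically, fit the scaling on the training portion only, and cut the result into fixed-length windows for the LSTM. The 80/20 split, the 14-day window, min-max scaling, and the stand-in data are all assumptions for illustration; the source does not state which values or scaler were used.

```python
import numpy as np

def chronological_split(values, train_fraction=0.8):
    """Split along time without shuffling, so the test set is strictly later."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]

def fit_minmax(train):
    """Learn min-max scaling from the training rows only, to avoid leakage."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return lo, span

def make_windows(values, window):
    """Build (samples, window, features) inputs and next-day targets.

    Column 0 is assumed to hold the revenue to predict.
    """
    n = len(values) - window
    if n <= 0:
        raise ValueError("series is shorter than the window")
    X = np.stack([values[i:i + window] for i in range(n)])
    y = values[window:, 0]
    return X, y

# Stand-in data: a revenue-like trend plus a day-of-week column (invented values).
days = np.arange(100)
data = np.column_stack([500.0 + 2.0 * days, days % 7]).astype(float)

train, test = chronological_split(data)
lo, span = fit_minmax(train)
train_scaled = (train - lo) / span
test_scaled = (test - lo) / span  # may fall outside [0, 1]; that is expected

X_train, y_train = make_windows(train_scaled, window=14)
X_test, y_test = make_windows(test_scaled, window=14)
print(X_train.shape, y_train.shape, X_test.shape)
```

Fitting the scaler on the training rows alone keeps information from the test period out of training, at the cost of test values occasionally falling outside the scaled range.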

Model Testing

This step evaluates the trained model on test data, calculates performance metrics, and plots predictions against actual values. We also analyze how the model performs on different days and during special events.
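The source does not name the metrics used, so the sketch below assumes three common choices (MAE, RMSE, and MAPE) and adds a per-group breakdown that can separate ordinary days from special events. The numbers and the is_event flag are invented for illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, RMSE, and MAPE; days with zero actual revenue are skipped for MAPE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    nonzero = y_true != 0
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mape_pct": float(np.mean(np.abs(err[nonzero] / y_true[nonzero])) * 100),
    }

def mae_by_group(y_true, y_pred, groups):
    """Mean absolute error per group label, e.g. weekday or event flag."""
    abs_err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    groups = np.asarray(groups)
    return {g.item(): float(np.mean(abs_err[groups == g])) for g in np.unique(groups)}

# Invented example values: the last day is flagged as a special event.
y_true = [100.0, 200.0, 400.0, 500.0]
y_pred = [110.0, 190.0, 400.0, 450.0]
is_event = [0, 0, 0, 1]

overall = regression_metrics(y_true, y_pred)
by_event = mae_by_group(y_true, y_pred, is_event)
print(overall)
print(by_event)
```

A large gap between the error on event days and on ordinary days is the kind of signal this breakdown is meant to expose.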

Differences from Synthetic Dataset

Data Characteristics

  • Natural Noise: Real-world data contains variability and noise that were not artificially generated
  • Unexpected Events: The dataset includes unplanned events and anomalies
  • Missing Patterns: Some expected patterns may be weak or absent in the real data
  • External Factors: Revenue may be influenced by external factors not captured in the data
  • Data Quality Issues: Potential missing values, outliers, or recording errors
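One possible way to handle the data quality issues above is to reindex the series onto a gap-free daily calendar, fill missing days by interpolation, and flag outliers with a robust rule based on the median absolute deviation (MAD). This is a sketch under assumptions: the column names, the threshold of 5 scaled MADs, and the sample values are illustrative, and the input is assumed to have at most one row per date.

```python
import pandas as pd

def regularize_daily(frame, date_col="date", value_col="revenue", threshold=5.0):
    """Reindex to consecutive days, interpolate gaps, and flag likely outliers."""
    s = frame.set_index(date_col)[value_col].sort_index()
    full_range = pd.date_range(s.index.min(), s.index.max(), freq="D")
    s = s.reindex(full_range)  # missing days appear as NaN

    out = pd.DataFrame({value_col: s.interpolate()})  # linear fill between neighbors
    out["was_missing"] = s.isna()

    # Robust outlier rule: distance from the median in units of scaled MAD.
    deviation = (out[value_col] - out[value_col].median()).abs()
    mad = deviation.median()
    if mad > 0:
        out["is_outlier"] = deviation > threshold * 1.4826 * mad
    else:
        out["is_outlier"] = False
    return out

# Invented sample: 2023-03-03 is missing and 2023-03-06 looks like a recording error.
raw = pd.DataFrame({
    "date": pd.to_datetime(["2023-03-01", "2023-03-02", "2023-03-04",
                            "2023-03-05", "2023-03-06", "2023-03-07"]),
    "revenue": [700.0, 720.0, 760.0, 710.0, 5000.0, 730.0],
})
clean = regularize_daily(raw)
print(clean)
```

Flagging rather than deleting suspicious values keeps the series evenly spaced, which fixed-length LSTM input windows rely on, and leaves the decision about each flagged day to a later step.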

Modeling Approach

  • Feature Engineering: More emphasis on domain-specific features like holidays and events
  • Model Architecture: A simpler architecture that uses a bidirectional LSTM without self-attention
  • Training Strategy: Standard Adam optimizer instead of a cyclic learning rate schedule
  • Evaluation Focus: Greater emphasis on analyzing performance during special events and anomalies
  • Practical Interpretation: Focus on business insights rather than just technical performance
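The architecture and training choices above might look roughly like the following. The source does not name a framework or any hyperparameters, so PyTorch, the class name RevenueBiLSTM, the hidden size, the learning rate, and the input shape are all assumptions for this sketch.

```python
import torch
from torch import nn

class RevenueBiLSTM(nn.Module):
    """Single-layer bidirectional LSTM with a linear head and no attention."""

    def __init__(self, n_features, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=n_features,
            hidden_size=hidden_size,
            batch_first=True,
            bidirectional=True,
        )
        self.head = nn.Linear(2 * hidden_size, 1)

    def forward(self, x):
        # x has shape (batch, window, n_features)
        _, (h_n, _) = self.lstm(x)
        # h_n has shape (2, batch, hidden): final states of both directions
        summary = torch.cat([h_n[0], h_n[1]], dim=1)
        return self.head(summary).squeeze(-1)

torch.manual_seed(0)
model = RevenueBiLSTM(n_features=5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # no LR scheduler attached
loss_fn = nn.MSELoss()

# One training step on random stand-in tensors (not real data).
x = torch.randn(8, 14, 5)
y = torch.randn(8)
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print(loss.item())
```

Concatenating the final hidden states of both directions gives the head a summary of the whole window from each side, which is one common way to read out a bidirectional LSTM.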

Expected Outcomes

With this real-world coffee shop dataset, we expect the LSTM model to:

  • Capture the weekly revenue patterns (weekday vs. weekend differences)
  • Identify the impact of holidays on revenue
  • Learn seasonal trends if present in the data
  • Show higher error than on the synthetic dataset due to real-world complexity
  • Demonstrate both the strengths and limitations of LSTM for real business forecasting

This example shows how LSTM models behave when faced with the complexity and unpredictability of actual business data, giving a more realistic assessment of their practical utility for financial forecasting.