Synthetic Dataset Example

A 3-year synthetic cash flow dataset with controlled patterns, such as linear trends and seasonal spikes, for testing an LSTM model under controlled conditions.

Dataset Overview

For this example, we've created a synthetic dataset that simulates 3 years of daily cash flow data starting from January 1, 2022. The dataset includes controlled patterns such as:

  • Linear income trends with gradual growth
  • Weekly seasonality with higher income on weekends
  • Monthly seasonality with peaks at the beginning of each month
  • Quarterly seasonality with end-of-quarter spikes
  • Annual seasonality with holiday season increases
  • Random noise to simulate real-world variability

This controlled environment lets us test the LSTM model's ability to learn and predict patterns that commonly occur in financial data, while knowing the ground truth: every trend and seasonal component was put into the data deliberately.
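The patterns listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the generator actually used for this dataset: the function name `generate_cash_flow`, the baseline amounts, the multipliers, the window lengths (first three days of a month, last five days of a quarter), and the noise levels are all assumptions chosen for readability.

```python
import random
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def generate_cash_flow(start=date(2022, 1, 1), end=date(2024, 12, 31), seed=42):
    """Return one record per day with trend, seasonality, and noise.

    All numeric constants below are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    n_days = (end - start).days + 1  # inclusive of both endpoints
    records = []
    for i in range(n_days):
        day = start + timedelta(days=i)

        # Linear trend: baseline income grows a little every day.
        income = 1000.0 + 0.5 * i

        # Weekly seasonality: higher income on Saturday and Sunday.
        if day.weekday() >= 5:
            income *= 1.20

        # Monthly seasonality: peak in the first three days of the month.
        if day.day <= 3:
            income *= 1.15

        # Quarterly seasonality: spike in the last five days of a quarter.
        is_quarter_month = day.month in (3, 6, 9, 12)
        in_last_five_days = (day + timedelta(days=5)).month != day.month
        if is_quarter_month and in_last_five_days:
            income *= 1.25

        # Annual seasonality: holiday-season lift in November and December.
        if day.month in (11, 12):
            income *= 1.10

        # Random noise on top of the deterministic components.
        income += rng.gauss(0.0, 50.0)
        expenses = 700.0 + 0.3 * i + rng.gauss(0.0, 40.0)

        records.append({
            "date": day,
            "day_of_week": DAY_NAMES[day.weekday()],
            "income": income,
            "expenses": expenses,
            "net_cash_flow": income - expenses,
        })
    return records

rows = generate_cash_flow()
print(rows[0]["date"], rows[-1]["date"], len(rows))
```

Because each component is applied as a separate, named step, any one of them can be switched off or rescaled to check whether the model's error changes accordingly.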

Dataset Characteristics

Time Period

January 1, 2022 - December 31, 2024 (1,096 days, since 2024 is a leap year)

Features

Date, Day of Week, Income, Expenses, Net Cash Flow

Patterns

Linear trends, weekly, monthly, quarterly, and annual seasonality

Purpose

Testing LSTM performance under controlled conditions

Step-by-Step Process

Data Generation

Creating a synthetic dataset with controlled patterns including linear trends, seasonal factors, and random noise to simulate real-world cash flow data.

Model Training

Preprocessing the data, engineering features, and training an LSTM model with self-attention and bidirectional layers to capture temporal patterns.
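The preprocessing and feature-engineering part of this step might look roughly like the sketch below. It covers a chronological train/test split, min-max scaling fitted on the training portion only, a cyclical encoding of the day of week, and sliding windows that turn a series into (input sequence, next value) pairs. The helper names, the 80/20 split, and the 30-day lookback are assumptions for illustration; the LSTM itself is not shown, and its actual configuration is not specified here.

```python
import math

def chronological_split(series, train_frac=0.8):
    """Split without shuffling so the test set is strictly in the future."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

def fit_min_max(train):
    """Learn scaling parameters from the training data only (avoids leakage)."""
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # guard against a constant series
    return lo, span

def scale(values, lo, span):
    """Apply previously fitted min-max scaling."""
    return [(v - lo) / span for v in values]

def encode_day_of_week(weekday):
    """Map weekday 0..6 onto a circle so Sunday and Monday end up adjacent."""
    angle = 2.0 * math.pi * weekday / 7.0
    return math.sin(angle), math.cos(angle)

def make_windows(values, lookback=30):
    """Build (X, y) pairs: `lookback` past values predict the next value."""
    X, y = [], []
    for i in range(len(values) - lookback):
        X.append(values[i:i + lookback])
        y.append(values[i + lookback])
    return X, y

# Toy series standing in for daily net cash flow.
series = [float(i) for i in range(100)]
train, test = chronological_split(series)
lo, span = fit_min_max(train)
X_train, y_train = make_windows(scale(train, lo, span), lookback=30)
print(len(X_train), len(X_train[0]))
```

Fitting the scaler on the training portion alone matters for time series: using the full range would leak information about future values into training.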

Model Testing

Evaluating the trained model on test data, calculating performance metrics, and visualizing predictions against actual values.
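The text does not say which performance metrics are calculated. As an assumption, the sketch below shows three that are commonly used for forecasting: mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). The function names and sample values are illustrative only.

```python
import math

def mae(actual, predicted):
    """Mean absolute error, in the same units as the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def mape(actual, predicted):
    """Mean absolute percentage error, skipping days where actual is zero."""
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(terms) / len(terms)

# Made-up values purely to show the call pattern.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
print(f"MAE={mae(actual, predicted):.2f}  "
      f"RMSE={rmse(actual, predicted):.2f}  "
      f"MAPE={mape(actual, predicted):.2f}%")
```

MAPE should be read with care for net cash flow, which can be close to zero or negative; MAE and RMSE are usually safer for that column.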

Expected Outcomes

With this synthetic dataset, we expect the LSTM model to:

  • Learn the underlying linear trends in income and expenses
  • Capture weekly seasonality patterns (weekend vs. weekday differences)
  • Identify monthly and quarterly patterns
  • Recognize annual seasonal effects
  • Filter out random noise to focus on meaningful patterns

The controlled nature of this dataset allows us to evaluate exactly how well the model learns each type of pattern, providing insights into the strengths and limitations of LSTM networks for financial forecasting.
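One way to check how well each pattern was learned is to break the prediction error down by the calendar condition that drives that pattern. The sketch below is an assumed approach, not a procedure described above: `error_by_group` and the `weekend_or_weekday` key function are hypothetical helpers, and the same idea extends to month-start, quarter-end, or holiday-season groupings.

```python
from collections import defaultdict
from datetime import date

def error_by_group(dates, actual, predicted, key):
    """Return the mean absolute error per group, where key(date) names the group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for d, a, p in zip(dates, actual, predicted):
        group = key(d)
        totals[group] += abs(a - p)
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}

def weekend_or_weekday(d):
    """Grouping key for the weekly pattern."""
    return "weekend" if d.weekday() >= 5 else "weekday"

# Made-up example: a model that ignores the weekend uplift entirely.
dates = [date(2022, 1, 1), date(2022, 1, 2), date(2022, 1, 3), date(2022, 1, 4)]
actual = [120.0, 120.0, 100.0, 100.0]
predicted = [100.0, 100.0, 100.0, 100.0]
print(error_by_group(dates, actual, predicted, weekend_or_weekday))
```

A much larger error in one group than in the others suggests the model has not captured the corresponding pattern, which is only possible to diagnose because the patterns in a synthetic dataset are known in advance.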