How neural networks read price
Price forecasting with deep learning isn't a single model — it's a decision about architecture, data pipeline, and what "error" actually costs you.
These visuals map out the key ideas: from raw time-series to trained recurrent networks, from feature engineering to evaluation metrics that matter in practice.
Architecture choices and what they cost
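LSTM and GRU networks handle sequential dependencies well, but they process time steps one after another, so training on long windows is slow. Transformer-based models such as the Temporal Fusion Transformer attend over the whole window in parallel and handle mixed input types (static covariates, known future inputs, observed history) more flexibly.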
Choosing between them isn't about which is "better" — it's about your sequence length, available compute, and whether interpretability matters in your use case.
Enough depth to build and critique a forecasting pipeline from scratch.
ARIMA baseline through attention-based architectures — with honest comparisons.
Data pipeline
Normalization, windowing, and train-test split without leakage
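A minimal sketch of a leakage-free pipeline, assuming a univariate price series held in a NumPy array. The function names (`chronological_split`, `fit_scaler`, `make_windows`) and the synthetic prices are illustrative assumptions, not course material. The point it tries to show: split by time first, fit the scaler on the training slice only, then window each split separately so no window straddles the boundary.

```python
import numpy as np

def chronological_split(series, test_size):
    """Hold out the last `test_size` points; a time series is never shuffled."""
    return series[:-test_size], series[-test_size:]

def fit_scaler(train):
    """Estimate mean and standard deviation from the training slice only."""
    mu = float(train.mean())
    sigma = float(train.std())
    if sigma == 0.0:
        sigma = 1.0  # avoid division by zero on a constant series
    return mu, sigma

def make_windows(values, lookback, horizon=1):
    """Build (X, y): X[i] holds `lookback` consecutive values,
    y[i] is the value `horizon` steps after the end of that window."""
    X, y = [], []
    for start in range(len(values) - lookback - horizon + 1):
        end = start + lookback
        X.append(values[start:end])
        y.append(values[end + horizon - 1])
    return np.array(X), np.array(y)

# Synthetic prices, purely for illustration.
prices = np.linspace(100.0, 129.0, 30)

train, test = chronological_split(prices, test_size=6)
mu, sigma = fit_scaler(train)           # statistics come from train only
train_scaled = (train - mu) / sigma
test_scaled = (test - mu) / sigma       # reuse train statistics on test

X_train, y_train = make_windows(train_scaled, lookback=5)
X_test, y_test = make_windows(test_scaled, lookback=5)
print(X_train.shape, X_test.shape)
```

Fitting the scaler on the full series before splitting is the most common form of leakage here: the test period's mean and variance would then shape the training inputs.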
Feature engineering
Lag features, rolling statistics, and external regressors
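As a rough sketch of lag and rolling features, assuming a plain NumPy array of prices: the helper name `lag_and_rolling_features`, the lag set, and the window length are all hypothetical choices. Every feature for time `t` is computed from values strictly before `t`, which is what keeps the target out of its own inputs.

```python
import numpy as np

def lag_and_rolling_features(prices, lags=(1, 2, 3), window=3):
    """Return (X, y) where each row of X describes the history before time t
    and y[t] is the price at time t."""
    prices = np.asarray(prices, dtype=float)
    start = max(max(lags), window)       # first index with a full history
    rows, targets = [], []
    for t in range(start, len(prices)):
        past = prices[t - window:t]      # excludes prices[t] itself
        row = [prices[t - k] for k in lags]
        row += [past.mean(), past.std()]
        rows.append(row)
        targets.append(prices[t])
    return np.array(rows), np.array(targets)

prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
X, y = lag_and_rolling_features(prices)
print(X.shape, y)
```

An external regressor (for example a volume series, assumed here) would be added as one more column, lagged the same way so that only values known before `t` are used.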
Model evaluation
MAE, RMSE, directional accuracy, and their trade-offs
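The three metrics can be sketched in a few lines of NumPy. The function names and the toy numbers are assumptions for illustration. Directional accuracy is defined here as the share of steps where the predicted move from the previous price has the same sign as the actual move; other definitions exist.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average size of the miss, in price units."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more than MAE does."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def directional_accuracy(y_true, y_pred, y_prev):
    """Fraction of steps where predicted and actual moves share a sign."""
    actual_move = np.sign(y_true - y_prev)
    predicted_move = np.sign(y_pred - y_prev)
    return float(np.mean(actual_move == predicted_move))

# Toy values, purely for illustration.
y_prev = np.array([100.0, 102.0, 101.0, 103.0])   # price one step earlier
y_true = np.array([102.0, 101.0, 103.0, 104.0])
y_pred = np.array([101.0, 103.0, 102.0, 108.0])

print(mae(y_true, y_pred), rmse(y_true, y_pred),
      directional_accuracy(y_true, y_pred, y_prev))
```

In this toy case the single 4-unit miss pulls RMSE above MAE, while directional accuracy ignores the size of that miss entirely because the direction was right. That is the trade-off in miniature: which metric matters depends on whether magnitude or direction drives the cost of an error.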
Deployment
Serving forecasts with FastAPI, monitoring drift in production
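Serving is left out here, but the drift-monitoring half can be sketched without a web framework. The example below uses the population stability index (PSI), one common drift score, as a stand-in; the source does not say which drift measure the course uses. The function name, bin count, and synthetic return series are assumptions.

```python
import numpy as np

def population_stability_index(reference, live, bins=10, eps=1e-6):
    """Compare the distribution of `live` against `reference`.
    Bin edges are quantiles of the reference sample."""
    reference = np.asarray(reference, dtype=float)
    live = np.asarray(live, dtype=float)
    # Interior edges only, so values outside the reference range
    # fall into the first or last bin.
    inner_edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))[1:-1]
    ref_bins = np.searchsorted(inner_edges, reference, side="right")
    live_bins = np.searchsorted(inner_edges, live, side="right")
    ref_frac = np.bincount(ref_bins, minlength=bins) / len(reference)
    live_frac = np.bincount(live_bins, minlength=bins) / len(live)
    # Clip to avoid log(0) when a bin is empty.
    ref_frac = np.clip(ref_frac, eps, None)
    live_frac = np.clip(live_frac, eps, None)
    return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))

rng = np.random.default_rng(0)
training_returns = rng.normal(0.0, 1.0, 2000)   # synthetic reference sample
stable_returns = rng.normal(0.0, 1.0, 2000)     # same regime
shifted_returns = rng.normal(1.0, 1.0, 2000)    # regime with a shifted mean

psi_stable = population_stability_index(training_returns, stable_returns)
psi_shifted = population_stability_index(training_returns, shifted_returns)
print(psi_stable, psi_shifted)
```

A production check would compute this on a schedule for each input feature and for the forecast errors, and alert when the score crosses a threshold chosen for the application.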
What makes this hard
- Price series are non-stationary — the statistical properties shift over time
- Signal-to-noise ratio is low, especially at daily and hourly granularity
- Overfitting is easy to hide without proper walk-forward validation
- Most benchmark datasets have been reused so heavily that published results already overfit them
- Production models degrade — monitoring matters as much as training
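Walk-forward validation, mentioned in the list above, can be sketched as a generator of expanding-window splits. The function name and parameters are hypothetical; scikit-learn's `TimeSeriesSplit` offers a comparable splitter. Each fold trains only on observations that come before its test block.

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices) pairs with an expanding
    training window. The test block always follows the training data."""
    train_end = initial_train
    while train_end + test_size <= n_samples:
        train_idx = list(range(0, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        train_end += test_size   # absorb the tested block into training

folds = list(walk_forward_splits(n_samples=10, initial_train=4, test_size=2))
for train_idx, test_idx in folds:
    print(f"train 0..{train_idx[-1]}  test {test_idx}")
```

A model is refit on each `train_idx` and scored on the matching `test_idx`; the fold scores are then reported together rather than as a single number, which makes a lucky period harder to hide.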
Learning path through the curriculum
The sequence below reflects how we structured the material at Konpaterud — starting from where most learners actually get stuck, not from first principles.
Time-series fundamentals
Autocorrelation, seasonal decomposition, and stationarity tests, plus the simple baselines every model should beat.
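A small sketch of two of these ideas, assuming a synthetic random-walk price series: sample autocorrelation before and after differencing, and the naive "tomorrow equals today" forecast as a baseline. The function name and the simulated data are assumptions; a formal stationarity test is not shown.

```python
import numpy as np

def autocorrelation(x, lag):
    """Sample autocorrelation of a 1-D series at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered))

rng = np.random.default_rng(0)
steps = rng.normal(0.0, 1.0, 500)
prices = 100.0 + np.cumsum(steps)        # synthetic random walk
changes = np.diff(prices)                # first difference of the prices

# Price levels are strongly autocorrelated (non-stationary behaviour);
# their differences are close to uncorrelated.
print(autocorrelation(prices, 1), autocorrelation(changes, 1))

# Naive baseline: predict that the next price equals the current one.
naive_forecast = prices[:-1]
naive_mae = float(np.mean(np.abs(prices[1:] - naive_forecast)))
print(naive_mae)
```

On a series like this the naive forecast is hard to beat, which is why it is a useful floor: a network that cannot improve on `naive_mae` out of sample has not learned anything usable.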
Recurrent architectures
LSTM and GRU from scratch in PyTorch, including gradient flow visualizations and gating mechanics.
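The course builds these cells in PyTorch; the sketch below uses plain NumPy instead so the gate arithmetic is visible without a framework. It shows a single GRU step (forward pass only, no training). Parameter names, sizes, and the random initialization are assumptions. Note that the final blend follows one common convention; some libraries, including PyTorch's `nn.GRU`, swap the roles of `z` and `1 - z`.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(input_size, hidden_size, rng):
    """Small random weights for the update (z), reset (r), and candidate (h) parts."""
    params = {}
    for gate in "zrh":
        params["W" + gate] = rng.normal(0.0, 0.1, (hidden_size, input_size))
        params["U" + gate] = rng.normal(0.0, 0.1, (hidden_size, hidden_size))
        params["b" + gate] = np.zeros(hidden_size)
    return params

def gru_step(x, h, p):
    """One GRU time step: returns the new hidden state."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])   # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])   # reset gate
    # Candidate state sees the previous state only through the reset gate.
    h_candidate = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
    # Blend old state and candidate, element by element.
    return (1.0 - z) * h + z * h_candidate

rng = np.random.default_rng(0)
params = init_params(input_size=1, hidden_size=4, rng=rng)

h = np.zeros(4)
for scaled_return in [0.1, -0.2, 0.05]:   # illustrative inputs
    h = gru_step(np.array([scaled_return]), h, params)
print(h)
```

Because the new state is a gated mix of the old state and a `tanh` output, each hidden unit stays inside (-1, 1), and a gate value near zero lets the old state pass through almost unchanged. That pass-through path is what helps gradients survive over many steps.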
Attention and transformers
Self-attention applied to sequences, positional encoding, and when transformers outperform RNNs.
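To make the mechanics concrete, here is a single-head self-attention pass with sinusoidal positional encoding in NumPy. It is a sketch under assumptions: random projection weights, an even model dimension, and a causal mask so that each time step attends only to itself and earlier steps, which is the usual choice when forecasting.

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal positional encoding; assumes d_model is even."""
    pos = np.arange(seq_len)[:, None]
    even_dims = np.arange(0, d_model, 2)[None, :]
    angles = pos / (10000.0 ** (even_dims / d_model))
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

def self_attention(x, Wq, Wk, Wv, causal=True):
    """Scaled dot-product self-attention over a (seq_len, d_model) input."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if causal:
        # Block attention to future positions (strict upper triangle).
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model = 6, 8
embedded = rng.normal(0.0, 1.0, (seq_len, d_model))       # stand-in for embedded inputs
x = embedded + sinusoidal_positions(seq_len, d_model)      # inject order information
Wq, Wk, Wv = (rng.normal(0.0, 0.1, (d_model, d_model)) for _ in range(3))

output, weights = self_attention(x, Wq, Wk, Wv)
print(output.shape, weights.shape)
```

The `weights` matrix is `seq_len` by `seq_len`, so memory and compute grow quadratically with the window length. Unlike a recurrent cell, though, all positions are processed in one matrix product rather than step by step.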
Uncertainty quantification
Probabilistic forecasting, conformal prediction intervals, and calibration — because point estimates alone mislead.
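Split conformal prediction is one way to turn point forecasts into intervals. The sketch below assumes a held-out calibration set of actual values and forecasts; the function names and synthetic data are illustrative. Its coverage guarantee relies on exchangeable data, an assumption that time series usually violate, so in a forecasting setting the resulting intervals should be checked empirically rather than trusted outright.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample quantile of the calibration scores for coverage 1 - alpha."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))   # rank of the score to use
    if k > n:
        return float("inf")                  # too few calibration points
    return float(scores[k - 1])

def conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Symmetric interval around `new_pred` from absolute calibration residuals."""
    q = conformal_quantile(np.abs(cal_true - cal_pred), alpha)
    return new_pred - q, new_pred + q

rng = np.random.default_rng(0)
cal_true = 100.0 + rng.normal(0.0, 2.0, 200)           # synthetic actuals
cal_pred = cal_true + rng.normal(0.0, 1.0, 200)        # synthetic forecasts

lower, upper = conformal_interval(cal_true, cal_pred, new_pred=101.5, alpha=0.1)
print(lower, upper)
```

Calibration is then a matter of counting: on fresh data, roughly 90% of actual values should land inside intervals built with `alpha=0.1`. A materially lower share signals that the residual distribution has shifted since calibration.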
"One student, Petra Valkonen, described the walk-forward validation lab as the first time a backtest result felt honest. That's the kind of shift we're aiming for."