Case studyForecasting / Time series / Evaluation

Climate CO2 Forecasting ML

Atmospheric CO2 forecasting needs leakage-safe time-series evaluation rather than a visually impressive model trained on future information.

GitHub

CO2 Forecast Lab dashboard showing model comparison and atmospheric trend.

Project type: Data and ML
Core stack: Python, pandas, statsmodels
Delivery: Case study

Case Study

The problem, implementation decisions, measured evidence, and next improvements.

Overview

An end-to-end forecasting and anomaly-analysis system with reproducible data preparation, model comparison, API serving, and dashboard evidence.

Problem

The weekly Mauna Loa CO2 series contains missing values, long-term trend, and strong annual seasonality. A credible project must compare simple and complex models on the same chronological holdout.

Solution

Built a reproducible pipeline for validation, monthly resampling, chronological splits, feature engineering, baseline/statistical/scikit-learn/PyTorch training, shared metrics, residual analysis, anomaly signals, FastAPI inference, and a React dashboard.

Outcome

Exponential Smoothing and SARIMA clearly outperformed the more complex machine-learning models. The project reports that result directly and keeps anomaly detection framed as exploratory rather than verified climate-event detection.

What It Proves

Time-series methodology, leakage prevention, honest model evaluation, statistical forecasting, PyTorch experimentation, FastAPI, React, and reproducibility.

Key Features

Chronological train, validation, and test splits prevent future leakage.
All models share the same rolling one-step evaluation and residual analysis.
Generated artifacts feed both the FastAPI service and React dashboard.

Architecture

01
statsmodels CO2 dataset
02
Validation and resampling
03
Chronological split
04
Forecasting models
05
Shared evaluation
06
Anomaly analysis
07
API and dashboard

Tech Stack

Python
pandas
statsmodels
scikit-learn
PyTorch
FastAPI
React
Recharts
Docker

Challenges & Trade-offs

2,284 weekly rows become 526 monthly observations after resampling and interpolation.
Exponential Smoothing achieved MAE 0.237 and RMSE 0.298 on rolling one-step evaluation.
The two-epoch LSTM debug run underperformed, demonstrating that complexity is not automatically better.

Future Improvements

Run a fully tuned LSTM experiment separately from the fast debug pipeline.
Add probabilistic forecast intervals and backtesting windows.
Deploy the API and dashboard together for a public interactive demo.

Repository README, generated metrics, and public GitHub presentation reviewed June 13, 2026.