Simplified DRAW Model for Generative Image Modeling
Implementation of a simplified DRAW-style recurrent variational autoencoder for iterative image reconstruction and generation on Fashion-MNIST.
This work investigates recurrent generative modeling through a simplified implementation of the DRAW (Deep Recurrent Attentive Writer) architecture, applied to the Fashion-MNIST dataset.
The model is formulated as a recurrent variational autoencoder (VAE) that iteratively refines image reconstructions via an encoder–decoder LSTM architecture and latent variable sampling. The study focuses on understanding structured latent representation learning and the trade-off between reconstruction fidelity and latent regularization under computationally efficient settings.
Model Design
- Encoder–Decoder architecture based on LSTM modules
- Latent variable sampling via reparameterization trick
- Iterative canvas refinement for progressive reconstruction
- Simplified variant without attention mechanism for efficiency
Experiments
- Latent dimension study: 8, 16, 32
- KL regularization: β = 0.5, 1.0, 2.0
- Evaluation of reconstruction vs. latent structure trade-off
Results
- Stable convergence with consistent reduction in reconstruction loss
- Learned structured latent representations
- Generated samples capture meaningful Fashion-MNIST patterns
- Expected smoothing artifacts typical of VAE-based models
Key Insights
- Latent dimensionality significantly affects representation capacity
- β-VAE regularization introduces a clear reconstruction–structure trade-off
- Recurrent generative models can learn meaningful structure even without attention
Repository
Full implementation and reproducible experiments:
https://github.com/md-naim-hassan-saykat/draw-fashion-mnist-generative-model