<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Deep Learning | Home - Jyothi Swaroop</title><link>https://kjyothiswaroop.github.io/tag/deep-learning/</link><atom:link href="https://kjyothiswaroop.github.io/tag/deep-learning/index.xml" rel="self" type="application/rss+xml"/><description>Deep Learning</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Sun, 01 Jun 2025 00:00:00 +0000</lastBuildDate><image><url>https://kjyothiswaroop.github.io/media/icon_hue06d895dbab0d0c9b72b1a0534685c49_26160_512x512_fill_lanczos_center_3.png</url><title>Deep Learning</title><link>https://kjyothiswaroop.github.io/tag/deep-learning/</link></image><item><title>Grayscale Image Colourization</title><link>https://kjyothiswaroop.github.io/project/conditional-diffusion/</link><pubDate>Sun, 01 Jun 2025 00:00:00 +0000</pubDate><guid>https://kjyothiswaroop.github.io/project/conditional-diffusion/</guid><description>&lt;hr>
&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>A &lt;strong>conditional diffusion model&lt;/strong> for grayscale image colorization, built entirely from scratch in PyTorch: the forward noising process, noise schedule, U-Net, EMA, and reverse diffusion loop. The grayscale image acts as the conditioning signal and is concatenated as an additional channel to the noisy RGB image at each denoising step.&lt;/p>
&lt;hr>
&lt;h2 id="dataset">Dataset&lt;/h2>
&lt;p>Source images come from the &lt;strong>CelebA-HQ&lt;/strong> dataset (&lt;code>korexyz/celeba-hq-256x256&lt;/code>). Each image is resized to &lt;strong>128×128&lt;/strong> and paired with its grayscale version. The resulting dataset is pushed to HuggingFace (&lt;code>kjswaroopNU/celebahq-128-gray&lt;/code>) and loaded directly during training.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Split&lt;/th>
&lt;th>Samples&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Train&lt;/td>
&lt;td>28,000&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Validation&lt;/td>
&lt;td>1,000&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Test&lt;/td>
&lt;td>1,000&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
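&lt;p>As a sketch, each pair can be prepared like this (the &lt;code>make_pair&lt;/code> helper is hypothetical; the actual preprocessing script is not shown here):&lt;/p>
&lt;pre>&lt;code class="language-python">from PIL import Image

def make_pair(img: Image.Image):
    """Resize to 128x128 and pair the RGB image with its grayscale version."""
    rgb = img.convert("RGB").resize((128, 128), Image.LANCZOS)
    gray = rgb.convert("L")  # single-channel luminance image
    return rgb, gray
&lt;/code>&lt;/pre>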
&lt;hr>
&lt;h2 id="forward-process">Forward Process&lt;/h2>
&lt;p>The forward noising process uses an &lt;strong>offset cosine schedule&lt;/strong> over T = 1000 timesteps. Signal and noise rates satisfy the identity signal² + noise² = 1:&lt;/p>
&lt;p>$$x_t = \text{signal\_rate}(t) \cdot x_0 + \text{noise\_rate}(t) \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$&lt;/p>
&lt;p>The schedule linearly interpolates the diffusion angle between &lt;code>acos(max_signal_rate)&lt;/code> with &lt;code>max_signal_rate&lt;/code> = 0.95 and &lt;code>acos(min_signal_rate)&lt;/code> with &lt;code>min_signal_rate&lt;/code> = 0.02, keeping the signal-to-noise ratio well-behaved at both ends of the trajectory.&lt;/p>
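&lt;p>A minimal sketch of the schedule and the one-shot forward step (function names are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-python">import torch

MAX_SIGNAL_RATE = 0.95
MIN_SIGNAL_RATE = 0.02

def diffusion_rates(t):
    """Offset cosine schedule: interpolate the angle whose cosine is the signal
    rate, so that signal_rate**2 + noise_rate**2 == 1 for all t in [0, 1]."""
    start_angle = torch.acos(torch.tensor(MAX_SIGNAL_RATE))
    end_angle = torch.acos(torch.tensor(MIN_SIGNAL_RATE))
    angle = start_angle + t * (end_angle - start_angle)
    return torch.cos(angle), torch.sin(angle)  # signal_rate, noise_rate

def forward_noise(x0, t):
    """Sample x_t from x_0 in a single step."""
    signal_rate, noise_rate = diffusion_rates(t)
    eps = torch.randn_like(x0)
    return signal_rate * x0 + noise_rate * eps, eps
&lt;/code>&lt;/pre>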
&lt;hr>
&lt;h2 id="u-net">U-Net&lt;/h2>
&lt;p>The U-Net is written from scratch with the following structure:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Block&lt;/th>
&lt;th>Details&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>DownBlock&lt;/strong>&lt;/td>
&lt;td>2× ResidualBlock + AvgPool2d&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Bottleneck&lt;/strong>&lt;/td>
&lt;td>2× ResidualBlock&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>UpBlock&lt;/strong>&lt;/td>
&lt;td>Bilinear upsample + 2× ResidualBlock with skip connections&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Activation&lt;/strong>&lt;/td>
&lt;td>SiLU throughout&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
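&lt;p>The down path can be sketched as follows (a simplified sketch using the BatchNorm and SiLU layers mentioned in this post; the exact layer ordering and channel widths in the original code may differ):&lt;/p>
&lt;pre>&lt;code class="language-python">import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv-BN-SiLU twice, with a 1x1 projection when channel counts differ."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(),
        )

    def forward(self, x):
        return self.block(x) + self.proj(x)

class DownBlock(nn.Module):
    """2x ResidualBlock followed by average pooling; also returns the
    pre-pooling activation as the skip for the matching UpBlock."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.res = nn.Sequential(ResidualBlock(in_ch, out_ch),
                                 ResidualBlock(out_ch, out_ch))
        self.pool = nn.AvgPool2d(2)

    def forward(self, x):
        skip = self.res(x)
        return self.pool(skip), skip
&lt;/code>&lt;/pre>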
&lt;p>The noise variance is injected via a &lt;strong>sinusoidal embedding&lt;/strong> (log-spaced frequencies, sin + cos concatenated), upsampled to 128×128 and concatenated after the first convolution. The grayscale conditioning is concatenated to the noisy RGB input, giving the network 4 input channels.&lt;/p>
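&lt;p>The embedding itself might look like this (a sketch; the frequency range of 1 to 1000 and the embedding width are assumptions):&lt;/p>
&lt;pre>&lt;code class="language-python">import math
import torch

def sinusoidal_embedding(noise_variance, dim=32):
    """Map a per-sample scalar noise variance to a dim-dimensional embedding:
    sin and cos of the variance times dim/2 log-spaced frequencies."""
    freqs = torch.exp(torch.linspace(math.log(1.0), math.log(1000.0), dim // 2))
    angles = 2.0 * math.pi * noise_variance * freqs  # (B, dim/2) by broadcasting
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
&lt;/code>&lt;/pre>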
&lt;hr>
&lt;h2 id="training">Training&lt;/h2>
&lt;p>The network is trained to predict the noise $\epsilon$ added at each timestep (MSE loss). An &lt;strong>EMA copy&lt;/strong> of the U-Net (decay = 0.999) is maintained throughout and used exclusively at inference time for smoother outputs.&lt;/p>
&lt;p>A subtle bug was caught during development: BatchNorm &lt;strong>buffers&lt;/strong> (running mean/variance) are not included in &lt;code>model.parameters()&lt;/code>, so the EMA network was normalizing with fresh batch statistics at inference instead of the accumulated training stats. The fix copies the buffers explicitly alongside the weight averaging at each step.&lt;/p>
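&lt;p>The EMA update with the buffer fix can be sketched as follows (helper names are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-python">import copy
import torch

def make_ema(model):
    """Start the EMA network as a frozen copy of the training network."""
    ema = copy.deepcopy(model).eval()
    for p in ema.parameters():
        p.requires_grad_(False)
    return ema

@torch.no_grad()
def ema_update(ema, model, decay=0.999):
    # Exponential moving average of the learnable weights
    for e, p in zip(ema.parameters(), model.parameters()):
        e.mul_(decay).add_(p, alpha=1.0 - decay)
    # BatchNorm running mean/variance live in buffers, not parameters:
    # copy them so the EMA network uses the accumulated training statistics
    for eb, b in zip(ema.buffers(), model.buffers()):
        eb.copy_(b)
&lt;/code>&lt;/pre>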
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/project/conditional-diffusion/result_hu28a7587743642c4d8ca51a3af053489c_665629_347784f7e79da2651b2934a55cac1c3a.webp 400w,
/project/conditional-diffusion/result_hu28a7587743642c4d8ca51a3af053489c_665629_47b4ccadeace374dc8d2f12a724ccb54.webp 760w,
/project/conditional-diffusion/result_hu28a7587743642c4d8ca51a3af053489c_665629_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://kjyothiswaroop.github.io/project/conditional-diffusion/result_hu28a7587743642c4d8ca51a3af053489c_665629_347784f7e79da2651b2934a55cac1c3a.webp"
width="760"
height="460"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;hr>
&lt;h2 id="gradio-ui">Gradio UI&lt;/h2>
&lt;p>Running &lt;code>python src/eval.py&lt;/code> launches a &lt;strong>Gradio web app&lt;/strong> for interactive inference. Upload any grayscale image, set the number of diffusion steps (10–100) and the number of colorized samples to generate (1–8), and the EMA network runs the full reverse diffusion loop, returning the colorized outputs in a gallery; no code required.&lt;/p>
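&lt;p>The reverse loop behind the app can be sketched as a deterministic DDIM-style sampler (a simplified, self-contained sketch: &lt;code>rates&lt;/code> mirrors the offset cosine schedule above, and the network is assumed to take the 4-channel input plus the noise variance and predict the noise):&lt;/p>
&lt;pre>&lt;code class="language-python">import torch

def rates(t, max_signal=0.95, min_signal=0.02):
    """Offset cosine schedule, as in the forward process."""
    start = torch.acos(torch.tensor(max_signal))
    end = torch.acos(torch.tensor(min_signal))
    angle = start + t * (end - start)
    return torch.cos(angle), torch.sin(angle)

@torch.no_grad()
def colorize(unet, gray, steps=50):
    """gray: (B, 1, H, W) in the model's value range; returns predicted RGB."""
    b, _, h, w = gray.shape
    x = torch.randn(b, 3, h, w)                          # start from pure noise
    times = torch.linspace(1.0, 0.0, steps + 1)
    for i in range(steps):
        s, n = rates(times[i])
        eps = unet(torch.cat([x, gray], dim=1), n ** 2)  # 4-channel input
        x0 = (x - n * eps) / s                           # predicted clean image
        s_next, n_next = rates(times[i + 1])
        x = s_next * x0 + n_next * eps                   # deterministic DDIM step
    return x0
&lt;/code>&lt;/pre>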
&lt;hr>
&lt;h2 id="mlops">MLOps&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Hydra&lt;/strong> — all hyperparameters live in &lt;code>configs/config.yaml&lt;/code> and are overridable from the CLI (&lt;code>python src/train.py lr=0.0001 batch_size=64&lt;/code>), making every run reproducible without touching source code.&lt;/li>
&lt;li>&lt;strong>Weights &amp;amp; Biases&lt;/strong> — per-step loss, per-epoch loss, and sample grids (grayscale / generated / ground truth) logged every 10 epochs.&lt;/li>
&lt;li>Checkpoints saved every 50 epochs; training resumes from any checkpoint via &lt;code>resume=true&lt;/code>.&lt;/li>
&lt;/ul></description></item></channel></rss>