Unlock the Power of Python for Deep Learning with Generative Adversarial Networks (GANs)

Deep learning algorithms work with almost any kind of data and require large amounts of computing power and data to solve complicated problems. Now, let us deep-dive into one of the most famous deep learning algorithms: generative adversarial networks (GANs).

Generative adversarial networks (GANs) are an exciting and relatively recent innovation in machine learning and deep learning. GANs are generative deep learning algorithms that create new data instances that resemble the training data. A GAN has two components: a generator, which learns to generate fake data, and a discriminator, which learns to distinguish that fake data from real data.

DALL-E, a recent breakthrough from OpenAI that can generate images from almost any text description, is another well-known example of a deep generative model, although it is not itself a GAN.

If you are interested in learning how to use Python for deep learning with generative adversarial networks (GANs), a powerful technique for creating realistic and diverse synthetic data, this article is perfect for you.

Before we begin, let’s see the remarkable ability of DALL-E to generate a seemingly scientific photo of an atom:

Prompt: “First ever photo of an atom, electron microscope photo”. Created by Niko × DALL-E.


What is Deep Learning?

Deep learning, a branch of machine learning, addresses intricate problems through the utilization of artificial neural networks. These networks consist of interconnected nodes organized in multiple layers, extracting features from input data. Extensive datasets are employed to train these models, enabling them to identify patterns and correlations that might be challenging or impossible for humans to perceive.
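To make “interconnected nodes organized in multiple layers” concrete, here is a minimal pure-Python sketch of a two-layer network's forward pass. The weights are hypothetical, hand-picked numbers rather than trained values, and the helper names (`dense`, `relu`, `tiny_network`) are illustrative only; frameworks such as PyTorch implement the same idea with `nn.Linear` and `nn.ReLU`, which the hands-on examples later in this article use.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: out[j] = sum_i inputs[i] * weights[j][i] + biases[j]."""
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise ReLU activation: max(0, v)."""
    return [max(0.0, v) for v in values]


def sigmoid(v):
    """Squash a score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical weights: 2 inputs -> 3 hidden nodes -> 1 output node.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
b1 = [0.0, 0.1, -0.5]
W2 = [[1.0, -2.0, 0.5]]
b2 = [0.2]


def tiny_network(x):
    hidden = relu(dense(x, W1, b1))   # layer 1: extract 3 features from the input
    score = dense(hidden, W2, b2)[0]  # layer 2: combine the features into one score
    return sigmoid(score)             # output: probability-like value


print(tiny_network([1.0, 2.0]))
```

Training would adjust `W1`, `b1`, `W2`, and `b2` to reduce a loss; that is the part deep learning frameworks automate with automatic differentiation.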

The impact of deep learning on artificial intelligence has been substantial. It has paved the way for the development of intelligent systems capable of independent learning, adaptation, and decision-making. Deep learning has led to remarkable advancements in various domains, encompassing image and speech recognition, natural language processing, machine translation, text generation, image generation (as discussed in this article), autonomous driving, and numerous others.

Examples of realistic human face images generated by a GAN, from thispersondoesnotexist.com

Why Python for Deep Learning?

Python has gained widespread popularity as a programming language due to its versatility and ease of use across diverse domains of computer science, especially deep learning. Thanks to its extensive range of libraries and frameworks specifically tailored for deep learning, Python has become a top choice among many machine learning professionals.

Python is now the language of choice for deep learning, and here are some of the reasons why:

1. Simple to learn and use:

Python is a high-level programming language that is easy to learn and use, even for those who are new to programming. Its concise and uncomplicated syntax makes it easy to write and understand. This allows developers to concentrate on solving problems without worrying about the details of the language.

2. Abundant libraries and frameworks:

Python has a vast ecosystem of libraries and frameworks that cater specifically to deep learning. Some of these libraries include TensorFlow, PyTorch, Keras, and Theano (which is no longer actively developed). These libraries provide pre-built functions and modules that simplify the development process, reducing the need to write complex code from scratch.

3. Strong community support:

Python has a large and active community of developers contributing to its development, maintenance, and improvement. This community offers support and guidance to beginners, making it easier to learn and use Python for deep learning.

4. Platform independence:

Python is platform-independent, which means that code written on one platform can usually be executed on another platform with little or no modification. This makes it easier to deploy deep learning models on different platforms and devices.

5. Easy integration with other languages:

Python can be easily integrated with other programming languages, such as Delphi, C++, and Java, making it ideal for building complex systems that require integrating different technologies.

Overall, Python’s ease of use, abundance of libraries and frameworks, strong community support, platform independence, and ease of integration with other languages make it an indispensable tool for machine learning practitioners, and its popularity continues to soar as a result.

What is DALL-E? A revolutionary image generation model

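DALL-E is a remarkable generative model from OpenAI that can generate images from text descriptions (even intricate ones), such as “a cat wearing a bow tie” or “a painting of a landscape in the style of Van Gogh”. It is trained on a large-scale dataset of text-image pairs and uses a transformer architecture that can model both text and image modalities. Strictly speaking, DALL-E is not a GAN: the original model pairs a discrete variational autoencoder with an autoregressive transformer, and its successor DALL-E 2 is diffusion-based. It is, however, a striking showcase of what deep generative models can do.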

Example of DALL-E generated images. Prompt: “a cat wearing a bow tie”

Example of AI-generated images in the style of Van Gogh. Prompt: “a painting of a landscape in the style of Van Gogh”

DALL-E can create plausible and diverse images for a wide range of concepts, such as animals, objects, scenes, and transformations, and control their attributes, viewpoints, and perspectives. DALL-E can also combine multiple concepts, such as “an armchair in the shape of an avocado” or “a snail made of a harp”, and generate novel and creative images that do not exist in the real world. DALL-E demonstrates the power and potential of deep generative models for image synthesis and multimodal understanding.

Example of AI-generated images. Prompt: “an armchair in the shape of an avocado”

Applications beyond DALL-E: GANs in various domains

DALL-E is not the only showcase of deep generative models, and GANs in particular have been successfully applied to many domains and tasks, such as computer vision, natural language generation, audio synthesis, video prediction, and more. Some examples of GAN applications are:

1. Image generation

GANs can generate realistic and diverse images of objects, scenes, faces, animals, and more, from random noise or text descriptions.

2. Image-to-image translation

GANs can transform images from one domain to another, such as changing the style, season, or content of the images.

3. Image enhancement

GANs can improve the quality and resolution of images, such as super-resolution, deblurring, denoising, inpainting, and colorization.

4. Text generation

GANs can generate realistic and diverse texts, such as stories, poems, reviews, captions, and more, from random noise or keywords.

5. Text-to-speech

GANs can synthesize natural and expressive speech from text, such as voice cloning, style transfer, and emotion modulation.

6. Speech enhancement

GANs can improve the quality and intelligibility of speech, such as noise reduction, dereverberation, and bandwidth extension.

7. Video generation

GANs can generate realistic and diverse videos, such as animations, simulations, and future predictions, from random noise or text descriptions.

8. Video-to-video translation

GANs can transform videos from one domain to another, such as changing the style, content, or viewpoint of the videos.

What are Generative Adversarial Networks (GANs)?

Generative Adversarial Networks (GANs) are a breakthrough innovation in deep learning that can generate realistic and diverse data from random noise or text descriptions. GANs have many applications in various domains, such as computer vision, natural language generation, audio synthesis, and more. GANs can also enable creativity, accessibility, and fairness by generating novel and inclusive data that do not exist in the real world.

GANs consist of two neural networks that compete with each other in a game-like scenario:

1. Discriminator

A discriminator that tries to distinguish between real and fake data.

More formally, given a set of data instances X and a set of labels Y: Discriminative models capture the conditional probability p(Y | X).

How a discriminator works: backpropagation in discriminator training (Image source: Reference)

Illustration of the discriminative model in the handwritten digit generation use case (we will explore it hands-on in the next sections):

Training samples (Image source: Reference)

2. Generator

A generator that tries to create fake data.

More formally, given a set of data instances X and a set of labels Y: Generative models capture the joint probability p(X, Y), or just p(X) if there are no labels.

Backpropagation in generator training (Image source: Reference)

Illustration of the generative model in the handwritten digit generation use case (we will explore it hands-on in the next sections):

Generative model illustration (Image source: Reference)
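The two formal definitions above are linked by the definition of conditional probability, p(Y | X) = p(X, Y) / p(X), so a model of the joint distribution can recover the conditional one, but not the other way round. The following minimal sketch uses a made-up joint distribution over two hypothetical inputs (`x1`, `x2`) and two labels to show the calculation with exact fractions; the helper names are illustrative only.

```python
from fractions import Fraction

# Hypothetical joint distribution p(X, Y): two inputs, two labels, probabilities sum to 1.
joint = {
    ("x1", "cat"): Fraction(3, 10),
    ("x1", "dog"): Fraction(1, 10),
    ("x2", "cat"): Fraction(2, 10),
    ("x2", "dog"): Fraction(4, 10),
}


def marginal_x(x):
    """p(X = x): sum the joint probability over all labels (what p(X) models)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def conditional_y_given_x(y, x):
    """p(Y = y | X = x) = p(x, y) / p(x): what a discriminative model captures."""
    return joint[(x, y)] / marginal_x(x)


print(conditional_y_given_x("cat", "x1"))  # 3/4
```

A discriminative model learns only the last quantity directly; a generative model that knows the whole `joint` table can also sample new (X, Y) pairs from it.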

The discriminator and the generator are trained together in an adversarial manner, typically alternating between a discriminator update and a generator update. Ideally, training approaches an equilibrium where the generated data are indistinguishable from real data and the discriminator can do no better than chance, so the generator fools it about half the time. By leveraging large-scale datasets and advanced network architectures, GANs can learn to produce high-quality and diverse data, such as images, text, audio, and video.
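In the notation of the original GAN paper (Goodfellow et al., 2014), this game is the minimax problem below, where D(x) is the discriminator's estimated probability that x is real, G(z) turns latent noise z drawn from p_z into a sample, and p_g is the distribution of the generator's samples. For a fixed generator the best possible discriminator is D*(x), which equals 1/2 everywhere once p_g = p_data: that is the “fooled about half the time” equilibrium described above.

```latex
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}, \qquad p_g = p_{\text{data}} \;\Longrightarrow\; D^*(x) = \frac{1}{2}
```

In practice, and in all three hands-on examples below, the generator is instead trained by scoring its samples against the label the discriminator uses for real data, a common non-saturating variant that gives stronger gradients early in training.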

Here’s a picture of the whole GAN system:

The whole GAN system (Image source: Reference)
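Both players are usually trained with binary cross-entropy, which is what `nn.BCELoss()` and Keras's `'binary_crossentropy'` compute (up to their own numerical safeguards). The minimal pure-Python sketch below, with hypothetical helper names, evaluates the discriminator loss on a combined real-plus-generated batch and the generator loss that scores generated samples against the “real” label. At equilibrium (D = 0.5 everywhere) both losses equal ln 2 ≈ 0.693, a useful reference point when reading the loss logs printed by the hands-on examples.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for one prediction (eps guards against log(0))."""
    eps = 1e-12
    return -(target * math.log(prob + eps) + (1 - target) * math.log(1 - prob + eps))


def discriminator_loss(d_real, d_fake):
    """Mean BCE over a combined batch: real samples labelled 1, generated samples labelled 0."""
    losses = [bce(p, 1.0) for p in d_real] + [bce(p, 0.0) for p in d_fake]
    return sum(losses) / len(losses)


def generator_loss(d_fake):
    """Non-saturating generator loss: generated samples scored against the 'real' label 1."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


# Equilibrium: the discriminator outputs 0.5 for everything.
print(discriminator_loss([0.5] * 4, [0.5] * 4), generator_loss([0.5] * 4), math.log(2))

# A confident discriminator (0.9 on real, 0.1 on fake): low D loss, high G loss.
print(discriminator_loss([0.9], [0.1]), generator_loss([0.1]))
```

A discriminator loss that collapses toward 0 while the generator loss keeps climbing is the usual sign that the discriminator is winning too easily.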

What Python tools and libraries are needed for GAN development?

Python is one of the most popular and widely used programming languages for machine learning and artificial intelligence, especially for developing generative adversarial networks (GANs). Python offers a rich set of tools and libraries that can help you implement and train GAN models with ease and efficiency.

Some of the most useful and popular Python tools and libraries for GAN development are:

1. PyTorch

PyTorch is an open-source deep learning framework that provides a flexible and dynamic way of building and running GAN models. PyTorch supports automatic differentiation, GPU acceleration, and distributed training, and makes it straightforward to implement various GAN architectures and loss functions. PyTorch also has a large and active community that contributes to the development and improvement of the framework^[10].

2. TensorFlow

TensorFlow is another open-source deep learning framework that offers a comprehensive and scalable platform for building and deploying GAN models. TensorFlow supports eager execution, graph optimization, and efficient tensor operations, and can be used to implement various GAN architectures and loss functions. TensorFlow also has a high-level API called Keras, which simplifies the process of creating and training GAN models^[10].

3. PyGAN

PyGAN is a Python library that implements GANs and their variants, such as conditional GANs, adversarial auto-encoders, and energy-based GANs. PyGAN allows you to design generative models based on statistical machine learning problems and optimize them using various algorithms and metrics^[10].

4. TorchGAN

TorchGAN is a Python library that provides a collection of GAN models, loss functions, and evaluation metrics, built on top of PyTorch. TorchGAN enables you to easily create and customize your own GAN models, as well as reproduce the results of existing GAN papers^[10].

5. VeGANs

VeGANs is another Python library that provides a variety of GAN models, loss functions, and evaluation metrics, built on top of PyTorch. VeGANs aims to make GAN development accessible and user-friendly, by offering a simple and consistent interface, as well as tutorials and examples^[10].

Without further ado, let’s get our hands dirty with three hands-on GAN examples in Python: numerical mathematics (approximating a plot of the sine function), generating handwritten digits, and generating realistic human faces.

Hands-On GAN 1: Generate random numbers using GAN, to approximate sine plot

In this section, we will explore how GANs can be used to generate data that follows a simple sine function on the interval from 0 to 2π. We will implement a GAN using PyTorch and show how the generator and the discriminator networks interact and improve over time. We will also demonstrate the results of our GAN by comparing the generated data with the original sine function data.
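One simple way to make that comparison quantitative is to measure how far each point lies from the curve. The sketch below is not part of the original code: it uses only Python's standard library (so its random numbers differ from the `torch.rand` values in the PyTorch code), and the helper names `sample_sine` and `sine_fit_error` are hypothetical.

```python
import math
import random


def sample_sine(n, seed=111):
    """Draw n real training points (x, sin x) with x uniform in [0, 2*pi)."""
    rng = random.Random(seed)
    xs = [2 * math.pi * rng.random() for _ in range(n)]
    return [(x, math.sin(x)) for x in xs]


def sine_fit_error(points):
    """Mean squared vertical distance between each point's y and sin(x).

    0.0 means every point lies exactly on the sine curve.
    """
    return sum((y - math.sin(x)) ** 2 for x, y in points) / len(points)


real_points = sample_sine(1024)
# Hypothetical output of an imperfect generator: the curve shifted up by 0.1.
shifted_points = [(x, y + 0.1) for x, y in real_points]
print(sine_fit_error(real_points), sine_fit_error(shifted_points))
```

After training, `sine_fit_error(generated_samples.tolist())` would give a single number to track across epochs; values close to 0 mean the generated points hug the sine curve. It says nothing about whether the points cover the whole 0 to 2π range, so it complements rather than replaces the scatter plot.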

The following is the complete Python code to automatically generate random numbers using GAN, to approximate sine plot:


```python
import math

import matplotlib.pyplot as plt
import torch
from torch import nn

torch.manual_seed(111)

# Training data: 1024 points (x, sin x) with x drawn uniformly from [0, 2π)
train_data_length = 1024
train_data = torch.zeros((train_data_length, 2))
train_data[:, 0] = 2 * math.pi * torch.rand(train_data_length)
train_data[:, 1] = torch.sin(train_data[:, 0])
train_labels = torch.zeros(train_data_length)  # placeholder labels; the training loop ignores them
train_set = [
    (train_data[i], train_labels[i]) for i in range(train_data_length)
]

# Plot training data
plt.plot(train_data[:, 0], train_data[:, 1], ".")
plt.show()

# Create a PyTorch data loader
batch_size = 32
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=batch_size, shuffle=True
)


# Implementing the Discriminator: maps a 2-D point to the probability that it is real
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(2, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        output = self.model(x)
        return output


# Instantiate a Discriminator object
discriminator = Discriminator()


# Implementing the Generator: maps 2-D latent noise to a 2-D point (x, y)
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(2, 16),
            nn.ReLU(),
            nn.Linear(16, 32),
            nn.ReLU(),
            nn.Linear(32, 2),
        )

    def forward(self, x):
        output = self.model(x)
        return output


generator = Generator()

# Training the models
lr = 0.001
num_epochs = 300
loss_function = nn.BCELoss()

# Create the optimizers using Adam optimizer
optimizer_discriminator = torch.optim.Adam(discriminator.parameters(), lr=lr)
optimizer_generator = torch.optim.Adam(generator.parameters(), lr=lr)

# Training loop: feed the samples to the models and update their weights to minimize the loss
for epoch in range(num_epochs):
    for n, (real_samples, _) in enumerate(train_loader):
        # Data for training the discriminator: real samples labelled 1, generated samples labelled 0
        real_samples_labels = torch.ones((batch_size, 1))
        latent_space_samples = torch.randn((batch_size, 2))
        # detach(): the discriminator step does not need gradients for the generator
        generated_samples = generator(latent_space_samples).detach()
        generated_samples_labels = torch.zeros((batch_size, 1))
        all_samples = torch.cat((real_samples, generated_samples))
        all_samples_labels = torch.cat(
            (real_samples_labels, generated_samples_labels)
        )

        # Training the discriminator
        discriminator.zero_grad()
        output_discriminator = discriminator(all_samples)
        loss_discriminator = loss_function(
            output_discriminator, all_samples_labels)
        loss_discriminator.backward()
        optimizer_discriminator.step()

        # Data for training the generator
        latent_space_samples = torch.randn((batch_size, 2))

        # Training the generator: it is rewarded when the discriminator labels its samples as real (1)
        generator.zero_grad()
        generated_samples = generator(latent_space_samples)
        output_discriminator_generated = discriminator(generated_samples)
        loss_generator = loss_function(
            output_discriminator_generated, real_samples_labels
        )
        loss_generator.backward()
        optimizer_generator.step()

        # Show loss every 10 epochs, on the last of the 1024 / 32 = 32 batches
        if epoch % 10 == 0 and n == batch_size - 1:
            print(f"Epoch: {epoch} Loss D.: {loss_discriminator.item():.4f}")
            print(f"Epoch: {epoch} Loss G.: {loss_generator.item():.4f}")

# Checking the samples generated by the GAN
latent_space_samples = torch.randn(100, 2)
generated_samples = generator(latent_space_samples)
generated_samples = generated_samples.detach()
plt.plot(generated_samples[:, 0], generated_samples[:, 1], ".")
plt.show()
```

You can run the code above in the PyScripter IDE, or in any Python environment with PyTorch and Matplotlib installed.

What does the code above do?

Let’s break down the important parts of the code above:

1. Data generation:

train_data_length specifies the number of data points to be generated.
train_data is a tensor of shape (train_data_length, 2) where the first column represents random values between 0 and 2π, and the second column is the sine of the first column.
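train_labels is a tensor of zeros; these placeholder labels only make each item a (data, label) tuple, and the training loop ignores them.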
train_set is a list of tuples, each containing a data point and its corresponding label.

2. Data visualization:

The generated training data is plotted using matplotlib.

3. Data loader:

batch_size is set to 32.
train_loader is a PyTorch data loader that shuffles and batches the training data.

4. Discriminator model:

The Discriminator class is defined as a subclass of nn.Module.
It consists of a feedforward neural network with layers of sizes 2 (input) → 256 → 128 → 64 → 1, followed by a Sigmoid activation.
Dropout layers with a dropout probability of 0.3 are added for regularization.

5. Generator model:

The Generator class is defined, similar to the Discriminator.
It is a neural network with layers of sizes 2 (input) → 16 → 32 → 2.

6. Model initialization:

Instances of the Discriminator and Generator classes are created.

7. Training configuration:

Learning rate (lr) is set to 0.001.
num_epochs is set to 300.
Binary Cross Entropy Loss (nn.BCELoss()) is used as the loss function.

8. Optimizer initialization:

Adam optimizers are created for both the discriminator and generator.

9. Training loop:

The code runs a training loop for the specified number of epochs.
For each epoch, it iterates through batches of data from the train_loader.
For the discriminator:
- Real and generated samples are combined.
- The discriminator is trained to distinguish between real and generated samples.
For the generator:
- The generator is trained to generate samples that the discriminator classifies as real.
Losses for the discriminator and generator are printed every 10 epochs.

10. Generated samples visualization:

After training, 100 samples are generated using the trained generator, and they are plotted.

In summary, the code above implements a simple Generative Adversarial Network (GAN) where the generator and discriminator are trained adversarially to generate realistic samples. The generator generates fake samples to try and fool the discriminator, while the discriminator learns to distinguish between real and fake samples.
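One detail of the training loop is easy to miss: the label tensors are created with a fixed shape of `(batch_size, 1)`, and the progress check compares `n` with `batch_size - 1`. Both work here only because 1024 training points split into exactly 32 full batches of 32, so the last batch index is 31. The short sketch below (hypothetical helper names, plain Python) shows the batch arithmetic a PyTorch `DataLoader` follows with its default `drop_last=False`.

```python
import math


def batches_per_epoch(n_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader yields per epoch."""
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)


def last_batch_size(n_samples, batch_size):
    """Size of the final batch when drop_last=False."""
    remainder = n_samples % batch_size
    return remainder if remainder else batch_size


n_batches = batches_per_epoch(1024, 32)
print(n_batches, n_batches - 1 == 32 - 1, last_batch_size(1024, 32))
print(batches_per_epoch(1000, 32), last_batch_size(1000, 32))
```

With 1,000 points, for example, the final batch would hold only 8 samples, and the fixed-size labels would no longer match the 8 + 32 = 40 combined samples. Sizing labels from the batch itself (e.g. `real_samples.size(0)`) or passing `drop_last=True` to the `DataLoader` avoids the mismatch. The MNIST example below is safe for the same reason: 60,000 images divide evenly into 1,875 batches of 32.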

Here are a few selected outputs from the process above:

Selected outputs:

Examine the training data by plotting each point (x₁, x₂):

Plot the generated samples. The screenshot below shows the plot after 300 epochs, which almost perfectly resembles the sine curve:

To see the progression between epoch (from 0 to 300) more clearly, please watch the following video:

Hands-On GAN 2: Generate handwritten digits using GAN

In this section, we will explore how GANs can be used to generate realistic images of handwritten digits. For that, you’ll train the models using the MNIST dataset of handwritten digits, which is included in the torchvision package. We will implement a GAN using PyTorch and show how the generator will produce fake images and the discriminator will try to tell them apart.

The following is the complete Python code to automatically generate handwritten digits using GAN:


```python
import math
import os

import matplotlib.pyplot as plt
import torch
import torchvision
import torchvision.transforms as transforms
from torch import nn, optim

torch.manual_seed(111)

# Use a CUDA-enabled GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# ToTensor() scales pixels to [0, 1]; Normalize((0.5,), (0.5,)) maps them to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,))])

train_set = torchvision.datasets.MNIST(root='.',
                                       train=True,
                                       download=True,
                                       transform=transform)

batch_size = 32
train_loader = torch.utils.data.DataLoader(train_set,
                                           batch_size=batch_size,
                                           shuffle=True)

# Plot 16 real training samples
plt.figure(dpi=150)
real_samples, mnist_labels = next(iter(train_loader))
for i in range(16):
    ax = plt.subplot(4, 4, i+1)
    plt.imshow(real_samples[i].reshape(28, 28), cmap='gray_r')
    plt.xticks([])
    plt.yticks([])
plt.tight_layout()
plt.show()


class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 1024),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        # Flatten each 1 x 28 x 28 image into a 784-element vector
        x = x.view(x.size(0), 784)
        output = self.model(x)
        return output


discriminator = Discriminator().to(device=device)


class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 784),
            nn.Tanh()  # outputs in [-1, 1], matching the normalized training images
        )

    def forward(self, x):
        output = self.model(x)
        # Reshape each 784-element vector into a 1 x 28 x 28 image
        output = output.view(x.size(0), 1, 28, 28)
        return output


generator = Generator().to(device=device)

lr = 0.0001
num_epochs = 50
loss_function = nn.BCELoss()
optimizer_discriminator = torch.optim.Adam(discriminator.parameters(), lr=lr)
optimizer_generator = torch.optim.Adam(generator.parameters(), lr=lr)

# Fixed latent vectors used to visualize the generator's output
latent_space_samples_plot = torch.randn((16, 100)).to(device=device)

# Load trained NN when it exists, or train a new NN
if os.path.isfile('discriminator.pt') and os.path.isfile('generator.pt'):
    # map_location lets weights saved on a GPU load on a CPU-only machine
    discriminator.load_state_dict(torch.load('./discriminator.pt', map_location=device))
    generator.load_state_dict(torch.load('./generator.pt', map_location=device))
else:
    for epoch in range(num_epochs):
        for n, (real_samples, mnist_labels) in enumerate(train_loader):
            # Data for training the discriminator: real images labelled 1, generated images labelled 0
            real_samples = real_samples.to(device=device)
            real_samples_labels = torch.ones((batch_size, 1)).to(device=device)
            latent_space_samples = torch.randn((batch_size, 100)).to(device=device)
            generated_samples = generator(latent_space_samples).detach()
            generated_samples_labels = torch.zeros(
                (batch_size, 1)).to(device=device)
            all_samples = torch.cat((real_samples, generated_samples))
            all_samples_labels = torch.cat(
                (real_samples_labels, generated_samples_labels))

            # Training the discriminator
            discriminator.zero_grad()
            output_discriminator = discriminator(all_samples)
            loss_discriminator = loss_function(
                output_discriminator, all_samples_labels)
            loss_discriminator.backward()
            optimizer_discriminator.step()

            # Data for training the generator
            latent_space_samples = torch.randn((batch_size, 100)).to(device=device)

            # Training the generator: its samples are scored against the "real" label
            generator.zero_grad()
            generated_samples = generator(latent_space_samples)
            output_discriminator_generated = discriminator(generated_samples)
            loss_generator = loss_function(
                output_discriminator_generated, real_samples_labels)
            loss_generator.backward()
            optimizer_generator.step()

            # Show loss once per epoch (at batch index batch_size - 1)
            if n == batch_size - 1:
                print(f"Epoch: {epoch} Loss D.: {loss_discriminator.item():.4f}")
                print(f"Epoch: {epoch} Loss G.: {loss_generator.item():.4f}")

# Generate and plot 16 samples from the fixed latent vectors
generated_samples = generator(latent_space_samples_plot)
generated_samples = generated_samples.cpu().detach()
plt.figure(dpi=150)
for i in range(16):
    ax = plt.subplot(4, 4, i+1)
    plt.imshow(generated_samples[i].reshape(28, 28), cmap='gray_r')
    plt.xticks([])
    plt.yticks([])
plt.tight_layout()
plt.show()

# Save trained NN parameters
torch.save(generator.state_dict(), 'generator.pt')
torch.save(discriminator.state_dict(), 'discriminator.pt')
```

You can run the code above in the PyScripter IDE, or in any Python environment with PyTorch, torchvision, and Matplotlib installed.

What does the code above do?

Let’s break down the important parts of the code above:

1. Importing necessary libraries

torch for PyTorch,
nn for neural network modules,
optim for optimizers,
torchvision for handling datasets like MNIST,
transforms for data transformations,
math for mathematical functions,
matplotlib.pyplot for plotting, and
os for operating system related functions.

2. Checking if a CUDA-enabled GPU is available and setting the device accordingly.

3. Defining a data transformation pipeline using `transforms.Compose`.

It converts the images to PyTorch tensors and normalizes them.

4. Loading the MNIST dataset for training,

specifying the root directory, setting it for training, downloading it if not available, and applying the defined transformation.

5. Creating a PyTorch data loader to handle batching and shuffling of the training data.

6. Plotting 16 real samples from the MNIST dataset using matplotlib.

7. Defining the `Discriminator` class,

which is a neural network with several fully connected layers, ReLU activations, and Dropout layers. The final layer has a Sigmoid activation.

8. Implementing the forward method for the `Discriminator` class

and creating an instance of the Discriminator class, moving it to the specified device.

9. Defining the `Generator` class,

which is another neural network with fully connected layers, ReLU activations, and a hyperbolic tangent (Tanh) activation.

10. Implementing the forward method for the `Generator` class

and creating an instance of the Generator class, moving it to the specified device.

11. Setting hyperparameters:

learning rate (lr),
number of epochs (num_epochs), and using
Binary Cross Entropy Loss (nn.BCELoss()).
Initializing Adam optimizers for both the discriminator and generator.

12. Generating random samples in the latent space for visualization.

13. Loading pre-trained models if available,

otherwise training the models for a specified number of epochs.

14. Generating and plotting 16 samples from the generator.

15. Saving the trained model parameters for future use.
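Steps 3 and 9 are connected: `transforms.Normalize((0.5,), (0.5,))` computes (x - 0.5) / 0.5 on the [0, 1] values produced by `ToTensor()`, which puts the real images in [-1, 1], exactly the output range of the generator's final `nn.Tanh()` layer. The plain-Python sketch below (hypothetical helper names) shows the mapping and its inverse.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """What Normalize((0.5,), (0.5,)) does to each value produced by ToTensor()."""
    return (pixel - mean) / std


def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, e.g. to turn Tanh generator output back into [0, 1] intensities."""
    return value * std + mean


# ToTensor() scales 8-bit pixels 0..255 to [0, 1]; Normalize then maps them to [-1, 1].
for raw in (0, 128, 255):
    scaled = raw / 255
    print(raw, round(normalize(scaled), 3), round(denormalize(normalize(scaled)), 3))
```

The inverse is what you would apply before saving generated images as 8-bit files; `plt.imshow` does not need it here because it rescales single-channel data to its own minimum and maximum by default.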

Here are a few selected outputs from the process above:

Selected outputs:

Download and extract the dataset:

Train the model with seed=111:

The following is a visualization of an excerpt of the MNIST dataset:

versus the handwriting generated by the GAN after 50 epochs:

To see the progression between epoch (from 0 to 50) more clearly, please see the following video:

Hands-On GAN 3: Generate realistic human faces using GAN

In this section, we will learn how to generate realistic human faces using GANs. The GAN consists of two competing networks: a generator that creates fake images from random noise, and a discriminator that distinguishes real images from fake ones. We will train our GAN on a large dataset of celebrity face images (CelebA) to produce high-quality and diverse faces. Because this task demands a lot of computing power, we will run it on Kaggle’s GPUs, and we will also show you the limitations and challenges of using a regular laptop.

Introduction to Kaggle’s GPU Options: P100 vs. T4

Kaggle offers its users a weekly GPU quota (30 hours at the time of writing) and lets them choose between NVIDIA T4 and P100 GPUs. However, many Kaggle users may be unsure which GPU best suits their specific needs.

In general, the T4 GPU is a good choice for inference workloads that demand high throughput and low power consumption. The P100, on the other hand, excels at training workloads thanks to its higher FP32 throughput and memory bandwidth (both GPUs have 16 GB of memory)^[11].

It’s important to note that TPUs (Tensor Processing Units) are not part of this comparison, as they are a distinct type of hardware accelerator designed by Google. Among the GPUs, the P100 is recommended for training tasks, while both the P100 and the T4 can be used for inference. Selecting the appropriate GPU depends on the specific requirements of the given machine learning task^[11].

The complete code for generating realistic human faces using a GAN, and what does it do?

The following is the complete Python code to automatically generate realistic human faces using GAN:


import numpy as np # Linear algebra

import pandas as pd # Data processing, CSV file I/O (e.g. pd.read_csv)

import os

from matplotlib import pyplot as plt

from tqdm import tqdm

from PIL import Image as Img

from keras import Input

from keras.layers import Dense, Reshape, LeakyReLU, Conv2D, Conv2DTranspose, Flatten, Dropout

from keras.models import Model

from keras.optimizers import RMSprop

for dirname, _, filenames in os.walk('/kaggle/input'):

for filename in filenames:

print(os.path.join(dirname, filename))

# Load the data and resize the images

PIC_DIR = f'../input/celeba-dataset/img_align_celeba/img_align_celeba/'

IMAGES_COUNT = 10000

ORIG_WIDTH = 178

ORIG_HEIGHT = 208

diff = (ORIG_HEIGHT - ORIG_WIDTH) // 2

WIDTH = 128

HEIGHT = 128

crop_rect = (0, diff, ORIG_WIDTH, ORIG_HEIGHT - diff)

images = []

for pic_file in tqdm(os.listdir(PIC_DIR)[:IMAGES_COUNT]):

pic = Image.open(PIC_DIR + pic_file).crop(crop_rect)

pic.thumbnail((WIDTH, HEIGHT), Image.ANTIALIAS)

images.append(np.uint8(pic))

#Image shape

images = np.array(images) / 255

print(images.shape)

#Display first 25 images

plt.figure(1, figsize=(10, 10))

for i in range(25):

plt.subplot(5, 5, i+1)

plt.imshow(images[i])

plt.axis('off')

plt.show()

# Create Generator

LATENT_DIM = 32

CHANNELS = 3

def create_generator():

gen_input = Input(shape=(LATENT_DIM, ))

x = Dense(128 * 16 * 16)(gen_input)

x = LeakyReLU()(x)

x = Reshape((16, 16, 128))(x)

x = Conv2D(256, 5, padding='same')(x)

x = LeakyReLU()(x)

x = Conv2DTranspose(256, 4, strides=2, padding='same')(x)

x = LeakyReLU()(x)

x = Conv2DTranspose(256, 4, strides=2, padding='same')(x)

x = LeakyReLU()(x)

x = Conv2DTranspose(256, 4, strides=2, padding='same')(x)

x = LeakyReLU()(x)

x = Conv2D(512, 5, padding='same')(x)

x = LeakyReLU()(x)

x = Conv2D(512, 5, padding='same')(x)

x = LeakyReLU()(x)

x = Conv2D(CHANNELS, 7, activation='tanh', padding='same')(x)

generator = Model(gen_input, x)

return generator

# Create Discriminator

def create_discriminator():

disc_input = Input(shape=(HEIGHT, WIDTH, CHANNELS))

x = Conv2D(256, 3)(disc_input)

x = LeakyReLU()(x)

x = Conv2D(256, 4, strides=2)(x)

x = LeakyReLU()(x)

x = Conv2D(256, 4, strides=2)(x)

x = LeakyReLU()(x)

x = Conv2D(256, 4, strides=2)(x)

x = LeakyReLU()(x)

x = Conv2D(256, 4, strides=2)(x)

x = LeakyReLU()(x)

x = Flatten()(x)

x = Dropout(0.4)(x)

x = Dense(1, activation='sigmoid')(x)

discriminator = Model(disc_input, x)

optimizer = RMSprop(

lr=.0001,

clipvalue=1.0,

decay=1e-8

)

discriminator.compile(

optimizer=optimizer,

loss='binary_crossentropy'

)

return discriminator

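# Define the GAN model
from IPython.display import Image, display
from keras.utils import model_to_dot  # older Keras releases: keras.utils.vis_utils

generator = create_generator()
generator.summary()
# model_to_dot needs pydot and Graphviz; display() renders the diagram even mid-cell
display(Image(model_to_dot(generator, show_shapes=True).create_png()))

discriminator = create_discriminator()
# Freeze the discriminator inside the combined GAN model. In Keras 2.x (tf.keras), it was
# compiled above, so it still learns when trained directly with discriminator.train_on_batch
discriminator.trainable = False
discriminator.summary()
display(Image(model_to_dot(discriminator, show_shapes=True).create_png()))

# Chain generator -> discriminator: noise in, "is it generated?" probability out
gan_input = Input(shape=(LATENT_DIM, ))
gan_output = discriminator(generator(gan_input))
gan = Model(gan_input, gan_output)

optimizer = RMSprop(learning_rate=1e-4, clipvalue=1.0)
gan.compile(optimizer=optimizer, loss='binary_crossentropy')
gan.summary()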

# Train the GAN model
import time

iters = 15000
batch_size = 16

RES_DIR = 'res2'
FILE_PATH = '%s/generated_%d.png'
if not os.path.isdir(RES_DIR):
    os.mkdir(RES_DIR)

# Fixed latent vectors, so every snapshot shows the same 6x6 grid of faces as training progresses
CONTROL_SIZE_SQRT = 6
control_vectors = np.random.normal(size=(CONTROL_SIZE_SQRT**2, LATENT_DIM)) / 2

start = 0
d_losses = []
a_losses = []
images_saved = 0

for step in range(iters):
    start_time = time.time()

    # 1) Train the discriminator on generated images (label 1) and real images (label 0)
    latent_vectors = np.random.normal(size=(batch_size, LATENT_DIM))
    generated = generator.predict(latent_vectors)
    real = images[start:start + batch_size]
    combined_images = np.concatenate([generated, real])

    labels = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
    labels += .05 * np.random.random(labels.shape)  # add small random noise to the labels

    d_loss = discriminator.train_on_batch(combined_images, labels)
    d_losses.append(d_loss)

    # 2) Train the generator through the frozen discriminator so its images are classified as real (label 0)
    latent_vectors = np.random.normal(size=(batch_size, LATENT_DIM))
    misleading_targets = np.zeros((batch_size, 1))
    a_loss = gan.train_on_batch(latent_vectors, misleading_targets)
    a_losses.append(a_loss)

    # Move to the next batch of real images, wrapping around at the end of the dataset
    start += batch_size
    if start > images.shape[0] - batch_size:
        start = 0

    # Every 50 steps: save the weights, report the losses, and save a snapshot grid
    if step % 50 == 49:
        # Saved next to the snapshots; '/gan.h5' would write to the filesystem root
        gan.save_weights(os.path.join(RES_DIR, 'gan.weights.h5'))

        print('%d/%d: d_loss: %.4f, a_loss: %.4f. (%.1f sec)' % (step + 1, iters, d_loss, a_loss, time.time() - start_time))

        control_image = np.zeros((WIDTH * CONTROL_SIZE_SQRT, HEIGHT * CONTROL_SIZE_SQRT, CHANNELS))
        control_generated = generator.predict(control_vectors)
        for i in range(CONTROL_SIZE_SQRT ** 2):
            x_off = i % CONTROL_SIZE_SQRT
            y_off = i // CONTROL_SIZE_SQRT
            control_image[x_off * WIDTH:(x_off + 1) * WIDTH, y_off * HEIGHT:(y_off + 1) * HEIGHT, :] = control_generated[i, :, :, :]

        # Clip before converting: tanh outputs can be negative and would wrap around in uint8
        im = Img.fromarray(np.uint8(np.clip(control_image, 0, 1) * 255))
        im.save(FILE_PATH % (RES_DIR, images_saved))
        images_saved += 1

# Plot the losses recorded at every training step
plt.figure(1, figsize=(12, 8))

plt.subplot(121)
plt.plot(d_losses, color='red')
plt.xlabel('training step')
plt.ylabel('discriminator loss')

plt.subplot(122)
plt.plot(a_losses)
plt.xlabel('training step')
plt.ylabel('adversarial (generator) loss')

plt.show()

We can run the code above in the PyScripter IDE.

Let’s break down the important parts of the code above:

1. Importing the necessary libraries: numpy for numerical operations, pandas for data processing, matplotlib for plotting, tqdm for progress bars, and components from Keras for building a Generative Adversarial Network (GAN).

2. Walking through the Kaggle input directory and printing the filenames.

3. Setting up parameters for loading images from the CelebA dataset, then cropping and resizing each image to a specified size.

4. Converting the list of images to a NumPy array and normalizing pixel values to the range [0, 1].

5. Displaying the first 25 images from the dataset.

6. Defining the generator model architecture using Keras.

7. Defining the discriminator model architecture using Keras.

8. Creating an instance of the generator and displaying its summary.

9. Creating an instance of the discriminator and displaying its summary, then setting `trainable` to False so that it is frozen inside the combined GAN model. Because the discriminator was compiled before the flag is set, in Keras 2.x it still learns when trained directly with `discriminator.train_on_batch`.

10. Creating the GAN model by connecting the `generator` and `discriminator`, then compiling it with a binary cross-entropy loss.

11. Setting up parameters for training the GAN model and creating a directory for saving generated images.

12. Training the GAN model: each step first updates the discriminator on a mix of generated and real images, then updates the generator through the combined GAN model. Every 50 steps, the weights and a grid of images generated from fixed control vectors are saved for visualization; the losses are plotted once training finishes.
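Note the label convention in step 12: in this code, generated images are labelled 1 and real images 0, so the generator is trained toward a target of 0 ("real"). The NumPy sketch below isolates just the label construction. It assumes batch_size = 4 for readability and uses a seeded `default_rng` generator in place of the `np.random` calls in the training code; it does not train anything.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded NumPy Generator, used here instead of np.random.*
batch_size = 4                  # illustrative; the training code uses 16

# Discriminator batch: generated images come first (label 1), real images second (label 0)
labels = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
labels += 0.05 * rng.random(labels.shape)  # small uniform noise in [0, 0.05)

# Generator update: ask the frozen discriminator to call fresh fakes "real" (label 0)
misleading_targets = np.zeros((batch_size, 1))

print(labels.ravel().round(3))
print(misleading_targets.ravel())
```

The added noise nudges every label slightly upward, so fake labels fall in [1, 1.05) and real labels in [0, 0.05).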

Selected outputs:

Load and resize CelebA dataset:

Show image shape:

Display first 25 images:

Generator summary:

Generator diagram (generated automatically with the Keras `model_to_dot` function):


Discriminator summary:

Discriminator diagram (also generated automatically with the Keras `model_to_dot` function):


GAN summary:

_________________________________________________________________
Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_3 (InputLayer)         [(None, 32)]              0
_________________________________________________________________
model (Functional)           (None, 128, 128, 3)       14953987
_________________________________________________________________
model_1 (Functional)         (None, 1)                 4211713
=================================================================
Total params: 19,165,700
Trainable params: 14,953,987
Non-trainable params: 4,211,713
_________________________________________________________________
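The parameter counts in this summary can be checked by hand. Each convolution (including Conv2DTranspose) has kernel × kernel × input channels × filters weights plus one bias per filter, a Dense layer has inputs × units weights plus one bias per unit, and LeakyReLU, Reshape, Flatten and Dropout add none. The plain-Python sketch below assumes 128 × 128 RGB inputs (HEIGHT = WIDTH = 128, consistent with the generator's (None, 128, 128, 3) output); the helper names are hypothetical.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a Conv2D/Conv2DTranspose layer with a square kernel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return in_units * out_units + out_units


def valid_out(size, kernel, stride=1):
    """Spatial size after a convolution with padding='valid' (the Keras default)."""
    return (size - kernel) // stride + 1


LATENT_DIM, CHANNELS, SIZE = 32, 3, 128  # SIZE assumes HEIGHT = WIDTH = 128

# Generator: three 'same'-padded stride-2 transposed convolutions double 16 three times
gen_size = 16 * 2 ** 3
gen_params = (
    dense_params(LATENT_DIM, 128 * 16 * 16)
    + conv_params(5, 128, 256)
    + 3 * conv_params(4, 256, 256)
    + conv_params(5, 256, 512)
    + conv_params(5, 512, 512)
    + conv_params(7, 512, CHANNELS)
)

# Discriminator: one 3x3 convolution, then four 4x4 stride-2 convolutions, all 'valid'
disc_size = valid_out(SIZE, 3)
for _ in range(4):
    disc_size = valid_out(disc_size, 4, stride=2)
disc_params = (
    conv_params(3, CHANNELS, 256)
    + 4 * conv_params(4, 256, 256)
    + dense_params(disc_size * disc_size * 256, 1)
)

print(gen_size, disc_size)                                # 128 6
print(gen_params, disc_params, gen_params + disc_params)  # 14953987 4211713 19165700
```

The 4,211,713 non-trainable parameters are exactly the frozen discriminator. The calculation also shows why its final Dense layer is small: the strided 'valid' convolutions shrink 128 × 128 down to 6 × 6, so Flatten yields only 6 × 6 × 256 = 9,216 features.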

Beyond the model-building steps above, training the GAN and producing its outputs requires a capable GPU, so for the next steps I moved to Kaggle, using a P100 GPU as the accelerator.

Here is a screenshot of the last step that can be done without a GPU in the PyScripter IDE (if you have your own GPU, you can continue running the code in PyScripter):

Plot of discriminator and adversarial losses:

To help interpret this plot, here is a quote from Reference [3]: “GAN convergence is hard to identify. As the generator improves with training, the discriminator performance gets worse because the discriminator can’t easily tell the difference between real and fake. If the generator succeeds perfectly, then the discriminator has a 50% accuracy. In effect, the discriminator flips a coin to make its prediction.

This progression poses a problem for convergence of the GAN as a whole: the discriminator feedback gets less meaningful over time. If the GAN continues training past the point when the discriminator is giving completely random feedback, then the generator starts to train on junk feedback, and its quality may collapse.”
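A useful reference point when reading the loss plot: with binary cross-entropy, a discriminator that outputs 0.5 for every image (the coin flip described in the quote) has a loss of ln 2 ≈ 0.693, while a confident, mostly correct discriminator scores much lower. The NumPy sketch below computes both cases with a hypothetical `binary_crossentropy` helper. It is a rough guide rather than a convergence test, and the small label noise added in the training code shifts the real numbers slightly.

```python
import numpy as np


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped away from 0 and 1 to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))


# 16 generated images (label 1) and 16 real images (label 0), as in one discriminator batch
y_true = np.concatenate([np.ones(16), np.zeros(16)])

coin_flip = np.full_like(y_true, 0.5)        # discriminator cannot tell real from fake
confident = np.where(y_true == 1, 0.9, 0.1)  # discriminator is mostly right

print(round(binary_crossentropy(y_true, coin_flip), 4))  # 0.6931 (= ln 2)
print(round(binary_crossentropy(y_true, confident), 4))  # 0.1054
```

So a discriminator loss that drifts toward roughly 0.69 and stays there is consistent with the coin-flip regime, where the generator's feedback becomes less informative.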

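After 300 “epochs” (here, each epoch is one saved snapshot, written every 50 training iterations, so 300 × 50 = 15,000 iterations in total), the code produces the following realistic human faces: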

On Kaggle /output:

On my computer:

Image output for the first epoch (0) vs. the 300th epoch (299):

To see the progression from epoch 0 to epoch 299 more clearly, watch the following video:

Out of curiosity, I continued training the model until epoch 599; the following video shows the results from epoch 300 to 599:
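Each frame in these videos is one saved snapshot: a 6 × 6 mosaic of faces generated from the same 36 fixed control vectors, written every 50 iterations. The sketch below reproduces that tiling with a NumPy reshape/transpose instead of the Python loop in the training code, clips values to [0, 1] before converting to 8-bit (the generator's tanh output can be negative), and maps a snapshot index to its training iteration. `tile_grid`, `to_uint8`, and `snapshot_iteration` are hypothetical helpers, and random 8 × 8 arrays stand in for generator output.

```python
import numpy as np


def tile_grid(batch, grid):
    """Arrange grid*grid images of shape (H, W, C) into one (grid*H, grid*W, C) mosaic.

    Image i lands in row block i % grid and column block i // grid, matching the
    training loop (x_off = i % grid indexes the first axis).
    """
    n, h, w, c = batch.shape
    if n != grid * grid:
        raise ValueError("batch must contain grid * grid images")
    # batch index i = col * grid + row  ->  axes (col, row, h, w, c)
    blocks = batch.reshape(grid, grid, h, w, c)
    # reorder to (row, h, col, w, c) so the blocks interleave into a single image
    return blocks.transpose(1, 2, 0, 3, 4).reshape(grid * h, grid * w, c)


def to_uint8(img):
    """Clip to [0, 1] before scaling so negative values do not wrap around."""
    return np.uint8(np.clip(img, 0.0, 1.0) * 255)


def snapshot_iteration(k, every=50):
    """Iterations completed when generated_k.png is written (step % every == every - 1)."""
    return (k + 1) * every


rng = np.random.default_rng(0)
batch = rng.uniform(-1, 1, size=(36, 8, 8, 3))  # stand-in for generator.predict(control_vectors)
mosaic = to_uint8(tile_grid(batch, 6))
print(mosaic.shape, snapshot_iteration(0), snapshot_iteration(299))  # (48, 48, 3) 50 15000
```

With the 128 × 128 faces used in the article, the same tiling yields a 768 × 768 snapshot.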

Conclusion

GANs are a powerful and versatile class of deep generative models that can produce realistic and diverse data, such as images, text, audio, and video, from random noise. You also learned how GANs can generate realistic and diverse data using text descriptions as demonstrated by DALL-E.

This article has highlighted and demonstrated the potential of deep learning, specifically the GAN architecture, in three areas: numerical mathematics (approximating the plot of a function), generating handwritten digits, and generating realistic human faces, each implemented with hands-on Python examples.

I hope this article has given you a comprehensive and accessible introduction to GANs, along with a solid understanding of the workflow for applying GANs to your own domains and project goals, and that it inspires you to learn more and experiment with GANs yourself.

Check out the full repository here:

github.com/Embarcadero/DL_Python05_GAN

Click here to get started with PyScripter, a free, feature-rich, and lightweight Python IDE.

Download RAD Studio to create more powerful Python GUI Windows Apps in 5x less time.

Check out Python4Delphi, which makes it simple to create Python GUIs for Windows using Delphi.

Also, look into DelphiVCL, which makes it simple to create Windows GUIs with Python.

References & further readings

[1] Biswal, A. (2023). Top 10 Deep Learning Algorithms You Should Know in 2023. Simplilearn. simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-algorithm.

[2] Candido, R. (2021). Generative Adversarial Networks: Build Your First Models. Real Python. realpython.com/generative-adversarial-networks.

[3] Chauhan, N. S. (2021). Generate Realistic Human Face using GAN. Kaggle. kaggle.com/code/nageshsingh/generate-realistic-human-face-using-gan.

[4] Google for Developers. (2024). Generative Adversarial Networks. Advanced courses, machine learning, Google for Developers. developers.google.com/machine-learning/gan.

[5] LeCun, Y. (1998). The MNIST database of handwritten digits. yann.lecun.com/exdb/mnist.

[6] Liu, Z., Luo, P., Wang, X., & Tang, X. (2018). Large-scale CelebFaces Attributes (CelebA) dataset. Retrieved August, 15(2018), 11. mmlab.ie.cuhk.edu.hk/projects/CelebA.html.

[7] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27.

[8] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2020). Generative adversarial networks. Communications of the ACM, 63(11), 139-144.

[9] Ramesh, A., Pavlov, M., Goh, G., Gray, S., Chen, M., Child, R., Misra, V., Mishkin, P., Krueger, G., Agarwal, S., & Sutskever, I. (2015-2023). DALL·E: Creating images from text. DALL-E, OpenAI research. openai.com/research/dall-e.

[10] Sagar, R. (2020). Top Libraries For Quick Implementation Of GANs. Analytics India Magazine. analyticsindiamag.com/generative-adversarial-networks-python-libraries.

[11] Siddhartha. (2023). GPU T4 vs GPU P100 | Kaggle | GPU. Medium. siddhartha01writes.medium.com/gpu-t4-vs-gpu-p100-kaggle-gpu-cd852d56022c.
