Handwritten Digit Recognizer (Deep Learning Project)
Project Overview & Use Case
While traditional Machine Learning (for example, models built with scikit-learn) works well for structured, spreadsheet-style data, it often struggles with complex, unstructured data like images, audio, or raw text. This is where Deep Learning steps in.
The Use Case: Imagine you are building software for a post office or a bank. You need a system that can automatically read handwritten zip codes on envelopes or amounts on checks. Writing traditional “if/then” rules to identify human handwriting is impractical because everyone writes differently.
The Output: This script uses TensorFlow to build a multi-layered Artificial Neural Network. It downloads a famous dataset of 70,000 handwritten digits (the MNIST dataset), trains the “brain” to recognize patterns in the pixels, and then tests the AI by having it look at a completely new image and guess what number it is.
System Workflow (How It Works)
Data Loading: The script downloads the MNIST dataset, which consists of 28x28 pixel grayscale images of handwritten numbers (0-9).
Preprocessing (Normalization): Each pixel in a grayscale image has an intensity value ranging from 0 (black) to 255 (white). Neural networks tend to train faster and more reliably when inputs are on a small, consistent scale, so we divide everything by 255 to scale the data between 0.0 and 1.0.
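As a minimal sketch of this scaling step, the snippet below uses a small made-up array in place of a real MNIST image. The name fake_image and its values are illustrative assumptions, not data from the dataset.

```python
import numpy as np

# Hypothetical 2x3 "image" with raw 8-bit intensities
# (a stand-in for a real 28x28 MNIST image)
fake_image = np.array([[0, 51, 102],
                       [153, 204, 255]], dtype=np.uint8)

# Dividing by 255.0 converts the integers to floats and rescales them to 0.0-1.0
normalized = fake_image / 255.0

print(normalized.dtype)                    # float64
print(normalized.min(), normalized.max())  # 0.0 1.0
```

Dividing by the float 255.0 rather than the integer 255 makes the intent explicit: the result should be floating point, not whole numbers.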
Building the Brain: We stack “layers” of digital neurons.
- A Flatten layer unrolls the 28x28 image into a single line of 784 pixels.
- A Dense hidden layer with 128 neurons searches for patterns (like curves or straight lines).
- A Dropout layer randomly switches off 20% of the hidden neurons during training to reduce overfitting.
- An Output layer with 10 neurons gives the final probability for each digit (0 through 9).
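To get a feel for the size of this network, the layer sizes above can be turned into a count of trainable parameters. This is a back-of-the-envelope sketch; dense_params is a hypothetical helper, not a TensorFlow function. Flatten and Dropout layers have no trainable parameters, so only the two Dense layers contribute.

```python
# Each Dense layer has (inputs x neurons) weights plus one bias per neuron.
def dense_params(n_inputs, n_neurons):
    return n_inputs * n_neurons + n_neurons

flattened = 28 * 28                      # 784 pixels after flattening
hidden = dense_params(flattened, 128)    # hidden layer: 100480 parameters
output = dense_params(128, 10)           # output layer: 1290 parameters

print(flattened, hidden, output, hidden + output)  # 784 100480 1290 101770
```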
Training (Fitting): The AI looks at the training images 5 times (5 epochs), making guesses, calculating its mistakes, and updating its internal math to get smarter each time.
Live Test: We grab a random image from the test set, show it to you using Matplotlib, and print the AI’s prediction.
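The prediction step can be sketched with NumPy alone. The snippet below assumes a random array in place of a real test image and a hand-written probability vector in place of real model output (test_image and fake_probs are illustrative values only). It shows why a single image needs an extra "batch" dimension and how the final digit is picked.

```python
import numpy as np

# Stand-in for one normalized 28x28 test image (random values, not real MNIST data)
rng = np.random.default_rng(seed=0)
test_image = rng.random((28, 28))

# Add a leading batch dimension: (28, 28) -> (1, 28, 28)
batch = np.expand_dims(test_image, axis=0)
print(batch.shape)  # (1, 28, 28)

# Hypothetical softmax output for that single image: one probability per digit 0-9
fake_probs = np.array([[0.01, 0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]])

# argmax returns the index of the largest probability, which is the predicted digit
predicted_digit = int(np.argmax(fake_probs[0]))
print(predicted_digit)  # 3
```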
Source Code
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import random

def build_and_train_model():
    """Loads data, builds a neural network, and trains it."""
    print("📥 Downloading and loading the MNIST dataset...")
    # 1. Load the data (downloaded automatically on first use)
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # 2. Preprocess the data (Normalize pixel values to be between 0 and 1)
    print("⚙️ Normalizing image pixels...")
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # 3. Build the Neural Network Architecture
    print("🧠 Building the Neural Network...")
    model = tf.keras.models.Sequential([
        # Layer 1: Flattens the 28x28 image grid into a 1D array of 784 pixels
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        # Layer 2: The "Hidden" layer with 128 artificial neurons.
        # ReLU allows the network to learn non-linear, complex patterns.
        tf.keras.layers.Dense(128, activation='relu'),
        # Layer 3: Reduces "overfitting" by randomly turning off 20% of neurons during training
        tf.keras.layers.Dropout(0.2),
        # Layer 4: The Output layer. 10 neurons representing digits 0-9.
        # Softmax turns the output into percentages (probabilities).
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    # 4. Compile the model (Give it a learning strategy)
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # 5. Train the model
    print("\n🏋️ Training the AI (This may take a minute)...")
    # Epochs = how many times the AI loops through the entire training set
    model.fit(x_train, y_train, epochs=5)

    # 6. Evaluate on the held-out test set
    print("\n📊 Evaluating AI on data it has never seen before...")
    test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
    print(f"🔹 Final Accuracy: {test_acc * 100:.2f}%")
    return model, x_test, y_test

def test_random_image(model, x_test, y_test):
    """Picks a random test image, asks the AI to guess it, and displays it."""
    print("\n🔮 Testing a random handwritten digit...")
    # Pick a random image index
    image_index = random.randint(0, len(x_test) - 1)
    # Isolate the image and the true answer
    test_image = x_test[image_index]
    true_label = y_test[image_index]

    # Neural networks expect a "batch" of images, so we add a batch dimension
    # to our single image: shape (28, 28) becomes (1, 28, 28)
    prediction_batch = model.predict(np.expand_dims(test_image, axis=0))
    # np.argmax finds the neuron with the highest probability (the AI's final answer)
    ai_guess = np.argmax(prediction_batch[0])

    print("-" * 40)
    print(f"📝 The actual handwritten number is: {true_label}")
    print(f"🤖 The AI guessed: {ai_guess}")
    if true_label == ai_guess:
        print("✅ The AI was CORRECT!")
    else:
        print("❌ The AI was WRONG.")
    print("-" * 40)

    # Display the image visually using Matplotlib
    plt.imshow(test_image, cmap='gray')
    plt.title(f"AI Guess: {ai_guess} | Actual: {true_label}", fontsize=14, fontweight='bold')
    plt.axis('off')  # Hide axes
    print("Close the image window to exit the script.")
    plt.show()

if __name__ == "__main__":
    print("=== Deep Learning Digit Recognizer ===")
    # Train the neural network
    trained_model, testing_images, testing_labels = build_and_train_model()
    # Test it visually
    test_random_image(trained_model, testing_images, testing_labels)
Code Explanation (TensorFlow/Deep Learning Concepts)
tf.keras.models.Sequential: This is the easiest way to build a neural network in TensorFlow. It allows you to stack layers exactly like a multi-layered cake. The data flows sequentially from the top layer to the bottom layer.
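The idea of data flowing through stacked layers can be illustrated with a toy class. This is only a conceptual sketch: TinySequential is a hypothetical name and is not how Keras implements Sequential internally.

```python
class TinySequential:
    """Hypothetical stand-in for a sequential model: applies layers in order."""

    def __init__(self, layers):
        self.layers = list(layers)

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)  # the output of one layer becomes the input of the next
        return x


# Two toy "layers": add 1, then double
model = TinySequential([lambda x: x + 1, lambda x: x * 2])
print(model(3))  # (3 + 1) * 2 = 8
```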
Dense Layer: This is a standard, fully-connected neural network layer. A Dense(128) layer contains 128 individual mathematical “neurons”, each of which receives every output of the layer before it, multiplies those inputs by its own learned weights, and adds a learned bias.
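A rough NumPy sketch of what a single Dense(128) layer computes for one flattened image is shown below. The input and weights are random stand-ins (a trained model would have learned weights), and the names x, W, and b are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

x = rng.random(784)                     # one flattened image (random stand-in)
W = rng.normal(0.0, 0.05, (784, 128))   # one weight per (input, neuron) pair
b = np.zeros(128)                       # one bias per neuron

# Each neuron computes a weighted sum of all 784 inputs plus its bias,
# then ReLU clips negative results to zero
hidden = np.maximum(0.0, x @ W + b)

print(hidden.shape)  # (128,)
```

"Fully connected" shows up in the shape of W: every one of the 784 inputs has its own weight for every one of the 128 neurons.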
Activation Functions (relu and softmax): Without activation functions, stacked layers collapse into a single linear transformation, so the network is just a giant linear algebra calculator and can’t learn complex shapes.
ReLU (Rectified Linear Unit) helps the network learn complex, non-linear patterns (like loops in an ‘8’ or straight lines in a ‘7’).
Softmax is used at the very end to squash the final numbers into a probability distribution that adds up to 100% (e.g., “I am 95% sure this is a 3, and 5% sure it is an 8”).
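Both activation functions are short enough to write out by hand. The following is a plain NumPy sketch for illustration, not the TensorFlow implementation; the input values are made up.

```python
import numpy as np

def relu(z):
    # Keep positive values, replace negative values with zero
    return np.maximum(0.0, z)

def softmax(z):
    # Subtracting the maximum avoids overflow in exp() without changing the result
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / exps.sum()

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]

probs = softmax(np.array([1.0, 2.0, 3.0]))
print(probs)        # three probabilities, largest for the largest input
print(probs.sum())  # approximately 1.0
```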
The adam Optimizer: If the loss function tells the AI how wrong it is, the optimizer is the engine that actually adjusts the AI’s internal math to be less wrong on the next try. Adam is one of the most widely used optimizers in Deep Learning and a common default choice because it usually works well without much tuning.
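To make the optimizer less of a black box, here is a minimal sketch of the Adam update rule applied to a single number instead of a whole network. The toy loss (w - 3)^2, the learning rate of 0.1, and the step count are assumptions chosen so the example converges quickly; Keras uses a smaller default learning rate (0.001), and adam_minimize is a hypothetical helper name.

```python
import math

def adam_minimize(grad_fn, w, steps=500, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m, v = 0.0, 0.0  # running averages of the gradient and the squared gradient
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g          # smoothed gradient (momentum)
        v = beta2 * v + (1 - beta2) * g * g      # smoothed squared gradient
        m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # scaled step toward lower loss
    return w

# Toy loss (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
print(w_final)  # close to 3
```

In the real model the same kind of update is applied to every weight and bias, with gradients computed from the loss on each batch of images.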
Epochs (epochs=5): Imagine reading a textbook to study for a test. If you read it once (1 epoch), you might remember 60%. If you read it 5 times (5 epochs), you’ll likely remember 95%. These percentages are only an analogy; the model’s real accuracy is printed during training. This setting tells the AI to review the 60,000 training images 5 times over.
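The number of weight updates behind those 5 epochs can be worked out directly. This small calculation assumes the script leaves batch_size unset, in which case Keras processes 32 images per update by default.

```python
import math

train_images = 60_000
batch_size = 32   # Keras default for model.fit when batch_size is not given
epochs = 5

# One "step" is one batch of images followed by one weight update
steps_per_epoch = math.ceil(train_images / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)  # 1875 9375
```

This is why the training progress bar counts up to 1875 in each epoch rather than 60,000.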
Execution Guide
Install Requirements: Open your terminal or command prompt. TensorFlow is a massive library, so this download might take a few minutes: pip install tensorflow matplotlib numpy
Save the file: Create a new Python file named digit_recognizer.py and paste the provided code.
Run the script: Execute the script from your terminal: python digit_recognizer.py
Review Output: You will see a progress bar appear in your terminal as the AI trains itself over 5 epochs. Once it finishes, a pop-up window will appear showing you an actual low-resolution handwritten digit, along with the AI’s prediction!