Training a simple neural network without specialized libraries · gpoleszuk

This weekend I took a few minutes to settle a question about what the model of an artificial neural network (ANN) looks like after training. I chose not to use specialized libraries so that I could understand the mechanics of training and inference.

Even without much searching, there is already material on the subject here on Tabnews. One video caught my attention, though: it walks through an implementation of training and inference in Python 3.x, using the classic data from The MNIST database of handwritten digits for the experiment. The repository can be found here.

Example of how a character is stored in blocks of 28 x 28 bytes

File: mnist.npz (11490434 bytes)
MD5    : 8a61469f7ea1b51cbae51d4f78837e45
fortran_order: False
shape: (10000, 28, 28)
gzip -d --stdout mnist.npz | dd bs=1 skip=80 | xxd -ps -u -c 28 | sed 's/00/ ./g' | head -28
 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . .54B99F973C24 . . . . . . . . . . . . . . . .
 . . . . . .DEFEFEFEFEF1C6C6C6C6C6C6C6C6AA34 . . . . . .
 . . . . . .43724872A3E3FEE1FEFEFEFAE5FEFE8C . . . . . .
 . . . . . . . . . . .11420E4343433B15ECFE6A . . . . . .
 . . . . . . . . . . . . . . . . . .53FDD112 . . . . . .
 . . . . . . . . . . . . . . . . .16E9FF53 . . . . . . .
 . . . . . . . . . . . . . . . . .81FEEE2C . . . . . . .
 . . . . . . . . . . . . . . . .3BF9FE3E . . . . . . . .
 . . . . . . . . . . . . . . . .85FEBB05 . . . . . . . .
 . . . . . . . . . . . . . . .09CDF83A . . . . . . . . .
 . . . . . . . . . . . . . . .7EFEB6 . . . . . . . . . .
 . . . . . . . . . . . . . .4BFBF039 . . . . . . . . . .
 . . . . . . . . . . . . .13DDFEA6 . . . . . . . . . . .
 . . . . . . . . . . . .03CBFEDB23 . . . . . . . . . . .
 . . . . . . . . . . . .26FEFE4D . . . . . . . . . . . .
 . . . . . . . . . . .1FE0FE7301 . . . . . . . . . . . .
 . . . . . . . . . . .85FEFE34 . . . . . . . . . . . . .
 . . . . . . . . . .3DF2FEFE34 . . . . . . . . . . . . .
 . . . . . . . . . .79FEFEDB28 . . . . . . . . . . . . .
 . . . . . . . . . .79FECF12 . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . . . . . . . . . . . . . . .


Bytes "00" were replaced with " ." to make the digit easier to see.
Each byte "xx" is assigned, in sequence, to one of the 784 inputs of the ANN.
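
The dump above can be reproduced in Python, as a rough sketch. The `render_hex` helper and the synthetic 28 x 28 image are illustrative assumptions, not part of the original code; a real image would come from mnist.npz.

```python
import numpy as np


def render_hex(image):
    """Render a 2-D uint8 image as hex rows, showing zero bytes as ' .'."""
    rows = []
    for row in image:
        rows.append("".join(" ." if b == 0 else f"{int(b):02X}" for b in row))
    return "\n".join(rows)


# Synthetic stand-in for one MNIST digit (assumption: real data comes from mnist.npz).
image = np.zeros((28, 28), dtype=np.uint8)
image[7, 6:9] = [0x54, 0xB9, 0x9F]  # three non-zero pixels on row 7

print(render_hex(image))

# Row-major flattening: pixel (row, col) becomes input number row * 28 + col.
inputs = image.reshape(-1).astype("float32") / 255
print(inputs.shape)
```

Dividing by 255 mirrors the scaling done in data.py, so each of the 784 inputs ends up in the range 0 to 1.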

Problem

After modifying the code and running some tests, I finally wanted to see the parameters produced by training. I tried to export the variables to a human-readable TXT file, ideally structured so that it could be read back or even, in the near future, used to automatically generate an SVG (Scalable Vector Graphics) drawing of the whole network annotated with its weights and biases. To my surprise, the code below produced a truncated TXT file. I noticed that this happens for the very large matrices.

def save_params(weights, biases):
    """Save weights and biases to a text file."""
    with open(PARAMS_FILE, "w") as f:
        f.write("Weights and Biases:\n")
        for i, (w, b) in enumerate(zip(weights, biases)):
            f.write(f"Layer {i+1} Weights:\n{w}\n")
            f.write(f"Layer {i+1} Biases:\n{b}\n")
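
A likely cause, sketched below: formatting an array inside an f-string goes through NumPy's string conversion, which by default summarizes any array with more than 1000 elements using "...". The array sizes mirror the network's layers; the variable names are illustrative only.

```python
import numpy as np

w_hidden = np.zeros((20, 784))  # 15680 elements, like the input-to-hidden weights
w_output = np.zeros((10, 20))   # 200 elements, like the hidden-to-output weights

# Apply NumPy's documented default threshold (1000) only inside this block.
with np.printoptions(threshold=1000):
    hidden_text = f"{w_hidden}"
    output_text = f"{w_output}"

print("..." in hidden_text)  # the large matrix is summarized
print("..." in output_text)  # the small matrix is written in full
```

This would explain why only the larger weight matrix came out truncated while the smaller arrays were written completely.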

Question

I wrote this post hoping for a tip from the experts here on Tabnews: is there a native Python function that extracts every parameter from a multidimensional variable and writes all the values (weights, biases, parameters) to a single TXT file? If so, please share it in the comments.

In the meantime, to preserve the training results, I replaced the code above with a version that at least saves all post-training parameters to a native binary file. That code also changes how the parameters are initialized: if parameters from a previous training run exist, they are reloaded as the starting point, which speeds up retraining.

Edit (postscript): After trying to reinvent the wheel, I ended up finding the answer. A simple setting disables the truncation that NumPy applies when printing a large array in Python 3.x.

import numpy as np
np.set_printoptions(threshold=np.inf)

Application

"""Save weights and biases to a text file."""
output_file = "network_params_full.txt"
np.set_printoptions(threshold=np.inf)  # Disable print truncation (np.savetxt itself never truncates)
with open(output_file, "w") as f:
    for i, (weight_matrix, bias_vector) in enumerate(zip(weights, biases)):
        f.write(f"Layer {i+1} Weights:\n")
        np.savetxt(f, weight_matrix, fmt="%.6f")
        f.write(f"Layer {i+1} Biases:\n")
        np.savetxt(f, bias_vector, fmt="%.6f")
print(f"Weights and biases exported to {output_file}.")
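
Since the goal was a file that could also be read back, here is one possible way to parse it, as a sketch. It assumes exactly the layout written above (a "Layer N Weights:" or "Layer N Biases:" header followed by the rows from np.savetxt); `load_params_txt` and the demo file name are hypothetical.

```python
import io
import os
import tempfile

import numpy as np


def load_params_txt(path):
    """Parse 'Layer N Weights:/Biases:' headers followed by np.savetxt rows."""
    blocks = []  # list of (kind, lines) in file order
    with open(path) as f:
        for line in f:
            if line.startswith("Layer "):
                kind = "weights" if "Weights" in line else "biases"
                blocks.append((kind, []))
            elif blocks:
                blocks[-1][1].append(line)

    weights, biases = [], []
    for kind, rows in blocks:
        # ndmin=2 keeps single-column bias vectors as (N, 1) arrays.
        array = np.loadtxt(io.StringIO("".join(rows)), ndmin=2)
        (weights if kind == "weights" else biases).append(array)
    return weights, biases


# Round trip with random stand-in parameters of the same shapes as the network.
rng = np.random.default_rng(0)
weights = [rng.uniform(-0.5, 0.5, (20, 784)), rng.uniform(-0.5, 0.5, (10, 20))]
biases = [np.zeros((20, 1)), np.zeros((10, 1))]

demo_file = os.path.join(tempfile.mkdtemp(), "network_params_demo.txt")
with open(demo_file, "w") as f:
    for i, (w, b) in enumerate(zip(weights, biases)):
        f.write(f"Layer {i+1} Weights:\n")
        np.savetxt(f, w, fmt="%.6f")
        f.write(f"Layer {i+1} Biases:\n")
        np.savetxt(f, b, fmt="%.6f")

loaded_w, loaded_b = load_params_txt(demo_file)
print([w.shape for w in loaded_w], [b.shape for b in loaded_b])
```

Because the values are written with six decimal places, the reloaded arrays match the originals only to that precision.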

Source code

Since I had already created the post, I decided not to delete it in case others run into the same need. More details of the original code (without the TXT export) follow below.

Dependencies

matplotlib.pyplot
numpy
os
pathlib
MNIST data set: can be downloaded from the repository.

Folder structure

Assuming the dependencies are installed, running the code only requires creating the folder structure below with the corresponding files, then running python3 nn2.py in the directory that contains the script.

Folder structure
.
├── ./
├── ./data
│   └── ./mnist.npz
├── ./data.py
├── ./nn2.py
├── ./LICENSE
├── ./README.md

Listings

File: nn2.py

import os
import numpy as np
import matplotlib.pyplot as plt
from data import get_mnist

# File to store network parameters
PARAMS_FILE = "network_params.npz"

# Network structure
INPUT_SIZE = 784
HIDDEN_SIZE = 20
OUTPUT_SIZE = 10
LEARN_RATE = 0.005
EPOCHS = 5


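def save_params(weights, biases):
    """Save weights and biases to a binary file."""
    # The layers have different shapes, so each array is stored under its own
    # key (w0, b0, w1, b1) instead of packing the lists into one ragged
    # array, which recent NumPy versions refuse to build.
    arrays = {f"w{i}": w for i, w in enumerate(weights)}
    arrays.update({f"b{i}": b for i, b in enumerate(biases)})
    np.savez(PARAMS_FILE, **arrays)
    print("Network parameters saved successfully.")


def load_params():
    """Load weights and biases from a binary file."""
    if not os.path.exists(PARAMS_FILE):
        return None, None
    with np.load(PARAMS_FILE) as data:
        n_layers = len(data.files) // 2
        weights = [data[f"w{i}"] for i in range(n_layers)]
        biases = [data[f"b{i}"] for i in range(n_layers)]
    print("Network parameters loaded successfully.")
    return weights, biases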


# Numerically stable sigmoid: np.where evaluates both branches, so np.exp is
# only ever applied to non-positive values, which cannot overflow.
def sigmoid(x):
    z = np.exp(-np.abs(x))
    return np.where(x >= 0, 1 / (1 + z), z / (1 + z))


# https://365datascience.com/tutorials/machine-learning-tutorials/what-is-xavier-initialization
# two types of simple initialization: random initialization and normal (naïve) initialization
# ToDo: Weight Initialization: Xavier Initialization: For weights, use the
# following formula to prevent gradients from exploding or vanishing:
# w_i_h = np.random.uniform(-np.sqrt(6/(HIDDEN_SIZE+INPUT_SIZE)), \
#                            np.sqrt(6/(HIDDEN_SIZE+INPUT_SIZE)), (HIDDEN_SIZE, INPUT_SIZE))
# w_h_o = np.random.uniform(-np.sqrt(6/(OUTPUT_SIZE+HIDDEN_SIZE)), \
#                            np.sqrt(6/(OUTPUT_SIZE+HIDDEN_SIZE)), (OUTPUT_SIZE, HIDDEN_SIZE))
    
# Initialize or load network parameters
weights, biases = load_params()
if weights is None or biases is None:
    weights = [
        np.random.uniform(-0.5, 0.5, (HIDDEN_SIZE, INPUT_SIZE)),
        np.random.uniform(-0.5, 0.5, (OUTPUT_SIZE, HIDDEN_SIZE)),
    ]
    biases = [
        np.zeros((HIDDEN_SIZE, 1)),
        np.zeros((OUTPUT_SIZE, 1)),
    ]

images, labels = get_mnist()

print(f"Network details:")
print(f"- Input size: {INPUT_SIZE}")
print(f"- Hidden layer neurons: {HIDDEN_SIZE}")
print(f"- Output size: {OUTPUT_SIZE}")
print(f"- Learning rate: {LEARN_RATE}")
print(f"- Epochs: {EPOCHS}")

# ToDo: Mini-Batches: Instead of training on one sample at a time, process
# data in batches (e.g., 32 samples per batch). This improves computational
# efficiency and stabilizes training:
# batch_size = 32
# for epoch in range(epochs):
#    for i in range(0, len(images), batch_size):
#        batch_images = images[i:i + batch_size]
#        batch_labels = labels[i:i + batch_size]
#        # Apply forward/backpropagation on batches

for epoch in range(EPOCHS):
    nr_correct = 0
    for img, l in zip(images, labels):
        # ToDo: Optimize matrix multiplications by ensuring shapes are correct and
        # avoid reshaping repeatedly:
        # img = img[:, None]  # Convert to column vector once at the start
        img = img.reshape(-1, 1)
        l = l.reshape(-1, 1)

        # Forward propagation
        h_pre = biases[0] + weights[0] @ img
        h = sigmoid(h_pre)  # Sigmoid activation
        o_pre = biases[1] + weights[1] @ h
        o = sigmoid(o_pre)  # Sigmoid activation

        # Cost / Error calculation
        e = 1 / len(o) * np.sum((o - l) ** 2, axis=0)
        nr_correct += int(np.argmax(o) == np.argmax(l))

        # ToDo: Add Momentum: Incorporate momentum to smooth updates and
        # accelerate convergence:
        # momentum = 0.9
        # velocity_w_h_o = momentum * velocity_w_h_o - learn_rate * delta_o @ h.T
        # w_h_o += velocity_w_h_o
        # velocities[1] = momentum * velocities[1] - learn_rate * (delta_o @ hidden_output.T)
        # weights[1] += velocities[1]
        # biases[1] -= learn_rate * np.sum(delta_o, axis=1, keepdims=True)

        # velocities[0] = momentum * velocities[0] - learn_rate * (delta_h @ batch_images.T)
        # weights[0] += velocities[0]
        # biases[0] -= learn_rate * np.sum(delta_h, axis=1, keepdims=True)
    
        # ToDo: Learning Rate Scheduling: Decrease the learning rate as training
        # progresses to refine convergence:
        # learn_rate = initial_rate / (1 + decay * epoch)

        # ToDo: Eliminate explicit loops for backpropagation by using
        # vectorized calculations. For example:
        # delta_h = (w_h_o.T @ delta_o) * (h * (1 - h))
        # w_i_h -= learn_rate * (delta_h @ img.T)
        # b_i_h -= learn_rate * delta_h
    
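        # Backpropagation
        delta_o = o - l
        # Compute the hidden-layer delta before weights[1] is updated, so the
        # error is propagated through the same weights used in the forward pass.
        delta_h = weights[1].T @ delta_o * (h * (1 - h))
        weights[1] -= LEARN_RATE * delta_o @ h.T
        biases[1] -= LEARN_RATE * delta_o
        weights[0] -= LEARN_RATE * delta_h @ img.T
        biases[0] -= LEARN_RATE * delta_h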

    # ToDo: improve the calculation of loss. Hardcoded to !zero!
    loss = 0.0
    accuracy = round((nr_correct / len(images)) * 100, 2)
    #print(f"Epoch {epoch+1}, Loss: {loss:.4f}, Accuracy: {accuracy:.2f}%")
    print(f"Epoch {epoch+1}, Accuracy: {accuracy:.2f}%")


# Save parameters after training
save_params(weights, biases)

# Inference
while True:
    index = int(input("Enter a number of an image (0 - 59999): "))
    img = images[index]
    plt.imshow(img.reshape(28, 28), cmap="Blues")
    img = img.reshape(-1, 1)

    # Forward propagation (same sigmoid helper as in training)
    h_pre = biases[0] + weights[0] @ img
    h = sigmoid(h_pre)
    o_pre = biases[1] + weights[1] @ h
    o = sigmoid(o_pre)

    plt.title(f"Prediction: {o.argmax()} (Confidence: {o.max():.2f})")
    plt.show()
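
One detail of the backpropagation step in nn2.py is worth spelling out: delta_o = o - l is the gradient, with respect to o_pre, of the cross-entropy loss combined with a sigmoid output. It is not the gradient of the mean squared error computed in e, which would carry an extra o * (1 - o) factor (up to a constant). The finite-difference check below is a sketch of that claim; `cross_entropy` and the random test values are illustrative and do not appear in the original code.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def cross_entropy(o_pre, l):
    """Binary cross-entropy summed over the output neurons."""
    o = sigmoid(o_pre)
    return -np.sum(l * np.log(o) + (1 - l) * np.log(1 - o))


rng = np.random.default_rng(0)
o_pre = rng.normal(size=(10, 1))
l = np.eye(10)[[3]].T  # one-hot label for digit 3, shape (10, 1)

analytic = sigmoid(o_pre) - l  # the delta_o used in the training loop

# Central finite differences, one output neuron at a time.
eps = 1e-6
numeric = np.zeros_like(o_pre)
for k in range(o_pre.shape[0]):
    step = np.zeros_like(o_pre)
    step[k] = eps
    numeric[k] = (cross_entropy(o_pre + step, l) - cross_entropy(o_pre - step, l)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))
```

If the two gradients agree to within numerical noise, a loss reported per epoch would more naturally be the cross-entropy than the squared error.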

File: data.py

import numpy as np
import pathlib

def get_mnist():
    with np.load(f"{pathlib.Path(__file__).parent.absolute()}/data/mnist.npz") as f:
        images, labels = f["x_train"], f["y_train"]
    images = images.astype("float32") / 255
    images = np.reshape(images, (images.shape[0], images.shape[1] * images.shape[2]))
    labels = np.eye(10)[labels]
    return images, labels
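
The line labels = np.eye(10)[labels] in data.py converts integer labels into one-hot rows by indexing the 10 x 10 identity matrix. A tiny illustration, with made-up labels:

```python
import numpy as np

labels = np.array([5, 0, 4])   # made-up digit labels
one_hot = np.eye(10)[labels]   # row k of the identity matrix is the one-hot vector for digit k

print(one_hot.shape)            # (3, 10)
print(one_hot.argmax(axis=1))   # [5 0 4]
```

This is why the training loop compares np.argmax(o) with np.argmax(l): the position of the single 1 in each label row is the digit.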

Purpose

Since the code works well, I wanted to optimize it as a prototype and later rewrite it in C with several adaptations. I think it is a great example for working with some structures that can be explored in that language, as well as for a C versus Python 3.x performance test.

# Last version
import os
import numpy as np
import matplotlib.pyplot as plt
from data import get_mnist

# File to store network parameters
PARAMS_FILE = "network_params.npz"

# Network structure
INPUT_SIZE = 784
HIDDEN_SIZE = 20
OUTPUT_SIZE = 10
LEARN_RATE = 0.005
EPOCHS = 3
BATCH = 1000


# Numerically stable sigmoid: np.where evaluates both branches, so np.exp is
# only ever applied to non-positive values, which cannot overflow.
def sigmoid(x):
    z = np.exp(-np.abs(x))
    return np.where(x >= 0, 1 / (1 + z), z / (1 + z))


def save_init_params(weights, biases):
    """Save the initial weights and biases to a binary file and a text file."""
    # Layers have different shapes, so each array is stored under its own key
    # (w0, b0, w1, b1) instead of packing the lists into one ragged array.
    arrays = {f"w{i}": w for i, w in enumerate(weights)}
    arrays.update({f"b{i}": b for i, b in enumerate(biases)})
    np.savez("network_init_params_full.npz", **arrays)
    print("Network init parameters saved successfully to binary file.")

    # Text export
    output_file = "network_init_params_full.txt"
    np.set_printoptions(threshold=np.inf)  # Disable truncation
    with open(output_file, "w") as f:
        for i, (weight_matrix, bias_vector) in enumerate(zip(weights, biases)):
            f.write(f"Layer {i+1} Weights:\n")
            np.savetxt(f, weight_matrix, fmt="%.6f")
            f.write(f"Layer {i+1} Biases:\n")
            np.savetxt(f, bias_vector, fmt="%.6f")
    print(f"Initial weights and biases exported to {output_file}.")


def save_params(weights, biases):
    """Save weights and biases to a binary file and a text file."""
    arrays = {f"w{i}": w for i, w in enumerate(weights)}
    arrays.update({f"b{i}": b for i, b in enumerate(biases)})
    np.savez(PARAMS_FILE, **arrays)
    print("Network parameters saved successfully to binary file.")

    # Text export
    output_file = "network_params_full.txt"
    np.set_printoptions(threshold=np.inf)  # Disable truncation
    with open(output_file, "w") as f:
        for i, (weight_matrix, bias_vector) in enumerate(zip(weights, biases)):
            f.write(f"Layer {i+1} Weights:\n")
            np.savetxt(f, weight_matrix, fmt="%.6f")
            f.write(f"Layer {i+1} Biases:\n")
            np.savetxt(f, bias_vector, fmt="%.6f")
    print(f"Weights and biases exported to {output_file}.")


def load_params():
    """Load weights and biases from a binary file."""
    if not os.path.exists(PARAMS_FILE):
        return None, None
    with np.load(PARAMS_FILE) as data:
        n_layers = len(data.files) // 2
        weights = [data[f"w{i}"] for i in range(n_layers)]
        biases = [data[f"b{i}"] for i in range(n_layers)]
    print("Network parameters loaded successfully.")
    return weights, biases


# https://365datascience.com/tutorials/machine-learning-tutorials/what-is-xavier-initialization
# two types of simple initialization: random initialization and normal (naïve) initialization
# ToDo: Weight Initialization: Xavier Initialization: For weights, use the
# following formula to prevent gradients from exploding or vanishing:
# w_i_h = np.random.uniform(-np.sqrt(6/(HIDDEN_SIZE+INPUT_SIZE)), \
#                            np.sqrt(6/(HIDDEN_SIZE+INPUT_SIZE)), (HIDDEN_SIZE, INPUT_SIZE))
# w_h_o = np.random.uniform(-np.sqrt(6/(OUTPUT_SIZE+HIDDEN_SIZE)), \
#                            np.sqrt(6/(OUTPUT_SIZE+HIDDEN_SIZE)), (OUTPUT_SIZE, HIDDEN_SIZE))
# Initialize or load network parameters
weights, biases = load_params()
if weights is None or biases is None:
    weights = [
        np.random.uniform(-0.5, 0.5, (HIDDEN_SIZE, INPUT_SIZE)),
        np.random.uniform(-0.5, 0.5, (OUTPUT_SIZE, HIDDEN_SIZE)),
    ]
    biases = [
        np.ones((HIDDEN_SIZE, 1))*0.01,
        np.ones((OUTPUT_SIZE, 1))*0.01,
    ]
    save_init_params(weights, biases)

images, labels = get_mnist()


print(f"Network details:")
print(f"- Input size: {INPUT_SIZE}")
print(f"- Hidden layer neurons: {HIDDEN_SIZE}")
print(f"- Output size: {OUTPUT_SIZE}")
print(f"- Learning rate: {LEARN_RATE}")
print(f"- Epochs: {EPOCHS}")

# Mini-batches: the data is sliced into batches of BATCH samples, but the
# parameters are still updated after every single sample, so the result is
# the same as plain per-sample training. A true mini-batch step would apply
# forward/backpropagation to the whole batch at once.
batch_size = BATCH
for epoch in range(EPOCHS):
    nr_correct = 0
    for i in range(0, len(images), batch_size):
        batch_images = images[i:i + batch_size]
        batch_labels = labels[i:i + batch_size]

        for img, l in zip(batch_images, batch_labels):
            # ToDo: Optimize matrix multiplications by ensuring shapes are correct and
            # avoid reshaping repeatedly:
            # img = img[:, None]  # Convert to column vector once at the start
            img = img.reshape(-1, 1)
            l = l.reshape(-1, 1)

            # Forward propagation
            h_pre = biases[0] + weights[0] @ img
            #h = 1 / (1 + np.exp(-h_pre))  # Sigmoid activation
            h = sigmoid(h_pre)  # Sigmoid activation
            o_pre = biases[1] + weights[1] @ h
            #o = 1 / (1 + np.exp(-o_pre))
            o = sigmoid(o_pre)

            # Cost / Error calculation
            e = 1 / len(o) * np.sum((o - l) ** 2, axis=0)
            nr_correct += int(np.argmax(o) == np.argmax(l))

            # ToDo: Add Momentum:
            # ToDo: Learning Rate Scheduling:
            # ToDo: Eliminate explicit loops

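            # Backpropagation
            delta_o = o - l
            # Compute the hidden-layer delta before weights[1] is updated, so the
            # error is propagated through the same weights used in the forward pass.
            delta_h = weights[1].T @ delta_o * (h * (1 - h))
            weights[1] -= LEARN_RATE * delta_o @ h.T
            biases[1] -= LEARN_RATE * delta_o
            weights[0] -= LEARN_RATE * delta_h @ img.T
            biases[0] -= LEARN_RATE * delta_h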

    # ToDo: improve the calculation of loss. Hardcoded to !zero!
    loss = 0.0
    accuracy = round((nr_correct / len(images)) * 100, 2)
    print(f"Epoch {epoch+1}, Accuracy: {accuracy:.2f}%")


# Save parameters after training
save_params(weights, biases)

# Inference
while True:
    index = int(input("Enter a number of an image (0 - 59999): "))
    img = images[index]
    plt.imshow(img.reshape(28, 28), cmap="Blues")
    img = img.reshape(-1, 1)

    # Forward propagation (same sigmoid helper as in training)
    h_pre = biases[0] + weights[0] @ img
    h = sigmoid(h_pre)
    o_pre = biases[1] + weights[1] @ h
    o = sigmoid(o_pre)

    # Display probabilities in the terminal
    print("\nDigit probabilities:")
    for i, prob in enumerate(o.flatten()):
        print(f"{i}: {prob * 100:.2f}%")

    plt.title(f"Prediction: {o.argmax()} (Confidence: {o.max():.2f})")
    plt.show()
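
For comparison, a true mini-batch update would process the whole batch with matrix operations and apply one averaged update per batch, whereas the listing above still updates the parameters after every sample. The sketch below is one way to do that with the same network layout; `train_batch`, the random stand-in data, and the hyperparameter values are assumptions for illustration, not the author's code.

```python
import numpy as np


def sigmoid(x):
    z = np.exp(-np.abs(x))  # exp of a non-positive number cannot overflow
    return np.where(x >= 0, 1 / (1 + z), z / (1 + z))


def train_batch(weights, biases, batch_images, batch_labels, learn_rate):
    """One gradient step on a batch; rows of batch_images are samples."""
    x = batch_images.T                       # (784, B): one column per sample
    y = batch_labels.T                       # (10, B)
    n = x.shape[1]

    h = sigmoid(biases[0] + weights[0] @ x)  # (hidden, B)
    o = sigmoid(biases[1] + weights[1] @ h)  # (10, B)

    delta_o = o - y
    delta_h = (weights[1].T @ delta_o) * (h * (1 - h))

    # Average the per-sample gradients over the batch.
    weights[1] -= learn_rate * (delta_o @ h.T) / n
    biases[1] -= learn_rate * delta_o.mean(axis=1, keepdims=True)
    weights[0] -= learn_rate * (delta_h @ x.T) / n
    biases[0] -= learn_rate * delta_h.mean(axis=1, keepdims=True)

    return int(np.sum(o.argmax(axis=0) == y.argmax(axis=0)))


# Random stand-in data with the MNIST shapes (assumption: real data comes from get_mnist()).
rng = np.random.default_rng(0)
images = rng.random((64, 784)).astype("float32")
labels = np.eye(10)[rng.integers(0, 10, 64)]

weights = [rng.uniform(-0.5, 0.5, (20, 784)), rng.uniform(-0.5, 0.5, (10, 20))]
biases = [np.zeros((20, 1)), np.zeros((10, 1))]

for i in range(0, len(images), 32):
    correct = train_batch(weights, biases, images[i:i + 32], labels[i:i + 32], 0.05)
    print(f"batch starting at {i}: {correct} correct")
```

Because the gradient is averaged over the batch, a larger learning rate than the per-sample 0.005 is usually needed; the 0.05 used here is only a placeholder.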

[Keywords: artificial neural network, MNIST, scratch, training and inference] // for [index|filter]ing purposes