UTA027: Artificial Intelligence

A06

Dataset Preparation

Use torchvision.datasets.VOCDetection to load the Pascal VOC 2007 or 2012 dataset.
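Extract image-level labels from the object annotations (each annotated object carries a class name, and one image may contain objects of several classes).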
Convert the dataset into PyTorch tensors.
Split the dataset into training and validation sets (e.g., 80% training, 20% validation).
Resize and normalize the image data.
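
The preparation steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution: the helper names (`multi_hot_label`, `primary_label`, `make_loaders`) are hypothetical, the 224x224 size and ImageNet normalization statistics are a common choice for pre-trained torchvision models rather than something the assignment specifies, and reducing each image to the class of its largest bounding box is just one possible way to obtain a single label per image.

```python
# The 20 Pascal VOC object classes, in the conventional alphabetical order.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
CLASS_TO_IDX = {name: i for i, name in enumerate(VOC_CLASSES)}


def _objects(target):
    """Return the object entries of a VOCDetection target dict as a list."""
    objs = target["annotation"].get("object", [])
    return objs if isinstance(objs, list) else [objs]


def multi_hot_label(target):
    """20-element 0/1 list marking every class present in the image."""
    label = [0] * len(VOC_CLASSES)
    for obj in _objects(target):
        label[CLASS_TO_IDX[obj["name"]]] = 1
    return label


def primary_label(target):
    """Class index of the largest bounding box (an assumed convention)."""
    def area(obj):
        b = obj["bndbox"]  # coordinates are stored as strings in the parsed XML
        width = float(b["xmax"]) - float(b["xmin"])
        height = float(b["ymax"]) - float(b["ymin"])
        return width * height
    return CLASS_TO_IDX[max(_objects(target), key=area)["name"]]


def make_loaders(root, year="2007", val_fraction=0.2, batch_size=32,
                 download=False):
    """Build 80/20 train/validation DataLoaders (hypothetical helper)."""
    # Imported lazily so the label helpers above work without PyTorch.
    import torch
    from torch.utils.data import DataLoader, random_split
    from torchvision import datasets, transforms

    tfm = transforms.Compose([
        transforms.Resize((224, 224)),      # assumed input size
        transforms.ToTensor(),              # PIL image -> float tensor in [0, 1]
        transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                             std=[0.229, 0.224, 0.225]),
    ])
    dataset = datasets.VOCDetection(
        root, year=year, image_set="trainval", download=download,
        transform=tfm, target_transform=primary_label,
    )
    n_val = int(len(dataset) * val_fraction)
    train_set, val_set = random_split(
        dataset, [len(dataset) - n_val, n_val],
        generator=torch.Generator().manual_seed(0),  # reproducible split
    )
    return (DataLoader(train_set, batch_size=batch_size, shuffle=True),
            DataLoader(val_set, batch_size=batch_size))
```

If a multi-label formulation is preferred, `multi_hot_label` can be passed as `target_transform` instead, in which case the loss function would need to change accordingly.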

Data Visualization

Display at least 5 sample images with their bounding boxes and labels.
Plot a bar chart showing the frequency of each object class.
Plot a pie chart of the top 5 most common classes.
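
One way to approach the visualization tasks is sketched below. The function names are hypothetical, the counts are per annotated object instance (counting images per class would be an equally valid reading of "frequency"), and the plotting helpers assume matplotlib is installed. Bounding boxes refer to the original image size, so `show_sample` assumes the image comes from a dataset loaded without the resize transform.

```python
from collections import Counter


def _objects(target):
    """Return the object entries of a VOCDetection target dict as a list."""
    objs = target["annotation"].get("object", [])
    return objs if isinstance(objs, list) else [objs]


def count_classes(targets):
    """Count annotated object instances per class name."""
    counts = Counter()
    for target in targets:
        counts.update(obj["name"] for obj in _objects(target))
    return counts


def show_sample(ax, image, target):
    """Draw one un-resized image with its bounding boxes and class names."""
    from matplotlib.patches import Rectangle

    ax.imshow(image)
    for obj in _objects(target):
        b = obj["bndbox"]
        x0, y0 = float(b["xmin"]), float(b["ymin"])
        width, height = float(b["xmax"]) - x0, float(b["ymax"]) - y0
        ax.add_patch(Rectangle((x0, y0), width, height,
                               fill=False, edgecolor="red", linewidth=2))
        ax.text(x0, y0, obj["name"], color="white",
                backgroundcolor="red", fontsize=8)
    ax.axis("off")


def plot_class_distribution(counts, top_k=5):
    """Bar chart of all class frequencies plus a pie chart of the top_k."""
    import matplotlib.pyplot as plt

    ordered = counts.most_common()
    fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(14, 5))
    ax_bar.bar([name for name, _ in ordered], [n for _, n in ordered])
    ax_bar.set_xlabel("Class")
    ax_bar.set_ylabel("Object instances")
    ax_bar.tick_params(axis="x", rotation=90)
    top = ordered[:top_k]
    ax_pie.pie([n for _, n in top], labels=[name for name, _ in top],
               autopct="%1.1f%%")
    ax_pie.set_title(f"Top {top_k} classes")
    fig.tight_layout()
    return fig
```

For the five sample images, a row of axes from `plt.subplots(1, 5)` can be filled by calling `show_sample` once per `(image, target)` pair.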

Build the Classification Model

Choose a pre-trained model from torchvision.models (e.g., ResNet, VGG).
Replace the final layer with a new fully connected layer to output predictions for 20 classes.
Use ReLU activation in any hidden layers you add. Do not add an explicit softmax to the output: the model should return raw logits, because CrossEntropyLoss applies log-softmax internally.

Train the Model

Define a suitable loss function (e.g., CrossEntropyLoss).
Choose an optimizer (e.g., Adam or SGD).
Train the model for a fixed number of epochs (e.g., 10–20 epochs).
Track and display training and validation loss per epoch.
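
The training steps above might look like the following sketch. It assumes each batch yields `(images, labels)` with integer class labels, which is what `CrossEntropyLoss` expects; the function names `run_epoch` and `fit`, the Adam learning rate, and the default epoch count are illustrative choices.

```python
import torch
from torch import nn


def run_epoch(model, loader, criterion, optimizer=None, device="cpu"):
    """Run one pass over loader; train if an optimizer is given, else evaluate."""
    training = optimizer is not None
    model.train(training)
    total_loss, n_samples = 0.0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            loss = criterion(model(images), labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            # Weight by batch size so the result is a per-sample average.
            total_loss += loss.item() * images.size(0)
            n_samples += images.size(0)
    return total_loss / n_samples


def fit(model, train_loader, val_loader, epochs=10, lr=1e-3, device="cpu"):
    """Train for a fixed number of epochs and record the loss per epoch."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=lr)
    history = {"train_loss": [], "val_loss": []}
    for epoch in range(1, epochs + 1):
        train_loss = run_epoch(model, train_loader, criterion, optimizer, device)
        val_loss = run_epoch(model, val_loader, criterion, None, device)
        history["train_loss"].append(train_loss)
        history["val_loss"].append(val_loss)
        print(f"epoch {epoch:2d}  train loss {train_loss:.4f}  "
              f"val loss {val_loss:.4f}")
    return history
```

The returned `history` dictionary can be plotted directly to show the training and validation loss curves side by side.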

Evaluate the Model

Compute accuracy, precision, and recall on the validation set (report precision and recall per class or as an average over classes).
Display a confusion matrix to visualize classification performance.
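
The evaluation metrics can be derived from the confusion matrix itself, as in the dependency-free sketch below. It assumes single-label predictions given as integer class indices, and it reports macro-averaged precision and recall (an unweighted mean over classes), which is one of several reasonable averaging choices. The function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts samples whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def classification_metrics(cm):
    """Accuracy plus per-class and macro-averaged precision and recall."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precision, recall = [], []
    for i in range(n):
        true_pos = cm[i][i]
        predicted_i = sum(cm[r][i] for r in range(n))  # column sum
        actual_i = sum(cm[i])                          # row sum
        # Classes with no predictions (or no samples) score 0 by convention here.
        precision.append(true_pos / predicted_i if predicted_i else 0.0)
        recall.append(true_pos / actual_i if actual_i else 0.0)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "macro_precision": sum(precision) / n,
        "macro_recall": sum(recall) / n,
    }
```

To use it, collect the true labels and `logits.argmax(dim=1)` predictions over the validation loader into two lists and pass them to `confusion_matrix` with `num_classes=20`. The resulting matrix can be displayed with matplotlib's `imshow`, with true classes on the rows and predicted classes on the columns.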