A06
Dataset Preparation
- Use torchvision.datasets.VOCDetection to load the
Pascal VOC 2007 or 2012 dataset.
- Extract image-level labels from annotations.
- Convert the dataset into PyTorch tensors.
- Split the dataset into training and validation sets
(e.g., 80% training, 20% validation).
- Resize and normalize the image data.
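
The steps above leave open how several annotated objects in one image become a single image-level label. The sketch below assumes the label is the class of the largest bounding box, which yields the one class index per image that CrossEntropyLoss expects. It also assumes the target has the nested dictionary layout that VOCDetection builds from the XML annotation, with box coordinates stored as strings. The helper names box_area, image_level_label and split_indices are illustrative and not part of torchvision.

```python
import random

# The 20 Pascal VOC object classes, in alphabetical order.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
CLASS_TO_IDX = {name: idx for idx, name in enumerate(VOC_CLASSES)}


def box_area(obj):
    """Area of one annotated object's bounding box (coordinates are strings)."""
    box = obj["bndbox"]
    width = float(box["xmax"]) - float(box["xmin"])
    height = float(box["ymax"]) - float(box["ymin"])
    return width * height


def image_level_label(target):
    """Return the class index of the largest object in the image.

    Assumption: "largest box wins" is one possible labeling rule;
    the assignment text does not prescribe it.
    """
    objects = target["annotation"]["object"]
    if isinstance(objects, dict):  # tolerate a single object stored as a dict
        objects = [objects]
    largest = max(objects, key=box_area)
    return CLASS_TO_IDX[largest["name"]]


def split_indices(n, train_fraction=0.8, seed=0):
    """Shuffle range(n) reproducibly and cut it into train/validation parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(n * train_fraction)
    return indices[:cut], indices[cut:]
```

The two index lists can be passed to torch.utils.data.Subset to build the training and validation sets from one loaded dataset.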
Data Visualization
- Display at least 5 sample images with their bounding
boxes and labels.
- Plot a bar chart showing the frequency of each object
class.
- Plot a pie chart of the top 5 most common classes.
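
One possible way to produce the two charts is sketched below. It assumes the object class names have already been collected into one list per image; count_classes and plot_class_charts are hypothetical helper names, and matplotlib is an assumed plotting library that the assignment does not name.

```python
from collections import Counter


def count_classes(object_names_per_image):
    """Count how often each class name occurs across all images."""
    counts = Counter()
    for names in object_names_per_image:
        counts.update(names)
    return counts


def plot_class_charts(counts, top_k=5):
    """Draw a bar chart of all classes and a pie chart of the top_k classes."""
    # Imported here so that counting works without a plotting backend.
    import matplotlib.pyplot as plt

    fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(14, 5))

    # Bar chart: every class, most frequent first.
    names, values = zip(*counts.most_common())
    ax_bar.bar(names, values)
    ax_bar.set_ylabel("Number of object instances")
    ax_bar.set_title("Class frequency")
    ax_bar.tick_params(axis="x", rotation=90)

    # Pie chart: only the top_k most common classes.
    top_names, top_values = zip(*counts.most_common(top_k))
    ax_pie.pie(top_values, labels=top_names, autopct="%1.1f%%")
    ax_pie.set_title(f"Top {top_k} classes")

    fig.tight_layout()
    return fig
```

Counting object instances rather than images is a choice made here; counting each class at most once per image would be equally defensible.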
Build the Classification Model
- Choose a pre-trained model from torchvision.models
(e.g., ResNet, VGG).
- Replace the final layer with a new fully connected
layer to output predictions for 20 classes.
- Use ReLU activation in hidden layers and have the
model output raw logits; softmax is applied implicitly
by the loss function.
Train the Model
- Define a suitable loss function (e.g.,
CrossEntropyLoss).
- Choose an optimizer (e.g., Adam or SGD).
- Train the model for a fixed number of epochs (e.g.,
10–20 epochs).
- Track and display training and validation loss per
epoch.
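
The training steps could be organized roughly as in the sketch below. It assumes each batch from the data loaders is an (images, labels) pair with one integer class index per image; run_epoch and fit are hypothetical helper names, and Adam with a learning rate of 1e-3 is an illustrative choice rather than a requirement.

```python
import torch
import torch.nn as nn


def run_epoch(model, loader, criterion, optimizer=None, device="cpu"):
    """Run one pass over loader; update weights only if an optimizer is given."""
    training = optimizer is not None
    model.train(training)
    total_loss, total_samples = 0.0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            loss = criterion(model(images), labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            # Weight the batch loss by batch size to get a per-sample mean.
            total_loss += loss.item() * images.size(0)
            total_samples += images.size(0)
    return total_loss / total_samples


def fit(model, train_loader, val_loader, epochs=10, lr=1e-3, device="cpu"):
    """Train for a fixed number of epochs and record both losses per epoch."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = {"train_loss": [], "val_loss": []}
    for epoch in range(epochs):
        train_loss = run_epoch(model, train_loader, criterion, optimizer, device)
        val_loss = run_epoch(model, val_loader, criterion, None, device)
        history["train_loss"].append(train_loss)
        history["val_loss"].append(val_loss)
        print(f"epoch {epoch + 1}/{epochs}  "
              f"train loss {train_loss:.4f}  val loss {val_loss:.4f}")
    return history
```

The returned history dictionary holds one loss value per epoch for each split, which is enough to plot the two loss curves afterwards.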
Evaluate the Model
- Compute accuracy, precision, and recall on the
validation set.
- Display a confusion matrix to visualize
classification performance.
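
To make the metric definitions concrete, the sketch below computes them directly from lists of true and predicted class indices. Macro-averaging precision and recall over classes is an assumption, since the assignment does not state which average to report; the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def classification_metrics(matrix):
    """Accuracy plus macro-averaged precision and recall."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[c][c] for c in range(n))
    precisions, recalls = [], []
    for c in range(n):
        true_pos = matrix[c][c]
        predicted_as_c = sum(matrix[r][c] for r in range(n))  # column sum
        actually_c = sum(matrix[c])                           # row sum
        # Classes never predicted (or never present) score 0 by convention here.
        precisions.append(true_pos / predicted_as_c if predicted_as_c else 0.0)
        recalls.append(true_pos / actually_c if actually_c else 0.0)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
    }
```

For example, with true labels [0, 0, 1, 1] and predictions [0, 1, 1, 1], the matrix is [[1, 1], [0, 2]] and the accuracy is 0.75.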