
Cover2Catalog
Camden Alpert
cra200002
Henry Jones
hsj200000
Rishabh Medhi
rxm200047
Michael Nuyda
man200004
Abstract
This report presents a computer vision application aimed
at simplifying the tedious process of CD cataloging. Our
approach utilizes real-time object detection with a YOLOv8
model to identify and capture snapshots of CDs based on
their orientation (Front, Back, or Side) once a confidence
threshold is met. Subsequent steps involve extracting usable
information from the detected CD: barcode scanning for the
back and OCR-based catalog number recognition for the
side. This extracted information is then cross-referenced
with a music database to provide detailed metadata, with
the option to add entries to a collection using an exter-
nal API. While the application successfully achieved most
objectives, including object detection and information re-
trieval, integration with a collection-tracking API remains
incomplete. Future work includes refining the model with
an expanded dataset, incorporating external data, and
developing a user interface with collection management
capabilities. This work demonstrates the potential of computer
vision techniques to streamline CD cataloging workflows.
1. Keywords
YOLOv8, PaddleOCR, pyzbar, Label Studio, OpenCV,
Discogs, MusicBrainz, ultralytics
2. Introduction
CD cataloging can be a time-consuming and labor-intensive
process, especially for large collections. Whether
a store has just received a new shipment of albums or a radio
station is reorganizing its music archive, the first hurdle is
the tedium of manually looking up every single release.
This report explores
the application of computer vision techniques to streamline
and simplify this task. By leveraging real-time object detec-
tion through the YOLOv8 model, our approach automates
key aspects of CD cataloging, including identifying CD ori-
entation and extracting metadata from barcodes and catalog
numbers. The ultimate goal is to reduce the manual effort
required while enabling integration with music databases
for efficient collection management. This report outlines
the methods, results, and recommendations for further de-
velopment to achieve a more comprehensive solution.
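The snapshot trigger and the orientation-based routing described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the threshold value, the `Detection` type, and the function names are all hypothetical, and the report names no extraction step for the front cover, so that orientation maps to no step here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical value; the report does not state the threshold used.
CONFIDENCE_THRESHOLD = 0.8

# Extraction step per orientation. Back and Side follow the report;
# no step is described for Front, so it maps to None (an assumption).
EXTRACTION_STEP = {
    "Back": "barcode",
    "Side": "ocr_catalog_number",
    "Front": None,
}


@dataclass
class Detection:
    label: str         # "Front", "Back", or "Side"
    confidence: float  # detector score in [0, 1]


def select_snapshot(
    detections: List[Detection],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Optional[Detection]:
    """Return the highest-confidence detection at or above the threshold."""
    eligible = [d for d in detections if d.confidence >= threshold]
    return max(eligible, key=lambda d: d.confidence, default=None)


def next_step(detection: Detection) -> Optional[str]:
    """Look up which extraction step applies to the detected orientation."""
    return EXTRACTION_STEP.get(detection.label)


# Example frame with three candidate detections.
frame = [Detection("Front", 0.55), Detection("Back", 0.91), Detection("Side", 0.83)]
best = select_snapshot(frame)
if best is not None:
    print(best.label, "->", next_step(best))
```

In a real pipeline the detections would come from the object detector's per-frame output, and each step name would dispatch to the barcode reader or the OCR routine.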
3. Related Work
The application of computer vision techniques for object
detection and metadata extraction has been explored in var-
ious domains, including inventory management, library cat-
aloging, product identification, and even face identification.
Real-time object detection models, particularly YOLO (You
Only Look Once) architectures, have been widely adopted
for their speed and accuracy in detecting and classifying ob-
jects. The YOLOv8 model, which we have used, builds
upon previous iterations, and its efficiency makes it suitable
for real-time applications.
Optical character recognition (OCR) has also seen sig-
nificant advancements over the years, with applications
ranging from digitizing text in scanned documents to ex-
tracting product identifiers. Various OCR tools and custom
deep learning-based OCR models have been leveraged to
recognize alphanumeric text, such as catalog numbers, un-
der a variety of conditions. Barcode scanning, also a well-
established technology, has similarly been enhanced by in-
tegrating computer vision to improve reliability and perfor-
mance in real-world scenarios.
Our work builds on these advancements by combining
real-time object detection, OCR, and barcode scanning into
a single pipeline tailored to CD cataloging. While prior
research often focuses on specific components, such as im-
proving object detection or OCR accuracy, this project aims
to integrate these components into a practical application.
Additionally, the incorporation of music database APIs to
retrieve and manage metadata bridges the gap between com-
puter vision techniques and collection management sys-
tems, while creating a streamlined user experience.
4. Implementation
4.1. System Overview
The application streamlines CD cataloging by using
computer vision techniques to automate object detection
and information retrieval. The workflow begins with identifying the orienta-