Abstract:
This thesis examines the use of nonlinear PCA in data analysis, aiming to reduce
information loss during dimensionality reduction while preserving essential data
patterns. The study uses the wine dataset, chosen for its relevance to customer
segmentation. Its 14 variables make it a challenging test case for evaluating
nonlinear PCA on high-dimensional data, and its complexity and practical
significance for the wine industry make the findings relevant beyond a purely
methodological comparison.
Data visualization in Python showed that standard PCA produced overlapping
clusters, whereas Kernel PCA produced well-separated ones, indicating its superior
performance on this dataset. The original data matrix has a high condition number
of 16251.1265, indicating poor conditioning. Standard PCA yielded a condition
number of 1.502, which still points to some residual sensitivity, while Kernel PCA
achieved a lower condition number of 1.42. This suggests that Kernel PCA handles
poorly conditioned datasets better and loses less information.
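A minimal sketch of how condition numbers like those above can be computed in Python, the language the study used for visualization. It assumes the 2-norm condition number (largest singular value divided by the smallest) and assumes the PCA and Kernel PCA figures refer to the matrix of component scores; the abstract specifies neither. The data are synthetic (random columns on deliberately mismatched scales), not the wine dataset, and the helper names, the RBF kernel, and `gamma=0.25` are illustrative choices rather than the thesis's settings.

```python
import numpy as np


def condition_number(matrix: np.ndarray) -> float:
    """2-norm condition number: largest singular value divided by the smallest."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(singular_values.max() / singular_values.min())


def standardize(data: np.ndarray) -> np.ndarray:
    """Centre each column and scale it to unit variance."""
    return (data - data.mean(axis=0)) / data.std(axis=0)


def pca_scores(data: np.ndarray, n_components: int) -> np.ndarray:
    """Linear PCA: project centred data onto its top principal directions via SVD."""
    centred = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


def rbf_kernel_pca_scores(data: np.ndarray, n_components: int, gamma: float) -> np.ndarray:
    """Kernel PCA with an RBF kernel; returns the projections of the training points."""
    sq_norms = np.sum(data**2, axis=1)
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * data @ data.T, 0.0)
    kernel = np.exp(-gamma * sq_dists)
    # Centre the kernel matrix in feature space.
    n = kernel.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    centred = kernel - one_n @ kernel - kernel @ one_n + one_n @ kernel @ one_n
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(centred)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Synthetic stand-in for a measurement matrix: 50 samples, 4 columns on very different scales.
rng = np.random.default_rng(0)
raw_data = rng.normal(size=(50, 4)) * np.array([1000.0, 10.0, 1.0, 0.1])
standardized = standardize(raw_data)

pca_2d = pca_scores(standardized, n_components=2)
kpca_2d = rbf_kernel_pca_scores(standardized, n_components=2, gamma=0.25)

print(f"raw data matrix:         {condition_number(raw_data):.1f}")
print(f"standardized matrix:     {condition_number(standardized):.3f}")
print(f"PCA scores (2-D):        {condition_number(pca_2d):.3f}")
print(f"Kernel PCA scores (2-D): {condition_number(kpca_2d):.3f}")
```

On real data, the wine measurements would replace `raw_data`; scikit-learn's `PCA` and `KernelPCA` classes provide equivalent projections if a library implementation is preferred over the hand-written versions here.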
The research highlights Kernel PCA’s potential to improve data analysis and
quality assessment in industrial settings. The thesis concludes by recommending
further research into practical applications of nonlinear PCA across industrial
domains, with the aim of contributing to applied mathematics and industrial
engineering.