
The Hidden Patterns of Health: A Cluster Analysis of CDC Data

The analysis begins with data cleaning and standardization, then applies k-means clustering to group U.S. counties by three health metrics: diabetes, obesity, and inactivity rates. Four clusters were found to be optimal, and several visualizations were created to explore them.
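The cleaning, standardization, and clustering steps described above might look roughly like the following. This is a minimal sketch, not the code used for the analysis: the column names (diabetes, obesity, inactivity), the synthetic values, the choice to drop incomplete rows, and the use of scikit-learn's StandardScaler and KMeans are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the county table (assumption: the real CDC
# values are not reproduced here, and the column names are hypothetical).
rng = np.random.default_rng(0)
counties = pd.DataFrame({
    "diabetes": rng.normal(9.0, 2.0, 300),
    "obesity": rng.normal(30.0, 4.0, 300),
    "inactivity": rng.normal(22.0, 5.0, 300),
})
# Inject a few missing values so the cleaning step has something to do.
counties.loc[[3, 50, 120], "obesity"] = np.nan

metrics = ["diabetes", "obesity", "inactivity"]


def cluster_counties(df, n_clusters=4, seed=0):
    """Drop incomplete rows, z-score the metrics, and fit k-means."""
    clean = df.dropna(subset=metrics).copy()
    # Standardize so that no metric dominates the distance computation.
    scaled = StandardScaler().fit_transform(clean[metrics])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    clean["cluster"] = model.fit_predict(scaled)
    return clean, model


clustered, model = cluster_counties(counties)
print(clustered["cluster"].value_counts().sort_index())
```

Standardizing first matters because k-means relies on Euclidean distance, so a metric with a wider numeric range would otherwise carry more weight than the others.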

Key Findings
1. Cluster-wise Box Plots
Box plots were used to visualize the spread of each health metric across the four clusters. These plots reveal variations in the distribution of diabetes, obesity, and inactivity rates among the different clusters.
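One way to draw such box plots is with matplotlib, one panel per metric and one box per cluster. The sketch below uses made-up data with randomly assigned cluster labels, and the helper name cluster_boxplots is hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

# Made-up data: random cluster labels and rates, for illustration only.
rng = np.random.default_rng(1)
metrics = ["diabetes", "obesity", "inactivity"]
df = pd.DataFrame({
    "cluster": rng.integers(0, 4, 200),
    "diabetes": rng.normal(9.0, 2.0, 200),
    "obesity": rng.normal(30.0, 4.0, 200),
    "inactivity": rng.normal(22.0, 5.0, 200),
})


def cluster_boxplots(df, metrics, cluster_col="cluster"):
    """Draw one panel per metric with one box per cluster."""
    labels = sorted(df[cluster_col].unique())
    fig, axes = plt.subplots(1, len(metrics),
                             figsize=(4 * len(metrics), 4), squeeze=False)
    for ax, metric in zip(axes[0], metrics):
        # One array of values per cluster, in sorted label order.
        groups = [df.loc[df[cluster_col] == c, metric].to_numpy()
                  for c in labels]
        ax.boxplot(groups)
        ax.set_xticks(range(1, len(labels) + 1))
        ax.set_xticklabels([str(c) for c in labels])
        ax.set_title(metric)
        ax.set_xlabel("cluster")
    fig.tight_layout()
    return fig, axes[0]


fig, axes = cluster_boxplots(df, metrics)
fig.savefig("cluster_boxplots.png")
```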

2. Cluster-wise Bar Plots
Bar plots were created to visualize the average rates of diabetes, obesity, and inactivity for each cluster. These plots offer a bird’s-eye view of the health status across clusters and help identify which clusters might require targeted interventions.
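The averages behind such a bar plot can be computed with a pandas groupby. The following is a small sketch with invented numbers (two clusters only, to keep the arithmetic easy to check by hand); the column names are assumptions.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use

# Invented values for illustration; not taken from the CDC data.
df = pd.DataFrame({
    "cluster":    [0, 0, 1, 1],
    "diabetes":   [10.0, 12.0, 6.0, 8.0],
    "obesity":    [30.0, 34.0, 24.0, 26.0],
    "inactivity": [25.0, 27.0, 15.0, 17.0],
})
metrics = ["diabetes", "obesity", "inactivity"]

# Mean of each metric within each cluster (rows: clusters, columns: metrics).
means = df.groupby("cluster")[metrics].mean()
print(means)

# Grouped bar chart: one group of bars per cluster, one bar per metric.
ax = means.plot.bar(rot=0)
ax.set_xlabel("cluster")
ax.set_ylabel("mean rate")
ax.figure.savefig("cluster_means.png")
```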

3. Pairwise Scatter Plots
Pairwise scatter plots were employed to understand the relationships between each pair of health metrics within each cluster. These plots can help identify correlations or trends among the health metrics within each cluster.
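A sketch of this step, assuming hypothetical column names and synthetic data: each pair of metrics gets its own panel, points are colored by cluster, and a within-cluster correlation table puts numbers on what the plots show.

```python
from itertools import combinations

import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

# Synthetic data with a shared factor so the metrics are correlated.
rng = np.random.default_rng(2)
n = 200
base = rng.normal(0.0, 1.0, n)
metrics = ["diabetes", "obesity", "inactivity"]
df = pd.DataFrame({
    "cluster": rng.integers(0, 4, n),  # placeholder labels
    "diabetes": 9 + 2 * base + rng.normal(0, 1, n),
    "obesity": 30 + 3 * base + rng.normal(0, 2, n),
    "inactivity": 22 + 4 * base + rng.normal(0, 2, n),
})

# Every unordered pair of metrics: three pairs for three metrics.
pairs = list(combinations(metrics, 2))
fig, axes = plt.subplots(1, len(pairs),
                         figsize=(5 * len(pairs), 4), squeeze=False)
for ax, (x, y) in zip(axes[0], pairs):
    for label, group in df.groupby("cluster"):
        ax.scatter(group[x], group[y], s=12, alpha=0.7,
                   label=f"cluster {label}")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
axes[0, 0].legend()
fig.tight_layout()
fig.savefig("pairwise_scatter.png")

# Pearson correlation between metrics, computed separately per cluster.
within_corr = df.groupby("cluster")[metrics].corr()
print(within_corr.round(2))
```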

Cluster Summaries
Cluster 0: High rates of diabetes and inactivity.
Cluster 1: Moderate rates across all metrics.
Cluster 2: Lower rates for all three health metrics.
Cluster 3: High obesity rates but moderate diabetes and inactivity rates.
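Verbal summaries like these can be derived from the cluster means. The post does not say how its labels were assigned, so the rule below is purely an assumption: a cluster mean more than half a standard deviation above the overall mean is called "high", more than half below is "low", and anything in between is "moderate". The data and the 0.5 threshold are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented values for illustration; not taken from the CDC data.
df = pd.DataFrame({
    "cluster":    [0, 0, 1, 1, 2, 2],
    "diabetes":   [14.0, 16.0, 9.0, 11.0, 4.0, 6.0],
    "obesity":    [30.0, 32.0, 30.0, 32.0, 30.0, 32.0],
    "inactivity": [28.0, 30.0, 20.0, 22.0, 12.0, 14.0],
})
metrics = ["diabetes", "obesity", "inactivity"]


def describe_clusters(df, metrics, cluster_col="cluster", threshold=0.5):
    """Label each cluster mean as low, moderate, or high per metric."""
    means = df.groupby(cluster_col)[metrics].mean()
    # Distance of each cluster mean from the overall mean, in units of
    # the overall standard deviation across all rows.
    z = (means - df[metrics].mean()) / df[metrics].std()
    labels = np.where(z > threshold, "high",
                      np.where(z < -threshold, "low", "moderate"))
    return pd.DataFrame(labels, index=z.index, columns=z.columns)


labels = describe_clusters(df, metrics)
print(labels)
```

A different threshold, or quantile-based cut points, would shift the boundary between "moderate" and the other two labels, so the rule should be stated alongside any summary produced this way.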

The analysis groups U.S. counties by their diabetes, obesity, and inactivity rates and identifies specific clusters that may require targeted healthcare interventions. By understanding these clusters, policymakers and healthcare providers can develop more effective programs to improve public health.

