Interactive circle packing plots
I was looking for something suited for visualizing hierarchical categorical data that goes beyond the regular bar graphs. This D3 zoomable circle packing visualization, done using the circlepackeR
package, uses a series of nested circles that you can click on and zoom in/out of. To learn more, please see the official documentation by the package author.
As usual, we will use the IBM Telco customer churn dataset. Since I’m quite a bit more comfortable with data wrangling in Python, I will first get the number of customers in each level of every categorical variable using pandas
:
## Import data
import pandas as pd
df = pd.read_csv("https://github.com/nchelaru/data-prep/raw/master/telco_cleaned_renamed.csv")
## Get categorical column names
cat_list = []
for col in df.columns:
if df[col].dtype == object:
cat_list.append(col)
## Get all possible levels of every categorical variable and number of data points in each level
cat_levels = {}
for col in cat_list:
levels = df[col].value_counts().to_dict()
cat_levels[col] = levels
## Convert nested dictionary to dataframe
nestdict = pd.DataFrame(cat_levels).stack().reset_index()
nestdict.columns = ['Level', 'Category', 'Population']
We can take a look at the first few rows to get an idea:
Level | Category | Population |
---|---|---|
Male | Gender | 3549 |
Female | Gender | 3483 |
No SeniorCitizen | SeniorCitizen | 5890 |
SeniorCitizen | SeniorCitizen | 1142 |
No Partner | Partner | 3639 |
Partner | Partner | 3393 |
Now we will take the prepared data and move to R for making the plot:
## Import libraries
library(tidyverse)
library(circlepackeR)
library(hrbrthemes)
library(htmlwidgets)
library(data.tree)
## Import data
nestdict <- py$nestdict
## Prepare data format
nestdict$pathString <- paste("world",
nestdict$Category,
nestdict$Level,
sep = "/")
population <- as.Node(nestdict)
## Make the plot
circlepackeR(population,
size = "Population",
color_min = "hsl(56,80%,80%)",
color_max = "hsl(341,30%,40%)")
Try clicking on the circles!
At a glance, the sizes of circles in the second level give a quick overview of relative distributions of the levels of each categorical variable. Click on the circles to zoom in and out!
When the occasion is right, this could be a really fun way to add some pizzazz to your visualizations. :)