Interactive circle packing plots


I was looking for a way to visualize hierarchical categorical data that goes beyond the usual bar graphs. This zoomable circle packing visualization, built on D3 through the circlepackeR package, draws the data as nested circles that you can click to zoom in and out. To learn more, please see the official documentation by the package author.

As usual, we will use the IBM Telco customer churn dataset. Since I’m quite a bit more comfortable with data wrangling in Python, I will first get the number of customers in each level of every categorical variable using pandas:

## Import data
import pandas as pd

df = pd.read_csv("https://github.com/nchelaru/data-prep/raw/master/telco_cleaned_renamed.csv")

## Get categorical column names
cat_list = [] 

for col in df.columns:
  if df[col].dtype == object:
    cat_list.append(col)
    
## Get all possible levels of every categorical variable and number of data points in each level
cat_levels = {}

for col in cat_list:
  levels = df[col].value_counts().to_dict()
  cat_levels[col] = levels
  
## Convert nested dictionary to dataframe
nestdict = pd.DataFrame(cat_levels).stack().reset_index()

nestdict.columns = ['Level', 'Category', 'Population'] 

We can take a look at the first few rows to get an idea:

Level             Category       Population
Male              Gender               3549
Female            Gender               3483
No SeniorCitizen  SeniorCitizen        5890
SeniorCitizen     SeniorCitizen        1142
No Partner        Partner              3639
Partner           Partner              3393
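
To make the shape of this table easier to see, here is a minimal sketch of the same counting step on a tiny made-up data frame. The `toy` data and the `count_levels` helper are hypothetical names used only for illustration, and the sketch assumes that every non-numeric column should be treated as categorical. It uses `melt` and `groupby` rather than the loop above, so treat it as one possible alternative, not the method used in this post.

```python
import pandas as pd

# Toy data standing in for the Telco dataset (values are made up for illustration)
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male"],
    "Partner": ["Partner", "No Partner", "No Partner", "Partner"],
    "Tenure": [1, 24, 12, 60],  # numeric column, left out of the counts
})


def count_levels(df):
    """Return one row per (Level, Category) pair with its number of rows."""
    # Assumption: every non-numeric column is a categorical variable
    cat = df.select_dtypes(exclude="number")
    # Long format: one row per (original row, column) pair
    long = cat.melt(var_name="Category", value_name="Level")
    # Count how many rows fall into each level of each variable
    counts = (
        long.groupby(["Category", "Level"])
        .size()
        .reset_index(name="Population")
    )
    # Match the column order used in the table above
    return counts[["Level", "Category", "Population"]]


toy_counts = count_levels(toy)
print(toy_counts)
```

"Male" appears three times in the toy data, so its row gets a Population of 3, and the numeric Tenure column does not show up at all.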

Now we will take the prepared data and move to R for making the plot:

## Import libraries
library(tidyverse)
library(circlepackeR)  
library(hrbrthemes)
library(htmlwidgets)
library(data.tree)

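## Import data from the Python session
## (the `py` object comes from the reticulate package, e.g. in an R Markdown document)
nestdict <- py$nestdict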

## Prepare data format
nestdict$pathString <- paste("world", 
                             nestdict$Category, 
                             nestdict$Level, 
                             sep = "/")

population <- as.Node(nestdict)
 
## Make the plot
circlepackeR(population, 
             size = "Population", 
             color_min = "hsl(56,80%,80%)", 
             color_max = "hsl(341,30%,40%)")


Try clicking on the circles!

At a glance, the sizes of the circles at the second level of nesting (one circle per level of each categorical variable) give a quick overview of how the customers are distributed within each variable.
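
If the pathString step in the R code feels opaque, the following Python sketch shows the kind of hierarchy those "world/Category/Level" strings describe. It is only an illustration of the idea, not how data.tree or circlepackeR work internally. The `build_tree` helper is a hypothetical name, and the rows are taken from the table shown earlier.

```python
# Rows in the same (Level, Category, Population) layout as the table above
rows = [
    ("Male", "Gender", 3549),
    ("Female", "Gender", 3483),
    ("No Partner", "Partner", 3639),
    ("Partner", "Partner", 3393),
]


def build_tree(paths_with_sizes):
    """Turn ("a/b/c", size) pairs into nested dicts with sizes at the leaves."""
    root = {}
    for path, size in paths_with_sizes:
        parts = path.split("/")
        node = root
        # Walk down the path, creating intermediate nodes as needed
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # The last path component is a leaf that carries the size
        node[parts[-1]] = size
    return root


# Same construction as paste("world", Category, Level, sep = "/") in the R code
paths = [
    ("/".join(["world", category, level]), population)
    for level, category, population in rows
]

tree = build_tree(paths)
print(tree)
```

Each leaf corresponds to one innermost circle, sized by its Population, and each category groups its leaves into one enclosing circle. Note that this sketch would misplace any level name that itself contains a "/", since the slash is used as the path separator.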

When the occasion is right, this could be a really fun way to add some pizzazz to your visualizations. :)