Bar chart race in Python using Matplotlib animation

For those who don't know TidyTuesday: it is a weekly data science project, aimed mainly at the R ecosystem. Every week a raw dataset is posted together with a chart or article related to it, and you can explore the data, create visualizations or data models, and share the results on social networks with the #tidytuesday tag. For more information, see: https://github.com/rfordatascience/tidytuesday/

This week's TidyTuesday data comes from the Great American Beer Festival.
The Professional Judge Panel awards gold, silver or bronze medals that are recognized around the world as symbols of brewing excellence. These awards are among the most coveted in the industry and heralded by the winning brewers in their national advertising. Five different three-hour judging sessions take place over the three-day period during the week of the festival. Judges are assigned beers to evaluate in their specific area of expertise and never judge their own product or any product in which they have a concern.

The dataset contains the following fields: medal (Gold, Silver or Bronze), beer name, brewery, city, state, category and year.

I wanted to explore the data a bit using Python and create an animated plot showing which states won the most medals each year (from 1987 to 2020).

Let's jump into the code:

Data loading & preparing

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from colorsys import hls_to_rgb
import matplotlib.ticker as ticker

Let's read data directly from GitHub:

beer_awards = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-10-20/beer_awards.csv')

Check top rows:

beer_awards.head()

   medal   beer_name             brewery                 city          state  category              year
0  Gold    Volksbier Vienna      Wibby Brewing           Longmont      CO     American Amber Lager  2020
1  Silver  Oktoberfest           Founders Brewing Co.    Grand Rapids  MI     American Amber Lager  2020
2  Bronze  Amber Lager           Skipping Rock Beer Co.  Staunton      VA     American Amber Lager  2020
3  Gold    Lager at World's End  Epidemic Ales           Concord       CA     American Lager        2020
4  Silver  Seismic Tremor        Seismic Brewing Co.     Santa Rosa    CA     American Lager        2020

Check general info:

beer_awards.info()

RangeIndex: 4970 entries, 0 to 4969
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   medal      4970 non-null   object
 1   beer_name  4970 non-null   object
 2   brewery    4970 non-null   object
 3   city       4970 non-null   object
 4   state      4970 non-null   object
 5   category   4970 non-null   object
 6   year       4970 non-null   int64 
dtypes: int64(1), object(6)
memory usage: 271.9+ KB

Animation creation

Let's assign a unique color to each state (this makes the final visualization easier to read):

un_state = beer_awards.state.unique()
NUM_COLORS = len(un_state)
clrs = sns.color_palette("hls", NUM_COLORS)
colors = dict(zip(un_state, clrs))
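If you would rather not depend on seaborn for the palette, a similar state-to-color mapping can be built with the standard library. This is only a rough sketch: the helper name make_state_colors, the sample state codes, and the lightness and saturation values are my own assumptions, and the resulting colors are not guaranteed to match seaborn's "hls" palette exactly.

```python
from colorsys import hls_to_rgb


def make_state_colors(states, lightness=0.6, saturation=0.65):
    """Map each label to an RGB tuple, spacing the hues evenly around the color wheel."""
    states = list(states)
    n = len(states)
    return {
        state: hls_to_rgb(i / n, lightness, saturation)
        for i, state in enumerate(states)
    }


# Hypothetical sample of state codes, just for illustration
state_colors = make_state_colors(["CA", "CO", "MI", "VA"])
for state, rgb in state_colors.items():
    print(state, tuple(round(v, 2) for v in rgb))
```

Matplotlib accepts RGB tuples with values between 0 and 1, so a dictionary like this can be used in the same way as the colors dictionary above.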

For the animation part, I used this blog post as the starting point for the code: https://towardsdatascience.com/bar-chart-race-in-python-with-matplotlib-8e687a5c8a41.
Matplotlib has a dedicated module for animation, and since I wanted to display the animation in a Jupyter notebook, I had to use the HTML display from IPython:

import matplotlib.animation as animation
from IPython.display import HTML

Since I wanted to draw one bar chart per year, the function takes 'year' as its input parameter and filters the data for that year to find the top 10 states that won the most medals in that year:

fig, ax = plt.subplots(figsize=(15, 8))

def draw_barchart(year):
    # Keep only the awards handed out in the requested year
    beer_awards_lim = beer_awards.loc[beer_awards.year == year]
    # Count medals per state
    sm = beer_awards_lim.groupby('state')['medal'].count().reset_index()
    sm_sorted = sm.sort_values(by='medal', ascending=False)
    # head(10) keeps exactly the 10 states with the most medals;
    # sort ascending so the largest bar ends up at the top of the chart
    dff = sm_sorted.head(10).sort_values(by='medal', ascending=True)
    ax.clear()
    ax.barh(dff['state'], dff['medal'], color=[colors[x] for x in dff['state']])
    for i, (state, medal) in enumerate(zip(dff['state'], dff['medal'])):
        # State code inside the bar, medal count just past its end
        ax.text(medal - 0.5, i, state, size=14, weight=600, ha='right', va='center')
        ax.text(medal, i, f'{medal:,.0f}', size=14, ha='left', va='center')
    # Large year label on the right side of the axes
    ax.text(1, 0.4, year, transform=ax.transAxes, color='#777777', size=46, ha='right', weight=800)
    ax.xaxis.set_ticks_position('top')
    ax.set_yticks([])
    ax.grid(which='major', axis='x', linestyle='-')
    ax.set_axisbelow(True)
    ax.text(0, 1.1, 'States that won the highest number of medals from 1987 to 2020',
            transform=ax.transAxes, size=24, weight=600, ha='left')
    # Hide the frame around the plot
    ax.set_frame_on(False)

draw_barchart(1987)
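The ranking step can also be checked on its own, without any plotting. Here is a minimal sketch on a made-up toy table that uses the same column names as beer_awards. The helper top_states and the toy rows are hypothetical, and nlargest is one possible way to pick the top rows, not necessarily the only or best one.

```python
import pandas as pd

# Made-up toy data with the same column names as beer_awards
toy = pd.DataFrame({
    "state": ["CA", "CA", "CO", "CA", "CO", "MI", "CA", "CO"],
    "medal": ["Gold", "Silver", "Gold", "Bronze", "Silver", "Gold", "Gold", "Gold"],
    "year": [2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020],
})


def top_states(df, year, n=10):
    """Return medal counts for the n states with the most medals in `year`."""
    counts = df.loc[df["year"] == year].groupby("state")["medal"].count()
    # nlargest returns at most n rows, sorted from most to fewest medals
    return counts.nlargest(n)


top_2019 = top_states(toy, 2019, n=2)
print(top_2019)
```

One thing to keep in mind with pandas: label-based slicing with .loc includes both end points, while head(n) and nlargest(n) return at most n rows.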


Here we use FuncAnimation to create the animation with these parameters:

  • The figure to draw on
  • The function to call for each frame, which draws the bar plot of the top 10 states that won the most medals in that year
  • frames: the range of years
  • interval: the delay between frames, in milliseconds

fig, ax = plt.subplots(figsize=(15, 8))

animator = animation.FuncAnimation(fig, draw_barchart,
                                   frames=range(beer_awards.year.min(), beer_awards.year.max() + 1),
                                   interval=600)
HTML(animator.to_jshtml())
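To see how FuncAnimation passes each frame value to the drawing function, here is a stripped-down sketch with made-up numbers. The frames_data dictionary and the draw function are hypothetical, and the Agg backend is selected only so that the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window or notebook is needed
import matplotlib.pyplot as plt
import matplotlib.animation as animation

# Made-up values: one list of bar lengths per frame (year)
frames_data = {2018: [1, 2, 3], 2019: [2, 3, 5], 2020: [4, 4, 6]}

fig, ax = plt.subplots(figsize=(6, 3))


def draw(year):
    """Redraw the whole chart for one frame; FuncAnimation passes in the frame value."""
    ax.clear()
    values = frames_data[year]
    ax.barh(range(len(values)), values)
    ax.set_title(str(year))


# Each element of `frames` is handed to draw() in turn, 600 ms apart
anim = animation.FuncAnimation(
    fig, draw, frames=sorted(frames_data), interval=600, repeat=False
)
```

In a notebook, HTML(anim.to_jshtml()) would display this small animation in the same way as the one above.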

There we go!
I tried other options for creating animations inside a Jupyter notebook, but this one worked best!

Save animation

To save the animation, use the save function and provide:

  • File name
  • Writer: the 'ffmpeg' writer may not be available by default, in which case you have to install it. With conda, I installed it with:

#conda install -c conda-forge ffmpeg

Writer = animation.writers['ffmpeg']
writer = Writer(fps=2, metadata=dict(artist='Me'), bitrate=1800)
animator.save(filename ='BeerUS.mp4', writer=writer)
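If ffmpeg is not installed, one possible fallback is to save a GIF instead. The sketch below only decides which writer name and file name to use. The helper pick_writer is hypothetical, it assumes that an ffmpeg executable on PATH is what Matplotlib will pick up, and the 'pillow' writer additionally needs the Pillow package.

```python
import shutil


def pick_writer(basename):
    """Return (writer_name, filename) depending on whether ffmpeg is on PATH."""
    if shutil.which("ffmpeg"):
        return "ffmpeg", f"{basename}.mp4"
    # 'pillow' is the name of Matplotlib's GIF writer; it requires Pillow
    return "pillow", f"{basename}.gif"


writer_name, filename = pick_writer("BeerUS")
print(writer_name, filename)

# With the animator object created earlier, saving would then look like:
# animator.save(filename, writer=writer_name, fps=2)
```

Passing the writer as a string lets Matplotlib create the writer object itself, at the cost of the extra options (metadata, bitrate) that the explicit Writer above sets.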

That's it!
In the next version, I will add the medal types (Gold, Silver and Bronze) to the bar plot. I will update this post soon!

Author: Pari
