Bar chart race in Python using Matplotlib animation
For those who don't know what TidyTuesday is: TidyTuesday is a weekly data science project aimed mainly at the R ecosystem. Every week a raw dataset is posted along with a chart or article related to it, and you can explore the data, create visualizations or data models, and share the results on social media with the #tidytuesday tag. For more information, see: https://github.com/rfordatascience/tidytuesday/
This week's TidyTuesday data comes from the Great American Beer Festival.
The Professional Judge Panel awards gold, silver or bronze medals that are recognized around the world as symbols of brewing excellence. These awards are among the most coveted in the industry and heralded by the winning brewers in their national advertising. Five different three-hour judging sessions take place over the three-day period during the week of the festival. Judges are assigned beers to evaluate in their specific area of expertise and never judge their own product or any product in which they have a concern.
The dataset contains the following fields: medal (Gold, Silver or Bronze), beer name, brewery, city, state, category and year.
I wanted to explore the data a bit using Python and create an animated plot showing which states won the most medals in each year (from 1987 to 2020).
Let's jump into the code:
Data loading & preparing
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from colorsys import hls_to_rgb
import matplotlib.ticker as ticker
Let's read the data directly from GitHub:
beer_awards = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-10-20/beer_awards.csv')
Check top rows:
beer_awards.head()
|   | medal | beer_name | brewery | city | state | category | year |
|---|---|---|---|---|---|---|---|
| 0 | Gold | Volksbier Vienna | Wibby Brewing | Longmont | CO | American Amber Lager | 2020 |
| 1 | Silver | Oktoberfest | Founders Brewing Co. | Grand Rapids | MI | American Amber Lager | 2020 |
| 2 | Bronze | Amber Lager | Skipping Rock Beer Co. | Staunton | VA | American Amber Lager | 2020 |
| 3 | Gold | Lager at World's End | Epidemic Ales | Concord | CA | American Lager | 2020 |
| 4 | Silver | Seismic Tremor | Seismic Brewing Co. | Santa Rosa | CA | American Lager | 2020 |
Check general info:
beer_awards.info()
RangeIndex: 4970 entries, 0 to 4969
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 medal 4970 non-null object
1 beer_name 4970 non-null object
2 brewery 4970 non-null object
3 city 4970 non-null object
4 state 4970 non-null object
5 category 4970 non-null object
6 year 4970 non-null int64
dtypes: int64(1), object(6)
memory usage: 271.9+ KB
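Before grouping by state, it may be worth checking that the state codes are spelled consistently, since a code entered as wa would be counted separately from WA. The sketch below uses a small made-up frame (not the real data) and a hypothetical helper, normalize_states. Whether the real beer_awards table needs this step is an assumption to verify first, for example with beer_awards.state.unique().

```python
import pandas as pd


def normalize_states(df, column="state"):
    """Return a copy of df with state codes stripped and upper-cased.

    Hypothetical helper for illustration; not part of the original post.
    """
    out = df.copy()
    out[column] = out[column].str.strip().str.upper()
    return out


# Made-up sample rows, not taken from the real dataset.
sample = pd.DataFrame({
    "medal": ["Gold", "Silver", "Bronze", "Gold"],
    "state": ["WA", "wa", "Ak", "AK "],
    "year": [2019, 2019, 2020, 2020],
})

cleaned = normalize_states(sample)
state_counts = cleaned["state"].value_counts().to_dict()
print(state_counts)
```

If the check turns up mixed-case codes, applying the same helper to beer_awards before building the color dictionary would keep each state on a single bar and a single color.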
Animation creation
Let's assign a unique color to each state (this makes the final visualization easier to read):
un_state = beer_awards.state.unique()
NUM_COLORS = len(un_state)
clrs = sns.color_palette("hls", NUM_COLORS)
colors = dict(zip(un_state, clrs))
For the animation part, I used this blog post as the starting point for the code: https://towardsdatascience.com/bar-chart-race-in-python-with-matplotlib-8e687a5c8a41.
Matplotlib has a dedicated module for animation, and since I wanted to display the animation in a Jupyter notebook, I had to use the HTML display from IPython:
import matplotlib.animation as animation
from IPython.display import HTML
Since I wanted the bar chart race to step through the years, the function takes 'year' as its input parameter and filters the data for that year to find the top 10 states that won the most medals:
fig, ax = plt.subplots(figsize=(15, 8))

def draw_barchart(year):
    # Keep only the awards for the requested year
    beer_awards_lim = beer_awards.loc[beer_awards.year == year]
    # Count medals per state and keep the 10 states with the most medals
    sm = beer_awards_lim.groupby('state')['medal'].count().reset_index()
    sm_sorted = sm.sort_values(by='medal', ascending=False)
    # head(10) returns exactly 10 rows; .loc[0:10] is inclusive and returns 11
    top_states = sm_sorted.head(10)['state']
    # Build the mask from the filtered frame so the indexes match
    dff2 = beer_awards_lim.loc[beer_awards_lim.state.isin(top_states)]
    # Sort ascending so the largest bar is drawn at the top
    dff = dff2.groupby('state')['medal'].count().reset_index().sort_values(by='medal', ascending=True)
    ax.clear()
    ax.barh(dff['state'], dff['medal'], color=[colors[x] for x in dff['state']])
    # Label each bar with the state code (inside) and the medal count (outside)
    for i, (state, medal) in enumerate(zip(dff['state'], dff['medal'])):
        ax.text(medal - 0.5, i, state, size=14, weight=600, ha='right', va='center')
        ax.text(medal, i, f'{medal:,.0f}', size=14, ha='left', va='center')
    # Show the current year in large type on the right
    ax.text(1, 0.4, year, transform=ax.transAxes, color='#777777', size=46, ha='right', weight=800)
    ax.xaxis.set_ticks_position('top')
    ax.set_yticks([])
    ax.grid(which='major', axis='x', linestyle='-')
    ax.set_axisbelow(True)
    ax.text(0, 1.1, 'States that won the highest number of medals from 1987 to 2020',
            transform=ax.transAxes, size=24, weight=600, ha='left')
    plt.box(False)

draw_barchart(1987)
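The filtering step inside draw_barchart can also be written as a small standalone helper, which makes the "top N" cut easy to check on its own. This is only a sketch: top_states is a hypothetical name, and the sample rows are made up. One detail worth knowing when writing this kind of cut is that label slicing with .loc[0:10] is inclusive at both ends and returns 11 rows, whereas head(10) returns 10.

```python
import pandas as pd


def top_states(df, year, n=10):
    """Medal counts for the n states with the most medals in `year`.

    Hypothetical helper; returns a Series indexed by state, largest first.
    """
    in_year = df.loc[df["year"] == year, "state"]
    # value_counts() sorts in descending order, so head(n) is the top n
    return in_year.value_counts().head(n)


# Made-up sample rows, not taken from the real dataset.
sample = pd.DataFrame({
    "state": ["CA", "CA", "CO", "CA", "CO", "OR", "TX"],
    "medal": ["Gold", "Silver", "Gold", "Bronze", "Silver", "Gold", "Gold"],
    "year": [2020, 2020, 2020, 2020, 2020, 2020, 2019],
})

top2 = top_states(sample, 2020, n=2)
print(top2)
```

Because value_counts already counts rows per state, the second groupby over the filtered frame is not needed in this version.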
Here we use FuncAnimation to create the animation, with these parameters:
- Figure: the figure object the animation is drawn on
- Function: called for each frame to draw the bar plot of the top 10 states that won the most medals in that year
- Frames: the range of years
- Interval: delay between frames, in milliseconds
fig, ax = plt.subplots(figsize=(15, 8))
animator = animation.FuncAnimation(
    fig,
    draw_barchart,
    frames=range(beer_awards.year.min(), beer_awards.year.max() + 1),
    interval=600,
)
HTML(animator.to_jshtml())
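To see how the four FuncAnimation parameters fit together without downloading the data, here is a minimal sketch on made-up numbers. The Agg backend is an assumption so that it runs outside a notebook, and the names frames_data, draw and drawn are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so this runs outside a notebook
import matplotlib.pyplot as plt
import matplotlib.animation as animation

# Made-up counts per frame, standing in for the yearly medal totals.
frames_data = {2018: [3, 1], 2019: [4, 2], 2020: [6, 5]}
labels = ["CA", "CO"]
drawn = []  # records which frames were drawn, for inspection

fig, ax = plt.subplots(figsize=(6, 3))


def draw(year):
    # Called once per frame with the next value from `frames`
    ax.clear()
    ax.barh(labels, frames_data[year])
    ax.set_title(str(year))
    drawn.append(year)


anim = animation.FuncAnimation(
    fig,                         # figure to draw on
    draw,                        # function called for each frame
    frames=sorted(frames_data),  # one frame per year
    interval=600,                # delay between frames in milliseconds
    repeat=False,
)
html = anim.to_jshtml()  # HTML/JavaScript string; wrap in IPython's HTML() in a notebook
```

Matplotlib may call the drawing function more than once for the first frame while initializing, so the function should not depend on being called exactly once per year.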
There we go!
I tried other options for creating animations inside a Jupyter notebook, but this one worked best!
Save animation
To save the animation, you can use the save function, providing:
- File name
- Writer type: the 'ffmpeg' writer may not be available by default, in which case you need to install it. With conda, I did it with:
#conda install -c conda-forge ffmpeg
Writer = animation.writers['ffmpeg']
writer = Writer(fps=2, metadata=dict(artist='Me'), bitrate=1800)
animator.save(filename='BeerUS.mp4', writer=writer)
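If installing ffmpeg is not an option, one possible fallback is to save a GIF with Matplotlib's Pillow-based writer instead. The sketch below is not what I used for this post; pick_writer is a hypothetical helper that checks whether the ffmpeg writer is available and otherwise returns a PillowWriter.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so this runs outside a notebook
import matplotlib.animation as animation


def pick_writer(fps=2):
    """Return (writer, extension): ffmpeg/MP4 if available, else Pillow/GIF.

    Hypothetical helper for illustration.
    """
    if animation.writers.is_available("ffmpeg"):
        writer = animation.FFMpegWriter(fps=fps, metadata={"artist": "Me"}, bitrate=1800)
        return writer, ".mp4"
    return animation.PillowWriter(fps=fps), ".gif"


writer, ext = pick_writer(fps=2)
filename = "BeerUS" + ext
print(filename)
# animator.save(filename, writer=writer)  # `animator` is the FuncAnimation object created earlier
```

GIF files are usually larger than MP4 for the same animation, so this is a convenience fallback and not a replacement.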
That's it!
In the next version, I will add the medal types (Gold, Silver & Bronze) to the bar plot. I'll update this soon!