Published in Analytics Vidhya

ML11: Hands-On Line Chart with Python

Drawing a satisfactory line chart with matplotlib.pyplot

Read time: 15 min
Pics, data & Python code: https://bit.ly/2UZftXq

Visualization comes first in any ML/DS project. It helps us gain insight into a given dataset, and during EDA (exploratory data analysis) we rely on visualization, among other techniques, to prepare for building a performant model.

The data are the word counts of FOMC minutes from 1993/01 to 2020/09, after my own data selection and pre-processing, along with the change in the federal funds rate after each FOMC meeting. The minutes are released three weeks after every FOMC meeting. The sources are the FOMC minutes [10] and the FOMC’s target federal funds rate or range, change (basis points) and level [11].

(1) Importance of Visualization in a ML/DS Project

… that very different Machine Learning algorithms, including fairly simple ones, performed almost identically well on a complex problem of natural language disambiguation once they were given enough data. [1]

The main challenges of machine learning include the following: [1]

  1. Insufficient Quantity of Training Data
  2. Nonrepresentative Training Data
  3. Poor-Quality Data
  4. Irrelevant Features
  5. Overfitting the Training Data
  6. Underfitting the Training Data

Figure 1: Performance of algorithms given enough data.

In a famous paper published in 2001, Microsoft researchers Michele Banko and Eric Brill showed that very different Machine Learning algorithms, including fairly simple ones, performed almost identically well on a complex problem of natural language disambiguation once they were given enough data (as you can see in Figure 1).

As the authors put it: “these results suggest that we may want to reconsider the tradeoff between spending time and money on algorithm development versus spending it on corpus development.”

The idea that data matters more than algorithms for complex problems was further popularized by Peter Norvig et al. in a paper titled “The Unreasonable Effectiveness of Data” published in 2009. It should be noted, however, that small- and medium-sized datasets are still very common, and it is not always easy or cheap to get extra training data, so don’t abandon algorithms just yet. [1]

Now that we have seen how much the quantity of data can affect an ML/DS project, it is no surprise that the other data challenges, namely “Nonrepresentative Training Data”, “Poor-Quality Data”, and “Irrelevant Features”, may also significantly affect its performance.

Visualization is a natural tool for tackling those three challenges: it lets us see whether the data are representative, spot poor-quality records, and judge which features look relevant.

(2) Our goal: A Real-World Case

Let’s go straight to a real-world case, the line chart we are aiming for, and then see how to build it from scratch. “Up”, “Down”, and “Unchanged” indicate how the target federal funds rate changed after each FOMC meeting.

Figure 2: The desired line chart. It will appear again at the end of this article.

(3) Starting Point: A Primitive Line Chart

#%% (2) Input & Setup
import os
import pickle # Python-specific data format

import numpy as np
import matplotlib.pyplot as plt

os.chdir('D:\\G03_1\\Medium\\ML11') # Change it to desired directory
os.getcwd()

with open("ML11_FOMC.pickle", 'rb') as file: # 'rb': read & binary
    ML11_Morton_Kuo = pickle.load(file)

FOMC_words = np.array(ML11_Morton_Kuo[0])
up_index = np.array(ML11_Morton_Kuo[1])
down_index = np.array(ML11_Morton_Kuo[2])
unchanged_index = np.array(ML11_Morton_Kuo[3])

#%% (3) Starting Point: A Primitive Line Chart
plt.figure(figsize=(18, 6))
plt.plot(list(range(1,223)), FOMC_words, 'bo')
# Draw markers. 'b' is blue. 'o' is circle marker.
plt.plot(list(range(1,223)), FOMC_words, 'k')
# Draw a line. 'k' is black. No marker, the data will be a line without markers.
plt.xlabel('Ordinal number (1993 ~ 2020)')
plt.ylabel('Words')
plt.title('Words of FOMC minutes from 1993 to 2020')
plt.xticks(range(0, 224, 25))
plt.yticks(range(0, 5001, 500))
plt.savefig('02_FOMC_Primitive.png') # 54.8 kb. Pretty small.
plt.close() # Close the current figure

See matplotlib’s official documentation for matplotlib.pyplot.plot [3] for more details on setting markers and colors, and the list of named colors [6] for every color name you can use. Note that pickle is a Python-specific data format.
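
For readers who do not have ML11_FOMC.pickle at hand, here is a minimal sketch of the layout that the setup code above appears to expect: a four-element list holding the word counts and the three index lists. The values are made up and the file name ML11_demo.pickle is hypothetical; only the structure mirrors the article's code.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in data: 6 meetings instead of the article's 222.
demo = [
    [1200, 1350, 1280, 1500, 1620, 1580],  # word counts per meeting
    [1, 4],                                # positions followed by a rate increase
    [2],                                   # positions followed by a rate cut
    [0, 3, 5],                             # positions with no rate change
]

path = os.path.join(tempfile.gettempdir(), "ML11_demo.pickle")  # hypothetical file

with open(path, "wb") as file:  # 'wb': write & binary
    pickle.dump(demo, file)

with open(path, "rb") as file:  # 'rb': read & binary
    loaded = pickle.load(file)

FOMC_words = np.array(loaded[0])
up_index = np.array(loaded[1])
down_index = np.array(loaded[2])
unchanged_index = np.array(loaded[3])

# Integer index arrays select the matching word counts.
print(FOMC_words[up_index])
```

Because unpickling can execute arbitrary code, it is safest to load pickle files only from sources you trust.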

Figure 3: A primitive line chart.

(4) High Definition & Tight Layout

#%% (4) High Definition & Tight Layout
plt.figure(figsize=(18, 6))
plt.plot(list(range(1,223)), FOMC_words, 'bo')
plt.plot(list(range(1,223)), FOMC_words, 'k')
plt.xlabel('Ordinal number (1993 ~ 2020)')
plt.ylabel('Words')
plt.title('Words of FOMC minutes from 1993 to 2020')
plt.xticks(range(0, 224, 25))
plt.yticks(range(0, 5001, 500))
plt.tight_layout() # Adjust padding so labels fit inside the figure (also affects plt.show())
plt.savefig('03_FOMC_High_Definition_Tight_Layout.png', dpi= 800, bbox_inches= 'tight')
# 1.17 MB. High definition. By default, dpi= 100.
# bbox_inches= 'tight' crops the saved figure to a tight bounding box
plt.close()

Figure 4: High definition & tight layout.
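
To see what the dpi argument does, here is a small sketch (not part of the original script) that saves a figure to an in-memory buffer and reads back its pixel dimensions. The helper name saved_pixel_size is hypothetical. Without bbox_inches='tight', the saved image should measure figsize times dpi pixels; with it, Matplotlib crops to the drawn content, so the size is no longer a simple product.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def saved_pixel_size(figsize, dpi):
    """Save a small figure to memory and return its (width, height) in pixels."""
    fig = plt.figure(figsize=figsize)
    ax = fig.add_subplot()
    ax.plot([0, 1], [0, 1])  # made-up data
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)  # no bbox_inches='tight' here
    plt.close(fig)
    buf.seek(0)
    height, width = plt.imread(buf, format="png").shape[:2]
    return width, height


for dpi in (50, 100, 200):
    print(dpi, saved_pixel_size((6, 2), dpi))
```

Raising dpi from 100 to 800 multiplies the pixel count by 64, which is consistent with the file growing from tens of kilobytes to about a megabyte.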

(5) Figure Size & Font Size

#%% (5) Figure Size & Font Size
plt.figure(figsize=(15, 5)) # Change figure size
plt.plot(list(range(1,223)), FOMC_words, 'bo')
plt.plot(list(range(1,223)), FOMC_words, 'k')
plt.xlabel('Ordinal number (1993 ~ 2020)', fontsize = 'xx-large') # Change font size
plt.ylabel('Words', fontsize = 'xx-large') # Change font size
plt.title('Words of FOMC minutes from 1993 to 2020', fontname='Comic Sans MS', fontsize = 'xx-large') # Change font and font size
'''
1. family: A list of font names in decreasing order of priority. The items may include a generic font family name,
either 'serif', 'sans-serif', 'cursive', 'fantasy', or 'monospace'. In that case, the actual font to be used will
be looked up from the associated rcParam. Try fontname = 'Comic Sans MS' & fontname="Arial".
2. fontsize: Either a relative value of 'xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large'
or an absolute font size, e.g., 12.
'''
plt.xticks(range(0, 224, 25))
plt.yticks(range(0, 5001, 500))
plt.savefig('04_FOMC_Figure_Size_Font_Size.png', dpi= 800, bbox_inches= 'tight')
plt.close()

Figure 5: Figure size & font size.

The annotations in the code above are worth repeating.

1. family: A list of font names in decreasing order of priority. The items may include a generic font family name, either 'serif', 'sans-serif', 'cursive', 'fantasy', or 'monospace'. In that case, the actual font to be used will be looked up from the associated rcParam. The fontname keyword used above is an alias for family; try fontname='Comic Sans MS' and fontname='Arial'.

2. fontsize: Either a relative value of 'xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large' or an absolute font size, e.g., 12.
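
As a rough illustration of how the relative size names behave, the sketch below asks Matplotlib's FontProperties class to resolve each name to a size in points. The names are scaled relative to rcParams['font.size'], so the printed values depend on your configuration; the variable names are only for this example.

```python
from matplotlib.font_manager import FontProperties

names = ['xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large']

# Resolve each relative name to a size in points (relative to rcParams['font.size']).
sizes = {name: FontProperties(size=name).get_size_in_points() for name in names}
for name, points in sizes.items():
    print(f"{name:>9}: {points:.2f} pt")

# An absolute size is passed through unchanged.
absolute = FontProperties(size=12).get_size_in_points()
print(f" absolute: {absolute:.2f} pt")
```

Relative names keep the labels proportional if the base font size changes, whereas an absolute number pins the size regardless of the configuration.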

(6) Axis & Type of Line and Marker

#%% (6) Axis & Type of Line and Marker
plt.figure(figsize=(15, 5))
plt.plot(list(range(1,223)), FOMC_words, 'p', color= 'royalblue')
# Marker and color changed. 'p' is pentagon marker.
plt.plot(list(range(1,223)), FOMC_words, '--k') # Line style changed. '--' is dashed line.
plt.xlabel('Ordinal number (1993 ~ 2020)', fontsize = 'xx-large')
plt.ylabel('Words', fontsize = 'xx-large')
plt.title('Words of FOMC minutes from 1993 to 2020', fontname='Comic Sans MS', fontsize = 'xx-large')
plt.axis([-1, 224, 0, 4800]) # Adjust the axis limits: [xmin, xmax, ymin, ymax]
plt.xticks(range(0, 224, 25))
plt.yticks(range(0, 5001, 500))
plt.savefig('05_FOMC_Axis_Line_Type.png', dpi= 800, bbox_inches= 'tight')
plt.close()

Figure 6: Axis & type of line and marker.
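
The format strings used in this section are shorthand for separate line style, marker, and color settings. The sketch below, on made-up data, draws the same dashed black line twice, once with the shorthand and once with keyword arguments, and then inspects the resulting line objects. It uses the object-oriented interface (fig, ax) only so that the lines can be inspected afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = list(range(1, 11))
y = [v * v for v in x]  # made-up data

fig, ax = plt.subplots()

# Shorthand: dashed ('--') black ('k') line, no marker.
(short,) = ax.plot(x, y, '--k')
# The same request spelled out with keyword arguments.
(explicit,) = ax.plot(x, y, linestyle='--', color='black')

# Pentagon markers only; the color keyword supplies what the format string omits.
(markers,) = ax.plot(x, y, 'p', color='royalblue')

ax.axis([0, 11, 0, 110])  # [xmin, xmax, ymin, ymax]
limits = (ax.get_xlim(), ax.get_ylim())

print(short.get_linestyle(), explicit.get_linestyle())
print(markers.get_marker(), markers.get_linestyle())
plt.close(fig)
```

The shorthand is compact for quick plots; keyword arguments are easier to read once named colors such as 'royalblue' come into play, since the format string only accepts a limited set of color codes.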

(7) Grid

#%% (7) Grid
## Grid_1
plt.figure(figsize=(15, 5))
plt.plot(list(range(1,223)), FOMC_words, 'p', color= 'royalblue')
plt.plot(list(range(1,223)), FOMC_words, '--k')
plt.xlabel('Ordinal number (1993 ~ 2020)', fontsize = 'xx-large')
plt.ylabel('Words', fontsize = 'xx-large')
plt.title('Words of FOMC minutes from 1993 to 2020', fontname='Comic Sans MS', fontsize = 'xx-large')
plt.axis([-1, 224, 0, 4800])
plt.xticks(range(0, 224, 25))
plt.yticks(range(0, 5001, 500))
plt.grid() # Simply add grid by default
plt.savefig('06_FOMC_Grid_1.png', dpi= 800, bbox_inches= 'tight')
plt.close()

Figure 7: Grid_1.

Next, let’s see what we can do to customize the grid. Just adjust the plt.grid() line.

plt.grid(color='slategray', linestyle='-.', linewidth= 1.2, which='major', axis='both')
# Try adjusting color, linestyle, and linewidth.
# The defaults come from rcParams: grid.linestyle is '-', grid.linewidth is 0.8,
# and grid.color is '#b0b0b0', which is close to 'silver'.

Figure 8: Grid_2.

See matplotlib’s official documentation for matplotlib.pyplot.grid [7].
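
The which and axis arguments control which grid lines are drawn. As a small sketch on made-up data, the code below prints the grid defaults from rcParams and then turns on horizontal grid lines only; the variable names y_grid_on and x_grid_on exist only for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Current defaults that plt.grid() falls back on.
for key in ('grid.color', 'grid.linestyle', 'grid.linewidth'):
    print(key, '=', plt.rcParams[key])

fig, ax = plt.subplots()
ax.plot(range(10), [v % 4 for v in range(10)])  # made-up data

# Horizontal grid lines only (axis='y'), on major ticks.
ax.grid(color='slategray', linestyle='-.', linewidth=1.2, which='major', axis='y')

y_grid_on = ax.yaxis.get_gridlines()[0].get_visible()
x_grid_on = ax.xaxis.get_gridlines()[0].get_visible()
print(y_grid_on, x_grid_on)
plt.close(fig)
```

For a chart like this one, where the reader mostly compares word counts across time, a horizontal-only grid can be a lighter alternative to axis='both'.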

(8) More Info: Shadow

#%% (8) More Info: Shadow
plt.figure(figsize=(15, 5))
plt.plot(list(range(1,223)), FOMC_words, 'p', color= 'royalblue')
plt.plot(list(range(1,223)), FOMC_words, '--k')
plt.xlabel('Ordinal number (1993 ~ 2020)', fontsize = 'xx-large')
plt.ylabel('Words', fontsize = 'xx-large')
plt.title('Words of FOMC minutes from 1993 to 2020', fontname='Comic Sans MS', fontsize = 'xx-large')
plt.axis([-1, 224, 0, 4800])
plt.xticks(range(0, 224, 25))
plt.yticks(range(0, 5001, 500))
plt.grid()
# Shadow
plt.axvspan(66, 71, color='lightblue', alpha=0.5, lw=0)
plt.axvspan(120, 132, color='lightblue', alpha=0.5, lw=0)
plt.axvspan(218, 222, color='lightblue', alpha=0.5, lw=0)
plt.savefig('08_FOMC_Shadow.png', dpi= 1000, bbox_inches= 'tight') # dpi: 800 -> 1000
plt.close()

Figure 9: More info: shadow.

The light blue spans mark recessions. See the US National Bureau of Economic Research’s business cycle dates [13] and the matplotlib.pyplot.axvspan documentation [8].
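
When there are several spans to shade, keeping them in a list and looping is one way to avoid repeating the axvspan call. The sketch below reuses the three (start, end) pairs from the code above on an otherwise empty chart; the name recession_spans is only for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# (start, end) ordinal positions of the shaded spans, copied from the article's code.
recession_spans = [(66, 71), (120, 132), (218, 222)]

fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlim(-1, 224)
ax.set_ylim(0, 4800)

for start, end in recession_spans:
    # axvspan shades the full height of the axes between two x positions.
    ax.axvspan(start, end, color='lightblue', alpha=0.5, lw=0)

print(len(ax.patches))
plt.close(fig)
```

Adding or correcting a recession then means editing one list entry rather than another plotting call.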

(9) More Info: Annotation

#%% (9) More Info: Annotation
plt.figure(figsize=(15, 5))
plt.plot(list(range(1,223)), FOMC_words, 'p', color= 'royalblue')
plt.plot(list(range(1,223)), FOMC_words, '--k')
plt.xlabel('Ordinal number (1993 ~ 2020)', fontsize = 'xx-large')
plt.ylabel('Words', fontsize = 'xx-large')
plt.title('Words of FOMC minutes from 1993 to 2020', fontname='Comic Sans MS', fontsize = 'xx-large')
plt.axis([-1, 224, 0, 4800])
plt.xticks(range(0, 224, 25))
plt.yticks(range(0, 5001, 500))
plt.grid()
plt.axvspan(66, 71, color='lightblue', alpha=0.5, lw=0)
plt.axvspan(120, 132, color='lightblue', alpha=0.5, lw=0)
plt.axvspan(218, 222, color='lightblue', alpha=0.5, lw=0)
# Annotation
# xy is the point the arrow points to; xytext is where the label is placed.
crisis_data = [
    (66, '2001/03. Peak. Dot-Com Bubble.'),
    (120, '2007/12. Peak. Financial Crisis.'),
    (218, '2020/02. Peak. COVID-19.')]
x, label = crisis_data[0]
plt.annotate(label, xy=(x, FOMC_words[x] - 600),
             xytext= (x, FOMC_words[x] - 1200),
             arrowprops= dict(facecolor='black', headwidth= 4, width=2, headlength= 4),
             horizontalalignment='left', verticalalignment= 'top', fontsize = 'x-large')
x, label = crisis_data[1]
plt.annotate(label, xy=(x, FOMC_words[x] - 1000),
             xytext= (x, FOMC_words[x] - 1600),
             arrowprops= dict(facecolor='black', headwidth= 4, width=2, headlength= 4),
             horizontalalignment='left', verticalalignment= 'top', fontsize = 'x-large')
x, label = crisis_data[2]
plt.annotate(label, xy=(x, FOMC_words[x] - 1200),
             xytext= (x, FOMC_words[x] - 1700),
             arrowprops= dict(facecolor='black', headwidth= 4, width=2, headlength= 4),
             horizontalalignment='right', verticalalignment= 'top', fontsize = 'x-large')
plt.savefig('09_FOMC_Annotation.png', dpi= 1000, bbox_inches= 'tight')
plt.close()

Figure 10: More info: annotation.

See the matplotlib.pyplot.annotate documentation [9] or Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (2nd ed.) [2].
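
The three annotate calls differ only in their offsets and alignment, so they can also be driven from a single list. The following is a sketch on made-up data rather than the article's code: the labels, offsets, and random word counts are hypothetical, and it looks up each point with words[x - 1] on the assumption that ordinal positions start at 1, whereas the code above uses FOMC_words[x] together with hand-tuned offsets.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
words = rng.integers(2000, 4000, size=222)  # made-up stand-in for FOMC_words
ordinals = np.arange(1, 223)

# (x position, label, arrow-tip drop, text drop, horizontal alignment);
# the drops are hypothetical offsets below the data point, in y units.
events = [
    (66, 'Event A', 600, 1200, 'left'),
    (120, 'Event B', 1000, 1600, 'left'),
    (218, 'Event C', 1200, 1700, 'right'),
]

fig, ax = plt.subplots(figsize=(15, 5))
ax.plot(ordinals, words, '--k')

for x, label, tip_drop, text_drop, halign in events:
    y = words[x - 1]  # ordinal x is 1-based, the array is 0-based
    ax.annotate(label,
                xy=(x, y - tip_drop),       # where the arrow points
                xytext=(x, y - text_drop),  # where the label text sits
                arrowprops=dict(facecolor='black', headwidth=4, width=2, headlength=4),
                horizontalalignment=halign, verticalalignment='top')

print(len(ax.texts))
plt.close(fig)
```

Keeping the offsets next to each label makes it easier to nudge a single annotation when it collides with the line.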

(10) More Info: Up, Down or Unchanged

#%% (10) More Info: Up, Down or Unchanged
plt.figure(figsize=(15, 5))
plt.plot(up_index+1, FOMC_words[up_index], 'o', color= 'blue', label="Up") # up_index
plt.plot(down_index+1, FOMC_words[down_index], 's', color= 'orangered', label="Down") # down_index
plt.plot(unchanged_index+1, FOMC_words[unchanged_index], '*', color= 'darkgreen', label="Unchanged") # unchanged_index
# Draw 3 kinds of markers and change their colors.
# 'o' is circle marker; 's' is square marker; '*' is star marker.
# Add labels and plt.legend() will catch them later
plt.plot(list(range(1,223)), FOMC_words, '--k')
plt.xlabel('Ordinal number (1993 ~ 2020)', fontsize = 'xx-large')
plt.ylabel('Words', fontsize = 'xx-large')
plt.title('Words of FOMC minutes from 1993 to 2020', fontsize = 'xx-large')
plt.axis([-1, 224, 0, 4800])
plt.xticks(range(0, 224, 25))
plt.yticks(range(0, 5001, 500))
plt.grid()
plt.axvspan(66, 71, color='lightblue', alpha=0.5, lw=0)
plt.axvspan(120, 132, color='lightblue', alpha=0.5, lw=0)
plt.axvspan(218, 222, color='lightblue', alpha=0.5, lw=0)
crisis_data = [
    (66, '2001/03. Peak. Dot-Com Bubble.'),
    (120, '2007/12. Peak. Financial Crisis.'),
    (218, '2020/02. Peak. COVID-19.')]
x, label = crisis_data[0]
plt.annotate(label, xy=(x, FOMC_words[x] - 600),
             xytext= (x, FOMC_words[x] - 1200),
             arrowprops= dict(facecolor='black', headwidth= 4, width=2, headlength= 4),
             horizontalalignment='left', verticalalignment= 'top', fontsize = 'x-large')
x, label = crisis_data[1]
plt.annotate(label, xy=(x, FOMC_words[x] - 1000),
             xytext= (x, FOMC_words[x] - 1600),
             arrowprops= dict(facecolor='black', headwidth= 4, width=2, headlength= 4),
             horizontalalignment='left', verticalalignment= 'top', fontsize = 'x-large')
x, label = crisis_data[2]
plt.annotate(label, xy=(x, FOMC_words[x] - 1200),
             xytext= (x, FOMC_words[x] - 1700),
             arrowprops= dict(facecolor='black', headwidth= 4, width=2, headlength= 4),
             horizontalalignment='right', verticalalignment= 'top', fontsize = 'x-large')
plt.legend(fontsize = 'large') # Show a legend for the labeled markers
'''
1. fontsize: Either a relative value of 'xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large'
or an absolute font size, e.g., 12.
'''
plt.savefig('10_FOMC_Up_Down_Unchanged.png', dpi= 1000, bbox_inches= 'tight')
plt.close()

Figure 11: More info: up, down or unchanged.

“Up”, “Down”, and “Unchanged” indicate how the target federal funds rate changed after every FOMC meeting.
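
The article loads up_index, down_index, and unchanged_index ready-made from the pickle file. As a sketch of how such index arrays could be derived, assuming a per-meeting array of rate changes in basis points (rate_change_bp and all values below are hypothetical), NumPy's flatnonzero returns the positions where a condition holds:

```python
import numpy as np

# Hypothetical rate changes in basis points, one per meeting.
rate_change_bp = np.array([0, 25, 0, -50, 25, 0, -25, 0])
words = np.array([1200, 1350, 1280, 1500, 1620, 1580, 1710, 1690])  # made-up word counts

up_index = np.flatnonzero(rate_change_bp > 0)
down_index = np.flatnonzero(rate_change_bp < 0)
unchanged_index = np.flatnonzero(rate_change_bp == 0)

# Each index array selects the word counts for its group;
# adding 1 turns 0-based positions into the 1-based ordinals used on the x axis.
for name, idx in [('Up', up_index), ('Down', down_index), ('Unchanged', unchanged_index)]:
    print(name, idx + 1, words[idx])
```

This is also why the plotting code uses up_index+1 for the x values but FOMC_words[up_index] for the y values.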

Finally, here we see this satisfactory line chart again! We can compare it with the graph below.

Figure 12: Effective Federal funds rate. [12]

(11) Reference

[1] Géron, A. (2019). Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). Sebastopol, CA: O’Reilly Media.

[2] McKinney, W. (2017). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (2nd ed.). Sebastopol, CA: O’Reilly Media.

[3] matplotlib.pyplot.plot — Matplotlib 3.3.3 documentation. Retrieved from

[4] matplotlib.font_manager. Retrieved from
https://matplotlib.org/3.1.1/api/font_manager_api.html

[5] Soma, J. (2016). Changing fonts in matplotlib. Retrieved from

[6] List of named colors — Matplotlib 3.1.0 documentation. Retrieved from

[7] matplotlib.pyplot.grid — Matplotlib 3.3.3 documentation. Retrieved from

[8] matplotlib.pyplot.axvspan — Matplotlib 3.3.3 documentation. Retrieved from

[9] matplotlib.pyplot.annotate — Matplotlib 3.3.3 documentation. Retrieved from

[10] FOMC minutes. Retrieved from

[11] FOMC’s target federal funds rate or range, change (basis points) and level. Retrieved from

[12] Effective Federal Funds Rate. Retrieved from

[13] US Business Cycle Expansions and Contractions. Retrieved from

Yu-Cheng Kuo

ML/DS using Python & R. A Taiwanese earned MBA from NCCU and BS from NTHU with MATH major & ECON minor. Email: yc.kuo.28@gmail.com