【Data Analysis(1)】- Numpy, Pandas

Use NumPy and Pandas to take your first step in data analysis

TEJ 台灣經濟新報
TEJ-API Financial Data Analysis
7 min read · Mar 29, 2021

After reading our previous articles, you probably already know how to get data from the TEJ API, store it on your computer, and update it automatically! Next, we will show you how to analyze that data with two important packages: Numpy and Pandas.

✨ Highlights of this article ✨

  • 🌟 Numpy Intro/Application
  • 🌟 Pandas Intro/Application

* What is Numpy? How to use it?

Numpy is designed to process large, n-dimensional data arrays conveniently and efficiently. With its built-in functions, users can perform quick, preliminary data processing.

  • Basic Application-Single Dimension

Examples above:

  1. Create an array with a float data type and an array with a string data type.
  2. Use the np.arange() function to create an array that starts at 0, ends at 2, and has a step of 2.
  3. In Python, “[]” means select, and “:” means to…. Note that in Python the first element is at position 0, not 1. Therefore, c[2:] selects the elements from position 2 to the end (including the last element).
  4. Same as above, but if we change c[2:] to c[:2], we select the elements from the start up to position 1 (position 2 is not included)!
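Since the original code screenshot is not reproduced here, the four items can be sketched roughly as follows. The array values and the np.arange() bounds are made up for illustration and are not the article's own numbers.

```python
import numpy as np

# Item 1: a float array and a string array (illustrative values)
a = np.array([1.0, 2.0, 3.0])
s = np.array(["x", "y", "z"])
print(a.dtype)       # float64
print(s.dtype.kind)  # U (unicode string)

# Item 2: np.arange(start, stop, step); the stop value itself is excluded.
# Hypothetical bounds, chosen so the slices below have something to show.
c = np.arange(0, 10, 2)
print(c)             # [0 2 4 6 8]

# Item 3: positions start at 0, so c[2:] runs from position 2 to the end
print(c[2:])         # [4 6 8]

# Item 4: c[:2] runs from the start up to position 1 (position 2 excluded)
print(c[:2])         # [0 2]
```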
  • Mathematical Tools

Examples above:

  1. Sum of array a; average; standard deviation; cumulative sum
  2. Add the elements of array a at corresponding positions; multiply them at corresponding positions

The first example uses numpy built-in functions for the calculation. The second example shows numpy's vectorized computation. If we multiply a list (2–1) by 2, the list is repeated, so the number of elements doubles instead of the values. With a numpy array (2–2, 2–3), mathematical operations are applied to the elements at corresponding positions in the array.💪💪
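A minimal sketch of the two examples, assuming a small integer array in place of the article's original data:

```python
import numpy as np

a = np.array([1, 2, 3, 4])   # illustrative values

# Example 1: built-in aggregate functions
total = a.sum()              # 10
mean = a.mean()              # 2.5
std = a.std()                # population standard deviation (ddof=0 by default)
cum = a.cumsum()             # [ 1  3  6 10]

# Example 2: list repetition vs. vectorized arithmetic
lst = [1, 2, 3, 4]
doubled_list = lst * 2       # [1, 2, 3, 4, 1, 2, 3, 4]: the list is repeated
doubled_arr = a * 2          # [2 4 6 8]: every value is doubled
added = a + a                # [2 4 6 8]: element-wise addition
multiplied = a * a           # [ 1  4  9 16]: element-wise multiplication
```

Note that a.std() divides by n, while pandas' std() divides by n-1 by default, so the two can give slightly different results on the same data.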

  • Basic Application-Multiple Dimensions

Examples above:

  1. Select the first row of array b; select the second element of the first row of the array b; row sum of array b
  2. Shape (2*15) of array b; change to a new shape (2*15 -> 5*6)

Next, let’s look at how numpy handles multi-dimensional arrays. We again use “[]” to select. The difference is that there are more dimensions to select from, so we can chain two brackets, “[][]”, to pick the row and then the position within it. For matrix operations, we can check an array’s shape attribute and use reshape() to get the shape the calculation needs.💪💪
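As a rough sketch, assuming array b simply holds the integers 0 to 29 (the article's actual values are not shown), the selections and the reshape look like this:

```python
import numpy as np

# A 2 x 15 array filled with 0..29 (illustrative content)
b = np.arange(30).reshape(2, 15)

first_row = b[0]             # the whole first row
elem = b[0][1]               # second element of the first row; same as b[0, 1]
row_sums = b.sum(axis=1)     # one sum per row

print(elem)                  # 1
print(row_sums)              # [105 330]
print(b.shape)               # (2, 15)

# reshape keeps the 30 elements and only changes the layout
reshaped = b.reshape(5, 6)
print(reshaped.shape)        # (5, 6)
```

reshape() only works when the total number of elements stays the same, which is why 2*15 can become 5*6.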

  • Other Applications-Boolean, Random Variables, Financial Functions

Examples above:

  1. Boolean:
    We can apply an inequality (greater than 15 in the example) directly to a numpy array to get the corresponding True/False array, or use the np.where() function to map the result to other values (True becomes 1 and False becomes 0 in the example).
  2. Random Variables:
    We can draw random variables from different statistical distributions, such as the normal distribution in the example (mean 5, std 2, 10 elements), the standard normal distribution, and so on.
  3. Financial Functions:
    There is also a numpy package designed for financial functions such as fv, pv, and irr, which are used when discounting. This package has to be installed separately. All functions included in this package can be checked HERE.
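The three items can be sketched as below. The array values, the seed, and the cash flows are illustrative assumptions. The article does not name the separately installed financial package (numpy-financial is the likely candidate, but that is an assumption), so the discounting step here is written by hand with plain numpy instead of calling it.

```python
import numpy as np

# 1. Boolean: compare directly, or map True/False to 1/0 with np.where
a = np.array([5, 10, 15, 20, 25])        # illustrative values
mask = a > 15                            # [False False False  True  True]
flags = np.where(a > 15, 1, 0)           # [0 0 0 1 1]

# 2. Random variables: normal (mean 5, std 2, 10 draws) and standard normal
rng = np.random.default_rng(seed=0)      # seed chosen arbitrarily
normal_draws = rng.normal(loc=5, scale=2, size=10)
std_normal_draws = rng.standard_normal(10)

# 3. Discounting by hand: present value of three yearly cash flows at 5%
rate = 0.05                              # hypothetical discount rate
cash_flows = np.array([100.0, 100.0, 100.0])
periods = np.arange(1, len(cash_flows) + 1)
present_value = (cash_flows / (1 + rate) ** periods).sum()
print(round(present_value, 2))           # 272.32
```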

Numpy has many applications in data processing, so we cannot cover all of them in a single article😢. If you are interested in numpy, you can browse the Numpy Official Website or leave a message below!💪💪

* What is Pandas? How to use it?

Pandas is a package that specializes in analyzing tabular data. Much like Excel, it presents data in a format called a DataFrame, which makes analysis more convenient, especially for financial time series data.

  • Basic Application

With the code above, we create a table with the column name “Numbers” and the row names “index_a, b, c, and d”.
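The original code is not reproduced here, so the following is only a sketch of such a table. The values are made up, and the row labels are assumed to be index_a through index_d.

```python
import pandas as pd

# One column named "Numbers", four labeled rows (values are illustrative)
df = pd.DataFrame(
    {"Numbers": [10, 20, 30, 40]},
    index=["index_a", "index_b", "index_c", "index_d"],  # assumed labels
)
print(df)
```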

Examples above:

  1. Use loc and iloc to find the corresponding value. Note that loc selects by the name (label) of the row/column, so we have to enter the name when selecting, while iloc selects by the position of the element. For example (1–2), select the elements from the start to position 1 (position 2 is not included!).
  2. Add, select, and delete a column
  3. Sum of the whole df; average; standard deviation

As with the numpy arrays mentioned earlier, in Pandas we also use brackets, [“column name”], to select or add columns. To delete a column, however, we have to use the drop() function. A pandas DataFrame can also perform basic statistical calculations across the table.💪💪
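A minimal sketch of these operations, assuming the same small illustrative table (the "Doubled" column name is hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    {"Numbers": [10, 20, 30, 40]},
    index=["index_a", "index_b", "index_c", "index_d"],  # assumed labels
)

# 1. loc selects by label, iloc selects by position
by_label = df.loc["index_a", "Numbers"]   # 10
by_position = df.iloc[:2]                 # rows at positions 0 and 1 only

# 2. Add, select, and delete a column
df["Doubled"] = df["Numbers"] * 2         # add
selected = df["Doubled"]                  # select
df = df.drop(columns=["Doubled"])         # delete; drop returns a new DataFrame

# 3. Column-wise statistics
total = df.sum()                          # Numbers -> 100
mean = df.mean()                          # Numbers -> 25.0
std = df.std()                            # sample standard deviation (ddof=1)
```

drop() does not change the DataFrame in place by default, so the result has to be assigned back.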

  • Basic Data Analysis

Descriptive Statistics Table

The sample data for our pandas analysis is the daily stock price of 2330.TW, obtained from the TEJ API. Most of the statistics we may need later can be obtained with the describe() function (figure above👆). If we want to perform operations on these values, we can apply numpy functions directly to the entire table!
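As a rough sketch that does not call the TEJ API, a small made-up price table can stand in for the 2330.TW data. The column names and all numbers below are placeholders, not real quotes.

```python
import numpy as np
import pandas as pd

# Placeholder prices; NOT real 2330.TW data
prices = pd.DataFrame(
    {
        "open": [530.0, 536.0, 555.0, 564.0, 580.0],
        "close": [536.0, 542.0, 549.0, 565.0, 580.0],
    },
    index=pd.date_range("2021-01-04", periods=5, freq="B"),
)

# describe() returns count, mean, std, min, quartiles, and max per column
stats = prices.describe()
print(stats)

# numpy functions apply to the whole table and return a DataFrame
log_prices = np.log(prices)
```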

Stock Price (Daily)

Last comes data visualization. There are several ways to plot a graph in Python, and Pandas provides a very easy one! If the chart we want is not complicated, such as a simple daily stock price or daily return, we can select the column and call the plot() function to see the result directly! (figure above👆)

The only thing to note here is that the X axis of the chart is the index and the Y axis is the data you select. That’s why we first process the raw data with the set_index() function.
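The idea can be sketched as follows, again with placeholder prices and hypothetical column names ("date", "close") rather than the real TEJ data. pandas' plot() relies on matplotlib, so the plotting step is skipped if matplotlib is not installed.

```python
import pandas as pd

# Raw data with the date as an ordinary column (placeholder values)
raw = pd.DataFrame(
    {
        "date": pd.date_range("2021-01-04", periods=5, freq="B"),
        "close": [536.0, 542.0, 549.0, 565.0, 580.0],
    }
)

# Make the date the index so it becomes the X axis of the chart
df = raw.set_index("date")

# Daily return, another simple series that can be plotted the same way
daily_return = df["close"].pct_change()

try:
    import matplotlib  # noqa: F401  (plotting backend used by pandas)
    ax = df["close"].plot(title="Stock Price (Daily)")
except ImportError:
    ax = None  # plotting skipped: matplotlib is not available
```

In a notebook the chart appears right after the plot() call; in a plain script you would also need to show or save the figure.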

* Conclusion

This time we shared how to use the Numpy and Pandas packages for data analysis. It is difficult to explain every function in these two packages in one article, so if you have any questions or are interested in a particular topic, you can visit their websites or leave a message below ❗️❗️ In the next article we will go further into financial data analysis and its applications, so please look forward to it ❗️❗️

Finally, if you like this topic, please click 👏 below to give us more support and encouragement. If you have any questions or suggestions, please leave a message or email us, and we will do our best to reply.👍👍

If you have any questions or difficulties, do not hesitate to contact us: Contact Information

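TEJ is Taiwan's largest local financial information company. Founded in 1990, it provides the information needed for fundamental analysis of financial markets, along with solutions and consulting services in credit risk, regulatory technology, asset valuation, quantitative analysis, and ESG. As the field of finance grows more diverse and complex, TEJ brings together top talent from industry and academia, works on new technologies such as machine learning, artificial intelligence (AI), and natural language processing (NLP), and continues to provide innovative services.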