Mandatory Simple Knowledge to Survive Data Analyst Roles — Part 2

Astry Citrasari
6 min read · Jan 26, 2024


Everyone can be a Data Analyst, but surviving the role is a different story.

If you haven’t checked Part 1 of this article, please click the link above. In this part, I will continue explaining points 3 to 5 of the simple knowledge you need to survive as a Data Analyst.

  1. Network Shared Folder
  2. Basic Server Management
  3. Database Configuration
  4. Integration and Automation Tool
  5. (Optional) API — Application Programming Interface

Brief Explanation

This article covers these topics only at a surface level. If you would like to understand the skills more deeply, look for a credible online course (e.g., Coursera, Udemy) to learn more.

3. Database Configuration

Photo by Jan Antonin Kolar on Unsplash

Most tech and non-tech companies start using a database server to store their data. From that point, they start hiring employees to form a Data Team. In the early stage of building a database, a company needs a person who is familiar with database architecture, someone who understands the fundamentals of Relational Database Management Systems (RDBMS). After the database has been built, a Database Administrator team will handle the day-to-day database operations.

Why is this important for a Data Analyst?

Don’t worry, the work described in the last paragraph is not what a Data Analyst does. A Data Analyst only needs to know how to get into the databases. In my experience with on-premises databases, there are several authentication modes to go through before you can log in, and they depend on which database provider the company uses.

For an on-premises Oracle Database, the database schema is the first piece of information you need before requesting access from the IT team. In a company there can be dozens of schemas, and only employees who are authorized to see the data will be granted access. After your access request is approved, you will be given a username and password along with the service name and the server name/host where the database is located. (Please see the Basic Server Management knowledge in Part 1.)

As for client tools, there are two common tools to access Oracle: Oracle SQL Developer and PL/SQL Developer. Here is the sign-in/log-in UI for both tools.

Oracle SQL Developer. Source: https://stackoverflow.com/questions/34234166/oracle-can-log-into-sqlplus-using-hr-account-but-unable-to-log-in-using-sql-deve
PL/SQL Developer. Source: https://incorporatedsapje.weebly.com/pl-sql-developer-oracle-logon.html
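If you prefer a script over a GUI tool, the same information (username, password, host, and service name) is what a Python connection needs. Below is a minimal sketch using the python-oracledb library; the host, port, service name, and credentials are placeholders I made up, not real values, so replace them with the details the IT team gives you.

```python
# Minimal sketch of connecting to an on-premises Oracle database with python-oracledb.
# The host, port, service name, and credentials are placeholders (assumptions).
import oracledb

# Easy Connect string: host:port/service_name
dsn = "db-host.example.com:1521/ORCLPDB1"

with oracledb.connect(user="analyst_user", password="secret", dsn=dsn) as conn:
    with conn.cursor() as cur:
        # Simple query to confirm the connection works
        cur.execute("SELECT sysdate FROM dual")
        print(cur.fetchone())
```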

There are cases where your device is not ready to connect to the database, even though you already have all the information ready to key in. You will still not be able to log in if the TNSNAMES.ora file is missing from your configuration. Please make sure this .ora file is available in the config folder where your Oracle tools are installed. For a clearer explanation of TNSNAMES.ora, please check the link below:

For an on-premises SQL Server database, the database server name is the first piece of information you need. SQL Server is a Microsoft product, so the authentication mode can be tied to your Windows account (Windows Authentication).

SQL Server log-in with Windows Authentication. Source: https://windows-hexerror.linestarve.com/q/so45940406-logging-into-local-sql-server-instance-with-windows-auth

The other authentication mode for SQL Server is SQL Server Authentication, where the IT team gives you a username and password.

SQL Server log-in with SQL Server Authentication. Source: https://windows-hexerror.linestarve.com/q/so45940406-logging-into-local-sql-server-instance-with-windows-auth
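The same two modes apply when connecting from a script. Here is a minimal sketch using pyodbc; the driver name, server, database, and credentials are placeholder assumptions, so use whatever the IT team provides.

```python
# Minimal sketch of the two SQL Server authentication modes using pyodbc.
# Server, database, driver name, and credentials are placeholders (assumptions).
import pyodbc

# Windows Authentication: the connection reuses your Windows account.
conn_windows = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sql-host.example.com;DATABASE=SalesDB;"
    "Trusted_Connection=yes;"
)

# SQL Server Authentication: username and password supplied by the IT team.
conn_sql = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sql-host.example.com;DATABASE=SalesDB;"
    "UID=analyst_user;PWD=secret;"
)

for conn in (conn_windows, conn_sql):
    cursor = conn.cursor()
    cursor.execute("SELECT @@SERVERNAME")  # simple check that the connection works
    print(cursor.fetchone())
    conn.close()
```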

4. Integration and Automation Tool

Photo by Hunter Harritt on Unsplash

Integration and automation are needed to synchronize or transfer data from one data point to another, which is known as Data Integration. Before a Data Analyst processes data, that data is collected from several sources. Data sources vary, so an integration system is needed to collect them into one storage location, and that task should be automated.

Data integration and automation should not be the responsibility of the Data Analyst; the Data Engineer team will build and maintain the integration, which is also called ETL (Extract, Transform, Load).

Why is this important for a Data Analyst?

A Data Analyst must have an understanding of:

  • The source and destination of the data flow (upstream and downstream)
  • The type of data processing (batch or real time)
  • The frequency at which data is processed from the source (if it runs as a batch process)

Take the example of a web application where people make bank transactions. This application has an online database that is directly connected to it and stores every customer transaction. We call this database OLTP (Online Transaction Processing). The OLTP database cannot be queried freely, and only a few employees are able to access it: if many employees query the OLTP database, it will hurt performance and make the application slow. A Data Analyst will not have access to this database because its function is not reporting or business intelligence.

Once data from the application is stored in the OLTP database, the raw data is transferred into an ODS (Operational Data Store) database. The integration method used to synchronize OLTP and ODS varies, and it usually runs as batch processing, meaning there is a job/procedure that updates the data in the ODS frequently (e.g., every 15 minutes, hourly, or every 3 hours). Some reporting can take its data from the ODS database. However, the Data Analyst then has to deal with complex processing, because the data in the ODS is raw and sometimes not human readable.
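Knowing that frequency lets you sanity-check how fresh the ODS data is before using it in a report. Below is a minimal sketch of such a check; the table name (ods_transactions), the load_timestamp column, and the connection details are hypothetical placeholders, not a real schema.

```python
# Minimal sketch of checking how fresh the ODS data is before reporting on it.
# Table name, column, and connection details are hypothetical placeholders.
from datetime import datetime, timedelta

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sql-host.example.com;DATABASE=ODS;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.execute("SELECT MAX(load_timestamp) FROM ods_transactions")
last_load = cursor.fetchone()[0]
conn.close()

# If the batch job runs every 15 minutes, anything older than that is worth investigating.
if last_load is None:
    print("No rows found in ods_transactions")
elif datetime.now() - last_load > timedelta(minutes=15):
    print(f"Warning: ODS data may be stale, last load was at {last_load}")
else:
    print(f"ODS data is fresh, last load at {last_load}")
```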

ETL processing usually happens between the ODS and the next database, which we call the DWH (Data Warehouse). Complex processing, cleaning, and formatting are applied to the raw data from the ODS so that it serves the purpose of each end user (Finance, Supply Chain, or Production). The ETL runs less frequently than the data transfer from OLTP to ODS because it loads a large amount of data and calculation; the DWH is usually refreshed once or twice a day. The DWH is the playground of the Data Analyst. A Data Analyst must fully understand how their DWH works in order to explain data behavior in reports to end users. In addition, inconsistency or data loss can be prevented once the Data Analyst understands the bigger picture of the data flow.
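To make the ODS-to-DWH step concrete, here is a minimal ETL sketch using pandas and SQLAlchemy. The connection strings, table names (ods_transactions, dwh_daily_sales), and the aggregation are all hypothetical illustrations; a real pipeline built by the Data Engineer team would be far more robust than this.

```python
# Minimal ETL sketch: extract raw rows from a hypothetical ODS table, transform them,
# and load the result into a hypothetical DWH table. All names and connection strings
# are placeholders (assumptions), not an actual pipeline.
import pandas as pd
from sqlalchemy import create_engine

ods_engine = create_engine(
    "mssql+pyodbc://sql-host.example.com/ODS"
    "?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes"
)
dwh_engine = create_engine(
    "mssql+pyodbc://sql-host.example.com/DWH"
    "?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes"
)

# Extract: pull yesterday's raw transactions from the ODS.
raw = pd.read_sql(
    "SELECT customer_id, amount, transaction_date FROM ods_transactions "
    "WHERE transaction_date >= CAST(GETDATE() - 1 AS DATE)",
    ods_engine,
)

# Transform: clean and aggregate into a shape reporting users can consume.
raw["transaction_date"] = pd.to_datetime(raw["transaction_date"]).dt.date
daily_sales = (
    raw.dropna(subset=["amount"])
       .groupby(["transaction_date", "customer_id"], as_index=False)["amount"]
       .sum()
)

# Load: append the aggregated rows into the DWH table.
daily_sales.to_sql("dwh_daily_sales", dwh_engine, if_exists="append", index=False)
```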

Data Integration Flow. Source: https://streamsets.com/learn/data-integration/

5. API — Application Programming Interface

Photo by Tamas Tuzes-Katai on Unsplash

While integration defines which data point the data moves from and to, an API defines the rules for how the data moves from one data point to another. APIs are commonly used when an application needs a third-party source, for example when data is obtained from an external organization or an outside company. An API also helps secure the data flow because access to it can be controlled through a firewall.

Why is this important for a Data Analyst?

A Data Analyst doesn’t always get data from internal sources. As the analysis gets deeper and larger, external datasets are needed. For instance, when an analysis involves worldwide locations, you may connect to the OpenStreetMap API. Another example is extracting data from websites, which is known as web scraping. APIs help the Data Analyst reach a wider range of data sources.
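As a small, concrete example of pulling external data through an API, here is a sketch of a geocoding call to OpenStreetMap's public Nominatim service using the requests library. The query and User-Agent string are placeholders, and Nominatim has a usage policy (rate limits, a required descriptive User-Agent), so check it before using this in real work.

```python
# Minimal sketch of calling the OpenStreetMap Nominatim geocoding API with requests.
# The User-Agent string below is a placeholder; Nominatim requires a descriptive one.
import requests

response = requests.get(
    "https://nominatim.openstreetmap.org/search",
    params={"q": "Jakarta, Indonesia", "format": "json", "limit": 1},
    headers={"User-Agent": "data-analyst-demo/0.1 (contact@example.com)"},
    timeout=10,
)
response.raise_for_status()

results = response.json()
if results:
    place = results[0]
    print(place["display_name"], place["lat"], place["lon"])
```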

APIs are not always free to use; advanced APIs often come with a cost. Company policy also affects API usage, since some companies prohibit external APIs for security reasons. So, a Data Analyst must consult the IT security team about the external API usage policy.

Common APIs used for data analysis:

https://www.springboard.com/blog/data-science/top-apis-for-data-scientists/

That is the additional knowledge to help you survive and grow in your role as a Data Analyst. The data field carries a large set of tasks and responsibilities that cannot be borne by a single employee. The best practice for a company is to have separate Data Engineers and Data Analysts, plus a dedicated IT Infrastructure team to support all of the Data team's requirements.

Hope this helps you on your journey to becoming a data expert.

--

Astry Citrasari

Developed automation using Alteryx and Python. If you love my article, don't forget to support me: https://www.buymeacoffee.com/astrycitrao