Best ETL Testing Training In Chennai Adyar
DB Testing vs. ETL Testing
Most of us are little confused over considering that both database testing and the ETL testing are similar and same. The fact is they are similar but not same.
DB Testing is usually used extensively in the business flows where there are multiple data flows occurring in the application from multiple data sources on to a single table. The data source can be a table, flat file, application or anything else that can yield some output data. In turn the output data obtained can still be used as input for the sequential business flow. Hence when we perform DB testing the most important thing that has to be captured is the way the data can get transformed from the source along with how it gets saved in the destination location.
Synchronization is one major and the essential thing that has to be considered when performing the DB testing. Due to the positioning of the application in the architectural flow, there might be few issues with the data or DB synchronization. Hence while performing the testing, this has to be taken care as this can overcome the potential invalid defects or bugs.
Project “A” has integrated architecture where particular application makes use of data from several other heterogeneous data sources. Hence the integrity of these data with the destination location has to be done along with the validations for the following:
- Primary foreign key validation
- Column values integrity
- Null values for any columns
ETL testing is a special type of testing that the client wants to have it done for their forecasting and analysis of their business. This is mostly used for the reporting purposes. For instance if the clients need to have reports on the customers who use or go for their product based on the day they purchase, they have to make use of the ETL reports.
Post analysis and reporting, this data is data warehoused to a data warehouse where the old historical business data has to be moved.
This is a multiple level testing as the data from the source is transformed into multiple environments before it reaches the final destined location.
We will consider a group “A” doing retail customer business through a shopping market where the customer can purchase any household items required for their day to day survival. Here all the customers visiting are provided with a unique membership id with which they can gain points every time they come to purchase things from the shopping market. The regulations provided by the group say that the points gained expire every year. And depending upon their usage, the membership can be either upgraded to a higher grade member or downgraded to a lower grade member comparatively to the current grade. After 5 years of shopping market establishment now management is looking for scaling up their business along with revenue.
Hence they required few business reports so that they can promote their customers.
In database testing we perform the following:
1) Validations on the target tables which are created with columns with logical calculations as described in the logical mapping sheet and the data routing document.
2) Manipulations like Inserting, updating and deletion of the customer data can be performed on the any end user POS application in an integrated system along with the back end database so that the same changes are reflected in the end system.
3) DB testing has to ensure that there is no customer data that has been misinterpreted or even truncated. This might lead to serious issues like in correct mapping of customer data with their loyalty
In ETL testing we check for the following:
1) Assuming there are 100 customers in the source, you will check whether all these customers along with their data from the 100 rows have been moved from the source system to the target. This is known as verification of Data completeness check.
2) Checking if the customer data has been properly manipulated and demonstrated in the 100 rows. This is simply called as verification ofData accuracy check.
3) Reports for the customers who have gained points more than x values within the particular period.
Comparative study of ETL and DB testing
ETL Testing Training In Chennai and DB testing have few of the aspects differing within themselves that is more essential to be understood before performing them. This helps us in understanding the values and significance of the testing and the way it helps the business.
Following is a tabular form that describes the basic behaviour of both the testing formats.
DB TestingETL TestingPrimary goalData integrationBI ReportingApplicable placeIn the functional system where the business flow occursExternal to the business flow environment. input is the historical business dataAutomation toolQTP, SeleniumInformatica, QuerySurge, COGNOSBusiness impactSevere impacts can lead as it is the integrated architecture of the business flowsPotential impacts as in when the clients wants to have the forecasting and analysis to be doneModelling usedEntity RelationshipDimensionalSystemOnline Transaction ProcessingOnline Analytical ProcessingData NatureNormalized data is being used hereDenormalized data is being used here
Why should the business go for ETL?
Plenty of business needs are available for them to consider ETL testing. Every business has to have their unique mission and the line of business. All business has their product life cycle which takes the generic form:
It is very clear that any new product enters the market with a tremendous growth in sales and till a stage called maturity and thereafter it declines in sales. This gradual change witnesses a definite drop in business growth. Hence it is more important to analyze the customer needs for the business growth and other factors required to make the organisation more profitable. So in reality, the clients want to analyze the historical data and come up with some reports strategically.
One of the main steps in ETL testing is about planning the test that is going to be executed. It will be similar to the test plan for the system testing that is usually performed except few attributes like requirements and test cases.
Here the requirements are nothing but a mapping sheet that will have kind of mapping between data within different databases. As we are aware that the ETL testing occurs in multiple levels, there are various mappings needed for validating this.
Most of the time the data being captured from the source databases is not directly. All the source data will have the tables’ view from where the data can be used.
Examples: Following is an example of how the mappings can be provided. The two columns VIEW_NAME and TABLE_NAME can be used to represent the views for reading data from the source and the table in the ETL environment respectively. It is advisable to maintain naming convention that can help us while planning for automation. Generic notation that can be used is just prefixing the name of the environment.
— — — — — —
Most significant thing in ETL is about identifying the essential data and the tables from the source. The next essential step is the mapping of tables from source to the ETL environment.
Following is an example how the mapping between the tables from the various environments can be related for the ETL purpose.
The above mapping assumes the data from the source table to the stage table. And from then on to the tables in EDW and then to OLAPwhich is the final reporting environment. Hence at any point of time, data synchronization is very important for the ETL sake.
Critical ETL needs
As we understand ETL is the need for forecasting, reporting and analysing of the business in order to capture the customer needs in a more successive manner. This will enable the business to have higher demands than the past.
Here are few of the critical needs without which ETL testing cannot be achieved:
- Data and tables identification — This is important as there can be many other irrelevant and unnecessary data that can be of least importance when forecasting and analysing the customer needs. Hence the relevant data and the tables have to be selected before starting up the ETL works.
- Mapping sheet — This is one of the critical needs while doing ETL works. Mapping of the right table from the source to the destination is mandatory and any problems or incorrect data in this sheet might impact the whole ETL deliverable.
- Table designs and data, column type — This is next major step when considering the mapping of source tables into the destined tables. The column type has to match with the tables at both the places etc.
- Database access — Main thing is the access to the database where ETL goes on. Any restrictions on the access will have equivalent impact.
ETL reporting and testing
Reporting in ETL is more important as it explains and directs the clients the customer needs. By this they can forecast and analyse the exact customer needs
A company which manufactures silk fabric wanted to analyse on their annual sales. On review of their annual sales, they found during the month of August and September there was tremendous fall in sales with the use of the report they generated. Hence they decided to roll out promotional offer like exchange, discounts etc., that enhanced their sales.
Basic issues in ETL testing
There can be number of issues while performing ETL testing like the following:
- Either the access to the source tables or the views will not be valid.
- The column name and the data type from the source to the next layer might not match.
- Number of records from the source table to the destined tabled might not match.
And there might be many more..
Following is a sample of mapping sheet where there are columns like VIEW_NAME, COLUMN_NAME, DATA_TYPE, TABLE_NAME, COLUMN_NAME, DATA_TYPE, and TRANSFORMATION LOGIC present.
The first 3 columns represent the details of the source database and the next 3 are the details for the immediate preceding database. The last column is very important. Transformation logic is the way the data from the source is read and stored in the destined database. This depends on the business and the ETL needs.
Points to remember while ETL test planning and execution
The most important thing in ETL testing is loading of data based on the extraction criteria from the source DB. When this criterion is invalid or obsolete then there will be no data in the table to perform ETL testing that really brings in more issues.
Following are few of the points to be taken care while ETL test planning and execution:
#1: Data is being extracted from the heterogeneous data sources
#2: ETL process handling in the integrated environment that have different:
- Communication protocols
#3: Necessity in having a logical data mapping sheet before the physical data can be transformed
#4: Understanding and examining of the data sources
#5: Initial load and the incremental load
#6: Audit columns
#7: Loading the facts and the dimensions
ETL Tools and their significant usage
ETL tools are basically used to build and convert the transformation logic by taking data from the source into another applying the transformation logic. You can also map the schemas from the source to the destination which occurs in unique ways, transform and clean up data before it can be moved into the destination, along with loading at the destination in an efficient manner.
This can significantly reduce the manual efforts as the mapping can be done that is used for almost all of the ETL validation and the verification.