Data Warehouse Testing
This week we continue to look at the different software testing types. Today we break down data warehouse testing. This process involves comparing of millions of data records typically found within a database or large file.
What is a data warehouse (DW)?
Simply put, it is a system used to hold an organisation’s historical data. It allows users to then process and analyse the data for business decision support. The data held is typically that which is used for:
- Customer Relationship Management (CRM)
- Enterprise Resource Planning (ERP)
- Online Transaction Processing (OLTP)
There is often confusion over what differentiates a data warehouse from a database. The former is optimised towards the querying and reading of large data sets. The latter is geared towards maximising the overall efficiency of updates.
What is data warehouse testing?
The success of any organisation’s DW solution relates directly to the quality of data held. If they wish to validate this, then testing must be performed.
In order to do this then, a process known as ‘Extract, Transform and Load’ (ETL) is deployed. This loads data from source systems into the DW, accurately transforming it to then match the target schema and also verifying it along the way.
Top Tip: What are some of the best practices?
During the initial analysis phase, the testing team must familiarise themselves with the different stages of the DW implementation:
- DW design
- Source data profiling
- ETL development
- Data loading…
- Then transformations
Bonus tip: During the ETL process, data flows through a pipeline which must also then be tested to ensure all data types are transformed as expected.
What are the different phases of ETL testing?
- Analysis of core business requirements
- Test plan creation with timelines
- Test case design / test data selection
- Tests execution with reporting
- Report summary with analysis…
- Then test completion
The six phases are typically adopted in the planning stage. Testing itself is then carried out across five phases:
1. Identifying data sources / requirements
2. Data acquisition
3. Implementation of business logic and dimensional modelling
4. Data build…
5. Then creating reports
Common ETL testing types
Metadata: Verifies the table definitions against data model and application design specifications
Completeness: Confirms that all data has been loaded from the source
Quality: Validates data accuracy
Transformation: White box and black box; white box examines program structure and then black box examines the functionality of an application
Regression: Confirms that the ETL produces the same output for an input before and then after a change
Incremental: Confirms that updates on sources are being loaded into the system correctly
Integration: End-to-end testing within the ETL process as a whole
Performance: This determines then whether or not the overall process is working with the expected volumes of data.
Data warehouses are a central source of information then, and as such are of critical importance to any organisation. Over the last few weeks, we have seen that testing is an essential aspect of IT maintenance in general. This is especially true in today’s data-driven world. If you would like more information then feel free to contact us.
eTestware is part of theICEway ecosystem of companies. We work with your IT teams to deliver the right results. We’ll take time to understand your objectives and then work with you to achieve them.
Next week: Automation Testing