Data Warehouse Testing
This week brings the latest in our series of articles looking at the different software testing types. Today we break down data warehouse testing, a process which involves the comparing of millions of data records typically found within a database or large file.
What is a data warehouse (DW)?
Simply put, it is a system used to hold an organisation’s historical data which allows users to process and analyse it for business decision support. The data held is typically that used for Customer Relationship Management (CRM), Enterprise Resource Planning (ERP) and Online Transaction Processing (OLTP).
There is often confusion over what differentiates a data warehouse from a database; the former is optimised towards the querying and reading of large data sets, whilst the latter is geared towards maximising the overall efficiency of updates.
What is data warehouse testing?
The success of any organisation’s DW solution relates directly to the quality of data held and in order to validate this, testing must be performed.
In order to do this, a process known as ‘Extract, Transform and Load’ (ETL) is deployed to load data from source systems into the DW, accurately transforming it to match the target schema and also verifying it along the way.
Top Tip: What are some of the best practices?
During the initial analysis phase, the testing team must familiarise themselves with the different stages of the DW implementation: DW design, source data profiling, ETL development, data loading and transformations.
Bonus tip: During the ETL process, data flows through a pipeline which must also be tested to ensure that all data types are transformed as expected.
What are the different phases of ETL testing?
- Analysis of core business requirements,
- Test plan creation with timelines,
- Test case design / test data selection,
- Tests execution with reporting,
- Report summary with analysis,
- Test completion.
Traditionally, the six phases listed here are adopted in the planning stage, with testing itself then carried out across five phases: 1. Identifying data sources / requirements, 2. Data acquisition, 3. Implementation of business logic and dimensional modelling, 4. Data build, 5. Reports created.
Common ETL testing types
Metadata: Verifies the table definitions against data model and application design specifications
Data Completeness: Confirms that all data has been loaded from the source
Data Quality: Validates data accuracy
Data Transformation: White box and black box; white box examines program structure and black box examines the functionality of an application
Regression: Confirms that the ETL produces the same output for an input before and after a change
Incremental: Confirms that updates on sources are being loaded into the system correctly
Integration: End-to-end testing within the ETL process as a whole
Performance: Similar to integration testing, this determines whether or not the overall process is working with the expected volumes of data.
Data warehouses are a central source of information, and as such are of critical importance to any organisation. Over the last few weeks, we have seen that testing is an essential aspect of IT maintenance in general – especially in today’s data-driven world. For more information, contact us.
eTestware is part of theICEway ecosystem of companies along with ICE and CRIBB Cyber Security. Together, we work with your IT teams to deliver the right results for your business.
Next week: Automation Testing