Checking two tables for equality

When developing DataWareHouses you often have to check if a new/optimized ETL has same results as the old version. So you have to check, if two database tables contain the same rows.

For two tables at the same database, or databases that are connected, this can be done by the following SQL statements.

1. Does the old table contains rows, that the new table doesn’t:

SELECT * FROM OLD_TABLE
EXCEPT
SELECT * FROM NEW_TABLE

when this statement returns an empty resultset, all old rows exist in the new table.

2. Does the new table contains rows, that the old table did not:

SELECT * FROM NEW_TABLE
EXCEPT
SELECT * FROM OLD_TABLE

when this statement returns an empty resultset, all new rows existed in the old table.

Do both tables contain keys, this is all. IF there are/can be duplicate rows, following statements are needed, cause EXCEPT works on distinct rows:

3. Do both tables contain the same count of rows:

SELECT count(*) FROM OLD_TABLE
EXCEPT
SELECT count(*) FROM NEW_TABLE

when this statement returns an empty resultset, the rowcount in both tables is equal.

4. Do both tables contain the same duplicate rows :

The following situations would lead to differences. There are duplicate rows in both tables, but not the same rows are duplicated in every table:

OLD_TABLE: NEW_TABLE:
COL_A COL_B
VALUE 1 VALUE 2
VALUE 1 VALUE 2
VALUE 3 VALUE 4
COL_A COL_B
VALUE 1 VALUE 2
VALUE 3 VALUE 4
VALUE 3 VALUE 4

This can be checked by the following statement

SELECT COL_A, COL_B, count(*) FROM OLD_TABLE GROUP BY COL_A, COL_B
EXCEPT
SELECT COL_A, COL_B, count(*) FROM NEW_TABLE GROUP BY COL_A, COL_B

You don’t need to execute this statement in both directions, cause you eliminated the possibility of different rows already in step 1 & 2.

Hint: When adding count(*) and GROUP BY in steps 1 & 2, step 4 can be skipped.

Tagged , , , , ,

Leave a Reply

Your email address will not be published. Required fields are marked *