When developing DataWareHouses you often have to check if a new/optimized ETL has same results as the old version. So you have to check, if two database tables contain the same rows.
For two tables at the same database, or databases that are connected, this can be done by the following SQL statements.
1. Does the old table contains rows, that the new table doesn’t:
SELECT * FROM OLD_TABLE EXCEPT SELECT * FROM NEW_TABLE
when this statement returns an empty resultset, all old rows exist in the new table.
2. Does the new table contains rows, that the old table did not:
SELECT * FROM NEW_TABLE EXCEPT SELECT * FROM OLD_TABLE
when this statement returns an empty resultset, all new rows existed in the old table.
Do both tables contain keys, this is all. IF there are/can be duplicate rows, following statements are needed, cause EXCEPT works on distinct rows:
3. Do both tables contain the same count of rows:
SELECT count(*) FROM OLD_TABLE EXCEPT SELECT count(*) FROM NEW_TABLE
when this statement returns an empty resultset, the rowcount in both tables is equal.
4. Do both tables contain the same duplicate rows :
The following situations would lead to differences. There are duplicate rows in both tables, but not the same rows are duplicated in every table:
This can be checked by the following statement
SELECT COL_A, COL_B, count(*) FROM OLD_TABLE GROUP BY COL_A, COL_B EXCEPT SELECT COL_A, COL_B, count(*) FROM NEW_TABLE GROUP BY COL_A, COL_B
You don’t need to execute this statement in both directions, cause you eliminated the possibility of different rows already in step 1 & 2.
Hint: When adding count(*) and GROUP BY in steps 1 & 2, step 4 can be skipped.