[GH-ISSUE #86] SQL Server: Performance: Indices for work tables #49


Closed

opened 2026-03-23 20:28:43 +00:00 by mirror · 1 comment

mirror commented

2026-03-23 20:28:43 +00:00

Owner

Originally created by @Ben-Goethuys on GitHub (Jul 2, 2024).
Original GitHub issue: https://github.com/RADar-AZDelta/Rabbit-in-a-Blender/issues/86

The "work"-zone tables that correspond to the main OMOP tables, for example [work].[vocabulary], don't have indices.
Other "work"-zone tables do have indices.

Could indices on these tables help with the SQL performance?

Side note: Azure SQL could benefit from the compression of columnstore indices during batch loads.


mirror closed this issue

2026-03-23 20:28:43 +00:00

mirror commented

2026-03-23 20:28:44 +00:00

Author

Owner

@pjlammertyn commented on GitHub (Jul 2, 2024):

Ben, all tables that have event columns are first stored in the WORK zone.
After all the other tables are done, the second stage of the ETL fills in the event columns (based on the _swap tables in the WORK zone) while copying the data to the OMOP zone.
This will always be a full table scan.
So adding indexes won't improve performance.
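
To make the second stage concrete, here is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server. All table and column names (work_observation, work_measurement_swap, omop_observation, observation_event_source) are hypothetical simplifications, not the actual Rabbit-in-a-Blender schema. It only illustrates the shape of the statement: every row of the work table is read once and copied, with the event column resolved through a swap table on the way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily simplified stand-ins for the real tables.
    CREATE TABLE work_observation (
        observation_id INTEGER PRIMARY KEY,
        observation_event_source TEXT  -- source key of the linked event, NULL if none
    );
    CREATE TABLE work_measurement_swap (
        source_key TEXT PRIMARY KEY,   -- source key as delivered by the ETL queries
        measurement_id INTEGER NOT NULL
    );
    CREATE TABLE omop_observation (
        observation_id INTEGER PRIMARY KEY,
        observation_event_id INTEGER
    );
    INSERT INTO work_observation VALUES (1, 'lab-42'), (2, NULL), (3, 'lab-7');
    INSERT INTO work_measurement_swap VALUES ('lab-42', 1001), ('lab-7', 1002);
""")

# Stage 2: copy every work row to the OMOP table and fill in the event column
# from the swap table. There is no filter on work_observation, so all of its
# rows are read once.
conn.execute("""
    INSERT INTO omop_observation (observation_id, observation_event_id)
    SELECT w.observation_id, s.measurement_id
    FROM work_observation AS w
    LEFT JOIN work_measurement_swap AS s
        ON s.source_key = w.observation_event_source
""")

rows = conn.execute(
    "SELECT observation_id, observation_event_id "
    "FROM omop_observation ORDER BY observation_id"
).fetchall()
print(rows)
```

Because the SELECT has no WHERE clause on the work table, an index on that table has nothing to narrow down. In this sketch only the lookup side of the join (the swap table, keyed by source_key) can use one.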

One possible performance improvement is to split the records according to whether an event column is filled in. If the event column is not filled in, send the record directly to the OMOP zone; otherwise send it to the WORK zone. This would minimize the copying around of data, but it would break the atomic nature of the ETL. So there are trade-offs.
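
The split described above could look roughly like the following sketch. The record layout and the names Record, event_source and split_by_event are assumptions made for illustration only; they are not part of the existing ETL.

```python
from typing import Iterable, List, Optional, Tuple, TypedDict


class Record(TypedDict):
    """Hypothetical, simplified record; the real tables have many more columns."""
    id: int
    event_source: Optional[str]  # source key of a linked event, or None


def split_by_event(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (direct, deferred).

    direct:   no event reference, could be written straight to the OMOP zone.
    deferred: has an event reference, must wait in the WORK zone until the
              swap tables are complete.
    """
    direct: List[Record] = []
    deferred: List[Record] = []
    for record in records:
        if record["event_source"] is None:
            direct.append(record)
        else:
            deferred.append(record)
    return direct, deferred


records: List[Record] = [
    {"id": 1, "event_source": "lab-42"},
    {"id": 2, "event_source": None},
    {"id": 3, "event_source": "lab-7"},
]
direct, deferred = split_by_event(records)
print([r["id"] for r in direct], [r["id"] for r in deferred])
```

The trade-off mentioned above shows up here as well: the direct rows would reach the OMOP zone before the deferred rows, so a failure between the two writes could leave the table partially loaded.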
