mirror of
https://github.com/RADar-AZDelta/Rabbit-in-a-Blender.git
synced 2026-03-23 20:37:03 +00:00
[GH-ISSUE #86] SQL Server: Performance: Indices for work tables #49
Labels
No labels
bug
cannot reproduce
duplicate
enhancement
experiment
invalid
pull-request
question
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
RADar-AZDelta/archived-Rabbit-in-a-Blender#49
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @Ben-Goethuys on GitHub (Jul 2, 2024).
Original GitHub issue: https://github.com/RADar-AZDelta/Rabbit-in-a-Blender/issues/86
The "work"-zone tables corresponding to the main omop tables like for example [work].[vocabulary] don't have indices.
Other "work"-zone tables do have indices.
Could indices on these tables help with the SQL performance?
Side note: Azure SQL could benefit from the compression of columnstore indices in batch loads
@pjlammertyn commented on GitHub (Jul 2, 2024):
Ben, all the tables that have event columns, are first stored in the WORK zone.
After all the other tables are done, then the second stage of the ETL, will fill in the event colums (based on the _swap tables in the work zone), while copying the date to the OMOP zone.
This will always be a full table scan.
So adding indexes won't improve performance.
A performance improvement can be, to split up the records that have a event record filled in. So the event column is not filled in, send that record directly to the OMOP zone, other wise send it to the WORK zone. This will minimize the copy around of data, but will break the atomic nature of the ETL. So there are trade offs.