Worker migration in SQL 2019 is the process by which workers that are bound to a scheduler can be unbound and run on another scheduler in order to complete long running transactions sooner.
Once again I find the example of a car wash very handy here.
Assume you have a query (car) and a team of cleaners (threads) assigned to a team leader (worker) given a task of cleaning a car (task) in your shop -Shop A (scheduler). Now assume you have a competitor shop – Shop B down the road as well. Typically a car would enter a shop based on the load factor (the queue of cars waiting to be cleaned). Ideally the shop with the lesser queue gets the next car. However every once in a while one shop gets a car and the other gets a truck. Naturally the truck is going to take longer to clean but once a vehicle has been assigned to a shop it almost never switches in the middle.
In other words, workers for query execution are assigned from schedulers with lower load factor. The drawback of this approach is; it only works when all queries are similar in size i.e. there are only cars. If one shop has a truck and the other shop finishes cleaning its car it is still unable to assist. Therefore CPU cycles are wasted. With worker migration what is achieved is something similar to the below table:-
Shop A is busy cleaning the Truck, Shop B is done cleaning its car.
Scheduler 1 is busy running a task, Scheduler 2 is done and doesn’t have any runnable tasks in queue
Team leader B from Shop B looks into Shop A and decides his team can help clean the carpets while Team A is busy cleaning the exterior.
Scheduler 2 picks the next runnable items from Scheduler 1s queue and Steals/ migrates the worker from Scheduler 1 ( which would otherwise have wasted time waiting)
So how is this different from context switching?
A context switch could be viewed as moving the truck repeatedly between SHOP A and SHOP B. This is obviously much more resource intensive and often requires a ton of coordination. With worker migration the car still remains in SHOP A and only a small task that can be run independently within the same NUMA node is what gets offloaded. Like removing the carpets to SHOP B and cleaning them there or having the customer pay in SHOP B while the work is still being performed in SHOP A.
Bottom line if a CPU is idle its processing cycles are not wasted with worker migration.