Set based vs row by row (procedural code)

I was reading a blog about how to write procedural code today ( link below) and wanted to take a moment to explore another approach just to provide a more complete reference. To provide some context I am explaining the business case here.

A hospital wants to identify people whose heights have changed over time in order to detect if the person is who they say they are. Since the height of a person doesn’t change after a certain age the approach is used to detect identity theft. The requirement is that we compare the heights logged during different visit and flag any patients where they have different heights logged.

The code to create the dummy data is mentioned below and was taken from the link mentioned earlier.

SET NOCOUNT ON;

CREATE TABLE Dates (
    ID INT
    ,VisitDate DATETIME
    );

--populate table with 20 visit dates
DECLARE @i INT
    ,@startdate DATETIME;

SET @i = 1;
SET @startdate = GETDATE();

WHILE @i <= 20
BEGIN
    INSERT Dates (
        ID
        ,VisitDate
        )
    VALUES (
        @i
        ,@startdate
        );

    SET @startdate = DATEADD(dd, 7, @startdate);
    SET @i = @i + 1;
END

CREATE TABLE PatientHeight (
    PatientID INT NOT NULL
    ,Height INT
    );

-- populate table with 1000 patientids with heights between 59 and 74 inches
SET @i = 1;

WHILE @i <= 10000
BEGIN
    INSERT PatientHeight (
        PatientID
        ,Height
        )
    VALUES (
        @i
        ,@i % 16 + 59
        );

    SET @i = @i + 1;
END

ALTER TABLE PatientHeight ADD CONSTRAINT PK_PatientHeight PRIMARY KEY (PatientID);

-- cartesian join produces 200,000 PatientVisit records
SELECT ISNULL(PatientID, - 1) AS PatientID
    ,ISNULL(VisitDate, '19000101') AS VisitDate
    ,Height
INTO PatientVisit
FROM PatientHeight
CROSS JOIN Dates;

ALTER TABLE PatientVisit ADD CONSTRAINT PK_PatientVisit PRIMARY KEY (
    PatientID
    ,VisitDate
    );

-- create changes of height
SET @i = 3;

WHILE @i < 10000
BEGIN
    UPDATE pv
    SET Height = Height + 2
    FROM PatientVisit pv
    WHERE PatientID = @i
        AND pv.VisitDate = (
            SELECT TOP 1 VisitDate
            FROM Dates
            WHERE id = ABS(CHECKSUM(@i)) % 19
            );

    SET @i = @i + 7;
END
        /*
-- return AdventureWorks to its previous state when you are finished
-- with this example.

DROP TABLE Dates;
DROP TABLE PatientHeight;
DROP TABLE PatientVisit;
*/

The first approach uses a cursor to iterate row by row and identify if there has been a change in height. The second approach uses a ranking function to perform a self-join based on patient id to identify any change. Both these approaches are detailed below as well to save the reader time.

CUROSR BASED APPROACH

-- CURSOR BASED APPROACH
CREATE TABLE #Changes (
    PatientID INT
    ,VisitDate DATETIME
    ,BeginHeight SMALLINT
    ,CurrentHeight SMALLINT
    );

DECLARE @PatientID INT
    ,@CurrentID INT
    ,@BeginHeight SMALLINT
    ,@CurrentHeight SMALLINT
    ,@VisitDate DATETIME;

SET @PatientID = 0;

DECLARE Patient_cur CURSOR FAST_FORWARD
FOR
SELECT PatientID
    ,VisitDate
    ,Height
FROM PatientVisit
ORDER BY PatientID
    ,VisitDate;

OPEN Patient_cur;

FETCH NEXT
FROM Patient_cur
INTO @CurrentID
    ,@VisitDate
    ,@CurrentHeight;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- first record for this patient
    IF @PatientID <> @CurrentID
    BEGIN
        SET @PatientID = @CurrentID;
        SET @BeginHeight = @CurrentHeight;
    END

    IF @BeginHeight <> @CurrentHeight
    BEGIN
        INSERT #Changes (
            PatientID
            ,VisitDate
            ,BeginHeight
            ,CurrentHeight
            )
        VALUES (
            @PatientID
            ,@VisitDate
            ,@BeginHeight
            ,@CurrentHeight
            );

        SET @BeginHeight = @CurrentHeight;
    END

    FETCH NEXT
    FROM Patient_cur
    INTO @CurrentID
        ,@VisitDate
        ,@CurrentHeight;
END

CLOSE Patient_cur;

DEALLOCATE Patient_cur;

SELECT *
FROM #Changes

DROP TABLE #Changes
GO

SET BASED APPROACH

-- RANKING FUNCTION BASED APPROACH
WITH PV_RN
AS (
    SELECT ROW_NUMBER() OVER (
            ORDER BY PatientID
                ,VisitDate
            ) AS ROWID
        ,*
    FROM PatientVisit
    )
SELECT t1.PatientID
    ,t2.VisitDate AS DateChanged
    ,t1.Height AS HeightChangedFrom
    ,t2.Height AS HeightChangedTo
FROM PV_RN t1
JOIN PV_RN t2 ON t2.ROWID = t1.ROWID + 1
WHERE t1.patientid = t2.patientid
    AND t1.Height <> t2.Height
ORDER BY t1.PatientID
    ,t2.VisitDate;

There is however a third option which I believe would be much simpler and is also much faster has been listed below. Here we use the math operator STDEV which calculates a standard deviation for the heights that were captured. When the height hasn’t changed STDEV will return a value of zero else a non-zero number which can be used to identify patient’s with different heights.

THIRD APPROACH

-- MATH OPERATOR BASED APPROACH 
-- HAS THE ADVANTAGE OF PROVIDING A MARGIN FOR ERROR FOR MEASUREMENTS AS WELL 
SELECT PatientID
    ,MIN(Height)
    ,Max(Height)
FROM PatientVisit
WHERE PatientID IN (
        SELECT  patientid FROM  PatientVisit GROUP BY PatientID
        HAVING STDEV(height) > 0
        )
GROUP BY PatientID

Hope this approach helps clarify what we are doing with set based approach since recently I have noticed that ranking functions and CTEs seem to be the go to approach for most developers when they think set based logic.

Reference

http://sqlmag.com/t-sql/programming-sql-set-based-way