European Windows 2019 Hosting BLOG

BLOG about Windows 2019 Hosting and SQL 2019 Hosting - Dedicated to European Windows Hosting Customers

European SQL Server 2022 Hosting :: Create FTP Task in SQL Server Integration Services

clock December 20, 2024 06:44 by author Peter

FTP tasks in SSIS (SQL Server Integration Services) are an important feature used to perform file operations between a server and a client/local machine. The FTP Task in SSIS automates the process of uploading and downloading files to and from an FTP server, providing reliable file transfer, automating the data workflow, and improving its efficiency. When combined with other SSIS tasks, such as the File System Task, Script Task, or Execute SQL Task, the FTP Task can help integrate data from multiple sources.

Now, let’s see how to create an FTP Task.

FTP stands for File Transfer Protocol; it transfers files from one server to another server or to a local machine.

In the SSIS package, within the Control Flow, we can see the SSIS Toolbox with numerous options. From there, we can select the FTP Task and drag and drop it onto the package.

The screenshot below shows what it looks like. Notice the red error icon on the task; it means the connection is empty.

Note: I have used a personal FTP server, so you may not be able to access it; however, you can try one of the public FTP servers available online.

Now, let’s configure the FTP Task and establish a connection. Open the FTP Task Editor and do the following.

Add an FTP server to establish the FTP connection: Enter the credentials below. (FTP server path: ftp://Peter:****@192.168.x.x)
Server Name: 192.168.x.x
User ID: Peter
PWD: ****

The connection has been established.

Now, enter the following details for the file transfer and hit OK; the red error icon will go away.

  • Add Operation as Send files
  • Add local path: C:\Users\Peter\Documents\FTP_Local
  • Add RemotePath: /

After hitting OK, you will see on the screen below that the red error icon is gone. This means that both connections are established, and we can see them in the Connection Manager.

The screenshot below was taken before executing the package; you can see that no files are currently available on the FTP server.

Once step 2 is done, execute the FTP Task by hitting the Start button. The screenshot below shows the package executed successfully. Here, our objective is to see whether the file moved from the source (local path) to the destination (the FTP server).

Now, I will check the FTP server to see if the file reached the destination. As you can see in the screenshot below, the file has indeed been moved to the FTP server.

Summary
There are several benefits of creating FTP Tasks in SSIS, for example:

  • Automates the data transfer and reduces the time-consuming manual efforts.
  • It helps integrate data from various sources by creating a workflow that automatically handles file transfers.
  • Enables creation of FTP tasks for regular data backups and enhances data transfer security.


European SQL Server 2022 Hosting :: Using Tablock to Unlock Faster INSERT Operations in SQL Server

clock December 18, 2024 07:17 by author Peter

Performance is frequently the main issue in any SQL Server system when working with big datasets. The INSERT operation is one frequent operation that occasionally turns into a bottleneck. The time required to input data increases with its size, which can have a major effect on system performance and user experience in general. The usage of the TABLOCK hint is one of the many methods and improvements that SQL Server offers to help speed up data insertions. When working with huge datasets or when parallelism is crucial, this straightforward yet effective method can significantly increase the pace of your INSERT operations.

Let’s explore how using the TABLOCK hint works, its benefits, and how you can apply it to your own operations, with examples from the HFL database.

What is the TABLOCK Hint?
The TABLOCK hint is a table-level lock hint that forces SQL Server to lock the entire target table when performing an INSERT, UPDATE, or DELETE operation (for an INSERT this is an exclusive (X) table lock, or a bulk update (BU) lock during bulk loads, rather than row or page locks). The table stays locked for the duration of the statement, which can help speed up data loading by enabling minimal logging and reducing lock-management overhead.

A key benefit of the TABLOCK hint is that it reduces the amount of log space used during the operation, because it allows the minimal logging mechanism to be activated (provided the database uses the simple or bulk-logged recovery model and the target qualifies, for example a heap or an empty table). SQL Server then does not have to fully log each individual row insertion, but mainly logs the page allocations for the bulk operation. As a result, this significantly reduces the overhead and speeds up data loading.
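As a quick sanity check before relying on minimal logging, you can confirm the recovery model of the database used in the examples below (a minimal sketch):
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'HFL';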

Additionally, because the entire table is locked, SQL Server can parallelize the insert operation, leading to faster execution times. This is particularly useful for large-scale data-loading tasks.

When to Use TABLOCK Hint
The TABLOCK hint is ideal for scenarios where:

  • You are inserting a large number of rows into a table.
  • You can afford to lock the table for the duration of the operation (i.e., no other transactions need access to the table while the insert is in progress).
  • You want to reduce the logging overhead and speed up bulk insertions.
  • You want to use parallel insertions to take advantage of SQL Server's ability to use multiple threads for data loading.


It’s also important to note that the TABLOCK hint works well with temporary tables, so you can take advantage of these performance benefits when working with temp tables, often used in ETL processes or batch operations.
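For instance, here is a minimal sketch of a TABLOCK insert into a temp table; the staging table and its column list are illustrative:
-- tempdb always uses the simple recovery model, so a TABLOCK insert
-- into a temp table qualifies for minimal logging.
CREATE TABLE #StagedOrders
(
    SalesOrderID INT   NOT NULL,
    TotalDue     MONEY NOT NULL
);

INSERT INTO #StagedOrders WITH (TABLOCK)
SELECT SalesOrderID, TotalDue
FROM Sales.SalesOrderHeader;

DROP TABLE #StagedOrders;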

Benefits of Using TABLOCK

  • Improved Performance: The primary benefit of using the TABLOCK hint is the performance improvement during large INSERT operations. By reducing the amount of logging, SQL Server can insert rows much faster.
  • Parallel Insertion: With TABLOCK, SQL Server can use parallelism to load the data, speeding up the operation of systems with sufficient resources.
  • Reduced Logging Overhead: Since SQL Server logs less information, the system consumes less log space, which can be crucial when working with large datasets.
  • Works with Temp Tables: You can apply TABLOCK to temporary tables as well, giving you the same performance benefits for in-memory operations.


Example
Let’s consider a scenario where we need to insert a large number of rows from the Sales.SalesOrderDetail table into the Sales.SalesOrderDetailTemp table in the HFL database.
Create table script for Sales.SalesOrderDetailTemp:
USE [HFL]
GO

DROP TABLE IF EXISTS [Sales].[SalesOrderDetailTemp]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [Sales].[SalesOrderDetailTemp](
    [SalesOrderID] [int] NOT NULL,
    [SalesOrderDetailID] [int] NOT NULL,
    [CarrierTrackingNumber] [nvarchar](25) NULL,
    [OrderQty] [smallint] NOT NULL,
    [ProductID] [int] NOT NULL,
    [SpecialOfferID] [int] NOT NULL,
    [UnitPrice] [money] NOT NULL,
    [UnitPriceDiscount] [money] NOT NULL,
    [LineTotal]  [money] NULL,
    [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
    [ModifiedDate] [datetime] NOT NULL,
 CONSTRAINT [PK_SalesOrderDetailTemp_SalesOrderID_SalesOrderDetailTempID] PRIMARY KEY CLUSTERED
(
    [SalesOrderID] ASC,
    [SalesOrderDetailID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO


Without the TABLOCK hint, this operation may take a considerable amount of time, especially when the table is large and the database is under load.

Here’s a basic example of how you can speed up the INSERT operation by using the TABLOCK hint.
USE HFL
GO

SET STATISTICS TIME, IO ON

SET NOCOUNT ON

INSERT INTO Sales.SalesOrderDetailTemp
SELECT *
FROM Sales.SalesOrderDetail;

Truncate the table so the second test starts from the same empty state.
USE HFL
GO

TRUNCATE TABLE Sales.SalesOrderDetailTemp


Now, let’s modify the query to use the TABLOCK hint.
USE HFL
GO

SET STATISTICS TIME, IO ON

SET NOCOUNT ON

INSERT INTO Sales.SalesOrderDetailTemp
WITH (TABLOCK)
SELECT *
FROM Sales.SalesOrderDetail;

Comparison
Execution 1 (without TABLOCK) took longer, with higher CPU and elapsed time (204 ms and 284 ms), indicating a slower operation. Execution 2 (with TABLOCK) performed better, completing in 125 ms CPU time and 157 ms elapsed time, making the TABLOCK version more efficient in this case.
Considerations When Using TABLOCK

While the TABLOCK hint can greatly improve performance, it’s important to be aware of some considerations:

  • Table Locking: The TABLOCK hint locks the entire table for the duration of the operation. This means that other transactions cannot access the table until the INSERT operation is complete, so be sure that this behavior aligns with your application’s requirements.
  • Transaction Log Growth: Although TABLOCK reduces the amount of logging, it still logs certain details of the operation. If you’re inserting a massive amount of data, you may need to monitor transaction log growth and ensure that you have enough log space available.
  • Not Suitable for OLTP Workloads: The TABLOCK hint is more suited to batch operations or bulk-loading scenarios. It may not be appropriate for transactional systems that require frequent concurrent access to the table.

Conclusion
If you are working with large datasets and want to speed up your INSERT operations in SQL Server, the TABLOCK hint can be a game-changer. By reducing logging overhead and enabling parallel insertions, it helps improve performance and can significantly reduce the time it takes to load data.


European SQL Server 2022 Hosting :: Understanding Triggers in SQL Server

clock December 10, 2024 06:17 by author Peter

Triggers in SQL Server are special types of stored procedures that automatically execute or “fire” when certain events occur in the database.

Types of Triggers
DML Triggers

These are triggers that fire in response to data manipulation events such as INSERT, UPDATE, or DELETE operations on a table or view.

  • AFTER Triggers: Execute after the associated DML operation is completed.
  • INSTEAD OF Triggers: Execute in place of the associated DML operation.

DDL Triggers
These fire in response to data definition language (DDL) events such as CREATE, ALTER, or DROP statements.
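The rest of this article focuses on DML triggers, but for completeness, here is a minimal sketch of a DDL trigger (the trigger name and message are illustrative) that blocks DROP TABLE statements in the current database:
CREATE TRIGGER trg_BlockDropTable
ON DATABASE
FOR DROP_TABLE
AS
BEGIN
    -- Refuse the DDL statement and undo it.
    PRINT 'Dropping tables is not allowed in this database.';
    ROLLBACK TRANSACTION;
END;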

Real-Time Use Case Example
Scenario. Imagine a retail company that wants to maintain an audit trail for all changes made to the Orders table. Every time an order is inserted, updated, or deleted, the company wants to log the details of the operation in an OrderAudit table.

Setting Up the Example

First, let’s create the Orders table and the OrderAudit table.
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATETIME,
    Amount DECIMAL(10, 2)
);

CREATE TABLE OrderAudit (
    AuditID INT IDENTITY(1,1) PRIMARY KEY,
    OrderID INT,
    Operation VARCHAR(10),
    OldCustomerID INT,
    NewCustomerID INT,
    OldOrderDate DATETIME,
    NewOrderDate DATETIME,
    OldAmount DECIMAL(10, 2),
    NewAmount DECIMAL(10, 2),
    ChangeDate DATETIME DEFAULT GETDATE()
);

Creating Triggers
1. AFTER INSERT Trigger
CREATE TRIGGER trg_AfterInsert_Orders
ON Orders
AFTER INSERT
AS
BEGIN
    INSERT INTO OrderAudit (OrderID, Operation, NewCustomerID, NewOrderDate, NewAmount)
    SELECT OrderID, 'INSERT', CustomerID, OrderDate, Amount
    FROM inserted;
END;

-- Insert a new order
INSERT INTO Orders (OrderID, CustomerID, OrderDate, Amount)
VALUES (1, 101, '2024-01-01', 100.00);

-- Check audit logs
select * from  OrderAudit


Result
We can see that a record was inserted into the audit log table; whenever a record is inserted into the Orders table, we will know what operation was performed and when it was executed.

AFTER INSERT Trigger: When a new order is inserted into the Orders table, the trg_AfterInsert_Orders trigger fires. It captures the new order details from the inserted pseudo-table and inserts a corresponding record into the OrderAudit table with the operation type 'INSERT'.

2. AFTER UPDATE Trigger

CREATE TRIGGER trg_AfterUpdate_Orders
ON Orders
AFTER UPDATE
AS
BEGIN
    INSERT INTO OrderAudit (OrderID, Operation, OldCustomerID, NewCustomerID, OldOrderDate, NewOrderDate, OldAmount, NewAmount)
    SELECT
        i.OrderID,
        'UPDATE',
        d.CustomerID,
        i.CustomerID,
        d.OrderDate,
        i.OrderDate,
        d.Amount,
        i.Amount
    FROM inserted i
    INNER JOIN deleted d ON i.OrderID = d.OrderID;
END;

-- Update the order
UPDATE Orders
SET Amount = 120.00
WHERE OrderID = 1;

-- Audit log
SELECT * FROM OrderAudit;

Result

We can see that this time we updated the record, and the audit log captures the details of the update. This is very helpful when we want to track how many times an order has been updated.

AFTER UPDATE Trigger: When an existing order is updated, the trg_AfterUpdate_Orders trigger fires. It captures both the old and new values by joining the inserted and deleted pseudo-tables and inserts a record into the OrderAudit table with the operation type 'UPDATE'.

3. AFTER DELETE Trigger
CREATE TRIGGER trg_AfterDelete_Orders
ON Orders
AFTER DELETE
AS
BEGIN
    INSERT INTO OrderAudit (OrderID, Operation, OldCustomerID, OldOrderDate, OldAmount)
    SELECT
        OrderID,
        'DELETE',
        CustomerID,
        OrderDate,
        Amount
    FROM deleted;
END;

-- Delete the order
DELETE FROM Orders
WHERE OrderID = 1;

-- Reading audit table
SELECT * FROM OrderAudit;


Result. We can see that order data has been deleted, and we can check logs to see what time it was deleted.

AFTER DELETE Trigger: When an order is deleted, the trg_AfterDelete_Orders trigger fires. It captures the details of the deleted order from the deleted pseudo-table and inserts a corresponding record into the OrderAudit table with the operation type 'DELETE'.

Dropping a Trigger

Use the DROP TRIGGER statement followed by the trigger name.
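For example, to remove the insert-audit trigger created above:
DROP TRIGGER trg_AfterInsert_Orders;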

INSTEAD OF Trigger
Let’s consider an Employees table where we want to restrict insertions so that the Salary must be above a certain threshold (e.g., $30,000). If the Salary is below this threshold, the insertion should be rejected.

We can use an INSTEAD OF trigger in this case.
CREATE TABLE Employees (
    EmployeeID INT IDENTITY(1,1) PRIMARY KEY,
    Name VARCHAR(100),
    Position VARCHAR(50),
    Salary DECIMAL(10, 2)
);


CREATE TRIGGER trg_InsteadOfInsert_Employees
ON Employees
INSTEAD OF INSERT
AS
BEGIN
    IF EXISTS (SELECT * FROM inserted WHERE Salary < 30000)
    BEGIN
        -- Raise an error and roll back the transaction if the salary is below the threshold
        RAISERROR ('Salary must be at least $30,000', 16, 1);
        ROLLBACK TRANSACTION;
    END
    ELSE
    BEGIN
        -- Perform the actual insert if the condition is met
        INSERT INTO Employees (Name, Position, Salary)
        SELECT Name, Position, Salary
        FROM inserted;
    END
END;


-- Less than 30000: this insertion should fail
INSERT INTO Employees (Name, Position, Salary)
VALUES ('peter scott', 'Developer', 29999);

Explanation

  • The trigger intercepts the insert operation and checks if any of the inserted rows have a Salary below $30,000.
  • If any rows do not meet the condition, an error is raised, and the transaction is rolled back, preventing the insertion.
  • If all rows meet the condition, the insertion proceeds by inserting the rows into the Employees table.

Thanks. We have learned how triggers work in SQL Server.


European SQL Server 2022 Hosting :: What is a CTE, and How Do You Write a CTE in SQL Server?

clock December 2, 2024 07:07 by author Peter

We will learn what CTE is, how to construct it in SQL Server, and its applications in this article.

CTE: What is it?

"Common Table Expressions" is what CTE stands for. SQL servers utilize it to specify a temporary name result. The common table expression (CTE) is a temporary named result set that can be referenced within a Select, Insert, Update, or Delete operation. It was first introduced in SQL Server 2005. Another crucial point is that, like any other table, CTE results can always be accessed within the query even when they are not saved anywhere.

Why do we need CTE on the SQL server?
CTE is a powerful SQL feature that helps simplify queries. CTEs act as virtual tables (with records and columns) that are created during query execution, used by the query, and deleted after the query executes.

Syntax
[ WITH <common_table_expression> [ ,...n ] ]

<common_table_expression>::=
    expression_name [ ( column_name [ ,...n ] ) ]
    AS
    ( CTE_query_definition )


Let’s try to understand CTE Syntax.

Arguments

There are three arguments in the above syntax.

  • expression_name.
  • column_name.
  • query_definition.

Let’s understand one by one.

  • expression_name: A valid identifier for the common table expression. The expression_name must be different from the names of other common table expressions defined in the same WITH clause.
  • column_name: Specifies a column name in the common table expression. Duplicate names are not allowed within a single CTE definition. There are two rules for column names: first, the number of column names specified must match the number of columns in the result set of the query_definition; second, the list of column names is optional only if distinct names for all resulting columns are supplied in the query definition.
  • query_definition: Specifies a SELECT statement whose result set populates the common table expression. A minimal sketch combining all three arguments is shown below.
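Putting the three arguments together (this sketch references the Tbl_Bookissued table created in the next section):
WITH IssuedBooks (BookID, IssueYear) AS  -- expression_name and column_name list
(
    SELECT BookID, YEAR(issueDate)       -- query_definition
    FROM Tbl_Bookissued
)
SELECT * FROM IssuedBooks;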

Some Examples
Before starting to explain some CTE examples, we need to create a database and some tables under the database, so run the following script in your SQL server.
USE [master]
GO

/****** Object:  Database [db_MyPractice]    Script Date: 06-10-2024 01:18:01 ******/
CREATE DATABASE [db_MyPractice]
    CONTAINMENT = NONE
    ON PRIMARY
    (
        NAME = N'db_MyPractice',
        FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.SQLEXPRESS\MSSQL\DATA\db_MyPractice.mdf',
        SIZE = 8192KB,
        MAXSIZE = UNLIMITED,
        FILEGROWTH = 65536KB
    )
    LOG ON
    (
        NAME = N'db_MyPractice_log',
        FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.SQLEXPRESS\MSSQL\DATA\db_MyPractice_log.ldf',
        SIZE = 8192KB,
        MAXSIZE = 2048GB,
        FILEGROWTH = 65536KB
    )
    WITH CATALOG_COLLATION = DATABASE_DEFAULT
GO

ALTER DATABASE [db_MyPractice] SET COMPATIBILITY_LEVEL = 150
GO

IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
BEGIN
    EXEC [db_MyPractice].[dbo].[sp_fulltext_database] @action = 'enable'
END
GO


ALTER DATABASE [db_MyPractice] SET ANSI_NULL_DEFAULT OFF
GO
ALTER DATABASE [db_MyPractice] SET ANSI_NULLS OFF
GO
ALTER DATABASE [db_MyPractice] SET ANSI_PADDING OFF
GO
ALTER DATABASE [db_MyPractice] SET ANSI_WARNINGS OFF
GO
ALTER DATABASE [db_MyPractice] SET ARITHABORT OFF
GO
ALTER DATABASE [db_MyPractice] SET AUTO_CLOSE OFF
GO
ALTER DATABASE [db_MyPractice] SET AUTO_SHRINK OFF
GO
ALTER DATABASE [db_MyPractice] SET AUTO_UPDATE_STATISTICS ON
GO
ALTER DATABASE [db_MyPractice] SET CURSOR_CLOSE_ON_COMMIT OFF
GO
ALTER DATABASE [db_MyPractice] SET CURSOR_DEFAULT GLOBAL
GO
ALTER DATABASE [db_MyPractice] SET CONCAT_NULL_YIELDS_NULL OFF
GO
ALTER DATABASE [db_MyPractice] SET NUMERIC_ROUNDABORT OFF
GO
ALTER DATABASE [db_MyPractice] SET QUOTED_IDENTIFIER OFF
GO
ALTER DATABASE [db_MyPractice] SET RECURSIVE_TRIGGERS OFF
GO
ALTER DATABASE [db_MyPractice] SET DISABLE_BROKER
GO
ALTER DATABASE [db_MyPractice] SET AUTO_UPDATE_STATISTICS_ASYNC OFF
GO
ALTER DATABASE [db_MyPractice] SET DATE_CORRELATION_OPTIMIZATION OFF
GO
ALTER DATABASE [db_MyPractice] SET TRUSTWORTHY OFF
GO
ALTER DATABASE [db_MyPractice] SET ALLOW_SNAPSHOT_ISOLATION OFF
GO
ALTER DATABASE [db_MyPractice] SET PARAMETERIZATION SIMPLE
GO
ALTER DATABASE [db_MyPractice] SET READ_COMMITTED_SNAPSHOT OFF
GO
ALTER DATABASE [db_MyPractice] SET HONOR_BROKER_PRIORITY OFF
GO
ALTER DATABASE [db_MyPractice] SET RECOVERY SIMPLE
GO
ALTER DATABASE [db_MyPractice] SET MULTI_USER
GO
ALTER DATABASE [db_MyPractice] SET PAGE_VERIFY CHECKSUM
GO
ALTER DATABASE [db_MyPractice] SET DB_CHAINING OFF
GO
ALTER DATABASE [db_MyPractice] SET FILESTREAM (NON_TRANSACTED_ACCESS = OFF)
GO
ALTER DATABASE [db_MyPractice] SET TARGET_RECOVERY_TIME = 60 SECONDS
GO
ALTER DATABASE [db_MyPractice] SET DELAYED_DURABILITY = DISABLED
GO
ALTER DATABASE [db_MyPractice] SET ACCELERATED_DATABASE_RECOVERY = OFF
GO
ALTER DATABASE [db_MyPractice] SET QUERY_STORE = OFF
GO

USE [db_MyPractice]
GO

/****** Object:  Table [dbo].[Employees]    Script Date: 06-10-2024 01:18:01 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Employees] (
    [EmployeeID] [smallint] NOT NULL,
    [FirstName] [nvarchar](30) NOT NULL,
    [LastName] [nvarchar](40) NOT NULL,
    [Title] [nvarchar](50) NOT NULL,
    [DeptID] [smallint] NOT NULL,
    [ManagerID] [smallint] NULL,
    CONSTRAINT [PK_EmployeeID] PRIMARY KEY CLUSTERED
    (
        [EmployeeID] ASC
    ) WITH (
        PAD_INDEX = OFF,
        STATISTICS_NORECOMPUTE = OFF,
        IGNORE_DUP_KEY = OFF,
        ALLOW_ROW_LOCKS = ON,
        ALLOW_PAGE_LOCKS = ON,
        OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF
    ) ON [PRIMARY]
) ON [PRIMARY]
GO

/****** Object:  Table [dbo].[tbl_book]    Script Date: 06-10-2024 01:18:01 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tbl_book] (
    [BookId] [bigint] IDENTITY(1,1) NOT NULL,
    [bookName] [nvarchar](50) NOT NULL,
    [className] [nvarchar](50) NOT NULL
) ON [PRIMARY]
GO

/****** Object:  Table [dbo].[Tbl_Bookissued]    Script Date: 06-10-2024 01:18:01 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Tbl_Bookissued] (
    [transaction_id] [bigint] IDENTITY(1,1) NOT NULL,
    [StudentID] [bigint] NULL,
    [TeacherID] [bigint] NULL,
    [BookID] [bigint] NOT NULL,
    [issueDate] [date] NOT NULL
) ON [PRIMARY]
GO

/****** Object:  Table [dbo].[tbl_student]    Script Date: 06-10-2024 01:18:01 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tbl_student] (
    [StudentID] [bigint] IDENTITY(1,1) NOT NULL,
    [StudentName] [nvarchar](100) NOT NULL
) ON [PRIMARY]
GO

/****** Object:  Table [dbo].[tbl_teacher]    Script Date: 06-10-2024 01:18:01 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tbl_teacher] (
    [TeacherID] [bigint] IDENTITY(1,1) NOT NULL,
    [TeacherName] [nvarchar](100) NOT NULL
) ON [PRIMARY]
GO

-- Insert statements
INSERT [dbo].[Employees] ([EmployeeID], [FirstName], [LastName], [Title], [DeptID], [ManagerID]) VALUES
(1, N'Ken', N'Sánchez', N'Chief Executive Officer', 16, NULL),
(16, N'David', N'Bradley', N'Marketing Manager', 4, 273),
(23, N'Mary', N'Gibson', N'Marketing Specialist', 4, 16),
(273, N'Brian', N'Welcker', N'Vice President of Sales', 3, 1),
(274, N'Stephen', N'Jiang', N'North American Sales Manager', 3, 273),
(275, N'Michael', N'Blythe', N'Sales Representative', 3, 274),
(276, N'Linda', N'Mitchell', N'Sales Representative', 3, 274),
(285, N'Syed', N'Abbas', N'Pacific Sales Manager', 3, 273),
(286, N'Lynn', N'Tsoflias', N'Sales Representative', 3, 285)
GO

SET IDENTITY_INSERT [dbo].[tbl_book] ON
GO
INSERT [dbo].[tbl_book] ([BookId], [bookName], [className]) VALUES
(1, N'Maths', N'Class-1'),
(2, N'Maths', N'Class-2'),
(3, N'Maths', N'Class-3'),
(4, N'Maths', N'Class-4'),
(5, N'Maths', N'Class-5'),
(6, N'Maths', N'Class-6'),
(7, N'Hindi', N'Class-1'),
(8, N'Hindi', N'Class-2'),
(9, N'Hindi', N'Class-3'),
(10, N'Hindi', N'Class-4'),
(11, N'Hindi', N'Class-5'),
(12, N'Hindi', N'Class-6'),
(13, N'English', N'Class-1'),
(14, N'English', N'Class-2'),
(15, N'English', N'Class-3'),
(16, N'English', N'Class-4'),
(17, N'English', N'Class-5'),
(18, N'English', N'Class-6')
GO
SET IDENTITY_INSERT [dbo].[tbl_book] OFF


After running this script, you will have the following tables with data.

A. Create a common table expression: The following example shows the total number of books issued per year to each student across all their academic years.
-- Define the CTE expression name and column list.
WITH book_Issue_CTE (StudentID, StudentName, BookID, AcademicYear)
AS
(
    -- Define the CTE query.
    SELECT
        BI.StudentID,
        StudentName,
        BookID,
        YEAR(issueDate) AS AcademicYear
    FROM
        Tbl_Bookissued BI
    JOIN
        tbl_student s ON BI.StudentID = s.StudentID
    WHERE
        BI.StudentID IS NOT NULL
)
-- Define the outer query referencing the CTE name.
SELECT
    StudentID,
    StudentName,
    COUNT(BookID) AS TotalBookIssued,
    AcademicYear
FROM
    book_Issue_CTE
GROUP BY
    AcademicYear, StudentID, StudentName
ORDER BY
    StudentID, AcademicYear;

B. Use a common table expression to limit counts and report averages: The following example shows the average number of books issued per student across all years.
WITH Avg_book_Issue_CTE (StudentID, NumberOfBooks) AS
(
    SELECT StudentID, COUNT(*)
    FROM Tbl_Bookissued
    WHERE StudentID IS NOT NULL
    GROUP BY StudentID
)
SELECT AVG(NumberOfBooks) AS "Average book issued Per Student"
FROM Avg_book_Issue_CTE;

C. Use multiple CTE definitions in a single query: The following example shows how to define more than one CTE in a single query. Notice that a comma is used to separate the CTE query definitions.

In the example below, we are fetching records of books issued to teachers and students.
WITH book_issued_student_CTE (StudentID, BookIssued, IssueYear) AS
-- Define the first CTE query.
(
    SELECT StudentID, COUNT(BookID) AS BookIssued, YEAR(issueDate) AS IssueYear
    FROM Tbl_Bookissued
    WHERE StudentID IS NOT NULL
    GROUP BY StudentID, YEAR(issueDate)
),

-- Define the second CTE query, which returns the number of books issued to each teacher by year.
book_issued_Teacher_CTE (TeacherID, BookIssued, IssueYear) AS
(
    SELECT TeacherID, COUNT(BookID) AS BookIssued, YEAR(issueDate) AS IssueYear
    FROM Tbl_Bookissued
    WHERE TeacherID IS NOT NULL
    GROUP BY TeacherID, YEAR(issueDate)
)

-- Define the outer query by referencing columns from both CTEs.
SELECT CTE.StudentID,
       StudentName,
       NULL AS TeacherID,
       '' AS TeacherName,
       BookIssued,
       IssueYear
FROM book_issued_student_CTE CTE
JOIN tbl_student s ON CTE.StudentID = s.StudentID

UNION ALL

SELECT NULL,
       '',
       CTE.TeacherID,
       TeacherName,
       BookIssued,
       IssueYear
FROM book_issued_Teacher_CTE CTE
JOIN tbl_teacher t ON CTE.TeacherID = t.TeacherID

ORDER BY StudentID, TeacherID;


D. Use a recursive common table expression to display multiple levels of recursion: The following example shows managers and the employees reporting to them, traversing the full hierarchy.

Since dbo.Employees was already created by the setup script above, drop it first so it can be recreated with a self-referencing foreign key for this example.
-- Drop the earlier version of the table before recreating it.
DROP TABLE IF EXISTS dbo.Employees;

-- Create an Employee table.
CREATE TABLE dbo.Employees
(
    EmployeeID SMALLINT NOT NULL,
    FirstName NVARCHAR(30) NOT NULL,
    LastName NVARCHAR(40) NOT NULL,
    Title NVARCHAR(50) NOT NULL,
    DeptID SMALLINT NOT NULL,
    ManagerID SMALLINT NULL,

    CONSTRAINT PK_EmployeeID PRIMARY KEY CLUSTERED (EmployeeID ASC),
    CONSTRAINT FK_Employees_ManagerID_EmployeeID
        FOREIGN KEY (ManagerID) REFERENCES dbo.Employees (EmployeeID)
);
-- Populate the table with values.
INSERT INTO dbo.Employees
VALUES
    (1, N'Ken', N'Sánchez', N'Chief Executive Officer', 16, NULL),
    (273, N'Brian', N'Welcker', N'Vice President of Sales', 3, 1),
    (274, N'Stephen', N'Jiang', N'North American Sales Manager', 3, 273),
    (275, N'Michael', N'Blythe', N'Sales Representative', 3, 274),
    (276, N'Linda', N'Mitchell', N'Sales Representative', 3, 274),
    (285, N'Syed', N'Abbas', N'Pacific Sales Manager', 3, 273),
    (286, N'Lynn', N'Tsoflias', N'Sales Representative', 3, 285),
    (16, N'David', N'Bradley', N'Marketing Manager', 4, 273),
    (23, N'Mary', N'Gibson', N'Marketing Specialist', 4, 16);

After running this script we have the following data.

The following example shows the hierarchical list of managers and the employees who report to them.
WITH DirectReports(ManagerID, EmployeeID, Title, EmployeeLevel) AS
(
    SELECT
        ManagerID,
        EmployeeID,
        Title,
        0 AS EmployeeLevel
    FROM dbo.Employees
    WHERE ManagerID IS NULL

    UNION ALL

    SELECT
        e.ManagerID,
        e.EmployeeID,
        e.Title,
        EmployeeLevel + 1
    FROM dbo.Employees AS e
    INNER JOIN DirectReports AS d
        ON e.ManagerID = d.EmployeeID
)

SELECT
    ManagerID,
    EmployeeID,
    Title,
    EmployeeLevel
FROM DirectReports
ORDER BY ManagerID;

The output of this CTE will be.

E. Use a recursive common table expression to display two levels of recursion: The following example shows managers and the employees reporting to them. The number of levels returned is limited to two.
WITH DirectReports (ManagerID, EmployeeID, Title, EmployeeLevel) AS
(
    SELECT ManagerID, EmployeeID, Title, 0 AS EmployeeLevel
    FROM dbo.Employees
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.ManagerID, e.EmployeeID, e.Title, EmployeeLevel + 1
    FROM dbo.Employees AS e
    INNER JOIN DirectReports AS d
        ON e.ManagerID = d.EmployeeID
)
SELECT ManagerID, EmployeeID, Title, EmployeeLevel
FROM DirectReports
WHERE EmployeeLevel <= 2;


The result of this CTE will be.


European SQL Server 2022 Hosting :: Cleaning Data in SQL Server

clock November 25, 2024 06:45 by author Peter

In order to guarantee the accuracy, consistency, and dependability of the data used for analysis, reporting, or machine learning, data cleansing is an essential stage in the data preparation process. Inaccurate judgments, faulty models, and ineffective decision-making can result from low-quality data.

1. Removing Duplicates
Duplicates in datasets can skew results, inflate counts, or cause redundancy. SQL Server provides ways to remove duplicates using DISTINCT or GROUP BY.
Example. Identify and remove duplicate rows in the Person.EmailAddress table.

If you want to remove duplicates permanently, you can use CTE (Common Table Expressions) with ROW_NUMBER().
USE Hostforlife;
GO

SELECT DISTINCT
    EmailAddressID,
    EmailAddress
FROM
    Person.EmailAddress;

WITH CTE AS (
    SELECT
        EmailAddressID,
        EmailAddress,
        ROW_NUMBER() OVER (PARTITION BY EmailAddress ORDER BY EmailAddressID) AS RowNum
    FROM
        Person.EmailAddress
)
DELETE FROM CTE
WHERE RowNum > 1;

2. Handling Missing Values
Missing values can impact analysis and decision-making. Use SQL functions like IS NULL, COALESCE, or CASE to identify and handle missing data. Example. Replace missing PhoneNumber values in the Person.PersonPhone table with a default value, or remove rows with missing values.
USE Hostforlife;
GO

SELECT
    BusinessEntityID,
    PhoneNumber,
    COALESCE(PhoneNumber, 'Unknown') AS CleanedPhoneNumber
FROM
    Person.PersonPhone;

DELETE FROM
    Person.PersonPhone
WHERE
    PhoneNumber IS NULL;

3. Correcting Data Inconsistencies
Inconsistencies like variations in case or formatting can lead to errors in joins or grouping. SQL Server string functions like UPPER, LOWER, or REPLACE can help standardize data.

Example. Standardize FirstName values in the Person.Person table to uppercase. Replace incorrect substrings in email addresses.
USE Hostforlife;
GO

UPDATE Person.Person
SET FirstName = UPPER(FirstName);

-- Replace the sample AdventureWorks domain with the new domain.
UPDATE Person.EmailAddress
SET EmailAddress = REPLACE(EmailAddress, '@adventure-works.com', '@hostforlife.eu');

4. Standardizing Data Formats
Standardized data formats ensure consistency and compatibility across systems. Functions like CAST and CONVERT are often used for this purpose. Example. Convert ModifiedDate in the Sales.SalesOrderHeader table to a specific format. Here, the date format 101 converts the date to MM/DD/YYYY.

USE Hostforlife;
GO

SELECT SalesOrderID,
       CONVERT(VARCHAR(10), ModifiedDate, 101) AS FormattedDate
FROM Sales.SalesOrderHeader;

5. Removing Outliers

Outliers can distort statistical analyses and trends. Use statistical functions and filtering to identify and exclude them.

Example. Remove outliers based on TotalDue in the Sales.SalesOrderHeader table.
USE Hostforlife;
GO

WITH Percentiles AS (
    SELECT DISTINCT
        PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY TotalDue) OVER () AS P05,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY TotalDue) OVER () AS P95
    FROM Sales.SalesOrderHeader
)
SELECT soh.*
FROM Sales.SalesOrderHeader AS soh
CROSS JOIN Percentiles AS p
WHERE soh.TotalDue BETWEEN p.P05 AND p.P95;

6. Validating Data
Data validation ensures data integrity by applying constraints or rules. SQL Server constraints like NOT NULL, UNIQUE, and CHECK are essential for enforcing data quality.
Example. Enforce data integrity when creating a new table.
USE Hostforlife;
GO

CREATE TABLE Sales.Promotions (
    PromotionID INT PRIMARY KEY,
    PromotionName NVARCHAR(100) NOT NULL,
    DiscountPercentage DECIMAL(5, 2) CHECK (DiscountPercentage BETWEEN 0 AND 100),
    StartDate DATE NOT NULL,
    EndDate DATE NOT NULL,
    CONSTRAINT CK_Promotions_EndDate CHECK (EndDate > StartDate)
);

Alternatively, validate existing data using conditional queries.
SELECT *
FROM Sales.Promotions
WHERE DiscountPercentage < 0
   OR DiscountPercentage > 100;


Conclusion
Data cleansing is an ongoing process and a crucial component of the data lifecycle. By removing duplicates, handling missing values, correcting inconsistencies, standardizing formats, removing outliers, and validating data, you can significantly improve the quality of your data. These techniques, demonstrated using the AdventureWorks2022 database, can be applied to real-world datasets to ensure accurate and actionable insights. By incorporating these practices into your data workflows, you can ensure that your analysis, reports, and machine learning models are built on a solid foundation of clean data.


European SQL Server 2022 Hosting :: Making Subquery Workable

clock November 6, 2024 09:11 by author Peter

When working with a subquery, the behavior or requirements might not be the same as for an ordinary query.

This query is working:

However, as a subquery it does not work anymore:

The reason is that the subquery needs an alias. Once we add one, the original error is gone, but a new one appears:

This error is due to a missing column name. An unnamed column is fine in an ordinary query, but not in a subquery:

After adding a column name, it works:


Adding "ORDER BY", this does not work, with errors:

The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

As the error message suggests, adding TOP makes it work.
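Since the original screenshots are not shown here, a minimal sketch that combines all of the fixes above (the dbo.Orders table and its columns are illustrative):
-- A subquery used as a derived table needs: an alias, a name for every
-- column, and TOP (or OFFSET / FOR XML) if it contains ORDER BY.
SELECT t.OrderYear, t.TotalAmount
FROM (
    SELECT TOP (10)
        YEAR(OrderDate) AS OrderYear,   -- every column must be named
        Amount          AS TotalAmount
    FROM dbo.Orders
    ORDER BY Amount DESC                -- allowed only because TOP is present
) AS t;                                 -- the derived table must have an alias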


European SQL Server 2022 Hosting :: Windowing Enhancements New T-SQL Enhancement in SQL Server

clock October 31, 2024 08:11 by author Peter

A number of noteworthy T-SQL improvements included in SQL Server 2022 simplify data handling, particularly when utilizing analytics, window functions, and sliding aggregations. These new features can be used by database administrators, data engineers, and SQL developers to streamline complicated data operations, enhance readability, and maximize query performance.

Key Enhancements in SQL Server 2022 Window Functions

  • OVER with ORDER BY for Aggregate Functions: Allows running and sliding aggregations.
  • Sliding Aggregations at Account Level: Limit the window rows and apply aggregations at a granular level.
  • WINDOW Clause: Reduces code duplication.
  • IGNORE NULLS: Optimizes data analysis for first and last values.
  • NULL-aware aggregations: Adds flexibility in handling NULL data points.

We will use the table below for our examples.
CREATE TABLE dbo.AccountTransactions (
    AccountID INT,
    TransactionDate DATE,
    Amount DECIMAL(10, 2)
);

INSERT INTO dbo.AccountTransactions (AccountID, TransactionDate, Amount)
VALUES
    (1, '2023-01-01', 100.00),
    (1, '2023-01-02', 150.00),
    (1, '2023-01-03', 200.00),
    (1, '2023-01-04', NULL),
    (1, '2023-01-05', 300.00),
    (2, '2023-01-01', 500.00),
    (2, '2023-01-02', 700.00),
    (2, '2023-01-03', NULL),
    (2, '2023-01-04', 800.00),
    (2, '2023-01-05', 900.00);
1. Using OVER with ORDER BY for Aggregate Functions
In SQL Server 2012, Microsoft introduced the ability to use OVER with ORDER BY for aggregate functions, which allows for running and sliding aggregations. This feature lets you compute cumulative totals, moving averages, and other metrics across an ordered dataset without needing a self-join or correlated subquery.

Example. Running Total of Transactions by Account
The following query calculates a running total of transaction amounts per AccountID, ordered by TransactionDate.
SELECT
    AccountID,
    TransactionDate,
    Amount,
    SUM(Amount) OVER (PARTITION BY AccountID ORDER BY TransactionDate) AS RunningTotal
FROM dbo.AccountTransactions
ORDER BY AccountID, TransactionDate;

/*
The SUM function is paired with OVER (PARTITION BY AccountID ORDER BY TransactionDate). For each row, it calculates a running total of the Amount column by partitioning the data by AccountID and ordering it by TransactionDate.
*/
Output

Use Case: Running totals are helpful for calculating cumulative spending per account, which is common in financial reporting or customer analytics.

2. Sliding Aggregations with a Limit on Rows per Window
Sliding aggregations are a specific type of windowed calculation where only a fixed number of preceding or following rows are considered. In SQL Server 2022, this can be particularly useful for rolling averages or sums over a specified window of rows.

Example. Three-Row Sliding Sum
In this example, we calculate a sliding sum of the amount for each of the three most recent transactions of each AccountID.
SELECT
    AccountID,
    TransactionDate,
    Amount,
    SUM(Amount) OVER (PARTITION BY AccountID ORDER BY TransactionDate ROWS 2 PRECEDING) AS SlidingSum
FROM dbo.AccountTransactions
ORDER BY AccountID, TransactionDate;

/*
The ROWS 2 PRECEDING clause limits the window to the current row and the previous two rows. This is a rolling three-row sum, which helps in understanding recent trends in transaction amounts.
*/

Output

Use Case. Financial analysts might use a sliding aggregation to calculate moving averages, which is useful for smoothing out trends over a defined period

3. Using the WINDOW Clause to Eliminate Code Duplication
The WINDOW clause, introduced in SQL Server 2022 (it requires database compatibility level 160), allows you to define a window frame once and reference it multiple times. This feature helps reduce code duplication and improves readability, especially in queries with multiple windowed calculations.

Example. Defining and Using a Window
Here, we define a window once using the WINDOW clause and reference it for both the running total and the sliding sum.
SELECT
    AccountID,
    TransactionDate,
    Amount,
    SUM(Amount) OVER w AS RunningTotal,
    AVG(Amount) OVER w AS RollingAverage
FROM dbo.AccountTransactions
WINDOW w AS (PARTITION BY AccountID ORDER BY TransactionDate ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY AccountID, TransactionDate;

/*
The WINDOW clause defines a reusable window called w, which can then be applied to different aggregates, reducing code repetition and making modifications easier.
*/

Output

Use Case. The WINDOW clause is particularly valuable in analytics queries where multiple window functions need to use the same partition and order criteria.

4. Using IGNORE NULLS in FIRST_VALUE and LAST_VALUE Functions

The IGNORE NULLS option, introduced in SQL Server 2022, allows window functions like FIRST_VALUE and LAST_VALUE to skip NULL values, making it easier to retrieve non-null values in sequences with missing data.

Example. Getting the Last Non-NULL Value per Account
In this example, we use LAST_VALUE with IGNORE NULLS to fetch the most recent non-null transaction amount for each account.
SELECT
    AccountID,
    TransactionDate,
    Amount,
    LAST_VALUE(Amount) IGNORE NULLS OVER (PARTITION BY AccountID ORDER BY TransactionDate) AS LastNonNullAmount
FROM dbo.AccountTransactions
ORDER BY AccountID, TransactionDate;

/*
Without IGNORE NULLS, the LAST_VALUE function would return a NULL value if the last row in the partition contained NULL. With IGNORE NULLS, SQL Server skips over the NULL values and returns the most recent non-null value instead.
*/


Output

Use Case. This is particularly useful for handling incomplete or intermittent data, such as filling in missing stock prices, temperature readings, or any sequential time-based metric.

5. Handling NULLs in Aggregations
In SQL Server, aggregate functions such as SUM, AVG, and COUNT(column) already skip NULL values; the IGNORE NULLS option itself applies only to the offset and value window functions (FIRST_VALUE, LAST_VALUE, LAG, and LEAD). Even so, combining windowed aggregates with NULLIF makes it easy to handle datasets with missing values without skewing the calculations.

Example. Average Transaction Amount Ignoring NULLs

The following query calculates the average transaction amount for each account while ignoring NULL values.
SELECT
    AccountID,
    TransactionDate,
    Amount,
    SUM(Amount) OVER (PARTITION BY AccountID ORDER BY TransactionDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) /
    NULLIF(COUNT(Amount) OVER (PARTITION BY AccountID ORDER BY TransactionDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0)
    AS AverageAmount
FROM dbo.AccountTransactions
ORDER BY AccountID, TransactionDate;

/*
COUNT(Amount) counts only the non-NULL rows, so the running average is based solely on actual values; NULLIF prevents division by zero.
*/


Output

Use Case: NULL values are common in datasets, and because aggregate functions skip them automatically, running calculations stay accurate without extra handling code.

Conclusion
SQL Server 2022 brings powerful enhancements to window functions, making them more flexible and capable of handling real-world scenarios in SQL development. From running totals and sliding windows to null handling with IGNORE NULLS, these features improve the clarity, efficiency, and precision of windowed calculations. By leveraging these new tools, SQL developers, data engineers, and DBAs can build more expressive and performant SQL queries, making SQL Server 2022 a strong choice for data-intensive applications.


European SQL Server 2022 Hosting :: NTile Function in SQL Server

clock October 23, 2024 08:32 by author Peter

This is the last article in my series on SQL Server's ranking functions. The ROW_NUMBER, RANK, and DENSE_RANK() functions were covered in earlier articles. It's time to finally talk about the idea behind the NTILE() function. The following is the initial data setup for the discussion.
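Since the original setup script appeared as an image, here is a minimal sketch of a table matching the queries below (the column types and row values are assumed; 9 rows, split 5/4 by gender to match the results discussed later):
CREATE TABLE dbo.DepartmentMembers
(
    MemberName     NVARCHAR(50),
    Gender         NVARCHAR(10),
    DepartmentName NVARCHAR(50)
);

INSERT INTO dbo.DepartmentMembers (MemberName, Gender, DepartmentName)
VALUES
    ('Anna',  'Female', 'Sales'),
    ('Beth',  'Female', 'Marketing'),
    ('Cara',  'Female', 'IT'),
    ('Dina',  'Female', 'HR'),
    ('Ella',  'Female', 'Finance'),
    ('Frank', 'Male',   'Sales'),
    ('Gary',  'Male',   'Marketing'),
    ('Henry', 'Male',   'IT'),
    ('Ivan',  'Male',   'HR');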

The NTile function
The NTile function takes an integer as an input and divides the records of the result set into that number of groups. For example, NTile(5) will divide a result set of 10 records into 5 groups with two records in each group. In case the result set is not exactly divisible into an equal number of records, it assigns more records to the starting groups and less to the following ones.

For example, if we have 11 records and we apply NTile(5), then it will divide the result set into 5 groups with 3 records in the first group and 2 records in the rest of the 4 groups.

The main point is that within each group, each record will have the same number or rank.

Again, we will be discussing the concept with two cases.

Case 1. Ranking entire result set using Order By clause
Our entire query will remain the same except the ranking function is changed to NTile and we will divide the result set into 4 groups.
SELECT
    NTILE(4) OVER (ORDER BY DepartmentName DESC) AS GeneratedRank,
    MemberName,
    Gender,
    DepartmentName
FROM
    dbo.DepartmentMembers;


Execute the query and see the results. It divides the 9 records into 4 groups, with 3 records in the first group and 2 records in each of the remaining 3 groups. Within each group, the records have the same rank or number assigned to them.

It might seem quite similar to DENSE_RANK, but it is not. DENSE_RANK orders the data by the column specified in the ORDER BY clause and then assigns ranks based on distinct values. Here, however, the ORDER BY clause only determines the order in which the result set is divided into groups; the results are split into the number of groups given by the function parameter, and each record's rank is simply its group number. The query below compares both functions on the same data.
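A sketch of such a comparison, using the same illustrative table as above:
SELECT
    MemberName,
    DepartmentName,
    DENSE_RANK() OVER (ORDER BY DepartmentName DESC) AS DenseRankValue,
    NTILE(4)     OVER (ORDER BY DepartmentName DESC) AS NTileGroup
FROM dbo.DepartmentMembers;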

Case 2. Ranking partitioned result set using Partition By Clause
The query remains the same, except the Partition By clause is also added. So the query changes to.
SELECT
    NTILE(4) OVER (PARTITION BY Gender ORDER BY DepartmentName DESC) AS GeneratedRank,
    MemberName,
    Gender,
    DepartmentName
FROM
    dbo.DepartmentMembers;


This time, the result set is first partitioned into 2 result sets based on Gender, and then within each result set the records are further divided by the NTILE function according to the defined group count. The logic of dividing each result set remains the same as discussed above; the only difference is that in case 1 it was applied to the entire result set, whereas here it is applied to the two result sets separately. See the results below.

As we can see above, the entire result set was divided into two sets, partitioned by Gender. Within each result set, the records are further divided into 4 groups. Result set 1 has 5 records, so its first group contains 2 records and the remaining 3 groups contain 1 record each. Result set 2 has a total of 4 records, so it is divided into 4 equal groups of 1 record each. So this was the NTILE function in SQL Server. I hope you enjoyed reading this.


European SQL Server 2022 Hosting :: The Best and Worst New T-SQL Improvements for SQL Server

clock October 16, 2024 06:55 by author Peter

This article covers the syntax, use cases, and basic examples of two new T-SQL enhancements: LEAST and GREATEST. We will also look at how they differ from conventional methods and how they affect everyday database operations. These functions can make many querying tasks more efficient for SQL developers, data engineers, and DBAs.

Introduction
Before introducing these functions in SQL Server 2022, finding the minimum or maximum value from a set of columns or expressions required more complex logic or verbose CASE statements. The LEAST and GREATEST functions simplify these operations, offering more intuitive and concise ways to achieve the desired outcomes.

Syntax Overview
LEAST: Returns the smallest value from a list of expressions.

LEAST (
    expression1,
    expression2,
    ...,
    expressionN
)

GREATEST: Returns the largest value from a list of expressions.

GREATEST (
    expression1,
    expression2,
    ...,
    expressionN
)

Both functions can compare multiple arguments (expressions), and their arguments can be any data type that supports comparison (numeric, date, or even string types). The result takes the data type of the highest-precedence argument. NULL arguments are ignored during the comparison; the result is NULL only when all arguments are NULL.

LEAST and GREATEST vs. Traditional Methods

Prior to SQL Server 2022, SQL developers typically had to use CASE statements or a combination of MIN and MAX functions along with complex logic to compare multiple columns or values. While these methods worked, they were cumbersome and often led to less readable queries.

Examples of Traditional Methods
To compare multiple columns before SQL Server 2022.
SELECT
    CASE
        WHEN Column1 <= Column2 AND Column1 <= Column3 THEN Column1
        WHEN Column2 <= Column1 AND Column2 <= Column3 THEN Column2
        ELSE Column3
    END AS SmallestValue,

    CASE
        WHEN Column1 >= Column2 AND Column1 >= Column3 THEN Column1
        WHEN Column2 >= Column1 AND Column2 >= Column3 THEN Column2
        ELSE Column3
    END AS LargestValue
FROM MyTable;


This logic can quickly become unwieldy if you have many columns to compare. Now, with the new functions in SQL Server 2022, this task becomes much simpler.

Using LEAST and GREATEST

The same example uses LEAST and GREATEST.
SELECT
    LEAST(Column1, Column2, Column3) AS SmallestValue,
    GREATEST(Column1, Column2, Column3) AS LargestValue
FROM
    MyTable;


Examples
Example 1. Using LEAST to Compare Different Order Amounts

Let’s assume you want to compare the TaxAmt, Freight, and SubTotal for several sales orders and determine the smallest of the three amounts for each order. The Sales.SalesOrderHeader table includes these details for each sales order.

We’ll use the LEAST function to identify the smallest amount for each of several orders.
USE [Hostforlife];
GO

SELECT
    SalesOrderID,
    LEAST(TaxAmt, Freight, SubTotal) AS SmallestAmount
FROM
    Sales.SalesOrderHeader WITH (NOLOCK)
WHERE
    SalesOrderID IN (43659, 43660, 43661);

Output

Example 2. Use GREATEST to Find the Maximum Bonus, Sick Leave, and Vacation Hours
Let’s now use the HumanResources.Employee table. We want to compare each employee's VacationHours, SickLeaveHours, and a fixed bonus value (10, used here for demonstration) and find out which of these values is the largest for each employee.
USE [Hostforlife];
GO

SELECT
    BusinessEntityID,
    GREATEST(VacationHours, SickLeaveHours, 10) AS MaxBenefit
FROM
    HumanResources.Employee
WHERE
    BusinessEntityID BETWEEN 1 AND 10;

Output

Performance Considerations
While the LEAST and GREATEST functions simplify queries, it’s important to consider their performance impact in large datasets or complex queries. Typically, these functions are efficient, especially compared to more verbose alternatives such as CASE statements. However, in cases where you're comparing large datasets or many columns, performance tuning may still be necessary.

Use Cases for DBAs, Data Engineers, and SQL Developers
For SQL developers, LEAST and GREATEST streamline common query patterns, making code easier to read and maintain. This is especially beneficial when writing queries that compare multiple values across different columns.

For data engineers, these functions simplify data pipeline transformations, where selecting the minimum or maximum value from a set of columns is common. Whether dealing with dates, numeric data, or even strings, LEAST and GREATEST can reduce the complexity of transformation logic.

DBAs will also appreciate the reduced complexity when working with large, production-level queries that must compare multiple values or optimize reporting views. These functions can reduce the need for additional table scans or subqueries in complex reports.

Conclusion

Whether you're comparing sales figures, tracking project dates, or handling any scenario that involves finding the smallest or largest value among multiple expressions, LEAST and GREATEST make the job easier. By understanding their behavior, particularly with respect to NULL values, you can ensure that your queries are both efficient and accurate.


European SQL Server 2022 Hosting :: IS [NOT] DISTINCT FROM New T-SQL Enhancements in SQL Server

clock October 8, 2024 06:56 by author Peter

SQL Server 2022, the most recent version of Microsoft's popular relational database management system, contains a number of new features and improvements aimed at enhancing database management, query efficiency, and developer productivity. One such enhancement is the addition of the IS DISTINCT FROM and IS NOT DISTINCT FROM predicates, which provide a more thorough and user-friendly way of handling comparisons across nullable columns in SQL queries.

Dealing with NULL values in SQL can be challenging for database administrators (DBAs), data engineers, and SQL developers, particularly when it comes to comparisons. Conventional equality operators (=) and their negation (<>) frequently handle NULL values in an ineffective manner, causing confusion and unexpected outcomes. SQL Server 2022 introduces IS DISTINCT FROM and IS NOT DISTINCT FROM, which offer a more dependable and expressive method of handling NULL values and comparing nullable columns.

The Problem with NULL Comparisons in SQL
It's important to understand the challenges that arise when dealing with NULL values in SQL. In SQL, NULL represents an unknown or missing value, and it behaves differently than other values in the database.

The standard equality (=) and inequality (<>) operators do not behave as expected when NULL values are involved.

  • When comparing two NULL values using =, the result is not TRUE, but rather UNKNOWN, because SQL treats NULL as an unknown value.
  • Similarly, comparing a NULL value with a non-NULL value using <> results in UNKNOWN, not TRUE.
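Consider the following sketch, which demonstrates both points with hypothetical values:
DECLARE @a INT = NULL, @b INT = NULL;

-- Neither comparison evaluates to TRUE; both yield UNKNOWN,
-- which behaves like FALSE in a boolean context.
SELECT
    CASE WHEN @a = @b  THEN 'TRUE' ELSE 'UNKNOWN' END AS EqualityCheck,
    CASE WHEN @a <> 10 THEN 'TRUE' ELSE 'UNKNOWN' END AS InequalityCheck;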

This behavior stems from the fact that NULL represents an unknown value, and comparing two unknown values logically does not lead to a definitive TRUE or FALSE result. Instead, SQL returns UNKNOWN, which is treated as FALSE in most boolean contexts.

Traditional Solutions: IS NULL and IS NOT NULL
To work around this issue, SQL developers typically use the IS NULL and IS NOT NULL checks when comparing nullable columns. While this approach works, it can be verbose and error-prone, especially in complex queries involving multiple nullable columns.
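A minimal sketch of the traditional workaround, using the Employees table with nullable Salary and Bonus columns from the examples below:
-- NULL-safe equality the pre-2022 way: spell out the NULL cases explicitly.
SELECT EmployeeID
FROM dbo.Employees
WHERE Salary = Bonus
   OR (Salary IS NULL AND Bonus IS NULL);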

The New IS DISTINCT FROM and IS NOT DISTINCT FROM Predicates
SQL Server 2022 addresses the challenge of NULL comparisons by introducing the IS DISTINCT FROM and IS NOT DISTINCT FROM predicates, which offer a more straightforward and intuitive way to compare nullable columns.

IS DISTINCT FROM: The IS DISTINCT FROM predicate compares two values and returns TRUE if the values are different, even when one or both of the values are NULL. In other words, IS DISTINCT FROM treats NULL as a regular value, and two NULL values are considered equal.

Syntax

value1 IS DISTINCT FROM value2

--TRUE if the two values are different, including cases where one value is NULL and the other is not.
--FALSE if the two values are the same, including cases where both values are NULL.

IS NOT DISTINCT FROM: The IS NOT DISTINCT FROM predicate is the logical opposite of IS DISTINCT FROM. It returns TRUE if the two values are the same, even when both values are NULL.

Syntax
value1 IS NOT DISTINCT FROM value2

--TRUE if the two values are the same, including cases where both values are NULL.
--FALSE if the two values are different, including cases where one value is NULL and the other is not.


Example Use Case

Consider the following example, where we have a table Employees with nullable columns Salary and Bonus. We want to compare the Salary and Bonus columns and determine whether they are distinct, even when NULL values are involved.

To compare the Salary and Bonus columns using traditional SQL operators, we would need to write a verbose query.
SELECT
    EmployeeID,
    CASE
        WHEN Salary IS NULL AND Bonus IS NULL THEN 'Equal'
        WHEN Salary = Bonus THEN 'Equal'
        ELSE 'Distinct'
    END AS ComparisonResult
FROM dbo.Employees


Using IS DISTINCT FROM, the query becomes much simpler.
SELECT
    EmployeeID,
    CASE
        WHEN Salary IS DISTINCT FROM Bonus THEN 'Distinct'
        ELSE 'Equal'
    END AS ComparisonResult
FROM dbo.Employees


Output

Advantages

  • Simplified Code: As demonstrated in the previous example, the IS DISTINCT FROM and IS NOT DISTINCT FROM predicates allow for more concise and readable SQL queries. By eliminating the need for IS NULL and IS NOT NULL checks, these predicates reduce the verbosity of the code and make it easier to understand.
  • More Intuitive Handling of NULL Values: These predicates provide a more intuitive way to compare nullable columns, as they treat NULL values as regular values. This is especially useful in scenarios where developers need to compare nullable columns without worrying about the special handling of NULL.
  • Improved Consistency: The traditional behavior of SQL's equality (=) and inequality (<>) operators can be confusing when NULL values are involved. By using IS DISTINCT FROM and IS NOT DISTINCT FROM, developers can ensure consistent behavior in their queries, regardless of whether NULL values are present.

Use Cases
1. Data Deduplication
In data deduplication tasks, developers often need to identify and remove duplicate rows from a table. When nullable columns are involved, the IS NOT DISTINCT FROM predicate can help ensure that NULL values are treated consistently.

For example, to identify duplicate rows in a table with nullable columns.
WITH DuplicateRows AS (
    SELECT
        EmployeeID,
        ROW_NUMBER() OVER (PARTITION BY Salary, Bonus ORDER BY EmployeeID) AS RowNum
    FROM dbo.Employees
    WHERE Salary IS NOT DISTINCT FROM Bonus
)
DELETE FROM DuplicateRows WHERE RowNum > 1


2. Conditional Updates and Inserts
In scenarios where developers need to perform conditional updates or inserts based on comparisons of nullable columns, IS DISTINCT FROM and IS NOT DISTINCT FROM can simplify the logic and ensure accurate results.

For example, to update the Bonus column only if it is distinct from the Salary column.
UPDATE dbo.Employees
SET Bonus = Salary * 0.1
WHERE Salary IS DISTINCT FROM Bonus;


3. Data Comparison and Synchronization
In data comparison and synchronization tasks, developers often need to compare rows in two different tables to identify changes or discrepancies. When nullable columns are involved, the traditional comparison operators may not behave as expected, leading to incorrect results.

For example, consider two tables, dbo.Employees and dbo.Employees1, which store employee salary information. We want to compare the two tables and identify rows where the salary differs. In this query, IS DISTINCT FROM ensures that the comparison works correctly, even when one or both Salary values are NULL.
SELECT
    a.EmployeeID,
    a.Salary AS SalaryInTableA,
    b.Salary AS SalaryInTableB
FROM
    dbo.Employees a
JOIN
    dbo.Employees1 b
    ON a.EmployeeID = b.EmployeeID
WHERE
    a.Salary IS DISTINCT FROM b.Salary;

Conclusion
The introduction of the IS DISTINCT FROM and IS NOT DISTINCT FROM predicates in SQL Server 2022 is a welcome enhancement for SQL developers, data engineers, and DBAs. These new predicates simplify the process of comparing nullable columns, reduce the complexity of SQL queries, and provide more intuitive and consistent handling of NULL values.



About HostForLIFE

HostForLIFE is a European Windows hosting provider which focuses on the Windows platform only. We deliver on-demand hosting solutions including shared hosting, reseller hosting, cloud hosting, dedicated servers, and IT as a service for companies of all sizes.

We offer the latest Windows 2019 Hosting, ASP.NET 5 Hosting, ASP.NET MVC 6 Hosting and SQL 2019 Hosting.

