European Windows 2019 Hosting BLOG

BLOG about Windows 2019 Hosting and SQL 2019 Hosting - Dedicated to European Windows Hosting Customers

European SQL Server 2022 Hosting :: Cleaning Data in SQL Server

November 25, 2024 06:45 by author Peter

Data cleansing is an essential stage of the data preparation process. It ensures that the data used for analysis, reporting, or machine learning is accurate, consistent, and dependable. Low-quality data leads to inaccurate judgments, faulty models, and poor decisions. The examples below use tables from the AdventureWorks2022 sample schema, such as Person.EmailAddress and Sales.SalesOrderHeader.

1. Removing Duplicates
Duplicates in datasets can skew results, inflate counts, or cause redundancy. DISTINCT or GROUP BY removes duplicates from a query's result set but leaves the table unchanged. To remove duplicates permanently, use a common table expression (CTE) with ROW_NUMBER().
Example. Identify and remove duplicate email addresses in the Person.EmailAddress table.

USE Hostforlife;
GO

-- Return each email address once (the table itself is not modified)
SELECT DISTINCT
    EmailAddress
FROM
    Person.EmailAddress;

-- Number the rows within each group of identical addresses,
-- then delete every row except the one with the lowest EmailAddressID
WITH CTE AS (
    SELECT
        EmailAddressID,
        EmailAddress,
        ROW_NUMBER() OVER (PARTITION BY EmailAddress ORDER BY EmailAddressID) AS RowNum
    FROM
        Person.EmailAddress
)
DELETE FROM CTE
WHERE RowNum > 1;
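Before deleting anything, it can help to list which addresses are duplicated and how often, so you can review the affected rows first. Here is a minimal sketch against the same Person.EmailAddress table:

```sql
-- List each duplicated email address with its number of occurrences
SELECT
    EmailAddress,
    COUNT(*) AS Occurrences
FROM
    Person.EmailAddress
GROUP BY
    EmailAddress
HAVING
    COUNT(*) > 1
ORDER BY
    Occurrences DESC;
```

If this query returns no rows, the DELETE above has nothing to remove.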

2. Handling Missing Values
Missing values can impact analysis and decision-making. Use the IS NULL predicate, or expressions such as COALESCE and CASE, to identify and handle missing data. Example. Replace missing PhoneNumber values in the Person.PersonPhone table with a default value, or remove rows with missing values.
USE Hostforlife;
GO

-- Show a default value wherever PhoneNumber is NULL (the table is not modified)
SELECT
    BusinessEntityID,
    PhoneNumber,
    COALESCE(PhoneNumber, 'Unknown') AS CleanedPhoneNumber
FROM
    Person.PersonPhone;

-- Alternatively, delete the rows that have no phone number
DELETE FROM
    Person.PersonPhone
WHERE
    PhoneNumber IS NULL;
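CASE is useful when "missing" covers more than NULL, for example empty strings. The sketch below assumes that empty or space-only PhoneNumber values should be treated the same as NULL:

```sql
SELECT
    BusinessEntityID,
    CASE
        -- Assumption: NULL and empty/space-only strings both count as missing
        WHEN PhoneNumber IS NULL OR LTRIM(RTRIM(PhoneNumber)) = '' THEN 'Unknown'
        ELSE PhoneNumber
    END AS CleanedPhoneNumber
FROM
    Person.PersonPhone;
```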

3. Correcting Data Inconsistencies
Inconsistencies like variations in case or formatting can lead to errors in joins or grouping. SQL Server string functions like UPPER, LOWER, or REPLACE can help standardize data.

Example. Standardize FirstName values in the Person.Person table to uppercase, and replace incorrect substrings in email addresses.
USE Hostforlife;
GO

-- Convert every FirstName to uppercase
UPDATE Person.Person
SET FirstName = UPPER(FirstName);

-- Replace an incorrect domain (the domain values are illustrative)
UPDATE Person.EmailAddress
SET EmailAddress = REPLACE(EmailAddress, '@hostforlife.com', '@hostforlife.eu')
WHERE EmailAddress LIKE '%@hostforlife.com';
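Before running a blanket UPDATE, a preview query can show which first names actually appear in more than one letter case. This is a sketch. It assumes the column uses the default case-insensitive collation and applies a case-sensitive collation (Latin1_General_CS_AS) only when counting the variants:

```sql
-- Names that occur in more than one casing, for example 'ann' and 'Ann'
SELECT
    UPPER(FirstName) AS NormalizedFirstName,
    COUNT(DISTINCT FirstName COLLATE Latin1_General_CS_AS) AS CaseVariants
FROM
    Person.Person
GROUP BY
    UPPER(FirstName)
HAVING
    COUNT(DISTINCT FirstName COLLATE Latin1_General_CS_AS) > 1;
```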

4. Standardizing Data Formats
Standardized data formats ensure consistency and compatibility across systems. Functions like CAST and CONVERT are often used for this purpose.
Example. Convert ModifiedDate in the Sales.SalesOrderHeader table to a specific format. Here, CONVERT style 101 formats the date as mm/dd/yyyy.

USE Hostforlife;
GO

SELECT SalesOrderID,
       CONVERT(VARCHAR(10), ModifiedDate, 101) AS FormattedDate
FROM Sales.SalesOrderHeader;
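Converting to VARCHAR produces text, which no longer sorts or compares as a date. When the goal is consistency rather than display, you can keep a date type with CAST or use an unambiguous ISO style. The sketch below uses style 23 (yyyy-mm-dd) and style 112 (yyyymmdd) on the same table:

```sql
SELECT SalesOrderID,
       CAST(ModifiedDate AS DATE)              AS ModifiedDateOnly,  -- DATE type, time portion dropped
       CONVERT(VARCHAR(10), ModifiedDate, 23)  AS IsoDate,           -- 'yyyy-mm-dd' text
       CONVERT(VARCHAR(8), ModifiedDate, 112)  AS CompactDate        -- 'yyyymmdd' text
FROM Sales.SalesOrderHeader;
```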

5. Removing Outliers

Outliers can distort statistical analyses and trends. Use statistical functions and filtering to identify and exclude them.

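Example. Exclude outliers based on TotalDue in the Sales.SalesOrderHeader table. The query keeps orders whose TotalDue lies between the 5th and 95th percentiles. It filters the result set and does not delete any rows.
USE Hostforlife;
GO

-- Compute both percentile bounds once over the whole table
WITH Bounds AS (
    SELECT DISTINCT
        PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY TotalDue) OVER () AS LowerBound,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY TotalDue) OVER () AS UpperBound
    FROM Sales.SalesOrderHeader
)
SELECT soh.*
FROM Sales.SalesOrderHeader AS soh
CROSS JOIN Bounds AS b
WHERE soh.TotalDue BETWEEN b.LowerBound AND b.UpperBound;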
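Percentiles are only one way to define an outlier. Another common convention, shown here as an illustration, is the z-score: the number of standard deviations a value lies from the mean. The sketch below lists orders more than 3 standard deviations from the mean TotalDue. The threshold of 3 is an assumption, not a fixed rule.

```sql
WITH Stats AS (
    SELECT AVG(TotalDue)   AS MeanDue,
           STDEV(TotalDue) AS StdDevDue
    FROM Sales.SalesOrderHeader
)
SELECT soh.SalesOrderID,
       soh.TotalDue,
       (soh.TotalDue - s.MeanDue) / NULLIF(s.StdDevDue, 0) AS ZScore
FROM Sales.SalesOrderHeader AS soh
CROSS JOIN Stats AS s
-- NULLIF avoids division by zero when every TotalDue is identical
WHERE ABS((soh.TotalDue - s.MeanDue) / NULLIF(s.StdDevDue, 0)) > 3;
```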

6. Validating Data
Data validation ensures data integrity by applying constraints or rules. SQL Server constraints like NOT NULL, UNIQUE, and CHECK are essential for enforcing data quality.
Example. Enforce data integrity when creating a new table.
USE Hostforlife;
GO

CREATE TABLE Sales.Promotions (
    PromotionID INT PRIMARY KEY,
    PromotionName NVARCHAR(100) NOT NULL,
    DiscountPercentage DECIMAL(5, 2) CHECK (DiscountPercentage BETWEEN 0 AND 100),
    StartDate DATE NOT NULL,
    EndDate DATE NOT NULL,
    CONSTRAINT CK_Promotions_EndDate CHECK (EndDate > StartDate)
);

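Alternatively, validate existing data with conditional queries. This is most useful on tables that do not yet enforce a rule with a constraint. On the table above, the CHECK constraint guarantees that this query returns no rows.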
SELECT *
FROM Sales.Promotions
WHERE DiscountPercentage < 0
   OR DiscountPercentage > 100;
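When a validation query like this finds problems in a table without constraints, a common follow-up is to fix the rows and then add the constraint so the rule is enforced from then on. This sketch uses a hypothetical dbo.LegacyPromotions table, assumed to have the same columns as Sales.Promotions but no constraints:

```sql
-- 1. Find rows that break the date rule
SELECT PromotionID, StartDate, EndDate
FROM dbo.LegacyPromotions          -- hypothetical table
WHERE EndDate <= StartDate;

-- 2. After correcting or removing those rows, enforce the rule.
--    WITH CHECK makes SQL Server validate the existing rows when the constraint is added,
--    so this statement fails if any violating rows remain.
ALTER TABLE dbo.LegacyPromotions WITH CHECK
ADD CONSTRAINT CK_LegacyPromotions_EndDate CHECK (EndDate > StartDate);
```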


Conclusion
Data cleansing is an ongoing process and a crucial component of the data lifecycle. By removing duplicates, handling missing values, correcting inconsistencies, standardizing formats, removing outliers, and validating data, you can significantly improve the quality of your data. These techniques, demonstrated using the AdventureWorks2022 database, can be applied to real-world datasets to ensure accurate and actionable insights. By incorporating these practices into your data workflows, you can ensure that your analysis, reports, and machine learning models are built on a solid foundation of clean data.

HostForLIFEASP.NET SQL Server 2022 Hosting



European SQL Server 2022 Hosting :: Temporary Tables vs Table Variables in SQL Server Explained

November 20, 2024 07:52 by author Peter

In SQL Server, a temporary table is a table that holds data for a short time. Temporary tables are stored in the tempdb system database and are dropped automatically once they go out of scope. That happens at the end of the session that created them or, for a table created inside a stored procedure, when that procedure finishes.

Types of Temporary Tables
SQL Server supports two types of temporary tables.

  • Local Temporary Table (#)
    • Only visible to the session or connection that created it.
    • Automatically dropped when the session ends (or when the stored procedure that created it finishes).
  • Global Temporary Table (##)
    • Visible to all sessions and connections.
    • Dropped when the session that created it ends and no other session is still referencing it.
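The two prefixes can be compared side by side. This is a minimal sketch with hypothetical table names. DROP TABLE IF EXISTS requires SQL Server 2016 or later.

```sql
-- Local temporary table: visible only to the current session
CREATE TABLE #LocalExample (ID INT PRIMARY KEY, Label NVARCHAR(50));

-- Global temporary table: visible to every session while it exists
CREATE TABLE ##GlobalExample (ID INT PRIMARY KEY, Label NVARCHAR(50));

INSERT INTO #LocalExample (ID, Label) VALUES (1, N'this session only');
INSERT INTO ##GlobalExample (ID, Label) VALUES (1, N'any session');

-- Both are dropped automatically, but can also be dropped explicitly
DROP TABLE IF EXISTS #LocalExample;
DROP TABLE IF EXISTS ##GlobalExample;
```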

Steps for using a temporary table

Step 1. Syntax for Creating a Temporary Table.
CREATE TABLE #TempTableName
(
    Column1 DataType PRIMARY KEY,  -- Example of a primary key
    Column2 DataType,
    Column3 DataType
);


Step 2. Example of Using a Temporary Table in a Stored Procedure.
CREATE PROCEDURE StudentDetailsTempTable
AS
BEGIN
    -- Step 1: Create a temporary table
    CREATE TABLE #StudentDetails
    (
        StudentID INT PRIMARY KEY,           -- Primary key example
        StudentName NVARCHAR(100),
        Course NVARCHAR(100),
        Fees DECIMAL(18,2)
    );

    -- Step 2: Insert data into the temporary table
    INSERT INTO #StudentDetails (StudentID, StudentName, Course, Fees)
    VALUES
        (1, 'Peter', 'MBA-IT', 160000.00),
        (2, 'Leon', 'MBA-Economics', 180000.00),
        (3, 'Alex', 'Master in technology in cs', 150000.00);

    -- Step 3: Select data from the temporary table
    SELECT * FROM #StudentDetails;

    -- Step 4: Temporary table will be dropped automatically after procedure ends
END;

Notes

  • Scope: A temporary table created within a stored procedure is visible only to that procedure (and any procedures it calls), and it is dropped automatically when the procedure finishes executing.
  • Multiple Sessions: Temporary tables whose names begin with # are restricted to the session that created them. For access across sessions, you can create a global temporary table with ##. However, local temporary tables are generally the safer choice inside stored procedures.
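You can check the scope rule by running the procedure and then looking for the table afterwards. This sketch assumes StudentDetailsTempTable has been created as shown in Step 2:

```sql
-- Returns the three rows inserted by the procedure
EXEC StudentDetailsTempTable;

-- The procedure's temp table was dropped when it finished,
-- so OBJECT_ID finds nothing and returns NULL
SELECT OBJECT_ID('tempdb..#StudentDetails') AS TempTableObjectId;
```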

When to Utilize Temporary Tables?

  • Intermediate Results: Temporary tables are advantageous for storing intermediate results of queries temporarily within a stored procedure.
  • Data Transformation: They serve the purpose of manipulating or aggregating data prior to delivering the final result set.
  • Performance: In certain scenarios, temporary tables can improve performance, particularly for complex joins or aggregations over large datasets. They maintain statistics and can be indexed after they are created.
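To illustrate storing intermediate results, the sketch below stages an aggregate once, indexes it, and then queries it. It assumes the Sales.SalesOrderHeader table from the AdventureWorks sample schema, but any large table would work.

```sql
-- Stage per-customer totals in a temporary table
SELECT CustomerID,
       SUM(TotalDue) AS CustomerTotal
INTO #CustomerTotals
FROM Sales.SalesOrderHeader
GROUP BY CustomerID;

-- Unlike a table variable, a temp table can be indexed after it is created
CREATE CLUSTERED INDEX IX_CustomerTotals_CustomerID ON #CustomerTotals (CustomerID);

-- Reuse the intermediate result
SELECT TOP (10) CustomerID, CustomerTotal
FROM #CustomerTotals
ORDER BY CustomerTotal DESC;

DROP TABLE #CustomerTotals;
```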

Steps for using a table variable
A table variable in SQL Server is a variable that holds a temporary set of rows in a table structure. It is declared with the DECLARE statement and the TABLE data type, and it is scoped to the batch, stored procedure, or function in which it is declared. Table variables resemble temporary tables, but they differ notably in scope, performance, and typical use.

Limitations

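  1. Prohibition of DDL Operations: DDL commands such as ALTER TABLE or DROP TABLE cannot be run against a table variable.
  2. Restriction on Explicit Indexes: Indexes cannot be added with CREATE INDEX after the declaration. They must be defined in the DECLARE statement through PRIMARY KEY or UNIQUE constraints (and, from SQL Server 2014 onward, inline INDEX definitions).
  3. Inefficiency with Large Data Sets: Table variables do not maintain column statistics, so the optimizer's row estimates for them can be poor. They generally perform worse than temporary tables on large volumes of data.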
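This sketch declares indexes together with a hypothetical @Orders table variable. The inline INDEX syntax requires SQL Server 2014 or later.

```sql
DECLARE @Orders TABLE
(
    OrderID    INT PRIMARY KEY,      -- index created through a constraint
    CustomerID INT NOT NULL,
    OrderDate  DATE NOT NULL,
    INDEX IX_Orders_CustomerID NONCLUSTERED (CustomerID)  -- inline index, declared with the variable
);

-- Not allowed: a table variable cannot be the target of CREATE INDEX or ALTER TABLE
-- CREATE INDEX IX_Orders_OrderDate ON @Orders (OrderDate);
```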

Step 1. Syntax for Declaring a Table Variable. A table variable is declared with the DECLARE statement and the TABLE keyword.
DECLARE @TableName TABLE
(
    Column1 DataType PRIMARY KEY,  -- Primary key example
    Column2 DataType,
    Column3 DataType
);


Step 2. Defining and Utilizing a Table Variable.
CREATE PROCEDURE StudentDetailsTableVariable
AS
BEGIN
    DECLARE @StudentDetailsTable TABLE
    (
        StudentID INT PRIMARY KEY,            -- Primary key example
        StudentName NVARCHAR(100),
        Course NVARCHAR(100),
        Fees DECIMAL(18,2)
    );

    -- Insert data into the table variable
    INSERT INTO @StudentDetailsTable (StudentID, StudentName, Course, Fees)
    VALUES
        (1, 'Peter', 'BSC', 60000.00),
        (2, 'Leon', 'BA', 80000.00),
        (3, 'Alex', 'Data Science', 50000.00);

    -- Select data from the table variable
    SELECT * FROM @StudentDetailsTable;
END;



When to Utilize Table Variables?

  • Small Data Sets: Table variables are more effective for handling small amounts of data.
  • Short Lifespan: These variables are automatically removed upon the completion of the batch or procedure.
  • Stored Procedures: They function optimally within the context of a stored procedure or batch.

 Key Distinctions Between Table Variables and Temporary Tables
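Two differences are easy to see in a single script. A table variable's contents are not undone by ROLLBACK, and a table variable disappears at the end of its batch while a temporary table lasts for the whole session. Here is a sketch with hypothetical names #Demo and @Demo:

```sql
CREATE TABLE #Demo (ID INT);
DECLARE @Demo TABLE (ID INT);

BEGIN TRANSACTION;
    INSERT INTO #Demo (ID) VALUES (1);
    INSERT INTO @Demo (ID) VALUES (1);
ROLLBACK TRANSACTION;

SELECT COUNT(*) AS TempTableRows     FROM #Demo;  -- 0: the insert was rolled back
SELECT COUNT(*) AS TableVariableRows FROM @Demo;  -- 1: table variables are not affected by ROLLBACK
GO

-- New batch: #Demo still exists in this session, @Demo is gone
SELECT COUNT(*) AS TempTableRows FROM #Demo;      -- 0
-- SELECT COUNT(*) FROM @Demo;  -- would fail: the variable must be declared again
DROP TABLE #Demo;
```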


 



European SQL Server 2022 Hosting :: Exploring the New T-SQL Enhancements in SQL Server 2022

November 7, 2024 07:12 by author Peter

Microsoft SQL Server 2022 introduces powerful new T-SQL functions that enhance developer productivity and make data manipulation faster and more intuitive. These enhancements are designed to streamline complex queries and add new flexibility to SQL Server’s capabilities, making it a more versatile tool for modern data management.

1. IS [NOT] DISTINCT FROM Comparison

The IS [NOT] DISTINCT FROM predicate simplifies NULL-safe comparisons between values. It treats two NULLs as not distinct and treats a NULL compared with a non-NULL value as distinct. The comparison therefore always returns TRUE or FALSE instead of UNKNOWN, which removes the need for ISNULL or COALESCE workarounds.
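Here is a minimal sketch using a hypothetical @Pairs table variable. It finds rows whose value changed and counts NULL-to-NULL as unchanged.

```sql
DECLARE @Pairs TABLE (ID INT, OldValue INT NULL, NewValue INT NULL);
INSERT INTO @Pairs VALUES (1, 10, 10), (2, 10, NULL), (3, NULL, NULL);

-- SQL Server 2022: returns only ID 2
SELECT ID
FROM @Pairs
WHERE OldValue IS DISTINCT FROM NewValue;

-- Equivalent logic without the new predicate
SELECT ID
FROM @Pairs
WHERE OldValue <> NewValue
   OR (OldValue IS NULL AND NewValue IS NOT NULL)
   OR (OldValue IS NOT NULL AND NewValue IS NULL);
```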

2. DATE_BUCKET Function
The DATE_BUCKET function is a valuable addition for time-series analysis. It maps each date/time value to the start of a fixed-width bucket (for example 15 minutes, 1 hour, or 1 week), measured from an optional origin date. This makes it easy to aggregate data over fixed time spans for reporting and analytics.
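Here is a minimal sketch with a hypothetical table variable of sensor readings. With the default origin (1900-01-01 00:00), 15-minute buckets start on the quarter hour.

```sql
DECLARE @Readings TABLE (ReadingTime DATETIME2(0), Value INT);
INSERT INTO @Readings VALUES
    ('2024-11-01 10:02:00', 5),
    ('2024-11-01 10:14:00', 7),
    ('2024-11-01 10:16:00', 3);

-- Group readings into 15-minute buckets
SELECT DATE_BUCKET(MINUTE, 15, ReadingTime) AS BucketStart,
       SUM(Value) AS TotalValue
FROM @Readings
GROUP BY DATE_BUCKET(MINUTE, 15, ReadingTime)
ORDER BY BucketStart;
-- 2024-11-01 10:00:00 -> 12
-- 2024-11-01 10:15:00 -> 3
```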

3. DATETRUNC Function

The DATETRUNC function truncates a datetime to a specified precision, such as day, month, or year, making it easier to group data at different time granularities. This simplification can reduce code complexity when working with datetime calculations.
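Here is a minimal sketch with a hypothetical @d variable. Grouping by one of these expressions, for example DATETRUNC(MONTH, OrderDate), yields one row per month.

```sql
DECLARE @d DATETIME2(0) = '2024-11-07 07:12:45';

SELECT DATETRUNC(YEAR,  @d) AS YearStart,   -- 2024-01-01 00:00:00
       DATETRUNC(MONTH, @d) AS MonthStart,  -- 2024-11-01 00:00:00
       DATETRUNC(DAY,   @d) AS DayStart,    -- 2024-11-07 00:00:00
       DATETRUNC(HOUR,  @d) AS HourStart;   -- 2024-11-07 07:00:00
```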

4. LEAST and GREATEST Functions
SQL Server 2022 introduces the LEAST and GREATEST functions, which return the smallest or largest value from a list of expressions. They replace CASE expressions that compare values across several columns. They also ignore NULL arguments, returning NULL only if every argument is NULL.
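This sketch uses a hypothetical table variable of supplier quotes in which one price is missing:

```sql
DECLARE @Quotes TABLE (ProductID INT, PriceA MONEY NULL, PriceB MONEY NULL, PriceC MONEY NULL);
INSERT INTO @Quotes VALUES (1, 12.50, 11.75, NULL), (2, 9.00, 9.40, 8.95);

SELECT ProductID,
       LEAST(PriceA, PriceB, PriceC)    AS BestPrice,    -- 11.75 and 8.95; the NULL is ignored
       GREATEST(PriceA, PriceB, PriceC) AS HighestPrice  -- 12.50 and 9.40
FROM @Quotes;
```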

5. STRING_SPLIT with Ordinal Option
The STRING_SPLIT function now accepts an optional third argument, enable_ordinal. When it is set to 1, the output includes an ordinal column holding each element's 1-based position in the input string. This preserves the original sequence when you order or reconstruct data by position.
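This sketch splits a literal string with enable_ordinal set to 1 and then uses the ordinal column to rebuild the string in reverse order:

```sql
-- Each element with its position
SELECT value, ordinal
FROM STRING_SPLIT('red,green,blue', ',', 1)
ORDER BY ordinal;
-- red 1, green 2, blue 3

-- Reassemble the elements by position (here, reversed)
SELECT STRING_AGG(value, ',') WITHIN GROUP (ORDER BY ordinal DESC) AS Reversed
FROM STRING_SPLIT('red,green,blue', ',', 1);
-- blue,green,red
```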

6. Enhanced TRIM Function
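SQL Server 2022 extends the TRIM function with the LEADING, TRAILING, and BOTH keywords. Specified characters, not just spaces, can now be removed from the start, the end, or both ends of a string. LTRIM and RTRIM also gain an optional argument listing the characters to remove. These changes make the functions more flexible for cleaning and formatting data in place.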
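Here is a sketch on literal strings. The LEADING/TRAILING/BOTH forms may require database compatibility level 160.

```sql
SELECT TRIM('.' FROM '..value..')          AS BothEnds,    -- 'value' (characters FROM form)
       TRIM(LEADING '0' FROM '000123')     AS NoLeading,   -- '123'
       TRIM(TRAILING '!' FROM 'Hello!!')   AS NoTrailing,  -- 'Hello'
       LTRIM('xxabc', 'x')                 AS LtrimChars;  -- 'abc'
```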

7. GENERATE_SERIES Function
The GENERATE_SERIES function allows users to create a range of values in a single query, simplifying tasks like generating time series or producing sequences without needing complex loops or temp tables.
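Here is a minimal sketch of GENERATE_SERIES(start, stop [, step]), which returns a single column named value. It may require database compatibility level 160.

```sql
-- 1, 4, 7, 10
SELECT value
FROM GENERATE_SERIES(1, 10, 3);

-- Every date in November 2024, built from a series of day offsets
SELECT DATEADD(DAY, value, CAST('2024-11-01' AS DATE)) AS CalendarDate
FROM GENERATE_SERIES(0, 29);
```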

8. Windowing Function Enhancement
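SQL Server 2022 adds the WINDOW clause, which defines a named window once so that several window functions, such as LAG and LEAD, can share it. It also adds the IGNORE NULLS and RESPECT NULLS options for FIRST_VALUE and LAST_VALUE. Together these give more control over analytic functions within partitions and remove repeated OVER definitions.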
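This sketch uses a hypothetical @Sales table variable and one named window, w. The WINDOW clause may require database compatibility level 160.

```sql
DECLARE @Sales TABLE (Region VARCHAR(10), SaleDate DATE, Amount INT NULL);
INSERT INTO @Sales VALUES
    ('North', '2024-11-01', 100),
    ('North', '2024-11-02', NULL),
    ('North', '2024-11-03', 150);

SELECT Region, SaleDate, Amount,
       LAG(Amount)  OVER w AS PreviousAmount,
       LEAD(Amount) OVER w AS NextAmount,
       -- Last non-NULL amount up to the current row (100 on 2024-11-02)
       LAST_VALUE(Amount) IGNORE NULLS OVER w AS LastKnownAmount
FROM @Sales
WINDOW w AS (PARTITION BY Region ORDER BY SaleDate);
```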

9. BIT Functions

SQL Server 2022 also introduces bit manipulation functions that simplify working with binary data: LEFT_SHIFT, RIGHT_SHIFT, BIT_COUNT, GET_BIT, and SET_BIT. They provide built-in methods for bitwise calculations, which is useful wherever flags or other values are packed into integer or binary columns.
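Here is a minimal sketch on a hypothetical @flags value of 10 (binary 1010):

```sql
DECLARE @flags INT = 10;   -- binary 1010

SELECT GET_BIT(@flags, 1)      AS Bit1,          -- 1
       GET_BIT(@flags, 0)      AS Bit0,          -- 0
       SET_BIT(@flags, 0)      AS WithBit0Set,   -- 11 (1011)
       SET_BIT(@flags, 3, 0)   AS WithBit3Clear, -- 2  (0010)
       BIT_COUNT(@flags)       AS BitsSet,       -- 2
       LEFT_SHIFT(@flags, 2)   AS ShiftedLeft,   -- 40
       RIGHT_SHIFT(@flags, 1)  AS ShiftedRight;  -- 5
```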

Conclusion

These T-SQL enhancements reflect Microsoft’s focus on making SQL Server more powerful and developer-friendly. With each function, SQL Server users gain new tools for cleaner syntax, better performance, and easier data handling, enabling more efficient workflows and advanced analytics. If you’re looking to leverage the full capabilities of SQL Server 2022, these features are a must-know.


 



European SQL Server 2022 Hosting :: Making Subquery Workable

November 6, 2024 09:11 by author Peter

When a query is used as a subquery, it has to meet requirements that an ordinary query does not.

A query that runs on its own can stop working once it is wrapped as a subquery in the FROM clause (a derived table).

The first error occurs because the subquery needs an alias. Adding the alias removes that error, but a new one appears.

The new error is caused by a missing column name. An unnamed column, such as a computed expression, is acceptable in an ordinary query but not in a subquery.

After adding a column name, the query works.


Adding ORDER BY inside the subquery breaks it again, with this error:

The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

As the message suggests, adding TOP makes it work.
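The sketch below reproduces the three fixes on sys.objects, a catalog view that exists in every database. It is an illustration under that assumption, not the original post's query.

```sql
-- On its own, this query runs even though LEN(name) has no column name
SELECT name, LEN(name)
FROM sys.objects
ORDER BY name;

-- As a derived table it needs:
--   1. an alias for the subquery (AS t),
--   2. a name for every column (AS NameLength),
--   3. TOP (or OFFSET/FETCH) because it keeps an ORDER BY.
SELECT t.name, t.NameLength
FROM (
    SELECT TOP (10) name, LEN(name) AS NameLength
    FROM sys.objects
    ORDER BY name
) AS t
ORDER BY t.name;  -- the inner ORDER BY only picks the TOP rows; the outer one fixes the output order
```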



 



About HostForLIFE

HostForLIFE is a European Windows hosting provider that focuses exclusively on the Windows platform. We deliver on-demand hosting solutions, including shared hosting, reseller hosting, cloud hosting, dedicated servers, and IT as a Service, for companies of all sizes.

We offer the latest Windows 2019 Hosting, ASP.NET 5 Hosting, ASP.NET MVC 6 Hosting, and SQL 2019 Hosting.

