Tuesday, January 13, 2009

Joining data and differences of using UNION and UNION ALL in SQL Server

Problem
Sometimes there is a need to combine data from multiple tables or views into one comprehensive dataset. This may be for like tables within the same database or maybe there is a need to combine like data across databases or even across servers. I have read about the UNION and UNION ALL commands, but how do these work and how do they differ?

Solution
In SQL Server you can combine multiple datasets into one comprehensive dataset by using the UNION or UNION ALL operators. The two operators differ in how they work and in the final result set they return, but both stack multiple datasets that have similar structures into one combined dataset.

Here is a brief description:

  • UNION - this command will allow you to join multiple datasets into one dataset and will remove any duplicates that exist. Basically it is performing a DISTINCT operation across all columns in the result set.
  • UNION ALL - this command again allows you to join multiple datasets into one dataset, but it does not remove any duplicate rows. Because this does not remove duplicate rows this process is faster, but if you don't want duplicate records you will need to use the UNION operator instead.

Rules to union data:

  • Each query must have the same number of columns
  • Each column must have compatible data types
  • Column names for the final result set are taken from the first query
  • ORDER BY and COMPUTE clauses can only be issued for the overall result set and not within each individual result set
  • GROUP BY and HAVING clauses can only be issued for each individual result set and not for the overall result set
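The ORDER BY and GROUP BY rules above can be sketched as follows. This is a minimal illustration, not part of the original examples: the derived tables built from literals stand in for real tables, and the city values are made up so that the script runs on any database.

```sql
-- Hypothetical sample rows built from literals so the script runs anywhere.
SELECT city, COUNT(*) AS contacts
FROM (SELECT 'Oslo' AS city UNION ALL SELECT 'Oslo' UNION ALL SELECT 'Rome') AS businessContacts
GROUP BY city                  -- GROUP BY belongs to this query only
UNION ALL
SELECT city, COUNT(*)
FROM (SELECT 'Rome' AS city UNION ALL SELECT 'Lima') AS nonBusinessContacts
GROUP BY city                  -- the second query needs its own GROUP BY
ORDER BY contacts DESC, city   -- one ORDER BY at the end sorts the combined result
```

The ORDER BY refers to the column names city and contacts, which come from the first query, even though the second query leaves its COUNT(*) column unnamed.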
Tip

If you don't have the exact same columns in all queries use a default value or a NULL value such as:

SELECT firstName, lastName, company FROM businessContacts
UNION ALL
SELECT firstName, lastName, NULL FROM nonBusinessContacts

or

SELECT firstName, lastName, createDate FROM businessContacts
UNION ALL
SELECT firstName, lastName, getdate() FROM nonBusinessContacts
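Returning to the rule that each column must have compatible data types: when the types differ, SQL Server converts the values to the type with the higher data type precedence. The literals below are made-up values chosen only to show the behavior.

```sql
-- int has higher precedence than varchar, and '2' converts cleanly,
-- so both rows come back as integers.
SELECT 1 AS val
UNION ALL
SELECT '2'

-- Here the second row cannot be converted to int, so the query fails
-- with a conversion error. Uncomment to see the failure:
-- SELECT 1 AS val
-- UNION ALL
-- SELECT 'abc'
```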

Examples:

Let's take a look at a few simple examples of how these commands work and how they differ. As you will see, the final result sets differ, and there is some interesting information on how SQL Server actually completes the process.

In this first example we are using the UNION ALL operator against the Employee table from the AdventureWorks database. This is probably not something you would do, but this helps illustrate the differences of these two operators.

There are 290 rows in table dbo.Employee.

SELECT * FROM dbo.Employee
UNION ALL
SELECT * FROM dbo.Employee
UNION ALL
SELECT * FROM dbo.Employee

When this query is run the result set has 870 rows. This is the 290 rows returned 3 times. The data is just put together one dataset on top of the other dataset.

Here is the execution plan for this query. We can see that the table was queried 3 times and SQL Server did a Concatenation step to concatenate all of the data.


In this next example we are using the UNION operator against the Employee table again from the AdventureWorks database.

SELECT * FROM dbo.Employee
UNION
SELECT * FROM dbo.Employee
UNION
SELECT * FROM dbo.Employee

When this query is run the result set has 290 rows. Even though we combined the data three times the UNION operator removed the duplicate records and therefore returns just the 290 unique rows.
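If AdventureWorks is not at hand, the same difference can be reproduced with a few literal rows. This is a small sketch using made-up values, one of which is repeated.

```sql
-- UNION ALL keeps every row: three rows come back (1, 1 and 2).
SELECT 1 AS n UNION ALL SELECT 1 UNION ALL SELECT 2

-- UNION removes the duplicate: two rows come back (1 and 2).
SELECT 1 AS n UNION SELECT 1 UNION SELECT 2

-- UNION behaves like DISTINCT applied on top of UNION ALL: also two rows.
SELECT DISTINCT n
FROM (SELECT 1 AS n UNION ALL SELECT 1 UNION ALL SELECT 2) AS combined
```

The order of the returned rows is not guaranteed unless an ORDER BY is added at the end.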

Here is the execution plan for this query. We can see that SQL Server first queried 2 of the tables, then did a Merge Join operation to combine the first two tables, and then did another Merge Join along with querying the third table in the query. So much more work had to be performed to get this result set compared to the UNION ALL.


If we take this a step further and do a SORT of the data using the Clustered Index column we get these execution plans. From this we can see that the execution plan that SQL Server is using is identical for each of these operations even though the final result sets will still contain 870 rows for the UNION ALL and 290 rows for the UNION.

UNION ALL query

UNION query


Here is another example doing the same thing, but this time doing a SORT on a non-indexed column. As you can see, the execution plans are again identical for these two queries, but this time Concatenation and Sort operations are used instead of a Merge Join.

UNION ALL query

UNION query

String Functions in SQL Server 2005

Some of the string functions in SQL Server are as follows:

ASCII,
Returns the ASCII code of the leftmost character of the expression passed as its parameter.
Example:

select ascii('A')

Result:

-----------
65

(1 row(s) affected)


CHAR,
Converts an integer ASCII code into the corresponding character.
Example:

select char(65)

Result:

----
A

(1 row(s) affected)


CHARINDEX,
Returns the starting position of one string inside another string.
Example:

select charindex('AS','PASCAL')

Result:

-----------
2

(1 row(s) affected)


DIFFERENCE,
Returns the difference between the SOUNDEX values of two expressions. The value ranges from 0 to 4, where 4 means that the two SOUNDEX values are the most similar. SOUNDEX itself is described further down.
Example:

select soundex('army'), soundex('armee'), difference('army','armee')

Result:

----- ----- -----------
A650 A650 4

(1 row(s) affected)


LEFT,
Returns the leftmost N characters of a string expression.
Example:

select left('ABCD',2)

Result:

----
AB

(1 row(s) affected)


LEN,
Returns the length of a string expression, not counting trailing blanks.
Example:

select len('SONY AK')

Result:

-----------
7

(1 row(s) affected)


LOWER,
Converts a string expression to all lowercase.
Example:

select lower('Hebat Sekali')

Result:

------------------------
hebat sekali

(1 row(s) affected)


LTRIM,
Removes all blank (space) characters from the beginning of a string expression. The literal in the example means "three spaces in front".
Example:

select ltrim('   Tiga spasi di depan')

Result:

----------------------
Tiga spasi di depan

(1 row(s) affected)


NCHAR,
Returns the Unicode character for the given integer code.
Example:

select nchar(251)

Result:

----
û

(1 row(s) affected)


PATINDEX,
Returns the starting position of the first occurrence of a pattern in a string. PATINDEX returns 0 if the pattern is not found.
Example:

SELECT PATINDEX('%band%', 'ada band')

Result:

-----------
5

(1 row(s) affected)


REPLACE,
Replaces every occurrence of the string given as the second parameter with the string given as the third parameter, inside the string given as the first parameter. Confused? Just look at the example below.
Example:

select replace('sony arianto kurniawan','an','??')

Result:

-----------------------
sony ari??to kurniaw??

(1 row(s) affected)


QUOTENAME,
Returns a string wrapped in the delimiter of your choice so that it is a valid SQL Server delimited identifier. The delimiter can be a single quotation mark ('), square brackets ([]), or a double quotation mark ("). If this parameter is omitted, square brackets are used.
Example:

SELECT QUOTENAME('sony arianto','''')

Result:

---------------
'sony arianto'

(1 row(s) affected)


Example 2:

SELECT QUOTENAME('sony arianto')

Result:

---------------
[sony arianto]

(1 row(s) affected)


REPLICATE,
Repeats a character expression the number of times you specify.
Example:

select replicate('sony.com ',3)

Result:

---------------------------
sony.com sony.com sony.com

(1 row(s) affected)


REVERSE,
Reverses the given string expression.
Example:

select reverse('kasur rusak 2')

Result:

-------------
2 kasur rusak

(1 row(s) affected)


RIGHT,
Returns the rightmost n characters of a string.
Example:

select right('web development',4)

Result:

----
ment

(1 row(s) affected)


RTRIM,
Returns the string with all trailing blanks removed.
Example:

select rtrim('good boy ')

Result:

----------
good boy

(1 row(s) affected)


SOUNDEX,
Returns the four-character SOUNDEX code, which is used to evaluate how similar two strings sound.
Example:

select soundex('cold'), soundex('colt')

Result:

----- -----
C430 C430

(1 row(s) affected)


SPACE,
Returns a string made up of n space characters.
Example:

select 'Sony'+space(4)+'AK'

Result:

----------
Sony    AK

(1 row(s) affected)


STR,
Converts numeric data to a string. With the default arguments the value is rounded to a whole number and right-aligned in a string of 10 characters.
Example:

select str(65.73)

Result:

-----
66

(1 row(s) affected)


STUFF,
Deletes a substring from a string and inserts another substring at a specified position. In the example, 0 characters are deleted at position 7 before the new substring is inserted.
Example:

select stuff('web depment',7,0,'velo')

Result:

---------------
web development

(1 row(s) affected)


SUBSTRING,
Returns a substring of a string, starting at a given position and running for n characters.
Example:

select substring('sony-ak.com',5,1)

Result:

----
-

(1 row(s) affected)


UNICODE,
Returns the integer code of the first character of a Unicode string. It is the inverse of NCHAR.
Example:

select unicode('û')

Result:

-----------
251

(1 row(s) affected)


UPPER,
Converts a string to all uppercase.
Example:

select upper('ini lower')

Result:

---------
INI LOWER

(1 row(s) affected)
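These functions are often combined. As a small sketch (the variable @s and the choice of the dash as a separator are assumptions made for illustration), the following splits a string at its first dash using CHARINDEX, LEFT, SUBSTRING and LEN:

```sql
declare @s varchar(50)
set @s = 'sony-ak.com'

select
    charindex('-', @s)                             as DashPosition,  -- 5
    left(@s, charindex('-', @s) - 1)               as BeforeDash,    -- sony
    substring(@s, charindex('-', @s) + 1, len(@s)) as AfterDash      -- ak.com
```

Passing len(@s) as the length is a simple way to say "everything to the end", because SUBSTRING stops at the end of the string. If the string contains no dash, CHARINDEX returns 0 and LEFT is called with -1, which raises an error, so real code would need to check for that case first.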

Date and Time Manipulation in SQL Server 2005

SQL Server 2000 and 2005 do not have separate data types for date and time. Instead the Microsoft SQL Server Team chose to combine both into one and store it as a datetime data type. Date and time can be stored in SQL Server in datetime or smalldatetime. The datetime data type can store dates from January 1, 1753 to December 31, 9999 with an accuracy of one three-hundredth of a second (about 3.33 milliseconds). The smalldatetime data type can store dates from January 1, 1900 to June 6, 2079 with an accuracy of one minute.

SQL Server takes into account a system reference date, which is called the base date for SQL Server. This base date is January 1st, 1900. It is from here that the main problem stems. SQL Server stores the datetime data type internally as two 4-byte integers and smalldatetime as two 2-byte integers. The first integer in both cases stores the number of days from the base date. The second integer stores the time since midnight: in ticks of 1/300 of a second for datetime, and in minutes for smalldatetime.
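The base date and the limited precision can both be observed directly. This is a minimal sketch; the literal dates are arbitrary sample values, and the exact display format of the results depends on the client tool.

```sql
-- Day 0 is the base date.
select cast(0 as datetime) as BaseDate                          -- 1900-01-01 00:00:00.000

-- The day part of a datetime counts days from the base date.
select datediff(dd, 0, '19000102') as DaysSinceBase             -- 1

-- datetime values are rounded to increments of .000, .003 or .007 seconds,
-- so .999 rolls over to midnight of the next day.
select cast('20040101 23:59:59.999' as datetime) as RoundedUp   -- 2004-01-02 00:00:00.000
```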

Date and Time Data Entry

When only the time part is provided as input, the base date is added to the time. If only the date part is provided, the time is set to midnight. The following code shows both cases:

use pubs

go

---------------- Inserting only the time part into a datetime column --------------
/* Creating a Test Table */
Create Table MyDateTest99
(
DateColumn datetime
)
go

/* Inserting the test value into the table */
insert into MyDateTest99 values ('10:00 AM')
go

/* Selecting the result */
select DateColumn from MyDateTest99
go

/* Performing Cleanup */
drop table MyDateTest99
go

---------------- Inserting only the date part into a datetime column --------------
use pubs

go

/* Creating a Test Table */
Create Table MyDateTest99
(
DateColumn datetime
)
go

/* Inserting the test value into the table */
insert into MyDateTest99 values ('January 1, 2000')
go

/* Selecting the result */
select DateColumn from MyDateTest99
go

/* Performing Cleanup */
drop table MyDateTest99
go

So, the most common question that is asked is:

Q: How do I get SQL Server to return only the Date component or only the Time component from the datetime data type?
A: By using the Convert function. The syntax for using the convert function is:

CONVERT ( data_type [ ( length ) ] , expression [ , style ] )

By varying the data type and length, we can get the desired component. The style argument of the Convert function controls the format used when converting between date and time data and character data. The following sample code illustrates this:

use pubs

go

---------------- Selecting only the date part from a datetime column --------------
/* Creating a Test Table */
Create Table MyDateTest99
(
DateColumn datetime
)
go

/* Inserting the test value into the table */
insert into MyDateTest99 values (getdate())
go

/* Selecting the result */
select convert(varchar,DateColumn,101) from MyDateTest99
go

/* Performing Cleanup */
drop table MyDateTest99
go

use pubs

go

---------------- Selecting only the time part from a datetime column --------------
/* Creating a Test Table */
Create Table MyDateTest99
(
DateColumn datetime
)
go

/* Inserting the test value into the table */
insert into MyDateTest99 values (getdate())
go

/* Selecting the result */
select convert(varchar,DateColumn,108) from MyDateTest99
go

/* Performing Cleanup */
drop table MyDateTest99
go
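Convert returns a character string. When the result has to stay a datetime, for example for further date arithmetic, a common alternative (not part of the original examples) is to use DATEADD and DATEDIFF against day 0, the base date. The variable name @now is an assumption made for illustration.

```sql
declare @now datetime
set @now = getdate()

-- Whole days since day 0 (1900-01-01), added back to day 0:
-- midnight of the same date, still typed as datetime.
select dateadd(dd, datediff(dd, 0, @now), 0) as DateOnly

-- Subtracting those whole days leaves the time of day on the base date 1900-01-01.
select dateadd(dd, -datediff(dd, 0, @now), @now) as TimeOnly
```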

The styles that can be used are listed below. Styles 1 to 7 and 10 to 12 return the same formats as 101 to 107 and 110 to 112, but with a two-digit year.

Style ID     Format
0 or 100     mon dd yyyy hh:miAM (or PM)
101          mm/dd/yyyy
102          yyyy.mm.dd
103          dd/mm/yyyy
104          dd.mm.yyyy
105          dd-mm-yyyy
106          dd mon yyyy
107          Mon dd, yyyy
108          hh:mi:ss
9 or 109     mon dd yyyy hh:mi:ss:mmmAM (or PM)
110          mm-dd-yyyy
111          yyyy/mm/dd
112          yyyymmdd
13 or 113    dd mon yyyy hh:mi:ss:mmm (24h)
114          hh:mi:ss:mmm (24h)
20 or 120    yyyy-mm-dd hh:mi:ss (24h)
21 or 121    yyyy-mm-dd hh:mi:ss.mmm (24h)
126          yyyy-mm-ddThh:mi:ss.mmm (no spaces)
130          dd mon yyyy hh:mi:ss:mmmAM (Hijri calendar)
131          dd/mm/yyyy hh:mi:ss:mmmAM (Hijri calendar)

These styles define the expected input format when converting character data into datetime, and the output format when converting datetime data into characters:

use pubs

go

---------------- Example for the demonstration of use of style while input of data--------------
/* Creating a Test Table */
Create Table MyDateTest99
(
DateColumn datetime
)
go


/* Inserting the test values into the table */
-- Inserting in US format
insert into MyDateTest99 select convert(datetime,'05/08/2004',101)
-- Inserting in UK format
insert into MyDateTest99 select convert(datetime,'08/05/2004',103)
-- Inserting in ISO Format
insert into MyDateTest99 select convert(datetime,'20040508',112)
go

/* Selecting the result */
select DateColumn from MyDateTest99
go

/* Performing Cleanup */
drop table MyDateTest99
go

use pubs

go

---------------- Example for the demonstration of use of style while output of data--------------

/* Creating a Test Table */
Create Table MyDateTest99
(
DateColumn datetime
)
go
/* Inserting the test values into the table */
insert into MyDateTest99 select convert(datetime,'05/08/2004',101)

go

/* Selecting the result */
-- In US Format
select convert(varchar,DateColumn,101) from MyDateTest99
-- In UK Format
select convert(varchar,DateColumn,103) from MyDateTest99
-- In ISO Format
select convert(varchar,DateColumn,112) from MyDateTest99

go
/* Performing Cleanup */
drop table MyDateTest99
go

Some other functions that can be used for various purposes are DATEADD, DATEDIFF, DATENAME, DATEPART, DAY, GETDATE, MONTH, and YEAR. Here's some further detail on these functions as well as a code sample showing their use:

Dateadd: Returns a new datetime value based on adding an interval to the specified date.

Syntax: DATEADD ( datepart, number, date )

Datediff: Returns the number of date and time boundaries crossed between two specified dates.

Syntax: DATEDIFF ( datepart, startdate, enddate )

Datename: Returns a character string representing the specified datepart of the specified date.

Syntax: DATENAME ( datepart, date )

Datepart: Returns an integer representing the specified datepart of the specified date.

Syntax: DATEPART ( datepart, date )

Day: Returns an integer representing the day datepart of the specified date.

Syntax: DAY ( date )

Getdate: Returns the current system date and time in the Microsoft® SQL Server™ standard internal format for datetime values.

Syntax: GETDATE ( )

Month: Returns an integer that represents the month part of a specified date.

Syntax: MONTH ( date )

Year: Returns an integer that represents the year part of a specified date.

Syntax: YEAR ( date )

declare @datevar datetime
select @datevar = getdate()

/*Example for getdate() : getting current datetime*/
select getdate() [Current Datetime]

/*Example for dateadd : getting date 7 days from current datetime*/
select dateadd(dd, 7, @datevar) [Date 7 days from now]

/*Example for datediff : getting no of days passed since 01-01-2004*/
select datediff(dd,'20040101',@datevar) [No of days since 01-01-2004]

/*Example for datename : getting month name*/
select datename(mm, @datevar) [Month Name]

/*Example for datepart : getting week from date*/
select datepart(wk, @datevar ) [Week No]

/*Example for day : getting day part of date*/
select day (@datevar) [Day]

/*Example for month : getting month part of date*/
select month(@datevar) [Month]

/*Example for year : getting year part of date*/
select year(@datevar) [Year]
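The phrase "boundaries crossed" in the DATEDIFF description deserves a closer look: DATEDIFF counts how many times the given datepart changes between the two dates, not how much time has elapsed. The dates below are arbitrary sample values.

```sql
-- One second apart, but a year boundary is crossed.
select datediff(yy, '20031231 23:59:59', '20040101 00:00:00') as YearBoundaries   -- 1

-- Almost a full year apart, but no year boundary is crossed.
select datediff(yy, '20040101', '20041231') as YearBoundaries                     -- 0

-- Two minutes apart, with one hour boundary crossed.
select datediff(hh, '20040101 10:59:00', '20040101 11:01:00') as HourBoundaries   -- 1
```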

Now I will provide you with some code samples which you can use for various tasks. I will try to include as many examples as I can think of, but this list is not exhaustive:

1. To find the first day of a month:

select dateadd(dd,-(day(DateColumn)-1),DateColumn)

2. To find last day of a month:

select dateadd(dd,-(day(dateadd(mm,1,DateColumn))),dateadd(mm,1,DateColumn))
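To try the two expressions above without a table, they can be run against a variable. The variable @d and its value, a date in a leap-year February, are assumptions chosen for illustration.

```sql
declare @d datetime
set @d = '20040215 10:30:00'

select
    dateadd(dd, -(day(@d) - 1), @d)                             as FirstDayOfMonth,  -- 2004-02-01 10:30:00.000
    dateadd(dd, -(day(dateadd(mm, 1, @d))), dateadd(mm, 1, @d)) as LastDayOfMonth    -- 2004-02-29 10:30:00.000
```

Both expressions keep the time of day of the original value.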

3. To find birthdays in next seven days:

use pubs

go

/* Creating a Test Table */
Create Table MyDateTest99
(
Birthday datetime
)
go
/* Inserting the test value into the table */
insert into MyDateTest99 select convert (varchar(10),'19780129',120)
insert into MyDateTest99 select convert (varchar(10),'19670821',120)
insert into MyDateTest99 select convert (varchar(10),'19910112',120)
insert into MyDateTest99 select convert (varchar(10),dateadd(dd,2,getdate()),120)
insert into MyDateTest99 select convert (varchar(10),'19791016',120)


go
/* Selecting the result */
select
Birthday
from
MyDateTest99
where
datediff
(
dd
,convert(datetime,'1900/'+cast(month(getdate()) as varchar)+'/'+cast (day(getdate()) as varchar),111)
,convert(datetime,'1900/'+cast(month(Birthday) as varchar)+'/'+cast (day(Birthday) as varchar),111)
) between 0 and 7
go
/* Performing Cleanup */
drop table MyDateTest99
go
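The query above compares month and day inside a single reference year, so it misses a seven-day window that runs past December 31, and a February 29 birthday cannot be converted because 1900 is not a leap year. One possible alternative, sketched here with a hypothetical table variable @people and made-up rows, is to compute each person's next birthday and compare that with today:

```sql
declare @today datetime
set @today = dateadd(dd, datediff(dd, 0, getdate()), 0)   -- today at midnight

declare @people table (Birthday datetime)                  -- hypothetical sample data
insert into @people select '19780129'
insert into @people select '19791231'
insert into @people select dateadd(yy, -30, dateadd(dd, 2, @today))  -- turns 30 in two days

select Birthday, NextBirthday
from (
    select Birthday,
           -- Birthday moved into the current year; if that is already past,
           -- move it into next year instead.
           case when dateadd(yy, datediff(yy, Birthday, @today), Birthday) >= @today
                then dateadd(yy, datediff(yy, Birthday, @today), Birthday)
                else dateadd(yy, datediff(yy, Birthday, @today) + 1, Birthday)
           end as NextBirthday
    from @people
) as b
where datediff(dd, @today, NextBirthday) between 0 and 7
```

DATEADD with the year datepart moves a February 29 birthday to February 28 in years that are not leap years, so no conversion error can occur. The third row is always returned; the first two appear only when the script is run close to those dates.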

4. Number of hours until weekend, that is until Friday at 5 PM (my favorite):

use pubs

go

Create function udf_Time_to_Weekend (@d1 datetime) returns datetime
as
begin
declare @d2 datetime
select @d2 = case when (datepart(hh,dateadd(dd,(7-datepart(dw,@d1)),@d1)) >= 17 and 7-datepart(dw,@d1) = 0)
then dateadd(hh,17,convert(varchar(10),dateadd(dd,7,@d1),101))
else dateadd(hh,17,convert(varchar(10),dateadd(dd,(7-datepart(dw,@d1)),@d1),101))
end
return @d2
end
go
Create procedure HoursTillWeekend as
set datefirst 6
select DATEDIFF(MI,GETDATE(),dbo.udf_Time_to_Weekend(getdate()))/60 "Hours Till Weekend"
go
exec HoursTillWeekend
go
drop procedure HoursTillWeekend
go
drop function udf_Time_to_Weekend
go

5. First and last days of quarter, in which a date falls:

use pubs
go
/* Creating a Test Table */
Create Table MyDateTest99
(
DateColumn datetime
)
go
/* Inserting the test value into the table */
insert into MyDateTest99 select convert (varchar(10),'19780129',120)
insert into MyDateTest99 select convert (varchar(10),'19670821',120)
insert into MyDateTest99 select convert (varchar(10),'19910112',120)
insert into MyDateTest99 select convert (varchar(10),'19791016',120)
go
/* Selecting the result */
select
datepart(qq,DateColumn) QuarterNo
,dateadd(qq,datepart(qq,DateColumn)-1,dateadd(dd,-(datepart(dy,DateColumn)-1),DateColumn)) FirstDayOfQuarter
,dateadd(qq,datepart(qq,DateColumn),dateadd(dd,-(datepart(dy,DateColumn)),DateColumn)) LastDayOfQuarter
from
MyDateTest99
go
/* Performing Cleanup */
drop table MyDateTest99
go

6. Number of days in a month:

Create Function
udf_getNoOfDaysInMonth
(
@month int
,@year int
)
returns
int
as
begin
return datepart( dd,dateadd(dd,-1,(dateadd(mm,@month,dateadd( yyyy,@year-1900,'19000101')))))
end

go

select dbo.udf_getNoOfDaysInMonth(2,2004)

go

A very common question asked in forums concerns changing a character column to a datetime column. The error encountered by developers is:

The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value.

This is common because a varchar column does not validate the data, and as a result some invalid entries creep in. While converting to datetime, SQL Server is then unable to change the character data to datetime and raises an error. The easiest way to identify the rows that contain invalid datetime data is to use the isdate() function:

/* Example to show how to find invalid records */

use pubs
go
/* Creating a Test Table */
Create Table MyDateTest99
(
DateColumn varchar(8)
)
go
/* Inserting the test value into the table */
insert into MyDateTest99 select '19780129'
insert into MyDateTest99 select '19670229'
insert into MyDateTest99 select '19910112'
insert into MyDateTest99 select '19791016'
go
/* Selecting the result */
select
DateColumn
from
MyDateTest99
where
isdate(DateColumn) = 0
go
/* Performing Cleanup */
drop table MyDateTest99
go

Another common mistake made by developers is that, while searching for all records on a particular day, they use a where clause like "where logdate = @logdate" and pass @logdate as '01/01/2004'. '01/01/2004' really means '01/01/2004 00:00:00.000', so the query returns only rows stamped exactly at midnight and not the data for the complete day. The problem can be solved by searching a range. With "where logdate between @logdate and @logdate2", where @logdate2 is @logdate + 1, keep in mind that between includes both end points, so rows stamped exactly at midnight of the following day are returned as well; "where logdate >= @logdate and logdate < @logdate2" avoids that. Either form can make use of an index on logdate if one exists, whereas wrapping the column in a function, as in "where convert(varchar,logdate,101) = @logdate", cannot, and it slows down the query.
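The three where clauses can be compared side by side. This is a small sketch with a hypothetical table variable @log holding three made-up rows: one at midnight, one in the afternoon, and one at midnight of the following day.

```sql
declare @log table (logdate datetime)        -- hypothetical sample data
insert into @log select '20040101 00:00:00'
insert into @log select '20040101 13:45:00'
insert into @log select '20040102 00:00:00'

declare @logdate datetime, @logdate2 datetime
set @logdate  = '20040101'
set @logdate2 = dateadd(dd, 1, @logdate)

-- Equality matches only the row stamped exactly at midnight: 1 row.
select logdate from @log where logdate = @logdate

-- between includes both end points, so midnight of January 2 is returned too: 3 rows.
select logdate from @log where logdate between @logdate and @logdate2

-- A half-open range returns exactly the rows of January 1: 2 rows.
select logdate from @log where logdate >= @logdate and logdate < @logdate2
```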