Sunday, March 11, 2012

convert subqueries to join

Hi all,
Recently saw some posts about how bad it is using subqueries (nested queries) and things like all the queries can be written with JOIN instead. However, for a query like the followings, I cannot figure out how to do it properly.

The data looks like the following, it contains 3 columns, id, which is the primary key of the table (unique), group, as which group this entry belongs to, and the article type, where 0 is not an article.

id grp article
1 1 0
2 1 0
3 1 0
4 1 3
5 1 2
6 1 0
7 2 0
8 2 3
9 2 1
10 2 0
11 2 0
12 2 0
13 3 5
14 3 2
15 3 0
16 3 0
17 3 1
18 3 1

What the query is trying to achieve is for each group, get the latest (maximum) id, and the article type (bear in mind "0" is not one of the allowed type). so for data above, the result will look something like this...

id grp article
5 1 2
9 2 1
18 3 1

The following is the query I come up with, the problem is, I don't know how to get the same result without using subquery. I don't even see how it is possible. Guys, please share some light with me, and educate me on this.

select a.id, a.grp, b.article
from
(
SELECT max(id) as id, grp
from list a
where article != 0
group by grp
) a
join list b
on a.id = b.id

thanks in advance!!!

Here it is,

Code Snippet

Create Table #data (

[id] int ,

[grp] int ,

[article] int

);

Insert Into #data Values('1','1','0');

Insert Into #data Values('2','1','0');

Insert Into #data Values('3','1','0');

Insert Into #data Values('4','1','3');

Insert Into #data Values('5','1','2');

Insert Into #data Values('6','1','0');

Insert Into #data Values('7','2','0');

Insert Into #data Values('8','2','3');

Insert Into #data Values('9','2','1');

Insert Into #data Values('10','2','0');

Insert Into #data Values('11','2','0');

Insert Into #data Values('12','2','0');

Insert Into #data Values('13','3','5');

Insert Into #data Values('14','3','2');

Insert Into #data Values('15','3','0');

Insert Into #data Values('16','3','0');

Insert Into #data Values('17','3','1');

Insert Into #data Values('18','3','1');

Code Snippet

Select data.* from #data data

inner join (Select Max(id) id, grp from #data group By grp) maxid

on maxid.id=data.id

On SQL Server 2005,

Code Snippet

;With CTE

as

(

Select data.*,Max(id) Over(partition by grp) maxid from #data data

)

Select id,grp,article from cte where id=maxid

|||Hi Manivannan,

thank you for the quick reply, one problem that I saw was you didn't check that article cannot be 0, but anyway, it doesn't matter.

I got a few questions tho. the first query that you posted is actually very similiar to the one I posted. You still use a nested query. the inter query identifies the max id of the entries grouped by the grp, then, join to the tabe (#data) to get the article#. The only difference I saw was that you put the inner query after the join statement. so instead of select * from a join b on a.id = b.id, you did something like select * from b join a on a.id = b.id

the second query, which is MSSQL 2005 specific, which to me, is still using a nested query. In there, you first specify CTE, then you do a select statement on top of it, which still look like a nested query to me.

Is it possible to use a straightforward query to achieve what I wanted to achieve? i.e. do just normal join with data, etc. because i reallly don't see how it can be done.

thanks so much about the answer tho, appreciated!
|||

Hi Ken,

For your requirement we have to use the subquery. Without subquery we can't achive this result.

In SQL server 2005, CTE is best option to use..

-Mani

|||

Do not get confused with "derived table" and "correlated subquery". Try to avoid the second one, if possible. Always test before deciding for the final approach.

The first one, the one you are using in your post, is executed once. The second one seems to be executed for every row in the outer reference, but it is better to verify the execution plan.

Examples:

select a.*

from list as a

where a.[id] = (select max([b.id]) from list as b where b.grp = a.grp)

go

select a.*

from list as a

where not exists (

select *

from list as b

where b.grp = a.grp and b.[id] > a.[id]

)

go

AMB

|||so, is

select a.*

from list as a

where a.[id] = (select max([b.id]) from list as b where b.grp = a.grp)

a correlated subquery?
because it looks to me for every row retrieved from list, this "(select max([b.id]) from list as b where b.grp = a.grp)" statement will be executed.

So I should execute the query the way I did (for SQL2000) and use CTE for SQL 2005? by the way, I tried the query in SQL 2005, and the execution time is reduced quite a lot. What made it so different to the subqueries I have?

somehow, the text looks different in the edit mode and browse mode >"<

|||

Can you change the font of your post, please?

I can not read it at all.

Thanks,

AMB

No comments:

Post a Comment