Querying XML Using the Edge Table Format
top of page

Querying XML Using the Edge Table Format

Recently I was asked by one of our customers about querying XML in a relational format. “Sure”, I answered, “you can use OPENXML to retrieve a relational representation of the XML”.

For example, if the XML looks like this:

XHTML

<Customers>   <Customer Id="1" Name="Mickey Mouse">     <Order Number="12345" Date="2010-01-17"/>     <Order Number="23456" Date="2010-01-19"/>   </Customer>   <Customer Id="2" Name="Donald Duck">     <Order Number="34567" Date="2010-01-12"/>     <Order Number="45678" Date="2010-01-21"/>     <Order Number="56789" Date="2010-01-28"/>   </Customer>   <Customer Id="3" Name="Tom & Jerry">     <Order Number="67890" Date="2010-01-30"/>   </Customer> </Customers>

1

2

3

4

5

6

7

8

9

10

11

12

13

14

<Customers>

<Customer Id="1" Name="Mickey Mouse">

<Order Number="12345" Date="2010-01-17"/>

<Order Number="23456" Date="2010-01-19"/>

</Customer>

<Customer Id="2" Name="Donald Duck">

<Order Number="34567" Date="2010-01-12"/>

<Order Number="45678" Date="2010-01-21"/>

<Order Number="56789" Date="2010-01-28"/>

</Customer>

<Customer Id="3" Name="Tom & Jerry">

<Order Number="67890" Date="2010-01-30"/>

</Customer>

</Customers>

Then the following query will return a relational format of the orders:

Transact-SQL

SELECT CustomerId , CustomerName , OrderNumber , OrderDate FROM OPENXML (@XMLHandle , N'/Customers/Customer/Order' , 1) WITH ( CustomerId INT N'../@Id' , CustomerName NVARCHAR(50) N'../@Name' , OrderNumber INT N'@Number' , OrderDate DATE N'@Date' );

1

2

3

4

5

6

7

8

9

10

11

12

13

14

SELECT

CustomerId ,

CustomerName ,

OrderNumber ,

OrderDate

FROM

OPENXML (@XMLHandle , N'/Customers/Customer/Order' , 1)

WITH

(

CustomerId INT N'../@Id' ,

CustomerName NVARCHAR(50) N'../@Name' ,

OrderNumber INT N'@Number' ,

OrderDate DATE N'@Date'

);

“That’s fine”, he told me, “but it’s not good enough for me, because I don’t know the schema of the XML”.

This is a weird scenario, I thought. After a few more questions, I realized that this guy is trying to write a generic XML parser, so not knowing the XML schema is actually a requirement in his case. Now, why would someone want to write an XML parser in T-SQL? Good question, but let’s leave it for another post.

Assuming we do need to write an XML parser in T-SQL, we still need to find a way to query the XML without knowing its schema. Sure, we can build the OPENXML statement dynamically, but we don’t know the names of the elements and attributes, or the hierarchical relationships between them. At this point I had no idea how to solve the problem, so I started to google around. I discovered that if I use the OPENXML statement without the WITH clause, it returns a rowset in a format called “Edge elle Format”. Funny, I always used OPENXML with the WITH clause, and I have never even thought about the option to use it without the WITH clause.

So what is the edge table? This table describes every edge (hence, its name) in the XML in a separate row. An edge is every piece of data or metadata in the XML, such as: element, attribute, text, namespace declaration, etc. The edge table also describes the hierarchical structure of the XML by relating every edge to its parent edge. So, if we execute the following query for the XML shown above:

Transact-SQL

SELECT id , parentid , nodetype , localname , prefix , namespaceuri , datatype , prev , text FROM OPENXML (@intXMLHandle , N'' , 1) ORDER BY id ASC;

1

2

3

4

5

6

7

8

9

10

11

12

13

14

SELECT

id ,

parentid ,

nodetype ,

localname ,

prefix ,

namespaceuri ,

datatype ,

prev ,

text

FROM

OPENXML (@intXMLHandle , N'' , 1)

ORDER BY

id ASC;

We will get the following result:

id

parentid

nodetype

localname

prefix

namespaceuri

datatype

prev

text0NULL1CustomersNULLNULLNULLNULLNULL201CustomerNULLNULLNULLNULLNULL322IdNULLNULLNULLNULLNULL422NameNULLNULLNULLNULLNULL521OrderNULLNULLNULLNULLNULL652NumberNULLNULLNULLNULLNULL752DateNULLNULLNULLNULLNULL821OrderNULLNULLNULL5NULL982NumberNULLNULLNULLNULLNULL1082DateNULLNULLNULLNULLNULL1101CustomerNULLNULLNULL2NULL12112IdNULLNULLNULLNULLNULL13112NameNULLNULLNULLNULLNULL14111OrderNULLNULLNULLNULLNULL15142NumberNULLNULLNULLNULLNULL16142DateNULLNULLNULLNULLNULL17111OrderNULLNULLNULL14NULL18172NumberNULLNULLNULLNULLNULL19172DateNULLNULLNULLNULLNULL20111OrderNULLNULLNULL17NULL21202NumberNULLNULLNULLNULLNULL22202DateNULLNULLNULLNULLNULL2301CustomerNULLNULLNULL11NULL24232IdNULLNULLNULLNULLNULL25232NameNULLNULLNULLNULLNULL26231OrderNULLNULLNULLNULLNULL27262NumberNULLNULLNULLNULLNULL28262DateNULLNULLNULLNULLNULL2933#textNULLNULLNULLNULL13043#textNULLNULLNULLNULLMickey Mouse3163#textNULLNULLNULLNULL123453273#textNULLNULLNULLNULL2010-01-173393#textNULLNULLNULLNULL2345634103#textNULLNULLNULLNULL2010-01-1935123#textNULLNULLNULLNULL236133#textNULLNULLNULLNULLDonald Duck37153#textNULLNULLNULLNULL3456738163#textNULLNULLNULLNULL2010-01-1239183#textNULLNULLNULLNULL4567840193#textNULLNULLNULLNULL2010-01-2141213#textNULLNULLNULLNULL5678942223#textNULLNULLNULLNULL2010-01-2843243#textNULLNULLNULLNULL344253#textNULLNULLNULLNULLTom and Jerry45273#textNULLNULLNULLNULL6789046283#textNULLNULLNULLNULL2010-01-30

Clearly, we can see that all the information one needs to know about the schema as well as the contents of the XML is in this table. For example, we can see that there are two orders (5 and 8 ) under the first customer (2). We know that “Order” is an element, because nodetype = 1. We can also see that there are two attributes (nodetype = 2) under each order: “Number” and “Date”. The value of the first order number (6) is given in row #31 (nodetype = 3). And so on…

For more information about the edge table format, view the section “Using the Edge Table Format” under the topic “Querying XML Using OPENXML” in Books Online.

Next time you need to query an XML without knowing its schema, remember that you can use the OPENXML statement without the WITH clause.

0 comments

STAY IN TOUCH

Get New posts delivered straight to your inbox

Thank you for subscribing!

bottom of page