Recently I was asked by one of our customers about querying XML in a relational format. “Sure”, I answered, “you can use OPENXML to retrieve a relational representation of the XML”.
For example, if the XML looks like this:
XHTML
<Customers> <Customer Id="1" Name="Mickey Mouse"> <Order Number="12345" Date="2010-01-17"/> <Order Number="23456" Date="2010-01-19"/> </Customer> <Customer Id="2" Name="Donald Duck"> <Order Number="34567" Date="2010-01-12"/> <Order Number="45678" Date="2010-01-21"/> <Order Number="56789" Date="2010-01-28"/> </Customer> <Customer Id="3" Name="Tom & Jerry"> <Order Number="67890" Date="2010-01-30"/> </Customer> </Customers>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
<Customers>
<Customer Id="1" Name="Mickey Mouse">
<Order Number="12345" Date="2010-01-17"/>
<Order Number="23456" Date="2010-01-19"/>
</Customer>
<Customer Id="2" Name="Donald Duck">
<Order Number="34567" Date="2010-01-12"/>
<Order Number="45678" Date="2010-01-21"/>
<Order Number="56789" Date="2010-01-28"/>
</Customer>
<Customer Id="3" Name="Tom & Jerry">
<Order Number="67890" Date="2010-01-30"/>
</Customer>
</Customers>
Then the following query will return a relational format of the orders:
Transact-SQL
SELECT CustomerId , CustomerName , OrderNumber , OrderDate FROM OPENXML (@XMLHandle , N'/Customers/Customer/Order' , 1) WITH ( CustomerId INT N'../@Id' , CustomerName NVARCHAR(50) N'../@Name' , OrderNumber INT N'@Number' , OrderDate DATE N'@Date' );
1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT
CustomerId ,
CustomerName ,
OrderNumber ,
OrderDate
FROM
OPENXML (@XMLHandle , N'/Customers/Customer/Order' , 1)
WITH
(
CustomerId INT N'../@Id' ,
CustomerName NVARCHAR(50) N'../@Name' ,
OrderNumber INT N'@Number' ,
OrderDate DATE N'@Date'
);
“That’s fine”, he told me, “but it’s not good enough for me, because I don’t know the schema of the XML”.
This is a weird scenario, I thought. After a few more questions, I realized that this guy is trying to write a generic XML parser, so not knowing the XML schema is actually a requirement in his case. Now, why would someone want to write an XML parser in T-SQL? Good question, but let’s leave it for another post.
Assuming we do need to write an XML parser in T-SQL, we still need to find a way to query the XML without knowing its schema. Sure, we can build the OPENXML statement dynamically, but we don’t know the names of the elements and attributes, or the hierarchical relationships between them. At this point I had no idea how to solve the problem, so I started to google around. I discovered that if I use the OPENXML statement without the WITH clause, it returns a rowset in a format called “Edge elle Format”. Funny, I always used OPENXML with the WITH clause, and I have never even thought about the option to use it without the WITH clause.
So what is the edge table? This table describes every edge (hence, its name) in the XML in a separate row. An edge is every piece of data or metadata in the XML, such as: element, attribute, text, namespace declaration, etc. The edge table also describes the hierarchical structure of the XML by relating every edge to its parent edge. So, if we execute the following query for the XML shown above:
Transact-SQL
SELECT id , parentid , nodetype , localname , prefix , namespaceuri , datatype , prev , text FROM OPENXML (@intXMLHandle , N'' , 1) ORDER BY id ASC;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT
id ,
parentid ,
nodetype ,
localname ,
prefix ,
namespaceuri ,
datatype ,
prev ,
text
FROM
OPENXML (@intXMLHandle , N'' , 1)
ORDER BY
id ASC;
We will get the following result:
id
parentid
nodetype
localname
prefix
namespaceuri
datatype
prev
text0NULL1CustomersNULLNULLNULLNULLNULL201CustomerNULLNULLNULLNULLNULL322IdNULLNULLNULLNULLNULL422NameNULLNULLNULLNULLNULL521OrderNULLNULLNULLNULLNULL652NumberNULLNULLNULLNULLNULL752DateNULLNULLNULLNULLNULL821OrderNULLNULLNULL5NULL982NumberNULLNULLNULLNULLNULL1082DateNULLNULLNULLNULLNULL1101CustomerNULLNULLNULL2NULL12112IdNULLNULLNULLNULLNULL13112NameNULLNULLNULLNULLNULL14111OrderNULLNULLNULLNULLNULL15142NumberNULLNULLNULLNULLNULL16142DateNULLNULLNULLNULLNULL17111OrderNULLNULLNULL14NULL18172NumberNULLNULLNULLNULLNULL19172DateNULLNULLNULLNULLNULL20111OrderNULLNULLNULL17NULL21202NumberNULLNULLNULLNULLNULL22202DateNULLNULLNULLNULLNULL2301CustomerNULLNULLNULL11NULL24232IdNULLNULLNULLNULLNULL25232NameNULLNULLNULLNULLNULL26231OrderNULLNULLNULLNULLNULL27262NumberNULLNULLNULLNULLNULL28262DateNULLNULLNULLNULLNULL2933#textNULLNULLNULLNULL13043#textNULLNULLNULLNULLMickey Mouse3163#textNULLNULLNULLNULL123453273#textNULLNULLNULLNULL2010-01-173393#textNULLNULLNULLNULL2345634103#textNULLNULLNULLNULL2010-01-1935123#textNULLNULLNULLNULL236133#textNULLNULLNULLNULLDonald Duck37153#textNULLNULLNULLNULL3456738163#textNULLNULLNULLNULL2010-01-1239183#textNULLNULLNULLNULL4567840193#textNULLNULLNULLNULL2010-01-2141213#textNULLNULLNULLNULL5678942223#textNULLNULLNULLNULL2010-01-2843243#textNULLNULLNULLNULL344253#textNULLNULLNULLNULLTom and Jerry45273#textNULLNULLNULLNULL6789046283#textNULLNULLNULLNULL2010-01-30
Clearly, we can see that all the information one needs to know about the schema as well as the contents of the XML is in this table. For example, we can see that there are two orders (5 and 8 ) under the first customer (2). We know that “Order” is an element, because nodetype = 1. We can also see that there are two attributes (nodetype = 2) under each order: “Number” and “Date”. The value of the first order number (6) is given in row #31 (nodetype = 3). And so on…
For more information about the edge table format, view the section “Using the Edge Table Format” under the topic “Querying XML Using OPENXML” in Books Online.
Next time you need to query an XML without knowing its schema, remember that you can use the OPENXML statement without the WITH clause.
Comments