Converting XML documents to Relational Tables using OPENXML
Posted by Sourabh Agarwal on February 22, 2012
In earlier posts, I discussed how to generate XML output from relational tables using the FOR XML Queries. Today we will discuss how to generate relational table like output from an XML document using the OPENXML query format.
For a complete list of related topics, please refer here.
To generate relational like output, we need to first load the XML document and then parse it. Also we need to make sure that once we have got out desired results, we remove or unload the XML document.Thankfully SQL Server provides for 2 Stored Procedures which will do the work for us.
sp_xml_preparedocument: Reads the XML input and parses it using MSXML parser. After parsing, it creates an internal tree presentation of the nodes in the XML input. It returns a unique handle to the XML document, which can then be used by OPENXML to generate a table output and sp_xml_remove_document to remove the document.
sp_xml_removedocument: Removes the internal representation of the XML document. MSXML parsers can use a lot of SQL Server Address Space, and if the documents are removed post processing, it might lead to memory issues on SQL Server.
DECLARE @idoc int
DECLARE @doc varchar(1000)
SET @doc ='<ROOT>
<Customer CustomerID="10" ContactName="Paul Henriot">
<Order CustomerID="10" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
<OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
<OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
<Customer CustomerID="20" ContactName="Carlos Gonzlez">
<Order CustomerID="20" EmployeeID="3" OrderDate="1996-08-16T00:00:00">
<OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>
–Create an internal representation of the XML document.
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
– Execute a SELECT statement that uses the OPENXML rowset provider.
FROM OPENXML (@idoc, ‘/ROOT/Customer’,1)
WITH (CustomerID varchar(10),
— Remove the document
EXEC sp_xml_removedocument @idoc
Before we get to the output, lets spend a few minutes on the query itself.
The first part of the query, takes an AML document and parses it using the PrepareDocument command. It returns an handle to this parsed document, which is assigned to the @idoc variable.
Later we are using the OPENXML Command to convert this parsed document into a tabular representation. In the end we use the RemoveDocument to remove the internal representation to the XML.
<<<<<<<<<<<<<<<<<< Snippet Output >>>>>>>>>>>>>
<<<<<<<<<<<<<<<<<< Snippet Output >>>>>>>>>>>>>>
The OPENXML is basically a rowset provider. It reads the parsed XML and then provides a rowset view of the XML.
The General syntax of the OPENXML is
idoc int [in],
[WITH (SchemaDeclaration | TableName)]
The idoc int, variable is used to provide the Handle for the parsed XML document prepared using the sp_xml_prepared_document.
RowPattern is XPATH for the nodes in the XML document, which will form a part of the rowset output. In the above example, I have specified “/ROOT/Customer”, which means that create a rowset output for the Customer element in the XML document. If the changed the OPENXML rowpattern to as mentioned below, we will get all the ORDER elements as rowset.
FROM OPENXML (@idoc, ‘/ROOT/Customer/Order’,1)
WITH (CustomerID varchar(10),
FLAGS: The flags parameter is used to indicate the mapping that should be used between the XML data and the relational rowset, and how the spill-over column should be filled. flags is an optional input parameter, and can be one of these values.
0 — Defaults to attribute-centric mapping.
1 — Use the attribute-centric mapping. Can be combined with XML_ELEMENTS; in which case, attribute-centric mapping is applied first, and then element-centric mapping is applied for all columns not yet dealt with.
2 — Use the element-centric mapping. Can be combined with XML_ATTRIBUTES; in which case, attribute-centric mapping is applied first, and then element-centric mapping is applied for all columns not yet dealt with.
8 — Can be combined (logical OR) with XML_ATTRIBUTES or XML_ELEMENTS.
Lastly, we have the WITH clause, which defines the schema for the rowset output. We can either specify the details of the columns to be used, for we can provide a TableName, if the table with the desired schema already exists.
In SQL 2005 and SQL 2008, the OPENXML and sp_xml_preparedocument have been enhanced.
OPENXML in the WITH clause support has been added to specify the new data types introduced in SQL.
sp_xml_prepare_document can now accept input in the form off an XML data type variable or an XML column.
Please Note: In SQL 2005 and 2008, support was added for usage of XQuery to play with XML. I would recommended using XQquery nodes() mehod when trying to parse an XML document. Nodes() method typically performs better than OPENXML, in terms of CPU and memory usage.
I will be discussing usage of XQuery support for XML in subsequent posts.