XML Data Type Support in ADO.NET 2.0: Handling XML from SQL Server 2005
Microsoft SQL Server 2005
Microsoft ADO.NET 2.0
Summary: See how improvements to XML support in Microsoft ADO.NET 2.0 and Microsoft SQL Server 2005 work together to make handling XML data in your applications easier. (13 printed pages)
One of the quantum changes in Microsoft SQL Server 2005 is the inclusion of the XML data type. This data type is a first class type, just like INT or VARCHAR, and SQL Server 2005 allows in-place querying and processing of this data type using a series of XML-specific functions. There is also support for storing collections of XML schemas in the database, enabling database-based schema validation. In addition, SQL Server 2005 greatly expands the functionality of XML composition (the SELECT ... FOR XML dialects), extends the OpenXML() XML decomposition function, and provides a new nodes() function on the XML data type for more lightweight decomposition.
With all of this new and enhanced XML functionality on the database server, it's no surprise that the SqlClient data provider in Microsoft ADO.NET 2.0 has also been enhanced. There are also changes to the ADO.NET DataSet to support a DataColumn of type XML, and the "integration points" between System.Data and System.Xmlhave been widened. I'll explore using the SQL Server 2005 XML data type on the client in this article.
There are two types of XML output that SQL Server 2005 can produce. The statement SELECT * FROM AUTHORS FOR XML AUTO produces a stream of XML, not a one-column, one-row rowset. This type of output is unchanged from SQL Server 2000. The XML stream output appears in SQL Server Query Analyzer as a one-column, one-row rowset only because of the limitations of the Query Analyzer tool. You can distinguish this stream from a "normal" column by its special unique identifier name "XML_F52E2B61-18A1-11d1-B105-000805F49916B". This name is actually an indicator to the underlying TDS (that's tabular data stream, the SQL Server network format) parser that the column should be streamed to the client rather than sent as an ordinary rowset would be. There is a special method, SqlCommand.ExecuteXmlReader, to retrieve this special stream on the client. In SQL Server 2005, the SELECT ... FOR XML dialect has been enhanced in a many ways. To mention just a few of them:
- There is a new, easy-to-use FOR XML PATH mode that you can use in most cases where you needed the FOR XML EXPLICIT mode in SQL Server 2000.
- You can produce an XML data type column in addition to the stream using a TYPE directive.
- You can nest FOR XML expressions.
- SELECT ... FOR XML can produce XML documents as well as XML fragments using a ROOT directive.
- You can prepend a standard XSD schema to the stream.
You get your first indication that XML is now a first-class relational database type by referencing the relational datatype enumerations in ADO.NET 2.0.System.Data.DbType and System.Data.SqlDbType contain additional values for DbType.Xml and SqlDbType.Xml, respectively. There is also a new class in theSystem.Data.SqlTypes namespace, SqlXml. This class acts as a factory for XmlReader instances on top of the XML type value. I'll demonstrate by means of some simple code. Say that I have a SQL Server table that looks like this:
I can access this table on the client using the following ADO.NET 2.0 code.
The column metadata that is returned when I browse through the DataTable produced by GetSchemaTable correctly identifies the column:
Looks like any other type that's built-into SQL Server. Note that the ".NET type" of this column is XmlReader, and to .NET it looks just like any XML that is loaded from a file or produced with the XmlDocument class. Using the XML data type column as a parameter in a stored procedure or parameterized statement in ADO.NET 2.0 is just as straightforward:
Both of the methods in the preceding code use the SQL Server-specific data type in SqlTypes. When I use the more generic accessor method of SqlReader, GetValue(), the value is quite different. The column does not appear as an XmlReader but rather as a .NET String class. Note that, even though the metadata identifies the column's .NET data type as XmlReader, you can't just cast the column to an XmlReader. Using any accessor but GetSqlXml() returns a string.
Even using the SqlReader.GetProviderSpecificValue() method returns a string. This is somewhat of an anomaly because GetProviderSpecificFieldType correctly returnsSystem.Sql.Types.SqlXml. This looks like it might be a problem in the current beta version of the provider; I'd stay away from using this method for the time being.
SqlClient provides symmetric functionality for XML parameters; you can also use the String data type with these. Being able to pass in a string (NVARCHAR) where anXML type is expected relies on the fact that SQL Server provides automatic conversion of VARCHAR or NVARCHAR to the XML data type. Note that this conversion can also happen on the client side, as shown in the following example. Giving the stored procedure insert_xml either means of auto-converting string/NVARCHAR to XMLwill work.
The XML data type in SQL Server 2005 supports both XML documents and XML document fragments. A fragment differs from a document in that fragments can contain multiple top-level elements and bare text nodes. With typed XML columns/variables/parameters, you can specify whether fragments are allowed using the DOCUMENT (fragments not allowed) or CONTENT (fragments are allowed) specification. CONTENT is the default and untyped XML allows fragments. This T-SQL code illustrates support for fragments:
XML fragments are also produced by SELECT ... FOR XML. The statement SELECT job_id, min_lvl, max_lvl FROM jobs FOR XML AUTO produces the following output. Note that there are multiple root elements.
Both documents and fragments are supported using SqlXml. The SqlXml CreateReader() method always creates an XmlReader that supports fragments by using the new XmlReaderSettings class like this:
You can use XML fragments in input parameters if you construct your XmlReader the same way. Although fragment support is built in when using the SqlXml type, you need to be careful about handling an XmlReader that contains fragments. Be aware that calling XmlReader.GetOuterXml() will only provide the first fragment; to position the XmlReader to get succeeding fragments you must call the XmlReader Read method again. I'll demonstrate this later in this article.
T-SQL's "SELECT ... FOR XML" produced a stream of XML rather than a one-column one-row rowset. It also provided the XML in a binary format rather than XML's standard serialization. Because of the format differences and also because "SELECT ... FOR XML" always produced fragments, a special method was needed to consume it. SqlClient implements a provider-specific method, SqlCommand.ExecuteXmlReader, for this purpose. Using SQL Server 2000 and ADO 1.0/1.1, it's required to useExecuteXmlReader to get the results of a FOR XML query, unless you want to use some fairly ugly workarounds that require string concatenation. With the SQL Server 2005 FOR XML enhancements and the support of XML as a data type, you'd only need to use ExecuteXmlReader to get a single stream of XML from SQL Server. Since all code written for SQL Server 2000 used it, this method is supported and enhanced in ADO.NET 2.0.
You can use ExecuteXmlReader to retrieve any stream from a "FOR XML" query as in previous versions. In addition, this method supports retrieving an XML data type column produced by an ordinary SELECT statement. The only caveat is that when an ordinary SELECT statement returns more than one row, ExecuteXmlReader returns only the content of the first row. Here's an example that illustrates this using the same table as in previous examples:
To finish the discussion of getting XML in ADO.NET 2.0 it's helpful to mention the lifetime of the XmlReader content in various usage scenarios. Investigating the lifetime of the XmlReader also helps to understand the buffering done by SqlClient and how to use this data for maximum performance. The XmlReader uses resources and to free those resources, the Close() or Dispose() method should be called, just like with SqlConnection, SqlCommand, and SqlDataReader. In the case of reading XML columns through a SqlDataReader, there can be an XmlReader allocated for each row. Remember that, in order to support moving backward through columns in the same row or moving to the next row, the XmlReader's contents will be buffered in memory. When you use SqlCommand's CommandBehavior.SequentialAccess the entire XmlReader will not be buffered in memory, but you must be more careful when using this access method. The XmlReader associated with a column must be completely consumed before moving to the next column in the rowset when using CommandBehavior.SequentialAccess; after moving to the next column, theXmlReader appears to be valid, but calling its Read() method produces no data. When you use ExecuteXmlReader or ExecuteScalar instead of ExecuteReader, you don't need to be aware of this behavior as much, but don't forget to Close/Dispose the XmlReader here, either.
SQL Server 2005 supports strongly typed XML, meaning that the XML must conform to an XML schema or set of XML schemas. This support is enabled by using XML Schema Collections in SQL Server. XML Schema Collections are defined like any other SQL Server object, and the XML schemas are stored in SQL Server. The T-SQL DDL CREATE statement and XML Schema collection usage looks like this:
When you use strongly typed XML data inside SQL Server 2005 from the client, validation is done on the server, not on the client. As an example, if you use the AddARowmethod shown in a previous example to add a row to the typed_xml table, the data is sent across the network to SQL Server before validation occurs. With a little work though, it's possible to retrieve the XML schemas from the SQL Server XML SCHEMA COLLECTIONs and stash them on the client to achieve client-side validation. This can save some round trips by preventing a user or web service from sending schema-invalid XML to your client for SQL Server storage, but there are two caveats/clarifications that must be considered. First, relying on XML schema information fetched from SQL Server is like relying on any cached client-side metadata. It's possible that someone may have altered the schema collection using the T-SQL ALTER XML SCHEMA statement, or even dropped the schema collection and recreated it, rendering your client-side checking useless. You can guard against surprises by using a new EVENT NOTIFICATION on the CREATE/ALTER/DROP XML SCHEMA DDLstatements. You would then use custom code to monitor the Service Broker SERVICE similar to when you use SqlNotificationRequest with query notifications. EVENT NOTIFICATIONs are new in SQL Server 2005, but are similar to the Query Notifications discussed in my previous article, Query Notifications in ADO.NET 2.0. Second, bear in mind that even though you do XML schema validation on the client side, SQL Server will redo your checking on the server. There is no way to indicate to SQL Server "I know this instance of SQL Server schema-typed XML is schema-valid, don't you bother to check it yourself."
To do client-side XML Schema validation, you can retrieve the SQL Server XML SCHEMA COLLECTION by using the T-SQL function xml_schema_namespace(). This requires an XML schema collection name and database schema name, since XML schema collections are database schema scoped. We can hardcode them into the program or extract them from client side rowset metadata. Using the SqlDataReader GetSchemaTable method above or using the new SqlMetaData class, it's easy to get the name of the schema collection associated with a particular column. Here's a short code example:
Once you have figured out which XML Schema collection to retrieve, you can use the T-SQL function to retrieve it into a client-side XmlSchemaSet. Note that this example also demonstrates how to retrieve a fragment using an XmlReader. This is required because there may be more than one XML schema in an XML SCHEMA COLLECTION; the xml_schema_namespace function will return all of the XML schemas in the collection as a fragment.
With the XmlSchemaSet in hand, we can integrate client-side XML validation code into our Add routine, and Voila! client-side validation.
Although client-side XML Schema validation is not something that every application requires, it's nice to know that it's available if your application needs it. XML schemas are also produced when you use the new SQL Server 2005 XMLSCHEMA option on SELECT...FOR XML like this:
You could break up the fragment and retrieve the XmlSchemaSet on the client-side for these types of queries as well, though there are some limitations doing this in practice in the current beta.
I've described using the new XML type and new FOR XML features to consume XML from Microsoft SQL Server in the Microsoft ADO.NET SqlClient and also how to use SqlClient to insert XML into SQL Server tables. There's actually improved XML functionality in the DataSet that I haven't mentioned; there will be an upcoming article series on ADO 2.0 DataSet enhancements. You can read more about using the SQL Server XML data in the MSDN online article XML Support in Microsoft SQL Server 2005 by Shankar Pal, Mark Fussell, and Irwin Dolobowsky; more about the enhancement to SELECT...FOR XML in the MSDN online article What's New in FOR XML in Microsoft SQL Server 2005 by Michael Rys; and more about System.Xml in .NET 2.0 in Mark Fussell's What's New in System.Xml for Visual Studio 2005 and the .NET Framework 2.0 Release. It's safe to say that XML is integrated with ADO.NET 2.0 and SqlClient at many levels; the integration of data models is the tightest ever.
About the author
Bob Beauchemin is an instructor, course author, and database curriculum course liaison for DevelopMentor. He has over twenty-five years of experience as an architect, programmer, and administrator for data-centric distributed systems. He's written articles on ADO.NET, OLE DB, and SQL Server for Microsoft Systems Journal, SQL Server Magazine, and others, and is the author of A First Look at SQL Server 2005 for Developers and Essential ADO.NET.