[edit: April 2019] This was originally posted to my internal (to Microsoft) blog on May 29, 2013 as part of a series I called ‘SQL Tips’ for coworkers based on an email distribution list I sent out before the internal blog. I have left this as originally written but have updated formatting for my WordPress theme.
In the last SQL Tip we learned about APPLY because it was something I felt needed to be known before moving into some more XML related querying. This time we’ll address some of the fundamentals of XML in SQL Server.
XML Collections & Namespaces
An XML document or fragment can be stored in SQL via the “xml” data type but can also be stored as text (nvarchar for example). This is pretty straight forward. However, there is also the idea of creating an XML Schema Collection within which one or more namespaces are declared.
So, what is an XML Schema Collection? It is a way to “express a set of rules to which an XML document must conform in order to be considered ‘valid’ according to that schema” (from Wikipedia). This means you can define a set of rules for an XML document and store that in SQL Server. Within the collection you’ll have at least one namespace defined.
What is a namespace then? Wikipedia has a straight forward definition I like: “XML namespaces are used for providing uniquely named elements and attributes in an XML document. They are defined in a W3C recommendation. An XML instance may contain element or attribute names from more than one XML vocabulary. If each vocabulary is given a namespace, the ambiguity between identically named elements or attributes can be resolved.”
So why create an XML schema collection in SQL Server? You do this to specify the rules for an XML document you will be storing in SQL. If an XML column is created and bound to an XML schema then SQL will validate that whatever is stored in the column meets the requirements defined. When this is done the XML in the schema bound XML column of SQL is referred to as “Typed XML”. If a column is not schema bound it is considered “Untyped XML”. Usually “XML fragments” (portions of an XML document) fall into the latter type. This is important because when you start to use XQuery to query data in XML columns in SQL the method/syntax for the XQuery is different for “Typed XML” and “Untyped XML”.
Does my database have an XML Schema Collection defined?
To know if your database does have one defined then use the following query to find out. You can ignore the ‘internal’ collections – those with a schema of “sys”.
SELECT sch.name AS [SchemaName] ,col.name AS [CollectionName] ,nms.name AS [Namespace] FROM sys.xml_schema_collections col INNER JOIN sys.xml_schema_namespaces nms ON col.xml_collection_id = nms.xml_collection_id INNER JOIN sys.schemas sch ON col.schema_id = sch.schema_id WHERE sch.name != N'sys';
How do I know if something is Typed or Untyped XML?
Well, if an xml column has been bound to an XML Schema Collection then that’s relatively easy to find via a query also. I’ve added a few extra pieces of information which would be interesting to see (I’m also limiting the results to tables).
SELECT obj.name AS [ObjectName] ,obj.type_desc AS [ObjectType] ,col.name AS [ColumnName] ,xsc.name AS [CollectionName] ,nms.name AS [Namespace] ,XML_SCHEMA_NAMESPACE(sch.name,xsc.name,nms.name) AS [XMLSchemaNamespace_Definition] FROM sys.columns col INNER JOIN sys.objects obj ON col.object_id = obj.object_id INNER JOIN sys.xml_schema_collections xsc ON col.xml_collection_id = xsc.xml_collection_id INNER JOIN sys.xml_schema_namespaces nms ON xsc.xml_collection_id = nms.xml_collection_id INNER JOIN sys.schemas sch ON xsc.schema_id = sch.schema_id WHERE sch.name != N'sys' AND col.xml_collection_id != 0 AND obj.type = N'U';
I just ran this against a ConfigMgr 2012 database and got zero results; why? Because although there are typed XML columns in the database the binding isn’t done at the SQL level but rather at the application level. What does this mean? This means that the query above isn’t 100% accurate or reliable when it comes to determining what is typed or untyped (only what is bound at the SQL level) because applications can choose not to bind the schema to a column but still adhere to a typed XML formation. ConfigMgr 2007 did have some schema binding so you can test this query on an old environment or perhaps on a different product altogether. So besides “just knowing” if a column is typed XML you can also look to see if an XML column has some information like “xmlns=”http://someURI” in the first line or so; if it does, I’d say it’s typed XML.
Does it really matter if it’s typed or not?
As I mentioned previously, this does matter. I know we haven’t talked about some of the methods to query XML in SQL so the following example is really only intended to help show the difference in some syntax and the results. If, for example, I use a column called “SDMPackageDigest” (which I know to be ‘typed’) in the “CI_ConfigurationItems” table for a particular application “Photosynth” I’d see that the xml for this record looks like the following (a lot of xml has been removed and comments added to keep the sample somewhat smaller):
<AppMgmtDigest xmlns="http://schemas.microsoft.com/SystemCenterConfigurationManager/2009/AppMgmtDigest" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <Application AuthoringScopeId="ScopeId_73F3BB5E-5EDC-4928-87BD-4E75EB4BBC34" LogicalName="Application_b2579012-7c52-4fb5-94a9-06dc6d4ac1bf" Version="7"> <SourceId AuthoringScopeId="ScopeId_0009B3DF-B15E-4764-8A79-4EC81E025BBB" LogicalName="Application_9f1fe6cf-ac1f-44d7-a16e-837fd9f3e31b" /> <DisplayInfo DefaultLanguage="en-US"> <Info Language="en-US"> <Title>Photosynth</Title> <Description>Photosynth is a...</Description> <!-- trimmed for space --> </Info> </DisplayInfo> <!-- trimmed for space --> </Application> <!-- trimmed for space --> </AppMgmtDigest>
Notice how the “xmlns” has the same URI as what we see from running the first query in this post? Nice huh?
Now, if I were to want to write a query against this XML to pull out the “Language” (the highlighted line above) I could do that by using the “value” method and providing the XQuery required for the method to get this data. As you’ll see from the example query below though I wrote the XQuery to do so twice – once with a namespace declaration (for typed XML) and once with no namespace declaration (for untyped XML). (The namespace declaration is the highlighted line below).
SELECT app.SDMPackageDigest.value('declare namespace app="http://schemas.microsoft.com/SystemCenterConfigurationManager/2009/AppMgmtDigest"; (/app:AppMgmtDigest/app:Application/app:DisplayInfo/app:Info/@Language)','varchar(max)') AS [Typed_Language] ,app.SDMPackageDigest.value('(/AppMgmtDigest/Application/DisplayInfo/Info/@Language)','varchar(max)') AS [Untyped_Language] FROM dbo.CI_ConfigurationItems app WHERE app.CI_ID = 123456;
The syntax is very similar between the two, so what is the difference? The biggest difference is that one of them will return data and one will not. Here are the results:
Notice that the column where we used the namespace declaration returned results and the other did not? This is why it is important to know whether a column is typed or not; so you know how to retrieve data from the column!
Don’t worry…we’ll get to the value method and some others in detail in some later posts. 🙂