How to shred .docx XML?

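I'm trying to import XML (actually a docx file) into a SQL Server 2008 database. I'm new to XML programming. I've googled a lot, but almost every example out there uses a simple XML file, and the XML here is a bit more complex (see below). Can you tell me how to create a table for this XML and what query to run in SQL Server? I need the values from every tag: for example w:p with its w:rsidP, w:rsidRDefault and w:rsidR attributes, w:pStyle, w:bookmarkStart, w:t and so on.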

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">
<w:body>
<w:p w:rsidR="00EF42E0" w:rsidRDefault="00EF42E0" w:rsidP="00EF42E0">
<w:pPr><w:pStyle w:val="Heading1"/>
</w:pPr><w:bookmarkStart w:id="0" w:name="_Toc212523610"/>
<w:r>
<w:t>Summary</w:t>
</w:r>
<w:bookmarkEnd w:id="0"/>
</w:p>
<w:p w:rsidR="00EF42E0" w:rsidRDefault="00EF42E0" w:rsidP="00EF42E0"><w:pPr><w:pStyle w:val="mainbodytext"/><w:ind w:right="-694"/><w:rPr><w:b/><w:bCs/></w:rPr></w:pPr><w:r><w:rPr><w:b/><w:bCs/></w:rPr><w:t>What is the Group Defined Practice for Integrity Management?</w:t></w:r></w:p>
<w:p w:rsidR="00EF42E0" w:rsidRDefault="00EF42E0" w:rsidP="00EF42E0"><w:pPr><w:pStyle w:val="mainbodytext"/></w:pPr><w:r><w:t xml:space="preserve">This Practice is derived from the GP Group Standard, GRP 01 January 2006, </w:t></w:r><w:proofErr w:type="gramStart"/><w:r><w:t>Integrity</w:t></w:r><w:proofErr w:type="gramEnd"/><w:r><w:t xml:space="preserve"> Management.  In developing QMS it has been possible to embed much of the content of the IM Standard directly into the Group Essentials statements.  For elements 2, 7, 8 and 9 of the Standard it was possible to do that in their entirety and therefore content of those elements are not repeated within this Practice.</w:t></w:r></w:p></w:body></w:document>

sql-server sql-server-2008 xml

— user23683

When you work with XML in SQL Server you use the xml data type methods, and to shred an XML document you use nodes() and, usually, value(). The XML here also contains several namespaces, so you have to declare the ones you need with WITH XMLNAMESPACES (Transact-SQL).
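The queries below read from an xml variable named @doc without showing where it comes from. A minimal sketch of one way to set it up, assuming the part you want is word/document.xml, already extracted from the .docx (a .docx file is a ZIP package, so the raw file cannot be cast to xml directly). The sample is a trimmed copy of the question's XML, and the column aliases are illustrative. The <?xml ... encoding="UTF-8"?> declaration is left out on purpose: an N'...' literal is UTF-16, and SQL Server rejects a declaration that asks it to switch to UTF-8.

```sql
-- Trimmed sample of the question's document.xml (assumption: already
-- extracted from the .docx ZIP). No <?xml encoding="UTF-8"?> line,
-- because an N'...' literal is UTF-16.
declare @doc xml = N'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p w:rsidR="00EF42E0" w:rsidRDefault="00EF42E0" w:rsidP="00EF42E0">
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:bookmarkStart w:id="0" w:name="_Toc212523610"/>
      <w:r><w:t>Summary</w:t></w:r>
      <w:bookmarkEnd w:id="0"/>
    </w:p>
    <w:p w:rsidR="00EF42E0" w:rsidRDefault="00EF42E0" w:rsidP="00EF42E0">
      <w:pPr><w:pStyle w:val="mainbodytext"/></w:pPr>
      <w:r><w:t>What is the Group Defined Practice for Integrity Management?</w:t></w:r>
    </w:p>
  </w:body>
</w:document>';

-- The w prefix must be bound to the WordprocessingML namespace URI.
-- value() requires a singleton, hence count(...) and the (...)[1] wrapper.
with xmlnamespaces('http://schemas.openxmlformats.org/wordprocessingml/2006/main' as w)
select @doc.value('count(/w:document/w:body/w:p)', 'int') as ParagraphCount,
       @doc.value('(/w:document/w:body/w:p/w:r/w:t)[1]', 'nvarchar(max)') as FirstText;
```

For this sample the query should return one row with ParagraphCount 2 and FirstText "Summary".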

The XML is quite complex, and since I don't know how you want to extract the data, I can only give you a few sample queries that you can adapt to your needs.

There are three w:p nodes, and this query fetches their attributes. Prefixing a name with @ specifies that you want the value of an attribute.

with xmlnamespaces('http://schemas.openxmlformats.org/wordprocessingml/2006/main' as w)
select P.X.value('@w:rsidR', 'char(8)') as rsidR,
       P.X.value('@w:rsidRDefault', 'char(8)') as rsidRDefault,
       P.X.value('@w:rsidP', 'char(8)') as rsidP
from @doc.nodes('/w:document/w:body/w:p') as P(X);

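Since the question also asks how to create a table, here is a minimal sketch that stores the shredded w:p attributes instead of only selecting them. It assumes a temp table with hypothetical column names (#Paragraph, ParagraphNo, StyleName); a real import would use a permanent table. It also pulls w:pStyle/@w:val from each paragraph's w:pPr, with nvarchar(100) as an assumed length.

```sql
-- Trimmed sample of the question's document.xml.
declare @doc xml = N'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p w:rsidR="00EF42E0" w:rsidRDefault="00EF42E0" w:rsidP="00EF42E0">
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:t>Summary</w:t></w:r>
    </w:p>
    <w:p w:rsidR="00EF42E0" w:rsidRDefault="00EF42E0" w:rsidP="00EF42E0">
      <w:pPr><w:pStyle w:val="mainbodytext"/></w:pPr>
      <w:r><w:t>What is the Group Defined Practice for Integrity Management?</w:t></w:r>
    </w:p>
  </w:body>
</w:document>';

-- Hypothetical target table: one row per w:p.
create table #Paragraph
(
  ParagraphNo  int identity(1, 1) primary key,  -- surrogate key, assumption
  rsidR        char(8),                         -- rsid values are 8 hex digits
  rsidRDefault char(8),
  rsidP        char(8),
  StyleName    nvarchar(100)                    -- from w:pPr/w:pStyle/@w:val
);

-- WITH XMLNAMESPACES can precede INSERT ... SELECT as well as SELECT.
with xmlnamespaces('http://schemas.openxmlformats.org/wordprocessingml/2006/main' as w)
insert into #Paragraph (rsidR, rsidRDefault, rsidP, StyleName)
select P.X.value('@w:rsidR', 'char(8)'),
       P.X.value('@w:rsidRDefault', 'char(8)'),
       P.X.value('@w:rsidP', 'char(8)'),
       P.X.value('(w:pPr/w:pStyle/@w:val)[1]', 'nvarchar(100)')
from @doc.nodes('/w:document/w:body/w:p') as P(X);

select * from #Paragraph;
```

Rows are inserted in the order nodes() returns them, so the identity column roughly follows document order here; add an explicit ordering column if you need to rely on that.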

If you also need the text inside the w:t nodes, you have to add a second nodes() call in a cross apply that shreds the XML inside each w:p node.

with xmlnamespaces('http://schemas.openxmlformats.org/wordprocessingml/2006/main' as w)
select P.X.value('@w:rsidR', 'char(8)') as rsidR,
       P.X.value('@w:rsidRDefault', 'char(8)') as rsidRDefault,
       P.X.value('@w:rsidP', 'char(8)') as rsidP,
       T.X.value('text()[1]', 'nvarchar(max)') as Text
from @doc.nodes('/w:document/w:body/w:p') as P(X)
  cross apply P.X.nodes('w:r/w:t') as T(X);

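The cross apply query returns one row per w:t, so a paragraph split across several runs (like the third w:p in the question) comes back as several rows. If one row per paragraph is more convenient, a minimal sketch is to concatenate the runs with query() followed by value('.'), and to read w:pStyle and w:bookmarkStart from the same w:p node. The column names are illustrative, and nvarchar(100) is an assumed length.

```sql
-- Trimmed sample: the second paragraph is split into two runs.
declare @doc xml = N'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p w:rsidR="00EF42E0" w:rsidRDefault="00EF42E0" w:rsidP="00EF42E0">
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:bookmarkStart w:id="0" w:name="_Toc212523610"/>
      <w:r><w:t>Summary</w:t></w:r>
      <w:bookmarkEnd w:id="0"/>
    </w:p>
    <w:p w:rsidR="00EF42E0" w:rsidRDefault="00EF42E0" w:rsidP="00EF42E0">
      <w:pPr><w:pStyle w:val="mainbodytext"/></w:pPr>
      <w:r><w:t>Integrity</w:t></w:r>
      <w:r><w:t>Management</w:t></w:r>
    </w:p>
  </w:body>
</w:document>';

with xmlnamespaces('http://schemas.openxmlformats.org/wordprocessingml/2006/main' as w)
select P.X.value('(w:pPr/w:pStyle/@w:val)[1]', 'nvarchar(100)') as Style,
       -- NULL for paragraphs without a bookmark
       P.X.value('(w:bookmarkStart/@w:name)[1]', 'nvarchar(100)') as Bookmark,
       -- query() collects the text nodes of all runs; value('.') joins them
       P.X.query('w:r/w:t/text()').value('.', 'nvarchar(max)') as ParagraphText
from @doc.nodes('/w:document/w:body/w:p') as P(X);
```

Like the original query, the relative path w:r/w:t only looks at runs that are direct children of w:p.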

You said in the question that you need the values from every tag. I'm not sure how useful that is, but you can build a name/value list of every attribute and element in the XML.

This gives you all the elements:

select T.X.value('local-name(.)', 'nvarchar(max)') Name,
       T.X.value('.', 'nvarchar(max)') Value
from @doc.nodes('//*') as T(X)

Change '//*' to '//@*' and you get all the attributes:

select T.X.value('local-name(.)', 'nvarchar(max)') Name,
       T.X.value('.', 'nvarchar(max)') Value
from @doc.nodes('//@*') as T(X)

You can also combine them into a single query:

select T.X.value('local-name(.)', 'nvarchar(max)') Name,
       T.X.value('.', 'nvarchar(max)') Value
from @doc.nodes('//@*, //*') as T(X)

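The combined list loses two things the question cares about: whether a row came from an attribute or an element, and which element an attribute such as w:rsidR belongs to. A minimal sketch that keeps both, using UNION ALL with a literal NodeType column and local-name(..) for the parent (the #NameValue table and its column names are hypothetical). Keep in mind that an element's value is the concatenated text of all its descendants, so the w:document and w:body rows contain the whole document's text.

```sql
-- Trimmed sample of the question's document.xml.
declare @doc xml = N'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p w:rsidR="00EF42E0" w:rsidRDefault="00EF42E0" w:rsidP="00EF42E0">
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:bookmarkStart w:id="0" w:name="_Toc212523610"/>
      <w:r><w:t>Summary</w:t></w:r>
      <w:bookmarkEnd w:id="0"/>
    </w:p>
  </w:body>
</w:document>';

-- One row per attribute plus one row per element.
-- Parent is the local name of the owning element (empty for the root).
select 'attribute' as NodeType,
       T.X.value('local-name(..)', 'nvarchar(max)') as Parent,
       T.X.value('local-name(.)', 'nvarchar(max)') as Name,
       T.X.value('.', 'nvarchar(max)') as Value
into #NameValue
from @doc.nodes('//@*') as T(X)
union all
select 'element',
       T.X.value('local-name(..)', 'nvarchar(max)'),
       T.X.value('local-name(.)', 'nvarchar(max)'),
       T.X.value('.', 'nvarchar(max)')
from @doc.nodes('//*') as T(X);

select * from #NameValue;
```

Namespace declarations (xmlns:w and so on) are not attributes in the XQuery data model, so they should not show up in the attribute rows.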

— Mikael Eriksson