.NET: How can I retrieve XML entity values in C#?

Posted by lokaqttq on 2023-04-08 in .NET

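I'd like to display a list of entity names and values in a C#/.NET 4.0 application.
I can retrieve the entity names easily with XmlDocument.DocumentType.Entities, but is there a good way to retrieve the values of those entities?
I've noticed that InnerText returns the value of plain-text entities, but that doesn't work for entities that contain XML markup.
Is a regular expression the best approach?
Suppose I have a document like this: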

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document [
  <!ENTITY test "<para>only a test</para>">
  <!ENTITY wwwc "World Wide Web Corporation">
  <!ENTITY copy "&#xA9;">
]>

<document>
  <!-- The following image is the World Wide Web Corporation logo. -->
  <graphics image="logo" alternative="&wwwc; Logo"/>
</document>

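I want to show the user a list of the three entity names (test, wwwc, and copy) along with their values (the text in quotes after each name). I haven't thought about entities nested inside other entities, so I'm interested in a solution that either fully expands the entity values or shows the text exactly as it appears between the quotes.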
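
For the "text in the quotes" variant, one option is a regular expression over the internal subset, which XmlDocumentType.InternalSubset exposes as a string. The following is only a sketch under stated assumptions: the names EntityLiteralReader and ReadLiterals are made up for illustration, and the pattern covers only internal general entities declared with a quoted literal. It skips parameter entities and external (SYSTEM/PUBLIC) entities, and it can be fooled by declarations that appear inside DTD comments.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class EntityLiteralReader
{
    // Matches <!ENTITY name "literal"> or <!ENTITY name 'literal'>.
    // A '%' after ENTITY (parameter entity) cannot match the name group.
    private static readonly Regex EntityDeclaration = new Regex(
        @"<!ENTITY\s+(?<name>[^\s%]+)\s+(?:""(?<value>[^""]*)""|'(?<value>[^']*)')\s*>",
        RegexOptions.Compiled);

    // Returns entity name -> literal text exactly as written between the quotes.
    public static Dictionary<string, string> ReadLiterals(string internalSubset)
    {
        var result = new Dictionary<string, string>();
        if (string.IsNullOrEmpty(internalSubset))
        {
            return result;
        }

        foreach (Match match in EntityDeclaration.Matches(internalSubset))
        {
            // In XML, the first declaration of an entity name is the binding one.
            string name = match.Groups["name"].Value;
            if (!result.ContainsKey(name))
            {
                result.Add(name, match.Groups["value"].Value);
            }
        }
        return result;
    }
}
```

With an XmlDocument named doc loaded from the sample above, calling EntityLiteralReader.ReadLiterals(doc.DocumentType.InternalSubset) should return the three names, with character references such as &#xA9; left unexpanded because the text is taken as written.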

Answer #1 (pu82cl6c)

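Although it may not be the most elegant solution, I came up with an approach that seems to serve my purposes well. First, I parse the original document and retrieve its entity declarations. Then I create a small XML document in memory and add all the entity declarations to it by copying the DTD. Next, I add an entity reference for every entity to the temporary document. Finally, I read the InnerXml of each reference.
Here is some sample code: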

// Requires: using System.Collections.Generic; using System.Net; using System.Xml;
// `path` is the location of the XML file to inspect.

// parse the original document and retrieve its entities
XmlDocument parsedXmlDocument = new XmlDocument();
XmlUrlResolver resolver = new XmlUrlResolver();
resolver.Credentials = CredentialCache.DefaultCredentials;
parsedXmlDocument.XmlResolver = resolver;
parsedXmlDocument.Load(path);

// create a temporary xml document with all the entities and add references to them
// the references can then be used to retrieve the value for each entity
XmlDocument entitiesXmlDocument = new XmlDocument();
XmlDeclaration dec = entitiesXmlDocument.CreateXmlDeclaration("1.0", null, null);
entitiesXmlDocument.AppendChild(dec);
XmlDocumentType newDocType = entitiesXmlDocument.CreateDocumentType(parsedXmlDocument.DocumentType.Name, parsedXmlDocument.DocumentType.PublicId, parsedXmlDocument.DocumentType.SystemId, parsedXmlDocument.DocumentType.InternalSubset);
entitiesXmlDocument.AppendChild(newDocType);
XmlElement root = entitiesXmlDocument.CreateElement("xmlEntitiesDoc");
entitiesXmlDocument.AppendChild(root);
XmlNamedNodeMap entitiesMap = entitiesXmlDocument.DocumentType.Entities;

// build a dictionary of entity names and values
Dictionary<string, string> entitiesDictionary = new Dictionary<string, string>();
for (int i = 0; i < entitiesMap.Count; i++)
{
    string entityName = entitiesMap.Item(i).Name;

    // wrap a reference to the entity in an element named after it
    XmlElement entityElement = entitiesXmlDocument.CreateElement(entityName);
    XmlEntityReference entityRefElement = entitiesXmlDocument.CreateEntityReference(entityName);
    entityElement.AppendChild(entityRefElement);
    root.AppendChild(entityElement);

    // the InnerXml of the reference node holds the entity's value
    string entityValue = entityElement.ChildNodes[0].InnerXml;
    if (!string.IsNullOrEmpty(entityValue))
    {
        // do not add parameter entities or invalid entities
        // this can be determined by checking for an empty string
        entitiesDictionary.Add(entityName, entityValue);
    }
}

Answer #2 (tkqqtvp1)

Here is one approach (untested) that uses XmlReader and its ResolveEntity() method:

private Dictionary<string, string> GetEntities(XmlReader xr)
{
    Dictionary<string, string> entityList = new Dictionary<string, string>();

    while (xr.Read())
    {
        HandleNode(xr, entityList);
    }
    return entityList;
}

StringBuilder sbEntityResolver;
int extElementIndex = 0;
int resolveEntityNestLevel = -1;
string dtdCurrentTopEntity = "";

private void HandleNode(XmlReader inReader, Dictionary<string, string> entityList)
{
    if (inReader.NodeType == XmlNodeType.Element)
    {
        if (resolveEntityNestLevel < 0)
        {
            while (inReader.MoveToNextAttribute())
            {
                HandleNode(inReader, entityList); // for namespaces
                while (inReader.ReadAttributeValue())
                {
                    HandleNode(inReader, entityList); // recursive for resolving entity refs in attributes
                }
            }
        }
        else
        {
            extElementIndex++;
            sbEntityResolver.Append(inReader.ReadOuterXml());
            resolveEntityNestLevel--;
            if (!entityList.ContainsKey(dtdCurrentTopEntity))
            {
                entityList.Add(dtdCurrentTopEntity, sbEntityResolver.ToString());
            }
        }
    }
    else if (inReader.NodeType == XmlNodeType.EntityReference)
    {
        if (inReader.Name[0] != '#' && !entityList.ContainsKey(inReader.Name))
        {
            if (resolveEntityNestLevel < 0)
            {
                sbEntityResolver = new StringBuilder(); // start building entity
                dtdCurrentTopEntity = inReader.Name;
            }
            // entityReference can have contents that contains other
            // entityReferences, so keep track of nest level
            resolveEntityNestLevel++;
            inReader.ResolveEntity();
        }
    }
    else if (inReader.NodeType == XmlNodeType.EndEntity)
    {
        resolveEntityNestLevel--;
        if (resolveEntityNestLevel < 0)
        {
            if (!entityList.ContainsKey(dtdCurrentTopEntity))
            {
                entityList.Add(dtdCurrentTopEntity, sbEntityResolver.ToString());
            }
        }
    }
    else if (inReader.NodeType == XmlNodeType.Text)
    {
        if (resolveEntityNestLevel > -1)
        {
            sbEntityResolver.Append(inReader.Value);
        }
    }
}
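
GetEntities only works with a reader that reports EntityReference nodes instead of expanding them silently. A minimal sketch of one way to obtain such a reader and see which entities it reports, assuming an XmlTextReader with EntityHandling.ExpandCharEntities (the class and method names here are hypothetical). A reader-based approach only sees entities that are actually referenced in the document, so entities that are declared but never used would not be listed.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration only.
public static class EntityReferenceScanner
{
    // Returns the names of general entities referenced in element content.
    public static List<string> ReferencedEntityNames(string xml)
    {
        var names = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Parse the DTD so the entity declarations are known to the reader.
            reader.DtdProcessing = DtdProcessing.Parse;
            // Expand only character entities; general entities are then
            // reported as EntityReference nodes.
            reader.EntityHandling = EntityHandling.ExpandCharEntities;

            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.EntityReference
                    && !names.Contains(reader.Name))
                {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }
}
```

A reader configured the same way could be passed to GetEntities above, which would then call ResolveEntity() on each reference it encounters.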

Answer #3 (lzfw57am)

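If you have an XmlDocument object, it may be easier to walk each XmlNode recursively (starting from XmlDocument.ChildNodes); for each node, you can use the Name property to get the node's name. (InnerXml gives a string representation, and ChildNodes gives programmatic access to the XmlNode objects, which can be cast to XmlEntity/XmlAttribute/XmlText.)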
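
As a rough sketch of that idea applied to the entity nodes themselves: each item in XmlDocument.DocumentType.Entities can be cast to XmlEntity, and its ChildNodes can be serialized one by one with OuterXml. This assumes the entity's child nodes are populated with its parsed replacement content, which has not been verified here on every framework version. The class and method names are hypothetical, and nested entity references may come out as &name; rather than expanded.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml;

// Hypothetical helper for illustration only.
public static class EntityNodeWalker
{
    // Returns entity name -> markup rebuilt from the entity node's children.
    public static Dictionary<string, string> ReadEntityMarkup(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new Dictionary<string, string>();
        if (doc.DocumentType == null)
        {
            return result; // no DTD, so no declared entities
        }

        foreach (XmlNode node in doc.DocumentType.Entities)
        {
            var entity = node as XmlEntity;
            if (entity == null)
            {
                continue;
            }

            // Concatenate the serialized children instead of using InnerText,
            // so that element markup inside the entity is kept.
            var markup = new StringBuilder();
            foreach (XmlNode child in entity.ChildNodes)
            {
                markup.Append(child.OuterXml);
            }
            result[entity.Name] = markup.ToString();
        }
        return result;
    }
}
```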

Answer #4 (f87krz0w)

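You can easily display a representation of an XML document simply by walking the tree recursively.
This little class happens to write to the console, but you can easily adapt it to your needs.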

public static class XmlPrinter {
   private const Int32 SpacesPerIndent = 3;

   public static void Print(XDocument xDocument) {
      if (xDocument == null) {
         Console.WriteLine("No XML Document Provided");
         return;
      }

      PrintElementRecursive(xDocument.Root);
   }

   private static void PrintElementRecursive(XElement element, Int32 indentationLevel = 0) {
      if(element == null) return;

      PrintIndentation(indentationLevel);
      PrintElement(element);
      PrintNewline();

      foreach (var xAttribute in element.Attributes()) {
         PrintIndentation(indentationLevel + 1);
         PrintAttribute(xAttribute);
         PrintNewline();
      }

      foreach (var xElement in element.Elements()) {
         PrintElementRecursive(xElement, indentationLevel+1);
      }
   }

   private static void PrintAttribute(XAttribute xAttribute) {
      if (xAttribute == null) return;

      Console.Write("[{0}] = \"{1}\"", xAttribute.Name, xAttribute.Value);
   }

   private static void PrintElement(XElement element) {
      if (element == null) return;

      Console.Write("{0}", element.Name);

      if(!String.IsNullOrWhiteSpace(element.Value))
         Console.Write(" : {0}", element.Value);
   }

   private static void PrintIndentation(Int32 level) {
      Console.Write(new String(' ', level * SpacesPerIndent));
   }

   private static void PrintNewline() {
      Console.Write(Environment.NewLine);
   }
}

Using the class is simple. Here is an example that prints the current configuration file:

static void Main(string[] args) {
   XmlPrinter.Print(XDocument.Load(
      ConfigurationManager.OpenExeConfiguration(ConfigurationUserLevel.None).FilePath
                        ));

   Console.ReadKey();
}

Try it yourself; you should be able to modify it quickly to get what you want.

Answer #5 (dfty9e19)

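I ran into problems with the accepted solution. In particular:

  • In my documents, the entity references need a custom resolver to load them from an external source. Creating the elements from the original document (just without appending them) was therefore easier than trying to replicate the DTD and the resolver in a new XmlDocument.
  • In addition, the InnerXml property kept returning the entity reference rather than its expansion. To work around this, I copy the XML into an XElement, which resolves entities automatically.
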
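// Requires: using System.Collections.Generic; using System.Linq; using System.Xml; using System.Xml.Linq;
private IEnumerable<KeyValuePair<string, string>> AllEntityExpansions(XmlDocument doc)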
{
  var entities = doc.DocumentType.Entities;
  foreach (var entity in entities.OfType<XmlEntity>()
    .OrderBy(e => e.Name, StringComparer.OrdinalIgnoreCase))
  {
    var xmlString = default(string);
    try
    {
      var element = doc.CreateElement("e");
      element.AppendChild(doc.CreateEntityReference(entity.Name));
      using (var r = new XmlNodeReader(element))
      {
        var elem = XElement.Load(r);
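        // DisableFormatting keeps ToString() from adding indentation, so the
        // Substring call below strips exactly "<e>" (3 chars) and "</e>" (4 chars)
        xmlString = elem.ToString(SaveOptions.DisableFormatting);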
      }
    }
    catch (XmlException) { }

    if (xmlString?.Length > 7)
      yield return new KeyValuePair<string, string>(entity.Name, xmlString.Substring(3, xmlString.Length - 7));
  }
}
