Escaping HTML special characters in awk on Linux

ohfgkhjo posted on 2024-01-06 in Linux

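I want to generate an HTML file from an awk script. My strings may contain characters such as "<" and "&". Is there a short, proven awk function that performs the escaping?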

atmip9wb #1

For minimal escaping, you can do this:

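    function escapeHtml(t)
    {
        # Must do this one first; otherwise the "&" in the entities
        # produced below would itself be escaped again.
        # In the replacement string, "\\&" yields a literal "&".
        gsub(/&/, "\\&amp;", t);
        gsub(/"/, "\\&quot;", t);
        gsub(/</, "\\&lt;", t);
        gsub(/>/, "\\&gt;", t);
        return t;
    }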
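As a usage sketch, the function above can be dropped into a small script that wraps each input line in a list item. The shell wrapper `lines_to_html`, the `<ul>`/`<li>` markup, and the sample input are assumptions made for illustration; they are not part of the answer.

```shell
#!/bin/sh
# Hypothetical wrapper: reads plain text on stdin, writes an HTML list on stdout.
lines_to_html() {
    awk '
    function escapeHtml(t) {
        gsub(/&/, "\\&amp;", t)    # must run before the other substitutions
        gsub(/"/, "\\&quot;", t)
        gsub(/</, "\\&lt;", t)
        gsub(/>/, "\\&gt;", t)
        return t
    }
    BEGIN { print "<ul>" }
          { print "  <li>" escapeHtml($0) "</li>" }
    END   { print "</ul>" }
    '
}

# Example call with two sample lines.
printf '%s\n' 'a < b && c > d' 'say "hi"' | lines_to_html
```

Because the text is passed as an argument and returned, `$0` itself is left unmodified, so the original line stays available to the rest of the script.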


vwkv1x7d #2

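Sure. Just call makeEntities() for each line you want to convert ($0). Or modify it to accept a parameter. I wrote this for working with the British National Corpus, whose entity set overlaps heavily with HTML entities, but *not 100%*, so if you need some of the more exotic characters, you should verify that they are correct.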

    function makeEntities() {
        # Operates on $0. The "&" substitution must run first; otherwise the
        # "&" in every entity produced below would be escaped again.
        gsub(/&/, "\\&amp;");
        gsub(/á/, "\\&aacute;");
        gsub(/Á/, "\\&Aacute;");
        gsub(/ă/, "\\&abreve;");
        gsub(/â/, "\\&acirc;");
        gsub(/´/, "\\&acute;");
        gsub(/æ/, "\\&aelig;");
        gsub(/Æ/, "\\&AElig;");
        gsub(/α/, "\\&agr;");
        gsub(/à/, "\\&agrave;");
        gsub(/ā/, "\\&amacr;");
        gsub(/Ā/, "\\&Amacr;");
        gsub(/ą/, "\\&aogon;");
        gsub(/å/, "\\&aring;");
        gsub(/Å/, "\\&Aring;");
        gsub(/ã/, "\\&atilde;");
        gsub(/ä/, "\\&auml;");
        gsub(/Ä/, "\\&Auml;");
        gsub(/β/, "\\&bgr;");
        gsub(/\\/, "\\&bsol;");
        gsub(/•/, "\\&bull;");
        gsub(/ć/, "\\&cacute;");
        gsub(/č/, "\\&ccaron;");
        gsub(/Č/, "\\&Ccaron;");
        gsub(/ç/, "\\&ccedil;");
        gsub(/Ç/, "\\&Ccedil;");
        gsub(/ĉ/, "\\&ccirc;");
        gsub(/✓/, "\\&check;");
        gsub(/ˆ/, "\\&circ;");
        gsub(/@/, "\\&commat;");
        gsub(/©/, "\\&copy;");
        gsub(/‐/, "\\&dash;");
        gsub(/ď/, "\\&dcaron;");
        gsub(/°/, "\\&deg;");
        gsub(/δ/, "\\&dgr;");
        gsub(/Δ/, "\\&Dgr;");
        gsub(/¨/, "\\&die;");
        gsub(/\$/, "\\&dollar;");
        gsub(/đ/, "\\&dstrok;");
        gsub(/é/, "\\&eacute;");
        gsub(/É/, "\\&Eacute;");
        gsub(/ě/, "\\&ecaron;");
        gsub(/ê/, "\\&ecirc;");
        gsub(/è/, "\\&egrave;");
        gsub(/È/, "\\&Egrave;");
        gsub(/ε/, "\\&egr;");
        gsub(/ē/, "\\&emacr;");
        gsub(/Ē/, "\\&Emacr;");
        gsub(/ę/, "\\&eogon;");
        gsub(/ð/, "\\&eth;");
        gsub(/ë/, "\\&euml;");
        gsub(/Ë/, "\\&Euml;");
        gsub(/♭/, "\\&flat;");
        gsub(/½/, "\\&frac12;");
        gsub(/⅓/, "\\&frac13;");
        gsub(/¼/, "\\&frac14;");
        gsub(/⅕/, "\\&frac15;");
        gsub(/⅙/, "\\&frac16;");
        gsub(/⅛/, "\\&frac18;");
        gsub(/⅔/, "\\&frac23;");
        gsub(/⅖/, "\\&frac25;");
        gsub(/¾/, "\\&frac34;");
        gsub(/⅗/, "\\&frac35;");
        gsub(/⅜/, "\\&frac38;");
        gsub(/⅘/, "\\&frac45;");
        gsub(/⅝/, "\\&frac58;");
        gsub(/⅞/, "\\&frac78;");
        gsub(/′/, "\\&ft;");
        gsub(/γ/, "\\&ggr;");
        gsub(/>/, "\\&gt;");
        gsub(/½/, "\\&half;");      # no effect: ½ was already replaced by &frac12;
        gsub(/ħ/, "\\&hstrok;");
        gsub(/í/, "\\&iacute;");
        gsub(/Í/, "\\&Iacute;");
        gsub(/î/, "\\&icirc;");
        gsub(/Î/, "\\&Icirc;");
        gsub(/ì/, "\\&igrave;");
        gsub(/ī/, "\\&imacr;");
        gsub(/″/, "\\&ins;");
        gsub(/¿/, "\\&iquest;");
        gsub(/ï/, "\\&iuml;");
        gsub(/Ï/, "\\&Iuml;");
        gsub(/ĺ/, "\\&lacute;");
        gsub(/Ĺ/, "\\&Lacute;");
        gsub(/\{/, "\\&lcub;");
        gsub(/≤/, "\\&le;");
        gsub(/λ/, "\\&lgr;");
        gsub(/_/, "\\&lowbar;");
        gsub(/\[/, "\\&lsqb;");
        gsub(/ł/, "\\&lstrok;");
        gsub(/Ł/, "\\&Lstrok;");
        gsub(/</, "\\&lt;");
        gsub(/—/, "\\&mdash;");
        gsub(/μ/, "\\&mgr;");
        gsub(/µ/, "\\&micro;");
        gsub(/·/, "\\&middot;");
        gsub(/ń/, "\\&nacute;");
        gsub(/ň/, "\\&ncaron;");
        gsub(/ņ/, "\\&ncedil;");
        gsub(/–/, "\\&ndash;");
        gsub(/ñ/, "\\&ntilde;");
        gsub(/Ñ/, "\\&Ntilde;");
        gsub(/#/, "\\&num;");
        gsub(/ó/, "\\&oacute;");
        gsub(/Ó/, "\\&Oacute;");
        gsub(/ô/, "\\&ocirc;");
        gsub(/œ/, "\\&oelig;");
        gsub(/ò/, "\\&ograve;");
        gsub(/Ω/, "\\&ohm;");
        gsub(/ō/, "\\&omacr;");
        gsub(/ø/, "\\&oslash;");
        gsub(/Ø/, "\\&Oslash;");
        gsub(/õ/, "\\&otilde;");
        gsub(/ö/, "\\&ouml;");
        gsub(/Ö/, "\\&Ouml;");
        gsub(/φ/, "\\&phgr;");
        gsub(/\+/, "\\&plus;");
        gsub(/±/, "\\&plusmn;");
        gsub(/£/, "\\&pound;");
        gsub(/ŕ/, "\\&racute;");
        gsub(/√/, "\\&radic;");
        gsub(/ř/, "\\&rcaron;");
        gsub(/Ř/, "\\&Rcaron;");
        gsub(/\}/, "\\&rcub;");
        gsub(/®/, "\\&reg;");
        gsub(/-/, "\\&rehy;");
        gsub(/\]/, "\\&rsqb;");
        gsub(/ś/, "\\&sacute;");
        gsub(/Ś/, "\\&Sacute;");
        gsub(/š/, "\\&scaron;");
        gsub(/Š/, "\\&Scaron;");
        gsub(/ş/, "\\&scedil;");
        gsub(/Ş/, "\\&Scedil;");
        gsub(/ŝ/, "\\&scirc;");
        gsub(/σ/, "\\&sgr;");
        gsub(/♯/, "\\&sharp;");
        gsub(/\//, "\\&shilling;");
        gsub(/∼/, "\\&sim;");
        gsub(/\//, "\\&sol;");      # no effect: "/" was already replaced by &shilling;
        gsub(/²/, "\\&sup2;");
        gsub(/ß/, "\\&szlig;");
        gsub(/ť/, "\\&tcaron;");
        gsub(/ţ/, "\\&tcedil;");
        gsub(/τ/, "\\&tgr;");
        gsub(/þ/, "\\&thorn;");
        gsub(/Þ/, "\\&THORN;");
        gsub(/×/, "\\&times;");
        gsub(/™/, "\\&trade;");
        gsub(/ú/, "\\&uacute;");
        gsub(/Ú/, "\\&Uacute;");
        gsub(/û/, "\\&ucirc;");
        gsub(/ù/, "\\&ugrave;");
        gsub(/ū/, "\\&umacr;");
        gsub(/¨/, "\\&uml;");       # no effect: ¨ was already replaced by &die;
        gsub(/ů/, "\\&uring;");
        gsub(/ü/, "\\&uuml;");
        gsub(/Ü/, "\\&Uuml;");
        gsub(/\|/, "\\&verbar;");
        gsub(/ŵ/, "\\&wcirc;");
        gsub(/ý/, "\\&yacute;");
        gsub(/ŷ/, "\\&ycirc;");
        gsub(/¥/, "\\&yen;");
        gsub(/ÿ/, "\\&yuml;");
        gsub(/Ÿ/, "\\&Yuml;");
        gsub(/ź/, "\\&zacute;");
        gsub(/Ž/, "\\&Zcaron;");
        gsub(/ž/, "\\&zcaron;");
        gsub(/ż/, "\\&zdot;");
    }
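The suggestion to make the function accept a parameter could be sketched roughly as follows. This is a table-driven variant, not the answer's own code: the names `to_entities`, `makeEntitiesIn`, `chr`, and `name` are hypothetical, and the table holds only a small illustrative subset of the pairs listed above.

```shell
#!/bin/sh
# Hypothetical wrapper: converts each stdin line and prints the result.
to_entities() {
    awk '
    BEGIN {
        # Illustrative subset: alternating "character entity-name" pairs.
        n = split("é eacute ü uuml © copy £ pound", pair, " ")
        for (i = 1; i < n; i += 2) { chr[++m] = pair[i]; name[m] = pair[i + 1] }
    }
    # Takes the text as an argument and returns the converted copy;
    # the extra parameter i is the usual awk idiom for a local variable.
    function makeEntitiesIn(s,    i) {
        gsub(/&/, "\\&amp;", s)    # must run before any entity is inserted
        gsub(/</, "\\&lt;", s)
        gsub(/>/, "\\&gt;", s)
        # Table characters are used as regexes, so a regex metacharacter
        # would need escaping before being added to the table.
        for (i = 1; i <= m; i++)
            gsub(chr[i], "\\&" name[i] ";", s)
        return s
    }
    { print makeEntitiesIn($0) }
    '
}

# Example call with one sample line.
printf '%s\n' 'Café © 2024 <ü>' | to_entities
```

Keeping the pairs in a table means the fixed `&`, `<`, `>` steps stay in one visible place, and adding a character becomes a data change rather than another `gsub` line.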
