When programming, we routinely run into all kinds of encoding and decoding. Why are there so many different encodings?
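Data is ultimately stored on disk as bytes, that is, as a byte array. Encoding therefore means converting a human-readable string into a byte array (String to byte[]); decoding is the reverse, converting a byte array back into a string (byte[] to String).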
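The String to byte[] round trip described above can be sketched in a few lines of Java. The class and method names (`EncodeDecodeDemo`, `encode`, `decode`) are made up for illustration, and UTF-8 is an arbitrary choice of charset; the point is only that the same charset must be named on both sides.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodeDecodeDemo {
    // Encoding: String -> byte[], with the charset named explicitly.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding: byte[] -> String, using the same charset as encode().
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "Hi\u4e2d"; // "Hi" followed by the Han character U+4E2D
        byte[] bytes = encode(text);
        System.out.println(Arrays.toString(bytes));     // [72, 105, -28, -72, -83]
        System.out.println(decode(bytes).equals(text)); // true
    }
}
```

Passing the charset explicitly avoids depending on the platform default, which differs between machines and JDK versions.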
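The variety of encodings is a historical legacy. In the early days of computing, the United States was the first to define an encoding for its own language: ASCII. English needs only the 26 letters plus digits and some common symbols, so 7 bits of one byte, i.e. 128 values, are enough to represent all of them. As computers spread, Western European countries needed encodings for their own languages (French, for example), and ASCII's 128 characters were clearly not enough. This led to a second family of encodings, the ISO-8859 series (ISO-8859-1 through ISO-8859-16), of which ISO-8859-1 is the most widely used. ISO-8859-1 uses all 8 bits of a byte and can therefore represent up to 256 values. To avoid garbled text, ISO-8859-1 is fully compatible with ASCII: its first 128 code values are identical to ASCII, and only the upper 128 are additions made by ISO-8859-1.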
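A small sketch of that compatibility claim: pure ASCII text should produce identical bytes under US-ASCII and ISO-8859-1, while an accented letter fits in one ISO-8859-1 byte but cannot be represented in ASCII. The class and method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1CompatDemo {
    // True if the text encodes to the same bytes in US-ASCII and ISO-8859-1.
    static boolean sameBytesInAsciiAndLatin1(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(ascii, latin1);
    }

    public static void main(String[] args) {
        // Pure ASCII text: both charsets agree byte for byte.
        System.out.println(sameBytesInAsciiAndLatin1("Hello, 123!")); // true

        // U+00E9 (e with acute accent) lies in the upper half of ISO-8859-1.
        String eAcute = "\u00e9";
        byte[] latin1 = eAcute.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length + " byte, value 0x"
                + Integer.toHexString(latin1[0] & 0xFF)); // 1 byte, value 0xe9

        // ASCII cannot map it, so getBytes substitutes '?' and the bytes differ.
        System.out.println(sameBytesInAsciiAndLatin1(eAcute)); // false
    }
}
```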
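Later, China defined GB2312, an encoding suited to Chinese. GB2312 contains 682 non-Chinese symbols (Latin letters, punctuation, and so on) and 6,763 commonly used Simplified Chinese characters. Taiwan likewise defined an encoding for Traditional Chinese, called Big5. Both GB2312 and Big5 are ASCII-compatible: they use one byte for the letters, digits, and common symbols of ASCII, and two bytes for each Chinese character (Simplified in GB2312, Traditional in Big5).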
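After that, to fit Simplified and Traditional Chinese into a single character set, China released a new encoding, GBK. GBK is an extension of GB2312 (and compatible with it) that supports both Simplified and Traditional characters. It also uses 1 byte for ASCII characters and 2 bytes for each Chinese character, Simplified or Traditional.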
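Later still, to cover Chinese, rare characters, the scripts of China's ethnic minorities, Japanese, Korean, and more in one encoding, GBK was upgraded to GB18030. GB18030 is compatible with GBK and stores a character in 1, 2, or 4 bytes.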
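The byte widths described for GB2312, GBK, and GB18030 can be checked with a short sketch. It assumes the running JDK ships these three charsets (they are optional, not guaranteed by the Java specification), and the helper name `byteLength` is made up for illustration.

```java
import java.nio.charset.Charset;

class GbFamilyDemo {
    // Number of bytes the text occupies in the named charset.
    // Charset.forName throws UnsupportedCharsetException if the JDK lacks it.
    static int byteLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String ascii = "A";
        String common = "\u4e2d";              // U+4E2D, a common Han character
        String supplementary = "\uD840\uDC00"; // U+20000, outside GBK's repertoire

        for (String name : new String[] {"GB2312", "GBK", "GB18030"}) {
            System.out.println(name + ": A=" + byteLength(ascii, name)
                    + ", U+4E2D=" + byteLength(common, name));
        }
        // Only GB18030 can represent U+20000; it needs the 4-byte form.
        System.out.println("GB18030: U+20000=" + byteLength(supplementary, "GB18030"));
    }
}
```

On a JDK that includes these charsets, the ASCII letter should take 1 byte and U+4E2D 2 bytes in all three, while U+20000 takes 4 bytes in GB18030.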
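Finally, to give every character in the world one unified encoding, the international community produced Unicode, a universal character set standard. Unicode itself only assigns each character a number (a code point, in the range U+0000 to U+10FFFF); it does not say how that number is stored as bytes, much like an interface. What is actually used for storage are the Unicode encoding forms such as UTF-8, UTF-16, and UTF-32, which play the role of implementing classes: each defines a rule for turning code points into bytes. UTF-32 always uses 4 bytes per character, UTF-16 uses 2 or 4, and UTF-8, the most common, uses a variable 1 to 4 bytes. UTF-8 is also a superset of ASCII: every ASCII character has exactly the same one-byte encoding in UTF-8, while Chinese and other non-ASCII characters take 2, 3, or 4 bytes. The number of bytes UTF-8 uses for common kinds of characters is as follows.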
| Character type | Bytes per character in UTF-8 |
| --- | --- |
| English letters, digits, carriage return, common symbols | 1 |
| Common Chinese characters (those present in GBK) | 3 |
| Han characters in the large CJK extended sets (supplementary planes) | 4 |
| Some special symbols | 2 |
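The byte counts in the table can be reproduced with a short sketch. The four sample characters are chosen here purely for illustration (they are not from the original article), and `utf8Length` is a hypothetical helper name. UTF-16 sizes are printed alongside for comparison.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, ASCII letter
            "\u00a9",       // U+00A9, copyright sign
            "\u4e2d",       // U+4E2D, common Han character
            "\uD840\uDC00"  // U+20000, Han character in a supplementary plane
        };
        for (String s : samples) {
            int codePoint = s.codePointAt(0);
            // UTF_16BE is used so that no byte-order mark is counted.
            int utf16 = s.getBytes(StandardCharsets.UTF_16BE).length;
            System.out.printf("U+%04X: UTF-8=%d bytes, UTF-16=%d bytes%n",
                    codePoint, utf8Length(s), utf16);
        }
    }
}
```

The four samples occupy 1, 2, 3, and 4 bytes in UTF-8 respectively; in UTF-16 the first three take 2 bytes and the last takes 4.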
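import java.nio.charset.Charset;
import java.util.Map;
import java.util.Set;

public static void test1() {
    // Default charset of the running JVM (depends on the platform and JDK version)
    System.out.println("Default charset of the current environment: " + Charset.defaultCharset());
    // Look up a charset by name (names are case-insensitive); the result is unused here
    Charset.forName("utf-8");
    // Map from canonical charset name to Charset object, sorted by name
    Set<Map.Entry<String, Charset>> entries = Charset.availableCharsets().entrySet();
    System.out.println("Number of charsets supported by the current JDK: " + entries.size());
    System.out.println("All charsets supported by the current environment:");
    for (Map.Entry<String, Charset> entry : entries) {
        System.out.println("key:" + entry.getKey() + "\tvalue:" + entry.getValue());
    }
}
Output on one machine (the default charset and the list vary by JDK and platform):
Default charset of the current environment: UTF-8
Number of charsets supported by the current JDK: 170
All charsets supported by the current environment: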
key:Big5 value:Big5
key:Big5-HKSCS value:Big5-HKSCS
key:CESU-8 value:CESU-8
key:EUC-JP value:EUC-JP
key:EUC-KR value:EUC-KR
key:GB18030 value:GB18030
key:GB2312 value:GB2312
key:GBK value:GBK
key:IBM-Thai value:IBM-Thai
key:IBM00858 value:IBM00858
key:IBM01140 value:IBM01140
key:IBM01141 value:IBM01141
key:IBM01142 value:IBM01142
key:IBM01143 value:IBM01143
key:IBM01144 value:IBM01144
key:IBM01145 value:IBM01145
key:IBM01146 value:IBM01146
key:IBM01147 value:IBM01147
key:IBM01148 value:IBM01148
key:IBM01149 value:IBM01149
key:IBM037 value:IBM037
key:IBM1026 value:IBM1026
key:IBM1047 value:IBM1047
key:IBM273 value:IBM273
key:IBM277 value:IBM277
key:IBM278 value:IBM278
key:IBM280 value:IBM280
key:IBM284 value:IBM284
key:IBM285 value:IBM285
key:IBM290 value:IBM290
key:IBM297 value:IBM297
key:IBM420 value:IBM420
key:IBM424 value:IBM424
key:IBM437 value:IBM437
key:IBM500 value:IBM500
key:IBM775 value:IBM775
key:IBM850 value:IBM850
key:IBM852 value:IBM852
key:IBM855 value:IBM855
key:IBM857 value:IBM857
key:IBM860 value:IBM860
key:IBM861 value:IBM861
key:IBM862 value:IBM862
key:IBM863 value:IBM863
key:IBM864 value:IBM864
key:IBM865 value:IBM865
key:IBM866 value:IBM866
key:IBM868 value:IBM868
key:IBM869 value:IBM869
key:IBM870 value:IBM870
key:IBM871 value:IBM871
key:IBM918 value:IBM918
key:ISO-2022-CN value:ISO-2022-CN
key:ISO-2022-JP value:ISO-2022-JP
key:ISO-2022-JP-2 value:ISO-2022-JP-2
key:ISO-2022-KR value:ISO-2022-KR
key:ISO-8859-1 value:ISO-8859-1
key:ISO-8859-13 value:ISO-8859-13
key:ISO-8859-15 value:ISO-8859-15
key:ISO-8859-2 value:ISO-8859-2
key:ISO-8859-3 value:ISO-8859-3
key:ISO-8859-4 value:ISO-8859-4
key:ISO-8859-5 value:ISO-8859-5
key:ISO-8859-6 value:ISO-8859-6
key:ISO-8859-7 value:ISO-8859-7
key:ISO-8859-8 value:ISO-8859-8
key:ISO-8859-9 value:ISO-8859-9
key:JIS_X0201 value:JIS_X0201
key:JIS_X0212-1990 value:JIS_X0212-1990
key:KOI8-R value:KOI8-R
key:KOI8-U value:KOI8-U
key:Shift_JIS value:Shift_JIS
key:TIS-620 value:TIS-620
key:US-ASCII value:US-ASCII
key:UTF-16 value:UTF-16
key:UTF-16BE value:UTF-16BE
key:UTF-16LE value:UTF-16LE
key:UTF-32 value:UTF-32
key:UTF-32BE value:UTF-32BE
key:UTF-32LE value:UTF-32LE
key:UTF-8 value:UTF-8
key:windows-1250 value:windows-1250
key:windows-1251 value:windows-1251
key:windows-1252 value:windows-1252
key:windows-1253 value:windows-1253
key:windows-1254 value:windows-1254
key:windows-1255 value:windows-1255
key:windows-1256 value:windows-1256
key:windows-1257 value:windows-1257
key:windows-1258 value:windows-1258
key:windows-31j value:windows-31j
key:x-Big5-HKSCS-2001 value:x-Big5-HKSCS-2001
key:x-Big5-Solaris value:x-Big5-Solaris
key:x-euc-jp-linux value:x-euc-jp-linux
key:x-EUC-TW value:x-EUC-TW
key:x-eucJP-Open value:x-eucJP-Open
key:x-IBM1006 value:x-IBM1006
key:x-IBM1025 value:x-IBM1025
key:x-IBM1046 value:x-IBM1046
key:x-IBM1097 value:x-IBM1097
key:x-IBM1098 value:x-IBM1098
key:x-IBM1112 value:x-IBM1112
key:x-IBM1122 value:x-IBM1122
key:x-IBM1123 value:x-IBM1123
key:x-IBM1124 value:x-IBM1124
key:x-IBM1166 value:x-IBM1166
key:x-IBM1364 value:x-IBM1364
key:x-IBM1381 value:x-IBM1381
key:x-IBM1383 value:x-IBM1383
key:x-IBM300 value:x-IBM300
key:x-IBM33722 value:x-IBM33722
key:x-IBM737 value:x-IBM737
key:x-IBM833 value:x-IBM833
key:x-IBM834 value:x-IBM834
key:x-IBM856 value:x-IBM856
key:x-IBM874 value:x-IBM874
key:x-IBM875 value:x-IBM875
key:x-IBM921 value:x-IBM921
key:x-IBM922 value:x-IBM922
key:x-IBM930 value:x-IBM930
key:x-IBM933 value:x-IBM933
key:x-IBM935 value:x-IBM935
key:x-IBM937 value:x-IBM937
key:x-IBM939 value:x-IBM939
key:x-IBM942 value:x-IBM942
key:x-IBM942C value:x-IBM942C
key:x-IBM943 value:x-IBM943
key:x-IBM943C value:x-IBM943C
key:x-IBM948 value:x-IBM948
key:x-IBM949 value:x-IBM949
key:x-IBM949C value:x-IBM949C
key:x-IBM950 value:x-IBM950
key:x-IBM964 value:x-IBM964
key:x-IBM970 value:x-IBM970
key:x-ISCII91 value:x-ISCII91
key:x-ISO-2022-CN-CNS value:x-ISO-2022-CN-CNS
key:x-ISO-2022-CN-GB value:x-ISO-2022-CN-GB
key:x-iso-8859-11 value:x-iso-8859-11
key:x-JIS0208 value:x-JIS0208
key:x-JISAutoDetect value:x-JISAutoDetect
key:x-Johab value:x-Johab
key:x-MacArabic value:x-MacArabic
key:x-MacCentralEurope value:x-MacCentralEurope
key:x-MacCroatian value:x-MacCroatian
key:x-MacCyrillic value:x-MacCyrillic
key:x-MacDingbat value:x-MacDingbat
key:x-MacGreek value:x-MacGreek
key:x-MacHebrew value:x-MacHebrew
key:x-MacIceland value:x-MacIceland
key:x-MacRoman value:x-MacRoman
key:x-MacRomania value:x-MacRomania
key:x-MacSymbol value:x-MacSymbol
key:x-MacThai value:x-MacThai
key:x-MacTurkish value:x-MacTurkish
key:x-MacUkraine value:x-MacUkraine
key:x-MS932_0213 value:x-MS932_0213
key:x-MS950-HKSCS value:x-MS950-HKSCS
key:x-MS950-HKSCS-XP value:x-MS950-HKSCS-XP
key:x-mswin-936 value:x-mswin-936
key:x-PCK value:x-PCK
key:x-SJIS_0213 value:x-SJIS_0213
key:x-UTF-16LE-BOM value:x-UTF-16LE-BOM
key:X-UTF-32BE-BOM value:X-UTF-32BE-BOM
key:X-UTF-32LE-BOM value:X-UTF-32LE-BOM
key:x-windows-50220 value:x-windows-50220
key:x-windows-50221 value:x-windows-50221
key:x-windows-874 value:x-windows-874
key:x-windows-949 value:x-windows-949
key:x-windows-950 value:x-windows-950
key:x-windows-iso2022jp value:x-windows-iso2022jp
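Because the set of available charsets differs between JDKs, code that accepts a charset name from outside may want to look it up defensively. The following is a minimal sketch, assuming an `Optional`-returning helper named `find` (a made-up name) is acceptable; `Charset.forName` itself throws unchecked exceptions for illegal or unsupported names.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Optional;

class CharsetLookupDemo {
    // Returns the charset for the given name, or empty if the name is
    // syntactically illegal or the charset is not available in this JVM.
    static Optional<Charset> find(String name) {
        try {
            return Optional.of(Charset.forName(name));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Names are case-insensitive; name() returns the canonical form.
        System.out.println(find("utf-8").map(Charset::name).orElse("not available")); // UTF-8
        System.out.println(find("no-such-charset").isPresent()); // false
    }
}
```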
Copyright notice: this is a reposted article; copyright belongs to the original author.
Original link: https://blog.csdn.net/chengqiuming/article/details/124951461