Erlang equivalent of JavaScript codePointAt?

Posted by brvekthn on 2022-12-08 in Erlang

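Is there an Erlang equivalent of codePointAt from JS? One that gets the code point starting at a given byte offset, without modifying the underlying string/binary?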

erlang

Source: https://stackoverflow.com/questions/72458612/erlang-equivalent-of-javascript-codepointat

2 Answers

zyfwsgd61#

You can use bit syntax pattern matching to skip the first N bytes and decode the first character in the remaining bytes as UTF-8:

1> CodePointAt = fun(Binary, Offset) ->
  <<_:Offset/binary, Char/utf8, _/binary>> = Binary,
  Char
end.

Testing it:

2> CodePointAt(<<"πr²"/utf8>>, 0).
960
3> CodePointAt(<<"πr²"/utf8>>, 1).
** exception error: no match of right hand side value <<207,128,114,194,178>>
4> CodePointAt(<<"πr²"/utf8>>, 2).
114
5> CodePointAt(<<"πr²"/utf8>>, 3).
178
6> CodePointAt(<<"πr²"/utf8>>, 4).
** exception error: no match of right hand side value <<207,128,114,194,178>>
7> CodePointAt(<<"πr²"/utf8>>, 5).
** exception error: no match of right hand side value <<207,128,114,194,178>>

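As you can see, the function raises an error if the offset is not on a valid UTF-8 character boundary or lies at or past the end of the binary. You can handle that differently using a case expression if needed.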
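One way the case-expression variant could look, as a minimal sketch: the module name `codepoint`, the function name `code_point_at/2`, and the `{ok, Char} | error` return convention are choices made here for illustration, not part of the answer above.

```erlang
-module(codepoint).
-export([code_point_at/2]).

%% Hypothetical helper: returns {ok, CodePoint} when a valid UTF-8
%% sequence starts at byte offset Offset, and the atom error otherwise
%% (offset inside a character, or at/past the end of the binary).
code_point_at(Binary, Offset)
  when is_binary(Binary), is_integer(Offset), Offset >= 0 ->
    case Binary of
        <<_:Offset/binary, Char/utf8, _/binary>> ->
            {ok, Char};
        _ ->
            error
    end.
```

With this version, `codepoint:code_point_at(<<"πr²"/utf8>>, 1)` returns `error` instead of raising a badmatch exception.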

jfewjypa2#

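First of all, keep in mind that only binary strings use UTF-8 in Erlang. Plain double-quoted strings are already just lists of code points (much like UTF-32). The unicode:chardata() type represents both of these kinds of strings, including mixed lists like ["Hello", $\s, [<<"Filip"/utf8>>, $!]]. You can use unicode:characters_to_list(Chardata) or unicode:characters_to_binary(Chardata) to get a flattened version if needed.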
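To make the flattening concrete, here is a small sketch that runs both conversions on the mixed list from the paragraph above. The module name `chardata_demo` and the function name `flatten_demo/0` are made up for this illustration.

```erlang
-module(chardata_demo).
-export([flatten_demo/0]).

%% Flattens the same mixed chardata value in both directions:
%% once to a list of code points, once to a UTF-8 binary.
flatten_demo() ->
    Chardata = ["Hello", $\s, [<<"Filip"/utf8>>, $!]],
    List = unicode:characters_to_list(Chardata),
    Binary = unicode:characters_to_binary(Chardata),
    {List, Binary}.
```

Calling `chardata_demo:flatten_demo()` returns `{"Hello Filip!", <<"Hello Filip!">>}`.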
Meanwhile, the JS codePointAt function works on UTF-16 encoded strings, which is what JavaScript uses. Note that the index in this case is not a byte position, but the index of the 16-bit units of the encoding. And UTF-16 is also a variable length encoding: code points that need more than 16 bits use a kind of escape sequence called "surrogate pairs" - for example emojis like 👍 - so if such characters can occur, the index is misleading: in "a👍z" (in JavaScript), the a is at 0, but the z is not at 2 but at 3.
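The index shift can be reproduced from the Erlang side by re-encoding to UTF-16 and counting 16-bit units. This is only a sketch: the module name `utf16_demo` and both function names are hypothetical, and the code assumes the input is valid chardata.

```erlang
-module(utf16_demo).
-export([utf16_units/1, utf16_index/2]).

%% Number of 16-bit code units the chardata occupies in UTF-16.
%% Assumes valid chardata, so characters_to_binary/3 returns a binary.
utf16_units(Chardata) ->
    Utf16 = unicode:characters_to_binary(Chardata, unicode, {utf16, big}),
    byte_size(Utf16) div 2.

%% UTF-16 index (as JavaScript would count it) of the code point at
%% zero-based code point position N.
utf16_index(Chardata, N) when is_integer(N), N >= 0 ->
    Prefix = lists:sublist(unicode:characters_to_list(Chardata), N),
    utf16_units(Prefix).
```

For `[$a, 16#1F44D, $z]` (that is, "a👍z"), `utf16_demo:utf16_units/1` gives 4 even though the list holds 3 code points, and `utf16_demo:utf16_index(Str, 2)` gives 3, the position of the z described above.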
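What you probably want are what's known as "grapheme clusters": the things that look like a single unit when printed (see the docs for Erlang's string module: https://www.erlang.org/doc/man/string.html). And you can't really use numerical indices to dig grapheme clusters out of a string. You need to iterate over the string from the start, extracting one at a time. This can be done with string:next_grapheme(Chardata) (see https://www.erlang.org/doc/man/string.html#next_grapheme-1), or if you for some reason really need to index them numerically, you could insert the individual cluster substrings into an array (see https://www.erlang.org/doc/man/array.html). For example: array:from_list(string:to_graphemes(Chardata)).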
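Both approaches might be sketched as follows. The module name `grapheme_index` and the function names are invented for this example, and the position N is zero-based to match how the array module indexes.

```erlang
-module(grapheme_index).
-export([nth_grapheme/2, grapheme_array/1]).

%% Walks the string from the start with string:next_grapheme/1 and
%% returns {ok, Grapheme} for zero-based position N, or error when the
%% string is too short or contains invalid data.
nth_grapheme(Chardata, N) when is_integer(N), N >= 0 ->
    case string:next_grapheme(Chardata) of
        [Grapheme | _Rest] when N =:= 0 -> {ok, Grapheme};
        [_Grapheme | Rest] -> nth_grapheme(Rest, N - 1);
        [] -> error;
        {error, _Invalid} -> error
    end.

%% Builds an array of grapheme clusters for repeated indexed access.
grapheme_array(Chardata) ->
    array:from_list(string:to_graphemes(Chardata)).
```

The array costs one pass up front; after that, `array:get(Index, Array)` looks up a cluster without rescanning the string.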
