R: converting a string to a matrix

j9per5c4 posted on 2023-03-10 in Other

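I have a dataset in which one column holds values in the format "{'a': 1, 'b': 2, 'c': 3}". I would like to convert it to a matrix, or at least a list:
| a | b | c |
| ------- | ------- | ------- |
| 1 | 2 | 3 |
I have tried using RegEx in str_extract(), which lets me pull out the names and the values fairly consistently, though there may be edge cases I have not run into yet.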

str_extract_all(df[row, "tags"], "[a-zA-Z- ]{2,}|[0-9]+[']*s", simplify = TRUE)

And this works for the values:

str_extract_all(str_extract_all(df[row, "tags"], "[0-9]+[,}]+", simplify = T), "[0-9]+", simplify = T)

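I know there is probably a better way than nested extraction, but this is all I have come up with so far. What actually stumps me is taking these values and turning them into a matrix programmatically.
Edit: the dataset is a data frame that can be found here, specifically the "steamspy_data" csv. I am trying to convert the "tags" column from a single string into separate rows or some kind of list, so that the relationship between tags and their associated values (frequencies) can be analyzed easily.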

> head(steam_super[,"tags"], 5)
[1] "{'Action': 2681, 'FPS': 2048, 'Multiplayer': 1659, 'Shooter': 1420, 'Classic': 1344, 'Team-Based': 943, 'First-Person': 799, 'Competitive': 790, 'Tactical': 734, \"1990's\": 564, 'e-sports': 550, 'PvP': 480, 'Military': 367, 'Strategy': 329, 'Score Attack': 200, 'Survival': 192, 'Old School': 164, 'Assassin': 151, '1980s': 144, 'Violent': 40}"
[2] "{'Action': 208, 'FPS': 188, 'Multiplayer': 172, 'Classic': 152, 'Shooter': 134, 'Class-Based': 124, 'Team-Based': 115, 'First-Person': 109, \"1990's\": 71, 'Co-op': 62, 'Competitive': 48, 'Old School': 46, 'Fast-Paced': 39, 'Online Co-Op': 28, 'Retro': 27, 'Remake': 27, 'Violent': 26, 'Mod': 24, 'Funny': 20, 'Adventure': 15}"                  
[3] "{'FPS': 138, 'World War II': 122, 'Multiplayer': 115, 'Action': 99, 'Shooter': 95, 'War': 80, 'Team-Based': 79, 'Classic': 61, 'Class-Based': 55, 'First-Person': 50, 'Historical': 28, 'Military': 19, 'Singleplayer': 16, 'Tactical': 14, 'Co-op': 12, 'World War I': 5}"                                                                              
[4] "{'Action': 85, 'FPS': 71, 'Multiplayer': 58, 'Classic': 50, 'Shooter': 49, 'First-Person': 33, 'Arena Shooter': 22, 'Sci-fi': 16}"                                                                                                                                                                                                                       
[5] "{'FPS': 235, 'Action': 211, 'Sci-fi': 166, 'Singleplayer': 148, 'Classic': 146, 'Shooter': 144, 'First-Person': 126, 'Aliens': 122, 'Adventure': 87, \"1990's\": 77, 'Atmospheric': 73, 'Military': 50, 'Story Rich': 40, 'Silent Protagonist': 33, 'Co-op': 27, 'Great Soundtrack': 25, 'Puzzle': 18, 'Gore': 18, 'Moddable': 16, 'Masterpiece': 16}"

dput

structure(list(appid = 10L, type = "game", name = "Counter-Strike", 
    is_free = "False", dlc = "", detailed_description = "Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.", 
    about_the_game = "Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.", 
    short_description = "Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.", 
    fullgame = NA, developers = "['Valve']", publishers = "['Valve']", 
    price_overview = "{'currency': 'GBP', 'initial': 719, 'final': 719, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '£7.19'}", 
    platforms = "{'windows': True, 'mac': True, 'linux': True}", 
    metacritic = "{'score': 88, 'url': 'https://www.metacritic.com/game/pc/counter-strike?ftag=MCD-06-10aaa1f'}", 
    reviews = "", categories = "[{'id': 1, 'description': 'Multi-player'}, {'id': 36, 'description': 'Online Multi-Player'}, {'id': 37, 'description': 'Local Multi-Player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]", 
    genres = "[{'id': '1', 'description': 'Action'}]", release_date = "{'coming_soon': False, 'date': '1 Nov, 2000'}", 
    content_descriptors = "{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}", 
    developer = "Valve", publisher = "Valve", score_rank = NA_integer_, 
    positive = 124534L, negative = 3339L, userscore = 0L, owners = "10,000,000 .. 20,000,000", 
    average_forever = 17612L, average_2weeks = 709L, median_forever = 317L, 
    median_2weeks = 26L, price = 999L, initialprice = 999L, discount = 0L, 
    languages = "English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean", 
    genre = "Action", ccu = 14923L, tags = "{'Action': 2681, 'FPS': 2048, 'Multiplayer': 1659, 'Shooter': 1420, 'Classic': 1344, 'Team-Based': 943, 'First-Person': 799, 'Competitive': 790, 'Tactical': 734, \"1990's\": 564, 'e-sports': 550, 'PvP': 480, 'Military': 367, 'Strategy': 329, 'Score Attack': 200, 'Survival': 192, 'Old School': 164, 'Assassin': 151, '1980s': 144, 'Violent': 40}"), row.names = 1L, class = "data.frame")
Answer 1 (km0tfn4u)

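Use the jsonlite or rjson package to process JSON data. We convert any ' that follows a { or a space to ", and also convert any ' that is followed by a colon to ". Then apply fromJSON.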

library(jsonlite)

x <- "{'Tactical': 734, \"1990's\": 564, 'e-sports': 550}"

x |>
  gsub("([ {])'", '\\1"', x = _) |>
  gsub("':", '":', x = _) |>
  fromJSON()

giving:

$Tactical
[1] 734

$`1990's`
[1] 564

$`e-sports`
[1] 550

Another approach is to convert the input to dcf format and then run read.dcf.

res <- x |>
  chartr("{},", "  \n", x = _) |>
  textConnection() |>
  readLines() |>
  trimws() |>
  textConnection() |>
  read.dcf()
colnames(res) <- sub(".(.*).", "\\1", colnames(res))
res

giving:

     Tactical 1990's e-sports
[1,] "734"    "564"  "550"
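Because read.dcf treats a blank line as a record separator, the same idea can be stretched to several strings at once, producing one matrix row per input string and NA where a tag is absent. This is a sketch rather than a tested solution for the full dataset: `to_dcf`, `tags` and `tag_mat` are hypothetical names, the sample strings are shortened from the question's data, and it assumes tag names contain neither commas nor colons.

```r
# Hypothetical sample shaped like the "tags" column in the question
tags <- c(
  "{'Action': 85, 'FPS': 71, 'Sci-fi': 16}",
  "{'FPS': 235, \"1990's\": 77}"
)

# One "name: value" line per tag, trimmed so that no line starts with
# whitespace (read.dcf reads leading whitespace as a continuation line)
to_dcf <- function(s) {
  trimws(strsplit(chartr("{}", "  ", s), ",", fixed = TRUE)[[1]])
}

# A blank line after each record makes read.dcf return one row per string
dcf_lines <- unlist(lapply(tags, function(s) c(to_dcf(s), "")))

con <- textConnection(dcf_lines)
tag_mat <- read.dcf(con)
close(con)

# Drop the surrounding quote characters from the field names,
# then convert the character matrix to numeric
colnames(tag_mat) <- sub(".(.*).", "\\1", colnames(tag_mat))
storage.mode(tag_mat) <- "double"
tag_mat
```

The result is a numeric matrix whose columns are the union of all tags seen, which is close to the layout asked for in the question.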
