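I have been seeking help with importing JSON-formatted data from a URL (I am a novice when it comes to JSON), and I received a great answer to that question.
However, I have run into a complication: some of my property names contain spaces. For example, "Property1" and several other property names from my previous question may actually be "Property1_word1 Property1_word2". The current solution keeps only the first word of the property name. I could live with that at first, but now I need all of the words. If anyone can give me any hints, I would appreciate it. I have not found a solution yet.
Edit (providing all the information here so that there is no need to refer to the previous post):
I want to import data from a website. First, I save the website's content (below) as a file. In my previous question, every property name consisted of a single word. Now I am dealing with property names made up of multiple words. I have provided an example below in which the names of Property1, Property4, and Property8 contain multiple words.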
{
"payload": {
"allShortcutsEnabled": false,
"fileTree": {
"": {
"items": [
{
"name": "thing",
"path": "thing",
"contentType": "directory"
},
{
"name": ".repurlignore",
"path": ".repurlignore",
"contentType": "file"
},
{
"name": "README.md",
"path": "README.md",
"contentType": "file"
},
{
"name": "thing2",
"path": "thing2",
"contentType": "file"
},
{
"name": "thing3",
"path": "thing3",
"contentType": "file"
},
{
"name": "thing4",
"path": "thing4",
"contentType": "file"
},
{
"name": "thing5",
"path": "thing5",
"contentType": "file"
},
{
"name": "thing6",
"path": "thing6",
"contentType": "file"
},
{
"name": "thing7",
"path": "thing7",
"contentType": "file"
},
{
"name": "thing8",
"path": "thing8",
"contentType": "file"
},
{
"name": "thing9",
"path": "thing9",
"contentType": "file"
},
{
"name": "thing10",
"path": "thing10",
"contentType": "file"
},
{
"name": "thing11",
"path": "thing11",
"contentType": "file"
}
],
"totalCount": 500
}
},
"fileTreeProcessingTime": 5.262188,
"foldersToFetch": [],
"reducedMotionEnabled": null,
"repo": {
"id": 1234567,
"defaultBranch": "main",
"name": "repository",
"ownerLogin": "contributor",
"currentUserCanPush": false,
"isFork": false,
"isEmpty": false,
"createdAt": "2023-10-31",
"ownerAvatar": "https://avatars.repurlusercontent.com/u/98765432?v=1",
"public": true,
"private": false,
"isOrgOwned": false
},
"symbolsExpanded": false,
"treeExpanded": true,
"refInfo": {
"name": "main",
"listCacheKey": "v0:13579",
"canEdit": false,
"refType": "branch",
"currentOid": "identifier"
},
"path": "thing2",
"currentUser": null,
"blob": {
"rawLines": [
" C_1H_4 Methane ",
" 5.00000 Property1_word1 Property1_word2 ",
" 20.00000 Property2 ",
" 500.66500 Property3 ",
" 100.00000 Property4_word1 Property4_word2 ",
" -4453.98887 Property5 ",
" 100.48200 Property6 ",
" 59.75258 Property7 ",
" 5.33645 Property8_word1 Property8_word2 ",
" 0.00000 Property9 ",
" 645.07777 Property10 ",
" 0.00000 Property11 ",
" 0.00000 Property12 ",
" 0.00000 Property13 ",
" 0.00000 Property14 ",
" 0.00000 Property15 ",
" 0.00000 Property16 ",
" 0.00000 Property17 ",
" 0.00000 Property18 ",
" 0.00000 Property19 ",
" 0.00000 Property20 ",
" 0.00000 Property21 ",
" 0.00000 Property22 ",
" 0.00000 Property23 ",
" 0.00000 Property24 ",
" 0.00000 Property25 ",
" 0.57876 Property26 ",
" 4.00000 Property27 ",
" 0.00000 Property28 ",
" 0.00000 Property29 ",
" 0.00000 Property30 ",
" 0.00000 Property31 ",
" 0.00000 Property32 ",
" 1.00000 Property33 ",
" 0.00000 Property34 ",
" 26.00000 Property35 ",
" 1.44571 Property36 ",
" 1.08756 Property37 ",
" 0.00000 Property38 ",
" 0.00000 Property39 ",
" 0.00000 Property40 ",
" 6.00000 Property41 ",
" 9.00000 Property42 ",
" 0.00000 Property43 "
],
"stylingDirectives": [
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[]
],
"csv": null,
"csvError": null,
"dependabotInfo": {
"showConfigurationBanner": false,
"configFilePath": null,
"networkDependabotPath": "/contributor/repository/network/updates",
"dismissConfigurationNoticePath": "/settings/dismiss-notice/dependabot_configuration_notice",
"configurationNoticeDismissed": null,
"repoAlertsPath": "/contributor/repository/security/dependabot",
"repoSecurityAndAnalysisPath": "/contributor/repository/settings/security_analysis",
"repoOwnerIsOrg": false,
"currentUserCanAdminRepo": false
},
"displayName": "thing2",
"displayUrl": "https://repurl.com/contributor/repository/blob/main/thing2?raw=true",
"headerInfo": {
"blobSize": "3.37 KB",
"deleteInfo": {
"deleteTooltip": "You must be signed in to make or propose changes"
},
"editInfo": {
"editTooltip": "XXX"
},
"ghDesktopPath": "https://desktop.repurl.com",
"repurlLfsPath": null,
"onBranch": true,
"shortPath": "5678",
"siteNavLoginPath": "/login?return_to=identifier",
"isCSV": false,
"isRichtext": false,
"toc": null,
"lineInfo": {
"truncatedLoc": "33",
"truncatedSloc": "33"
},
"mode": "executable file"
},
"image": false,
"isCodeownersFile": null,
"isPlain": false,
"isValidLegacyIssueTemplate": false,
"issueTemplateHelpUrl": "https://docs.repurl.com/articles/about-issue",
"issueTemplate": null,
"discussionTemplate": null,
"language": null,
"languageID": null,
"large": false,
"loggedIn": false,
"newDiscussionPath": "/contributor/repository/issues/new",
"newIssuePath": "/contributor/repository/issues/new",
"planSupportInfo": {
"repoOption1": null,
"repoOption2": null,
"requestFullPath": "/contributor/repository/blob/main/thing2",
"repoOption4": null,
"repoOption5": null,
"repoOption6": null,
"repoOption7": null
},
"repoOption8": {
"repoOption9": "/settings/dismiss-notice/repoOption10",
"releasePath": "/contributor/repository/releases/new=true",
"repoOption11": false,
"repoOption12": false
},
"rawBlobUrl": "https://repurl.com/contributor/repository/raw/main/thing2",
"repoOption13": false,
"richText": null,
"renderedFileInfo": null,
"shortPath": null,
"tabSize": 8,
"topBannersInfo": {
"overridingGlobalFundingFile": false,
"universalPath": null,
"repoOwner": "contributor",
"repoName": "repository",
"repoOption14": false,
"citationHelpUrl": "https://docs.repurl.com/en/repurl/archiving/about",
"repoOption15": false,
"repoOption16": null
},
"truncated": false,
"viewable": true,
"workflowRedirectUrl": null,
"symbols": {
"timedOut": false,
"notAnalyzed": true,
"symbols": []
}
},
"collabInfo": null,
"collabMod": false,
"wtsdf_signifier": {
"/contributor/repository/branches": {
"post": "identifier"
},
"/repos/preferences": {
"post": "identifier"
}
}
},
"title": "repository/thing2 at main \\u0000 contributor/repository"
}
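Below is the code that handles property names consisting of a single word (with the whitespace-stripping commands, only the first word of a multi-word name is imported):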
import json
import pandas as pd
with open("yourJson.json", "r") as f:
    data = json.load(f)
# Get what we want to extract from the json
to_extract = data["payload"]["blob"]["rawLines"]
# Remove useless whitespace
stripped = [e.strip() for e in to_extract]
trimmed = [" ".join(e.split()) for e in stripped]
# Transform the list of string to a dict
as_dict = {e.split(' ')[0]: e.split(' ')[1] for e in trimmed}
# Load the dict with pandas
df = pd.DataFrame(as_dict.items(), columns=['Value', 'Property'])
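I have tried various solutions (for example, not stripping the whitespace, or specifying the exact property names associated with the data I need), but I am so lost with JSON that the errors make no sense to me.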
2 Answers
goucqfw6 #1
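Let's cut your example down to two lines of data: the C_1H_4 Methane line and the Property1_word1 Property1_word2 line. Stripping them and collapsing the repeated whitespace gives us the cleaned data:
['C_1H_4 Methane', '5.00000 Property1_word1 Property1_word2']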
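The cleanup step can be sketched as follows. This is a minimal reconstruction that reuses the variable names from the question; the two sample strings and their leading padding are assumptions modeled on the rawLines entries, not a copy of the real file.

```python
# Two sample entries shaped like payload.blob.rawLines (padding is assumed).
to_extract = [
    "      C_1H_4 Methane ",
    "      5.00000 Property1_word1 Property1_word2 ",
]

# Drop leading and trailing whitespace.
stripped = [e.strip() for e in to_extract]
# Collapse any run of inner whitespace into a single space.
trimmed = [" ".join(e.split()) for e in stripped]

print(trimmed)
```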
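In the next part of your code, you split the strings in this list and construct the dictionary. Let's see what e.split(' ') gives us for each string. The resulting lists look like this:
['C_1H_4', 'Methane']
['5.00000', 'Property1_word1', 'Property1_word2']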
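To see where the extra words go, the split can be run on its own. This is a small sketch that assumes the cleaned list from the previous step; the name kept is introduced here only for illustration.

```python
trimmed = ["C_1H_4 Methane", "5.00000 Property1_word1 Property1_word2"]

# Split every cleaned line on single spaces, as the question's code does.
parts = [e.split(" ") for e in trimmed]
for p in parts:
    print(len(p), p)

# The dict comprehension in the question only ever reads index 1,
# so anything at index 2 or later never reaches the dictionary.
kept = [p[1] for p in parts]
print(kept)
```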
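As you can see, the second string was split into a list of 3 parts, and the third part (index 2) gets lost in your code. You could join these parts back together, but there is an easier way: the split method has a maxsplit parameter that we can use to split only once, e.split(' ', 1). Both lists now have only 2 entries:
['C_1H_4', 'Methane']
['5.00000', 'Property1_word1 Property1_word2']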
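A short sketch of the maxsplit behavior, again assuming the two cleaned sample lines. In Python 3, maxsplit can be passed positionally or as a keyword; both forms are shown.

```python
trimmed = ["C_1H_4 Methane", "5.00000 Property1_word1 Property1_word2"]

# Split at the first space only; the remainder of the line stays in one piece.
once = [e.split(" ", 1) for e in trimmed]
# The same call with the parameter named explicitly.
same = [e.split(" ", maxsplit=1) for e in trimmed]

print(once)
```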
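So you only need to change your old code
as_dict = {e.split(' ')[0]: e.split(' ')[1] for e in trimmed}
to
as_dict = {e.split(' ')[0]: e.split(' ', 1)[1] for e in trimmed}
Additionally: I don't like that we call split twice, and splitting and then re-joining the strings when building trimmed also seems too cumbersome. We can drop the intermediate stripped and trimmed lists and boil all of this down to:
as_dict = dict(line.strip().split(None, 1) for line in to_extract)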
The result is:
{'C_1H_4': 'Methane', '5.00000': 'Property1_word1 Property1_word2'}
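One thing to watch when applying this to the full file: the dictionary is keyed by the first token, which is the numeric value, and many properties in the sample share the value 0.00000, so later entries would overwrite earlier ones. A hedged sketch of an alternative that keeps every line as a (value, property) pair is below; the miniature JSON document and the name rows are assumptions made up for this illustration.

```python
import json

# Hypothetical miniature of the saved file, with a repeated value on purpose.
data = json.loads("""
{"payload": {"blob": {"rawLines": [
  "   C_1H_4 Methane ",
  "   5.00000 Property1_word1 Property1_word2 ",
  "   0.00000 Property9 ",
  "   0.00000 Property11 "
]}}}
""")
to_extract = data["payload"]["blob"]["rawLines"]

# Keyed by the first token: duplicate values collapse into a single entry.
as_dict = dict(line.strip().split(None, 1) for line in to_extract)

# One tuple per line: nothing is overwritten and the order is preserved.
rows = [tuple(line.strip().split(None, 1)) for line in to_extract]

print(len(as_dict), len(rows))
```

A list like rows can be passed to pd.DataFrame(rows, columns=['Value', 'Property']) in place of as_dict.items().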
8cdiaqws #2
You can have spaces in JSON keys, if that is your problem; it is not invalid.
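As a quick check of that claim, the standard json module parses and re-serializes a key containing a space without complaint. The key names and values below are illustrative only, borrowed from the question's sample.

```python
import json

# A JSON object whose first key contains a space.
text = '{"Property1_word1 Property1_word2": 5.0, "Property2": 20.0}'
data = json.loads(text)

# The key is an ordinary string, so normal dict indexing works.
value = data["Property1_word1 Property1_word2"]

# Serializing and parsing again gives back an equal dict.
round_trip = json.loads(json.dumps(data))

print(value)
```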
Also, if you want to remove unwanted whitespace from a string, Python's built-in string methods can do it.
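The snippet that originally accompanied this point is not available, so the following is only a sketch of two common ways to remove unwanted whitespace in Python, not necessarily what the answer showed. The sample string is made up.

```python
raw = "   5.00000   Property1_word1   Property1_word2  "

# strip() removes whitespace at the two ends only.
edges_only = raw.strip()

# split() with no arguments splits on any run of whitespace,
# so joining with a single space also collapses the inner gaps.
collapsed = " ".join(raw.split())

print(repr(edges_only))
print(repr(collapsed))
```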
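It would be easier to see the problem and the code if you edited all of the material into one question, without referring to the old question.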