JSON extractor for an array of strings

bihw5rsg  posted on 2023-08-08 in Other

In Riak, I have this basic user schema with an accompanying user index (I have omitted the Riak-specific fields such as _yz_id):

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="user" version="1.5">

 <fields>
   <field name="email"    type="string"   indexed="true"  stored="false"/>   
   <field name="name"     type="string"   indexed="true"  stored="false"/>   
   <field name="groups"   type="string"   indexed="true"  stored="false" multiValued="true"/>

   <dynamicField name="*" type="ignored"  indexed="false" stored="false" multiValued="true"/>

   ..riak-specific fields.. 

 </fields>

 <uniqueKey>_yz_id</uniqueKey>                                                 

 <types>                                                                       
   <fieldType name="string"  class="solr.StrField"     sortMissingLast="true"/>
   <fieldType name="_yz_str" class="solr.StrField"     sortMissingLast="true"/>
   <fieldtype name="ignored" class="solr.StrField"/>                           
 </types>

</schema>

My user JSON looks like this:

{
   "name" : "John Smith",
   "email" : "jsmith@gmail.com",
   "groups" : [
      "3304cf79",
      "abe155cf"
   ]
}


When I try to search with this query:

curl 'http://localhost:10018/search/query/user?wt=json&q=groups:3304cf79'


I get no docs back.
Why is that? Does the JSON extractor create index entries for groups?


mgdq6dx1 #1

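The schema is correct. The problem was that it was not the original schema I had used when setting the bucket properties. This issue on the Yokozuna GitHub is the culprit: I updated the schema after inserting new data, assuming the index would reload. Currently it does not.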
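
On the second part of the question: as far as I understand the JSON extractor, an array becomes repeated values of the same field name, which is why `groups` needs `multiValued="true"`. The sketch below is only a Python approximation of that flattening. Yokozuna itself is written in Erlang, `flatten_json` is a hypothetical helper that is not part of Riak, and the `.` separator for nested objects is an assumption based on the documented default.

```python
def flatten_json(value, prefix="", sep="."):
    """Yield (field_name, scalar) pairs for a decoded JSON value.

    Hypothetical helper approximating how a search extractor could flatten
    a document; it is not Yokozuna's actual implementation.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            # Nested object keys are joined with the separator (assumed ".")
            name = f"{prefix}{sep}{key}" if prefix else key
            yield from flatten_json(child, name, sep)
    elif isinstance(value, list):
        # Every array element is emitted under the same field name
        for item in value:
            yield from flatten_json(item, prefix, sep)
    else:
        yield prefix, value


user = {
    "name": "John Smith",
    "email": "jsmith@gmail.com",
    "groups": ["3304cf79", "abe155cf"],
}

pairs = list(flatten_json(user))
for field, scalar in pairs:
    print(field, scalar)
```

Under these assumptions `groups` appears twice, once per array element, so a query such as `groups:3304cf79` should match once the index has really been built from this schema.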


n6lpvg4x #2

How about this? You can extract everything at once, and it is generic.

import json
import numpy as np
import pandas as pd
from jsonpath_ng import parse

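def explode_list(df, col):
    # Expects every entry of `col` to be a list: repeat each row once per
    # list element, then replace the column with the flattened elements
    s = df[col]
    i = np.arange(len(s)).repeat(s.str.len())
    return df.iloc[i].assign(**{col: np.concatenate(s)})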

def process_json_data(data_file, mapping_file, root):
    # Load the JSON data
    with open(data_file) as f:
        data = json.load(f)

    # Load the mapping
    with open(mapping_file) as f:
        mapping = json.load(f)

    # Prepare an empty dataframe to hold the results
    df = pd.DataFrame()

    # Iterate over each datapoint in the data file
    for i, datapoint in enumerate(data[root]):
        # Prepare an empty dictionary to hold the results for this datapoint
        datapoint_dict = {}
        # Iterate over each field in the mapping file
        for field, path in mapping.items():
            # Prepare the JSONPath expression
            jsonpath_expr = parse(path)
            # Find all matches in the datapoint
            match = jsonpath_expr.find(datapoint)
            if match:
                # If a match was found, add it to the dictionary
                datapoint_dict[field] = [m.value for m in match]
            else:
                # If no match was found, add 'no path' to the dictionary
                datapoint_dict[field] = ['no path']

        # Create a temporary dataframe for this datapoint
        frames = [pd.DataFrame({k: np.repeat(v, max(map(len, datapoint_dict.values())))}) for k, v in datapoint_dict.items()]
        temp_df = pd.concat(frames, axis=1)

        # Identify list-like columns and explode them
        while True:
            list_cols = [col for col in temp_df.columns if any(isinstance(i, list) for i in temp_df[col])]
            if not list_cols:
                break
            for col in list_cols:
                temp_df = explode_list(temp_df, col)

        # Append the temporary dataframe to the main dataframe
        # (DataFrame.append was removed in pandas 2.0, so use pd.concat)
        df = pd.concat([df, temp_df])

    df.reset_index(drop=True, inplace=True)
    return df.style.set_properties(**{'border': '1px solid black'})

# Calling the function
df = process_json_data('/content/jsonShredd/data.json', '/content/jsonShredd/mapping.json', 'datapoints')
df
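
If you are on pandas 0.25 or later, the built-in `DataFrame.explode` can stand in for the hand-written `explode_list` helper. This is a minimal sketch of that swapped-in approach, using the user document from the question as illustrative input rather than the data files above:

```python
import pandas as pd

# One row per user; "groups" holds a list, as in the question's JSON
users = pd.DataFrame(
    [{"name": "John Smith", "groups": ["3304cf79", "abe155cf"]}]
)

# explode() emits one row per list element and repeats the other columns
exploded = users.explode("groups").reset_index(drop=True)
print(exploded)
```

Each group ID ends up on its own row with the name repeated, which is the same shape `explode_list` produces for a column whose entries are all lists.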

