Sort an array inside a JSON object by date without deserializing the object

n8ghc7c1 posted on 2023-05-02 in Other

I have a JSON object that looks like this:

{
  "results": [
    {
      "id": "abc456",
      "groups": [
        {
          "parent_group": null,
          "type": "D"
        },
        {
          "parent_group": null,
          "type": "DEPARTMENT"
        }
      ],
      "examples": [
        {
          "id": "e13b1e97-31e3-46e6-9d8f-9776c52e5ce0",
          "date": "2020-05-10T00:00:00Z"
        },
        {
          "id": "bd31d475-6137-4409-8d17-535f1bf94071",
          "date": "2021-05-11T00:00:00Z"
        },
        {
          "id": "0e0806ba-56f6-4527-8fd7-7e0061e30783",
          "date": "2019-05-11T00:00:00Z"
        }
      ]
    },
    {
      "id": "def456",
      "groups": [
        {
          "parent_group": null,
          "type": "D"
        },
        {
          "parent_group": null,
          "type": "D"
        }
      ],
      "examples": [
        {
          "id": "e13b1e97-31e3-46e6-9d8f-9776c52e5ce0",
          "date": "2020-05-10T00:00:00Z"
        },
        {
          "id": "bd31d475-6137-4409-8d17-535f1bf94071",
          "date": "2021-05-11T00:00:00Z"
        },
        {
          "id": "0e0806ba-56f6-4527-8fd7-7e0061e30783",
          "date": "2019-05-11T00:00:00Z"
        }
      ]
    }
  ]
}

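I have to sort the items in the examples array of each object in results by date and return the objects in JSON Lines format.
My current solution iterates over each object in the results array, sorts its examples array by date, and then replaces the original array with the sorted one: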

var jsonlBuilder = new StringBuilder();
var serializer = JsonSerializer.CreateDefault(new JsonSerializerSettings { DateTimeZoneHandling = DateTimeZoneHandling.Utc });

using (var textWriter = new StringWriter(jsonlBuilder))
using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None })
{
    foreach (var obj in jsonArray)
    {
        var employments = obj.SelectToken("examples");
        if (employments.Count() > 1)
        {
            var b = employments.ToObject<JArray>().OrderBy(c => c.SelectToken("date").ToObject<DateTime>(serializer));
            var newEmploymentArray = new JArray(b);
            obj["examples"].Replace(newEmploymentArray);
        }
        obj.WriteTo(jsonWriter);
        jsonWriter.WriteWhitespace("\n");
    }
}

This does not perform well. Without the code inside the if (employments.Count() > 1) block it takes about 6 ms; with the block it takes 30 ms. Is there a better way?

jyztefdp #1

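With the JSON shown, the performance of your code is not as bad as you say. I'm seeing:

  • 0.0629 ms average runtime per rep over 10000 reps of your current code.
  • 0.0246 ms average runtime per rep over 10000 reps with the sorting removed.

Demo fiddle #1 here.
That being said, there are a few improvements you can make:
1. Move all work out of the OrderBy() comparison, since in general the comparison will be called n*log(n) times.
2. Use the JToken indexer rather than SelectToken(). The indexer needs only a dictionary lookup, whereas SelectToken() first parses the incoming string into JSONPath components and then evaluates each component against the current token, ultimately performing the same dictionary lookup.
3. Rather than invoking the serializer for every "date" value, invoke it only once by deserializing your JToken hierarchy with DateTimeZoneHandling.Utc + DateParseHandling.DateTime. If you do, the DateTime values are recognized during reading and there is no need to invoke the serializer afterwards.
4. Avoid cloning JTokens. When you call employments.ToObject<JArray>() you effectively clone the contents of the array. You also clone a JToken whenever you add it to a parent token without first removing it from its current parent. (See this answer for why.)
5. When reading from a file or stream, be sure to deserialize directly from the stream rather than loading into an intermediate string, as explained in Performance Tips: Optimize Memory Usage. Also consider writing directly to your file rather than to an intermediate StringBuilder.
6. If your JSON has a fixed schema, you could consider designing a corresponding data model and deserializing to it. According to 11 Ways to Improve JSON Performance & Usage by mwatson, parsing to a JToken hierarchy can be up to about 20% slower than deserializing to a data model.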
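The cloning behavior in point 4 is easy to miss, so here is a minimal sketch of it, assuming Newtonsoft.Json is referenced and C# 9 top-level statements are available. The variable names (source, copyTarget, moveTarget) are made up for illustration and are not from the question.

```csharp
using System;
using Newtonsoft.Json.Linq;

// A one-element array; the element's Parent is 'source'.
var source = JArray.Parse(@"[{""id"":""a""}]");
var item = source[0];

// Adding a token that still has a parent makes Json.NET add a clone instead.
var copyTarget = new JArray();
copyTarget.Add(item);
bool clonedOnAdd = !ReferenceEquals(item, copyTarget[0]);
Console.WriteLine(clonedOnAdd);   // True: copyTarget holds a copy

// Detaching the token first lets the very same instance be re-added.
item.Remove();
var moveTarget = new JArray();
moveTarget.Add(item);
bool movedWithoutClone = ReferenceEquals(item, moveTarget[0]);
Console.WriteLine(movedWithoutClone);   // True: same instance, no copy
```

This is why clearing the array before re-adding the sorted items avoids allocating a second copy of every element.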
Putting #1 - #5 together, your code could be rewritten as follows:

// Deserialize with DateTimeZoneHandling.Utc.
// This recognizes all DateTime values automatically and populates them in the JToken hierarchy, thereby avoiding the need to deserialize each one individually.
var settings = new JsonSerializerSettings { 
    DateTimeZoneHandling = DateTimeZoneHandling.Utc, 
    DateParseHandling = DateParseHandling.DateTime 
};

// Deserialize directly from stream (if reading from file) rather than loading into a string.
using var textReader = new StringReader(json); // If reading from a file, use a StreamReader and read directly.
using var jsonReader = new JsonTextReader(textReader);
var root = JsonSerializer.CreateDefault(settings).Deserialize<JToken>(jsonReader);
var jsonArray = (JArray)root["results"];

var jsonlBuilder = new StringBuilder();
using (var textWriter = new StringWriter(jsonlBuilder))  // If writing to a file, use a StreamWriter and write directly.
using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None })
{
    foreach (var obj in jsonArray)
    {
        var employments = obj is JObject ? obj["examples"] as JArray : null; // Use indexer instead of SelectToken()
        if (employments != null && employments.Count > 1) // Null check: "examples" may be missing or not an array. Use the Count property rather than the LINQ Count() extension method, https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1829
        {
            var sortedList = employments
                .Select(e => (e, date : e["date"].Value<DateTime>())) // Use the indexer and cache the DateTime value
                .OrderBy(p => p.date).Select(p => p.e)                // And sort by the cached value
                .ToList();
            employments.Clear();  // Prevent cloning of JTokens by clearing the array before re-adding the items.
            foreach (var item in sortedList)
                employments.Add(item); // Add the existing items rather than creating new items.
        }
        obj.WriteTo(jsonWriter);
        jsonWriter.WriteWhitespace("\n");
    }
}
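To see why the code above can call Value<DateTime>() without a serializer, here is a small sketch (assuming Newtonsoft.Json and C# 9 top-level statements; the one-property JSON string is made up for illustration). It checks that a date string read with these settings is already stored as a Date token with Kind = Utc.

```csharp
using System;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

var settings = new JsonSerializerSettings
{
    DateTimeZoneHandling = DateTimeZoneHandling.Utc,
    DateParseHandling = DateParseHandling.DateTime,
};

// Illustrative input: a single ISO 8601 date string.
using var textReader = new StringReader(@"{""date"":""2020-05-10T00:00:00Z""}");
using var jsonReader = new JsonTextReader(textReader);
var token = JsonSerializer.CreateDefault(settings).Deserialize<JToken>(jsonReader);

// The value was parsed once, during reading, into a DateTime.
var dateToken = token["date"];
var date = dateToken.Value<DateTime>();
Console.WriteLine(dateToken.Type);   // Date
Console.WriteLine(date.Kind);        // Utc
```

If the token type printed String instead, the reader would not have recognized the date, and each value would have to be converted again during sorting.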

Alternatively, using approach #6, your data model would look like this:

public class Example
{
    public string id { get; set; }
    public DateTime date { get; set; }
}

public class Group
{
    public object parent_group { get; set; }
    public string type { get; set; }
}

public class Result
{
    public string id { get; set; }
    public List<Group> groups { get; set; } = new ();
    public List<Example> examples { get; set; } = new ();
}

public class Root
{
    public List<Result> results { get; set; } = new ();
}

And your code:

var settings = new JsonSerializerSettings { 
    DateTimeZoneHandling = DateTimeZoneHandling.Utc, 
};
var serializer = JsonSerializer.CreateDefault(settings);

// Deserialize directly from stream (if reading from file) rather than loading into a string.
using var textReader = new StringReader(json); // If reading from a file, use a StreamReader and read directly.
using var jsonReader = new JsonTextReader(textReader); 
var root = serializer.Deserialize<Root>(jsonReader);

var jsonArray = root.results;

var jsonlBuilder = new StringBuilder();
using (var textWriter = new StringWriter(jsonlBuilder)) // If writing to a file, use a StreamWriter and write directly.
using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None })
{
    foreach (var obj in jsonArray)
    {
        if (obj?.examples != null && obj.examples.Count > 0)
            obj.examples.Sort((x, y) => x.date.CompareTo(y.date));
        serializer.Serialize(jsonWriter, obj);
        jsonWriter.WriteWhitespace("\n");
    }
}
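Both versions above read from a string and write into a StringBuilder for the sake of the demo. If the data actually lives in files, point 5 could be sketched roughly as below. This is an assumption-laden sketch, not part of the measured code: the method name SortExamplesToJsonLines and both path parameters are hypothetical, and it assumes every item in examples has a "date" property.

```csharp
using System;
using System.IO;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helper: reads JSON from inputPath, writes JSON Lines to outputPath.
static void SortExamplesToJsonLines(string inputPath, string outputPath)
{
    var serializer = JsonSerializer.CreateDefault(new JsonSerializerSettings
    {
        DateTimeZoneHandling = DateTimeZoneHandling.Utc,
        DateParseHandling = DateParseHandling.DateTime,
    });

    // Read straight from the file stream; no intermediate string.
    using var streamReader = new StreamReader(inputPath);
    using var jsonReader = new JsonTextReader(streamReader);
    var root = serializer.Deserialize<JToken>(jsonReader);

    // Write straight to the output file; no intermediate StringBuilder.
    using var streamWriter = new StreamWriter(outputPath);
    using var jsonWriter = new JsonTextWriter(streamWriter) { Formatting = Formatting.None };
    foreach (var obj in (root?["results"] as JArray) ?? new JArray())
    {
        if (obj is JObject && obj["examples"] is JArray examples && examples.Count > 1)
        {
            var sorted = examples.OrderBy(e => e["date"].Value<DateTime>()).ToList();
            examples.Clear();                      // Detach the items so re-adding does not clone them.
            foreach (var item in sorted)
                examples.Add(item);
        }
        obj.WriteTo(jsonWriter);
        jsonWriter.WriteWhitespace("\n");
    }
}
```

A call would look like SortExamplesToJsonLines("input.json", "output.jsonl"), with both file names being placeholders.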

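With these changes, I see the following average runtimes for the two approaches:

  • 0.0590 ms average runtime per rep over 10000 reps of the original code.
  • 0.0293 ms average runtime per rep over 10000 reps of the modified code (50.41% faster).
  • 0.0228 ms average runtime per rep over 10000 reps of deserializing to a data model (61.33% faster).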

Demo fiddle #2 here.

2admgd59 #2

This code works for me:

    string[] lines = File.ReadLines(@"C:\...").ToArray();

    // Or, if you already have the text from another source, use this instead of the line above:
    // string[] lines = text.Split("\r\n");

    var arrStart = false;
    List<int> indexes = new();
    List<KeyValuePair<DateTime, string[]>> dates = new();
    for (int i = 0; i < lines.Length; i++)
    {
        if (lines[i].Contains("examples"))
        {
            arrStart = true;
            continue;
        }
        if (arrStart && lines[i].Contains("date"))
        {
            DateTime dateTime = (DateTime)JObject.Parse("{" + ((string)lines[i])
                                                 .Trim()
                                                 .Replace("\"\"", "\"") + "}")["date"];
                                                 
            //Or if you don't want to use any serializer
            //var l = ((string)lines[i]).Replace("\"", "").Trim();
            //var s = l.Substring(l.IndexOf(":")+1).Replace("\"\"", "\"");
            //var dateTime1 = Convert.ToDateTime(s);
        
            dates.Add(new KeyValuePair<DateTime, string[]>(dateTime, new string[] { lines[i - 1], lines[i] }));
            indexes.Add(i);
        }
        else if (arrStart && lines[i].Contains("]"))
        {
            arrStart = false;
            dates = dates.OrderBy(x => x.Key).ToList();

            var j = 0;
            foreach (var index in indexes)
            {
                lines[index - 1] = dates[j].Value[0];
                lines[index] = dates[j].Value[1];
                j++;
            }
            dates.Clear();
            indexes.Clear();
        }
    }
    // Use a new name so this does not clash with an existing 'text' input variable.
    var sortedText = string.Join("\r\n", lines);
