Sort arrays within a JSON object by date without deserializing the object

Posted by n8ghc7c1 on 2023-05-02 in Other


I have a JSON object that looks like this:

{
  "results": [
    {
      "id": "abc456",
      "groups": [
        {
          "parent_group": null,
          "type": "D"
        },
        {
          "parent_group": null,
          "type": "DEPARTMENT"
        }
      ],
      "examples": [
        {
          "id": "e13b1e97-31e3-46e6-9d8f-9776c52e5ce0",
          "date": "2020-05-10T00:00:00Z"
        },
        {
          "id": "bd31d475-6137-4409-8d17-535f1bf94071",
          "date": "2021-05-11T00:00:00Z"
        },
        {
          "id": "0e0806ba-56f6-4527-8fd7-7e0061e30783",
          "date": "2019-05-11T00:00:00Z"
        }
      ]
    },
    {
      "id": "def456",
      "groups": [
        {
          "parent_group": null,
          "type": "D"
        },
        {
          "parent_group": null,
          "type": "D"
        }
      ],
      "examples": [
        {
          "id": "e13b1e97-31e3-46e6-9d8f-9776c52e5ce0",
          "date": "2020-05-10T00:00:00Z"
        },
        {
          "id": "bd31d475-6137-4409-8d17-535f1bf94071",
          "date": "2021-05-11T00:00:00Z"
        },
        {
          "id": "0e0806ba-56f6-4527-8fd7-7e0061e30783",
          "date": "2019-05-11T00:00:00Z"
        }
      ]
    }
  ]
}

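I have to sort the items in the examples array of each object in results and return them in JSON Lines format.
My current solution iterates over each object in the results array, sorts its examples array by date, and then replaces the original array: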

var jsonlBuilder = new StringBuilder();
var serializer = JsonSerializer.CreateDefault(new JsonSerializerSettings { DateTimeZoneHandling = DateTimeZoneHandling.Utc });

using (var textWriter = new StringWriter(jsonlBuilder))
using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None })
{
    foreach (var obj in jsonArray)
    {
        var employments = obj.SelectToken("examples");
        if (employments.Count() > 1)
        {
            var b = employments.ToObject<JArray>().OrderBy(c => c.SelectToken("date").ToObject<DateTime>(serializer));
            var newEmploymentArray = new JArray(b);
            obj["examples"].Replace(newEmploymentArray);
        }
        obj.WriteTo(jsonWriter);
        jsonWriter.WriteWhitespace("\n");
    }
}

This does not perform well. Without the code in the if (employments.Count() > 1) block it takes about 6 ms; with the if block it takes about 30 ms. Is there a better way?

JSON

Source: https://stackoverflow.com/questions/76105021/sort-arrays-within-a-json-object-by-date-without-deserializing-the-object

2 Answers

jyztefdp1#

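With the JSON shown, the performance of your code is not as bad as you say. I am seeing:

An average runtime of 0.0629 ms per rep over 10000 reps for your current code.
An average runtime of 0.0246 ms per rep over 10000 reps with the sorting removed.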

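Demo fiddle #1 here.
That being said, there are some improvements you can make:
1. Move as much work as possible outside of OrderBy(), since in general the comparison is called n*log(n) times.
2. Use the JToken indexer instead of SelectToken(). The indexer requires only a dictionary lookup, whereas SelectToken() first parses the incoming string into JSONPath components, then evaluates each component against the current token, and eventually performs the same dictionary lookup.
3. Rather than invoking the serializer for each "date" value, invoke it only once by deserializing the JToken hierarchy with DateTimeZoneHandling.Utc + DateParseHandling.DateTime. If you do, the DateTime values are recognized during reading and the serializer is not needed afterwards.
4. Avoid cloning JTokens. When you call employments.ToObject<JArray>(), you effectively clone the contents of the array. You also clone a JToken whenever you add it to a parent token without first removing it from its current parent. (See this answer for why.)
5. When reading from a file or stream, be sure to deserialize directly from the stream rather than loading into an intermediate string, as explained in Performance Tips: Optimize Memory Usage.
Also consider writing directly to a file rather than to an intermediate StringBuilder.
6. If your JSON has a fixed schema, you could design a corresponding data model and deserialize to it. According to mwatson's 11 Ways to Improve JSON Performance & Usage, parsing to a JToken hierarchy can be around 20% slower than deserializing to a data model.
Putting #1 - #5 together, your code can be rewritten as follows: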

// Deserialize with DateTimeZoneHandling.Utc and DateParseHandling.DateTime.
// This recognizes all DateTime values automatically and populates them in the JToken hierarchy, thereby avoiding the need to deserialize each one individually.
var settings = new JsonSerializerSettings { 
    DateTimeZoneHandling = DateTimeZoneHandling.Utc, 
    DateParseHandling = DateParseHandling.DateTime 
};

// Deserialize directly from stream (if reading from file) rather than loading into a string.
using var textReader = new StringReader(json); // If reading from a file, use a StreamReader and read directly.
using var jsonReader = new JsonTextReader(textReader);
var root = JsonSerializer.CreateDefault(settings).Deserialize<JToken>(jsonReader);
var jsonArray = (JArray)root["results"];

var jsonlBuilder = new StringBuilder();
using (var textWriter = new StringWriter(jsonlBuilder))  // If writing to a file, use a StreamWriter and write directly.
using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None })
{
    foreach (var obj in jsonArray)
    {
        var employments = obj is JObject ? obj["examples"] as JArray : null; // Use indexer instead of SelectToken()
        if (employments != null && employments.Count > 1) // Null check: "examples" may be missing or not an array. Use the Count property rather than the LINQ Count() extension method, https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1829
        {
            var sortedList = employments
                .Select(e => (e, date : e["date"].Value<DateTime>())) // Use the indexer and cache the DateTime value
                .OrderBy(p => p.date).Select(p => p.e)                // And sort by the cached value
                .ToList();
            employments.Clear();  // Prevent cloning of JTokens by clearing the array before re-adding the items.
            foreach (var item in sortedList)
                employments.Add(item); // Add the existing items rather than creating new items.
        }
        obj.WriteTo(jsonWriter);
        jsonWriter.WriteWhitespace("\n");
    }
}
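
The cloning behavior described in point 4 can be observed with a small check. This is a minimal sketch, assuming Json.NET (Newtonsoft.Json) is referenced; the variable names are illustrative and not part of the answer's code.

```csharp
using System;
using Newtonsoft.Json.Linq;

// Parse a small array; every item now has `array` as its parent.
var array = JArray.Parse(@"[{""date"":""2020-05-10T00:00:00Z""},{""date"":""2019-05-11T00:00:00Z""}]");
var first = array[0];

// Adding a token that still has a parent to another container stores a copy.
var copyHolder = new JArray();
copyHolder.Add(first);
bool wasCloned = !ReferenceEquals(copyHolder[0], first);

// After Clear() the token has no parent, so re-adding it stores the same instance.
array.Clear();
array.Add(first);
bool wasReused = ReferenceEquals(array[0], first);

Console.WriteLine($"cloned while parented: {wasCloned}, reused after Clear(): {wasReused}");
```

Both flags are expected to be true, which is why the rewritten loop clears the array before re-adding the sorted items instead of building a new JArray from them.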

Alternatively, using approach #6, your data model would look like this:

public class Example
{
    public string id { get; set; }
    public DateTime date { get; set; }
}

public class Group
{
    public object parent_group { get; set; }
    public string type { get; set; }
}

public class Result
{
    public string id { get; set; }
    public List<Group> groups { get; set; } = new ();
    public List<Example> examples { get; set; } = new ();
}

public class Root
{
    public List<Result> results { get; set; } = new ();
}

And your code would become:

var settings = new JsonSerializerSettings { 
    DateTimeZoneHandling = DateTimeZoneHandling.Utc, 
};
var serializer = JsonSerializer.CreateDefault(settings);

// Deserialize directly from stream (if reading from file) rather than loading into a string.
using var textReader = new StringReader(json); // If reading from a file, use a StreamReader and read directly.
using var jsonReader = new JsonTextReader(textReader);
var root = serializer.Deserialize<Root>(jsonReader);

var jsonArray = root.results;

var jsonlBuilder = new StringBuilder();
using (var textWriter = new StringWriter(jsonlBuilder)) // If writing to a file, use a StreamWriter and write directly.
using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None })
{
    foreach (var obj in jsonArray)
    {
        if (obj?.examples != null && obj.examples.Count > 0)
            obj.examples.Sort((x, y) => x.date.CompareTo(y.date));
        serializer.Serialize(jsonWriter, obj);
        jsonWriter.WriteWhitespace("\n");
    }
}

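With these changes, I see the following average runtimes for the two approaches:

Original code: average time per rep of 0.0590 ms over 10000 reps.
Modified code: average time per rep of 0.0293 ms over 10000 reps (50.41% faster).
Deserialized data model: average time per rep of 0.0228 ms over 10000 reps (61.33% faster).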

Demo fiddle #2 here.
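
Per-rep averages like the ones quoted above can be collected with a simple Stopwatch loop. The following is only a sketch of one way to do it, not the fiddle's actual harness; `AverageMs` and `PercentFaster` are hypothetical helper names, the workload is a placeholder, and the "percent faster" formula is one common definition that the answer may or may not have used.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs `action` `reps` times and returns the mean time per rep in milliseconds.
static double AverageMs(Action action, int reps)
{
    action(); // Warm-up run so JIT compilation is not included in the timing.
    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < reps; i++)
        action();
    stopwatch.Stop();
    return stopwatch.Elapsed.TotalMilliseconds / reps;
}

// Hypothetical helper: relative time saved, as a percentage of the baseline time.
static double PercentFaster(double baselineMs, double improvedMs)
    => (baselineMs - improvedMs) / baselineMs * 100.0;

// Placeholder workload; substitute the sorting code under test.
var data = new[] { 3, 1, 2 };
double average = AverageMs(() => Array.Sort((int[])data.Clone()), 10000);

Console.WriteLine($"Average time per rep: {average:F4} ms over 10000 reps");
```

Timings at this scale vary between runs and machines, so comparing several runs is safer than relying on a single figure.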

2023-05-02

2admgd592#

This code works for me:

string[] lines = File.ReadLines(@"C:\...").ToArray();

    // Or, if you already have the text from another source, use this instead
    // (commented out so that `lines` is declared only once):
    // string[] lines = text.Split("\r\n");

    var arrStart = false;
    List<int> indexes = new();
    List<KeyValuePair<DateTime, string[]>> dates = new();
    for (int i = 0; i < lines.Length; i++)
    {
        if (lines[i].Contains("examples"))
        {
            arrStart = true;
            continue;
        }
        if (arrStart && lines[i].Contains("date"))
        {
            DateTime dateTime = (DateTime)JObject.Parse("{" + ((string)lines[i])
                                                 .Trim()
                                                 .Replace("\"\"", "\"") + "}")["date"];
                                                 
            //Or if you don't want to use any serializer
            //var l = ((string)lines[i]).Replace("\"", "").Trim();
            //var s = l.Substring(l.IndexOf(":")+1).Replace("\"\"", "\"");
            //var dateTime1 = Convert.ToDateTime(s);
        
            dates.Add(new KeyValuePair<DateTime, string[]>(dateTime, new string[] { lines[i - 1], lines[i] }));
            indexes.Add(i);
        }
        else if (arrStart && lines[i].Contains("]"))
        {
            arrStart = false;
            dates = dates.OrderBy(x => x.Key).ToList();

            var j = 0;
            foreach (var index in indexes)
            {
                lines[index - 1] = dates[j].Value[0];
                lines[index] = dates[j].Value[1];
                j++;
            }
            dates.Clear();
            indexes.Clear();
        }
    }
    // Named sortedText so it does not clash with an input variable called `text`.
    var sortedText = string.Join("\r\n", lines);

2023-05-02
