Writing nested records to BigQuery with Java

rsaldnfx, posted on 2021-07-09 in Java

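I want to write some nested data to BigQuery using Apache Beam, and I'd like to know whether the schema of the table we created in BigQuery is correct. Here is what my data looks like in XML: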

<ID>5</ID>
<Addresses>
    <Address>
        <Street>Lincoln St.</Street>
        <ZipCode>03483</ZipCode>
    </Address>
</Addresses>

This is the BigQuery schema I created to reflect the data above:

[{
    "name": "ID",
    "type": "STRING"
  },
  {
    "name": "Addresses",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {
        "name": "Address",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
          {
            "name": "Street",
            "type": "STRING"
          },
          {
            "name": "ZipCode",
            "type": "STRING"
          }
        ]
      }
    ]
  }]

This is how I parse the structure above to create a BigQuery TableRow in Java:

List<Address> addresses = getAddresses();

if (!addresses.isEmpty()) {
    List<TableCell> repeatedRecordInstanceList = new ArrayList<>();

    for (Address address : addresses) {
        List<TableCell> childObject = new ArrayList<>();

        if (address.getStreet() != null) {
            childObject.add(new TableCell().set("Street", address.getStreet()));
        } else { childObject.add(new TableCell().set("Street", null)); }

        if (address.getZipCode() != null) {
            childObject.add(new TableCell().set("ZipCode", address.getZipCode()));
        } else { childObject.add(new TableCell().set("ZipCode", null)); }

        repeatedRecordInstanceList.add(new TableCell().set("Address", childObject));
    }
    tableRow.set("Addresses", repeatedRecordInstanceList);
} else {
    tableRow.set("Addresses", null);
}

But for some reason, my data looks like this in BigQuery:
id | addresses.address.street | addresses.address.zipcode
5  | lincoln st               | Null
   |                          | 03483
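It looks as if, for each Address, the Street and the ZipCode are written in two separate iterations.
I want each Street and its corresponding ZipCode on the same row, without any null values. How can I do that? I'd appreciate your help. Thanks.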

Answer #1 (zpqajqem)

As far as I can tell, your code produces an object that would look like this in JSON:

{
  "ID" : "5",
  "Addresses" : [
    { "Address" : [{"Street" : "abc", "ZipCode": "1564"},
                   {"Street" : "abd", "ZipCode": "1565"}]
    },
    {"Address" : [{"Street" : "abe", "ZipCode": "1566"},
                  {"Street" : "abf", "ZipCode": "1567"},
                  {"Street" : "abg", "ZipCode": "1568"}]
    }
  ]
}

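I don't think that's what you want: "Addresses" can hold multiple entries, and each of those entries can in turn hold multiple "Address" entries. I don't think "Address" should be mode REPEATED (which means it is an array). It also means childObject should not be an ArrayList, because every element you add becomes a new entry in the array, doesn't it?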
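
To make that concrete, here is a rough sketch of the shape I have in mind. It assumes the schema keeps "Addresses" as REPEATED but changes "Address" to a plain (non-repeated) RECORD. Plain maps stand in for TableRow here so the snippet runs without the BigQuery client library; TableRow is itself a Map<String, Object>, so in a Beam pipeline you would presumably use it in the same places. The Address class and the buildRow helper are made up for illustration and are not taken from your code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: plain maps stand in for BigQuery's TableRow.
class NestedRowSketch {

    // Hypothetical stand-in for the Address type used in the question.
    static final class Address {
        private final String street;
        private final String zipCode;

        Address(String street, String zipCode) {
            this.street = street;
            this.zipCode = zipCode;
        }

        String getStreet() { return street; }
        String getZipCode() { return zipCode; }
    }

    // Builds one row in which every "Addresses" entry wraps a single "Address" record.
    static Map<String, Object> buildRow(String id, List<Address> addresses) {
        List<Map<String, Object>> entries = new ArrayList<>();
        for (Address address : addresses) {
            // Street and ZipCode are fields of the same record, not separate list items.
            Map<String, Object> record = new LinkedHashMap<>();
            record.put("Street", address.getStreet());
            record.put("ZipCode", address.getZipCode());

            // "Address" is a single nested record here, not an array.
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("Address", record);
            entries.add(entry);
        }

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ID", id);
        row.put("Addresses", entries);
        return row;
    }

    public static void main(String[] args) {
        List<Address> addresses = new ArrayList<>();
        addresses.add(new Address("Lincoln St.", "03483"));
        System.out.println(buildRow("5", addresses));
    }
}
```

Each address then contributes exactly one entry to "Addresses", with its Street and ZipCode side by side in the same record, so they should no longer end up on separate rows.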
