Loading data with a text qualifier

6ovsh4lw posted on 2021-06-21 in Pig

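I'm trying to load a data file with a Pig Latin script. The data has 2 columns, but the 2nd column uses a text qualifier (double quotes). Sample data: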

    DEVICE_ID,SUPPORTED_TECH
    a2334,"GSM900,GSM1500,GSM200"
    a54623,"GSM900,GSM1500"
    a86646,"GSM1500,GSM200"

When I load the data as follows, the 2nd column is not recognized as a single column:

    deviceList = load 'deviceList.csv' Using PigStorage(',') as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray);

How do I define a text qualifier when loading the dataset?

Answer by guicsvcw:

Try this, and let me know if you need a different output format.

Input file:

    DEVICE_ID,SUPPORTED_TECH
    a2334,"GSM900,GSM1500,GSM200"
    a54623,"GSM900,GSM1500"
    a86646,"GSM1500,GSM200"

Pig script:

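    -- Load each whole line as one chararray field (TextLoader does no splitting)
    A = LOAD 'input.txt' USING TextLoader() AS (line:chararray);
    -- Split on the first comma only: group 1 = ID, group 2 = rest of the line
    deviceList = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, '^(\\w+),(.*)$')) AS (DEVICE_ID:chararray, SUPPORTED_TECH:chararray);
    DUMP deviceList;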

Output:

    (DEVICE_ID,SUPPORTED_TECH)
    (a2334,"GSM900,GSM1500,GSM200")
    (a54623,"GSM900,GSM1500")
    (a86646,"GSM1500,GSM200")
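If the quotes themselves should not appear in the output, one possible variant of the same regex approach also drops the header row and captures only the text between the qualifiers. This is a sketch that assumes the header row always starts with `DEVICE_ID,` and that values never contain embedded double quotes; the aliases `B` and `C` are illustrative.

```pig
-- Read each line whole; TextLoader does not split on any delimiter
A = LOAD 'input.txt' USING TextLoader() AS (line:chararray);

-- Drop the header row (assumes it always begins with the literal "DEVICE_ID,")
B = FILTER A BY NOT (line MATCHES 'DEVICE_ID,.*');

-- Group 1: the ID before the first comma.
-- Group 2: the text between the optional opening and closing double quotes.
C = FOREACH B GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, '^(\\w+),"?([^"]*)"?$'))
        AS (DEVICE_ID:chararray, SUPPORTED_TECH:chararray);

DUMP C;
```

On the sample input this should print tuples such as `(a2334,GSM900,GSM1500,GSM200)`. Because the quotes are optional on both sides of group 2, a row whose second column is unquoted also matches.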
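Another option, if the piggybank contrib jar is available, is its `CSVExcelStorage` loader. Unlike the builtin `PigStorage`, it treats double quotes as a text qualifier, so commas inside `"..."` stay in one field and the quotes are removed. This sketch assumes that the jar path below is hypothetical and differs per installation, and that your Pig/piggybank version supports the four-argument constructor with a header-treatment option (added in later releases, to my knowledge around 0.12).

```pig
-- Hypothetical path to the piggybank jar; adjust for your installation
REGISTER /usr/lib/pig/piggybank.jar;

-- CSVExcelStorage honours double-quote qualifiers;
-- 'SKIP_INPUT_HEADER' drops the header line at the start of the file
deviceList = LOAD 'deviceList.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
    AS (DEVICE_ID:chararray, SUPPORTED_TECH:chararray);

DUMP deviceList;
```

This keeps the original `LOAD ... AS (...)` shape from the question and moves the quote handling into the loader instead of a regex.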
