处理段落

处理段落

文件内容如下：

{ "ent_id" : MinKey, "_id" : MinKey } -->> {
        "ent_id" : NumberLong("aaaaa"),
        "_id" : ObjectId("bbbbb")
} on : shard04 Timestamp(685, 0)
{
        "ent_id" : NumberLong("ccccc"),
        "_id" : ObjectId("ddddd")
} -->> {
        "ent_id" : NumberLong("eeeee"),
        "_id" : ObjectId("fffff")
} on : shard04 Timestamp(331, 1)
{
        "ent_id" : NumberLong("ggggg"),
        "_id" : ObjectId("hhhhh")
} -->> {
        "ent_id" : NumberLong("iiiii"),
        "_id" : ObjectId("jjjjj")
} on : shard04 Timestamp(680, 0)

期望结果：

MinKey,MinKey,NumberLong("aaaaa"),ObjectId("bbbbb"),shard04
NumberLong("ccccc"),ObjectId("ddddd"),NumberLong("eeeee"),ObjectId("fffff"),shard04
NumberLong("ggggg"),ObjectId("hhhhh"),NumberLong("iiiii"),ObjectId("jjjjj"),shard04

awk代码：

BEGIN{
  # 以Timestamp...为输入记录分隔符，一次读取一段
  RS=" Timestamp\\([0-9]+, [0-9]\\)"
}
{
  # 将一段中所有冒号后的内容保存到数组
  patsplit($0,arr,": ([0-9a-zA-Z\"\\(\\)])+")
  for(i in arr){
    # 移除冒号，并使用逗号分隔串联各元素
    str = str gensub(": ","","g",arr[i])","
  }
  # 移除尾部逗号
  print(substr(str,1,length(str)-1))
  str=""
}

使用Perl或Ruby则更简单：

perl -0nE 'BEGIN{$,=","}say $& =~ /: \K[^\s,]+/g while /{.*?} on : \S+/sg' test.log
ruby -ne 'BEGIN{$/=nil};$_.scan(/{.*?} on : \S+/m){|s|puts s.scan(/: \K[^\s,]+/).join(",")}' test.log