Processing paragraphs (multi-line records)

The input file contains:

```
{ "ent_id" : MinKey, "_id" : MinKey } -->> {
"ent_id" : NumberLong("aaaaa"),
"_id" : ObjectId("bbbbb")
} on : shard04 Timestamp(685, 0)
{
"ent_id" : NumberLong("ccccc"),
"_id" : ObjectId("ddddd")
} -->> {
"ent_id" : NumberLong("eeeee"),
"_id" : ObjectId("fffff")
} on : shard04 Timestamp(331, 1)
{
"ent_id" : NumberLong("ggggg"),
"_id" : ObjectId("hhhhh")
} -->> {
"ent_id" : NumberLong("iiiii"),
"_id" : ObjectId("jjjjj")
} on : shard04 Timestamp(680, 0)
```

Expected output, one line per record:

```
MinKey,MinKey,NumberLong("aaaaa"),ObjectId("bbbbb"),shard04
NumberLong("ccccc"),ObjectId("ddddd"),NumberLong("eeeee"),ObjectId("fffff"),shard04
NumberLong("ggggg"),ObjectId("hhhhh"),NumberLong("iiiii"),ObjectId("jjjjj"),shard04
```

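awk code (requires GNU awk, because `patsplit()`, `gensub()` and a regular-expression `RS` are gawk extensions):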

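```awk
BEGIN {
    # Use " Timestamp(n, n)" as the input record separator, so each record is one chunk
    RS = " Timestamp\\([0-9]+, [0-9]+\\)"
}
{
    # Store every ": value" in this record in arr; n is the number of matches
    n = patsplit($0, arr, ": [0-9a-zA-Z\"()]+")
    # The text after the last separator contains no values; skip it
    if (n == 0) next
    str = ""
    # Walk the indices in order; "for (i in arr)" does not guarantee any order
    for (i = 1; i <= n; i++) {
        # Remove ": " and join the elements with commas
        str = str gensub(": ", "", "g", arr[i]) ","
    }
    # Remove the trailing comma
    print substr(str, 1, length(str) - 1)
}
```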
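The awk program's logic carries over directly to Ruby: split the text on the `Timestamp(...)` trailer, collect the value after each `": "`, and join the values with commas. The sketch below assumes the whole listing fits in memory. The names `chunks_by_separator` and `SAMPLE` are hypothetical and do not come from the original. `String#split` stands in for awk's `RS`, and the capture group stands in for the `gensub()` call.

```ruby
# Sketch: the awk record-separator approach, ported to Ruby.
# Assumption: the input is the chunk listing shown above, held in memory.

# Hypothetical sample input (copied from the listing above)
SAMPLE = <<~'LOG'
  { "ent_id" : MinKey, "_id" : MinKey } -->> {
  "ent_id" : NumberLong("aaaaa"),
  "_id" : ObjectId("bbbbb")
  } on : shard04 Timestamp(685, 0)
  {
  "ent_id" : NumberLong("ccccc"),
  "_id" : ObjectId("ddddd")
  } -->> {
  "ent_id" : NumberLong("eeeee"),
  "_id" : ObjectId("fffff")
  } on : shard04 Timestamp(331, 1)
  {
  "ent_id" : NumberLong("ggggg"),
  "_id" : ObjectId("hhhhh")
  } -->> {
  "ent_id" : NumberLong("iiiii"),
  "_id" : ObjectId("jjjjj")
  } on : shard04 Timestamp(680, 0)
LOG

# Hypothetical helper: returns one comma-joined line per chunk
def chunks_by_separator(text)
  # Plays the role of RS in the awk version: the "Timestamp(n, n)" trailer ends each record
  records = text.split(/\s*Timestamp\(\d+, \d+\)/)
  records.map { |rec|
    # Same character class as the awk patsplit() pattern; the group leaves out ": "
    values = rec.scan(/: ([0-9A-Za-z"()]+)/).flatten
    # The text after the last trailer has no values; nil lets compact drop it
    values.empty? ? nil : values.join(",")
  }.compact
end

puts chunks_by_separator(SAMPLE)
```

As in the corrected awk version, values come out in match order, and the empty remainder after the last trailer is discarded rather than printed as a blank line.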

With Perl or Ruby it is even simpler:

```sh
perl -0777 -nE 'BEGIN{$,=","}say $& =~ /: \K[^\s,]+/g while /{.*?} on : \S+/sg' test.log
ruby -ne 'BEGIN{$/=nil};$_.scan(/{.*?} on : \S+/m){|s|puts s.scan(/: \K[^\s,]+/).join(",")}' test.log
```
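Spelled out as a script with comments, the Ruby one-liner works like this. It is a sketch that assumes the whole input is read into memory, as the one-liner does with `$/ = nil`. `chunk_lines` and `SAMPLE` are hypothetical names, and the sample holds only the first two chunks of the listing above.

```ruby
# Sketch: the Ruby one-liner above, expanded into a commented script.
# Assumption: the whole input fits in memory (the one-liner also slurps it via $/ = nil).

# Hypothetical sample input: the first two chunks of the listing above
SAMPLE = <<~'LOG'
  { "ent_id" : MinKey, "_id" : MinKey } -->> {
  "ent_id" : NumberLong("aaaaa"),
  "_id" : ObjectId("bbbbb")
  } on : shard04 Timestamp(685, 0)
  {
  "ent_id" : NumberLong("ccccc"),
  "_id" : ObjectId("ddddd")
  } -->> {
  "ent_id" : NumberLong("eeeee"),
  "_id" : ObjectId("fffff")
  } on : shard04 Timestamp(331, 1)
LOG

# Hypothetical helper: returns one comma-joined line per chunk
def chunk_lines(text)
  # /m lets "." match newlines; the lazy ".*?" stops at the first "} on : <shard>",
  # so each match covers exactly one chunk
  text.scan(/\{.*?\} on : \S+/m).map do |chunk|
    # \K drops the ": " prefix from the match; [^\s,]+ stops at whitespace or a comma
    chunk.scan(/: \K[^\s,]+/).join(",")
  end
end

puts chunk_lines(SAMPLE)
```

The lazy `.*?` is what keeps each match to a single chunk. A greedy `.*` would run on to the last `} on : ` in the file and merge all the chunks into one match.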