Processing paragraphs
The file contents are as follows:
{ "ent_id" : MinKey, "_id" : MinKey } -->> {
"ent_id" : NumberLong("aaaaa"),
"_id" : ObjectId("bbbbb")
} on : shard04 Timestamp(685, 0)
{
"ent_id" : NumberLong("ccccc"),
"_id" : ObjectId("ddddd")
} -->> {
"ent_id" : NumberLong("eeeee"),
"_id" : ObjectId("fffff")
} on : shard04 Timestamp(331, 1)
{
"ent_id" : NumberLong("ggggg"),
"_id" : ObjectId("hhhhh")
} -->> {
"ent_id" : NumberLong("iiiii"),
"_id" : ObjectId("jjjjj")
} on : shard04 Timestamp(680, 0)
Expected output:
MinKey,MinKey,NumberLong("aaaaa"),ObjectId("bbbbb"),shard04
NumberLong("ccccc"),ObjectId("ddddd"),NumberLong("eeeee"),ObjectId("fffff"),shard04
NumberLong("ggggg"),ObjectId("hhhhh"),NumberLong("iiiii"),ObjectId("jjjjj"),shard04
awk code (GNU awk only: a regular-expression RS, patsplit() and gensub() are gawk extensions):
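BEGIN{
    # Use " Timestamp(x, y)" as the input record separator, so each record is one paragraph
    RS=" Timestamp\\([0-9]+, [0-9]+\\)"
}
{
    # Save every value that follows a colon in this paragraph into arr; n is the match count
    n = patsplit($0, arr, ": ([0-9a-zA-Z\"\\(\\)])+")
    # Skip empty records, such as the trailing newline after the last Timestamp
    if (n == 0) next
    # Loop by index: the order of "for (i in arr)" is not guaranteed
    for (i = 1; i <= n; i++) {
        # Remove the ": " prefix and join the elements with commas
        str = str gensub(": ", "", "g", arr[i]) ","
    }
    # Remove the trailing comma
    print(substr(str, 1, length(str)-1))
    str = ""
}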
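patsplit() and gensub() exist only in gawk, and POSIX leaves a multi-character RS unspecified, so the script above will not run as-is under mawk, BusyBox awk or BSD/macOS awk. A portable alternative is to work line by line. The sketch below is one way to do that, assuming (as in the sample) that every paragraph ends on a line containing "} on : <shard>" and that no value contains a space, tab or comma; chunks_to_csv is a hypothetical function name.

```sh
#!/bin/sh
# chunks_to_csv is a hypothetical helper: it reads the listing on stdin
# and prints one comma-separated row per paragraph, using only POSIX awk.
chunks_to_csv() {
  awk '
    {
      line = $0
      # Collect every value that follows ": " on this line, left to right;
      # a value ends at the first space, tab or comma.
      while (match(line, /: [^ \t,]+/)) {
        row = row (row == "" ? "" : ",") substr(line, RSTART + 2, RLENGTH - 2)
        line = substr(line, RSTART + RLENGTH)
      }
    }
    # Assumption: the line holding "} on : <shard>" closes a paragraph,
    # so print the accumulated row there and start a new one.
    index($0, "} on : ") { print row; row = "" }
  '
}

# Demo on the first paragraph of the sample listing
chunks_to_csv <<'EOF'
{ "ent_id" : MinKey, "_id" : MinKey } -->> {
"ent_id" : NumberLong("aaaaa"),
"_id" : ObjectId("bbbbb")
} on : shard04 Timestamp(685, 0)
EOF
```

To process the real file, run: chunks_to_csv < test.log. Sticking to match(), RSTART/RLENGTH, substr() and index() keeps the program within POSIX awk; the trade-off is that it relies on the "} on : " line to mark where each paragraph ends.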
Perl or Ruby makes this simpler:
perl -0nE 'BEGIN{$,=","}say $& =~ /: \K[^\s,]+/g while /{.*?} on : \S+/sg' test.log
ruby -ne 'BEGIN{$/=nil};$_.scan(/{.*?} on : \S+/m){|s|puts s.scan(/: \K[^\s,]+/).join(",")}' test.log