Remove duplicates from a file based on a specified match

The file I want to deduplicate contains text like the sample below. The end goal is to remove every duplicate instance from this file (one of them is web:webapi).

The file is more than 600 MB in size.

"nirmal" -> ["app:am","app:am","app:identity_gateway","app:identity_gateway","app:loginsvc","app:loginsvc","app:loginui","app:loginui","app:ticket","app:ticket","app:webapi","app:webapi","ds:config_store","ds:config_store","ds:cts_store","ds:cts_store","ds:user_store","ds:user_store","web:am","web:am","web:identity_gateway","web:identity_gateway","web:loginsvc","web:loginsvc","web:loginui","web:loginui","web:ticket","web:ticket","web:webapi","web:webapi"];
"mbl" -> ["app:phx","web:phx","app:vas","development:mobile","s2:detsvc","s2core:detsvc","txn:detsvc","web:detsvc","app:fidoproxy","app:landing","app:mobile","app:noknok","app:optchart","app:redis","app:sentinel","app:spring","cws:mesg","cws3:wsproxy","s2:billpay","s2:services","s2core:billpay","s2core:services","web:fidoproxy","web:spring","at:admin","at:eqsroll","at:oqsroll","batch:admin","cws:ctnt","cws:risk","cws:user","cws3:acctaggtr","cws3:content","cws3:risk","cws3:rtao","cws3:rtmm","ets:ord","fhs:eqs","fhs:oqs","s2:aarcomm","s2:acctcomm","s2:espsvc","s2:ibsvc","s2core:aarcomm","s2core:espsvc","s2core:ibsvc","txb:b2bsvc","txn:acct","txn:ibank2","txn:olsvc","txn:rtmm","txn:services","txn:wtools","web:aempros_mpublish","web:b2b","web:etsecxml","web:ibxml","web:olxml","web:prospect","web:tablet","web:ticket","web:wtxml","web:xmlacct","web:xmlrtmm","s2:asset","s2core:asset","app:phxcfgsvr","app:phxdshbrd","app:webapiagg","s2core:mblsvc","s2core:snapquotes","s2:mblsvc","s2:snapquotes","web:landing","web:mobile","web:phxcfgsvr","web:phxdshbrd","web:webapiagg","app:phxcfgsvr","app:phxdshbrd","app:webapiagg","s2core:mblsvc","s2core:snapquotes","s2:mblsvc","s2:snapquotes","web:landing","web:mobile","web:phxcfgsvr","web:phxdshbrd","web:webapiagg","app:phxcfgsvr","app:phxdshbrd","app:webapiagg","s2core:mblsvc","s2core:snapquotes","s2:mblsvc","s2:snapquotes","web:landing","web:mobile","web:phxcfgsvr","web:phxdshbrd","web:webapiagg","app:phxcfgsvr","app:phxdshbrd","app:webapiagg","s2core:mblsvc","s2core:snapquotes","s2:mblsvc","s2:snapquotes","web:landing","web:mobile","web:phxcfgsvr","web:phxdshbrd","web:webapiagg","app:phxcfgsvr","app:phxdshbrd","app:webapiagg","s2core:mblsvc","s2core:snapquotes","s2:mblsvc","s2:snapquotes","web:landing","web:mobile","web:phxcfgsvr","web:phxdshbrd","web:webapiagg","app:phxcfgsvr","app:phxdshbrd","app:webapiagg","s2core:mblsvc","s2core:snapquotes","s2:mblsvc","s2:snapquotes","web:landing","web:mobile","web:phxcfgsvr","web:phxdshbrd","web
:webapiagg","app:phxcfgsvr","app:phxdshbrd","app:webapiagg","s2core:mblsvc","s2core:snapquotes","s2:mblsvc","s2:snapquotes","web:landing","web:mobile","web:phxcfgsvr","web:phxdshbrd","web:webapiagg","app:phxcfgsvr","app:phxdshbrd","app:webapiagg","s2core:mblsvc","s2core:snapquotes","s2:mblsvc","s2:snapquotes","web:landing","web:mobile","web:phxcfgsvr","web:phxdshbrd","web:webapiagg"];

How can I do this on Linux?

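The whole file contains text in the same format on every line. On each line, I want to take the first string (the part before the "->" delimiter) and look through its value for duplicate comma-separated entries. Any duplicate that is found should be removed.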

Best Answer 1

With sed:

sed -e :1 -e 's/\("[^",]*"\)\(.*\),\1/\1\2/;t1'
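  • :1 is a jump label that marks the start of the loop.
  • "[^",]*" matches one field. Excluding the comma (and the quote) from the character class keeps "," from being treated as a field. Wrapping the field in \(\) lets the same field be referenced again as \1.
  • The s command removes a later occurrence of the same field, together with the comma in front of it.
  • If a substitution was made, the t command jumps back to the label 1, so the loop repeats until no duplicate is left on the line.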
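To see the loop in action, here is a minimal sketch that runs the same sed command on a shortened sample line. The wrapper name dedupe_fields and the sample line are assumptions made for illustration; they do not come from the original answer.

```shell
#!/bin/sh
# dedupe_fields is a hypothetical wrapper around the sed command from the answer.
# With no arguments it filters stdin; with file names it reads those files.
dedupe_fields() {
    sed -e :1 -e 's/\("[^",]*"\)\(.*\),\1/\1\2/;t1' "$@"
}

# Shortened sample line in the same format as the question's file (assumed data).
sample='"nirmal" -> ["app:am","app:am","web:webapi","app:am","web:webapi"];'

# Each pass of the loop removes one repeated field; sed stops when none remain.
result=$(printf '%s\n' "$sample" | dedupe_fields)
printf '%s\n' "$result"
# prints: "nirmal" -> ["app:am","web:webapi"];
```

To process a real file, the wrapper could be called as dedupe_fields input.txt > output.txt, where both file names are placeholders. sed works one line at a time, so memory use depends on the longest line rather than on the 600 MB total, but because each pass removes only one duplicate and rescans the line, very long lines with many repeats may take noticeably longer.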
