I have the following file (list_20.txt):
[{"d_prime":"0.475425","variation1":"rs909776","r2":"0.057940","variation2":"rs16991816","population_name":"1000GENOMES:phase_3:KHV"}]
[{"r2":"0.057940","variation1":"rs909776","d_prime":"0.475425","population_name":"1000GENOMES:phase_3:KHV","variation2":"rs16991819"}]
[{"variation1":"rs909776","r2":"0.078476","d_prime":"0.546491","population_name":"1000GENOMES:phase_3:KHV","variation2":"rs8114269"}]
[{"population_name":"1000GENOMES:phase_3:KHV","variation2":"rs8114269","r2":"0.073418","variation1":"rs6130034","d_prime":"0.528588"}]
[{"population_name":"1000GENOMES:phase_3:KHV","variation2":"rs1201686","r2":"0.060239","variation1":"rs3746539","d_prime":"0.271891"}]
[{"variation2":"rs1201686","population_name":"1000GENOMES:phase_3:KHV","d_prime":"0.280262","r2":"0.058212","variation1":"rs2144011"}]
[{"population_name":"1000GENOMES:phase_3:KHV","variation2":"rs10485662","r2":"0.058826","variation1":"rs844808","d_prime":"0.423639"}]
[{"variation2":"rs6065565","population_name":"1000GENOMES:phase_3:KHV","d_prime":"0.638509","r2":"0.110749","variation1":"rs6139746"}]
[{"r2":"0.110749","variation1":"rs6139746","d_prime":"0.638509","population_name":"1000GENOMES:phase_3:KHV","variation2":"rs6072936"}]
[{"population_name":"1000GENOMES:phase_3:KHV","variation2":"rs6065562","variation1":"rs6139746","r2":"0.091021","d_prime":"0.606214"}]
[{"variation1":"rs6139746","r2":"0.910749","d_prime":"0.638509","population_name":"1000GENOMES:phase_3:KHV","variation2":"rs6072937"}]
...
I want to extract only the lines where the value following "r2":" is greater than 0.7 and at most 1.
The expected output for this example is the following line:
[{"variation1":"rs6139746","r2":"0.910749","d_prime":"0.638509","population_name":"1000GENOMES:phase_3:KHV","variation2":"rs6072937"}]
I tried this:
awk '$NF >= 0.8 && $NF <1 {print $0}' list_20.txt > 20.out
However, I get an empty file. Moreover, this command is not restricted to the string of interest, "r2".
Best answer 1
Since this looks like JSON, use a command-line JSON parser such as jq:
$ jq '.[] | select((.r2|tonumber) > 0.7 and (.r2|tonumber) <= 1)' file
{
"variation1": "rs6139746",
"r2": "0.910749",
"d_prime": "0.638509",
"population_name": "1000GENOMES:phase_3:KHV",
"variation2": "rs6072937"
}
You need tonumber to convert the value of the r2 key from a string to a proper number, but otherwise this is a simple select() filter.
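To see why the conversion matters, here is a small sketch. The function name compare_r2 is made up for illustration; it assumes jq is installed and relies on jq's documented ordering, in which any string sorts after any number. `--arg` always passes its value as a string, which mimics the quoted "r2" values in the file.

```shell
# Hypothetical helper: compare one r2 value against 0.7,
# once as a raw string and once after tonumber.
compare_r2() {
  jq -nc --arg r2 "$1" \
    '{as_string: ($r2 > 0.7), as_number: (($r2 | tonumber) > 0.7)}'
}
```

Calling `compare_r2 0.057940` should print `{"as_string":true,"as_number":false}`: compared as a string, the value counts as greater than 0.7 simply because it is a string, so a filter without tonumber would not test the number at all.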
You can shorten this slightly, or at least avoid converting each number twice:
jq '.[] | (.r2|tonumber) as $r2 | select($r2 > 0.7 and $r2 <= 1)' file
If the result should be in the same format as the input, do this:
$ jq -c '.[] | (.r2|tonumber) as $r2 | select($r2 > 0.7 and $r2 <= 1) | [.]' file
[{"variation1":"rs6139746","r2":"0.910749","d_prime":"0.638509","population_name":"1000GENOMES:phase_3:KHV","variation2":"rs6072937"}]
That is, request "compact output" with -c, and use [.] to wrap each result extracted by the select() filter in an array.
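As for why the awk attempt in the question prints nothing: awk splits fields on whitespace, and these lines contain none, so $NF is the entire line rather than the r2 value. If jq is not available, something like the following might work as a fallback. It is only a sketch, not a JSON parser: it assumes one record per line and that r2 is always written as "r2":"<digits>". The function name filter_r2 is made up for illustration.

```shell
# Hypothetical fallback: print lines whose r2 value is > 0.7 and <= 1.
# Reads the files given as arguments, or standard input if there are none.
filter_r2() {
  awk 'match($0, /"r2":"[0-9.]+"/) {
         # The prefix "r2":" is 6 characters; the match also covers the closing quote.
         v = substr($0, RSTART + 6, RLENGTH - 7) + 0
         if (v > 0.7 && v <= 1) print
       }' "$@"
}

# Two records taken from the question; only the second should be printed.
filter_r2 <<'EOF'
[{"population_name":"1000GENOMES:phase_3:KHV","variation2":"rs6065562","variation1":"rs6139746","r2":"0.091021","d_prime":"0.606214"}]
[{"variation1":"rs6139746","r2":"0.910749","d_prime":"0.638509","population_name":"1000GENOMES:phase_3:KHV","variation2":"rs6072937"}]
EOF
```

On the real file this would be `filter_r2 list_20.txt > 20.out`. The jq version remains the safer choice, since it does not depend on how the JSON happens to be laid out.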