Extract specific attribute names and their values from an HTML/XML file

Sample input:

<bre rt="1600" et="1550794901464" st="1550794899864" tid="8390500116294391399" mh="N" cn="" lc="" ts="N/A" cidc="" IDC="" eidc="BRE-S-TRA-0085418501"/>
    <r1>
        <gr1>
            <a="1" b="smaple data with spaces" c="Created TrasctionInfo" d="1550794901228"/>
            <e="INITIAL" f="2" g="INITIAL_LEGACY" h="1550794901228" i="LegacyToggle is off. Follow Legacy flow"/>
            <lx ets="2019-02-22T00:21:41.228Z" trxn="smaple data with spaces 2 record" rn="Derive data" abc="COT def" def="Season occur" trxn="smaple data with spaces 3rd record" den="andys and others" trxn="smaple data with spaces 4th record" kit="Theater - Span day"
             rns="Span day" trxn="smaple data with spaces 5th record" off="|"/>
            <cwl wc="2.0766" tot="16" act="116.28960000000001" CSE="CHE-CSFL" wg1.0" high="1" </cwl>
                </gr1>
            </r1>
</bre>
<bre rt="1234" et="1234794901464" st="1234794899864" tid="2345500116294391399" mh="Y" cn="At123" lc="" ts="NA" cidc="" IDC="some text value" eidc="abc-def-gh-2385418501"/>
    <r1>
        <gr1>
            <a="1" trxn="other data with spaces" c="Created Info" d="3434794545228"/>
            <e="begin" f="2" g="INITIAL_LEGACY" h="1234709901228" i="Toggle hig. Follow toggle flow"/>
            <lx ets="2017-02-22T00:21:41.228Z" trxn="another record data" rn="Derive data" abc="COT def" trxn="smaple data with spaces record" def="Season occur" den="andys and others" trxn="smaple data with spaces 4th record" kit="Theater - Span day"
             rns="Span day" trxn="data with spaces" off="|"/>
            <cwl wc="2.0766" tot="16" act="116.28960000000001" CSE="CHE-CSFL" wg1.0" high="1" </cwl>
                </gr1>
            </r1>
</bre>
<bre rt="1234" et="1234794901464" st="1234794899864" tid="2345500116294391399" mh="Y" cn="At123" lc="" ts="NA" cidc="" IDC="some text value" eidc="abc-def-gh-2385418501"/>
    <r1>
        <gr1>
            <a="1" c="Created transaction" b="3434794545228"/>
            <e="begin" f="2" g="INITIAL_LEGACY" h="1234709901228" i="Toggle hig. Follow toggle flow"/>
            <lx ets="2017-02-22T00:21:41.228Z" rn="Derive data" abc="COT def" def="Season occur" den="andys and others" kit="Theater - Span day"
             rns="Span day" off="|"/>
            <cwl wc="2.0766" tot="16" act="116.28960000000001" CSE="CHE-CSFL" wg1.0" high="1" </cwl>
                </gr1>
            </r1>
</bre>

The output should look like this:

tid="8390500116294391399"
ts="N/A"
ets="2019-02-22T00:21:41.228Z" 
trxn="smaple data with spaces 2 record"
trxn="smaple data with spaces 3rd record"
trxn="smaple data with spaces 5th record"
tid="2345500116294391399"
ts="NA"
ets="2017-02-22T00:21:41.228Z" 
trxn="other data with spaces"
trxn="another record data"
trxn="smaple data with spaces record"
trxn="data with spaces"
tid="2345500116294391399"
ts="NA"
ets="2017-02-22T00:21:41.228Z"

This is what I tried:

sed -e 's/trxn=/\ntrxn=/g' -e 's/tid=/\ntid=/g' -e 's/ts=/\nts=/g'

while IFS= read -r var
do
    if grep -Fxq "$trxn" temp2.txt
    then
      awk -F"=" '/tid/{print VAL=$i} /ts/{print VAL=$i} /ets/{print VAL=$i} /trxn/{print VAL=$i} /tid/{print VAL=$i;next}' temp2.txt >> out.txt
    else
      awk -F"=" '/tid/{print VAL=$i} /ts/{print VAL=$i} /ets/{print VAL=$i} /tid/{print VAL=$i;next}' temp2.txt >> out.txt
    fi
done < "$input"

Best answer 1

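Alternatively, use grep. With -E the pattern is an extended regular expression, and -o prints only the matched key="value" pairs, one per line, in the order they appear in the file: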

$ grep -Eo '(ets|tid|trxn|ts)="[^"]+"' file
tid="8390500116294391399"
ts="N/A"
ets="2019-02-22T00:21:41.228Z"
trxn="smaple data with spaces 2 record"
trxn="smaple data with spaces 3rd record"
trxn="smaple data with spaces 4th record"
trxn="smaple data with spaces 5th record"
tid="2345500116294391399"
ts="NA"
trxn="other data with spaces"
ets="2017-02-22T00:21:41.228Z"
trxn="another record data"
trxn="smaple data with spaces record"
trxn="smaple data with spaces 4th record"
trxn="data with spaces"
tid="2345500116294391399"
ts="NA"
ets="2017-02-22T00:21:41.228Z"
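
One thing to watch with the pattern above is that a short key such as ts would also match the tail of a longer attribute name if the file ever contained one (for example a hypothetical xts="..."). The following is a minimal sketch, assuming GNU grep, that wraps the same command in a helper and adds -w so that only whole attribute names match. The function name extract_attrs and its arguments are made up for illustration and are not part of the original answer.

```shell
# Hypothetical helper (not from the original answer).
# $1: '|'-separated list of attribute names, e.g. 'ets|tid|trxn|ts'
# $2: input file
extract_attrs() {
    # -E: extended regular expressions
    # -o: print only the matched text, one match per line
    # -w: a match must be neither preceded nor followed directly by a
    #     word character, so ts="..." is not cut out of xts="..."
    grep -Eow "($1)=\"[^\"]+\"" "$2"
}
```

It would be called as extract_attrs 'ets|tid|trxn|ts' file. Like the original command, it skips attributes with empty values, because [^"]+ requires at least one character between the quotes.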
