Moving a carriage return character added when using join

I am trying to join two pipe-delimited files with the following join command:

join -a 1 -i -t"|" -o 1.3 1.1 2.2 1.4 1.5 2.3 2.4 2.5 2.6 2.7 2.8 2.9  <(sort -d -t"|" -z  alt.csv) <(sort -d -t"|" -z  ../original/alt.csv) > ../out/alt.csv

The output file has a carriage return character at the point where the join occurs. For example:

IRN|EADUnitID|EADPhysicalTechnical|AdmPublishWebNoPassword|AdmPublishWebPassword
|EADUnitTitle|EADBiographyOrHistory|EADScopeAndContent|EADArrangement|EADAcquisitionInformationRef|EADRelatedMaterial|BibBibliographyRef_tab
51899|ga.1.1|GLS Add. GA 1/1|Yes|Yes
|Photographic negatives ||&lt;p&gt;The albums comprise of negatives of Gypsies and Gypsy life in Germany and eastern Europe. The albums have been indexed and the negatives numbered by Althaus in series I-IV; VII-VIII, though numbering is not continuous. The majority of the negatives have duplicates in slide or photograph format (GA 1/2 and GA 3) and reference has been made to these. The captions are those taken from the index except for unindexed negatives, whereupon the caption has been taken from a duplicate photograph or slide. Where there is no duplicate, the caption simply describes what can be seen in the negative. The list also includes 22 negatives that are indexed in the albums but are missing. There is a closed section from GA 1/1/53 - GA 1/1/68 due to the sensitive nature of the negatives. &lt;&#x0002F;p&gt;||||
51900|ga.1.1.1|GLS Add. GA 1/1/1|Yes|Yes
|Ehepaar Weltzel. ||||||
51901|ga.1.1.2|GLS Add. GA 1/1/2|Yes|Yes
|Ehepaar Weltzel. ||||||
51902|ga.1.1.3|GLS Add. GA 1/1/3|Yes|Yes
|Roßlau, Dessauerstr Kegli. Julius Braun, Bitterfield, 1939 Koitsch. ||||||
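
One way to confirm that a stray carriage return is what splits each record is to dump the bytes of the joined file. The following is a minimal sketch on a hypothetical two-line sample (not the real `alt.csv`), assuming a POSIX shell with `od` and `grep` available:

```shell
set -eu

# Hypothetical sample: the first line has a CR in the middle, the second does not.
sample=$(mktemp)
printf '51900|ga.1.1.1|Yes\r|Ehepaar Weltzel.\n51901|ga.1.1.2|Yes|Ehepaar Weltzel.\n' > "$sample"

# od -c prints every byte; a carriage return is shown as \r.
od -c "$sample" | head -n 5

# Count the lines that contain a CR anywhere.
cr=$(printf '\r')
lines_with_cr=$(grep -c "$cr" "$sample" || true)
echo "lines containing CR: $lines_with_cr"
```

If `\r` appears directly before a `|` in the dump, the carriage return most likely came from the CRLF line ending of the first input file.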

For the file to be processed correctly, however, the carriage return needs to come after the last column:

IRN|EADUnitID|EADPhysicalTechnical|AdmPublishWebNoPassword|AdmPublishWebPassword|EADUnitTitle|EADBiographyOrHistory|EADScopeAndContent|EADArrangement|EADAcquisitionInformationRef|EADRelatedMaterial|BibBibliographyRef_tab
51899|ga.1.1|GLS Add. GA 1/1|Yes|Yes|Photographic negatives ||&lt;p&gt;The albums comprise of negatives of Gypsies and Gypsy life in Germany and eastern Europe. The albums have been indexed and the negatives numbered by Althaus in series I-IV; VII-VIII, though numbering is not continuous. The majority of the negatives have duplicates in slide or photograph format (GA 1/2 and GA 3) and reference has been made to these. The captions are those taken from the index except for unindexed negatives, whereupon the caption has been taken from a duplicate photograph or slide. Where there is no duplicate, the caption simply describes what can be seen in the negative. The list also includes 22 negatives that are indexed in the albums but are missing. There is a closed section from GA 1/1/53 - GA 1/1/68 due to the sensitive nature of the negatives. &lt;&#x0002F;p&gt;||||
51900|ga.1.1.1|GLS Add. GA 1/1/1|Yes|Yes|Ehepaar Weltzel. ||||||
51901|ga.1.1.2|GLS Add. GA 1/1/2|Yes|Yes|Ehepaar Weltzel. ||||||
51902|ga.1.1.3|GLS Add. GA 1/1/3|Yes|Yes|Roßlau, Dessauerstr Kegli. Julius Braun, Bitterfield, 1939 Koitsch. ||||||

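Is there a way to get the desired result using sed or awk? Would I first need to add another pipe at the end of the last column and then replace based on the number of occurrences?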

Best answer

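I found a solution, though it is not particularly elegant. I decided to add an extra pipe to the second file before joining, because that lets me do some additional processing to get the format right.

The steps I now take are: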

    # add pipe to the end of the line for ORIGINAL files only
    sed -i 's/$/|/' ../original/alt.csv

    # ... run the join here and write the joined file to ../out/alt.csv ...

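    # add a carriage return after the last pipe, i.e. at the end of the line
    # (note: in GNU sed, \| is alternation and \0 is the whole match, so this
    # matches the entire line rather than a literal pipe)
    sed -i 's/\(.*\)\|/\0\r/' ../out/alt.csv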

    # remove carriage return where join occurred (the use of pipe is simply to locate carriage return) and replace with pipe
    sed -i 's/\r|/|/' ../out/alt.csv

    # remove all blank lines 
    sed -i '/^\s*$/d' ../out/alt.csv

    # remove pipe at the end of the line of output file and add a carriage return
    sed -i 's/[^\r\n].$/\r/' ../out/alt.csv 
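
If the only goal of the cleanup is to end up with exactly one carriage return at the end of each non-blank line, the sed passes might be condensed into a single awk pass. This is a rough sketch on hypothetical sample data, not a drop-in replacement for the steps above: it assumes that every carriage return in the joined file is unwanted, and it does not deal with the extra trailing pipe.

```shell
set -eu

# Hypothetical joined output: a CR in the middle of each record, plus a blank line.
joined=$(mktemp)
printf '51900|ga.1.1.1|Yes|Yes\r|Ehepaar Weltzel. ||||||\n\n51901|ga.1.1.2|Yes|Yes\r|Ehepaar Weltzel. ||||||\n' > "$joined"

cleaned=$(mktemp)
# gsub removes every CR from the line; the NF condition skips lines that are
# blank after that; printf writes the line back with a single CRLF ending.
awk '{ gsub(/\r/, "") } NF { printf "%s\r\n", $0 }' "$joined" > "$cleaned"

# Show the tail of the result byte by byte to check where the \r ended up.
od -c "$cleaned" | tail -n 3
```

Unlike `sed -i`, this writes to a separate file, so the joined file stays available for comparison.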

If there is a simpler way to achieve this, I would be glad to hear it.
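
One possibly simpler route, assuming the carriage returns come from CRLF line endings in the input files, is to strip them before the join and add them back once at the very end. The sketch below uses two small hypothetical files (`left.csv` and `right.csv`) in place of the real `alt.csv` files and a plain inner join on the first field, so the `-a`, `-i`, and `-o` options from the question would still need to be added back.

```shell
set -eu

workdir=$(mktemp -d)
# Hypothetical inputs with CRLF line endings, standing in for the two alt.csv files.
printf '1|a|x\r\n2|b|y\r\n' > "$workdir/left.csv"
printf '1|title one\r\n2|title two\r\n' > "$workdir/right.csv"

# tr -d '\r' deletes every carriage return, so no field carries one into join.
tr -d '\r' < "$workdir/left.csv"  | sort -t'|' -k1,1 > "$workdir/left.sorted"
tr -d '\r' < "$workdir/right.csv" | sort -t'|' -k1,1 > "$workdir/right.sorted"

join -t'|' "$workdir/left.sorted" "$workdir/right.sorted" > "$workdir/joined.txt"
cat "$workdir/joined.txt"

# If the consumer needs CRLF endings, append one CR per line as the last step.
awk '{ printf "%s\r\n", $0 }' "$workdir/joined.txt" > "$workdir/out.csv"
```

Because the carriage returns never reach join, no pipe needs to be added and removed afterwards.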
