sed: insert something before the next line that is the same, but not identical

2024-06-28

I have a LaTeX file with one glossary entry per line:

...
\newglossaryentry{ajahn}{name=Ajahn,description={\textit{(Thai)} From the Pali \textit{achariya}, a Buddhist monk's preceptor: `teacher'; often used as a title of the senior monk or monks at monastery. In the West, the forest tradition uses it for all monks and nuns of more than ten years' seniority}}
\newglossaryentry{ajivaka}{name={\=Aj\={\i}vaka},description={Sect of contemplatives contemporary with the Buddha who held the view that beings have no volitional control over their actions and that the universe runs according to fate and destiny}}
...

Here I am only interested in the \newglossaryentry{label} part of each line.

The lines of the file have been sorted with sort, so duplicate labels show up next to each other like this:

\newglossaryentry{anapanasati}{name=\=an\=ap\=anasati,description={`Awareness of inhalation and exhalation'; using the breath, as a mediation object},sort=anapanasati}
\newglossaryentry{anapanasati}{name={\=an\=ap\=anasati},description={Mindfulness of breathing. A meditation practice in which one maintains one's attention and mindfulness on the sensations of breathing. \textbf{[MORE]}}}

How can I use sed to insert a line before a repeated label in this file?

#!/bin/sh

cat glossary.tex | sed '
/\\newglossaryentry[{][^}]*[}]/{
    N;
    s/^\(\\newglossaryentry[{][^}]*[}]\)\(.*\)\n\1/% duplicate\n\1\2\n\1/;
}' > glossary.sed.tex

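I got as far as the command above, but it has a flaw: it reads lines into the pattern space in pairs, so it only works when the two duplicates happen to be read as the same pair.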

For example, the following will not be matched:

\newglossaryentry{abhinna}{name={abhi\~n\~n\=a},description={Intuitive powers that come from the practice of concentration: the ability to display psychic powers, clairvoyance, clairaudience, the ability to know the thoughts of others, recollection of past lifetimes, and the knowledge that does away with mental effluents (see \textit{asava}).}}
\newglossaryentry{acariya}{name={\=acariya},description={Teacher; mentor. See \textit{kalyanamitta.}}}
\newglossaryentry{acariya}{name=\=acariya,description={Teacher},see=Ajahn}
\newglossaryentry{adhitthana}{name={adhi\d{t}\d{t}h\=ana},description={Determination; resolution. One of the ten perfections \textit{(paramis).}}}

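This is because it first reads the lines for abhinna and acariya as one pair, and then reads the second acariya line together with adhitthana, so the two acariya lines are never compared.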

I think it needs some extra sed magic with the hold space and conditional printing of lines, but I can't figure it out.

Best answer

This is rather complicated for sed and is more of a job for awk or Perl. Here is a script that finds consecutive duplicate entries (non-matching lines are still allowed in between them):

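perl -l -pe '
    # Capture the label inside the first pair of braces.
    if (/^ *\\newglossaryentry[* ]*{([^{}]*)}/) {
        # Print a marker line when the label repeats the previous entry.
        print "% duplicate" if $1 eq $prev;
        $prev = $1;
    }'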
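
Since awk is named as an alternative, here is a minimal awk sketch of the same check, offered as an illustration rather than as part of the original answer. The function name mark_duplicates is hypothetical. It assumes the label is always the first brace group on the line, and it reads from stdin and writes to stdout.

```sh
#!/bin/sh
# mark_duplicates is a hypothetical helper name, not part of the original answer.
# It reads glossary lines on stdin and writes the marked-up result to stdout.
mark_duplicates() {
    awk -F'[{}]' '
        # With { and } as field separators, $1 is the text before the first
        # brace and $2 is the label inside it.
        NF >= 3 && $1 ~ /^ *\\newglossaryentry[* ]*$/ {
            label = $2 ""    # concatenation forces a string comparison
            if (have_prev && label == prev) print "% duplicate"
            prev = label
            have_prev = 1
        }
        { print }            # every input line is passed through unchanged
    '
}
```

It could be run as: mark_duplicates < glossary.tex > glossary.marked.tex (the output file name is only an example). As with the Perl script above, lines that are not glossary entries do not reset the remembered label.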

Duplicates can just as easily be detected in unsorted input:

perl -l -pe '
    if (/^ *\\newglossaryentry[* ]*{([^{}]*)}/) {
        # %seen counts every label read so far, so input order does not matter.
        print "% duplicate" if $seen{$1};
        ++$seen{$1};
    }'

It is also easy to restrict the check to strictly consecutive lines, so that any non-entry line in between resets it:

perl -l -pe '
    if (/^ *\\newglossaryentry[* ]*{([^{}]*)}/) {
        print "% duplicate" if $1 eq $prev;
        $prev = $1;
    } else {
        # A non-entry line breaks the run, so forget the previous label.
        undef $prev;
    }'
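
For completeness, the hold-space approach the question asks about can be sketched in sed as well. This is an illustration under stated assumptions, not the answerer's method. The function name mark_duplicates_sed is hypothetical. It assumes GNU sed, because it uses \n in the replacement just as the question's own script does, and it assumes each entry starts at the beginning of the line. Its behaviour corresponds to the first Perl script: non-entry lines pass through and do not reset the remembered label.

```sh
#!/bin/sh
# mark_duplicates_sed is a hypothetical helper name used only for this sketch.
# Assumes GNU sed (for \n in the replacement). Reads stdin, writes stdout.
mark_duplicates_sed() {
    sed '
        # Non-entry lines pass through; the remembered prefix is left alone.
        /^\\newglossaryentry{[^{}]*}/!b
        # Append the remembered prefix from the hold space: line, newline, prefix.
        G
        # Same prefix as the previous entry: drop the appended copy, add a marker.
        /^\(\\newglossaryentry{[^{}]*}\).*\n\1$/{
            s/\(.*\)\n.*/% duplicate\n\1/
            b
        }
        # Different label: drop the appended copy, then remember the new prefix.
        s/\(.*\)\n.*/\1/
        h
        x
        s/^\(\\newglossaryentry{[^{}]*}\).*/\1/
        x
    '
}
```

The hold space only ever contains the \newglossaryentry{label} prefix of the most recent entry, so each line is compared against its predecessor instead of being read in fixed pairs.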
