How do I compare the same words across multiple files?

Answer 1

Try this:

# Declare the files you want to include
files=( file* )

# Function to find common words in any number of files
wcomm() {
    # If no files provided, exit the function.
    [ $# -lt 1 ] && return 1
    local common_words next_words
    # Extract words from the first file; lowercase them before sorting so
    # that comm receives sorted, duplicate-free input
    common_words=$(grep -o '\w\+' "$1" | tr '[:upper:]' '[:lower:]' | sort -u)
    while [ $# -gt 1 ]; do
        # Shift $1 to the next file
        shift
        # Extract words from the next file in the same way
        next_words=$(grep -o '\w\+' "$1" | tr '[:upper:]' '[:lower:]' | sort -u)
        # Keep only the words present in both $common_words and $next_words
        common_words=$(comm -12 <(echo "$common_words") <(echo "$next_words"))
    done
    # Output the words common to all input files
    echo "$common_words"
}

# Output number of matches for each of the common words in total and per file
for w in $(wcomm "${files[@]}"); do
    echo "$w:$(grep -oiw -- "$w" "${files[@]}" | wc -l)"
    for f in "${files[@]}"; do
        echo "$f:$(grep -oiw -- "$w" "$f" | wc -l)"
    done
    echo
done

Output:

beautiful:3
file1:1
file2:1
file3:1

so:3
file1:1
file2:1
file3:1

Explanation:

Included as comments in the script.

Features:

Handles as many files as ARG_MAX allows.
Finds every word delimited by characters that grep recognizes as word separators.
Case is ignored, so "beautiful" and "Beautiful" count as the same word.
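The case-insensitive intersection at the heart of the script can also be tried in isolation. The following is only a minimal sketch, assuming GNU grep (for `\w`) and a POSIX shell; the helper name words_of and the sample file names a.txt and b.txt are hypothetical and not part of the original answer. It lowercases the words before `sort -u`, because comm expects both inputs to be sorted in the same order.

```shell
# Hypothetical sample data in a scratch directory
tmp=$(mktemp -d)
printf 'So Beautiful a day\n' > "$tmp/a.txt"
printf 'a beautiful night, so calm\n' > "$tmp/b.txt"

# words_of (hypothetical helper): one lowercased word per line, sorted, unique
words_of() {
    grep -o '\w\+' "$1" | tr '[:upper:]' '[:lower:]' | sort -u
}

# comm compares two sorted files; -12 keeps only the lines found in both
words_of "$tmp/a.txt" > "$tmp/a.words"
words_of "$tmp/b.txt" > "$tmp/b.words"
common=$(comm -12 "$tmp/a.words" "$tmp/b.words")
echo "$common"

# Clean up the scratch directory
rm -rf "$tmp"
```

With these two sample files, the words shared regardless of case are "a", "beautiful" and "so".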
