繰り返される数字で始まる整数のサブシーケンスの抽出

Question

目的のタスクを実行するPythonスクリプトは次のとおりです。

#!/usr/bin/env python2
# -*- coding: ascii -*-
"""extract_subsequences.py"""

import sys
import re

# Open the file
with open(sys.argv[1]) as file_handle:

    # Read the data from the file
    # Remove white-space and ignore non-integers
    numbers = [
        line.strip()
        for line in file_handle.readlines()
        if re.match("^\d+$", line) 
    ]

    # Set a lower bound so that we can output multiple lists
    lower_bound = 0
    while lower_bound < len(numbers)-1:

        # Find the "start index" where the same number
        # occurs twice at consecutive locations
        start_index = -1 
        for i in range(lower_bound, len(numbers)-1):
            if numbers[i] == numbers[i+1]:
                start_index = i
                break

        # If a "start index" is found, print out the two rows
        # values and the next 10 rows as well
        if start_index >= lower_bound:
            upper_bound = min(start_index+12, len(numbers))
            print(' '.join(numbers[start_index:upper_bound]))

            # Update the lower bound
            lower_bound = start_index + 1

        # If no "start index" is found then we're done
        else:
            break

データがというディレクトリにあると仮定すると、data.txt次のようにこのスクリプトを実行できます。

python extract_subsequences.py data.txt

入力ファイルがdata.txt次のようになるとします。

その後、出力は次のようになります。

1 1 1 2 3 4 5 6 7 8 9 10
1 1 2 3 4 5 6 7 8 9 10 11

出力をファイルに保存するには、出力リダイレクトを使用します。

python extract_subsequences.py data.txt > output.txt

Answer 1

目的のタスクを実行するPythonスクリプトは次のとおりです。

#!/usr/bin/env python2
# -*- coding: ascii -*-
"""extract_subsequences.py"""

import sys
import re

# Open the file
with open(sys.argv[1]) as file_handle:

    # Read the data from the file
    # Remove white-space and ignore non-integers
    numbers = [
        line.strip()
        for line in file_handle.readlines()
        if re.match("^\d+$", line) 
    ]

    # Set a lower bound so that we can output multiple lists
    lower_bound = 0
    while lower_bound < len(numbers)-1:

        # Find the "start index" where the same number
        # occurs twice at consecutive locations
        start_index = -1 
        for i in range(lower_bound, len(numbers)-1):
            if numbers[i] == numbers[i+1]:
                start_index = i
                break

        # If a "start index" is found, print out the two rows
        # values and the next 10 rows as well
        if start_index >= lower_bound:
            upper_bound = min(start_index+12, len(numbers))
            print(' '.join(numbers[start_index:upper_bound]))

            # Update the lower bound
            lower_bound = start_index + 1

        # If no "start index" is found then we're done
        else:
            break

データがというディレクトリにあると仮定すると、data.txt次のようにこのスクリプトを実行できます。

python extract_subsequences.py data.txt

入力ファイルがdata.txt次のようになるとします。

その後、出力は次のようになります。

1 1 1 2 3 4 5 6 7 8 9 10
1 1 2 3 4 5 6 7 8 9 10 11

出力をファイルに保存するには、出力リダイレクトを使用します。

python extract_subsequences.py data.txt > output.txt

繰り返される数字で始まる整数のサブシーケンスの抽出

ベストアンサー1

おすすめ記事