Compare one column of a file against all columns of another file

Answer 1

Since this is under the Python tag, here are two solutions: one that uses only the Python 3 standard library, and one that uses numpy.

Standard library

with open('needle') as f:
    needle = [int(line.strip()) for line in f]

with open('haystack') as f:
    haystack = [[int(val) for val in line.strip().split()] for line in f]
    # transpose, so that each entry of haystack is one column of the file
    haystack = [list(row) for row in zip(*haystack)]

count = haystack.count(needle)
indices = [i for i, row in enumerate(haystack) if row == needle]

print('count:', count)
print('indices:', indices)
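
To see what the transpose-and-compare step in the standard-library version does, here is a minimal in-memory sketch. The `find_column` helper and the sample data are made up for illustration, and `io.StringIO` stands in for the `needle` and `haystack` files.

```python
import io


def find_column(needle_file, haystack_file):
    """Return the indices of the haystack columns equal to the needle column."""
    needle = [int(line) for line in needle_file if line.strip()]
    rows = [[int(val) for val in line.split()]
            for line in haystack_file if line.strip()]
    # zip(*rows) regroups the values by column instead of by row.
    columns = [list(col) for col in zip(*rows)]
    return [i for i, col in enumerate(columns) if col == needle]


# Hypothetical sample data: a 3-value needle and a 3-row, 4-column haystack.
needle_file = io.StringIO("1\n0\n1\n")
haystack_file = io.StringIO("1 0 1 1\n0 0 0 1\n1 1 1 0\n")

indices = find_column(needle_file, haystack_file)
print('count:', len(indices))
print('indices:', indices)
```

With this data, columns 0 and 2 both read 1, 0, 1 from top to bottom, so they are the ones reported.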

`numpy`

import numpy


needle = numpy.loadtxt('needle', dtype=int)
haystack = numpy.loadtxt('haystack', dtype=int).transpose()
match = (haystack == needle).all(-1)

count = numpy.count_nonzero(match)
indices = numpy.flatnonzero(match)

print('count:', count)
print('indices:', indices)
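
The numpy version relies on broadcasting: comparing the transposed 2-D array with the 1-D needle compares every column against the needle at once. A small sketch with made-up in-memory arrays (instead of files read with `loadtxt`) shows the intermediate shapes.

```python
import numpy

# Hypothetical sample data: a 3-value needle and a 3-row, 4-column haystack.
needle = numpy.array([1, 0, 1])
haystack = numpy.array([[1, 0, 1, 1],
                        [0, 0, 0, 1],
                        [1, 1, 1, 0]])

columns = haystack.transpose()       # shape (4, 3): one row per original column
equal = columns == needle            # needle is broadcast against each row
match = equal.all(axis=-1)           # True where all 3 values of a column agree
indices = numpy.flatnonzero(match)   # positions of the True entries

print('match:', match)
print('indices:', indices)
```

Here `match` has one boolean per column of the original file, so counting its nonzero entries gives the number of matching columns.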

Test data

For testing, I used the following generator to produce 1,000,000 columns of ones and zeros.

import numpy


# 10 rows by 1,000,000 columns of random zeros and ones
arr = numpy.random.choice([0, 1], size=(10, 1000000))

# write the array tab-separated, one row per line
numpy.savetxt('generated', arr, fmt='%i', delimiter='\t')

Timings:

$ uname -a
Linux localhost 3.10.103-g35adc8d #1 SMP PREEMPT Wed Jun 27 20:11:35 UTC 2018 aarch64 GNU/Linux
$ time python search_stdlib.py >/dev/null

real    0m16.326s
user    0m14.867s
sys     0m0.617s
$ time python search_numpy.py >/dev/null

real    0m11.006s
user    0m10.487s
sys     0m0.307s
