Merging sorted files on two columns

Question

Solution in TXR Lisp:

$ txr merge.tl 
ancient-american    mercury 1   164
ancient-american    mercury 1   160
ancient-american    mh25    2   8717664
ancient-american    mh25    2   10362712888
ancient-neolith tk11    262 40074321970
ancient-neolith tk11    264 43842268110
ancientdna  jk21    6936    17069206689
ancientdna  jk21    6919    16379509855
ancientdna  rm20    11267   372606702813
ancientdna  rm20    11268   324906365415
ancientgen  ab34    1573    27800468142
ancientgen  ab34    1577    33947364202
ancientgen  dg11    3516    45081427920
ancientgen  dg11    3518    48092138390
ancientgen  fa8 7179    462396221983
ancientgen  fa8 7174    472364587220
ancientgen  mp15    41  10248223517
ancientgen  mp15    39  32487920045
ancientgen  mp18    254 1049351143
ancientgen  mp18    254 1058177852
ancientgen  rm20    15100   1565340401
ancientgen  rm20    15104   998615135
ancientgen  tc9 1695    89861489631
ancientgen  tc9 1692    94858351562

Code:

(defstruct record ()
  key
  line
  (:method equal (me) me.key))

(defun read-recs (file)
  (build
    (awk (:set fs "\t")
         (:inputs file)
         (t (add (new record
                      key [f 0..2]
                      line rec))))))

(mapdo [chain .line put-line] (merge (read-recs "bygroup.0") (read-recs "bygroup.1")))

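To hold the information about each record in the files, we define a structure type called record, which contains the sort key and the raw line of text. The key slot will be a list of two strings.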

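This record type has an equal method, which implements equality substitution. That is, whenever a record object is passed to the equal function, or compared using less or greater, the value returned by that method is used in place of the object. For instance, if we sort a list of record structures without specifying a comparison function, the structures are sorted by their keys.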
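
A minimal sketch of equality substitution at work, repeating the record type from the code above so that it stands alone. The keys and line strings here are made-up sample data, not rows from the input files. If substitution behaves as described, the record with key ("a" "x") should be printed first.

```txr
(defstruct record ()
  key
  line
  (:method equal (me) me.key))

;; Sample data only: two records deliberately listed out of key order.
(let ((recs (list (new record key '("b" "y") line "second")
                  (new record key '("a" "x") line "first"))))
  ;; No comparison function is passed to sort, so the objects are
  ;; compared through the keys returned by their equal method.
  (mapdo [chain .line put-line] (sort recs)))
```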

The read-recs function uses the awk macro, with the field separator fs set to "\t" (tab). For each record, the t condition (unconditional truth) dispatches an action which creates a record object. The key is a sublist of the f (field) list, consisting of the first two fields. The line is given as rec, the whole record; rec is like $0 in standard Awk.

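The build macro is used for implicit, procedural list construction. For instance, (build (add 1) (add 2)) returns the list (1 2). build creates a scope in which calls to add append to an implicit hidden list, which is returned when build terminates.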
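
As a small illustration of why this is handy, add can be called from inside a loop nested in build, which is the same shape as the awk action in read-recs. The sketch below uses made-up numeric data and should evaluate to the list (1 4 9).

```txr
;; Collect the squares of a few sample numbers.
;; each yields nil here; the list comes from build itself.
(build
  (each ((x '(1 2 3)))
    (add (* x x))))
```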

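Because equality substitution gives us a record type that keys itself correctly, all we have to do is read both files with read-recs and pass the results to the merge function to obtain a sorted list.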
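
For reference, here is a sketch of merge on plain sample data. Both arguments are assumed to be sorted already, just as the two input files are. With no comparison function given, the result should be (1 2 3 4 5 6).

```txr
;; Fresh lists are built with list rather than quoted literals,
;; in case the merge operation reuses list structure.
(merge (list 1 3 5) (list 2 4 6))
```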

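The record objects of this list are mapped through a chain of two functions: [chain .line put-line]. This function retrieves the line slot of the object via .line and passes it to put-line, which dumps it to standard output followed by a newline character.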
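
To make the chained function more concrete, the sketch below compares it with an explicit lambda that should have the same effect. The stripped-down record type and the line strings are placeholders for illustration only.

```txr
;; Simplified stand-in for the record type; no equal method is needed here.
(defstruct record ()
  key
  line)

(let ((recs (list (new record line "first line")
                  (new record line "second line"))))
  ;; Explicit form: fetch the line slot, then print it.
  (mapdo (lambda (r) (put-line r.line)) recs)
  ;; Chained form, as used in the answer.
  (mapdo [chain .line put-line] recs))
```

Each sample line should appear twice in the output, once per mapdo call.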

Here is how the read-recs function can be implemented without using build and awk:

(defun read-recs (file)
  (collect-each ((line (file-get-lines file)))
    (let ((fields (spl #\tab line)))
      (new record key [fields 0..2]
                  line line))))

Answer 1