How to update three CSV files based on a fourth file

Question

Assuming your data is exactly as you posted it, you can do the following in plain bash. (Warning: this modifies the files in place. Make a backup before testing.)

A few functions to manage the first two files:

next_id() {
  local file="$1"
  # assumes the file is sorted by id, so the last line holds the highest id
  echo $(( $(tail -n 1 "$file" | cut -d, -f1) + 1 ))
}

Assuming file1 and file2 are sorted by their id column, this takes the first field of the last line and increments it by 1 to generate the next ID.
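
If the files might not be sorted, one possible variant is to scan for the highest numeric ID instead of trusting the last line. This is only a sketch: the name next_id_max and the sample data are hypothetical, and it swaps the tail/cut approach for awk.

```shell
#!/usr/bin/env bash
# Hypothetical variant of next_id: scans the whole file for the largest
# numeric id, so it does not depend on the file being sorted.
next_id_max() {
  local file="$1"
  # Non-numeric first fields (for example a header line) are skipped.
  awk -F, '$1 ~ /^[0-9]+$/ && $1 + 0 > max { max = $1 + 0 }
           END { print max + 1 }' "$file"
}

# Made-up sample data, deliberately unsorted.
demo=$(mktemp)
printf '%s\n' 'id,vname' '3,gamma' '1,alpha' '2,beta' > "$demo"

new_id=$(next_id_max "$demo")
echo "$new_id"   # prints 4
```

The cost is a full pass over the file on every call, which is fine for small files but adds up for large ones.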

find_or_create_id() {
  local file="$1" item="$2" id
  # check if we already have that item (note: $item is treated as a regex)
  if id=$(grep -m 1 ",$item$" "$file" 2> /dev/null) ; then
    # got it already: keep only the id before the first comma
    id=${id%%,*}
  else
    # generate the next id and append the new entry
    id=$(next_id "$file")
    echo "$id,$item" >> "$file"
  fi
  echo "$id"
}

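This looks up an entry (vname or dname) in one of the first two files. If it is found, the existing ID is returned. Otherwise, the next ID is generated and the new entry is appended to the file.

With the right substring expansions, the main part is quite simple: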
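
Because the lookup passes the item to grep as a regular expression, a name such as a.c would also match a stored abc. A possible variant that compares literally is sketched below; the name find_or_create_id_exact and the sample data are hypothetical, and it uses awk in place of grep.

```shell
#!/usr/bin/env bash
# Hypothetical variant of find_or_create_id that compares the item as a
# literal string (everything after the first comma) instead of as a regex.
find_or_create_id_exact() {
  local file="$1" item="$2" id
  # The item is passed through the environment so that awk does not
  # interpret backslash escapes in it.
  id=$(item="$item" awk '
    { pos = index($0, ",") }
    pos && substr($0, pos + 1) == ENVIRON["item"] {
      print substr($0, 1, pos - 1); exit
    }' "$file" 2> /dev/null)
  if [[ -z "$id" ]] ; then
    # not found: take the last id + 1 (file assumed sorted by id) and append
    id=$(( $(tail -n 1 "$file" | cut -d, -f1) + 1 ))
    echo "$id,$item" >> "$file"
  fi
  echo "$id"
}

# Made-up sample data with a single existing entry.
demo=$(mktemp)
printf '%s\n' '1,abc' > "$demo"

first=$(find_or_create_id_exact "$demo" 'a.c')   # not present yet, gets a new id
second=$(find_or_create_id_exact "$demo" 'a.c')  # found on the second call
third=$(find_or_create_id_exact "$demo" 'abc')   # the pre-existing entry
echo "$first $second $third"   # prints 2 2 1
```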

while IFS= read -r line ; do
  col1=${line%%,*}  # everything up to the first ,
  col3=${line##*,}  # everything after the last ,
  col2=${line%,*}   # everything before the last ,
  col2=${col2#*,}   # everything after the first ,
  id1=$(find_or_create_id file1 "$col1")
  id2=$(find_or_create_id file2 "$col2")
  # don't insert duplicates
  if ! grep -m 1 -q "^$id1,$id2," file3 ; then
    echo "$id1,$id2,$col3" >> file3
  fi
done < <(tail -n +2 file4)  # skip the header line of file4

This does not insert into the last file in sorted order; new lines are simply appended at the end.
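
If the order of the last file matters, it can be re-sorted after the loop has finished. The following is a minimal sketch, assuming the file starts with a single header line (an assumption; the function name sort_by_ids and the sample rows are made up for illustration).

```shell
#!/usr/bin/env bash
# Hypothetical helper: sorts a CSV numerically by its first two columns,
# keeping the first line (assumed to be a header) in place.
sort_by_ids() {
  local file="$1" tmp
  tmp=$(mktemp)
  {
    head -n 1 "$file"
    tail -n +2 "$file" | sort -t, -k1,1n -k2,2n
  } > "$tmp" && cat "$tmp" > "$file"   # write back, keeping file permissions
  rm -f "$tmp"
}

# Made-up sample data standing in for file3.
demo=$(mktemp)
printf '%s\n' 'id1,id2,value' '2,1,x' '1,2,y' '1,1,z' > "$demo"

sort_by_ids "$demo"
cat "$demo"
```

Sorting once at the end is cheaper than trying to insert each new line at the right position inside the loop.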

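That said, if these files are large, a database would be more appropriate. If you don't want a database server, consider SQLite.

If you don't care about the IDs being sequential (only that they are distinct), and assuming you have declared the IDs in table 1 and table 2 as integer primary key autoincrement (with unique keys on vname and dname), the update would look like this (there are probably more subtle approaches than this insert or ignore):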

insert or ignore into tab1(vname) select distinct vname from tab4;
insert or ignore into tab2(dname) select distinct dname from tab4;

insert or ignore into tab3(id1,id2,value)
  select tab1.id, tab2.id, tab4.value
  from tab4
  left join tab1 on tab1.vname = tab4.vname
  left join tab2 on tab2.dname = tab4.dname;

SQLite's

.separator ,
.import fileX tabX

does the Right Thing™, at least for the sample you posted.

A simple schema:

create table tab1 (id integer primary key autoincrement, vname text);
create unique index tab1_vname on tab1(vname);

create table tab2 (id integer primary key autoincrement, dname text);
create unique index tab2_dname on tab2(dname);

create table tab3 (id1 int, id2 int, value text,
                   constraint tab3_pk primary key(id1, id2));

create table tab4 (vname text, dname text, value text);

Answer 1