Querying CSV files like SQL

2024-06-27

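This is apparently a popular interview question.

There are two CSV files containing dinosaur data. We need to query them to return the dinosaurs that meet certain criteria.

There are two options: use only Unix command-line tools (cut/paste/sed/awk), or use a scripting language such as Python, but without additional modules like q, fsql, csvkit, etc.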

file1.csv:

NAME,LEG_LENGTH,DIET
Hadrosaurus,1.2,herbivore
Struthiomimus,0.92,omnivore
Velociraptor,1.0,carnivore
Triceratops,0.87,herbivore
Euoplocephalus,1.6,herbivore
Stegosaurus,1.40,herbivore
Tyrannosaurus Rex,2.5,carnivore

file2.csv:

NAME,STRIDE_LENGTH,STANCE
Euoplocephalus,1.87,quadrupedal
Stegosaurus,1.90,quadrupedal
Tyrannosaurus Rex,5.76,bipedal
Hadrosaurus,1.4,bipedal
Deinonychus,1.21,bipedal
Struthiomimus,1.34,bipedal
Velociraptor,2.72,bipedal

Use the formula:

speed = ((STRIDE_LENGTH / LEG_LENGTH) - 1) * SQRT(LEG_LENGTH * g)

where

g = 9.8 m/s^2
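
As a quick sanity check of the formula, here is a minimal Python sketch. The function name speed and the constant G are made up for illustration; only the formula and the sample values come from the question.

```python
import math

G = 9.8  # gravitational acceleration in m/s^2, as given in the question


def speed(stride_length: float, leg_length: float, g: float = G) -> float:
    """Return ((stride_length / leg_length) - 1) * sqrt(leg_length * g)."""
    return (stride_length / leg_length - 1) * math.sqrt(leg_length * g)


# Velociraptor: LEG_LENGTH 1.0 (file1.csv), STRIDE_LENGTH 2.72 (file2.csv)
velociraptor_speed = speed(2.72, 1.0)
print(f"{velociraptor_speed:.2f}")  # prints 5.38
```

Note that the two inputs come from different files, which is why the files have to be joined on NAME before anything can be sorted.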

Write a program that reads the CSV files and prints only the names of the bipedal dinosaurs, ordered from fastest to slowest.

In SQL, this would be as simple as:

select f2.name from
file1 f1 join file2 f2 on f1.name = f2.name
where f2.stance = 'bipedal'
order by (f2.stride_length/f1.leg_length - 1)*pow(f1.leg_length*9.8,0.5) desc
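
To try the query against the sample data, one option is to load both files into an in-memory SQLite database with Python's standard-library sqlite3 module. This is only a sketch under a few assumptions: the data is inlined instead of read from disk, the table definitions and the load helper are invented here, pow is registered by hand because not every SQLite build ships math functions, and the filter is on f2.stance because STANCE is a column of file2.csv. Whether sqlite3 counts as an allowed tool in the interview setting is not something the question settles.

```python
import csv
import io
import math
import sqlite3

# Inline copies of the two sample files so the sketch is self-contained.
FILE1 = """NAME,LEG_LENGTH,DIET
Hadrosaurus,1.2,herbivore
Struthiomimus,0.92,omnivore
Velociraptor,1.0,carnivore
Triceratops,0.87,herbivore
Euoplocephalus,1.6,herbivore
Stegosaurus,1.40,herbivore
Tyrannosaurus Rex,2.5,carnivore
"""

FILE2 = """NAME,STRIDE_LENGTH,STANCE
Euoplocephalus,1.87,quadrupedal
Stegosaurus,1.90,quadrupedal
Tyrannosaurus Rex,5.76,bipedal
Hadrosaurus,1.4,bipedal
Deinonychus,1.21,bipedal
Struthiomimus,1.34,bipedal
Velociraptor,2.72,bipedal
"""

conn = sqlite3.connect(":memory:")
# Register pow() ourselves; some SQLite builds lack built-in math functions.
conn.create_function("pow", 2, math.pow)
conn.execute("create table file1 (name text, leg_length real, diet text)")
conn.execute("create table file2 (name text, stride_length real, stance text)")


def load(table, text):
    """Insert every data row of a three-column CSV text into the given table."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    conn.executemany(
        f"insert into {table} values (?, ?, ?)",
        (row for row in reader if row),
    )


load("file1", FILE1)
load("file2", FILE2)

QUERY = """
select f2.name
from file1 f1 join file2 f2 on f1.name = f2.name
where f2.stance = 'bipedal'
order by (f2.stride_length / f1.leg_length - 1) * pow(f1.leg_length * 9.8, 0.5) desc
"""

names = [row[0] for row in conn.execute(QUERY)]
print("\n".join(names))
```

Deinonychus does not appear in the result because it has no row in file1.csv, so the inner join drops it.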

How can this be done in Bash or Python?

Best answer 1

Several tools have been created for exactly this purpose. For example:

$ csvq 'select * from cities'
+------------+-------------+----------+
|    name    |  population |  country |
+------------+-------------+----------+
| warsaw     |  1700000    |  poland  |
| ciechanowo |  46000      |  poland  |
| berlin     |  3500000    |  germany |
+------------+-------------+----------+

$ csvq 'insert into cities values("dallas", 1, "america")'
1 record inserted on "C:\\cities.csv".
Commit: file "C:\\cities.csv" is updated.

https://github.com/mithrandie/csvq
