3.5 Querying Relational Databases
Most companies store their data in a relational database. Examples of relational databases are MySQL, PostgreSQL, and SQLite. These databases all have a slightly different way of interfacing with them. Some provide a command-line tool or a command-line interface, while others do not. Moreover, they are not very consistent when it comes to their usage and output.
Fortunately, there is a command-line tool called sql2csv
, which is part of the Csvkit suite. Because it leverages the Python SQLAlchemy package, we only have to use one tool to execute queries on many different databases through a common interface, including MySQL, Oracle, PostgreSQL, SQLite, Microsoft SQL Server, and Sybase. The output of sql2csv
is, as its name suggests, in CSV format.
We can obtain data from relational databases by executing a SELECT
query on them. (sql2csv
also support INSERT
, UPDATE
, and DELETE
queries, but that’s not the purpose of this chapter.) To select a specific set of data from an SQLite database named iris.db, sql2csv
can be invoked as follows:
$ sql2csv --db 'sqlite:///data/iris.db' --query 'SELECT * FROM iris '\
> 'WHERE sepal_length > 7.5'
sepal_length,sepal_width,petal_length,petal_width,species
7.6,3.0,6.6,2.1,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
Here, we are selecting all rows where sepal_length
is larger than 7.5. The —db
option specifies the database URL, of which the typical form is: dialect+driver://username:password@host:port/database
.