3.7 Calling a Web API
In the previous section we explained how to download individual files from the Internet. Another way data can come from the Internet is through a web API, where API stands for Application Programming Interface. The number of APIs offered by organizations is growing at an increasing rate, which means a lot of interesting data for us data scientists.
Web APIs are not meant to be presented in a nice layout, the way websites are. Instead, most web APIs return data in a structured format, such as JSON or XML. Having data in a structured form has the advantage that the data can be easily processed by other tools, such as jq. For example, the API from https://randomuser.me returns data in the following JSON structure:
$ curl -s https://randomuser.me/api/1.2/ | jq .
{
  "results": [
    {
      "gender": "male",
      "name": {
        "title": "mr",
        "first": "jeffrey",
        "last": "lawson"
      },
      "location": {
        "street": "838 miller ave",
        "city": "washington",
        "state": "maryland",
        "postcode": 81831,
        "coordinates": {
          "latitude": "81.9488",
          "longitude": "-67.8247"
        },
        "timezone": {
          "offset": "+4:00",
          "description": "Abu Dhabi, Muscat, Baku, Tbilisi"
        }
      },
      "email": "jeffrey.lawson@example.com",
      "login": {
        "uuid": "78918f6c-2658-4915-bebf-bfaa61a1624c",
        "username": "silverzebra774",
        "password": "treble",
        "salt": "iAtIKhvB",
        "md5": "4c02abeca4d6ca4dbfc0ddb33dcef29f",
        "sha1": "36e109513abf73df460cead89b78c749abe908fa",
        "sha256": "0155d9e6cabedfc3ad0f21d18b3ca3e738a8f17811dd57dc3b4dd386cd021963"
      },
      "dob": {
        "date": "1996-07-04T02:49:46Z",
        "age": 22
      },
      "registered": {
        "date": "2013-01-13T13:37:21Z",
        "age": 5
      },
      "phone": "(406)-041-2792",
      "cell": "(831)-085-8264",
      "id": {
        "name": "SSN",
        "value": "629-40-9671"
      },
      "picture": {
        "large": "https://randomuser.me/api/portraits/men/62.jpg",
        "medium": "https://randomuser.me/api/portraits/med/men/62.jpg",
        "thumbnail": "https://randomuser.me/api/portraits/thumb/men/62.jpg"
      },
      "nat": "US"
    }
  ],
  "info": {
    "seed": "4bd9f66fd83a6ec7",
    "results": 1,
    "page": 1,
    "version": "1.2"
  }
}
The data is piped to the command-line tool jq in order to display it in a nice way. jq has many more possibilities that we will explore in Chapter 5.
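As a small preview of what structured output makes possible, here is a minimal sketch that pulls two fields out of a response with jq. To keep it runnable offline, it works on a trimmed copy of the response shown above that is saved to a temporary file; the variable names (response, full_name, email) are only illustrative.

```shell
# Trimmed copy of the sample response, saved to a temporary file
# (an offline stand-in for the output of the curl call)
response=$(mktemp)
cat > "$response" <<'EOF'
{
  "results": [
    {
      "gender": "male",
      "name": {"title": "mr", "first": "jeffrey", "last": "lawson"},
      "email": "jeffrey.lawson@example.com",
      "nat": "US"
    }
  ],
  "info": {"results": 1, "page": 1, "version": "1.2"}
}
EOF

# -r prints raw strings instead of JSON-quoted ones
full_name=$(jq -r '.results[0].name | "\(.first) \(.last)"' "$response")
email=$(jq -r '.results[0].email' "$response")

echo "$full_name <$email>"
rm -f "$response"
```

Against the live API, the same filters could be applied directly to the output of curl -s https://randomuser.me/api/1.2/ instead of a file.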
Some web APIs return data in a streaming manner. This means that once you connect to such an API, the data will continue to pour in until you close the connection. A well-known example is the Twitter “firehose”, which constantly streams all the tweets being sent around the world. Luckily, most command-line tools that we use also operate in a streaming manner, so we can use this kind of data as well.
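To illustrate how command-line tools cope with an endless stream, the following sketch simulates one locally. The function fake_stream is a hypothetical stand-in for a streaming API, not a real service: it keeps printing one JSON record per line until the reader goes away. Because head exits after five lines, the pipeline ends even though the producer would otherwise never finish.

```shell
# Hypothetical stand-in for a streaming API: emits one record per line
# and stops only when nobody is reading anymore (printf then fails or
# the process receives SIGPIPE)
fake_stream() {
  i=1
  while printf '{"id": %d, "text": "message %d"}\n' "$i" "$i"; do
    i=$((i + 1))
  done
}

# head reads five records and exits, which ends the pipeline
first_five=$(fake_stream | head -n 5)
echo "$first_five"
```

With a real streaming API, curl would take the place of fake_stream, and the rest of the pipeline would stay the same.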
Some APIs require you to log in using the OAuth protocol. There is a handy command-line tool called curlicue (Foster 2014) that assists in performing the so-called “OAuth dance”. Once this has been set up, curlicue will call curl with the correct headers. First, you set things up once for a particular API with curlicue-setup, and then you can call that API using curlicue. For example, to use curlicue with the Twitter API you would run:
$ curlicue-setup \
> 'https://api.twitter.com/oauth/request_token' \
> 'https://api.twitter.com/oauth/authorize?oauth_token=$oauth_token' \
> 'https://api.twitter.com/oauth/access_token' \
> credentials
$ curlicue -f credentials \
> 'https://api.twitter.com/1/statuses/home_timeline.xml'
For more popular APIs, there are specialized command-line tools available. These are wrappers that provide a convenient way to connect to the API. In Chapter 9, for example, we’ll be using the command-line tool bigmler, which connects only to BigML’s prediction API.