Using the Socrata API

This tutorial explains the usage of the Socrata API in Data Retriever. It includes both the CLI (Command Line Interface) commands as well as the Python interface for the same.

Note

Currently Data Retriever only supports tabular Socrata datasets (tabular Socrata datasets which are of type map are not supported).

Command Line Interface

Listing the Socrata Datasets

The retriever ls -s command displays the Socrata datasets which contain the provided keywords in their title.

$ retriever ls -h (gives listing options)

usage: retriever ls [-h] [-l L [L ...]] [-k K [K ...]] [-v V [V ...]]
                [-s S [S ...]]

optional arguments:
  -h, --help    show this help message and exit
  -l L [L ...]  search datasets with specific license(s)
  -k K [K ...]  search datasets with keyword(s)
  -v V [V ...]  verbose list of specified dataset(s)
  -s S [S ...]  search socrata datasets with name(s)

Example

This example will list the names of the socrata datasets which contain the word fishing.

$ retriever ls -s fishing

Autocomplete suggestions : Total 34 results

[?] Select the dataset name: Recommended Fishing Rivers And Streams
 > Recommended Fishing Rivers And Streams
   Recommended Fishing Rivers And Streams API
   Iowa Fishing Report
   Recommended Fishing Rivers, Streams, Lakes and Ponds Map
   Public Fishing Rights Parking Areas Map
   Fishing Atlas
   Cook County - Fishing Lakes
   [ARCHIVED] Fishing License Sellers
   Public Fishing Rights Parking Areas
   Recommended Fishing Lakes and Ponds Map
   Recommended Fishing Lakes and Ponds
   Delaware Fishing Licenses and Trout Stamps
   Cook County - Fishing Lakes - KML

Here the user is prompted to select a dataset name. After selecting a dataset, the command returns some information related to the dataset selected.

Let’s select the Public Fishing Rights Parking Areas dataset, after pressing Enter, the command returns some information regarding the dataset selected.

Autocomplete suggestions : Total 34 results

[?] Select the dataset name: Public Fishing Rights Parking Areas
   Iowa Fishing Report
   Recommended Fishing Rivers, Streams, Lakes and Ponds Map
   Fishing Atlas
   Public Fishing Rights Parking Areas Map
   [ARCHIVED] Fishing License Sellers
   Cook County - Fishing Lakes
 > Public Fishing Rights Parking Areas
   Recommended Fishing Lakes and Ponds Map
   Recommended Fishing Lakes and Ponds
   Delaware Fishing Licenses and Trout Stamps
   Cook County - Fishing Lakes - KML
   General Fishing and Salmon Licence Sales
   Hunting and Fishing License Sellers

Dataset Information of Public Fishing Rights Parking Areas: Total 1 results

1. Public Fishing Rights Parking Areas
  ID : 9vef-6whi
  Type : {'dataset': 'tabular'}
  Description : The New York State Department of Environmental Con...
  Domain : data.ny.gov
  Link : https://data.ny.gov/Recreation/Public-Fishing-Rights-Parking-Areas/9vef-6whi

Downloading the Socrata Datasets

The retriever download socrata-<socrata id> command downloads the Socrata dataset which matches the provided socrata id.

Example

From the example in Listing the Socrata Datasets section, we selected the Public Fishing Rights Parking Areas dataset. Since the dataset is of type tabular, we can download it. The information received in the previous example contains the socrata id. We use this socrata id to download the dataset.

$ retriever download socrata-9vef-6whi

=> Installing socrata-9vef-6whi
Downloading 9vef-6whi.csv: 10.0B [00:03, 2.90B/s]
Done!

The downloaded raw data files are stored in the raw_data directory in the ~/.retriever directory.

Installing the Socrata Datasets

The retriever install <engine> socrata-<socrata id> command downloads the raw data, creates the script for it and then installs the Socrata dataset which matches the provided socrata id into the provided engine.

Example

From the example in Listing the Socrata Datasets section, we selected the Public Fishing Rights Parking Areas dataset. Since the dataset is of type tabular, we can install it. The information received in that section contains the socrata id. We use this socrata id to install the dataset.

$ retriever install postgres socrata-9vef-6whi

=> Installing socrata-9vef-6whi
Downloading 9vef-6whi.csv: 10.0B [00:03, 2.69B/s]
Processing... 9vef-6whi.csv
Successfully wrote scripts to /home/user/.retriever/socrata-scripts/9vef_6whi.csv.json
Updating script name to socrata-9vef-6whi.json
Updating the contents of script socrata-9vef-6whi
Successfully updated socrata_9vef_6whi.json
Creating database socrata_9vef_6whi...

Bulk insert on ..  socrata_9vef_6whi.socrata_9vef_6whi
Done!

The script created for the Socrata dataset is stored in the socrata-scripts directory in the ~/.retriever directory.

Python Interface in Data Retriever

Searching Socrata Datasets

The function socrata_autocomplete_search takes a list of strings as input and returns a list of strings which are the autocompleted names.

>>> import retriever as rt
>>> names = rt.socrata_autocomplete_search(['clinic', '2015', '2016'])
>>> for name in names:
...     print(name)
...
2016 & 2015 Clinic Quality Comparisons for Clinics with Five or More Service Providers
2015 - 2016 Clinical Quality Comparison (>=5 Providers) by Geography
2016 & 2015 Clinic Quality Comparisons for Clinics with Fewer than Five Service Providers

Socrata Dataset Info by Dataset Name

The input argument for the function socrata_dataset_info should be a string (valid dataset name returned by socrata_autocomplete_search). It returns a list of dicts, because there are multiple datasets on socrata with same name (e.g. Building Permits).

>>> import retriever as rt
>>> resource = rt.socrata_dataset_info('2016 & 2015 Clinic Quality Comparisons for Clinics with Five or More Service Providers')
>>> from pprint import pprint
>>> pprint(resource)
[{'description': 'This data set includes comparative information for clinics '
                 'with five or more physicians for medical claims in 2015 - '
                 '2016. \r\n'
                 '\r\n'
                 'This data set was calculated by the Utah Department of '
                 'Health, Office of Healthcare Statistics (OHCS) using Utah’s '
                 'All Payer Claims Database (APCD).',
  'domain': 'opendata.utah.gov',
  'id': '35s3-nmpm',
  'link': 'https://opendata.utah.gov/Health/2016-2015-Clinic-Quality-Comparisons-for-Clinics-w/35s3-nmpm',
  'name': '2016 & 2015 Clinic Quality Comparisons for Clinics with Five or '
          'More Service Providers',
  'type': {'dataset': 'tabular'}}]

Finding Socrata Dataset by Socrata ID

The input argument of the function find_socrata_dataset_by_id should be the four-by-four socrata dataset identifier (e.g. 35s3-nmpm). The function returns a dict which contains metadata about the dataset.

>>> import retriever as rt
>>> from pprint import pprint
>>> resource = rt.find_socrata_dataset_by_id('35s3-nmpm')
>>> pprint(resource)
{'datatype': 'tabular',
 'description': 'This data set includes comparative information for clinics '
                'with five or more physicians for medical claims in 2015 - '
                '2016. \r\n'
                '\r\n'
                'This data set was calculated by the Utah Department of '
                'Health, Office of Healthcare Statistics (OHCS) using Utah’s '
                'All Payer Claims Database (APCD).',
 'domain': 'opendata.utah.gov',
 'homepage': 'https://opendata.utah.gov/Health/2016-2015-Clinic-Quality-Comparisons-for-Clinics-w/35s3-nmpm',
 'id': '35s3-nmpm',
 'keywords': ['socrata'],
 'name': '2016 & 2015 Clinic Quality Comparisons for Clinics with Five or More '
         'Service Providers'}

Downloading a Socrata Dataset

import retriever as rt
rt.download('socrata-35s3-nmpm')

Installing a Socrata Dataset

import retriever as rt
rt.install_postgres('socrata-35s3-nmpm')

Note

For downloading or installing the Socrata Datasets, the dataset should follow the syntax given. The dataset name should be socrata-<socrata id>. The socrata id should be the four-by-four socrata dataset identifier (e.g. 35s3-nmpm).

Example:
  • Correct: socrata-35s3-nmpm
  • Incorrect: socrata35s3-nmpm, socrata35s3nmpm