Retriever lets you commit a dataset and install the committed dataset into the database of your choice at a later date. This makes it easy to reproduce previous outputs and results.
The directory to save your committed dataset can be defined by setting the environment variable
However, you can still save the committed dataset in a directory of your choice by specifying the path when committing.
Retriever supports committing of a dataset into a compressed archive.
def commit(dataset, commit_message='', path=None, quiet=False):
A description of the default parameters mentioned above:
dataset (String): Name of the dataset.
commit_message (String): Specify commit message for a commit.
path (String): Specify the directory path to store the compressed archive file.
quiet (Bool): Setting True minimizes the console output.
Example of committing a dataset:
retriever commit abalone-age -m "Example commit" --path .
Committing dataset abalone-age
Successfully committed.

>>> from retriever import commit
>>> commit('abalone-age', commit_message='Example commit', path='/home/')
If the path is not provided, the committed dataset is saved in the provenance directory.
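The fallback between an explicit path and the provenance directory can be sketched as below. This is an illustration only, not Retriever's implementation: `resolve_commit_dir` is a hypothetical helper, the name of the environment variable is passed in as an argument rather than assumed, and raising an error when neither is available is an assumption made for the sketch.

```python
import os


def resolve_commit_dir(path=None, env_var=None, environ=None):
    """Return the directory a committed archive would be written to.

    Hypothetical helper for illustration; it is not part of Retriever.
    `env_var` is the name of the provenance-directory environment variable.
    """
    environ = os.environ if environ is None else environ
    if path is not None:
        # An explicit path takes precedence over the environment.
        return os.path.abspath(path)
    if env_var and environ.get(env_var):
        # Otherwise fall back to the provenance directory.
        return os.path.abspath(environ[env_var])
    # Assumption: fail loudly when no destination can be determined.
    raise ValueError("no path given and no provenance directory configured")
```

Passing `environ` explicitly keeps the helper easy to try out without changing the real process environment.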
Log Of Committed Datasets
You can view the log of commits of the datasets stored in the provenance directory.
The log command takes a single parameter:
dataset (String): Name of the dataset.
retriever log abalone-age
Commit message: Example commit
Hash: 02ee77
Date: 08/16/2019, 16:12:28

>>> from retriever import commit_log
>>> commit_log('abalone-age')
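If you need the hash in a script, the text printed by `retriever log` can be parsed. The sketch below assumes each commit is printed as `Commit message`, `Hash`, and `Date` lines in `Key: value` form, as in the example output; `parse_commit_log` is a hypothetical helper, not a Retriever function, and the real output format may differ between versions.

```python
def parse_commit_log(text):
    """Parse `retriever log` style output into a list of dicts.

    Hypothetical helper; assumes 'Key: value' lines for
    Commit message, Hash, and Date.
    """
    commits = []
    current = {}
    for line in text.splitlines():
        # Split on the first colon only, so times like 16:12:28 stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a key
        key = key.strip().lower().replace(" ", "_")
        if key not in ("commit_message", "hash", "date"):
            continue  # ignore unrelated lines
        if key == "commit_message" and current:
            # A new commit entry starts; store the previous one.
            commits.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        commits.append(current)
    return commits


sample = """Commit message: Example commit
Hash: 02ee77
Date: 08/16/2019, 16:12:28"""
commits = parse_commit_log(sample)
```

Each entry in `commits` is a dictionary, so the hash of the first commit is available as `commits[0]["hash"]`.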
Installing Committed Dataset
You can install a committed dataset either by its hash-value or by providing the path to the compressed archive. Installation by hash-value is supported only for datasets stored in the provenance directory.
To install a dataset from a committed archive, provide the path to the archive in place of the dataset name:
retriever install sqlite abalone-age-02ee77.zip
>>> from retriever import install_sqlite
>>> install_sqlite('abalone-age-02ee77.zip')
To install a dataset stored in the provenance directory, provide the hash-value of the commit. You can always look up the hash-value of your previous commits using the command retriever log dataset_name.
retriever install sqlite abalone-age --hash-value 02ee77
>>> from retriever import install_sqlite
>>> install_sqlite('abalone-age', hash_value='02ee77')
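Both install forms can be driven from a script by building the command line shown in the examples. The following is a minimal sketch that uses only the arguments that appear in this section (`install`, the engine name, the dataset or archive path, and `--hash-value`). `build_install_command` and `run_install` are hypothetical helper names, and running the command assumes the `retriever` CLI is installed and on your PATH.

```python
import subprocess


def build_install_command(engine, target, hash_value=None):
    """Build the argument list for `retriever install`.

    `target` is either a dataset name or the path to a committed archive.
    """
    command = ["retriever", "install", engine, target]
    if hash_value is not None:
        # Hash-based installs apply to datasets in the provenance directory.
        command += ["--hash-value", hash_value]
    return command


def run_install(engine, target, hash_value=None):
    # Raises CalledProcessError if the retriever command exits with an error.
    return subprocess.run(
        build_install_command(engine, target, hash_value), check=True
    )
```

For example, `build_install_command("sqlite", "abalone-age", hash_value="02ee77")` produces the same arguments as the hash-value command above, while omitting `hash_value` and passing the archive path reproduces the archive form.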