Installation

Prerequisites

  1. A distribution package of the form datastories-*xxx*-*environment*.zip (or .gz) matching the environment available on the target installation computer.

  2. A valid DataStories SDK license key file.

Note:

For Databricks, a directory /dbfs/FileStore/DataStories should exist; it will contain the licenses (see below) and will be used by the SDK as storage when required (see visualisations).
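The directory above can be created programmatically. The sketch below is illustrative (the function name `ensure_storage_dir` and the parameterized root are not part of the SDK); on Databricks the default root would be /dbfs/FileStore:

```python
from pathlib import Path

def ensure_storage_dir(root="/dbfs/FileStore"):
    """Create the DataStories storage directory under the given root if absent."""
    storage = Path(root) / "DataStories"
    storage.mkdir(parents=True, exist_ok=True)
    return storage
```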

Optional dependencies

MATLAB (or MATLAB Runtime)

This is required for computing predictive models (see datastories.story.predict_single_kpi()).

Linux / Windows:

MATLAB 2016b (or MATLAB Runtime v9.1).

MacOS:

MATLAB 2017a (or MATLAB Runtime v9.2).

Parquet files

Parquet files are used by the SDK to store data in an efficient file format. The DataStories SDK follows the same policy as Pandas (see https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html). Starting from version 1.8.0, the PyArrow dependency is no longer installed automatically.
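Since PyArrow must now be installed separately, it can be useful to check that a Parquet engine is present before reading data. This is a minimal sketch (the helper name and the file name "data.parquet" are illustrative, not part of the SDK):

```python
import importlib.util

def parquet_engine_available():
    """Return True if a Parquet engine usable by pandas is installed."""
    return any(importlib.util.find_spec(name) is not None
               for name in ("pyarrow", "fastparquet"))

# Example usage (hypothetical file name):
#   if parquet_engine_available():
#       import pandas as pd
#       df = pd.read_parquet("data.parquet")
#   else:
#       raise RuntimeError("Install pyarrow to read Parquet files")
```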

Note:

The MATLAB Runtime is free and can be downloaded from:

https://www.mathworks.com/products/compiler/matlab-runtime.html

Note:

On MacOS, the “Command Line Development Tools” suite needs to be available when using MATLAB. This is free and can be installed using the following command line:

xcode-select --install

Procedure

  1. Unzip the contents of the archive to a convenient location on disk.

    The archive contains a Python installable package and additional material including documentation and samples. It is therefore convenient to unzip to a location that can be used for further reference (i.e., after the installation is complete).

  2. Install the included Python package using the provided shell script.

    Windows:

    pip_install.bat
    

    Linux / MacOS:

    . ./pip_install.sh
    

    Note: Conda equivalents are provided. However, the installation of the DataStories SDK itself relies on pip. Only required dependencies will be managed by Conda.

  3. Copy the license key file to a platform-dependent location, as indicated below.

    Windows:

    • Either in %PROGRAMDATA%\DataStories (e.g., C:\ProgramData\DataStories)

    • or in ~\DataStories

    • or in ~\.DataStories

    • or in the target script execution folder

    Note: The %PROGRAMDATA% folder may be hidden. In order to copy the license to that location one might need to switch on the display of hidden files, or use the xcopy /H command from the command prompt.

    Linux:

    • Either in /etc/DataStories

    • or in ~/DataStories

    • or in ~/.DataStories

    • or in the target script execution folder

    MacOS:

    • Either in /Library/DataStories

    • or in ~/DataStories

    • or in ~/.DataStories

    • or in the target script execution folder

    Databricks:

    • /dbfs/FileStore/DataStories/licenses

    Note: When the destination folder is not present on the machine, it needs to be created manually.

    Note: When the license key file is available at a location that differs from the options above, the DataStories license management service needs to be explicitly initialized before execution rights are granted to license protected functionality (see datastories.api.license.LicenseManager()).
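The platform-dependent search order described above can be sketched as follows. This is an illustrative helper (the function name `candidate_license_dirs` is not part of the SDK), not the SDK's actual lookup code:

```python
import os
import platform
from pathlib import Path

def candidate_license_dirs():
    """Return the license search locations for the current platform, in order."""
    home = Path.home()
    system = platform.system()
    if system == "Windows":
        dirs = [Path(os.environ.get("PROGRAMDATA", r"C:\ProgramData")) / "DataStories"]
    elif system == "Darwin":  # MacOS
        dirs = [Path("/Library/DataStories")]
    else:  # Linux
        dirs = [Path("/etc/DataStories")]
    # Common fallbacks on every platform:
    dirs += [home / "DataStories", home / ".DataStories", Path.cwd()]
    return dirs
```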

  4. If MATLAB is installed on the system, the path to the MATLAB libraries has to be known in order to enable the dependent functionality in the DataStories SDK (e.g., datastories.story.predict_single_kpi()). This path is system and installation dependent. You can find below a set of commonly encountered configurations:

    Windows:

    The path has to be specified in the Environment Variables section. This is usually done automatically while installing the MATLAB Runtime. When this is not the case, the path has to be added to the PATH variable:

    PATH = %PATH%;C:\Program Files\MATLAB\MATLAB Runtime\v91\runtime\win64;

    Linux / MacOS:

    The application looks for the MATLAB libraries in a number of default locations. If MATLAB is installed in a custom location, the path to the runtime has to be given in the MCR_ROOT variable:

    export MCR_ROOT=/usr/local/MATLAB/MATLAB_Runtime/v91

    Databricks:

    The same process as on Linux should be followed. The environment variable can be set globally on the cluster.
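From Python, the runtime location can be resolved with the same precedence: prefer the MCR_ROOT variable, otherwise fall back to the default location shown above. This is an illustrative sketch (the function name `resolve_mcr_root` is not part of the SDK):

```python
import os
from pathlib import Path

# Default install location from the Linux example above.
DEFAULT_MCR_ROOT = "/usr/local/MATLAB/MATLAB_Runtime/v91"

def resolve_mcr_root():
    """Return the MATLAB Runtime root, preferring the MCR_ROOT variable."""
    return Path(os.environ.get("MCR_ROOT", DEFAULT_MCR_ROOT))
```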

Jupyter Notebook

The DataStories SDK offers comprehensive support for interactive exploration of data and analysis results within Jupyter notebooks. Jupyter is not a required dependency, and all SDK analyses can be performed outside the Jupyter context. However, in order to enjoy the full SDK capabilities, it is recommended to install the Jupyter package as well.

Windows / Linux / MacOS:

pip install jupyter

When running the DataStories SDK within a Jupyter Notebook, additional settings have to be applied to the Jupyter configuration. A convenience script that performs all necessary configuration is included with the installation. One can use it by running the DataStories notebook starter:

Windows:

ds_notebook.bat

Linux / MacOS:

ds_notebook.sh

Note: When running the Jupyter Notebook via the provided runner script is not an option, the following setting has to be added to the Jupyter configuration file: NotebookApp.iopub_data_rate_limit=10000000
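The setting can also be added to the configuration file programmatically. This is a minimal, idempotent sketch (the helper name `ensure_iopub_rate_limit` is illustrative; the usual config location is shown in the comment, but your installation may differ):

```python
from pathlib import Path

def ensure_iopub_rate_limit(config_path, limit=10_000_000):
    """Append the iopub rate-limit setting to a Jupyter config file if absent."""
    config = Path(config_path)
    line = f"c.NotebookApp.iopub_data_rate_limit = {limit}\n"
    config.parent.mkdir(parents=True, exist_ok=True)
    text = config.read_text() if config.exists() else ""
    if line not in text:
        with config.open("a") as f:
            f.write(line)

# Typical location of the Jupyter configuration file:
# ensure_iopub_rate_limit(Path.home() / ".jupyter" / "jupyter_notebook_config.py")
```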