SDK Reference

General Interfaces

datastories.api.get_version()

Get the version of the currently loaded modules.

Returns:
  • A dictionary containing loaded modules and corresponding versions

Base classes and interfaces

class datastories.api.IAnalysisResult

Interface implemented by all analysis results.

plot(*args, **kwargs)

Plots a graphical representation of the results in Jupyter Notebook.

to_csv(file_path, delimiter=',', decimal='.')

Export the result to a CSV file.

Args:

  • file_path (str):

    path to the output file.

  • delimiter (str=’,’):

    character used as value delimiter.

  • decimal (str=’.’):

    character used as decimal point.

Raises:

  • ValueError:

    when the object returned by to_pandas is not a Pandas data frame.
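How the delimiter and decimal arguments interact can be illustrated with a small standard-library sketch. The helper name format_rows and the sample rows are hypothetical, and this is not the SDK implementation (which exports the data frame returned by to_pandas); it only shows the kind of output a semicolon delimiter combined with a decimal comma produces.

```python
import csv
import io


def format_rows(rows, delimiter=",", decimal="."):
    """Render rows as CSV text; floats use `decimal` as the decimal point."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow(
            [str(v).replace(".", decimal) if isinstance(v, float) else v for v in row]
        )
    return buffer.getvalue()


# Hypothetical sample data: a header row and two records.
rows = [["name", "score"], ["a", 1.5], ["b", 2.25]]
text = format_rows(rows, delimiter=";", decimal=",")
```

Choosing distinct characters for the delimiter and the decimal point keeps the values free of quoting, since a value containing the delimiter has to be quoted by the writer.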

to_excel(file_path, tab_name='Statistics')

Export the result to an Excel file.

Args:

  • file_path (str):

    path to the output file.

  • tab_name (str=’Statistics’):

    name of the Excel tab where to save the result.

Raises:

  • ValueError:

    when the object returned by to_pandas is not a Pandas data frame.

to_html(file_path, title='', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the analysis result visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario indicating the use scenario.

abstract to_pandas()

Exports the result to a Pandas DataFrame.

Returns:

  • The constructed Pandas DataFrame.

to_txt(file_path)

Export the result to a TXT file.

Args:

  • file_path (str):

    path to the output file.

class datastories.api.IConsole

Interface implemented by all message loggers.

abstract log(message)

Log a message to the console.

Args:

  • message (string):

    the message to log.
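A logger that satisfies this kind of interface only needs a log method. The sketch below uses a local abstract base class (ConsoleInterface, a hypothetical stand-in so the example runs without the SDK) and a ListConsole that collects messages in memory; with the SDK installed, one would presumably derive from datastories.api.IConsole instead.

```python
from abc import ABC, abstractmethod


class ConsoleInterface(ABC):
    """Stand-in for an IConsole-like interface (hypothetical, not the SDK class)."""

    @abstractmethod
    def log(self, message):
        """Log a message to the console."""


class ListConsole(ConsoleInterface):
    """Collects messages in memory, e.g. for later inspection."""

    def __init__(self):
        self.messages = []

    def log(self, message):
        self.messages.append(str(message))


console = ListConsole()
console.log("analysis started")
console.log("analysis finished")
```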

class datastories.api.IPrediction(data)

Bases: IAnalysisResult

Interface implemented by all prediction results.

Args:

  • data (obj):

    The associated prediction input data.

abstract property metrics

A dictionary containing prediction performance metrics.

These metrics are computed when the data frame used for prediction includes KPI values, for the purpose of evaluating the model prediction performance.

class datastories.api.IPredictiveModel

Interface implemented by all prediction models.

abstract property metrics

A dictionary containing model prediction performance metrics.

The type of metrics depends on the model type (i.e., regression or classification).

abstract property model

The generic RSX based model used for making predictions.

abstract predict(data_frame)

Predict the model KPI on a new data frame.

Args:

  • data_frame (obj):

    the data frame on which the model associated KPI is to be predicted.

Returns:

  • An object of type datastories.model.PredictionResult encapsulating the prediction results.

Raises:

  • ValueError:

    when not all required columns are provided.
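The column check behind this ValueError can be sketched as follows. ColumnCheckedModel is a hypothetical stand-in, the data frame is modelled as a plain dictionary of columns, and the "prediction" is a placeholder sum; none of this reflects how the SDK computes predictions.

```python
class ColumnCheckedModel:
    """Hypothetical stand-in; not datastories.api.IPredictiveModel itself."""

    def __init__(self, required_columns):
        self.required_columns = list(required_columns)

    def predict(self, data_frame):
        # data_frame is modelled as a dict mapping column names to value lists.
        missing = [c for c in self.required_columns if c not in data_frame]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
        # Placeholder prediction: sum of the required columns per row.
        columns = [data_frame[c] for c in self.required_columns]
        return [sum(values) for values in zip(*columns)]


model = ColumnCheckedModel(["x1", "x2"])
predictions = model.predict({"x1": [1, 2], "x2": [10, 20]})

try:
    model.predict({"x1": [1, 2]})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```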

to_cpp(file_path)

Export the model to a C++ file.

Args:

  • file_path (str):

    path to the output file.

to_excel(file_path)

Export the model to an Excel file.

Args:

  • file_path (str):

    path to the output file.

to_matlab(file_path)

Export the model to a MATLAB file.

Args:

  • file_path (str):

    path to the output file.

to_py(file_path)

Export the model to a Python file.

Args:

  • file_path (str):

    path to the output file.

to_r(file_path)

Export the model to an R file.

Args:

  • file_path (str):

    path to the output file.

class datastories.api.IStory(params=None, metainfo=None, raw_results=None, results=None, folder=None, notes=None, upload_function=None, on_snapshot=None, progress_bar=False)

Bases: IAnalysisResult

Interface implemented by all story analyses.

Args:

  • params (dict):

    dictionary containing user and inferred analysis parameters.

  • metainfo (dict):

    dictionary containing process parameters (e.g., progress pointers).

  • raw_results (dict):

    dictionary containing rainstorm processing results.

  • results (dict):

    dictionary containing processing results.

  • folder (str=None):

    the story working folder. Leave unspecified to create one at runtime.

  • notes (list=[]):

    a list of notes.

  • upload_function (callback=None):

    a function to upload files to a storage (relevant for the client).

  • on_snapshot (callback=None):

    a callback to be executed upon saving a snapshot (e.g., upload snapshot to S3).

  • progress_bar (obj=None):

    a progress bar object.

abstract add_note(note)

Add an annotation to the story results.

Existing annotations can be retrieved using the datastories.api.IStory.notes property.

Args:

  • note (str):

    the annotation to be added.

abstract clear_note(note_id)

Remove a specific annotation associated with the story analysis.

Args:

  • note_id (int):

    the index of the note to be removed.

Raises:

  • ValueError:

    when the note index is unknown.

abstract clear_notes()

Clear the annotations associated with the story analysis.
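The annotation methods above can be pictured with a small in-memory sketch. NotesHolder is a hypothetical stand-in, and the assumption that note_id is a zero-based index into the notes list is mine, not confirmed by the SDK.

```python
class NotesHolder:
    """Hypothetical stand-in for the annotation part of a story."""

    def __init__(self, notes=None):
        self._notes = list(notes) if notes is not None else []

    @property
    def notes(self):
        # Return a copy so callers cannot modify the internal list.
        return list(self._notes)

    def add_note(self, note):
        self._notes.append(note)

    def clear_note(self, note_id):
        # Assumption: note_id is a zero-based position in the list.
        if not 0 <= note_id < len(self._notes):
            raise ValueError(f"unknown note index: {note_id}")
        del self._notes[note_id]

    def clear_notes(self):
        self._notes.clear()


story = NotesHolder()
story.add_note("check sensor drift")
story.add_note("rerun with more data")
story.clear_note(0)
remaining = story.notes
```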

abstract property info

Displays story execution information.

abstract static is_compatible(current_version_string, ref_version_string)

Checks if a story version is compatible with a reference version.

abstract classmethod load(file_path)

Loads a previously saved story.

abstract property metrics

Returns a set of metrics computed during analysis.

abstract property notes

The list of all annotations currently associated with the story analysis.

abstract reset()

Reset the execution pointer of a story to the first stage.

abstract run(resume_from=None, strict=False, params=None, progress_bar=None, check_interrupt=None)

Resumes the execution of a story from a given stage.

The stage to resume from is optional. If not specified, the story is executed from the beginning. If the stage cannot be executed (e.g., due to missing intermediate results), the closest stage that can be executed is used as the starting point, unless the strict argument is set to True. In that case, an exception is raised if the execution cannot be resumed from the requested stage.

Args:

  • resume_from (StoryProcessingStage=None):

    The stage to resume execution from. Should be a stage for which all intermediate results are available. If None, the stage at which execution was previously interrupted (if any) is used.

  • strict (bool=False):

    Raise an error if execution cannot be resumed from the requested stage.

  • params (dict={}):

    Map of parameters to be used with the run. It can override the original parameters, but this leads to invalidating previous results that depend on the updated parameter values.

  • progress_bar (obj=None):

    An object of type datastories.display.ProgressReporter to replace the currently used progress reporter. When not specified, the current story progress reporter is not modified. The use case for this is setting a progress bar after the story is loaded, when a progress bar cannot be given to the load function directly (e.g., when a progress bar has to be constructed based on the story).

  • check_interrupt (func=None):

    an optional callback to check whether analysis execution needs to be interrupted.
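The interplay between resume_from and strict can be sketched as a fallback search. The stage names, the pick_start_stage helper, the rule that a stage is executable when its predecessor has results, and the use of RuntimeError are all assumptions made for illustration; the SDK may decide this differently.

```python
STAGES = ["load", "prepare", "model", "report"]  # hypothetical stage names


def pick_start_stage(resume_from, available, strict=False):
    """Return the stage to start from.

    `available` holds the stages whose intermediate results exist, so the
    stage right after each of them can be executed. The first stage is
    always executable.
    """
    if resume_from is None:
        return STAGES[0]
    index = STAGES.index(resume_from)
    # Walk back until a stage whose predecessor has results is found.
    for candidate in range(index, -1, -1):
        if candidate == 0 or STAGES[candidate - 1] in available:
            if strict and candidate != index:
                raise RuntimeError(f"cannot resume from stage {resume_from!r}")
            return STAGES[candidate]


start = pick_start_stage("report", available={"load", "prepare"})
```

With strict=True the same call would raise instead of silently falling back to an earlier stage.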

abstract save(file_path)

Saves the story analysis results.

abstract property stats

Returns a set of stats computed during analysis.

class datastories.api.IStoryDeprecated(notes=None)

Bases: IAnalysisResult

Interface implemented by all story analyses.

Args:

  • notes (list=[]):

    a list of notes.

add_note(note)

Add an annotation to the story results.

Existing annotations can be retrieved using the datastories.api.IStory.notes property.

Args:

  • note (str):

    the annotation to be added.

clear_note(note_id)

Remove a specific annotation associated with the story analysis.

Args:

  • note_id (int):

    the index of the note to be removed.

Raises:

  • ValueError:

    when the note index is unknown.

clear_notes()

Clear the annotations associated with the story analysis.

static is_compatible(current_version_string, ref_version_string)

Checks if a story version is compatible with a reference version.

abstract static load(file_path)

Loads a previously saved story.

abstract property metrics

Returns a set of metrics computed during analysis.

property notes

The list of all annotations currently associated with the story analysis.

abstract save(file_path)

Saves the story analysis results.

class datastories.api.IProgressObserver

Interface implemented by all progress report observers.

abstract on_progress(progress)

Callback triggered upon progress update.

Args:

  • progress (float):

    the amount of progress. Possible values: [0-1]
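An observer only has to accept a progress fraction. In the sketch below, RecordingObserver and the notify helper are hypothetical stand-ins (notify plays the role of a reporter), and the range check reflects the [0-1] interval stated above.

```python
class RecordingObserver:
    """Hypothetical observer that keeps every progress value it receives."""

    def __init__(self):
        self.history = []

    def on_progress(self, progress):
        # Progress is expected as a fraction in the interval [0, 1].
        if not 0.0 <= progress <= 1.0:
            raise ValueError("progress must be in [0, 1]")
        self.history.append(progress)


def notify(observers, step, total):
    """Stand-in for a reporter: send step/total to every observer."""
    for observer in observers:
        observer.on_progress(step / total)


observer = RecordingObserver()
for step in range(1, 5):
    notify([observer], step, total=4)
```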

class datastories.api.ISlide(slide_deck=None, file_path='slide.json', slide_name=None)

Interface implemented by slides.

A slide is a collection of data and references to data that a renderer can transform into a visual representation.

Args:

  • slide_deck (obj=None):

    a datastories.api.SlideDeck object used to manage the slide.

  • file_path (str=’slide.json’):

    path to a file to be used for serializing the slide.

property slide

The slide content.

The slide content is a versioned and serializable entity that can be used to visualize the slide without requiring access to the object itself.

NOTE: This information cannot be used to construct the object by deserialization.

class datastories.api.SlideDeck

Base class for slide decks.

A slide deck is a convenience component that facilitates managing a collection of slides.

add_slide(slide)

Adds a slide to the deck.

Args:

  • slide (datastories.api.ISlide):

    the slide to be added.

clear_slides()

Remove the slides in the deck.

goto_slide(slide_idx)

Sets the current slide pointer to a specific value.

Args:

  • slide_idx (int):

    the new value for the slide pointer.

has_slides()

Check if the slide deck contains any slides (i.e., it is not empty).

Returns:

  • True if the slide deck contains at least one slide, otherwise False.

insert_slide(pos_idx, slide)

Inserts a slide in the deck at a given position.

Args:

  • pos_idx (int):

    the index at which position the slide is to be inserted.

  • slide (datastories.api.ISlide):

    the slide to be inserted.

next_slide()

Retrieves the next slide in the deck and advances the slide pointer.

If the deck is at the end or has no slides, it returns None.

Returns:

  • The next slide in the deck or None.
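The pointer behaviour described for next_slide and goto_slide can be sketched with a minimal deck. MiniDeck is a hypothetical stand-in that stores plain strings as slides; the assumption that the pointer is a zero-based index of the slide returned by the next call is mine.

```python
class MiniDeck:
    """Hypothetical stand-in for a slide deck; slides are plain strings here."""

    def __init__(self):
        self.slides = []
        self._pointer = 0  # index of the slide returned by the next call

    def add_slide(self, slide):
        self.slides.append(slide)

    def has_slides(self):
        return len(self.slides) > 0

    def goto_slide(self, slide_idx):
        self._pointer = slide_idx

    def next_slide(self):
        if self._pointer >= len(self.slides):
            return None
        slide = self.slides[self._pointer]
        self._pointer += 1
        return slide


deck = MiniDeck()
deck.add_slide("summary")
deck.add_slide("details")
seen = [deck.next_slide(), deck.next_slide(), deck.next_slide()]
deck.goto_slide(0)
first_again = deck.next_slide()
```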

property slides

The deck slides.

sort_slides(names)

Sort the slides based on a list of names.

Slides are sorted in place.

Args:

  • names (list):

    A list of slide names indicating the desired sort order. Slides that are not mentioned in the list will be added at the end.
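One way to obtain this ordering is a stable sort keyed on the position of each name in the list. The sort_by_names helper and the (name, content) tuples are hypothetical; keeping unlisted slides in their original relative order is an assumption about the intended behaviour.

```python
def sort_by_names(slides, names):
    """Sort (name, content) pairs in place following `names`.

    Slides whose name is not listed keep their relative order and end up
    after the listed ones.
    """
    rank = {name: position for position, name in enumerate(names)}
    # list.sort is stable, so unlisted slides keep their original order.
    slides.sort(key=lambda slide: rank.get(slide[0], len(names)))


slides = [("appendix", 3), ("intro", 1), ("extra", 4), ("results", 2)]
sort_by_names(slides, ["intro", "results"])
ordered_names = [name for name, _ in slides]
```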

class datastories.core.utils.ExportableMixin

to_csv(file_path, delimiter=',', decimal='.', df=None)

Export the result to a CSV file.

Args:

  • file_path (str):

    path to the output file.

  • delimiter (str=’,’):

    character used as value delimiter.

  • decimal (str=’.’):

    character used as decimal point.

  • df (pandas=None):

    data frame to export. If left unspecified, the data frame returned by the to_pandas method of the object is used.

Raises:

  • ValueError:

    when the serialized object is not a Pandas data frame

to_excel(file_path, tab_name='Statistics', df=None)

Export the result to an Excel file.

Args:

  • file_path (str):

    path to the output file.

  • tab_name (str=’Statistics’):

    name of the Excel tab where to save the result

  • df (pandas=None):

    data frame to export. If left unspecified, the data frame returned by the to_pandas method of the object is used.

Raises:

  • ValueError:

    when the serialized object is not a Pandas data frame

class datastories.core.utils.ManagedObject(dependencies=None, *args, **kwargs)

An object that has a user controllable lifespan.

Typically inherited by classes that require special resources to be allocated and manually released outside the Python object lifetime management.

Note: Objects of this class should not be manually constructed.

assert_alive()

Triggers an exception if the object has been manually released.

release()

Releases the object associated storage.

Note: This function should only be used in order to force releasing allocated resources. Using the object after this point would lead to an exception.
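The release and assert_alive contract can be sketched as follows. ManagedResource is a hypothetical stand-in, and RuntimeError is used only as a placeholder because the exception type is not stated here.

```python
class ManagedResource:
    """Hypothetical stand-in illustrating a manually released object."""

    def __init__(self, payload):
        self._payload = payload
        self._alive = True

    def assert_alive(self):
        if not self._alive:
            raise RuntimeError("object has been released")

    def release(self):
        # Drop the reference so the resource can be reclaimed.
        self._payload = None
        self._alive = False

    def read(self):
        self.assert_alive()
        return self._payload


resource = ManagedResource(payload=[1, 2, 3])
before = resource.read()
resource.release()
try:
    resource.read()
    outcome = "still usable"
except RuntimeError:
    outcome = "released"
```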

class datastories.core.utils.StorageBackedObject(folder=None, files=None, *args, **kwargs)

An object that stores part of its resources on disk and loads them on demand.

The resources may be provided by the object dependencies or by the object associated storage. When resources are specified, the object can be made independent from its dependencies by copying the listed resources to its associated storage.

Note: Objects of this class should not be manually constructed.

make_independent(base_folder='')

Make the object independent by copying the required resources to its own folder.

Args:

  • base_folder (str=’’):

    the base folder for the unique object folder that will hold the required resources.

Errors

class datastories.api.errors.DatastoriesError(value='')

Base exception class for the DataStories SDK.

class datastories.api.errors.ObjectError(value='')

Exception generated when SDK managed objects are not valid.

class datastories.api.errors.LicenseError

Exception generated when accessing license protected functionality using an invalid license.

class datastories.api.errors.ConversionError(value='')

Error raised when data conversion fails.

class datastories.api.errors.VisualizationError(value='')

Error raised when result visualization fails.

class datastories.api.errors.StoryError(value='')

Base class for all story analysis related errors.
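Because story errors share a base class, callers can catch the whole family with one except clause. The classes below are local stand-ins so the sketch runs without the SDK; the inheritance chain shown is inferred from the descriptions in this section and is an assumption, and run_story is a hypothetical function.

```python
class DatastoriesError(Exception):
    """Stand-in for the SDK base exception (not imported from the SDK)."""


class StoryError(DatastoriesError):
    """Stand-in base for story related errors."""


class StoryDataLoadingError(StoryError):
    """Stand-in for a data loading failure."""


def run_story(path):
    # Hypothetical function that fails while loading input data.
    raise StoryDataLoadingError(f"cannot load {path}")


try:
    run_story("missing.csv")
    handled_by = None
except StoryError as exc:
    # Catches StoryDataLoadingError and any other story related error.
    handled_by = type(exc).__name__
except DatastoriesError:
    handled_by = "DatastoriesError"
```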

class datastories.api.errors.StoryDataLoadingError(value='')

Exception generated when a story analysis cannot load the provided input data.

class datastories.api.errors.StoryDataPreparationError(value='')

Exception generated when a story analysis cannot preprocess the provided data.

class datastories.api.errors.StoryProcessingError(value='')

Exception generated when a story analysis cannot be performed.

class datastories.api.errors.StoryInterrupted(value='')

Exception generated when a story analysis execution is interrupted.

class datastories.api.errors.ParserError(value='')

Base class for all file parsing and validation related errors.

class datastories.api.errors.FormatError(value='')

Error raised when the provided file is not in a readable format (e.g., an unreadable CSV file).

class datastories.api.errors.ValidationError(value='')

Error raised when the parser was able to read the file structure, but an error occurred during validation.

class datastories.api.errors.TypeNotRecognized(value='')

Error raised when the SDK parser cannot determine the provided file type.

class datastories.api.errors.TypeNotSupported(value='')

Error raised when the provided file type cannot be handled by the SDK parser.

class datastories.api.errors.ExternalDataConnectionError(value='')

Error raised when VBA scripts or an external data connection are detected in a spreadsheet.

Constants and Enumerations

class datastories.api.OutlierType(value)

Enumeration of possible outlier types.

FAR_OUTLIER_HIGH = 2
FAR_OUTLIER_LOW = -2
NO_OUTLIER = 0
OUTLIER_HIGH = 1
OUTLIER_LOW = -1
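The signed values make it easy to summarize labels or to select one side of the distribution. The sketch below mirrors the values in a local IntEnum (an assumption; the SDK enumeration may not be integer-comparable) and uses hypothetical per-row labels.

```python
from collections import Counter
from enum import IntEnum


class OutlierType(IntEnum):
    """Local mirror of the values listed above (not imported from the SDK)."""

    FAR_OUTLIER_LOW = -2
    OUTLIER_LOW = -1
    NO_OUTLIER = 0
    OUTLIER_HIGH = 1
    FAR_OUTLIER_HIGH = 2


# Hypothetical per-row labels, e.g. as produced by an outlier detection step.
labels = [0, 1, 0, -2, 2, 0, -1]
counts = Counter(OutlierType(value).name for value in labels)
# Row positions flagged on the high side (OUTLIER_HIGH or FAR_OUTLIER_HIGH).
high_side = [i for i, value in enumerate(labels) if value > OutlierType.NO_OUTLIER]
```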

License Management

datastories.api.get_activation_info()

Get information required to create and activate a DataStories license.

Returns:

  • dict: a dictionary containing data to be submitted to the DataStories representative in charge of issuing the license.


The datastories.api.license module defines the interfaces for interacting with licenses. Users of this module are expected to use the predefined license manager available in the license package:

Example:

from datastories.license import manager
manager.initialize(license_file_path='my_license.lic')
manager

class datastories.api.license.LicenseManager(*args, **kwargs)

Encapsulates the DataStories license manager.

The license manager enables users to inspect the details of their installed DataStories SDK license, and to use license keys that are not available in the standard installation locations (see Installation).

This class should not be instantiated directly. Instead one should use the already available object instance datastories.license.manager.

Args:

  • license_file_path (str = None):

    the path to a license key file or folder if other than the standard locations for the platform.

Attributes:

  • status (str):

    the status of the license manager initialization.

  • license (obj):

    the managed license as indicated in the license key file.

Example:

from datastories.license import manager
manager.initialize(license_file_path='my_license.lic')
manager

initialize(host=None, api_key=None, license_file_path=None, initialize_modules=True)

Initialize the license manager with a license key at a specific location.

Args:

  • license_file_path (string):

    the path to a license key RLM file or a folder containing the RLM license key file. If set, host and api_key should not be set.

  • host (string):

    the API host to use to get access to the license. If set, license_file_path should not be set, and api_key should be provided.

  • api_key (string):

    the API key to use to get access to the license. If set, license_file_path should not be set, and host should be provided.

  • initialize_modules (bool=True):

    set to True in order to initialize dependent modules.

Raises:

  • ValueError:

    when the provided license_file_path is not accessible.
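The argument combinations described above (either a license file path, or a host together with an API key) can be checked up front. The check_initialize_args helper, its return labels and the example host are hypothetical; the SDK may validate its arguments differently.

```python
def check_initialize_args(host=None, api_key=None, license_file_path=None):
    """Return 'file', 'api' or 'default' for a consistent argument combination.

    Hypothetical helper; 'default' stands for the standard license locations.
    """
    uses_api = host is not None or api_key is not None
    if license_file_path is not None and uses_api:
        raise ValueError("use either license_file_path or host and api_key")
    if uses_api and (host is None or api_key is None):
        raise ValueError("host and api_key must be provided together")
    if license_file_path is not None:
        return "file"
    return "api" if uses_api else "default"


mode_file = check_initialize_args(license_file_path="my_license.lic")
mode_api = check_initialize_args(host="https://license.example", api_key="KEY")
mode_default = check_initialize_args()
```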

is_granted(option: LicenseOptions | str) -> bool

Checks if execution rights are granted for license protected functionality.

Args:

  • option (str):

    the license option required by the protected functionality.

Returns:

  • True if execution rights are granted by the installed license.

property is_ok: bool

Check the initialization status of the license manager.

The license manager initialization fails when no valid license file is found in the standard or user indicated locations.

Note: A successful license manager initialization does not imply a grant for using license protected functionality. For example, when an expired license is used, the initialization is still successful. To check whether execution rights are granted, use the datastories.api.license.LicenseManager.is_granted() method.

This property is functionally equivalent to testing whether or not the status is ‘Ok’.

Returns:

  • True if the license manager was successfully initialized.
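The difference between a successful initialization and an actual grant can be sketched with a stand-in manager. FakeManager, can_run and the option name "story" are hypothetical; the sketch only mirrors the statement that is_ok corresponds to the status being 'Ok' while is_granted depends on the license itself.

```python
class FakeManager:
    """Hypothetical stand-in for a license manager."""

    def __init__(self, status, granted_options):
        self.status = status
        self._granted = set(granted_options)

    @property
    def is_ok(self):
        return self.status == "Ok"

    def is_granted(self, option):
        return self.is_ok and option in self._granted


def can_run(manager, option):
    """Only run protected functionality when the option is granted."""
    return manager.is_granted(option)


# Initialized successfully, but the (e.g. expired) license grants nothing.
expired = FakeManager(status="Ok", granted_options=[])
checks = (expired.is_ok, can_run(expired, "story"))
```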

register_module(module)

Registers a DataStories module.

This method should not be used by end users, since module initialization is automatically performed by the regular module imports. Implementation details depend on the manager.

Args:

  • module:

    Implementation detail. For RLM License Manager, the expected type is a callable on strings.

Returns:

  • Implementation detail. For RLM License Manager, the expected return is the result of a call of module.

reinitialize()

Re-initializes the license manager.

This is done using the same license file path as in the previous call to datastories.api.license.LicenseManager.initialize().

release()

Releases the currently held licenses.

This can be useful e.g., when using floating or counted licenses, as it makes the released licenses available for other clients or processes.

Note: once a license is released, the associated execution rights are retracted. To use the license protected functionality again, users need to re-acquire the license by initializing the license manager again (i.e., datastories.api.license.LicenseManager.initialize()).


class datastories.api.license.License(*args, **kwargs)

Wrapper for a license.

This class is returned by the RLM engine wrapper and can be used to directly interact with the license.

errstring(option: LicenseOptions | str) -> str | None

Check what’s wrong with a license option.

Args:

  • option (str):

    the license option to check.

Returns:

  • A string message containing error details.

get_attr_health(option: LicenseOptions | str) -> int

Update information about the license from the server.

Args:

  • option (str):

    the license option to check.

Returns:

  • 0 if the license option is valid; any other value means the license option is invalid.

    The exact interpretation of the error code might depend on the license kind.

has_option(option: LicenseOptions | str) -> bool

Check if an option is present in the option list.

property is_acquired: bool

Check if the license has been acquired.

Note: Once a license object has been created one can use this method to assess whether the other methods of this class are safe to be called. Otherwise, all other methods will result in an exception when the license has not been acquired.

license_exp(option: LicenseOptions | str) -> str

Get the expiration date of a license option.

Args:

  • option (str):

    the license option to check.

Returns:

  • ‘permanent’ if the license option has no expiration date; otherwise, the specific expiration date.

license_exp_days(option: LicenseOptions | str) -> int

Get the number of days until license expiration.

Args:

  • option (str):

    the license option to check.

Returns:

  • The number of days remaining until the license expires.

stat(option: LicenseOptions | str) -> int

Check if the license is valid.

Args:

  • option (str):

    the license option to check.

Returns:

  • 0 if the license option is valid; any other value means the license option is invalid.

    The exact interpretation of the error code might depend on the license kind.

Data

Base Classes


Data Frame Preparation

Summary Calculation


Outlier Detection


Classification

Feature Ranking


Correlation

Prototype Detection


Model

Base Classes

Prediction



Optimization



class datastories.optimization.OptimizationDirection

Enumeration for possible optimization goals when no other optimization specification is provided.

Possible values:
  • OptimizationDirection.MAXIMIZE

  • OptimizationDirection.MINIMIZE

Story




Predict Single KPI

Predict Multiple KPIs

Check Data Health

Story Results

General Results






Predict Multiple KPI Story Specific Results




Visualization

Display Utils

The datastories.display package contains a collection of display helpers.


datastories.display.wide_screen(width=0.95)

Make the notebook screen wider when running under Jupyter Notebook.

Args:

  • width (float=0.95):

    width of notebook as a fraction of the screen width. Should be in the interval [0,1].

Raises:

  • ValueError:

    when the [width] argument is outside the accepted interval.
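The interval check on width can be sketched as follows. The notebook_width_css helper and the .container CSS selector are assumptions made for illustration; the SDK may widen the notebook in a different way.

```python
def notebook_width_css(width=0.95):
    """Return a CSS rule for the given fraction of the screen width.

    Hypothetical helper; the selector name is an assumption.
    """
    if not 0 <= width <= 1:
        raise ValueError("width must be in the interval [0, 1]")
    return f".container {{ width: {width * 100:.0f}% !important; }}"


css = notebook_width_css(0.8)
```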

datastories.display.init_graphics(should_embed=False, dslibs_location=None)

Initializes the DataStories graphics engine.

Use this method at the beginning of your notebooks (Jupyter, JupyterLab, Databricks) to trigger optimal rendering.

Components can either embed the DataStories libraries (should_embed=True) or rely on their running environment (should_embed=False). The default is False.

When should_embed=False, the environment is expected to be sufficient to load the components. This method loads the scripts into the environment using the version embedded in the SDK. Components in NOTEBOOK mode are then loaded on the assumption that the environment provides the required resources.

When should_embed=True, that is, components should embed the DataStories library resources, this method does not act on the HTML and behaves as follows:

  • if dslibs_location is not provided (None), components carry their own version of the libraries;

  • otherwise, components try to reach the provided endpoint.

Recommended usage: init_graphics(). On Databricks, this sets up the DataStories libraries in /dbfs/FileStore/DataStories/components_library/.

Args:

  • should_embed (bool=False):

    True if components should be responsible for embedding library resources, False otherwise (the environment is in charge).

  • dslibs_location:

    the reference to the DataStories libraries.

Returns:

Nothing

Effects:

This method has effects on the SDK state, and may also affect the HTML environment that executes it. The latter is irreversible.

datastories.display.export_javascript_library(file_path=None)

Export the DataStories libraries as JavaScript code.

Args:

  • file_path (str=None):

    the path of the JavaScript file that will contain the library code. If None, the JavaScript code is returned directly.

Returns:

The JavaScript library to load DataStories components


datastories.display.get_progress_bar(progress_bar)

A default implementation for a progress bar.

Returns:

  • An object of type datastories.api.ProgressReporter.

class datastories.display.ProgressCounter

Base class implemented by all progress counters (including progress reporters).

Attributes:

  • total (int):

    the number of steps required for completion.

  • step (int):

    the current step.

  • start_time (int):

    the start time in ns.

  • stop_time (int):

    the stop time in ns.

increment(steps=1)

Registers a processing advance with a number of steps.

Args:

  • steps (int):

    the number of steps to advance.

start(total=1)

Initialize the progress range.

Args:

  • total (int):

    the number of steps required for completion.

stop()

Stop progress monitoring.

timeout()

Mark the step at which the execution timeout occurred.

Use this upon interrupting counting before reaching the end (i.e., step < total).
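The attributes and methods listed for progress counters can be pictured with a small sketch. StepCounter is a hypothetical stand-in; clamping the step at total, the fraction property and the use of time.perf_counter_ns for the nanosecond timestamps are assumptions, and timeout() is not modelled.

```python
import time


class StepCounter:
    """Hypothetical counter mirroring the attributes listed above."""

    def __init__(self):
        self.total = 0
        self.step = 0
        self.start_time = None
        self.stop_time = None

    def start(self, total=1):
        self.total = total
        self.step = 0
        self.start_time = time.perf_counter_ns()

    def increment(self, steps=1):
        # Never count past the total number of steps.
        self.step = min(self.total, self.step + steps)

    def stop(self):
        self.stop_time = time.perf_counter_ns()

    @property
    def fraction(self):
        return self.step / self.total if self.total else 0.0


counter = StepCounter()
counter.start(total=4)
counter.increment()
counter.increment(steps=2)
counter.stop()
elapsed_ns = counter.stop_time - counter.start_time
```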

class datastories.display.ProgressReporter(observers=[])

Abstract base class implemented by all progress reporters.

Args:

  • observers (list):

    list of progress observers to be notified on progress updates.

property header

Get/set the current reporting header.

increment(steps=1)

Register a processing advance with a number of steps.

Args:

  • steps (int=1):

    number of advance steps.

log(message)

Log a progress message.

Args:

  • message (str):

    progress message to log.

on_progress(progress)

Log the completion percentage.

Args:

  • progress (float=None):

    completion percentage to be logged.

property progress

The currently reported progress.

report()

Notify observers on progress updates.

start(total=1)

Start progress reporting.

Args:

  • total (int=1):

    total number of steps required for completion.

property state

Get/set the currently reported state.

stop(info='')

Stop progress reporting.

Args:

  • info (str=’’):

    optional message to report.

class datastories.display.AggregatedReporter(stages=None, observers=None, display=True, bar_length=50)

A progress reporter that aggregates progress of a number of independent stages.

Stages are to be specified in the beginning, together with an estimation of the stage importance relative to the whole execution. The progress of each stage will be individually monitored and reported in the context of the whole execution.

Stages are to be identified and activated by setting the progress header.

Args:

  • stages (dict):

    a dictionary mapping local stage names to their bounds in the globally reported progress.

  • observers (list):

    list of observers to be notified about progress updates.

  • display (bool=True):

    set to False in order to disable progress display (e.g., when the display is done by observers)

  • bar_length (int=cfg):

    optional size of the progress bar. It defaults to the value specified in the SDK configuration settings. That is 25 if no configuration settings are provided.

Example:

stages = {
    'Stage 1' : (0,50),
    'Stage 2' : (50,100)
}
reporter = AggregatedReporter(stages=stages)

property header

Get/set the progress report header.
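Since stages are activated by setting the header and each stage is given bounds in the global progress, the aggregation can be thought of as a linear mapping from stage-local progress to the global percentage. The global_progress helper and the linear interpolation are assumptions for illustration, not the SDK implementation.

```python
def global_progress(stages, header, local_fraction):
    """Map a stage-local fraction in [0, 1] to the global percentage."""
    low, high = stages[header]
    return low + (high - low) * local_fraction


stages = {
    "Stage 1": (0, 50),
    "Stage 2": (50, 100),
}
halfway_stage_1 = global_progress(stages, "Stage 1", 0.5)
halfway_stage_2 = global_progress(stages, "Stage 2", 0.5)
```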

log(message)

Log a progress message.

Args:

  • message (str):

    progress message to log.

on_progress(progress=None)

Log the completion percentage.

Args:

  • progress (float=None):

    completion percentage to be logged.

reset()

Reset the progress reporter.

Plots

The datastories.visualization package contains a collection of visualizations that facilitates the assessment of selected DataStories analysis results.

class datastories.visualization.VisualizableMixin(title='', subtitle='')

Mixin for classes that provide a visualization property.

Enables exporting to HTML, managing the visualization settings, and provides a Jupyter representation.

plot(*args, **kwargs)

Display an interactive visualization.

to_html(file_path, title=None, subtitle=None, scenario=VisualizationScenario.REPORT)

Exports the visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to output file.

  • title (str=’’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

property vis_settings

Get/set the visualization settings.

abstract property visualization

The visualization.

class datastories.visualization.ColorScheme(value)

Enumeration of available color encoding schemes:

Possible values:

  • For discrete variable encoding:
    • DISCRETE_12

    • DISCRETE_12_LIGHT

    • DISCRETE_10

    • DISCRETE_8

    • DISCRETE_8_LIGHT

    • DISCRETE_8_ACCENT

  • For numeric variable encoding:
    • NUMERIC_RED_YELLOW_GREEN

    • NUMERIC_RED_YELLOW_BLUE

    • NUMERIC_RED_BLUE

    • NUMERIC_PINK_GREEN

    • NUMERIC_COLD_HOT


class datastories.visualization.ConclusionsSettings

Encapsulates visualization settings for datastories.visualization.Conclusions visualizations.

class datastories.visualization.Conclusions(conclusions=None, vis_settings=None, *args, **kwargs)

Encapsulates a visual representation of KPI drivers.

Note: Objects of this class should not be manually constructed.

One can display this visualization in an IPython Notebook by simply giving the name of an object of this class.

plot(*args, **kwargs)

Convenience function to set-up and display the Conclusions visualization.

Accepts the same parameters as the constructor for datastories.visualization.ConclusionsSettings objects.

to_html(file_path, title='Conclusions', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the Conclusions visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’Conclusions’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario indicating the use scenario.


class datastories.visualization.ConfusionMatrixSettings(width=480, height=320)

Encapsulates visualization settings for datastories.visualization.ConfusionMatrix visualizations.

Args:

  • width (int=480):

    Graph width in pixels.

  • height (int=320):

    Graph height in pixels.

Attributes:

  • Same as the Args section above.

class datastories.visualization.ConfusionMatrix(prediction_performance, vis_settings=None, *args, **kwargs)

Encapsulates a visual representation of model accuracy for binary classification models.

Note: Objects of this class should not be manually constructed.

One can display this visualization in an IPython Notebook by simply giving the name of an object of this class.

plot(*args, **kwargs)

Convenience function to set-up and display the Confusion Matrix visualization.

Accepts the same parameters as the constructor for datastories.visualization.ConfusionMatrixSettings objects.

to_html(file_path, title='Confusion Matrix', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the Confusion Matrix visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’Confusion Matrix’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario to indicate the use scenario.
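
As background for reading this visualization, the sketch below counts the four cells of a binary confusion matrix in plain Python. It is an illustration only and not the SDK's implementation; the helper name binary_confusion_counts, the choice of 1 as the positive label, and the sample labels are all assumptions made for the example.

```python
from collections import Counter


def binary_confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN and TN for a binary classifier (hypothetical helper)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the actual label is positive too.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the actual label was positive.
            counts["FN" if a == positive else "TN"] += 1
    return {name: counts[name] for name in ("TP", "FP", "FN", "TN")}


# Made-up labels, for illustration only.
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
cells = binary_confusion_counts(actual, predicted)
accuracy = (cells["TP"] + cells["TN"]) / len(actual)
```

Accuracy here is the share of records on the matrix diagonal (true positives plus true negatives).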


datastories.visualization.correlation_browser(file_path=None, raw_content=None, vis_settings=None)

Displays a Correlation Browser visualization in a Jupyter notebook based on an input correlation data file.

Args:

  • file_path (str=None):

    path to the input data file containing a serialized datastories.correlation.CorrelationResult object.

  • raw_content (str=None):

    a string containing a JSON serialized datastories.correlation.CorrelationResult object.

  • vis_settings (obj=CorrelationBrowserSettings()):

    an object of type datastories.visualization.CorrelationBrowserSettings containing visualization settings. Set this object before displaying the visualization or exporting to HTML.

NOTE: Either the [file_path] or [raw_content] argument has to be provided but not both.

Returns:

Raises:

  • ValueError:

    when both the [file_path] and the [raw_content] arguments are provided.
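
The argument rule above (exactly one of file_path and raw_content) can be sketched as a small stand-alone helper. The name load_serialized_result is hypothetical and not part of the SDK. Rejecting the case where neither argument is given is an assumption of this sketch, since only the both-provided case is documented.

```python
import json
from pathlib import Path


def load_serialized_result(file_path=None, raw_content=None):
    """Return JSON text taken from exactly one of the two sources (hypothetical helper)."""
    if file_path is not None and raw_content is not None:
        # Documented behaviour: providing both arguments is an error.
        raise ValueError("provide either file_path or raw_content, not both")
    if file_path is None and raw_content is None:
        # Assumption of this sketch: providing neither is rejected as well.
        raise ValueError("one of file_path or raw_content is required")
    if file_path is not None:
        raw_content = Path(file_path).read_text(encoding="utf-8")
    json.loads(raw_content)  # fail early if the text is not valid JSON
    return raw_content


# '{}' is a placeholder, not a real serialized CorrelationResult.
content = load_serialized_result(raw_content="{}")
```

A string obtained this way from a real serialized CorrelationResult could then be passed to correlation_browser through the raw_content argument.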

Example:

from datastories.visualization import correlation_browser
correlation_browser('correlations.json')

class datastories.visualization.CorrelationBrowserSettings(scale=1, node_opacity=0.9, edge_opacity=0.3, tension=0.65, font_size=15, filter_unconnected=False, min_weight=50, max_weight=100, weight_key='weightMI', show_controls=True, show_inspector=True)

Encapsulates visualization settings for datastories.visualization.CorrelationBrowser visualizations.

Args:

  • scale (float=1):

    Scale factor of the radius [0-1].

  • node_opacity (float=0.9):

    Opacity of the nodes that aren’t hovered or connected to hovered or selected nodes [0-1].

  • edge_opacity (float=0.3):

    Opacity of the edges that aren’t hovered or connected to hovered or selected nodes [0-1].

  • tension (float=0.65):

    The tension of the links. A tension of 0 means straight lines [0-1].

  • font_size (int=15):

    Font size used for the nodes of the plot [10-32].

  • filter_unconnected (bool=False):

    Whether nodes that aren’t connected to any other node are filtered from the view.

  • min_weight (int=50):

    Minimum weight of the links that will be shown [0-100].

  • max_weight (int=100):

    Maximum weight of the links that will be shown [0-100].

  • weight_key (str=’weightMI’):

    Type of relations to display [‘weightMI’ for Mutual Information, ‘weightL’ for Linear Correlation].

  • show_controls (bool=True):

    Set to True in order to display relation controls.

  • show_inspector (bool=True):

    Set to True in order to display the relation inspector window.

Attributes:

  • Same as the Args section above.
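
The value ranges listed above can be checked before a settings object is built. The following is a minimal sketch; the helper check_settings is hypothetical, and whether the SDK itself enforces these ranges is not stated in this reference. The rule that min_weight should not exceed max_weight is likewise an assumption.

```python
# Ranges copied from the Args list above.
RANGES = {
    "scale": (0, 1),
    "node_opacity": (0, 1),
    "edge_opacity": (0, 1),
    "tension": (0, 1),
    "font_size": (10, 32),
    "min_weight": (0, 100),
    "max_weight": (0, 100),
}
WEIGHT_KEYS = {"weightMI", "weightL"}


def check_settings(**kwargs):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    for name, value in kwargs.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                problems.append(f"{name}={value} is outside [{low}, {high}]")
    if kwargs.get("weight_key", "weightMI") not in WEIGHT_KEYS:
        problems.append(f"weight_key={kwargs['weight_key']!r} is not recognized")
    # Assumption: the lower bound should not exceed the upper bound.
    if kwargs.get("min_weight", 50) > kwargs.get("max_weight", 100):
        problems.append("min_weight exceeds max_weight")
    return problems


wanted = dict(weight_key="weightL", min_weight=70, tension=0.4, font_size=12)
issues = check_settings(**wanted)
bad = check_settings(font_size=40, min_weight=90, max_weight=80)
```

When issues is empty, the same keyword arguments can be passed on, for example as CorrelationBrowserSettings(**wanted).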

class datastories.visualization.CorrelationBrowser(correlation_result=None, raw_content=None, vis_settings=None, *args, **kwargs)

Encapsulates a visual representation of correlation between features.

Note: Objects of this class should not be manually constructed.

One can display this visualization in an IPython Notebook by simply giving the name of an object of this class.

Attributes:

plot(*args, **kwargs)

Convenience function to set-up and display the Correlation Browser visualization.

Accepts the same parameters as the constructor for datastories.visualization.CorrelationBrowserSettings objects.

to_html(file_path, title='Correlation Browser', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the Correlation Browser visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’Correlation Browser’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario to indicate the use scenario.


class datastories.visualization.DataHealthSettings(page_size=25)

Encapsulates visualization settings for datastories.visualization.DataHealth visualizations.

Args:

  • page_size (int=25):

    Maximum number of columns to display on one summary page.

Attributes:

  • Same as the Args section above.
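
To illustrate what page_size controls, the sketch below splits a list of column names into pages of at most page_size entries. The paginate helper and the column names are hypothetical; how the visualization itself lays out its pages is not described in this reference.

```python
def paginate(columns, page_size=25):
    """Split columns into consecutive pages of at most page_size items."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [columns[i:i + page_size] for i in range(0, len(columns), page_size)]


# Sixty made-up column names spread over pages of 25.
pages = paginate([f"col_{i}" for i in range(60)])
page_lengths = [len(page) for page in pages]
```

With 60 columns and the default page size, the last page is only partly filled.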

class datastories.visualization.DataHealth(data_health=None, vis_settings=None, *args, **kwargs)

Encapsulates a visual representation of a data health report.

Note: Objects of this class should not be manually constructed.

One can display this visualization in an IPython Notebook by simply giving the name of an object of this class.

Attributes:

plot(*args, **kwargs)

Convenience function to set-up and display the Data Health visualization.

Accepts the same parameters as the constructor for datastories.visualization.DataHealthSettings objects.

to_html(file_path, title='Data Health', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the Data Health visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’Data Health’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario to indicate the use scenario.


class datastories.visualization.DataSummaryTableSettings(page_size=25, show_console=True)

Encapsulates visualization settings for datastories.visualization.DataSummaryTable visualizations.

Args:

  • page_size (int=25):

    Maximum number of columns to display on one summary page.

  • show_console (bool=True):

    Set to True in order to display the visualization console.

Attributes:

  • Same as the Args section above.

class datastories.visualization.DataSummaryTable(summary=None, column_stats=None, vis_settings=None, *args, **kwargs)

Encapsulates a visual representation of a data frame summary.

Note: Objects of this class should not be manually constructed.

One can display this visualization in an IPython Notebook by simply giving the name of an object of this class.

Attributes:

plot(*args, **kwargs)

Convenience function to set-up and display the Data Summary visualization.

Accepts the same parameters as the constructor for datastories.visualization.DataSummaryTableSettings objects.

to_html(file_path, title='Data Summary', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the Data Summary visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’Data Summary’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario to indicate the use scenario.


datastories.visualization.driver_overview(file_path=None, raw_content=None, vis_settings=None)

Displays a Driver Overview visualization in a Jupyter notebook based on an input driver overview data file.

Args:

  • file_path (str=None):

    path to the input driver overview data file.

  • vis_settings (obj):

    an object of type datastories.visualization.DriverOverviewSettings containing visualization settings. Set this object before displaying the visualization or exporting to HTML.

NOTE: Either the [file_path] or [raw_content] argument has to be provided but not both.

Returns:

Raises:

  • ValueError:

    when both the [file_path] and the [raw_content] arguments are provided.

Example:

from datastories.visualization import driver_overview
driver_overview('driver_overview.json')

class datastories.visualization.DriverOverviewSettings(height=600)

Encapsulates visualization settings for datastories.visualization.DriverOverview visualizations.

Args:

  • height (int=600):

    Graph height in pixels.

Attributes:

  • Same as the Args section above.

class datastories.visualization.DriverOverview(driver_overview=None, raw_content=None, auxiliary_health_stats=None, vis_settings=None, *args, **kwargs)

Encapsulates a visual representation of KPI drivers.

Note: Objects of this class should not be manually constructed.

One can display this visualization in an IPython Notebook by simply giving the name of an object of this class.

Attributes:

plot(*args, **kwargs)

Convenience function to set-up and display the Driver Overview visualization.

Accepts the same parameters as the constructor for datastories.visualization.DriverOverviewSettings objects.

to_html(file_path, title='Driver Overview', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the Driver Overview visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’Driver Overview’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario to indicate the use scenario.


class datastories.visualization.ErrorPlotSettings(sort_key='id', highlight_outliers=True, display_confidence_interval=True, connect_dots=False, width=900, height=300)

Encapsulates visualization settings for datastories.visualization.ErrorPlot visualizations.

Args:

  • sort_key (str=’id’):

    The sorting criterion for the X axis. Possible values:

    • 'id': sort on record id.

    • 'actual': sort on record actual KPI value.

    • 'predicted': sort on record predicted value.

  • highlight_outliers (bool=True):

    set to True if outliers should be highlighted.

  • display_confidence_interval (bool=True):

    set to True if confidence limits should be displayed.

  • connect_dots (bool=False):

    set to True if data points should be connected by lines.

  • width (int=900):

    plot width in pixels.

  • height (int=300):

    plot height in pixels.

Attributes:

  • Same as the Args section above.
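
The effect of the three sort_key values on the X axis ordering can be sketched with plain records. The records below are made up, the helper order_for_x_axis is hypothetical, and ascending order is an assumption; the reference does not state the sort direction.

```python
# Made-up records with the three fields named by the sort_key options.
records = [
    {"id": 0, "actual": 3.2, "predicted": 2.9},
    {"id": 1, "actual": 1.5, "predicted": 1.9},
    {"id": 2, "actual": 2.4, "predicted": 3.1},
]


def order_for_x_axis(rows, sort_key="id"):
    """Return rows in the order they would be laid out along the X axis."""
    if sort_key not in ("id", "actual", "predicted"):
        raise ValueError(f"unsupported sort_key: {sort_key!r}")
    # Assumption: ascending order.
    return sorted(rows, key=lambda row: row[sort_key])


by_id = [row["id"] for row in order_for_x_axis(records)]
by_actual = [row["id"] for row in order_for_x_axis(records, "actual")]
by_predicted = [row["id"] for row in order_for_x_axis(records, "predicted")]
```

Sorting on 'actual' or 'predicted' groups records with similar values next to each other, whereas 'id' keeps them in record order.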

class datastories.visualization.ErrorPlot(pva=None, metrics=None, vis_settings=None, *args, **kwargs)

Encapsulates a visual representation of prediction error for prediction models.

Both regression and classification models are supported.

Note: Objects of this class should not be manually constructed.

One can display this visualization in an IPython Notebook by simply giving the name of an object of this class.

Attributes:

plot(*args, **kwargs)

Convenience function to set-up and display the Error Plot visualization.

Accepts the same parameters as the constructor for datastories.visualization.ErrorPlotSettings objects.

to_html(file_path, title='Error Plot', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the Error Plot visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’Error Plot’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario to indicate the use scenario.


class datastories.visualization.FeatureRanksTableSettings(height=460, show_console=False)

Encapsulates visualization settings for datastories.visualization.FeatureRanksTable visualizations.

Args:

  • height (int=460):

    graph height in pixels.

  • show_console (bool=False):

    set to True to display the visualization console where update operations are logged.

Attributes:

  • Same as the Args section above.

class datastories.visualization.FeatureRanksTable(feature_ranks, vis_settings=None, *args, **kwargs)

Encapsulates a visual representation of feature ranking.

Note: Objects of this class should not be manually constructed.

One can display this visualization in an IPython Notebook by simply giving the name of an object of this class.

Attributes:

plot(*args, **kwargs)

Convenience function to set-up and display the Feature Ranking visualization.

Accepts the same parameters as the constructor for datastories.visualization.FeatureRanksTableSettings objects.

to_html(file_path, title='Feature Ranking', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the Feature Ranking visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’Feature Ranking’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario to indicate the use scenario.


class datastories.visualization.OutlierPlotSettings(width=800, height=200, x_padding=0.2, y_padding=0.2, marker_size=32, hover_marker_size_delta=32, animations=500, show_jitter=True, show_cdf=True, show_iqr=True, show_summary=True, show_console=True, show_legend=True, low_threshold=0.05, high_threshold=0.95)

Encapsulates visualization settings for datastories.visualization.OutlierXPlot visualizations.

Args:

  • width (int=800):

    graph width in pixels.

  • height (int=200):

    graph height in pixels.

  • x_padding (float=0.2):

    padding on horizontal axis.

  • y_padding (float=0.2):

    padding on vertical axis.

  • marker_size (int=32):

    size of the point marker.

  • hover_marker_size_delta (int=32):

    size of the point hover marker.

  • animations (int=500):

    animation duration in milliseconds.

  • show_jitter (bool=True):

    set to True to add jitter to the vertical dimension, to better distinguish points.

  • show_cdf (bool=True):

    set to True to display the cumulative distribution function.

  • show_iqr (bool=True):

    set to True to display the inter-quartile range, as specified in the lower and higher threshold arguments.

  • show_summary (bool=True):

    set to True to display the summary table.

  • show_console (bool=True):

    set to True to display the visualization console where update operations are logged.

  • low_threshold (float=0.05):

    the lower threshold for the inter-quartile range.

  • high_threshold (float=0.95):

    the upper threshold for the inter-quartile range.

Attributes:

  • Same as the Args section above.
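
The defaults 0.05 and 0.95 for low_threshold and high_threshold read as quantile levels. Under that assumption, and using linear-interpolation quantiles (the method used by the SDK is not stated), the sketch below shows how such thresholds separate ordinary points from outlier candidates. The helpers quantile and flag_outliers and the sample values are hypothetical.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sequence, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def flag_outliers(values, low_threshold=0.05, high_threshold=0.95):
    """Return the values lying outside the two quantile bounds."""
    low_bound = quantile(values, low_threshold)
    high_bound = quantile(values, high_threshold)
    return [v for v in values if v < low_bound or v > high_bound]


# Made-up one-dimensional data with one extreme point at each end.
values = [-50, 10, 11, 12, 13, 14, 15, 16, 17, 18, 90]
flagged = flag_outliers(values)
```

Narrowing the band, for example to 0.25 and 0.75, flags more points; widening it flags fewer.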

class datastories.visualization.OutlierXPlot(outliers_result, vis_settings=None, *args, **kwargs)

Encapsulates a visual representation of outliers resulting from a one dimensional analysis.

Note: Objects of this class should not be manually constructed.

One can display this visualization in an IPython Notebook by simply giving the name of an object of this class.

Attributes:

plot(*args, **kwargs)

Convenience function to set-up and display the Outliers visualization.

Accepts the same parameters as the constructor for datastories.visualization.OutlierPlotSettings objects.

to_html(file_path, title='Outliers', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the Outliers visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’Outliers’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario to indicate the use scenario.


datastories.visualization.plot_xy(data, x, y, color=None, info_columns=None, **kwargs)

Create an X vs Y plot.

Args:

  • data (obj):

    A Pandas data frame containing the data to be visualized.

  • x (str|int):

    Name or index of the variable for the horizontal axis.

  • y (str|int):

    Name or index of the variable for the vertical axis.

  • color (str|int=None):

    Optional name or index for a variable to be used for encoding in the color dimension.

  • info_columns (list):

    Optional list of names or indices of columns to be used to provide additional info (e.g., in tooltips).

  • kwargs (dict):

    Dictionary of additional options to be used for configuring the visualization. See datastories.visualization.PairWisePlotSettings for a complete list.
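
A possible call, using only the parameters documented above, is sketched below. The column names and values are made up, and the wrapper functions build_rows and show_xy are hypothetical. The imports are deferred into show_xy so that the snippet loads even where pandas or the SDK is not installed; calling show_xy requires both.

```python
def build_rows():
    """Small illustrative data set; column names and values are made up."""
    return {
        "temperature": [20.5, 22.1, 19.8, 23.4],
        "yield": [0.81, 0.86, 0.79, 0.90],
        "batch": ["A", "A", "B", "B"],
    }


def show_xy(rows):
    """Plot yield against temperature, colored by batch (needs pandas and the SDK)."""
    # Deferred imports: resolved only when the function is called.
    import pandas as pd
    from datastories.visualization import plot_xy

    frame = pd.DataFrame(rows)
    # width and height are PairWisePlotSettings options forwarded through **kwargs.
    return plot_xy(frame, x="temperature", y="yield", color="batch",
                   info_columns=["batch"], width=800, height=500)
```

In a notebook, show_xy(build_rows()) would be the last expression of a cell. Columns can also be given by index, for example x=0 and y=1.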

class datastories.visualization.PairWisePlotSettings(width=600, height=400, color_scheme=ColorScheme.DEFAULT)

Encapsulates visualization settings for datastories.visualization.PairWisePlot visualizations.

Args:

Attributes:

  • Same as the Args section above.

class datastories.visualization.PairWisePlot(plot_json, data=None, diff_data=None, record_info_columns=None, show_navigator=False, vis_settings=None, *args, **kwargs)

Encapsulates a visual representation of two variable relations.

Note: Objects of this class should not be manually constructed.

One can display this visualization in an IPython Notebook by simply giving the name of an object of this class.

Attributes:

plot(*args, **kwargs)

Convenience function to set-up and display the ‘Pair-Wise Plots’ visualization.

Accepts the same parameters as the constructor for datastories.visualization.PairWisePlotSettings objects.

to_html(file_path, title='', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the Pair-Wise Plot visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario to indicate the use scenario.


class datastories.visualization.PredictedVsActualSettings(highlight_outliers=True, show_metrics=True, width=600)

Encapsulates visualization settings for datastories.visualization.PredictedVsActual visualizations.

Args:

  • highlight_outliers (bool=True):

    set to True if outliers should be highlighted.

  • show_metrics (bool=True):

    set to True if prediction performance metrics should be displayed.

  • width (int=600):

    graph width in pixels.

Attributes:

  • Same as the Args section above.
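
This reference does not list which performance metrics are shown when show_metrics is enabled. As general background, the sketch below computes two metrics commonly reported for regression models, RMSE and R², from actual and predicted values. The helper regression_metrics and the sample numbers are hypothetical, and these are not necessarily the metrics the visualization displays.

```python
import math


def regression_metrics(actual, predicted):
    """Compute RMSE and R^2 for paired actual and predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Residual and total sums of squares.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {"rmse": math.sqrt(ss_res / n), "r2": 1 - ss_res / ss_tot}


# Made-up values: only the last prediction is off, by 1.0.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

A perfect model lies exactly on the diagonal of a predicted vs actual plot, giving an RMSE of 0 and an R² of 1.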

class datastories.visualization.PredictedVsActual(pva=None, metrics=None, vis_settings=None, *args, **kwargs)

Encapsulates a visual representation of model accuracy for prediction models.

Both regression and classification models are supported.

Note: Objects of this class should not be manually constructed.

One can display this visualization in an IPython Notebook by simply giving the name of an object of this class.

Attributes:

plot(*args, **kwargs)

Convenience function to set-up and display the Predicted vs Actual visualization.

Accepts the same parameters as the constructor for datastories.visualization.PredictedVsActualSettings objects.

to_html(file_path, title='Predicted vs Actual', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the Predicted vs Actual visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’Predicted vs Actual’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario to indicate the use scenario.


class datastories.visualization.PrototypeTableSettings(height=320, show_console=True, selectable=True, condensed=True)

Encapsulates visualization settings for datastories.visualization.PrototypeTable visualizations.

Args:

  • height (int=320):

    graph height in pixels.

Attributes:

  • Same as the Args section above.

class datastories.visualization.PrototypeTable(prototypes, vis_settings=None, *args, **kwargs)

Encapsulates a visual representation of feature prototypes.

Note: Objects of this class should not be manually constructed.

One can display this visualization in an IPython Notebook by simply giving the name of an object of this class.

Attributes:

plot(*args, **kwargs)

Convenience function to set-up and display the Prototypes visualization.

Accepts the same parameters as the constructor for datastories.visualization.PrototypeTableSettings objects.

to_html(file_path, title='Prototypes', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the Prototypes visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’Prototypes’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario to indicate the use scenario.


datastories.visualization.what_ifs(file_path=None, raw_content=None, init_values=None, minimize_values=None, maximize_values=None, vis_settings=None)

Displays a What-Ifs visualization in a Jupyter notebook based on an input RSX model file.

Args:

  • file_path (str=None):

    path to the input RSX model file. If None the [raw_content] argument has to be provided.

  • raw_content (bytes=None):

    a bytes object, containing the source of the backing RSX model.

  • init_values (list=None):

    list of initial driver values.

  • minimize_values (list=None):

    driver values that minimize the KPI.

  • maximize_values (list=None):

    driver values that maximize the KPI.

  • vis_settings (obj=WhatIfsSettings()):

    An object of type datastories.visualization.WhatIfsSettings containing the initial visualization settings.

NOTE: Either the [file_path] or [raw_content] argument has to be provided but not both.

Returns:

Raises:

  • ValueError:

    when both the [file_path] and the [raw_content] arguments are provided.

Example:

from datastories.visualization import what_ifs
what_ifs('my_model.rsx')

class datastories.visualization.WhatIfsSettings(show_controls=True, show_console=True, show_optimizer=False)

Encapsulates visualization settings for datastories.visualization.WhatIfs visualizations.

Args:

  • show_controls (bool=True):

    Set to True in order to display the visualization controls.

  • show_console (bool=True):

    Set to True in order to display the visualization console.

  • show_optimizer (bool=False):

    Set to True in order to enable the optimizer functionality.

Attributes:

  • Same as the Args section above.
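
The what_ifs function above also accepts the model as bytes through raw_content. A minimal sketch of that path, combined with a settings object, follows. The wrapper names read_model_bytes and show_what_ifs are hypothetical, and the SDK import is deferred so the snippet loads without the package installed.

```python
from pathlib import Path


def read_model_bytes(model_path):
    """Read a model file as bytes, the type documented for raw_content."""
    return Path(model_path).read_bytes()


def show_what_ifs(model_path):
    """Display a What-Ifs visualization from in-memory model bytes (needs the SDK)."""
    # Deferred import: resolved only when the function is called.
    from datastories.visualization import WhatIfsSettings, what_ifs

    settings = WhatIfsSettings(show_optimizer=True)
    # Pass raw_content only; supplying file_path as well raises ValueError.
    return what_ifs(raw_content=read_model_bytes(model_path), vis_settings=settings)
```

Reading the bytes separately is useful when the model comes from somewhere other than the local file system, such as a database or an object store.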

class datastories.visualization.WhatIfs(init_values=None, minimize_values=None, maximize_values=None, driver_importances=None, raw_model=None, vis_settings=None, *args, **kwargs)

Encapsulates a visual representation for exploring the influence of driver variables on target KPIs.

One can display this visualization in an IPython Notebook by simply giving the name of an object of this class.

Note: Objects of this class should not be manually constructed.

property drivers

Get/set the driver values.

maximize()

Identify a set of driver values that maximize the KPI.

minimize()

Identify a set of driver values that minimize the KPI.
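
Conceptually, maximize() and minimize() look for driver values that push the KPI to an extreme. The toy sketch below does this by exhaustive search over a small grid. It is not the SDK's optimizer, whose method is not described here; the kpi function, the driver names and the grid are invented for the illustration.

```python
from itertools import product


def kpi(temperature, pressure):
    """Toy KPI whose maximum lies at temperature=60, pressure=2."""
    return -((temperature - 60) ** 2) - 10 * (pressure - 2) ** 2


# Candidate values per driver (made up).
grid = {"temperature": range(45, 81, 5), "pressure": [1, 2, 4]}


def search(objective, grid, best=max):
    """Evaluate every combination of driver values and return the best one."""
    names = list(grid)
    candidates = (dict(zip(names, combo)) for combo in product(*grid.values()))
    return best(candidates, key=lambda drivers: objective(**drivers))


maximizing = search(kpi, grid, best=max)
minimizing = search(kpi, grid, best=min)
```

Exhaustive search scales poorly with the number of drivers, so it only serves to make the idea concrete.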

plot(*args, **kwargs)

Convenience function to set-up and display the What-Ifs visualization.

Accepts the same parameters as the constructor for datastories.visualization.WhatIfsSettings objects.

to_html(file_path, title='What-Ifs', subtitle='', scenario=VisualizationScenario.REPORT)

Exports the What-Ifs visualization to a standalone HTML document.

Args:

  • file_path (str):

    path to the output file.

  • title (str=’What-Ifs’):

    HTML document title.

  • subtitle (str=’’):

    HTML document subtitle.

  • scenario (enum=VisualizationScenario.REPORT):

    A value of type datastories.api.VisualizationScenario to indicate the use scenario.

MLflow Support

Story modelling


Optimizer modelling