.. Copyright (C) 2019-2025 SpaceKnow, Inc.

.. _api.datacube:

************
Datacube API
************

Datacube API is a system for storing, querying and filtering data-point time
series.

* URL: https://api.spaceknow.com/datacube

See also :docs:`Datacube API reference<redoc.html#tag/Datacube>`.

.. _api.datacube.concept:

Concept
=======

Datacube serves a “data lake” of data-points, an ever increasing storage of
versioned, multi-dimensional time-series data. New data from various sources,
including but not limited to SpaceKnow algorithms applied on satellite imagery,
are constantly streamed to Datacube.

A *data-point* is a scalar (for example a result of a specific algorithm, e.g.
mean NDVI value, or number of detected cars) that can be computed on a given
(multi)polygon AoI and one source data sample (usually one satellite image shot
or a imagery mosaic). Datacube is a query-able, multi-dimensional data-point
time-series storage and API.

Data-points have the following properties (i.e. dimensions of the datacube):

* ``version`` (str) -- version of the data-point. For example, data can be
  re-computed when the source algorithm changes or when new interpolation could
  be computed after new satellite images become available. Once a version of a
  data-point is uploaded it remains forever available.

  Alphabetically greater versions are considered newer. For example, version
  ``ab`` is considered newer than version ``aa``.
* ``startDatetime``, ``endDatetime`` (datetime) -- a time-range (possibly 0
  seconds wide) of the source data acquisition (for example a time when underlying
  satellite image was taken).
* ``algorithm`` (str) -- a string identification of the underlying algorithm(s)
  which produced the particular data-point. For example, ``ndmi_mean``.
* ``project`` (str) -- For example, ``aluminium`` or ``coal`` (but not
  ``china-coal``).
* ``aoi`` (GeoJSON) -- point AoI, this information is not retrievable
  with the API but data-points can be filtered to a region.
* ``aoiId`` (str) -- a unique identifier of the data-point AoI.
* ``source`` (str) -- underlying data source identification, for example
  ``landsat_8``, ``viirs_nl_monthly``.
* ``firstSeen`` (datetime) -- date of the initial availability of the data to
  us, or date when we've first seen the data when the former is not available.
  This value is **optional**.
* ``ingestDatetime`` (datetime) -- time at which the data-point was inserted
  into Datacube.
* ``cloudCover`` (float) -- **expected** cloudiness in range [0, 1], 0 being
  cloud-free. Satellite-derived data-points may have diminished quality due to
  high cloud cover. This value is **optional**.
* ``intersectionRatio`` (float) -- a value in range [0, 1]. Ratio of site's
  area that has been analyzed in given sample and total site's area. This value
  is **optional**.

.. _api.datacube.get:

Get Data
========

This API end-point returns a link to a CSV on Google Cloud Storage with column
for each Datacube dimension and another column with values.

.. note::

   By default only data with newest version are returned. This behaviour
   can be overridden by setting ``keepAllVersions`` to ``true`` or by filtering by
   version.

.. _api.datacube.filters:

Filters
-------

Data can be filtered with so-called filters. Filter on ``project`` and
``algorithm`` is mandatory for each query.
Each filter is a JSON object in the following form.

* ``type`` (str) -- type of filter, see list of types below.
* ``field`` (str) -- field / axis on which the filter should be applied. See
  available fields in :ref:`api.datacube.concept`.
* ``params`` (object) -- object with filter parameters which is specific for
  each filter type.

The following filter-types are available.

* ``time-range`` -- keeps only data-points inside a date-range. Filter bounds
  can be half-open if one of the bounding parameters is left out.

  Parameters:

  * ``from`` -- a date-time in the form ``YYYY-mm-dd HH:MM:SS``, inclusive.
  * ``to`` -- a date-time in the form ``YYYY-mm-dd HH:MM:SS``, inclusive.

* ``value-range`` -- keeps only data-points inside a value-range. Filter
  bounds can be half-open if one of the bounding parameters is left out.
  Missing / null values are filtered out by this filter.

  Parameters:

  * ``min`` -- a minimum allowed value, inclusive.
  * ``max`` -- a maximum allowed value, inclusive.

* ``value-list`` -- keeps only data-points which have given field set to
  one of provided values. Needs to be used for ``project`` and ``algorithm``.
  Other possible fields to filter are ``version``, ``aoiId``, ``source`` and
  ``tag``. Behaviour of ``tag`` filter is a little different than the rest -
  all provided values must be matched by given data-point to satisfy the
  filter.

  Parameters:

  * ``values`` (list) -- list of allowed string values.
* ``geo-intersects`` -- keeps only data-points with AoI within a given
  area. Parameters:

  * ``geometryLabel`` -- label of the filtering geometry. For countries
    lower-case ISO identifier is used. Examples:

    * ``cz``
    * ``cn``

.. warning::

   If filtering geometry is not chosen or multiple filtering geometries are
   chosen and resulting AOIs have different newest version only those with
   overall newest version are returned. This behaviour can be overridden by
   setting ``keepAllVersions`` to ``true`` or by filtering by version.

.. _api.datacube.permissions:

Permission Packages
-------------------

Each user can be given (multiple) permission packages. Each permission package
allows the user to access some data-point subset. A permission package is a
list of mandatory filters (see :ref:`above <api.datacube.filters>`). The user
is allowed to perform a query if she uses all filters from some of her
permission packages. In other words, the user must use at least as restrictive
filter as one of her permission packages.

.. note::

   Users need regular ``datacube.get``
   :ref:`permission <api.mechanics.permissions>` to have access to the API
   endpoint. Permission packages work on-top of that.

All users are automatically granted permission package with ``value-range``
filter on field ``project`` set to values ``['test']``. Project ``test``
contains test values. This implicit “free” permission package serves for
development purposes.

.. _api.datacube.aggregation:

Aggregation
-----------

It is possible to retrieve periodic aggregates of the data. In such a case, the
original data-points are replaced by average values computed over multiple
data-points falling into the same time-period. The aggregation preserves
``version``, ``algorithm``, ``project``, ``aoiId`` and ``source``, id est
data-points with different values in one of the mentioned fields are treated as
separate.

When aggregated data are requested, original start and end date-times are
replaced with formatted dates of the time periods. ``daily`` (``%Y-%m-%d``),
``weekly`` (``%G-W%V``, i.e. ISO 8601 week), ``monthly`` (``%Y-%m``),
``yearly`` (``%Y``) aggregations are allowed.

.. _api.datacube.deduplication:

Deduplication
-------------

Datacube output deduplicates data by default. Rows with same ``project``,
``algorithm``, ``start_timestamp``, ``end_timestamp``, ``aoi_id``, ``source``,
``version`` are treated as duplicates.
Duplicate entries are ordered by ``first_seen`` (descending) and by
``ingest_timestamp`` (descending) and first data-point is kept.

.. warning::

   Be aware that row with higher ``first_seen`` value takes precedence over
   row with higher ``ingest_timestamp``.

For aggregation this ensures that average values are calculated from unique
values only. Using attribute ``keepDuplicates`` set to ``true`` can be used to
override this behaviour and return all the rows.

.. warning::

   Be aware that keeping duplicates especially in aggregated queries can really
   skew the results.

.. _api.datacube.postprocessing:

Postprocessing
--------------

Datacube API can provide a simple view on top of the query result. This option
is available for the :ref:`asynchronous endpoint<datapoints-async-get>` only.
Postprocessing can be done by adding ``'postprocess':'postprocess_name'`` to
request json.
The list of supported post-processing operations follows:

* ``pivot`` -- groups values by datetime transposes `aoi_ids` into columns and
  averages values that were grouped.

* ``columns`` -- allows to filter resulting view based on given columns.
  Requested columns are provided via ``columns`` property.

* ``seasonal_decompose`` -- adjust time series for seasonal effects. New
  columns are added to the resulting csv - ``valueSeasonallyAdjusted`` and
  ``seasonalDecomposition``. A resampling must be specified via ``resample``
  property ('W', 'M' or 'Y').

* ``sd_hpfilter`` -- adjust time series for seasonal effects and apply
  Hodrick–Prescott filter. New columns are added to the result - all columns
  listed for ``seasonal_decompose`` plus ``trend`` and ``gap``. A resampling
  must be specified via ``resample``  property ('M' or 'Y').
* ``abs_diff`` -- replace each value by its absolute difference from
  the previous value, the first row is lost in the process
* ``diff_log`` -- replace each value by the difference between its logarithm
  and a logarithm of the previous value, the first row is lost in the process,
  can lead to an error if the time series contains zero or negative values
* ``unique_datetimes`` -- Remove datapoints with duplicate start or end
  datetime for each AoI. Take preferentially datapoints with shorter time span.
  This modification is needed for SAR change datapoints where a problem arises
  when scenes are ingested out of order in Earth Engine.
* ``yoy_percent_diff`` -- replace each value by its percentage difference from
  the value exactly a year ago (including time), the first year is lost in
  the process.


.. warning::

   Important metadata will be lost when using postprocessing.

.. _datapoints-get:

See :docs:`Datapoints Get API reference<redoc.html#tag/Datacube/operation/Get_Datapoints_datacube_datapoints_get_post>`.

Get Large Amount of Data-points
-------------------------------

.. _datapoints-async-get:

See :docs:`Datapoints Async Get API reference<redoc.html#tag/Datacube/operation/Get_Datapoints_Initiate_datacube_datapoints_get_initiate_post>`.

.. _api.datacube.aois:

List Available AoIs
===================

See :docs:`List Available AoIs API reference<redoc.html#tag/Datacube/operation/Get_Filter_Aois_datacube_aois_get_post>`.

List Catalogue of Available Data-points
=======================================

See :docs:`List Catalogue Data-points API reference<redoc.html#tag/Datacube/operation/Get_Catalogue_datacube_catalogue_get_post>`.

Get Product
===========

.. _datacube-product-get:

See :docs:`Get Product API reference<redoc.html#tag/Datacube/operation/Get_Product_datacube_product_get_post>`.

CSV Formats
-----------

Simple
^^^^^^

**Contains:**

* value datetime (``value_dt``)

    * Datapoint datetime for which the values were calculated.

* delivery datetime (``delivery_dt``)

    * Exact datetime when the data was delivered.

* product value columns (e.g. ``SK_XXX_ASI_US-MN_D_24d_high_activity_level``)

    * Other columns in the delivered csv are defined per product but are
      generally prefixed by the ``productId``.

Rich
^^^^

**Contains:**

* value datetime (``value_dt``)

    * Datapoint datetime for which the values were calculated.

* delivery datetime (``delivery_dt``)

    * Exact datetime when the data was delivered.

* methodology (``methodology``)

    * Name of the methodology used for data calculation.

* aggregation types (``aggregation``)

    * Types of aggregation applied to the data.

* country (``country``)

    * Country associated with the data.

* region (``region``)

    * Region associated with the data.

* company (``company``)

    * Company related to the data.

* frequency (``frequency``)

    * Frequency at which the data is updated or reported.

* number of locations (``locations``)

    * Total count of locations included in the data.

* categorization (``type``, ``sub_type``, ``sub_type_2``)

    * Categorization attributes for the data.

* product value columns (e.g., ``data_value_24d_normal_activity``)

    * Columns representing product values. In the 'rich' format, the column
      names have no product ID prefix, only the value type.

* product ID (``product_id``)

    * Unique identifier for the product associated with the data.

.. note::

    In the 'simple' format, a column named ``SK_XXX_ASI_US-MN_D_24d_normal_activity_level``
    is equivalent to the column named ``data_value_24d_normal_activity``. The name can be
    constructed by removing 'data_value' and prepending the product ID.

.. _api.datacube.product-permissions:

Product Permission Packages
---------------------------

Each user can be given (multiple) product permission packages.
A permission package is a list of product IDs which user is allowed to
retrieve. The data is trimmed using the user's earliest date cut-off
when they have multiple product permission packages over the same product ID.

.. note::

   Users need regular ``datacube.get``
   :ref:`permission <api.mechanics.permissions>` to have access to the API
   endpoint. Product permission packages work on-top of that.

Get Product Catalogue
=====================

.. _datacube-product-catalogue-get:

See :docs:`Product Catalogue Get API reference<redoc.html#tag/Datacube/operation/Get_Product_Catalogue_datacube_product_catalogue_get_post>`.

Get Product Sales Info
======================

.. _datacube-product-sales-info-get:

See :docs:`Sales Info Get API reference<redoc.html#tag/Datacube/operation/Get_Product_Sales_Info_datacube_product_sales_info_get_post>`.