Spaceknow Image (SKI)

SKI is Spaceknow’s proprietary binary file format for aerial and satellite images. Most visual data inside the SK platform is handled in this format.

SKI is both a file format and a Python library. An SKI supports multiple bands (i.e. channels). For example, each band can represent a single part of the electromagnetic spectrum (e.g. near-infrared). A band is a 2D matrix of integers. Each band in a single SKI can have a different resolution (number of rows and columns) and a different bit depth. Bit depths of 8, 16, 32 and 64 bits per pixel are supported.

TODO: a visual example of an image with two bands, with different resolutions and different bit depths

Python API

The new-style Pythonic API for SKI access is available in the sk.tools.ski.handle module, which replaces the deprecated sk.tools.ski.image. Overview of available classes:

SkiHandle
Low-level SKI reader/writer. Mostly an implementation detail, externally used only in special cases.
GeoReferencedSki
Abstract base class for geo-referenced SKIs. Utility functions for operations like scaling or reprojecting work with this type.
ImagerySki(GeoReferencedSki)
Representation of geo-referenced imagery SKI with rich metadata like satellite name or cloud cover.
AnalysisSki(GeoReferencedSki)
Representation of an SKI with less metadata, limited to the fields crucial for geo-referencing. Typically used for analysis results.
MaskedBand
One band (channel) of an SKI with mask and convenience methods.
MaskedBandWithMeta(MaskedBand)
Extension of MaskedBand with geo-referencing metadata, with optional rich metadata.

SKI Handle Classes

Both SkiHandle and the GeoReferencedSki subclasses are essentially wrappers around their band_map (dict) attribute, which maps a band_id (str) to a MaskedBand or MaskedBandWithMeta.

Manipulating bands is therefore easy:

>>> # get IDs of all bands in ski:
>>> list(ski_handle.band_map.keys())
['red', 'green']

>>> # add a new "near-infrared" band built from the band_np and mask_np
>>> # NumPy arrays and a metadata object:
>>> ski_handle.band_map['near-infrared'] = \
...     MaskedBandWithMeta(band_np, mask_np, metadata)

>>> # remove "near-infrared" band:
>>> del ski_handle.band_map['near-infrared']

>>> # duplicate "red" band as "blue" (referencing same data):
>>> ski_handle.band_map['blue'] = ski_handle.band_map['red']

Loading from and saving to a path or file-like object is done using load() and save(). Legacy SKIs with non-conforming metadata can be loaded by breaking the load into multiple steps:

# load SKI which doesn't have valid CRS EPSG filled in:
raw_ski = SkiHandle.load('old.ski')
raw_ski.meta['crsEpsg'] = 12345
ski = ImagerySki.from_ski_handle(raw_ski)

# Load an SKI whose bands list their names in an unfortunate order; the
# correct name is always the second one:
def choose_second(names):
    return names[1]
ski = ImagerySki.load('legacy.ski', choose_band_id=choose_second)

GeoReferencedSki provides the convenience methods get_pil_like_data() and get_mask_intersection() for accessing multiple bands at once.

AnalysisSki provides the convenience method from_reference_band(), which is useful for creating analysis results from reference imagery or from another analysis result SKI or band.

Band Classes

Band classes MaskedBand and MaskedBandWithMeta are containers for NumPy band data and mask arrays. The mask is a bit array, so each pixel can carry several flags at once. For convenience, the classes also provide the computed boolean mask properties valid_mask and requested_mask, with setters (modifying these computed arrays in-line has no effect).

>>> data = np.array([[1, 2], [3, 4]])
>>> valid = np.array([[True, True], [False, False]])
>>> requested = np.array([[True, False], [False, True]])
>>> masked_band = MaskedBand.from_data_valid_requested(data, valid, requested)
>>> masked_band.data
array([[1, 2],
       [3, 4]])
>>> masked_band.mask
array([[3, 1],
       [0, 2]], dtype=uint8)
>>> masked_band.valid_mask
array([[ True,  True],
       [False, False]], dtype=bool)
>>> masked_band.requested_mask
array([[ True, False],
       [False,  True]], dtype=bool)

>>> # data, mask can be modified in-line:
>>> masked_band.data[0, 0] = 5
>>> masked_band.data
array([[5, 2],
       [3, 4]])
>>> masked_band.mask[0, 0] = 2
>>> masked_band.mask
array([[2, 1],
       [0, 2]], dtype=uint8)

>>> # Beware, changing computed boolean properties has no effect
>>> masked_band.valid_mask[0, 0] = True  # this is an error, no effect
>>> masked_band.valid_mask[0, 0]
False
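
One plausible reason why in-line changes to valid_mask are lost is that the property builds a fresh array from mask on every access. The sketch below illustrates that behaviour with a hypothetical SketchBand class that uses plain nested lists instead of NumPy arrays. It is an illustration only, not the actual MaskedBand implementation.

```python
VALID_BIT = 0b01      # bit 0: pixel is valid
REQUESTED_BIT = 0b10  # bit 1: pixel lies inside the requested extent


class SketchBand:
    """Hypothetical stand-in for MaskedBand, for illustration only."""

    def __init__(self, mask):
        self.mask = mask  # nested lists of small integers

    @property
    def valid_mask(self):
        # A new nested list is computed from ``mask`` on every access.
        return [[bool(v & VALID_BIT) for v in row] for row in self.mask]

    @valid_mask.setter
    def valid_mask(self, valid):
        # Assigning a whole array goes through the setter and updates ``mask``.
        self.mask = [
            [(m & ~VALID_BIT) | (VALID_BIT if v else 0) for m, v in zip(mrow, vrow)]
            for mrow, vrow in zip(self.mask, valid)
        ]


band = SketchBand([[2, 1], [0, 2]])

# Modifying the computed copy is silently lost.
band.valid_mask[0][0] = True
lost = band.valid_mask[0][0]

# Read, modify, and assign back so that the setter runs.
valid = band.valid_mask
valid[0][0] = True
band.valid_mask = valid
kept = band.valid_mask[0][0]
```

In this sketch the change only sticks when the whole array is assigned back through the setter, which matches the warning above.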

Scaling

Rescale an SKI, keeping geo-referencing metadata correct:

from sk.tools.image.scaling import scale_geo_ski, scale_geo_ski_to_shape

# create a new geo-referenced SKI in which each band has a resolution of
# approximately 3 meters
new_ski = scale_geo_ski(old_ski, target_resolution=3)

# or specify desired shape of band after scaling
new_ski = scale_geo_ski_to_shape(old_ski, target_shape=(256, 256))
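
The target resolution can only be met approximately because a band must have a whole number of rows and columns. The sketch below shows that arithmetic, assuming that resolution means the ground size of one pixel in meters. The helper scaled_shape() is hypothetical and not part of sk.tools, and the real scale_geo_ski() may round differently.

```python
def scaled_shape(shape, resolution, target_resolution):
    """Return the (rows, columns) shape after rescaling and the resolution achieved.

    Hypothetical helper: ``resolution`` and ``target_resolution`` are the
    ground size of one pixel in meters.
    """
    rows, cols = shape
    factor = resolution / target_resolution
    # Shapes must be whole numbers, hence the rounding.
    new_rows = max(1, round(rows * factor))
    new_cols = max(1, round(cols * factor))
    # Resolution actually achieved along the rows after rounding.
    achieved = rows * resolution / new_rows
    return (new_rows, new_cols), achieved


# A 100 x 100 band at 10 m per pixel, rescaled to roughly 3 m per pixel.
new_shape, achieved = scaled_shape((100, 100), resolution=10, target_resolution=3)
```

Here the band grows to 333 x 333 pixels, so the achieved resolution is slightly coarser than the requested 3 meters.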

Reprojecting

Reproject an SKI to a different projection or change its origin, correctly updating geo-referencing metadata along the way:

Note

reproject_geo_ski() is not yet implemented.

from sk.tools.image.reprojection import reproject_geo_ski

# reproject SKI to Web Mercator projection
new_ski = reproject_geo_ski(ski, 3857, target_origin, target_pixel_size,
                            (256, 256))

Binary On-Disk Representation

An SKI is a gzipped tar archive containing band files and the info and meta JSON files.

Bands

Bands are stored in separate files named xxxxx.skb, where xxxxx is the zero-padded index of the band (e.g. 00000.skb). The first two bytes of the file are an unsigned integer whose value maps to a data type according to the following table:

value dtype
8 uint8
9 int8
16 uint16
17 int16
32 uint32
33 int32
64 uint64
65 int64

The next eight bytes hold the number of columns followed by the number of rows, each as a four-byte integer in little-endian format. The rest of the file is the band data in row-major order.
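
Reading the header can be sketched with the standard struct module. The sketch assumes the layout shown by the hexadecimal example in this section: a two-byte type code followed by two four-byte integers (columns, then rows), all little endian and, as a further assumption, unsigned. The names DTYPE_BY_CODE and parse_band_header() are hypothetical and not part of the SKI library.

```python
import struct

# Type codes from the table above.
DTYPE_BY_CODE = {
    8: "uint8", 9: "int8",
    16: "uint16", 17: "int16",
    32: "uint32", 33: "int32",
    64: "uint64", 65: "int64",
}

# "<" = little endian, "H" = 2-byte unsigned, "I" = 4-byte unsigned (assumed).
HEADER = struct.Struct("<HII")


def parse_band_header(raw):
    """Return (dtype name, columns, rows, offset of the pixel data)."""
    code, cols, rows = HEADER.unpack_from(raw, 0)
    return DTYPE_BY_CODE[code], cols, rows, HEADER.size


# Header of a uint16 band with 3 columns and 2 rows.
header = bytes.fromhex("1000" "03000000" "02000000")
dtype, cols, rows, offset = parse_band_header(header)
```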

Pixel values can be reconstructed with the following formula:

\(p_{r, c} = (p_{r - 1, c} + b_{r, c}) \mod 2^k\),

where \(p_{r, c}\) is the pixel value at row r and column c, \(b_{r, c}\) is the value stored in the file at row r and column c, and k is the bit depth of the band (so \(2^k\) is the maximum value plus one). Row -1 is defined as a row full of 0s; the encoded values of the first row are therefore equal to the real pixel values.

When the binary is constructed, the inverse formula is used: \(b_{r, c} = (p_{r, c} - p_{r - 1, c}) \mod 2^k\).
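
The reconstruction formula and its inverse can be sketched in plain Python for unsigned bands. The function names are hypothetical, and rows are nested lists rather than NumPy arrays. This is a minimal illustration, not the library's implementation.

```python
def delta_encode(pixels, bit_depth):
    """Store each pixel as the difference from the pixel one row above."""
    modulus = 1 << bit_depth
    previous = [0] * len(pixels[0])  # the row above the first row is all zeros
    encoded = []
    for row in pixels:
        encoded.append([(p - q) % modulus for p, q in zip(row, previous)])
        previous = row
    return encoded


def delta_decode(stored, bit_depth):
    """Apply p[r][c] = (p[r-1][c] + b[r][c]) mod 2**k row by row."""
    modulus = 1 << bit_depth
    previous = [0] * len(stored[0])
    pixels = []
    for row in stored:
        previous = [(q + b) % modulus for q, b in zip(previous, row)]
        pixels.append(previous)
    return pixels


pixels = [[250, 10], [200, 20]]
stored = delta_encode(pixels, bit_depth=8)   # [[250, 10], [206, 10]]
restored = delta_decode(stored, bit_depth=8)
```

The first row is stored unchanged, while the second row holds differences taken modulo 256, so a drop from 250 to 200 is stored as 206.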

The following is a hexadecimal example of a band with 8 bits per pixel, one column and two rows. The value in the first row is 250 (FA). The value in the second row is 200, which is stored as the difference (200 - 250) mod 256 = 206 (CE).

08 00 01 00 00 00 02 00 00 00 FA CE

Info

Every SKI contains an info.json file, which holds UTF-8 encoded, JSON-serialized information about the SKI. It contains a list of band names (a band can have multiple names) and the SKI version.

The info file has this format:

{
    "bands": [
        {
            "names": ["{band-name}"]
        }
    ],
    "version": "{SKI version}"
}

Example of an RGB image:

{
    "bands": [
        {
            "names": ["r", "red"]
        },
        {
            "names": ["g", "green"]
        },
        {
            "names": ["b", "blue"]
        }
    ],
    "version": "7"
}
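
Because an SKI is a gzipped tar archive, info.json can be inspected with the standard tarfile and json modules. The sketch below builds a tiny in-memory archive that only mimics the layout of an SKI and then reads the band names back. It assumes that info.json sits at the root of the archive, and the helper add_member() is hypothetical.

```python
import io
import json
import tarfile


def add_member(archive, name, payload):
    """Add ``payload`` (bytes) to an open tar archive under ``name``."""
    member = tarfile.TarInfo(name)
    member.size = len(payload)
    archive.addfile(member, io.BytesIO(payload))


# Build a tiny in-memory archive that only mimics the layout of an SKI.
info = {"bands": [{"names": ["r", "red"]}, {"names": ["g", "green"]}], "version": "7"}
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    add_member(archive, "info.json", json.dumps(info).encode("utf-8"))

# Read info.json back and list the names of every band.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    loaded = json.loads(archive.extractfile("info.json").read().decode("utf-8"))

band_names = [band["names"] for band in loaded["bands"]]
```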

Meta

An optional file meta.json may be present in the SKI archive. This file contains arbitrary JSON data set by the creator of the SKI. The file is UTF-8 encoded JSON serialized data.

Auxiliary Files

An arbitrary set of auxiliary files may be present in the aux/ sub-folder of an SKI archive. This is a universal way to transfer additional data from SKI producers to consumers.

No assumptions are made about the content of these files; they can be JSON, XML, binary files, etc.

SKI Band Masks

Each band in an SKI has a corresponding data mask named __MASK__{band_name}__ (e.g. __MASK__red__). Mask values are 8-bit unsigned integers (uint8) in which each bit has a different meaning.

Bit 0b00000001

Value “1” corresponds to a valid pixel; value “0” indicates a blackfill, lost, suspect or corrupt pixel.

Bit 0b00000010

Value “1” corresponds to a pixel inside any area-based geometry from the requested extent.

Bit 0b00000100

Value “1” corresponds to a lost, suspect or otherwise corrupt pixel. Value “1” implies pixel invalidity (the least significant bit must be “0”).

This bit is available only in Planet imagery and is always set to “0” for other providers.
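
The bit meanings above can be decoded with bitwise operations. The sketch below works on a single mask value and also checks the rule that a corrupt pixel must not be marked as valid. The constant names and describe_mask_value() are hypothetical and not part of the SKI library.

```python
VALID = 0b00000001      # valid pixel
REQUESTED = 0b00000010  # inside the requested extent
CORRUPT = 0b00000100    # lost, suspect or otherwise corrupt pixel


def describe_mask_value(value):
    """Return the flags set in one uint8 mask value as a dict of booleans."""
    flags = {
        "valid": bool(value & VALID),
        "requested": bool(value & REQUESTED),
        "corrupt": bool(value & CORRUPT),
    }
    # A corrupt pixel must not be marked as valid at the same time.
    if flags["corrupt"] and flags["valid"]:
        raise ValueError(f"inconsistent mask value: {value:#010b}")
    return flags


usable = describe_mask_value(0b011)   # valid and requested
damaged = describe_mask_value(0b110)  # requested but corrupt
```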