SpaceKnow Image (SKI)

SKI is SpaceKnow’s proprietary binary file format for aerial and satellite images. Most visual data inside the SK platform is handled in this file format.

SKI supports multiple bands (i.e. channels). For example, each band can represent a single part of the electromagnetic spectrum (e.g. near-infrared). A band is a 2D matrix of values. Each band in a single SKI can have a different resolution (number of rows and columns) and bit depth. 8, 16, 32 and 64 bits per pixel per band are supported.

Figure: an SKI with two bands of different bit depth (_images/2_bands_different_bit_depth.svg)

Python API

Pythonic API for SKI access is available in the sk.tools.ski.handle module, which replaces the deprecated

SkiHandle

Low-level SKI reader/writer. Mostly an implementation detail, externally used only in special cases.

Ski

Abstract base class for geo-referenced SKIs. Utility functions for operations like scaling or reprojecting work with this type.

ImagerySki(GeoReferencedSki)

Representation of geo-referenced imagery SKI with rich metadata like satellite name or cloud cover.

AnalysisSki(GeoReferencedSki)

Representation of an SKI with less metadata, limited to the fields crucial for geo-referencing. Typically used for analysis results.

MaskedBand

One band (channel) of an SKI with mask and convenience methods.

MaskedBandWithMeta(MaskedBand)

Extension of MaskedBand with geo-referencing metadata, with optional rich metadata.

SKI Handle Classes

Both SkiHandle and Ski (and its subclasses) are essentially wrappers around their band_map (dict) attribute. band_map maps a band_id (str) to a MaskedBand or MaskedBandWithMeta.

Manipulating bands is therefore easy:

>>> from sk.tools.ski.handle import SkiHandle, MaskedBand
>>> import numpy as np
>>> ski_handle = SkiHandle.load('doc/files/old.ski')
>>> # get IDs of all bands in ski:
>>> list(ski_handle.band_map.keys())
['blue', 'green', 'near-infrared', 'red']

>>> # add new "near-ir2" band created out of Numpy arrays:
>>> band_np = np.zeros(ski_handle.band_map['blue'].data.shape)
>>> mask_np = np.zeros(ski_handle.band_map['blue'].data.shape, dtype=np.uint8)
>>> ski_handle.band_map['near-ir2'] = MaskedBand(band_np, mask_np)
>>> list(ski_handle.band_map.keys())
['blue', 'green', 'near-infrared', 'red', 'near-ir2']

>>> # remove "near-ir2" band:
>>> del ski_handle.band_map['near-ir2']
>>> list(ski_handle.band_map.keys())
['blue', 'green', 'near-infrared', 'red']

>>> # duplicate "red" band as "blue" (referencing same data):
>>> ski_handle.band_map['blue'] = ski_handle.band_map['red']

Loading from and saving to a path or file-like object is done using load() and save(). Legacy SKIs with non-conforming metadata can be loaded by breaking the process into multiple steps:

>>> from sk.tools.ski.handle import SkiHandle, ImagerySki
>>> # load SKI which doesn't have valid CRS EPSG filled in:
>>> raw_ski = SkiHandle.load('doc/files/old.ski')
>>> raw_ski.meta['crsEpsg'] = 12345
>>> ski = ImagerySki.from_ski_handle(raw_ski)

Ski provides the convenience methods get_pil_like_data() and get_mask_intersection() for multiple-band access.

AnalysisSki provides the convenience method from_reference_band(), which is useful for creating analysis results from reference imagery or from another analysis result SKI/band.

Band Classes

Band classes MaskedBand and MaskedBandWithMeta are containers for NumPy band data and mask arrays. The mask is a bit array with multiple possible values for each pixel. The classes also provide the computed boolean mask properties valid_mask and requested_mask as a convenience. Both have setters; modifying the computed arrays in place has no effect on the band.

>>> data = np.array([[1, 2], [3, 4]])
>>> valid = np.array([[True, True], [False, False]])
>>> requested = np.array([[True, False], [False, True]])
>>> masked_band = MaskedBand.from_data_valid_requested(data, valid, requested)
>>> masked_band.data
array([[1, 2],
       [3, 4]])
>>> masked_band.mask
array([[3, 1],
       [0, 2]], dtype=uint8)
>>> masked_band.valid_mask
array([[ True,  True],
       [False, False]])
>>> masked_band.requested_mask
array([[ True, False],
       [False,  True]])

>>> # data, mask can be modified in-line:
>>> masked_band.data[0, 0] = 5
>>> masked_band.data
array([[5, 2],
       [3, 4]])
>>> masked_band.mask[0, 0] = 2
>>> masked_band.mask
array([[2, 1],
       [0, 2]], dtype=uint8)

>>> # Beware, changing computed boolean properties has no effect
>>> masked_band.valid_mask[0, 0] = True  # this is an error, no effect
>>> masked_band.valid_mask[0, 0]
False
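The behaviour above follows from valid_mask being computed from mask on every access. The following is a minimal sketch of that pattern, not the sk.tools implementation: TinyBand and the bit constants are hypothetical names, and the bit values (1 for valid, 2 for requested) are read off the example above.

```python
import numpy as np

VALID_BIT = 0b01       # mask value 1 in the example above
REQUESTED_BIT = 0b10   # mask value 2 in the example above


class TinyBand:
    """Hypothetical stand-in for MaskedBand, for illustration only."""

    def __init__(self, mask):
        self.mask = np.asarray(mask, dtype=np.uint8)

    @property
    def valid_mask(self):
        # A fresh boolean array is built on every access, so writing into
        # the returned array never reaches self.mask.
        return (self.mask & VALID_BIT) != 0

    @valid_mask.setter
    def valid_mask(self, valid):
        valid = np.asarray(valid, dtype=bool)
        # Clear the validity bit, then set it where `valid` is True.
        cleared = self.mask & np.uint8(~VALID_BIT & 0xFF)
        self.mask = cleared | (valid.astype(np.uint8) * np.uint8(VALID_BIT))


band = TinyBand([[2, 1], [0, 2]])

band.valid_mask[0, 0] = True    # modifies a temporary array only
unchanged = band.mask.copy()

new_valid = band.valid_mask
new_valid[0, 0] = True
band.valid_mask = new_valid     # assignment goes through the setter
print(unchanged[0, 0], band.mask[0, 0])  # prints: 2 3
```

Assigning a whole array to the property is what updates the underlying bit mask; item assignment on the returned array does not.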

Scaling

Rescale an SKI, keeping geo-referencing metadata correct:

from sk.tools.image.scaling import scale_ski, scale_ski_to_shape

# create a new geo-referenced SKI in which each band has a resolution of
# approximately 3 meters; the resulting band shapes might differ
new_ski = scale_ski(old_ski, target_resolution=3)

# or specify desired shape of band after scaling
new_ski = scale_ski_to_shape(old_ski, target_shape=(512, 512))

Reprojecting

Reproject SKI to a different projection or change its origin, correctly updating geo-referencing metadata along the way:

from sk.tools.image.reprojection import reproject_ski

# reproject SKI to Web Mercator projection
new_ski = reproject_ski(ski, 3857, target_origin, target_pixel_size,
                        (512, 512), pad_width)

Note that this function can also be used to crop an SKI whose bands do not have equal shape.

Cropping

The following example illustrates how to crop an SKI given the row, column, height and width of the desired cropped SKI:

from sk.tools.image.cropping import crop_ski

# crop SKI with shape (256, 256) starting at 512 col and row
new_ski = crop_ski(ski, 512, 512, 256, 256)

Note that the input SKI must have bands with the same shape; see Scaling.
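At the array level, a crop of this kind amounts to taking the same window from a band's data and mask. The sketch below uses plain NumPy arrays; crop_array is a hypothetical helper whose arguments follow the row, column, height, width order described above. It says nothing about how crop_ski updates geo-referencing metadata.

```python
import numpy as np


def crop_array(array, row, col, height, width):
    """Return the height x width window whose top-left pixel is (row, col)."""
    rows, cols = array.shape[:2]
    if row < 0 or col < 0 or row + height > rows or col + width > cols:
        raise ValueError("crop window exceeds array bounds")
    # .copy() detaches the result from the source array.
    return array[row:row + height, col:col + width].copy()


data = np.arange(36).reshape(6, 6)
mask = np.ones((6, 6), dtype=np.uint8)

# The same window is applied to data and mask so they stay aligned.
cropped_data = crop_array(data, 2, 1, 3, 4)
cropped_mask = crop_array(mask, 2, 1, 3, 4)
print(cropped_data.shape, cropped_data[0, 0])  # prints: (3, 4) 13
```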

Binary On-Disk Representation

An SKI is a gzipped tar archive containing band files plus info and meta JSON files.
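Because the container is a gzipped tar, its members can be listed with Python's standard tarfile module. The sketch below builds a small stand-in archive in memory and reads the member names back. The member contents are placeholders, and placing the files at the top level of the archive is an assumption.

```python
import io
import tarfile


def add_member(archive, name, payload):
    """Add an in-memory file to an open tar archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))


# Build a small gzipped tar in memory; the payloads are placeholders.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    add_member(archive, "00000.skb", b"\x08\x00")
    add_member(archive, "info.json", b"{}")
    add_member(archive, "meta.json", b"{}")

# Read it back the way one might inspect a real SKI file.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    member_names = archive.getnames()
print(member_names)  # prints: ['00000.skb', 'info.json', 'meta.json']
```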

Bands

Bands are stored in separate files named xxxxx.skb, where xxxxx is the zero-padded index of the band (e.g. 00000.skb). The first two bytes of the file are an unsigned integer whose value maps to a data type according to the following table:

=====  ===============
value  dtype
=====  ===============
2      binarized
8      uint8
9      int8
16     uint16
17     int16
32     uint32
33     int32
34     float32
64     uint64
65     int64
66     float64
67     stretched float
=====  ===============

The binarized dtype represents a uint8 np.dtype where all values are either 0 or 1. A stretched float is saved as a uint16 np.dtype, but when loaded it is represented by a float32 np.dtype.

The float64 dtype is now deprecated, but legacy SKIs that use it can still be loaded.

The next eight bytes are two little-endian floats, four bytes each. They represent the value range, which is used only for stretched float; for all other dtypes it is set to (0, 0). For stretched float, the value range is the interval that the stored uint16 values should be stretched into. For most current use cases it is set to (0, 1).
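One plausible reading of "stretching" is a linear map from the full uint16 range [0, 65535] onto the value range. The sketch below implements that reading only; the linear formula and the unstretch name are assumptions, not the documented behaviour of the SK loader.

```python
import numpy as np


def unstretch(raw, value_range):
    """Map uint16 values linearly from [0, 65535] onto [low, high] (assumed)."""
    low, high = value_range
    # Fraction of the full uint16 range, computed in float32.
    fraction = raw.astype(np.float32) / np.float32(np.iinfo(np.uint16).max)
    return (np.float32(low) + fraction * np.float32(high - low)).astype(np.float32)


raw = np.array([[0, 65535]], dtype=np.uint16)
values = unstretch(raw, (0.0, 1.0))
print(values.dtype, values.tolist())  # prints: float32 [[0.0, 1.0]]
```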

The next eight bytes hold the number of columns followed by the number of rows, each as a four-byte little-endian integer. The rest of the file is the band data in row-major order.

Pixel values can be reconstructed with the following formula:

\(p_{r, c} = (p_{r - 1, c} + b_{r, c}) \mod 2^k\),

where \(p_{r, c}\) is the pixel value at the r-th row and c-th column, \(b_{r, c}\) is the value from the stored matrix at the r-th row and c-th column, and k is the bit depth of the band (so \(2^k\) is the maximum value + 1). The row with index -1 is defined as a row full of 0s; therefore, the encoded values of the first row are equal to the real pixel values.

When the binary is constructed, the inverse formula is used.

This method of storing pixel values is used only for bands whose underlying data type is an integer, including stretched float; the only exception is the binarized dtype. For bands with a floating-point underlying data type and for the binarized dtype, pixel values are stored directly.
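The recurrence and its inverse can be sketched with NumPy as follows. This is an illustration for unsigned bands with a bit depth of at most 32, not the SK reader or writer; delta_decode and delta_encode are hypothetical names, and the inverse \(b_{r, c} = (p_{r, c} - p_{r - 1, c}) \mod 2^k\) is derived from the formula above.

```python
import numpy as np


def delta_decode(stored, bit_depth):
    """Apply p[r, c] = (p[r-1, c] + b[r, c]) mod 2**k down each column."""
    modulus = 1 << bit_depth
    # A running sum over rows, reduced mod 2**k, is the recurrence unrolled.
    totals = np.cumsum(stored.astype(np.uint64), axis=0)
    return (totals % np.uint64(modulus)).astype(stored.dtype)


def delta_encode(pixels, bit_depth):
    """Inverse step: b[r, c] = (p[r, c] - p[r-1, c]) mod 2**k."""
    modulus = 1 << bit_depth
    wide = pixels.astype(np.int64)
    previous = np.zeros_like(wide)
    previous[1:] = wide[:-1]  # the row above; zeros for the first row
    return ((wide - previous) % modulus).astype(pixels.dtype)


pixels = np.array([[10, 250], [20, 5]], dtype=np.uint8)
stored = delta_encode(pixels, bit_depth=8)
decoded = delta_decode(stored, bit_depth=8)
print(stored.tolist())   # prints: [[10, 250], [10, 11]]
print(decoded.tolist())  # prints: [[10, 250], [20, 5]]
```

The first stored row equals the first pixel row, and each later row holds the wrapped difference from the row above.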

The following is a hexadecimal example of a band containing uint8 pixels, with value range (0, 0), one column and two rows. Value in the first row is 250 and 200 in the second.

08 00 00 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 FA C8
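The header of the example above can be unpacked with Python's struct module. The layout string is an assumption inferred from the byte counts in the example: a little-endian unsigned 16-bit dtype code, two 32-bit floats for the value range, then two unsigned 32-bit integers for columns and rows. The payload is left as raw stored bytes.

```python
import struct

# Assumed layout: uint16 dtype code, 2 x float32 range, 2 x uint32 shape.
HEADER_FORMAT = "<H2f2I"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

raw = bytes.fromhex(
    "08 00 00 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 FA C8"
)
dtype_code, range_low, range_high, columns, rows = struct.unpack(
    HEADER_FORMAT, raw[:HEADER_SIZE]
)
payload = raw[HEADER_SIZE:]  # stored band bytes, not decoded here
print(dtype_code, (range_low, range_high), columns, rows, payload.hex())
# prints: 8 (0.0, 0.0) 1 2 fac8
```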

Info

Every SKI contains an info.json file, which is UTF-8 encoded, JSON-serialized data with information about the SKI. It contains a list of band names (a band can have multiple names), the SKI version and the SKI type (imagery, analysis).

The info file has this format:

{
    "bands": [
        {
            "names": ["{band-name}"]
        }
    ],
    "version": "{SKI version}",
    "skiType": "{type of SKI}"
}

Example of an RGB image:

{
    "bands": [
        {
            "names": ["red"]
        },
        {
            "names": ["green"]
        },
        {
            "names": ["blue"]
        }
    ],
    "version": "200",
    "skiType": "imagery"
}
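An info document of this shape can be read with the standard json module. The sketch below maps each band name to a band file name. It assumes that the i-th entry of "bands" describes band file number i, which the format description suggests but does not state outright.

```python
import json

info_text = """
{
    "bands": [
        {"names": ["red"]},
        {"names": ["green"]},
        {"names": ["blue"]}
    ],
    "version": "200",
    "skiType": "imagery"
}
"""

info = json.loads(info_text)

# Assumption: position in "bands" equals the index used in the .skb file name.
band_files = {
    name: f"{index:05d}.skb"
    for index, band in enumerate(info["bands"])
    for name in band["names"]
}
print(info["skiType"], band_files)
# prints: imagery {'red': '00000.skb', 'green': '00001.skb', 'blue': '00002.skb'}
```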

Meta

An optional file meta.json may be present in the SKI archive. This file contains arbitrary JSON data set by the creator of the SKI. The file is UTF-8 encoded JSON serialized data.

The meta file always contains scene metadata in SKIs produced by Ragnar API. A geo-referencing subset of the metadata (i.e. EPSG code, pixel size and CRS origin) is usually present in algorithm output (i.e. mask) SKIs.

Auxiliary Files

An arbitrary set of auxiliary files may be present in the aux/ sub-folder of an SKI archive. This is a universal way to transfer additional data from SKI producers to consumers.

No assumptions are made about the content of these files; they can be JSON, XML, binary files, etc.

SKI Band Masks

Each band in an SKI has a corresponding data mask stored in a separate file named __MASK__{band_name}__ (e.g. __MASK__red__). The first two bytes of the file are an unsigned integer of value 3, which states that the file is a mask. The next eight bytes are two little-endian floats, four bytes each; they represent the value range, which is set to (0, 0). The next eight bytes hold the number of columns followed by the number of rows, each as a four-byte little-endian integer. The rest of the file is row-major ordered 8-bit unsigned integers (uint8) in which each bit has a different meaning.

Bit 0b00000001

Value “1” corresponds to a valid pixel; value “0” indicates a blackfill, lost, suspect or corrupt pixel. In the context of algorithm results, value “0” is set for pixels where the algorithm couldn’t perform adequately.

Bit 0b00000010

Value “1” corresponds to a pixel inside any area-based geometry from the requested extent.

Bit 0b00000100

Value “1” corresponds to a lost, suspect or otherwise corrupt pixel. Value “1” implies pixel invalidity (least significant bit must be “0”).

This bit is available only in Planet imagery and is always set to “0” for other providers.
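The three bits can be separated with bitwise operations on the uint8 mask array. The sketch below uses plain NumPy. The constant names are hypothetical, and the sample mask is made up for illustration.

```python
import numpy as np

VALID_BIT = 0b00000001
REQUESTED_BIT = 0b00000010
CORRUPT_BIT = 0b00000100

# Sample mask: 3 = valid + requested, 1 = valid, 0 = none, 6 = requested + corrupt.
mask = np.array([[3, 1], [0, 6]], dtype=np.uint8)

valid = (mask & VALID_BIT) != 0
requested = (mask & REQUESTED_BIT) != 0
corrupt = (mask & CORRUPT_BIT) != 0

# A corrupt pixel must not also be marked valid.
consistent = not np.any(corrupt & valid)

# Pixels that are both valid and inside the requested extent.
usable = valid & requested
print(consistent, usable.tolist())  # prints: True [[True, False], [False, False]]
```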