SpaceKnow Image (SKI)
SKI is SpaceKnow’s proprietary binary file format for aerial and satellite images. Most visual data inside the SK platform is handled in this file format.
SKI supports multiple bands (i.e. channels). For example, each band can represent a single part of the electromagnetic spectrum (e.g. near-infrared). A band is a 2D matrix of values. Each band in a single SKI can have a different resolution (number of rows and columns) and bit depth; 8, 16, 32 and 64 bits per pixel per band are supported.
Python API
Pythonic API for SKI access is available in the
sk.tools.ski.handle
module, which replaces the deprecated
- SkiHandle
Low-level SKI reader/writer. Mostly an implementation detail, externally used only in special cases.
- Ski
Abstract base class for geo-referenced SKIs. Utility functions for operations like scaling or reprojecting work with this type.
- ImagerySki(GeoReferencedSki)
Representation of geo-referenced imagery SKI with rich metadata like satellite name or cloud cover.
- AnalysisSki(GeoReferencedSki)
Representation of an SKI with less metadata, limited to what is crucial for geo-referencing. Typically used for analysis results.
- MaskedBand
One band (channel) of an SKI with mask and convenience methods.
- MaskedBandWithMeta(MaskedBand)
Extension of MaskedBand with geo-referencing metadata, with optional rich metadata.
SKI Handle Classes
Both SkiHandle and Ski (subclasses) are essentially wrappers around their band_map (dict) attribute. band_map is a band_id (str) => MaskedBand or MaskedBandWithMeta mapping.
Manipulating bands is therefore easy:
>>> from sk.tools.ski.handle import SkiHandle, MaskedBand
>>> import numpy as np
>>> ski_handle = SkiHandle.load('doc/files/old.ski')
>>> # get IDs of all bands in ski:
>>> list(ski_handle.band_map.keys())
['blue', 'green', 'near-infrared', 'red']
>>> # add new "near-ir2" band created out of Numpy arrays:
>>> band_np = np.zeros(ski_handle.band_map['blue'].data.shape)
>>> mask_np = np.zeros(ski_handle.band_map['blue'].data.shape, dtype=np.uint8)
>>> ski_handle.band_map['near-ir2'] = MaskedBand(band_np, mask_np)
>>> list(ski_handle.band_map.keys())
['blue', 'green', 'near-infrared', 'red', 'near-ir2']
>>> # remove "near-ir2" band:
>>> del ski_handle.band_map['near-ir2']
>>> list(ski_handle.band_map.keys())
['blue', 'green', 'near-infrared', 'red']
>>> # duplicate "red" band as "blue" (referencing same data):
>>> ski_handle.band_map['blue'] = ski_handle.band_map['red']
Loading from and saving to a path or file-like object is done using load() and save(). It is possible to load legacy SKIs with non-conforming metadata by breaking the loading into multiple steps:
>>> from sk.tools.ski.handle import SkiHandle, ImagerySki
>>> # load SKI which doesn't have valid CRS EPSG filled in:
>>> raw_ski = SkiHandle.load('doc/files/old.ski')
>>> raw_ski.meta['crsEpsg'] = 12345
>>> ski = ImagerySki.from_ski_handle(raw_ski)
Ski provides the convenience multiple-band access methods get_pil_like_data() and get_mask_intersection().
AnalysisSki provides the convenience method from_reference_band(), which is useful for creating analysis results out of reference imagery or another analysis result SKI/band.
Band Classes
Band classes MaskedBand and MaskedBandWithMeta are containers for NumPy band data and mask arrays. The mask is a bit array with multiple possible values for each pixel. The classes, however, provide the convenience computed boolean mask properties valid_mask and requested_mask, with setters (modifying these computed arrays in place has no effect on its own).
>>> data = np.array([[1, 2], [3, 4]])
>>> valid = np.array([[True, True], [False, False]])
>>> requested = np.array([[True, False], [False, True]])
>>> masked_band = MaskedBand.from_data_valid_requested(data, valid, requested)
>>> masked_band.data
array([[1, 2],
       [3, 4]])
>>> masked_band.mask
array([[3, 1],
       [0, 2]], dtype=uint8)
>>> masked_band.valid_mask
array([[ True,  True],
       [False, False]])
>>> masked_band.requested_mask
array([[ True, False],
       [False,  True]])
>>> # data and mask can be modified in place:
>>> masked_band.data[0, 0] = 5
>>> masked_band.data
array([[5, 2],
       [3, 4]])
>>> masked_band.mask[0, 0] = 2
>>> masked_band.mask
array([[2, 1],
       [0, 2]], dtype=uint8)
>>> # Beware: changing the computed boolean properties in place has no effect
>>> masked_band.valid_mask[0, 0] = True  # this is a mistake, it has no effect
>>> masked_band.valid_mask[0, 0]
False
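Why in-place edits of valid_mask are lost can be illustrated with a small stand-alone class. This is only a sketch: TinyBand is a hypothetical stand-in, not part of sk.tools.ski.handle, and the real MaskedBand may be implemented differently. It assumes only that bit 0 of the mask marks valid pixels.

```python
import numpy as np


class TinyBand:
    """Hypothetical stand-in for MaskedBand that stores only a bit mask."""

    def __init__(self, mask):
        self.mask = mask

    @property
    def valid_mask(self):
        # A fresh boolean array is computed from the bit mask on every access.
        return (self.mask & 1) != 0

    @valid_mask.setter
    def valid_mask(self, valid):
        # Assigning a whole array rewrites bit 0 of the stored mask.
        self.mask = (self.mask & ~np.uint8(1)) | valid.astype(np.uint8)


band = TinyBand(np.array([[2, 1]], dtype=np.uint8))

# This writes into a temporary array that is discarded immediately.
band.valid_mask[0, 0] = True
still_invalid = bool(band.valid_mask[0, 0])

# Going through the setter updates the stored mask.
updated = band.valid_mask
updated[0, 0] = True
band.valid_mask = updated
```

After the last assignment the stored mask of this stand-in is [[3, 1]], whereas the in-place write left it untouched.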
Scaling
Rescale an SKI, keeping geo-referencing metadata correct:
from sk.tools.image.scaling import scale_ski, scale_ski_to_shape
# create a new geo-referenced SKI in which each band has a resolution of
# approximately 3 meters; the resulting band shapes might differ
new_ski = scale_ski(old_ski, target_resolution=3)
# or specify the desired shape of each band after scaling
new_ski = scale_ski_to_shape(old_ski, target_shape=(512, 512))
Reprojecting
Reproject SKI to a different projection or change its origin, correctly updating geo-referencing metadata along the way:
from sk.tools.image.reprojection import reproject_ski
# reproject SKI to Web Mercator projection (EPSG:3857)
new_ski = reproject_ski(ski, 3857, target_origin, target_pixel_size,
                        (512, 512), pad_width)
Note that this function can also be used to crop an SKI whose bands do not have equal shapes.
Cropping
The following example illustrates how to crop an SKI given the row, column, height and width of the desired cropped SKI:
from sk.tools.image.cropping import crop_ski
# crop a region of shape (256, 256) starting at row 512 and column 512
new_ski = crop_ski(ski, 512, 512, 256, 256)
Note that the input SKI needs to have bands with the same shape; see Scaling.
Binary On-Disk Representation
An SKI is a gzipped tar archive containing band files plus info and meta JSON files.
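Because the container is a gzipped tar archive, its members can be inspected with Python's standard tarfile module. The sketch below builds a tiny in-memory archive and lists its members. The helper names build_dummy_ski and list_members are hypothetical, and the member payloads are placeholders rather than valid band or JSON content.

```python
import io
import tarfile


def build_dummy_ski(members):
    """Pack a {name: bytes} mapping into an in-memory gzipped tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def list_members(ski_bytes):
    """Return the names of all members of a gzipped tar archive."""
    with tarfile.open(fileobj=io.BytesIO(ski_bytes), mode="r:gz") as archive:
        return archive.getnames()


dummy = build_dummy_ski({
    "00000.skb": b"",    # placeholder, not a valid band file
    "info.json": b"{}",  # placeholder
    "meta.json": b"{}",  # placeholder
})
member_names = list_members(dummy)
```

The same list_members call works on the bytes of a file read from disk, since tarfile only needs a file-like object.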
Bands
Bands are stored in separate files named xxxxx.skb, where xxxxx is the zero-padded index of the band (e.g. 00000.skb). The first two bytes of the file are an unsigned integer whose value maps to a data type according to the following table:
value | dtype |
---|---|
2 | binarized |
8 | uint8 |
9 | int8 |
16 | uint16 |
17 | int16 |
32 | uint32 |
33 | int32 |
34 | float32 |
64 | uint64 |
65 | int64 |
66 | float64 |
67 | stretched float |
The binarized dtype represents a uint8 np.dtype where all values are either 0 or 1. stretched float is saved as a uint16 np.dtype, but when loaded it is represented by a float32 np.dtype.
The float64 dtype is now deprecated, but legacy SKIs that use it can still be loaded for backwards compatibility.
The next eight bytes are two little-endian floats. They represent the value range, which is used only for stretched float; for all other dtypes it is set to (0, 0). For stretched float, the value range represents what the uint16 array should stretch into. For most current use cases it is set to (0, 1).
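How the stored uint16 values are stretched into the value range is not spelled out here. One plausible reading is a linear mapping of the full uint16 range onto (min, max), and the sketch below assumes exactly that. Both the linear formula and the function name stretch_uint16 are assumptions, not taken from the SKI implementation.

```python
import numpy as np


def stretch_uint16(stored, value_range):
    """Linearly map uint16 values onto value_range (assumed behaviour)."""
    low, high = value_range
    # 0 maps to low and 65535 maps to high under this assumption.
    fraction = stored.astype(np.float64) / 65535.0
    return (low + fraction * (high - low)).astype(np.float32)


stored = np.array([[0, 65535], [13107, 52428]], dtype=np.uint16)
stretched = stretch_uint16(stored, (0.0, 1.0))
```

With the common value range (0, 1), the four stored values above map to approximately 0.0, 1.0, 0.2 and 0.8 as float32.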
The number of columns and the number of rows follow, each stored as a 4-byte little-endian integer. The rest of the file is the row-major ordered data of the band.
Pixel values can be reconstructed with the following formula:
\(p_{r, c} = (p_{r - 1, c} + b_{r, c}) \mod 2^k\),
where \(p_{r, c}\) is the pixel value at the rth row and cth column, \(b_{r, c}\) is the value from the stored matrix at the rth row and cth column, and k is the bit depth of the band (so \(2^k\) is the maximum value + 1). The row with index -1 is defined as a row full of 0s; therefore, the encoded values of the first row are equal to the real pixel values.
When the binary is constructed, the inverse formula is used: \(b_{r, c} = (p_{r, c} - p_{r - 1, c}) \mod 2^k\).
This method of storing pixel values is used only for bands whose underlying data type is an integer, including stretched float; the only exception is the binarized dtype. For bands with a floating-point underlying data type and for the binarized dtype, pixel values are stored directly.
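The row-difference scheme can be sketched with NumPy. Encoding subtracts the previous row modulo \(2^k\), and decoding is a cumulative sum down the rows modulo \(2^k\). The function names delta_encode and delta_decode are hypothetical, and this sketch covers only unsigned bands of up to 32 bits so that the intermediate int64 arithmetic cannot overflow.

```python
import numpy as np


def delta_encode(pixels, bit_depth):
    """Store each row as its difference from the previous row, mod 2**bit_depth."""
    modulus = 1 << bit_depth
    wide = pixels.astype(np.int64)
    # Row -1 is defined as all zeros, so the first row is stored unchanged.
    zero_row = np.zeros((1, wide.shape[1]), dtype=np.int64)
    previous = np.vstack([zero_row, wide[:-1]])
    return ((wide - previous) % modulus).astype(pixels.dtype)


def delta_decode(stored, bit_depth):
    """Invert delta_encode with a cumulative sum down the rows, mod 2**bit_depth."""
    modulus = 1 << bit_depth
    totals = np.cumsum(stored.astype(np.int64), axis=0)
    return (totals % modulus).astype(stored.dtype)


pixels = np.array([[250, 10], [200, 20], [5, 20]], dtype=np.uint8)
encoded = delta_encode(pixels, bit_depth=8)
decoded = delta_decode(encoded, bit_depth=8)
```

For this input the encoded matrix is [[250, 10], [206, 10], [61, 0]], and decoding it returns the original pixels.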
The following is a hexadecimal example of a band containing uint8 pixels, with value range (0, 0), one column and two rows. The value in the first row is 250 and the value in the second row is 200.
08 00 00 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 FA C8
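The header of the example can be split into its fields with Python's standard struct module. The field widths used here are inferred from the example bytes rather than from the SKI implementation: a little-endian uint16 dtype code, two float32 range bounds, and two unsigned 32-bit integers for columns and rows. The name parse_band_header is hypothetical, and the payload is returned exactly as stored, without any row-difference decoding.

```python
import struct

# Assumed layout: uint16 dtype code, two float32 range bounds, then two
# unsigned 32-bit integers (columns, rows), all little-endian, no padding.
HEADER = struct.Struct("<HffII")


def parse_band_header(raw):
    """Split raw band bytes into header fields and the remaining payload."""
    dtype_code, range_low, range_high, columns, rows = HEADER.unpack_from(raw)
    return {
        "dtype_code": dtype_code,
        "value_range": (range_low, range_high),
        "columns": columns,
        "rows": rows,
        "payload": raw[HEADER.size:],
    }


example = bytes.fromhex(
    "08 00 00 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 FA C8"
)
header = parse_band_header(example)
```

For the example bytes this yields dtype code 8 (uint8), value range (0.0, 0.0), one column, two rows and a two-byte payload.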
Info
Every SKI contains an info.json file, which is UTF-8 encoded, JSON-serialized data with information about the SKI. It contains a list of band names (a band can have multiple names), the SKI version and the SKI type (imagery, analysis).
The info file has this format:
{
    "bands": [
        {
            "names": ["{band-name}"]
        }
    ],
    "version": "{SKI version}",
    "skiType": "{type of SKI}"
}
Example of an RGB image:
{
    "bands": [
        {
            "names": ["red"]
        },
        {
            "names": ["green"]
        },
        {
            "names": ["blue"]
        }
    ],
    "version": "200",
    "skiType": "imagery"
}
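An info file like the RGB example can be read with Python's standard json module. The sketch below also maps each band name to a band file name. That mapping rests on an assumption not confirmed by this document: that the index in xxxxx.skb equals the zero-based position of the band in the "bands" list. The helper name band_files_by_name is hypothetical.

```python
import json

info_text = """
{
    "bands": [
        {"names": ["red"]},
        {"names": ["green"]},
        {"names": ["blue"]}
    ],
    "version": "200",
    "skiType": "imagery"
}
"""


def band_files_by_name(info):
    """Map every band name to the band file assumed to hold it."""
    mapping = {}
    for index, band in enumerate(info["bands"]):
        # Assumption: file index == position of the band in the list.
        for name in band["names"]:
            mapping[name] = f"{index:05d}.skb"
    return mapping


info = json.loads(info_text)
band_files = band_files_by_name(info)
```

Because a band can have multiple names, several keys of the resulting mapping may point at the same file.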
Meta
An optional file meta.json may be present in the SKI archive. This file contains arbitrary JSON data set by the creator of the SKI. The file is UTF-8 encoded, JSON-serialized data.
Meta always contains scene metadata in SKIs produced by the Ragnar API. A geo-referencing subset of the metadata (i.e. EPSG code, pixel size and CRS origin) is usually present in algorithm output (e.g. mask) SKIs.
Auxiliary Files
An arbitrary set of auxiliary files may be present in the aux/ sub-folder of an SKI archive. This is a universal way to transfer additional data from SKI producers to consumers.
No assumptions are made about the content of these files; it can be e.g. JSON, XML or a binary file.
SKI Band Masks
Each band in an SKI has a corresponding data mask stored in a separate file named __MASK__{band_name}__ (e.g. __MASK__red__). The first two bytes of the file are an unsigned integer of value 3, which states that the file is a mask. The next eight bytes are two little-endian floats. They represent the value range, which is set to (0, 0).
The number of columns and the number of rows follow, each stored as a 4-byte little-endian integer. The rest of the file is row-major ordered 8-bit unsigned integers (uint8), where each bit has a different meaning.
Bit 0b00000001
Value “1” corresponds to a valid pixel; value “0” indicates a blackfill, lost, suspect or corrupt pixel. In the context of algorithmic results, value “0” is set for pixels where the algorithm couldn’t perform adequately.
Bit 0b00000010
Value “1” corresponds to a pixel inside any area-based geometry from the requested extent.
Bit 0b00000100
Value “1” corresponds to a lost, suspect or otherwise corrupt pixel. Value “1” implies pixel invalidity (least significant bit must be “0”).
This bit is available only in Planet imagery and is always set to “0” for other providers.
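The three bits described above can be unpacked from a uint8 mask with NumPy bitwise operations. This is a minimal sketch: the constant and function names are hypothetical and not part of the sk.tools API. The sample mask reuses the values from the MaskedBand example in the Band Classes section.

```python
import numpy as np

VALID_BIT = 0b00000001      # pixel is valid
REQUESTED_BIT = 0b00000010  # pixel lies inside the requested extent
CORRUPT_BIT = 0b00000100    # pixel is lost, suspect or otherwise corrupt


def split_mask(mask):
    """Return boolean (valid, requested, corrupt) arrays from a uint8 bit mask."""
    return (
        (mask & VALID_BIT) != 0,
        (mask & REQUESTED_BIT) != 0,
        (mask & CORRUPT_BIT) != 0,
    )


def combine_mask(valid, requested):
    """Pack boolean valid/requested arrays back into a uint8 bit mask."""
    valid_bits = valid.astype(np.uint8) * VALID_BIT
    requested_bits = requested.astype(np.uint8) * REQUESTED_BIT
    return valid_bits | requested_bits


mask = np.array([[3, 1], [0, 2]], dtype=np.uint8)
valid, requested, corrupt = split_mask(mask)
rebuilt = combine_mask(valid, requested)
```

Here valid is [[True, True], [False, False]], requested is [[True, False], [False, True]], no pixel is flagged corrupt, and rebuilt equals the original mask.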