What is a Datablock Schema descriptor?
Apiro processes units of data via a custom pipeline, validating and transforming them in numerous ways. This implies that the data must have some form of consistency.
The Apiro DataBlock Schema (or Schema for short) enforces this kind of consistency for a category of data. The name deliberately borrows from database nomenclature.
An Apiro installation can be configured with an essentially unlimited number of Schemas. Each Schema categorises a subset of the ingested data and specifies the format of the data associated with it.
A DataBlock Schema is roughly equivalent to a table descriptor in a relational database.
A DataBlock is composed of DataPoints.
DataPoints are analogous to individual columns in a relational table. Apiro supports DataPoint types of strings, integers, decimal numbers, double precision floating point numbers, binary large objects, XML and JSON. The DataBlock Schema specifies:

- the DataPoints that comprise a DataBlock, and whether each is optional or nullable;
- arbitrary rules on the values of individual DataPoints, and on the relationships between them;
- derived DataPoints computed from other DataPoints (such as the midpoint between bid and offer price);
- actions to be taken as the data changes. These processing actions may make changes internally within Apiro itself, or they may perform side effects on external systems.

All of this is wrapped up into an Apiro DataBlock Schema descriptor.
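To make these moving parts concrete, the following sketch models a schema descriptor in Python. The class names, field names, and the FX_QUOTE example are all hypothetical illustrations of the concepts above, not Apiro's actual descriptor syntax or API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model of a DataBlock Schema descriptor, for illustration only.

@dataclass
class DataPointSpec:
    name: str
    datapoint_type: str      # e.g. "STRING", "INTEGER", "DECIMAL", "DOUBLE", "BLOB", "XML", "JSON"
    optional: bool = False   # may the datapoint be absent entirely?
    nullable: bool = False   # may the datapoint be present but null?

@dataclass
class SchemaDescriptor:
    name: str
    datapoints: list[DataPointSpec]
    # Rules on individual values or on relationships between datapoints,
    # expressed here as predicates over a whole datablock.
    rules: list[Callable[[dict], bool]] = field(default_factory=list)
    # Derived datapoints computed from other datapoints.
    derivations: dict[str, Callable[[dict], object]] = field(default_factory=dict)
    # Actions fired as data changes: internal updates or external side effects.
    actions: list[Callable[[dict], None]] = field(default_factory=list)

quote_schema = SchemaDescriptor(
    name="FX_QUOTE",
    datapoints=[
        DataPointSpec("symbol", "STRING"),
        DataPointSpec("bid", "DECIMAL"),
        DataPointSpec("offer", "DECIMAL"),
        DataPointSpec("venue", "STRING", optional=True),
    ],
    # Cross-datapoint rule: the offer must never be below the bid.
    rules=[lambda block: block["bid"] <= block["offer"]],
    # Derived datapoint: the midpoint between bid and offer price.
    derivations={"mid": lambda block: (block["bid"] + block["offer"]) / 2},
    # Action: notify some external system when a quote changes.
    actions=[lambda block: print(f"quote changed: {block['symbol']}")],
)
```

The point of the sketch is the shape of the descriptor: one named Schema bundles the column-like DataPoint definitions together with validation rules, derivations and change actions.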
A consequence of this is that, since Apiro categorises data by Schema and processes it according to the relevant Schema descriptor, all data ingested into Apiro must nominate the Schema it is associated with. Accordingly, as we shall see, a mandatory field for a data feed (the domain entity that ingests data into a schema and kicks off its processing through the pipeline) is indeed the Schema the data is being ingested into.
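As an illustration, a feed definition might look like the following sketch. The field names and the ingest function are invented for this example and do not reflect Apiro's actual feed configuration, but they show the key point: the target Schema is a mandatory part of any feed.

```python
# Hypothetical feed definition: the schema name is mandatory, since every
# ingested datablock must nominate the Schema it belongs to.
feed_definition = {
    "feed_name": "fx_quote_feed",            # hypothetical names throughout
    "schema": "FX_QUOTE",                    # mandatory: the target Schema
    "source": "sftp://example.com/quotes",   # where the raw data comes from
}

def ingest(feed: dict, raw_records: list[dict]) -> None:
    """Sketch of ingestion: each record is routed under the feed's Schema."""
    schema_name = feed["schema"]  # without this, the data could not be categorised
    for record in raw_records:
        print(f"routing record into schema {schema_name}: {record}")

ingest(feed_definition, [{"symbol": "EURUSD", "bid": 1.10, "offer": 1.11}])
```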
Apiro tries to walk a pragmatic line between excessively strict Schema definitions on the one hand (leading to difficulty in managing inevitable change and enhancement) and excessively loose Schema definitions on the other (leading to insufficient structure and the problems of attempting to process insufficiently scoped data).
TODO: Provide detailed configuration guide