Section 13 - Real-time Datablock processors

In this section we will:

show how to create a Data Block processor to distribute data in real time.

Please ensure you have completed all previous sections before proceeding.

Description
Config Reference	Data Sinks Email Attach - Data Sink Datablock Collections Datablock collections
Artifacts	DATA_SINK_EMAIL_ATTACH_TEMPLATE.xml DISTRIBUTION_TEMPLATE.xml
Generated files	DISTR_CUSTOMER_EMAIL_ATTACH_CSV_1AM.xml
Required prerequisites	Section 12 Distributions

Processing Pipeline

Ready for real-time processing?

If you have completed all previous sections, you will have implemented data cleansers, data processors and data validators, as illustrated below.
We also implemented Datablock collections, Data Sinks and scheduled Distributions.
This means that all the cleansed, validated and enriched data is distributed in batches.
Note: our current implementation distributes collections of data, not single records.
In this section, we will show how to publish each record as soon as it becomes ready, in real time.
To implement real-time processing, we need to update the CUSTOMER schema.

All we have to do is add the following Data Block processor to the CUSTOMER schema:

        <schemaAppliedProcessors>
            <dataBlockProcessors>
                <dataBlockProcessor name="PUBLISH_CUSTOMER_DATA" entity="ADHOC_JSON_SINK">
                    <config>
                        <![CDATA[
                            {
                                "jsonPayload": {
                                    "bac": "#GRV{ CTX['BAC'] }",
                                    "first_name": "#GRV{ CTX['FIRST_NAME'] }",
                                    "last_name": "#GRV{ CTX['LAST_NAME'] }",
                                    "address": "#GRV{ CTX['ADDRESS'] }",
                                    "phone_number": "#GRV{ CTX['PHONE_NUMBER'] }",
                                    "age": "#GRV{ CTX['AGE'] }",
                                    "yearly_income": "#GRV{ CTX['YEARLY_INCOME'] }",
                                    "tfn": "#GRV{ CTX['TFN'] }",
                                    "portfolio_value": "#GRV{ CTX['PORTFOLIO_VALUE'] }"
                                },
                                "dataSink": {
                                    "name": "adhoc",
                                    "entity": "HAZELCAST_CLUSTER_QUEUE",
                                    "config" : {
                                        "queueName": "CUSTOMER_DATA"
                                    }
                                }
                            }
                    ]]>
                    </config>
                </dataBlockProcessor>
            </dataBlockProcessors>
        </schemaAppliedProcessors>
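The processor is declared at schema level, not on an individual data point. Based on the completed file in the Configuration files section, the outline below shows where schemaAppliedProcessors sits inside the schema element: after the closing dataPoints tag. The XML comments stand in for content that is either unchanged or shown in full above.

    <schema defBacked="false" historical="false" name="CUSTOMER">
        <identityKeys>
            <identityKey>BAC</identityKey>
        </identityKeys>
        <dataPoints>
            <!-- existing data point definitions, unchanged -->
        </dataPoints>
        <schemaAppliedProcessors>
            <dataBlockProcessors>
                <!-- the PUBLISH_CUSTOMER_DATA processor shown above -->
            </dataBlockProcessors>
        </schemaAppliedProcessors>
    </schema>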

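The ADHOC_JSON_SINK processor builds a JSON payload for each record, where each #GRV{ CTX['...'] } expression reads the value of the named data point, and hands that payload to an ad hoc data sink of type HAZELCAST_CLUSTER_QUEUE, which writes it to the queue named CUSTOMER_DATA. The complete updated file is shown in the Configuration files section below.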
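The payload above publishes the raw TFN value. If downstream consumers should not receive it, one option is to publish the TFN_MASKED data point instead, by replacing the jsonPayload object in the processor config with the one below. This is an untested sketch: it assumes the value produced by the TFN_HASH_MASKING processor is already available in CTX when the Data Block processor runs, and the key name tfn_masked is an illustrative choice, not a required name.

    "jsonPayload": {
        "bac": "#GRV{ CTX['BAC'] }",
        "first_name": "#GRV{ CTX['FIRST_NAME'] }",
        "last_name": "#GRV{ CTX['LAST_NAME'] }",
        "address": "#GRV{ CTX['ADDRESS'] }",
        "phone_number": "#GRV{ CTX['PHONE_NUMBER'] }",
        "age": "#GRV{ CTX['AGE'] }",
        "yearly_income": "#GRV{ CTX['YEARLY_INCOME'] }",
        "tfn_masked": "#GRV{ CTX['TFN_MASKED'] }",
        "portfolio_value": "#GRV{ CTX['PORTFOLIO_VALUE'] }"
    },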
Congratulations! You have just implemented real-time processing: every time a record becomes available, it is published to a Hazelcast queue.
Note: the Hazelcast queue must be set up manually before you can start sending data to it. This is outside the scope of this section.
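Although full queue setup is outside the scope of this section, the sketch below shows, for orientation only, how a queue named CUSTOMER_DATA could be declared in a Hazelcast member's own hazelcast.xml (assuming Hazelcast 4.x or later). This is standard Hazelcast configuration rather than an Apiro file; the values are illustrative assumptions, and it is assumed, not confirmed, that the HAZELCAST_CLUSTER_QUEUE data sink connects to a cluster configured this way.

    <?xml version="1.0" encoding="UTF-8"?>
    <hazelcast xmlns="http://www.hazelcast.com/schema/config">
        <!-- assumption: "dev" is the Hazelcast default cluster name; use the name your cluster actually runs with -->
        <cluster-name>dev</cluster-name>
        <queue name="CUSTOMER_DATA">
            <!-- 0 means no explicit bound (Integer.MAX_VALUE items) -->
            <max-size>0</max-size>
            <!-- keep one synchronous backup copy of each item on another member -->
            <backup-count>1</backup-count>
        </queue>
    </hazelcast>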

Configuration files

Completed configuration files

This is the updated SCHEMA_CUSTOMER.xml file:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <apiroConf version="1" xmlns="http://apiro.com/apiro/v1/root">
        <loadOrder>15</loadOrder>
        <schemas>
            <schema defBacked="false" historical="false" name="CUSTOMER">
                <identityKeys>
                    <identityKey>BAC</identityKey>
                </identityKeys>
                <dataPoints>
                    <dataPoint name="BAC" displayName="Bank Account Code" dataType="STRING" canEditValid="false" canEditViolated="false">
                        <nullable>false</nullable>
                        <rawDPValidators>
                            <rawDPValidator name="IN_BAC_SET_CHECK" entity="IN_SET">
                                <config>
                                    <![CDATA[
                                        {
                                            "ignoreCase" : "true",
                                            "options" : [ "AAAA1111", "BBBB2222", "CCCC2222" ]
                                        }
                                    ]]>
                                </config>
                            </rawDPValidator>
                        </rawDPValidators>
                        <rawDPProcessors/>
                        <consolidationAlgorithm/>
                        <consDPValidators/>
                        <consDPProcessors/>
                    </dataPoint>
                    <dataPoint displayName="First Name" name="FIRST_NAME" dataType="STRING" >
                        <rawDPProcessors>
                            <rawDPProcessor name="CAPITALISE_FIRST_NAME_RAW_PROC" entity="GEN_EXPRESS">
                                <config>
                                    <![CDATA[
                                        #GRV{
                                                CTX['.'] = CTX['.'].toUpperCase()
                                            }
                                    ]]>
                                </config>

                            </rawDPProcessor>
                        </rawDPProcessors>
                    </dataPoint>
                    <dataPoint displayName="Last Name" name="LAST_NAME" dataType="STRING" >
                    </dataPoint>
                    <dataPoint displayName="Address" name="ADDRESS" dataType="STRING" >
                    </dataPoint>
                    <dataPoint displayName="Phone Number" name="PHONE_NUMBER" dataType="STRING" >
                    </dataPoint>
                    <dataPoint displayName="Age" name="AGE" dataType="INTEGER" >
                        <rawDPValidators>
                            <rawDPValidator name="INVALID_IF_NULL" entity="NOT_NULL"/>

                            <rawDPValidator name="INVALID_IF_NEGATIVE" entity="POSITIVE">
                                <lateBound>false</lateBound>
                            </rawDPValidator>
                        </rawDPValidators>
                    </dataPoint>
                    <dataPoint displayName="Yearly Income" name="YEARLY_INCOME" dataType="DECIMAL" >
                    </dataPoint>
                    <dataPoint displayName="Tax File Number" name="TFN" dataType="STRING" >
                    </dataPoint>
                    <dataPoint displayName="Tax File Number Masked" name="TFN_MASKED" dataType="STRING" >
                        <consDPProcessors>
                            <consDPProcessor name="TFN_HASH_MASKING" entity="HASH_MASK">
                                <config>
                                    <![CDATA[
                                        {
                                            "inputValue":"CTX['TFN']",
                                            "maskingSalt":"aqQwSxXcfgdejhbJhdygjyfdghjHGYYIdh!66gydshasGY!"
                                        }
                                    ]]>
                                </config>
                            </consDPProcessor>
                        </consDPProcessors>
                    </dataPoint>
                    <dataPoint displayName="Investment Portfolio Value" name="PORTFOLIO_VALUE" dataType="DECIMAL" >
                        <rawDPValidators>
                            <rawDPValidator name="CHECK_HIGH_PORTFOLIO_VALUE" entity="GEN_EXPRESS">
                                <config>
                                    <![CDATA[
                                        #GRV{
                                            // CTX["."] refers to the current data point, e.g. PORTFOLIO_VALUE;
                                            // CTX["AGE"] would refer to the AGE data point
                                            if (CTX["."] > 100000000) { // if this condition is met,
                                                return true;            // a violation named CHECK_HIGH_PORTFOLIO_VALUE is raised
                                            }
                                            return false;
                                        }
                                    ]]>
                                </config>
                            </rawDPValidator>
                        </rawDPValidators>
                        <consolidationAlgorithm name="PORTF_VALUE_WEIGHTED_MEAN_01" entity="GEN_EXPRESS">
                            <config>
                                <![CDATA[
                                    #GRV{
                                        // weighted mean: the weights 0.8 and 0.2 already sum to 1, so no further division is needed
                                        return (items.get(CUSTOMERS_A_XLSX) * 0.8) +
                                               (items.get(CUSTOMERS_A_XLSX) * 0.2)
                                    }
                                ]]>
                            </config>
                        </consolidationAlgorithm>

                        <consDPValidators>
                            <consDPValidator name="PORTF_VALUE_HISTORICAL_VALIDATOR" entity="HISTORICAL_SHIFT">
                                <config>
                                    <![CDATA[
                                        {
                                            "priorValues" : 5,
                                            "percent" : 5.2,
                                            "comparisonMore" : true
                                        }
                                    ]]>
                                </config>
                            </consDPValidator>
                        </consDPValidators>
                    </dataPoint>
                    <dataPoint displayName="Company Name" name="COMPANY_NAME" dataType="STRING" >
                    </dataPoint>
                    <dataPoint displayName="Company Address" name="COMPANY_ADDRESS" dataType="STRING" >
                    </dataPoint>
                    <dataPoint displayName="XML Root Doc" name="xmlRootDoc" dataType="XML" >
                    </dataPoint>
                    <dataPoint displayName="JSON Root Doc" name="jsonRootDoc" dataType="JSON" >
                    </dataPoint>
                    <dataPoint displayName="Profile Image" name="PROFILE_IMAGE" dataType="BLOB" >
                    </dataPoint>
                </dataPoints>

                <schemaAppliedProcessors>
                    <dataBlockProcessors>
                        <dataBlockProcessor name="PUBLISH_CUSTOMER_DATA" entity="ADHOC_JSON_SINK">
                            <config>
                                <![CDATA[
                                    {
                                        "jsonPayload": {
                                            "bac": "#GRV{ CTX['BAC'] }",
                                            "first_name": "#GRV{ CTX['FIRST_NAME'] }",
                                            "last_name": "#GRV{ CTX['LAST_NAME'] }",
                                            "address": "#GRV{ CTX['ADDRESS'] }",
                                            "phone_number": "#GRV{ CTX['PHONE_NUMBER'] }",
                                            "age": "#GRV{ CTX['AGE'] }",
                                            "yearly_income": "#GRV{ CTX['YEARLY_INCOME'] }",
                                            "tfn": "#GRV{ CTX['TFN'] }",
                                            "portfolio_value": "#GRV{ CTX['PORTFOLIO_VALUE'] }"
                                        },
                                        "dataSink": {
                                            "name": "adhoc",
                                            "entity": "HAZELCAST_CLUSTER_QUEUE",
                                            "config" : {
                                                "queueName": "CUSTOMER_DATA"
                                            }
                                        }
                                    }
                            ]]>
                            </config>
                        </dataBlockProcessor>
                    </dataBlockProcessors>
                </schemaAppliedProcessors>

            </schema>
        </schemas>
    </apiroConf>

Deploy config files

Follow the steps in Config Deployment to deploy and start using your configuration files.