Section 8 - Create Consolidated data processors using IntelliJ

In this section we will:

  • Apply a hash-masking algorithm (the HASH_MASK processor) to the TFN data point value and store the result in a new field called TFN_MASKED.

[Figure: Processing Pipeline]

HASH MASK TFN Data Point Value

  1. In this example we will mask the value of the TFN data point, but we will not modify the TFN value directly.
  2. Instead, we will create a derived data point called TFN_MASKED and store the masked value there.
  3. This also lets us demonstrate the concept of derived data points. In a production environment where masking is required, you would normally mask the value itself rather than create a derived data point.
  4. Before proceeding, briefly read over the concepts in the Schema context guide and complete Section 5, Data Consolidators.
  5. If you have completed all previous sections, the current state of the consolidated data is shown in the table below:

    BAC         FIRST_NAME   LAST_NAME   AGE   TFN           YEARLY_INCOME   PORTFOLIO_VALUE
    BAC111111   Tom          JONES       22    111 111 111   89000           97 800
    BAC222222   Bob          SMITH       35    222 222 222   99000           82 000
    BAC333333                ROGERS      54    333 333 333   125000          1 000 000
  6. We will create a new Data Point called TFN_MASKED. All we have to do is add the TFN_MASKED data point to the CUSTOMER schema.

  7. If you were not able to complete the previous section, copy the configuration below into SCHEMA_CUSTOMER.xml to continue with this section.

        <?xml version="1.0" encoding="UTF-8"?>
    
        <apiroConf version="1" xmlns="http://apiro.com/apiro/v1/root">
            <groups/>
            <loadOrder>15</loadOrder>
            <schemas>
                <schema defBacked="false" historical="false" name="CUSTOMER">
                    <groupTags>
                        <groupTag>EXAMPLES</groupTag>
                    </groupTags>
                    <metaData/>
                    <identityKeys>
                        <identityKey>BAC</identityKey>
                    </identityKeys>
    
                    <!-- Data Point descriptions -->
                    <dataPoints>
                        <dataPoint name="BAC"
                                   dataType="STRING"
                                   canEditValid="true"
                                   canEditViolated="true"
                                   displayName="BAC">
                            <nullable>false</nullable>
    
                            <metaData>
                                <item name="piiClassification">
                                    <simpleValues>
                                        <simpleValue>High Risk</simpleValue>
                                    </simpleValues>
                                </item>
                            </metaData>
    
                            <!-- BAC data point processors -->
                            <rawDPValidators/>
                            <rawDPProcessors/>
                            <!--consolidationAlgorithm></consolidationAlgorithm -->
                            <consDPValidators/>
                            <consDPProcessors/>
                        </dataPoint>
    
                        <dataPoint name="FIRST_NAME"
                                   dataType="STRING"
                                   displayName="First Name"
                                   canEditValid="true"
                                   canEditViolated="true">
                            <rawDPValidators>
                                <rawDPValidator name="IN_BAC_SET_CHECK" entity="IN_SET">
                                    <config>
                                        <![CDATA[
                                    {
                                        ignoreCase : true,
                                        options : [ "Tom", "Bob"]
                                    }
                                ]]>
                                    </config>
                                </rawDPValidator>
                            </rawDPValidators>
    
                            <consDPValidators>
                                <consDPValidator name="INVALID_IF_CONSOLIDATED_NULL" entity="NOT_NULL"/> 
                            </consDPValidators>
                        </dataPoint>
    
                        <dataPoint name="LAST_NAME" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="LAST NAME">
                            <rawDPProcessors>
                                <rawDPProcessor name="CAPITALISE_LAST_NAME_RAW_PROC" entity="GEN_EXPRESS">
                                    <config>
                                        <![CDATA[
                                            #GRV{
                                                CTX['.'] = CTX['.'].toUpperCase()
                                            }
                                        ]]>
                                    </config>
    
                                </rawDPProcessor>
                            </rawDPProcessors>
                        </dataPoint>
    
                        <dataPoint name="ADDRESS" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="ADDRESS"/>
                        <dataPoint name="PHONE_NUMBER" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="PHONE NUMBER"/>
                        <dataPoint  name="AGE" dataType="INTEGER" canEditValid="true" canEditViolated="true" displayName="Age">
                            <rawDPValidators>
                                <rawDPValidator name="INVALID_IF_NULL" entity="NOT_NULL"/> <!-- The name can be anything; it appears in the data audit and lineage -->
                                <rawDPValidator name="INVALID_IF_NEGATIVE" entity="POSITIVE">
                                    <lateBound>false</lateBound> <!-- false is the default when lateBound is not specified -->
                                </rawDPValidator>
                            </rawDPValidators>
                        </dataPoint>
                        <dataPoint name="YEARLY_INCOME" canEditValid="false" canEditViolated="true" dataType="DECIMAL" displayName="YEARLY INCOME"/>
                        <dataPoint name="TFN" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="TFN"/>
    
                        <dataPoint name="PORTFOLIO_VALUE"
                                   displayName="Investment Portfolio Value"
                                   dataType="DECIMAL"
                                   canEditValid="false"
                                   canEditViolated="true" >
    
                                <consolidationAlgorithm name="PORTF_VALUE_WEIGHTED_MEAN_01" entity="GEN_EXPRESS">
                                    <config>
                                        <![CDATA[
                                            #GRV{
                                                def list= []
    
                                                list.add(items.get("CUSTOMERS_A_XLSX"))
                                                list.add(items.get("CUSTOMERS_B_XLSX"))
                                                list.removeAll { it == null }  // drop every missing source, not only the first
    
                                                if(list.size()==0)
                                                    return 0;
                                                else if (list.size() == 1)
                                                    return list[0]
                                                else {
                                                    return (list[0].asDBL()*0.8 + list[1].asDBL()*0.2)
                                                }
                                            }
                                            ]]>
                                    </config>
                                </consolidationAlgorithm>
                        </dataPoint>
    
                        <dataPoint name="COMPANY_NAME" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="COMPANY NAME"/>
                        <dataPoint name="COMPANY_ADDRESS" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="COMPANY ADDRESS"/>
                        <dataPoint name="PROFILE_IMAGE" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="PROFILE_IMAGE"/>
                        <dataPoint name="COMPANY_WEBSITE" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="COMPANY WEBSITE"/>
                        <dataPoint name="XML_ROOT_DOC"  canEditValid="false" canEditViolated="true"  displayName="XML Root Doc" dataType="XML"/>
                        <dataPoint name="JSON_ROOT_DOC"  canEditValid="false" canEditViolated="true"  displayName="JSON Root Doc" dataType="JSON"/>
                    </dataPoints>
                    <schemaAppliedProcessors>
                        <groupTags>
                            <groupTag>DEFAULT</groupTag>
                        </groupTags>
                        <metaData/>
                        <rawDPValidators/>
                        <rawDPProcessors/>
                        <consDPValidators/>
                        <consDPProcessors/>
                        <dataBlockProcessors/>
                    </schemaAppliedProcessors>
                    <alerts/>
                </schema>
            </schemas>
        </apiroConf>
    

  8. You do not have to modify the FEED_CUSTOMERS_A_XLSX or FEED_CUSTOMERS_B_XLSX feeds, because TFN_MASKED is neither sourced from a feed nor cleansed; its value is derived. The data point definition to add (step 6) is:

        <dataPoint name="TFN_MASKED"  displayName="Tax File Number Masked" dataType="STRING" >
        </dataPoint>
    
  9. Now that the new data point TFN_MASKED exists, we can derive its value with the predefined HASH_MASK consolidated data processor, as shown below.
  10. Note: the maskingSalt in the example below is hard-coded. It must never be hard-coded in production.

  11. Refer to the Authentication Manager guide for more details. In Staging and Production environments, you must use the out-of-the-box Authentication Manager to supply the salt, for example:

    • AWS KMS: {WS_SM1:TFN_MASKING_SALT}
    • Azure Key Vault: {CYBERARK_SM1:TFN_MASKING_SALT}
    • CyberArk: {WS_SM1:TFN_MASKING_SALT}
    • or, at the very least, system properties: ${SYS:TFN_MASKING_SALT}
    <dataPoint name="TFN_MASKED" displayName="Tax File Number Masked" dataType="STRING">
         <consDPProcessors>
            <consDPProcessor name="TFN_HASH_MASKING" entity="HASH_MASK">
                <config>
                    <![CDATA[
                        {
                            "inputValue":"#GRV{ CTX['TFN'] }",
                            "maskingSalt":"aqQwSxXcfgdejhbJhdygjyfdghjHGYYIdh!66gydshasGY!"
                        }
                    ]]>
                </config>
            </consDPProcessor>
        </consDPProcessors>
    </dataPoint>
  12. The resulting data now looks like this:

    BAC         FIRST_NAME   LAST_NAME   AGE   TFN           TFN_MASKED                                 YEARLY_INCOME   PORTFOLIO_VALUE
    BAC111111   Tom          JONES       22    111 111 111   X2YzjC+rE5EVDjp1e9vLcOBt37237RGMW0NDW4OQ   89000           97 800
    BAC222222   Bob          SMITH       35    222 222 222   WamDXMhaRFqRZx586gum9sH1sVMg9ZjE0DSZA5C6   99000           82 000
    BAC333333                ROGERS      54    333 333 333   ixbm0jsyzFUqzcvMDXSjauzO+Ok9RzdUCqAf6Sqa   125000          1 000 000
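
As a minimal sketch of the production guidance in steps 10 and 11, the TFN_MASKED definition could reference the salt through a secret-manager token instead of a literal string. This assumes that a token such as {WS_SM1:TFN_MASKING_SALT} from the list in step 11 is resolved when it appears inside the processor's CDATA config, and that WS_SM1 names a secret manager you have actually configured; confirm both against the Authentication Manager guide before relying on it.

    <!-- Sketch only: assumes {WS_SM1:TFN_MASKING_SALT} is resolved by the Authentication Manager.
         ${SYS:TFN_MASKING_SALT} (a system property) is the minimal alternative listed in step 11. -->
    <dataPoint name="TFN_MASKED" displayName="Tax File Number Masked" dataType="STRING">
        <consDPProcessors>
            <consDPProcessor name="TFN_HASH_MASKING" entity="HASH_MASK">
                <config>
                    <![CDATA[
                        {
                            "inputValue":"#GRV{ CTX['TFN'] }",
                            "maskingSalt":"{WS_SM1:TFN_MASKING_SALT}"
                        }
                    ]]>
                </config>
            </consDPProcessor>
        </consDPProcessors>
    </dataPoint>

Only the maskingSalt value differs from the hard-coded version. Because the salt is an input to the hash, changing it would presumably change every TFN_MASKED value, so the stored secret should stay stable between runs.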


Configuration files

Completed configuration files
  • This is the completed CUSTOMER schema configuration file, which derives the new data point TFN_MASKED from TFN. Notice how simple and quick it was to mask TFN values with a single configuration change, reusing the existing pre-wired pipelines, audit and data lineage features.
        <?xml version="1.0" encoding="UTF-8"?>
    
        <apiroConf version="1" xmlns="http://apiro.com/apiro/v1/root">
            <groups/>
            <loadOrder>15</loadOrder>
            <schemas>
                <schema defBacked="false" historical="false" name="CUSTOMER">
                    <groupTags>
                        <groupTag>EXAMPLES</groupTag>
                    </groupTags>
                    <metaData/>
                    <identityKeys>
                        <identityKey>BAC</identityKey>
                    </identityKeys>
    
                    <!-- Data Point descriptions -->
                    <dataPoints>
                        <dataPoint name="BAC"
                                   dataType="STRING"
                                   canEditValid="true"
                                   canEditViolated="true"
                                   displayName="BAC">
                            <nullable>false</nullable>
    
                            <metaData>
                                <item name="piiClassification">
                                    <simpleValues>
                                        <simpleValue>High Risk</simpleValue>
                                    </simpleValues>
                                </item>
                            </metaData>
    
                            <!-- BAC data point processors -->
                            <rawDPValidators/>
                            <rawDPProcessors/>
                            <!--consolidationAlgorithm></consolidationAlgorithm -->
                            <consDPValidators/>
                            <consDPProcessors/>
                        </dataPoint>
    
                        <dataPoint name="FIRST_NAME"
                                   dataType="STRING"
                                   displayName="First Name"
                                   canEditValid="true"
                                   canEditViolated="true">
                            <rawDPValidators>
                                <rawDPValidator name="IN_BAC_SET_CHECK" entity="IN_SET">
                                    <config>
                                        <![CDATA[
                                        {
                                            ignoreCase : true,
                                            options : [ "Tom", "Bob"]
                                        }
                                    ]]>
                                    </config>
                                </rawDPValidator>
                            </rawDPValidators>
    
                            <consDPValidators>
                                <consDPValidator name="INVALID_IF_CONSOLIDATED_NULL" entity="NOT_NULL"/>
                            </consDPValidators>
                        </dataPoint>
    
                        <dataPoint name="LAST_NAME" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="LAST NAME">
                            <rawDPProcessors>
                                <rawDPProcessor name="CAPITALISE_LAST_NAME_RAW_PROC" entity="GEN_EXPRESS">
                                    <config>
                                        <![CDATA[
                                                #GRV{
                                                    CTX['.'] = CTX['.'].toUpperCase()
                                                }
                                            ]]>
                                    </config>
    
                                </rawDPProcessor>
                            </rawDPProcessors>
                        </dataPoint>
    
                        <dataPoint name="ADDRESS" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="ADDRESS"/>
                        <dataPoint name="PHONE_NUMBER" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="PHONE NUMBER"/>
                        <dataPoint  name="AGE" dataType="INTEGER" canEditValid="true" canEditViolated="true" displayName="Age">
                            <rawDPValidators>
                                <rawDPValidator name="INVALID_IF_NULL" entity="NOT_NULL"/> <!-- The name can be anything; it appears in the data audit and lineage -->
                                <rawDPValidator name="INVALID_IF_NEGATIVE" entity="POSITIVE">
                                    <lateBound>false</lateBound> <!-- false is the default when lateBound is not specified -->
                                </rawDPValidator>
                            </rawDPValidators>
                        </dataPoint>
                        <dataPoint name="YEARLY_INCOME" canEditValid="false" canEditViolated="true" dataType="DECIMAL" displayName="YEARLY INCOME"/>
                        <dataPoint name="TFN" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="TFN"/>
    
                        <dataPoint name="TFN_MASKED" displayName="Tax File Number Masked" dataType="STRING">
                            <consDPProcessors>
                                <consDPProcessor name="TFN_HASH_MASKING" entity="HASH_MASK">
                                    <config>
                                        <![CDATA[
                                {
                                    "inputValue":"#GRV{ CTX['TFN'] }",
                                    "maskingSalt":"aqQwSxXcfgdejhbJhdygjyfdghjHGYYIdh!66gydshasGY!"
                                }
                            ]]>
                                    </config>
                                </consDPProcessor>
                            </consDPProcessors>
                        </dataPoint>
    
                        <dataPoint name="PORTFOLIO_VALUE"
                                   displayName="Investment Portfolio Value"
                                   dataType="DECIMAL"
                                   canEditValid="false"
                                   canEditViolated="true" >
    
                            <consolidationAlgorithm name="PORTF_VALUE_WEIGHTED_MEAN_01" entity="GEN_EXPRESS">
                                <config>
                                    <![CDATA[
                                                #GRV{
                                                    def list= []
    
                                                    list.add(items.get("CUSTOMERS_A_XLSX"))
                                                    list.add(items.get("CUSTOMERS_B_XLSX"))
                                                    list.removeAll { it == null }  // drop every missing source, not only the first
    
                                                    if(list.size()==0)
                                                        return 0;
                                                    else if (list.size() == 1)
                                                        return list[0]
                                                    else {
                                                        return (list[0].asDBL()*0.8 + list[1].asDBL()*0.2)
                                                    }
                                                }
                                                ]]>
                                </config>
                            </consolidationAlgorithm>
                        </dataPoint>
    
                        <dataPoint name="COMPANY_NAME" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="COMPANY NAME"/>
                        <dataPoint name="COMPANY_ADDRESS" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="COMPANY ADDRESS"/>
                        <dataPoint name="PROFILE_IMAGE" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="PROFILE_IMAGE"/>
                        <dataPoint name="COMPANY_WEBSITE" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="COMPANY WEBSITE"/>
                        <dataPoint name="XML_ROOT_DOC"  canEditValid="false" canEditViolated="true"  displayName="XML Root Doc" dataType="XML"/>
                        <dataPoint name="JSON_ROOT_DOC"  canEditValid="false" canEditViolated="true"  displayName="JSON Root Doc" dataType="JSON"/>
                    </dataPoints>
                    <schemaAppliedProcessors>
                        <groupTags>
                            <groupTag>DEFAULT</groupTag>
                        </groupTags>
                        <metaData/>
                        <rawDPValidators/>
                        <rawDPProcessors/>
                        <consDPValidators/>
                        <consDPProcessors/>
                        <dataBlockProcessors/>
                    </schemaAppliedProcessors>
                    <alerts/>
                </schema>
            </schemas>
        </apiroConf>
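
Step 3 notes that in production, where masking is required, the TFN value itself would normally be masked rather than copied into a derived data point. A hypothetical sketch of that variant attaches the same HASH_MASK processor directly to the TFN data point. It assumes the processor's output replaces the value of the data point it is attached to, in the same way it populates TFN_MASKED in the configuration above, and that the salt token from step 11 is resolved inside the CDATA config. TFN_HASH_MASKING_IN_PLACE is an illustrative name; processor names are free-form and appear in the audit and lineage.

    <!-- Hypothetical production variant: mask TFN in place instead of deriving TFN_MASKED.
         Assumes HASH_MASK writes its output back to the data point it is attached to. -->
    <dataPoint name="TFN" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="TFN">
        <consDPProcessors>
            <consDPProcessor name="TFN_HASH_MASKING_IN_PLACE" entity="HASH_MASK">
                <config>
                    <![CDATA[
                        {
                            "inputValue":"#GRV{ CTX['TFN'] }",
                            "maskingSalt":"{WS_SM1:TFN_MASKING_SALT}"
                        }
                    ]]>
                </config>
            </consDPProcessor>
        </consDPProcessors>
    </dataPoint>

With this variant the separate TFN_MASKED data point would no longer be needed, since the unmasked TFN would not be kept in the consolidated data.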
    
Deploy config files
  • Follow the steps in Config Deployment to deploy and start using your configuration files.