Section 8 - Create Consolidated data processors using IntelliJ
In this section we will:
- show how to create a Consolidated Data Processor.
- show how to create a derived data point and update its value during data processing.
Description Config Reference Artifacts Required prerequisites
Processing Pipeline
Apply a HASH MASKING algorithm on the TFN data point value and store the result in a new data point called TFN_MASKED
- In this example we will mask the value of the TFN data point, but we will not modify the TFN value directly.
- Instead, we will create a derived data point called TFN_MASKED and store the masked value there.
- This also helps to demonstrate the concept of derived data points. In a production environment, where masking is required, we would mask the value itself rather than create a derived data point.
- Before proceeding please briefly read over the concepts in Schema context guide and also complete Section 5 Data Consolidators.
- If you completed all previous sections, the current state of the consolidated data is illustrated by the table below:

| BAC | FIRST_NAME | LAST_NAME | AGE | TFN | YEARLY_INCOME | PORTFOLIO_VALUE |
|---|---|---|---|---|---|---|
| BAC111111 | Tom | JONES | 22 | 111 111 111 | 89000 | 97 800 |
| BAC222222 | Bob | SMITH | 35 | 222 222 222 | 99000 | 82 000 |
| BAC333333 | | ROGERS | 54 | 333 333 333 | 125000 | 1 000 000 |

- We will create a new data point called TFN_MASKED. All we have to do is add the TFN_MASKED data point to the CUSTOMER schema.
- If you were not able to complete the previous section, you can copy the configuration below and paste it into SCHEMA_CUSTOMER.xml to continue with this section.

<?xml version="1.0" encoding="UTF-8"?>
<apiroConf version="1" xmlns="http://apiro.com/apiro/v1/root">
  <groups/>
  <loadOrder>15</loadOrder>
  <schemas>
    <schema defBacked="false" historical="false" name="CUSTOMER">
      <groupTags>
        <groupTag>EXAMPLES</groupTag>
      </groupTags>
      <metaData/>
      <identityKeys>
        <identityKey>BAC</identityKey>
      </identityKeys>
      <!-- Data Point descriptions -->
      <dataPoints>
        <dataPoint name="BAC" dataType="STRING" canEditValid="true" canEditViolated="true" displayName="BAC">
          <nullable>false</nullable>
          <metaData>
            <item name="piiClassification">
              <simpleValues>
                <simpleValue>High Risk</simpleValue>
              </simpleValues>
            </item>
          </metaData>
          <!-- BAC data point processors -->
          <rawDPValidators/>
          <rawDPProcessors/>
          <!--consolidationAlgorithm></consolidationAlgorithm -->
          <consDPValidators/>
          <consDPProcessors/>
        </dataPoint>
        <dataPoint name="FIRST_NAME" dataType="STRING" displayName="First Name" canEditValid="true" canEditViolated="true">
          <rawDPValidators>
            <rawDPValidator name="IN_BAC_SET_CHECK " entity="IN_SET">
              <config>
                <![CDATA[
                {
                  ignoreCase : true,
                  options : [ "Tom", "Bob"]
                }
                ]]>
              </config>
            </rawDPValidator>
          </rawDPValidators>
          <consDPValidators>
            <consDPValidator name="INVALID_IF_CONSOLIDATED_NULL" entity="NOT_NULL"/>
          </consDPValidators>
        </dataPoint>
        <dataPoint name="LAST_NAME" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="LAST NAME">
          <rawDPProcessors>
            <rawDPProcessor name="CAPITALISE_LAST_NAME_RAW_PROC" entity="GEN_EXPRESS">
              <config>
                <![CDATA[
                #GRV{
                  CTX['.'] = CTX['.'].toUpperCase()
                }
                ]]>
              </config>
            </rawDPProcessor>
          </rawDPProcessors>
        </dataPoint>
        <dataPoint name="ADDRESS" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="ADDRESS"/>
        <dataPoint name="PHONE_NUMBER" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="PHONE NUMBER"/>
        <dataPoint name="AGE" dataType="INTEGER" canEditValid="true" canEditViolated="true" displayName="Age">
          <rawDPValidators>
            <!-- The name can be anything and it will appear in data audit/lineage -->
            <rawDPValidator name="INVALID_IF_NULL" entity="NOT_NULL"/>
            <rawDPValidator name="INVALID_IF_NEGATIVE" entity="POSITIVE">
              <!-- This is the default value if one is not specified -->
              <lateBound>false</lateBound>
            </rawDPValidator>
          </rawDPValidators>
        </dataPoint>
        <dataPoint name="YEARLY_INCOME" canEditValid="false" canEditViolated="true" dataType="DECIMAL" displayName="YEARLY INCOME"/>
        <dataPoint name="TFN" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="TFN"/>
        <dataPoint name="PORTFOLIO_VALUE" displayName="Investment Portfolio Value" dataType="DECIMAL" canEditValid="false" canEditViolated="true">
          <consolidationAlgorithm name="PORTF_VALUE_WEIGHTED_MEAN_01" entity="GEN_EXPRESS">
            <config>
              <![CDATA[
              #GRV{
                def list= []
                list.add(items.get("CUSTOMERS_A_XLSX"))
                list.add(items.get("CUSTOMERS_B_XLSX"))
                list.remove(null)
                if(list.size()==0)
                  return 0;
                else if (list.size() == 1)
                  return list[0]
                else {
                  return (list[0].asDBL()*0.8 + list[1].asDBL()*0.2)
                }
              }
              ]]>
            </config>
          </consolidationAlgorithm>
        </dataPoint>
        <dataPoint name="COMPANY_NAME" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="COMPANY NAME"/>
        <dataPoint name="COMPANY_ADDRESS" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="COMPANY ADDRESS"/>
        <dataPoint name="PROFILE_IMAGE" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="PROFILE_IMAGE"/>
        <dataPoint name="COMPANY_WEBSITE" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="COMPANY WEBSITE"/>
        <dataPoint name="XML_ROOT_DOC" canEditValid="false" canEditViolated="true" displayName="XML Root Doc" dataType="XML"/>
        <dataPoint name="JSON_ROOT_DOC" canEditValid="false" canEditViolated="true" displayName="JSON Root Doc" dataType="JSON"/>
      </dataPoints>
      <schemaAppliedProcessors>
        <groupTags>
          <groupTag>DEFAULT</groupTag>
        </groupTags>
        <metaData/>
        <rawDPValidators/>
        <rawDPProcessors/>
        <consDPValidators/>
        <consDPProcessors/>
        <dataBlockProcessors/>
      </schemaAppliedProcessors>
      <alerts/>
    </schema>
  </schemas>
</apiroConf>
- You do not have to modify the feeds FEED_CUSTOMERS_A_XLSX or FEED_CUSTOMERS_B_XLSX, because the value of the data point TFN_MASKED is not sourced or cleansed. Add the following data point definition to the CUSTOMER schema:

<dataPoint name="TFN_MASKED" displayName="Tax File Number Masked" dataType="STRING">
</dataPoint>
- Now that we have the new data point TFN_MASKED, we can derive its value by using the predefined HASH_MASK consolidated data processor, as shown below.
- Note: The maskingSalt in the example below is hard coded. It must never be hard coded in production.
- Please refer to the Authentication Manager guide for more details. For Staging and Production environments you must use an out-of-the-box Authentication Manager. For example:
  - AWS KMS {WS_SM1:TFN_MASKING_SALT}
  - Azure KeyVault {CYBERARK_SM1:TFN_MASKING_SALT}
  - Cyberark {WS_SM1:TFN_MASKING_SALT}
  - or at the very least System properties ${SYS:TFN_MASKING_SALT}
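
<dataPoint name="TFN_MASKED" displayName="Tax File Number Masked" dataType="STRING">
  <consDPProcessors>
    <consDPProcessor name="TFN_HASH_MASKING" entity="HASH_MASK">
      <config>
        <![CDATA[
        {
          "inputValue":"#GRV{ CTX['TFN'] }",
          "maskingSalt":"aqQwSxXcfgdejhbJhdygjyfdghjHGYYIdh!66gydshasGY!"
        }
        ]]>
      </config>
    </consDPProcessor>
  </consDPProcessors>
</dataPoint>
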
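The hard-coded maskingSalt can be swapped for one of the placeholders listed in the note. The fragment below is a minimal sketch that reuses the TFN_HASH_MASKING processor and the ${SYS:TFN_MASKING_SALT} system-property form from this section. It assumes the placeholder is resolved when it appears as the maskingSalt value inside the CDATA config; confirm this against the Authentication Manager guide before relying on it.

```xml
<dataPoint name="TFN_MASKED" displayName="Tax File Number Masked" dataType="STRING">
  <consDPProcessors>
    <consDPProcessor name="TFN_HASH_MASKING" entity="HASH_MASK">
      <!-- Assumption: ${SYS:TFN_MASKING_SALT} is resolved from a system property
           named TFN_MASKING_SALT when the configuration is loaded -->
      <config>
        <![CDATA[
        {
          "inputValue":"#GRV{ CTX['TFN'] }",
          "maskingSalt":"${SYS:TFN_MASKING_SALT}"
        }
        ]]>
      </config>
    </consDPProcessor>
  </consDPProcessors>
</dataPoint>
```

With this form the salt value never appears in the schema file, so the file can be committed and reviewed without exposing the secret.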
- Now the resulting data will look like this:

| BAC | FIRST_NAME | LAST_NAME | AGE | TFN | TFN_MASKED | YEARLY_INCOME | PORTFOLIO_VALUE |
|---|---|---|---|---|---|---|---|
| BAC111111 | Tom | JONES | 22 | 111 111 111 | X2YzjC+rE5EVDjp1e9vLcOBt37237RGMW0NDW4OQ | 89000 | 97 800 |
| BAC222222 | Bob | SMITH | 35 | 222 222 222 | WamDXMhaRFqRZx586gum9sH1sVMg9ZjE0DSZA5C6 | 99000 | 82 000 |
| BAC333333 | | ROGERS | 54 | 333 333 333 | ixbm0jsyzFUqzcvMDXSjauzO+Ok9RzdUCqAf6Sqa | 125000 | 1 000 000 |

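As noted at the start of this section, a production setup would usually mask the TFN value itself instead of keeping the clear value next to a derived copy. A hedged sketch of that variant follows: it attaches the same HASH_MASK processor to the TFN data point. It assumes that a consolidated data processor writes its result to the data point it is declared on (as it does for TFN_MASKED in this section) and that CTX['TFN'] can be read from within TFN's own processor. The processor name TFN_HASH_MASKING_IN_PLACE is hypothetical.

```xml
<dataPoint name="TFN" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="TFN">
  <consDPProcessors>
    <!-- Hypothetical processor name; assumes the HASH_MASK result
         overwrites the consolidated TFN value -->
    <consDPProcessor name="TFN_HASH_MASKING_IN_PLACE" entity="HASH_MASK">
      <config>
        <![CDATA[
        {
          "inputValue":"#GRV{ CTX['TFN'] }",
          "maskingSalt":"${SYS:TFN_MASKING_SALT}"
        }
        ]]>
      </config>
    </consDPProcessor>
  </consDPProcessors>
</dataPoint>
```

In this variant no TFN_MASKED data point is needed, and the consolidated data would carry only the masked TFN.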
Configuration files
Completed configuration files
- This is the completed CUSTOMER schema configuration file that derives a new data point TFN_MASKED from TFN. Notice how simple and quick it was to mask TFN values in a single configuration using the existing pre-wired pipelines, audit and data lineage features.

<?xml version="1.0" encoding="UTF-8"?>
<apiroConf version="1" xmlns="http://apiro.com/apiro/v1/root">
  <groups/>
  <loadOrder>15</loadOrder>
  <schemas>
    <schema defBacked="false" historical="false" name="CUSTOMER">
      <groupTags>
        <groupTag>EXAMPLES</groupTag>
      </groupTags>
      <metaData/>
      <identityKeys>
        <identityKey>BAC</identityKey>
      </identityKeys>
      <!-- Data Point descriptions -->
      <dataPoints>
        <dataPoint name="BAC" dataType="STRING" canEditValid="true" canEditViolated="true" displayName="BAC">
          <nullable>false</nullable>
          <metaData>
            <item name="piiClassification">
              <simpleValues>
                <simpleValue>High Risk</simpleValue>
              </simpleValues>
            </item>
          </metaData>
          <!-- BAC data point processors -->
          <rawDPValidators/>
          <rawDPProcessors/>
          <!--consolidationAlgorithm></consolidationAlgorithm -->
          <consDPValidators/>
          <consDPProcessors/>
        </dataPoint>
        <dataPoint name="FIRST_NAME" dataType="STRING" displayName="First Name" canEditValid="true" canEditViolated="true">
          <rawDPValidators>
            <rawDPValidator name="IN_BAC_SET_CHECK " entity="IN_SET">
              <config>
                <![CDATA[
                {
                  ignoreCase : true,
                  options : [ "Tom", "Bob"]
                }
                ]]>
              </config>
            </rawDPValidator>
          </rawDPValidators>
          <consDPValidators>
            <consDPValidator name="INVALID_IF_CONSOLIDATED_NULL" entity="NOT_NULL"/>
          </consDPValidators>
        </dataPoint>
        <dataPoint name="LAST_NAME" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="LAST NAME">
          <rawDPProcessors>
            <rawDPProcessor name="CAPITALISE_LAST_NAME_RAW_PROC" entity="GEN_EXPRESS">
              <config>
                <![CDATA[
                #GRV{
                  CTX['.'] = CTX['.'].toUpperCase()
                }
                ]]>
              </config>
            </rawDPProcessor>
          </rawDPProcessors>
        </dataPoint>
        <dataPoint name="ADDRESS" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="ADDRESS"/>
        <dataPoint name="PHONE_NUMBER" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="PHONE NUMBER"/>
        <dataPoint name="AGE" dataType="INTEGER" canEditValid="true" canEditViolated="true" displayName="Age">
          <rawDPValidators>
            <!-- The name can be anything and it will appear in data audit/lineage -->
            <rawDPValidator name="INVALID_IF_NULL" entity="NOT_NULL"/>
            <rawDPValidator name="INVALID_IF_NEGATIVE" entity="POSITIVE">
              <!-- This is the default value if one is not specified -->
              <lateBound>false</lateBound>
            </rawDPValidator>
          </rawDPValidators>
        </dataPoint>
        <dataPoint name="YEARLY_INCOME" canEditValid="false" canEditViolated="true" dataType="DECIMAL" displayName="YEARLY INCOME"/>
        <dataPoint name="TFN" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="TFN"/>
        <dataPoint name="TFN_MASKED" displayName="Tax File Number Masked" dataType="STRING">
          <consDPProcessors>
            <consDPProcessor name="TFN_HASH_MASKING" entity="HASH_MASK">
              <config>
                <![CDATA[
                {
                  "inputValue":"#GRV{ CTX['TFN'] }",
                  "maskingSalt":"aqQwSxXcfgdejhbJhdygjyfdghjHGYYIdh!66gydshasGY!"
                }
                ]]>
              </config>
            </consDPProcessor>
          </consDPProcessors>
        </dataPoint>
        <dataPoint name="PORTFOLIO_VALUE" displayName="Investment Portfolio Value" dataType="DECIMAL" canEditValid="false" canEditViolated="true">
          <consolidationAlgorithm name="PORTF_VALUE_WEIGHTED_MEAN_01" entity="GEN_EXPRESS">
            <config>
              <![CDATA[
              #GRV{
                def list= []
                list.add(items.get("CUSTOMERS_A_XLSX"))
                list.add(items.get("CUSTOMERS_B_XLSX"))
                list.remove(null)
                if(list.size()==0)
                  return 0;
                else if (list.size() == 1)
                  return list[0]
                else {
                  return (list[0].asDBL()*0.8 + list[1].asDBL()*0.2)
                }
              }
              ]]>
            </config>
          </consolidationAlgorithm>
        </dataPoint>
        <dataPoint name="COMPANY_NAME" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="COMPANY NAME"/>
        <dataPoint name="COMPANY_ADDRESS" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="COMPANY ADDRESS"/>
        <dataPoint name="PROFILE_IMAGE" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="PROFILE_IMAGE"/>
        <dataPoint name="COMPANY_WEBSITE" canEditValid="false" canEditViolated="true" dataType="STRING" displayName="COMPANY WEBSITE"/>
        <dataPoint name="XML_ROOT_DOC" canEditValid="false" canEditViolated="true" displayName="XML Root Doc" dataType="XML"/>
        <dataPoint name="JSON_ROOT_DOC" canEditValid="false" canEditViolated="true" displayName="JSON Root Doc" dataType="JSON"/>
      </dataPoints>
      <schemaAppliedProcessors>
        <groupTags>
          <groupTag>DEFAULT</groupTag>
        </groupTags>
        <metaData/>
        <rawDPValidators/>
        <rawDPProcessors/>
        <consDPValidators/>
        <consDPProcessors/>
        <dataBlockProcessors/>
      </schemaAppliedProcessors>
      <alerts/>
    </schema>
  </schemas>
</apiroConf>
Deploy config files
- Follow the steps in Config Deployment to deploy and start using your configuration files.