obfuscate

The obfuscate processor obfuscates fields in your documents to protect sensitive data.

Usage

In this example, a document contains a log field and a phone field, as shown in the following object:

    {
      "id": 1,
      "phone": "(555) 555 5555",
      "log": "My name is Bob and my email address is abc@example.com"
    }

To obfuscate the log and phone fields, add an obfuscate processor for each field and name that field in the source option. Because each processor can obfuscate only one source, the following example uses two obfuscate processors.

The first obfuscate processor in the pipeline reads the log field and uses several configuration options to mask the email address it contains, writing the result to the new_log field. The second processor obfuscates the entire phone field in place using the default settings. For more details on these options, see Configuration.

    pipeline:
      source:
        http:
      processor:
        - obfuscate:
            source: "log"
            target: "new_log"
            patterns:
              - "[A-Za-z0-9+_.-]+@([\\w-]+\\.)+[\\w-]{2,4}"
            action:
              mask:
                mask_character: "#"
                mask_character_length: 6
        - obfuscate:
            source: "phone"
      sink:
        - stdout:

When run, the obfuscate processors produce the following output:

    {
      "id": 1,
      "phone": "***",
      "log": "My name is Bob and my email address is abc@example.com",
      "new_log": "My name is Bob and my email address is ######"
    }

Configuration

Use the following configuration options with the obfuscate processor.

| Parameter | Required | Description |
| --------- | -------- | ----------- |
| source | Yes | The source field to obfuscate. |
| target | No | The new field in which to store the obfuscated value. This leaves the original source field unchanged. When no target is provided, the source field is overwritten with the obfuscated value. |
| patterns | No | A list of regex patterns that let you obfuscate specific parts of a field. Only the parts that match the regex pattern are obfuscated. When not provided, the processor obfuscates the whole field. |
| action | No | The obfuscation action. As of Data Prepper 2.3, only the mask action is supported. |
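
As a minimal sketch of how source and target interact, the following processor leaves patterns and action unset, so the whole phone field is obfuscated with the default mask and the result is written to a separate field. The field name phone_masked is a hypothetical name chosen for illustration, and the pipeline layout simply reuses the one from the Usage section.

```yaml
pipeline:
  source:
    http:
  processor:
    - obfuscate:
        source: "phone"          # field to read
        target: "phone_masked"   # hypothetical field name; the original "phone" stays unchanged
  sink:
    - stdout:
```

With the input document from the Usage section, phone should keep its original value and phone_masked should hold the masked value. Removing target would overwrite phone instead.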

You can customize the mask action with the following optional configuration options.

| Parameter | Default | Description |
| --------- | ------- | ----------- |
| mask_character | * | The character to use when masking. Valid characters are !, #, $, %, &, *, and @. |
| mask_character_length | 3 | The number of characters to mask in the field. The value must be between 1 and 10. |
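
The following sketch shows one way to set both mask options on a processor that obfuscates the whole phone field. The specific values ("#" and 4) are arbitrary choices for illustration, and the pipeline layout reuses the one from the Usage section.

```yaml
pipeline:
  source:
    http:
  processor:
    - obfuscate:
        source: "phone"
        action:
          mask:
            mask_character: "#"        # must be one of the valid characters listed above
            mask_character_length: 4   # must be between 1 and 10
  sink:
    - stdout:
```

Going by the Usage example, where a matched email address is replaced with six # characters, the phone value here would be expected to become ####.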

Predefined patterns

When using the patterns configuration option, you can use a set of predefined obfuscation patterns for common fields. The obfuscate processor supports the following predefined patterns.

You cannot use multiple patterns for one obfuscate processor. Use one pattern for each obfuscate processor.

| Pattern name | Examples |
| ------------ | -------- |
| %{EMAIL_ADDRESS} | abc@test.com, 123@test.com, abc123@test.com, abc_123@test.com, a-b@test.com, a.b@test.com, abc@test-test.com, abc@test.com.cn, abc@test.mail.com.org |
| %{IP_ADDRESS_V4} | 1.1.1.1, 192.168.1.1, 255.255.255.0 |
| %{BASE_NUMBER} | 1.1, .1, 2000 |
| %{CREDIT_CARD_NUMBER} | 5555555555554444, 4111111111111111, 1234567890123456, 1234 5678 9012 3456, 1234-5678-9012-3456 |
| %{US_PHONE_NUMBER} | 1555 555 5555, 5555555555, 1-555-555-5555, 1-(555)-555-5555, 1(555) 555 5555, (555) 555 5555, +1-555-555-5555 |
| %{US_SSN_NUMBER} | 123-11-1234 |
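
As a hedged sketch of how a predefined pattern might replace a handwritten regex, the following pipeline reuses the log and phone fields from the Usage section and gives each obfuscate processor a single predefined pattern. It assumes that a predefined pattern is referenced by writing its name, including the %{...} wrapper, as a quoted entry in the patterns list.

```yaml
pipeline:
  source:
    http:
  processor:
    - obfuscate:
        source: "log"
        patterns:
          - "%{EMAIL_ADDRESS}"     # masks only the email address inside the log text
    - obfuscate:
        source: "phone"
        patterns:
          - "%{US_PHONE_NUMBER}"   # masks only the part that looks like a US phone number
  sink:
    - stdout:
```

Because no target is set, each source field would be updated in place, and because no action is set, the default mask settings would apply.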