A Simple Guide to Extracting Numbers and Words from a String and Mapping Them to a JSON Object

Sandeep Sai Kumar Kancharla
Another Integration Blog
4 min read · Dec 13, 2023

In this article, we will learn how to use DataWeave to transform a string of alphanumeric characters into a JSON object. DataWeave is MuleSoft's expression language for transforming data between formats such as JSON, XML, and CSV. We will use the following DataWeave script as an example:

%dw 2.0
output application/json
import * from dw::core::Strings
var num = flatten(payload filter ((character, index) ->
    character matches /[0-9]/
) scan /[0-9]{2}/)
// scan /[0-9]{2}/ splits the filtered digits into an array in which every item has exactly 2 digits
var char = flatten(payload filter ((character, index) ->
    character matches /[a-zA-Z]/
) scan /[a-zA-Z]{3}/)
// scan /[a-zA-Z]{3}/ splits the filtered letters into an array in which every item has exactly 3 letters

---
{(num map(
($) : char[$$]
))}

This script takes a string of alphanumeric characters as the input payload and outputs a JSON object. The script performs the following steps:

  • It imports the Strings module from the dw::core library, which provides useful functions for working with strings.
  • It declares two variables: num and char. The num variable stores an array of two-digit strings extracted from the payload using the filter and scan functions. The filter function takes a lambda expression that checks whether each character in the payload matches the regular expression /[0-9]/, which means any digit from 0 to 9, and returns a string that contains only the digits. The scan function then applies another regular expression, /[0-9]{2}/, which means any two consecutive digits, to that filtered string and returns an array of matches, each wrapped in its own array. The flatten function flattens the nested array into a single array. For example, if the payload is "a12b34c56d", the filtered string is "123456" and the num variable will store ["12", "34", "56"].
  • The char variable stores an array of three-letter words extracted from the payload using the same logic as the num variable, but with different regular expressions. The filter function checks whether each character matches the regular expression /[a-zA-Z]/, which means any letter from A to Z, either uppercase or lowercase. The scan function takes the regular expression /[a-zA-Z]{3}/, which means any three consecutive letters, and returns an array of matches. For example, if the payload is "a12b34c56d", the filtered string is "abcd" and the char variable will store ["abc"]; the leftover "d" does not fill a group of three, so it is dropped.
  • The script then outputs a JSON object by using the curly braces {} and the map function. The map function takes a lambda expression that maps each element in the num array to a key-value pair. The key is the element itself ($), and the value is the element in the char array at the same index ($$). Wrapping the mapped array in {( )} merges all of the key-value pairs into a single JSON object. For example, if the num array is ["12", "34", "56"] and the char array is ["abc"], the output JSON object will be {"12": "abc", "34": null, "56": null}, because char has no elements at indexes 1 and 2.
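To make the steps above concrete, the following sketch evaluates each intermediate value for the sample string "a12b34c56d". The variable names text, digits, letters, numbers, and words, as well as the output keys, are illustrative and not part of the original script. The string is hard-coded instead of being read from payload, on the assumption that we want to run the snippet on its own.

```dataweave
%dw 2.0
output application/json
// Illustrative input, hard-coded so the script needs no payload
var text = "a12b34c56d"
// filter on a String returns a String that keeps only the matching characters
var digits = text filter ((character, index) -> character matches /[0-9]/)
var letters = text filter ((character, index) -> character matches /[a-zA-Z]/)
// scan returns one inner array per match, which is why flatten is needed
var numbers = flatten(digits scan /[0-9]{2}/)
var words = flatten(letters scan /[a-zA-Z]{3}/)
---
{
    digits: digits,                     // "123456"
    letters: letters,                   // "abcd"
    scanned: digits scan /[0-9]{2}/,    // [["12"], ["34"], ["56"]]
    numbers: numbers,                   // ["12", "34", "56"]
    words: words,                       // ["abc"], the trailing "d" is dropped
    // {( ... )} merges the array of single-pair objects into one object
    result: {(numbers map ((n, i) -> {(n): words[i]}))}
}
```

The result field pairs each entry of numbers with the entry of words at the same index, so the null values in the example above show up as soon as the two arrays differ in length.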

By using DataWeave, we can easily transform a string of alphanumeric characters into a JSON object with a few lines of code. DataWeave offers many more features and functions that can help us manipulate data in various formats and scenarios.

But there is a small problem with the above script: if the string contains an extra single digit, or an extra one or two letters, that do not fill a complete group, they are ignored.

To handle this case, we need to make a small modification to the above script: replace scan /[0-9]{2}/ with scan /[0-9]{1,2}/ and scan /[a-zA-Z]{3}/ with scan /[a-zA-Z]{1,3}/.

%dw 2.0
output application/json
import * from dw::core::Strings
var num = flatten(payload filter ((character, index) ->
    character matches /[0-9]/
) scan /[0-9]{1,2}/)
// scan /[0-9]{1,2}/ splits the filtered digits into an array in which every item has 2 digits, or 1 digit for a leftover
var char = flatten(payload filter ((character, index) ->
    character matches /[a-zA-Z]/
) scan /[a-zA-Z]{1,3}/)
// scan /[a-zA-Z]{1,3}/ splits the filtered letters into an array in which every item has 3 letters, or 2 or 1 for a leftover

---
{(num map(
($) : char[$$]
))}
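The effect of the relaxed quantifier can be checked by running both patterns on the same digit string. This is a minimal sketch: the variable name digits and the output keys exactlyTwo and oneOrTwo are illustrative, and the string is hard-coded so the snippet does not depend on a payload.

```dataweave
%dw 2.0
output application/json
// Illustrative input: eleven digits, so one digit is left over after pairing
var digits = "01020304056"
---
{
    // {2} only matches complete pairs, so the final "6" is skipped
    exactlyTwo: flatten(digits scan /[0-9]{2}/),   // ["01", "02", "03", "04", "05"]
    // {1,2} is greedy: it takes two digits when it can, and one when only one is left
    oneOrTwo: flatten(digits scan /[0-9]{1,2}/)    // ["01", "02", "03", "04", "05", "6"]
}
```

The same reasoning applies to /[a-zA-Z]{1,3}/, which takes three letters where possible and falls back to two or one at the end of the string.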

Example:

Input

"01020304056INDAUSENGUSACHNUK"

Output

{
"01": "IND",
"02": "AUS",
"03": "ENG",
"04": "USA",
"05": "CHN",
"6": "UK"
}

Note:

I used the code above to show how we can use the matches function. The same result can also be achieved without the matches function.
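One possible way to do this, offered as a sketch and not necessarily the approach the author had in mind, is to strip the unwanted characters with replace ... with before calling scan. The negated character classes /[^0-9]/ and /[^a-zA-Z]/ and the lambda parameter names n and i are choices made for this illustration.

```dataweave
%dw 2.0
output application/json
// Remove everything that is not a digit, then split into groups of up to 2
var num = flatten((payload replace /[^0-9]/ with "") scan /[0-9]{1,2}/)
// Remove everything that is not a letter, then split into groups of up to 3
var char = flatten((payload replace /[^a-zA-Z]/ with "") scan /[a-zA-Z]{1,3}/)
---
// Pair each number with the word at the same index and merge into one object
{(num map ((n, i) -> {(n): char[i]}))}
```

With the input "01020304056INDAUSENGUSACHNUK", this version should produce the same key-value pairs as the example above, since only the way the digits and letters are isolated has changed.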
