A How-to Guide on Parsing List Data in XML and JSON with CloverDX 4.1

Parsing of XML and JSON data is an important task in data integration because the majority of today's APIs and Web Services are built around these two formats. CloverDX has JSONExtract and XMLExtract components which do just that.

With the release of CloverDX version 4.1 we added the ability to to parse JSON array elements and repeating elements in XML (sequences) directly into lists in CloverDX records.

A metadata field usually holds only a single value of predefined data type (string, integer, etc.). However, you can switch the field to “list” or “map” mode which makes it hold an array of values (of the same data type). Some components can work with these directly, or you can easily manipulate lists in CTL.

You can check out our documentation for finer details.

Look at “galleries” and “tags” in these simple examples:

JSON

{

"filename": "dsc_74532.jpg",

"galleries": ["family", "2015"],

"tags": ["summer", "children", "Peter", "Mary"]

}

XML

<?xml version="1.0" encoding="UTF-8" ?>

<catalog>

  <file>

   <filename>dsc_74532.jpg</filename>

   <galleries>family</galleries>

   <galleries>2015</galleries>

   <tags>summer</tags>

   <tags>children</tags>

   <tags>Peter</tags>

   <tags>Mary</tags>

  </file>

</catalog>

You can see these have multiple values that need to be parsed and processed. Previously, you had to work around such multi-value elements by having multiple record streams (and therefore multiple edges) instead of using multi-value fields.

See how it looks with lists instead of mutiple record streams:

As you can see, we've saved a component for each multi-value element, plus a 'join' component to put everything back together. Undoubtedly, it can save a lot of time in development when incoming data structures change. And of course, this improvement also positively affects performance.

Here's how it works...

Let’s see how the new list mapping feature works for the sample XML data above (galleries and tags).

The mapping looks simple, just like if we were dealing with regular single value fields. For obvious reasons this only works if the repeating element does not contain additional sub elements (i.e. if is not a structured element on its own).

Jan Sedláček

January 30, 2016

CloverDX How-To

Quick start

CloverDX Academy

A How-to Guide on Parsing List Data in XML and JSON with CloverDX 4.1

JSON

XML

Here's how it works...

More from Tech Blog

Sending emails via Azure Communication Services SMTP

Connecting to REST APIs (OpenAPI)

Performance tuning: How to troubleshoot database-related performance issues in CloverDX

CloverDX Transformation Language: How to Extend CTL with Java Functions

Organizing large projects: Separating Configuration and Data

Efficient data modelling with DBT and ETL data pipeline

Visit CloverDX Blog

The vital importance of data governance in the age of AI

Bringing a human perspective to data integration, mapping and AI

How AI is shaping the future of data integration

How to say ‘yes’ to all types of data and embark on a data-driven transformation journey

Data ingestion tools: 7 features you should look for

Read On

CloverDX Data Quality: An Introduction to Validator

Hierarchical data structures - JSON