linkedin tracking
icon-sprite Created with Sketch.
Skip to main content
Working with Structured and Unstructured Data January 29, 2020
Mobile & Web

Working with Structured and Unstructured Data

reading time
Photo of Cyrus Beer

Written by

Cyrus Beer

Share this article

Relational databases define the relations between data entities, data types and other meta-data, such as whether or not a data element is required. Once this meta-data is defined, it’s guaranteed we’ll know the structure of the data stored inside. In contrast, non-relational or nosql databases store unstructured data of any type and relation. The structure of the data is fluid and changeable.

My current project utilizes a relational database, which works well for 95% of the project. We have one requirement, though, for which using unstructured data has proved to be extremely useful. The project needs to store immutable records that flow through a workflow of approvals, rejections, re-submissions, etc.. These immutable records need to be archived for several years and can be reviewed by federal regulators to monitor compliance. In other words, being immutable is not just a goal, it is a requirement of the federal government. The records themselves are comprised of relational data. There is data related to a person, data related to financial plans owned by this person, and data related to questions and answers pertinent to this person’s financial portfolio. If we stored the record data as purely relational data, retrieving a snapshot in time from the past would pose many challenges. Each table would need to be archived and if different tables were archived at different times, we would have to piece back together what the data looked like at a certain time in the past.

The solution we developed uses Postgres to store unstructured data inside an otherwise relational database. The approach greatly simplified the requirement to archive our records. Stored as JSON, the data has no joins. All the data for a given record is stored as a single JSON object inside a relational table. To view a record that is several years old is now straight-forward and reliable. As long as we do not update the JSON content, it remains immutable and self-contained.

Three light blue rectangles in a row, connected by arrows, labeled "User", "Users Financial Records" and "Financial Record".

The JSONB data itself is made up of several nested relationships. Here is a sample of what that looks like

{
    "investor": {
        "id": null,
        "firstName": null,
        "lastName": null,
        
    },
    "creator": {
        ...
        "teamRole": {
            ...
            "team": {
                  ...
            }
        },
        "supervisor": {
            ...
        }
    },
    "modules": [
        {
            "label": null,
            ...
            "questions": [
                  {
                          "questionNumber": 1,
                          ...
                          "possibleAnswers": [
                                  ...
                          ]
                  },
                  ...
                ]
            }
      ]
}

While this solution worked well, it created a new set of challenges:

  • Versioning
  • Querying
  • Performance, and
  • JSON and JSONB data types with JOOQ

Versioning – Once the project goes live, the front end is referencing objects with specific keys inside the JSON structure to display the data on the screen. What happens in a year or two when the business needs change and a new JSON data structure is required?

We solved this problem in two ways. First we created a relational table that holds the JSON “templates”. To start out there is only one. All of our data records are based off of this template. And since the unstructured JSON data is stored in a relational table, we created a foreign key from the JSON data to the template to which it adheres. If the data structure was to change in the future, we would create a separate template and future JSON records will relate via foriegn key to this new template. The second thing we did is to generate a PDF version of the JSON data and store that PDF in Amazon S3. In the future if our front end is no longer able to display this old data structure, we still have a PDF that we can show the user.

Querying – We needed a reliable and accurate way to query aspects of our large JSON data. To address this challenge, Postgres offers a simple syntax to query inside a JSON structure. Take an example where you have a table users with a JSON column activities, and activities has the following structure

[
  {
      "type": "outdoor",
      "label": "hiking"
  },
  {
      "type": "indoor",
      "label": "drawing"
  }
]

You could then query this with the following

select activities from users where activities ->> ‘type’ = ‘outdoor’

Note: More information can be found here on ways to query JSON data in postgres.

Performance – In Postgres, the JSON data type is not indexable, which makes querying slow. But Postgres supports JSONB data type which does allow indexing. In the above example, we might do the following:

CREATE INDEX idx_user_activities ON users (activities->'type');

While this slows the writing of data to the database, the performance of writing to the database was not nearly as important as the speed of querying data for us, so we chose the JSONB data type. Now, when the user selects

select activities from users where activities ->> ‘type’ = ‘outdoor’

the index we just created will greatly enhance the performance of the query.

JOOQ – At Cantina we strongly support using JOOQ as the data layer in your Spring Boot App so we wanted to find a way to get JOOQ to recognize the JSON and JSONB data types. To do so we used the following code.

First create a class called JsonBinder.

package com.my.project.util;

import java.io.IOException;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.sql.Types;
import java.util.Objects;

import org.jooq.Binding;
import org.jooq.BindingGetResultSetContext;
import org.jooq.BindingGetSQLInputContext;
import org.jooq.BindingGetStatementContext;
import org.jooq.BindingRegisterContext;
import org.jooq.BindingSQLContext;
import org.jooq.BindingSetSQLOutputContext;
import org.jooq.BindingSetStatementContext;
import org.jooq.Converter;
import org.jooq.impl.DSL;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.NullNode;
import com.my.project.ServerException;

public class JsonBinder implements Binding<Object, JsonNode> {

  @Override
  public Converter<Object, JsonNode> converter() {
    return new Converter<Object, JsonNode>() {

        @Override
      public JsonNode from(Object t) {
        try {
            return t == null
            ? NullNode.instance
            : new ObjectMapper().readTree(t + "");
          }
          catch (IOException e) {
            throw new ServerException(e.getMessage());
          }
        }

        @Override
        public Object to(JsonNode u) {
          try {
            return u == null || u.equals(NullNode.instance)
            ? null
            : new ObjectMapper().writeValueAsString(u);
        }
        catch (IOException e) {
          throw new ServerException(e.getMessage());
        }
      }

      @Override
      public Class<Object> fromType() {
        return Object.class;
      }

      @Override
      public Class<JsonNode> toType() {
        return JsonNode.class;
      }
    };
  }

  @Override
  public void sql(final BindingSQLContext<JsonNode> ctx) throws SQLException {
    ctx.render().visit(DSL.val(ctx.convert(converter()).value())).sql("::json");
  }

  @Override
  public void register(final BindingRegisterContext<JsonNode> ctx) throws SQLException {
    ctx.statement()
      .registerOutParameter(ctx.index(),
                    Types.VARCHAR);
  }

  @Override
  public void set(final BindingSetStatementContext<JsonNode> ctx) throws SQLException {
      ctx.statement()
        .setString(ctx.index(),
              Objects.toString(ctx.convert(converter()).value(),
            null));

  }

  @Override
  public void set(final BindingSetSQLOutputContext<JsonNode> ctx) throws SQLException {
    throw new SQLFeatureNotSupportedException();

  }

  @Override
  public void get(final BindingGetResultSetContext<JsonNode> ctx) throws SQLException {
    ctx.convert(converter()).value(ctx.resultSet().getString(ctx.index()));

  }

  @Override
  public void get(final BindingGetStatementContext<JsonNode> ctx) throws SQLException {
    ctx.convert(converter()).value(ctx.statement().getString(ctx.index()));
  }

  @Override
  public void get(final BindingGetSQLInputContext<JsonNode> ctx) throws SQLException {
    throw new SQLFeatureNotSupportedException();
  }
}

And then in the pom.xml (This assumes that the JSONB columns end with _json_content.

<plugin>
  <groupId>org.jooq</groupId>
  <artifactId>jooq-codegen-maven</artifactId>
  <executions>
    <execution>
      <id>generate-postgres</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>generate</goal>
        </goals>
        <configuration>
        <jdbc>
          <driver>${spring.datasource.driverClassName}</driver>
          <url>${spring.datasource.url}</url>
          <user>${spring.datasource.username}</user>
          <password>${spring.datasource.password}</password>
        </jdbc>
        <generator>
          <database>
            <name>org.jooq.meta.postgres.PostgresDatabase</name>
            <includes>.*</includes>
            <excludes></excludes>
            <inputSchema>my_schema</inputSchema>
            <forcedTypes>
              <forcedType>
                <userType>com.fasterxml.jackson.databind.JsonNode</userType>
                <binding>com.my.project.util.JsonBinder</binding>
                <expression>.__json_content._</expression>
                <types>.*</types>
              </forcedType>
            </forcedTypes>
          </database>
          <generate>
            <deprecated>false</deprecated>
            <instanceFields>true</instanceFields>
          <pojos>false</pojos>
          </generate>
          <target>
            <packageName>com.my.project.jooq.model</packageName>
            <directory>target/generated-sources/jooq</directory>
          </target>
        </generator>
      </configuration>
    </execution>
  </executions>
  <dependencies>
    <dependency>
      <groupId>org.postgresql</groupId>
      <artifactId>postgresql</artifactId>
      <version>42.2.5</version>
    </dependency>
  </dependencies>
</plugin>

Working with the JSONB data type in Postgres, Spring, and JOOQ allowed us to easily archive immutable data within an otherwise relational database. Once we addressed the concerns outlined above, this approach was a great solution for our needs. Want to learn more? Reach out and let us know how we can help you.

Insights

Contact us

Talk with our experts today

Tell us how you need help. We’ll be in touch.

Thank you! Your message has been sent!

We’ll be in touch.

Want to know more about Cantina? Check out our latest insights or join us at one of our upcoming events.