Reusable Data Enrichment Process

Natural Language Processing
Data Engineering
Cloud Engineering
Data Scientist
Cloud Engineer
Data Engineer
Automation Engineer
Nightly data enrichment.
The Challenge

Our client, a state government office focused on data and analytics associated with economic opportunity, collects data from various state organizations. The data set has many quality issues and lacks standardization and consistency. The client partnered with us to create a solution that would cleanse this data on an ongoing basis, allowing them to make better data-driven decisions regarding state programs and regulations.

The Solution

Our four-person engineering team designed a solution on Amazon Web Services™ (AWS) using fuzzy logic and text mining. We implemented a Python-based natural language processing algorithm, which was necessary to provide the appropriate data enrichment. The team also developed an AWS automation capability with Terraform™ so that environments could be easily created, updated, and versioned.
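
To give a sense of what fuzzy matching on inconsistent text can look like in Python, here is a minimal sketch. It is not the algorithm the team delivered. It uses only the standard library's difflib, and the abbreviation map, function names, sample organization names, and the 0.85 threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation map; a real one would be derived from the data.
ABBREVIATIONS = {"dept": "department", "univ": "university", "co": "company"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and expand known abbreviations."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest candidate, or None if
    no candidate reaches the similarity threshold (an assumed cutoff)."""
    target = normalize(name)
    scored = [
        (candidate, SequenceMatcher(None, target, normalize(candidate)).ratio())
        for candidate in candidates
    ]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


# Example with made-up organization names.
reference = ["Department of Labor", "Department of Education"]
print(best_match("Dept. of Labor", reference))
print(best_match("Departmnt of Labor", reference))
print(best_match("Acme Widgets", reference))
```

Normalizing before scoring lets abbreviations and punctuation differences resolve to exact matches, so the similarity threshold only has to absorb genuine misspellings.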

The Result

Our solution allows our client to match wage and benefits data with other data sources that will influence programs and policymaking. The process is reusable, so it can be run as often as necessary to create value from new and changed data sets as they flow into the organization. In addition to resolving our client’s data matching problem, our solution will also serve as a technology foundation for similar big data and artificial intelligence (AI) workloads.