Data protection compliant anonymization for Data Lakes – Knoxxer

Extraction, Transformation, Loading (ETL / ELT) & Anonymization for Data Warehouse, Big Data, Production and Test Environments

Providing production data for data warehouse, big data (data lake) and test environments in compliance with data protection law (GDPR) usually takes a lot of effort. The organizational and technical work of repeatedly providing anonymized data, archiving it or updating it incrementally delays the projects that need to be tested. Our open-source ETL and anonymization solution Knoxxer is the optimal solution for this challenge.

No definition of ETL jobs

Many customers use ETL tools to populate test systems or to implement anonymization tasks. This often requires specialists who develop and maintain the corresponding ETL jobs. Knoxxer allows even less experienced and non-technical users to create test scenarios and anonymize data quickly and easily. Anonymized data and test scenarios can normally be provided within a few minutes. Knoxxer keeps the complexity of data acquisition and anonymization processes to a minimum.

A fraction of the usual cost of an ETL and anonymization solution

Unlike many other solutions, Knoxxer can be used throughout the entire group or company:

  • Any number of instances can be used
  • No usage-dependent license costs
  • The source code is provided

The combination of minimal complexity and optimal license model makes it possible to use Knoxxer’s anonymizing solution as a central solution for your data warehouse, big data and test environments as well as production systems.

One anonymization solution for all data warehouse, big data and test systems

Instead of requiring one anonymization solution per database, Knoxxer can read from and write to a huge variety of sources and destinations, for example from a SQL Server database to a CSV file, or from one Oracle database to another. This means that a large number of data sources, as used in data warehouse and big data projects and across the entire corporate world, can be served with just one solution. Your CIO and Purchasing will thank you.

ETL and anonymisation

High performance ETL / ELT and anonymization solution Knoxxer

The high-performance ETL / ELT and anonymization solution Knoxxer reads data from a variety of data sources, transforms and anonymizes it if necessary, and writes it to a variety of destinations. Several irreversible encryption methods are available.

Knoxxer is also the optimal solution for anonymizing customer data in production environments such as SAP or CRM systems, where the data would otherwise have to be deleted after a certain time due to legal requirements.

Big data, data protection and business intelligence are not a contradiction; they belong closely together. Anonymization with Knoxxer is probably the fastest way to combine them.

 

Original data:

name     | surname | revenue | orderdate
Müller   | Max     | 2000    | 1.8.1999
Biermann | Bernd   | 1500    | 3.1.2010
Schmitt  | Sabine  | 3000    | 9.6.2012
[…]      | […]     | […]     | […]

Anonymized data:

name                   | surname                | revenue | orderdate
2dRYNmFgJenRvdeQdqYfzQ | zyfcE2KTdzyL/HNLXR/q9A | 2000    | 1.1.2013
hZLoat39tpgCyS3M5CHsxw | 8JfmUCxIgLQHcugzYszwwQ | 1500    | 1.1.2015
VtIbNTrfyOFznrA/rtA0bQ | lMuPfAlC/fDHnhM+bOc2bA | 3000    | 1.1.1980
[…]                    | […]                    | […]     | […]
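The effect shown in the example data can be approximated with a keyed one-way hash. The following is only a minimal sketch and not Knoxxer's actual algorithm, which is not described here: the function name `anonymize_value`, the key and the choice of HMAC-SHA256 are assumptions made for illustration. It produces 22-character tokens of the same general shape as the anonymized values above and leaves the revenue column untouched.

```python
import base64
import hashlib
import hmac

# Hypothetical secret for this sketch; without the key, tokens cannot be
# recomputed from guessed names.
SECRET_KEY = b"example-secret-key"


def anonymize_value(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a 22-character, non-reversible token for a text value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    # 16 bytes encode to 24 base64 characters, two of which are padding.
    return base64.b64encode(digest[:16]).decode("ascii").rstrip("=")


rows = [
    {"name": "Müller", "surname": "Max", "revenue": 2000},
    {"name": "Biermann", "surname": "Bernd", "revenue": 1500},
    {"name": "Schmitt", "surname": "Sabine", "revenue": 3000},
]

# Only the personal fields are replaced; measures such as revenue stay usable.
anonymized = [
    {
        **row,
        "name": anonymize_value(row["name"]),
        "surname": anonymize_value(row["surname"]),
    }
    for row in rows
]

for row in anonymized:
    print(row)
```

Because the same input always yields the same token, repeated runs with the same key produce consistent test data.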

Reducing manual effort for test data management by up to 90% with artificial intelligence

Instead of requiring you to invest a lot of effort in manual anonymization or in deleting the data, Knoxxer will:

  • Automatically read data from a source, irreversibly encrypt it and write it to the destination, or encrypt it at runtime inside the destination (ETL / ELT with encryption)
  • Read all structures (metadata) from the source and store them in a repository
  • Maintain primary and foreign key relationships across systems (cross-system referential integrity)
  • Keep patterns (distribution of data, validity of e-mail addresses, etc.), data types and data lengths
  • Use artificial intelligence (AI) to suggest the fields to be encrypted and the methods to apply
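Referential integrity across systems can be kept when every system applies the same deterministic mapping to key values. The sketch below illustrates the idea under stated assumptions: `pseudonymize_key`, the shared key and the two sample tables are hypothetical and are not taken from Knoxxer.

```python
import hashlib
import hmac

# Hypothetical key shared by all systems that take part in one anonymization run.
KEY = b"example-secret-key"


def pseudonymize_key(value, key: bytes = KEY) -> str:
    """Map a key value to a stable pseudonym; equal inputs give equal outputs."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; a longer prefix lowers the collision risk.
    return digest[:16]


# Two tables that could live in different systems.
customers = [
    {"customer_id": 101, "name": "Müller"},
    {"customer_id": 102, "name": "Biermann"},
]
orders = [
    {"order_id": 1, "customer_id": 101, "revenue": 2000},
    {"order_id": 2, "customer_id": 102, "revenue": 1500},
    {"order_id": 3, "customer_id": 101, "revenue": 3000},
]

anon_customers = [
    {
        **c,
        "customer_id": pseudonymize_key(c["customer_id"]),
        "name": pseudonymize_key(c["name"]),
    }
    for c in customers
]
anon_orders = [
    {**o, "customer_id": pseudonymize_key(o["customer_id"])} for o in orders
]

# Every anonymized foreign key still points at an anonymized primary key.
known_ids = {c["customer_id"] for c in anon_customers}
orphans = [o for o in anon_orders if o["customer_id"] not in known_ids]
print("orphaned orders:", len(orphans))
```

The design choice here is determinism: a random replacement per row would hide the values just as well but would break the joins between the two tables.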

More functions for professional Testing

  • Data Aging
  • Conditional updates (e.g. only branch North) and incremental updates (e.g. last update > 31.12.2017)
  • Data type checks: cancel the job or write a log file entry
  • Content checks (rule-based validation / constraints): cancel the job or write a log file entry
  • Null-Checks
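The selection and check functions listed above can be pictured roughly as follows. This is a sketch with hypothetical names (`select_rows`, `check_row`, `CheckFailed`, the `on_error` parameter) and sample rows invented for illustration; it does not reflect Knoxxer's configuration format.

```python
import logging
from datetime import date

logger = logging.getLogger("checks")


class CheckFailed(Exception):
    """Raised when a check fails and the job is configured to cancel."""


def select_rows(rows, branch, updated_after):
    """Conditional (branch) and incremental (last update) selection."""
    return [
        r for r in rows
        if r["branch"] == branch and r["last_update"] > updated_after
    ]


def check_row(row, required, types, on_error="log"):
    """Run null and data type checks; return True if the row passes."""
    problems = []
    for field in required:
        if row.get(field) is None:
            problems.append(f"{field} is null")
    for field, expected in types.items():
        value = row.get(field)
        if value is not None and not isinstance(value, expected):
            problems.append(f"{field} is not {expected.__name__}")
    if problems and on_error == "cancel":
        raise CheckFailed("; ".join(problems))
    for problem in problems:
        logger.warning("row %r: %s", row, problem)
    return not problems


rows = [
    {"branch": "North", "revenue": 2000, "last_update": date(2018, 3, 1)},
    {"branch": "South", "revenue": 1500, "last_update": date(2018, 5, 1)},
    {"branch": "North", "revenue": 3000, "last_update": date(2017, 6, 9)},
    {"branch": "North", "revenue": None, "last_update": date(2018, 7, 2)},
]

# Only branch North, only rows updated after 31.12.2017.
selected = select_rows(rows, "North", date(2017, 12, 31))
# Failing rows are logged and skipped; on_error="cancel" would abort instead.
valid = [r for r in selected if check_row(r, ["revenue"], {"revenue": int})]
print(len(selected), "selected,", len(valid), "valid")
```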

Additional Advantages of Knoxxer

  • Multiple algorithms and plug-ins for data anonymisation: the right algorithm for every application
  • Store data from production systems in test environments in a compliant way
  • High-performance anonymization of structured and polystructured mass or single data
  • Artificial intelligence helps you find the fields that should be anonymized
  • Automatic creation of a data protection catalog and report

PDF Reports for the Data governance manager, Tester and Data protection officer

PDF reports about the metadata are created automatically and can be customized quickly and easily via templates. You can also use your own BI solution for reporting. All metadata are accessible either in a database repository or in a CSV file.

data protection report

data protection report – details

 

Agile projects in particular will benefit from the high flexibility. To select important segments, e.g. for demographic test scenarios, we offer optional AI modules for automated segmentation and classification. As a result, complete data provision can often be avoided. Effort and costs are noticeably reduced.

Data sources

We support a variety of data sources and targets, such as:

  • CSV / text files
  • XML files
  • Relational database systems using JDBC (such as Oracle, SAP HANA, DB2, Microsoft SQL Server, MySQL, PostgreSQL, …)
  • Support for the Hadoop ecosystem (Hive, Impala, etc.)
  • Apache Drill
  • Streams (e.g., Kafka)
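A source-to-target flow such as "database to CSV file" can be sketched as below. The example uses an in-memory SQLite database and an in-memory text buffer as stand-ins so that it runs anywhere; the table, the `mask` helper and `export_anonymized` are assumptions for illustration and say nothing about how Knoxxer connects to these systems.

```python
import csv
import hashlib
import hmac
import io
import sqlite3

KEY = b"example-secret-key"  # hypothetical key for this sketch


def mask(value: str) -> str:
    """Replace a text value with a short keyed one-way token."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


# Stand-in source system: an in-memory SQLite database with sample rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (name TEXT, surname TEXT, revenue INTEGER)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Müller", "Max", 2000), ("Biermann", "Bernd", 1500), ("Schmitt", "Sabine", 3000)],
)


def export_anonymized(conn, table, sensitive, out):
    """Stream a table to CSV, masking the columns named in `sensitive`."""
    # The table name is trusted input in this sketch, not user-supplied.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    writer = csv.writer(out)
    writer.writerow(columns)
    count = 0
    for record in cursor:
        writer.writerow([
            mask(value) if column in sensitive and value is not None else value
            for column, value in zip(columns, record)
        ])
        count += 1
    return count


# Stand-in target: a text buffer; an opened file would work the same way.
target = io.StringIO()
written = export_anonymized(source, "customers", {"name", "surname"}, target)
print(written, "rows written")
print(target.getvalue())
```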

Testing operational systems, Datawarehouse and artificial intelligence

Knoxxer offers you the option to leave field types and lengths unchanged even after content has been anonymised. Alternatively, the data type from the source can be transformed to a different data type. This gives you optimal support when testing changes.

Cubes, artificial intelligence models and similar components that require a certain data distribution or pattern can still be used and tested without problems after anonymization, because the relevant patterns are retained.
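One way to picture anonymization that keeps lengths, formats and value frequencies is a deterministic, character-class-preserving replacement. The sketch below is an assumption-laden illustration, not Knoxxer's method: `mask_preserving_shape` and the sample values are hypothetical.

```python
import hashlib
import hmac
import string
from collections import Counter

KEY = b"example-secret-key"  # hypothetical key for this sketch


def mask_preserving_shape(value: str, key: bytes = KEY) -> str:
    """Replace letters with letters and digits with digits; keep other characters."""
    stream = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        byte = stream[i % len(stream)]
        if ch.isdigit():
            out.append(string.digits[byte % 10])
        elif ch.isalpha():
            out.append(string.ascii_lowercase[byte % 26])
        else:
            out.append(ch)  # separators such as "@" and "." stay in place
    return "".join(out)


email = "sabine.schmitt@example.com"
masked_email = mask_preserving_shape(email)
print(masked_email, len(masked_email) == len(email))

# Equal inputs map to equal outputs, so how often each value occurs is unchanged.
cities = ["Berlin", "Hamburg", "Berlin"]
masked_cities = [mask_preserving_shape(c) for c in cities]
print(sorted(Counter(cities).values()) == sorted(Counter(masked_cities).values()))
```

Very short values have few possible replacements, so two different inputs can collide; a production approach would need to handle that case.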

Minimal effort for commissioning, user-friendly, automation by bots

The effort to install and run Knoxxer is minimal: in most cases, the system can be used in less than one hour. Knoxxer differs from most solutions in its reduced operational complexity, without sacrificing performance. Users can usually work with Knoxxer after a few minutes. Bots perform many recurring, trivial tasks. Tedious test preparation is a thing of the past.

Automation – cloud or on-premise

Knoxxer can be used from the command line or through a web interface. This makes it easy to automate your jobs and keeps handling simple. It is up to you whether to run Knoxxer in the cloud or on-premise. Knoxxer is able to scale.

Solutions and consulting

Would you like to know more about the possibilities of our open-source anonymization solution Knoxxer, or would you like a test drive? Call us on +49 2547 93998 0 or write us a message. We will advise you personally.