
Machine learning has a data quality problem. Various tools can check data as it enters ML operations, but few frameworks standardize data validation across an entire system or company.

Bantilan introduced two open source tools he created to help root out bad data before it reaches production and to standardize the process of data validation.

Pandera is a statistical typing and data testing tool that can be integrated with Flyte to validate properties beyond data types, such as value ranges, in effect adding guardrails to a data processing pipeline.

With data testing, Pandera can validate both the live data coming in and the functions that handle that data.
