Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer.
Updated Dec 13, 2023 - Python
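A transformer like the one described can duck-type Scikit-Learn's fit/transform API without depending on the library itself. The sketch below is illustrative only (the class and method behavior are assumptions, not the repo's actual API): it drops duplicate rows and rows with missing values.

```python
class SimpleCleaner:
    """Minimal Scikit-Learn-style transformer (illustrative, not the repo's API).

    Implements fit/transform/get_params so it duck-types into an sklearn
    Pipeline. Operates on rows represented as dicts.
    """

    def fit(self, X, y=None):
        # Stateless cleaner: nothing to learn, but fit must return self.
        return self

    def transform(self, X):
        seen = set()
        cleaned = []
        for row in X:
            key = tuple(sorted(row.items()))
            if key in seen:
                continue  # drop exact duplicate rows
            if any(v is None for v in row.values()):
                continue  # drop rows with missing values
            seen.add(key)
            cleaned.append(row)
        return cleaned

    def get_params(self, deep=True):
        return {}


rows = [
    {"id": 1, "price": 9.5},
    {"id": 1, "price": 9.5},   # duplicate
    {"id": 2, "price": None},  # missing value
    {"id": 3, "price": 4.0},
]
clean = SimpleCleaner().fit(rows).transform(rows)
```

Because it only needs fit/transform, a class like this can slot into an sklearn `Pipeline` as a preprocessing step without subclassing anything.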
Run greatexpectations.io on any SQL engine via a REST API. Built with FastAPI, Pydantic, and SQLAlchemy, aiming to be a best-in-class data quality tool.
A lightweight, simple data quality testing tool.
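In the spirit of such lightweight tools, a minimal check runner can be a handful of pure-Python functions returning structured pass/fail results. The function names and result shape below are hypothetical, sketched for illustration:

```python
def check_no_nulls(rows, column):
    """Pass if no row has a missing value in `column`."""
    bad = sum(1 for r in rows if r.get(column) is None)
    return {"check": f"no_nulls:{column}", "passed": bad == 0, "failures": bad}


def check_unique(rows, column):
    """Pass if every value in `column` is distinct."""
    values = [r.get(column) for r in rows]
    dupes = len(values) - len(set(values))
    return {"check": f"unique:{column}", "passed": dupes == 0, "failures": dupes}


data = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": 7.5},
]
results = [check_no_nulls(data, "amount"), check_unique(data, "order_id")]
```

Returning dicts rather than raising on failure lets a caller collect all check results into a single report.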
FinAUDIT is an AI-powered financial data health and compliance system that automatically audits datasets against global regulatory standards (GDPR, Visa CEDP, AML, PCI DSS, and Basel). It combines a deterministic 30-rule engine for rigorous data quality scoring with a Generative AI analyst (Gemini) that provides natural-language answers.
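The deterministic rule-engine side of such a system can be sketched as a list of predicate rules whose pass rate yields a quality score. The rules and field names below are made up for illustration; they are not FinAUDIT's actual rule set:

```python
# Each rule: (name, predicate over one record). Hypothetical rules, not FinAUDIT's.
RULES = [
    ("amount_positive", lambda rec: rec["amount"] > 0),
    ("currency_known", lambda rec: rec["currency"] in {"USD", "EUR", "GBP"}),
    ("id_present", lambda rec: bool(rec.get("txn_id"))),
]


def score_dataset(records):
    """Return the fraction of (record, rule) checks that pass, 0.0-1.0."""
    total = passed = 0
    for rec in records:
        for name, predicate in RULES:
            total += 1
            try:
                ok = predicate(rec)
            except (KeyError, TypeError):
                ok = False  # missing or malformed fields count as failures
            passed += ok
    return passed / total if total else 1.0


records = [
    {"txn_id": "a1", "amount": 100.0, "currency": "USD"},
    {"txn_id": "", "amount": -5.0, "currency": "JPY"},
]
quality = score_dataset(records)  # first record passes all 3 rules, second fails all 3
```

Keeping the rules deterministic and separate from the generative layer means the score itself is reproducible and auditable, while the LLM only explains results.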
The dbt Datasphere Plugin integrates multiple open-source data quality frameworks into your dbt projects. It unifies Soda SQL, Great Expectations, and Datafold behind a single interface for configuring and running data quality checks.
This repository contains a complete data lakehouse implementation using Docker. It showcases an end-to-end data pipeline with Apache Spark for ETL, MinIO and Delta Lake for storage, Airflow for orchestration, DQOps for data quality, and Superset for BI.
An Apache Airflow data pipeline designed to perform ELT operations using Amazon S3 and Amazon Redshift Serverless.
A personal project that uses an NLP model to create graphical plots from a user-uploaded file and instructions (including the choice of columns to plot). It also runs a data quality check on the uploaded file and sentiment analysis on user-entered text.
This repo contains details about a travel booking project executed on Databricks.