Data Quality Diagramming Configuration

This section walks you through the steps to create, configure, run, and review a data quality task using Diagramming Configuration in LakeFusion. The process is designed to ensure your data meets the required standards for accuracy, consistency, and reliability.

Step 1: Creating a Data Quality Task

  • Open the Data Quality card, either from Home or from the left navigation pane.

  • Click Create Quality Task.

You have two options to define your task:

  1. Notebook Configuration: Use a predefined notebook template to configure your data quality tasks.

  2. Diagramming: Use a visual interface to create and link data quality tasks in a flow.

The rest of this section covers Diagramming Configuration.

Provide the following required information:
  • Quality task name (following your naming standards)

  • A detailed task description

  • Task type

  • Target dataset

  • Task execution frequency (if required)
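The required fields above can be thought of as a simple task specification. The following Python sketch is illustrative only: `QualityTaskSpec`, `validate_spec`, and the naming pattern are hypothetical and are not part of LakeFusion. Substitute your organization's actual naming standards and the task types offered in the UI.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical naming standard: lowercase letters, digits, and underscores,
# starting with a letter (e.g. "dq_customer_nulls"). Replace with your own rule.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")


@dataclass
class QualityTaskSpec:
    """Hypothetical container for the information entered in Step 1."""
    name: str
    description: str
    task_type: str
    target_dataset: str
    frequency: Optional[str] = None  # optional: only needed for scheduled runs


def validate_spec(spec: QualityTaskSpec) -> List[str]:
    """Return a list of problems; an empty list means all required fields are set."""
    problems = []
    if not NAME_PATTERN.match(spec.name):
        problems.append(f"name {spec.name!r} does not follow the naming standard")
    if not spec.description.strip():
        problems.append("a task description is required")
    if not spec.task_type.strip():
        problems.append("a task type is required")
    if not spec.target_dataset.strip():
        problems.append("a target dataset is required")
    return problems


spec = QualityTaskSpec(
    name="dq_customer_nulls",
    description="Replace missing country values in the customer table",
    task_type="example_type",  # placeholder; use a task type offered in the UI
    target_dataset="customers",  # placeholder dataset name
)
print(validate_spec(spec))  # → []
```

Collecting every problem in a list, rather than stopping at the first one, mirrors a form that highlights all missing fields at once.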

Step 2: Configuration

  • After you choose the task type, the task is created in Draft status. When you complete the configuration in this step, its status changes to Configured.
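The status change described in this step can be modeled as a small state transition. This is a hypothetical sketch: the `QualityTask` class and its `configure` method are not LakeFusion APIs, and only the two statuses named here, Draft and Configured, are modeled.

```python
from enum import Enum


class TaskStatus(Enum):
    DRAFT = "Draft"
    CONFIGURED = "Configured"


class QualityTask:
    """Hypothetical model of a quality task's status during Steps 1 and 2."""

    def __init__(self, name: str, task_type: str) -> None:
        self.name = name
        self.task_type = task_type
        self.status = TaskStatus.DRAFT  # a new task starts in Draft
        self.config: dict = {}

    def configure(self, config: dict) -> None:
        """Store the configuration and move the task to Configured."""
        if not config:
            raise ValueError("configuration must not be empty")
        self.config = dict(config)
        self.status = TaskStatus.CONFIGURED


task = QualityTask("dq_customer_nulls", "example_type")
print(task.status.value)  # → Draft
task.configure({"flow": "null_handler -> string_cleaner"})
print(task.status.value)  # → Configured
```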

Step 3: Diagramming Configuration

  • Click the task you created and use the visual interface to link data quality tasks (e.g., Null Handler, String Cleaner, Filter, Value Mapper).

  • Click each task box to configure it. For example, in the Null Handler, select the column, choose the Replace Type, and enter the Replacement Value.

  • To connect two task boxes, hover over a task box, press and hold the mouse button, and drag a line to the next task box. The connections define the order of operations.

  • Click Validate Flow to check the flow for errors, such as cyclic connections. You can save the flow only when it has no errors.
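Conceptually, a flow can be executed in order only if the connections between task boxes form an acyclic graph. The Python sketch below shows one standard way to perform such a check, Kahn's topological sort. It is an assumption about the idea behind Validate Flow, not LakeFusion's implementation, and `validate_flow` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Tuple


def validate_flow(nodes: List[str], edges: List[Tuple[str, str]]) -> List[str]:
    """Return an execution order for `nodes`, or raise ValueError if the flow is invalid.

    Kahn's algorithm: repeatedly take a task box with no unprocessed incoming
    connections. If some boxes are never freed, the connections contain a cycle.
    """
    indegree: Dict[str, int] = {n: 0 for n in nodes}
    successors: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        if src not in indegree or dst not in indegree:
            raise ValueError(f"connection {src} -> {dst} uses an unknown task box")
        successors[src].append(dst)
        indegree[dst] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        blocked = [n for n in nodes if n not in order]
        raise ValueError(f"cyclic flow: cannot order {', '.join(blocked)}")
    return order


boxes = ["Null Handler", "String Cleaner", "Filter", "Value Mapper"]
links = [("Null Handler", "String Cleaner"),
         ("String Cleaner", "Filter"),
         ("Filter", "Value Mapper")]
print(validate_flow(boxes, links))
# → ['Null Handler', 'String Cleaner', 'Filter', 'Value Mapper']
```

Adding a link from "Value Mapper" back to "Null Handler" leaves every box waiting on another, so `validate_flow` raises `ValueError`. This is analogous to the cyclic error that blocks saving a flow.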


Step 4: Run the Task

  • Click Run Now in the Details tab to execute the data quality task. The system applies the configured rules and logic to the selected dataset.
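To illustrate what applying the configured rules in order means, the following sketch runs four hypothetical steps, named after the task boxes in Step 3, over a small in-memory dataset. The step functions, the `"constant"` replace type, and the cleaning behavior of `string_cleaner` are assumptions for illustration only; LakeFusion's task boxes run against the selected dataset and may behave differently.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Step = Callable[[List[Row]], List[Row]]


def null_handler(column: str, replace_type: str, replacement: Any) -> Step:
    """Replace None in `column`. Only a hypothetical "constant" type is sketched."""
    if replace_type != "constant":
        raise ValueError(f"replace type {replace_type!r} is not covered by this sketch")

    def step(rows: List[Row]) -> List[Row]:
        return [{**r, column: replacement if r.get(column) is None else r[column]}
                for r in rows]
    return step


def string_cleaner(column: str) -> Step:
    """Assumed cleaning rule: trim surrounding whitespace and lower-case strings."""
    def clean(value: Any) -> Any:
        return value.strip().lower() if isinstance(value, str) else value

    def step(rows: List[Row]) -> List[Row]:
        return [{**r, column: clean(r.get(column))} for r in rows]
    return step


def row_filter(predicate: Callable[[Row], bool]) -> Step:
    """Keep only the rows for which `predicate` returns True."""
    def step(rows: List[Row]) -> List[Row]:
        return [r for r in rows if predicate(r)]
    return step


def value_mapper(column: str, mapping: Dict[Any, Any]) -> Step:
    """Translate values via `mapping`; unmapped values pass through unchanged."""
    def step(rows: List[Row]) -> List[Row]:
        return [{**r, column: mapping.get(r.get(column), r.get(column))} for r in rows]
    return step


def run_flow(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply each step in order; `steps` is assumed to be in a validated order."""
    for step in steps:
        rows = step(rows)
    return rows


data = [
    {"id": 1, "country": " us "},
    {"id": 2, "country": "DE"},
    {"id": 3, "country": None},
]
steps = [
    null_handler("country", "constant", "unknown"),
    string_cleaner("country"),
    row_filter(lambda r: r["country"] != "unknown"),
    value_mapper("country", {"us": "United States", "de": "Germany"}),
]
result = run_flow(data, steps)
print(result)
# → [{'id': 1, 'country': 'United States'}, {'id': 2, 'country': 'Germany'}]
```

Each step returns new rows instead of modifying its input, so the original `data` is unchanged and any intermediate result can be inspected on its own.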

Step 5: View Results

  • After configuration, a job is created, and its details appear in the Details tab.

  • View the task execution results in the Results tab.

  • Monitor the task execution status (e.g., In Progress or Completed) in the Runs tab.
