Dataset Creation

In this configuration step, you specify the dataset’s location within Databricks and set the dataset up for subsequent tasks in LakeFusion.

  1. Locate and select the Datasets card on the home screen.

  2. Start the dataset creation process by clicking the “Create Dataset” button.

  3. Provide the following required information:

    • Dataset Name

    • Description: a comprehensive description of the dataset

    • Table path: select the appropriate table path. A table path is the location of a dataset within the Databricks workspace that you want to connect to and use within LakeFusion.

  4. Complete the dataset creation process by clicking the “Create” button.
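
The required fields above can be checked before you open the dialog. The following is a minimal sketch, not part of LakeFusion: `DatasetForm` and `validate_dataset_form` are hypothetical names, and the check on the table path assumes a Unity Catalog style three-part name (`catalog.schema.table`), which may not match how your workspace exposes tables.

```python
from dataclasses import dataclass


@dataclass
class DatasetForm:
    """Hypothetical stand-in for the three required fields in the Create Dataset dialog."""
    name: str
    description: str
    table_path: str


def validate_dataset_form(form: DatasetForm) -> list[str]:
    """Return a list of problems; an empty list means the form looks complete."""
    problems = []
    if not form.name.strip():
        problems.append("Dataset Name is required")
    if not form.description.strip():
        problems.append("Description is required")
    # Assumption: the table path is a three-part name, catalog.schema.table.
    parts = form.table_path.strip().split(".")
    if len(parts) != 3 or any(not part.strip() for part in parts):
        problems.append("Table path should look like catalog.schema.table")
    return problems


if __name__ == "__main__":
    # Illustrative values only; substitute your own dataset details.
    form = DatasetForm(
        name="Customers",
        description="Customer records from the CRM source system",
        table_path="main.crm.customers",
    )
    print(validate_dataset_form(form))
```

An empty list means all three fields are filled in and the table path has the assumed shape. It does not confirm that the table exists in Databricks or that LakeFusion can read it.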
