DP-203 - Data Engineering on Microsoft Azure: Data Factory
- discover the key concepts covered in this course
- describe trigger concepts and how they relate to the Azure Data Factory
- create an Azure Data Factory using the Azure portal
- handle schema drift when mapping data flow
- create Azure Data Factory pipelines and activities
- trigger the pipeline manually or using a schedule
- optimize pipelines for analytical or transactional purposes
- insert data that does not already exist using Azure Data Factory
- troubleshoot Azure Stream Analytics using resource logs
- design and implement incremental data loads
- design and develop slowly changing dimensions
- handle security and compliance requirements
- summarize the key concepts covered in this course
Once you have data in storage, you'll need a mechanism for transforming that data into a usable format. Azure Data Factory is a data integration service for creating automated data pipelines that copy and transform data. In this course, you'll learn about the Azure Data Factory and the Integration Runtime. You'll explore the features of the Azure Data Factory such as linked services and datasets, pipelines and activities, and triggers. Finally, you'll learn how to create an Azure Data Factory using the Azure portal, create Azure Data Factory linked services and datasets, create Azure Data Factory pipelines and activities, and trigger the pipeline manually or using a schedule. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
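As a hedged illustration of the last two objectives, here is a minimal Python sketch of starting a pipeline run on demand with the azure-mgmt-datafactory package; the subscription, resource group, factory, and pipeline names are placeholders, and the pipeline is assumed to already exist.

```python
# A minimal sketch, assuming azure-identity and azure-mgmt-datafactory are
# installed and the names below are replaced with real resources.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(),
                                         "<subscription-id>")

# Trigger the pipeline manually; a schedule would instead be defined as a
# ScheduleTrigger resource on the factory.
run = adf_client.pipelines.create_run(
    resource_group_name="my-rg",        # placeholder resource group
    factory_name="my-factory",          # placeholder factory
    pipeline_name="CopyPipeline",       # placeholder pipeline
    parameters={},
)
print(f"Started pipeline run {run.run_id}")
```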
DP-203 - Data Engineering on Microsoft Azure: Data Flow Transformations
- discover the key concepts covered in this course
- describe the type of available Azure Data Flow transformations
- transform data using Azure Data Factory mapping data flows
- transform and split data using Azure Data Flow
- transform and flatten data using Azure Data Flow
- describe the types of expression functions available in Azure Data Flow
- configure Azure Data Flow to perform error handling for data rows that would truncate data
- transform and use derived columns to normalize data values
- ingest and transform data using Azure Spark and Scala
- handle duplicate data using Azure Data Flows
- summarize the key concepts covered in this course
One of the key components of the Azure Cloud platform is the ability to store and process large amounts of data. Azure Data Flow Transformations can be used to ingest and transform data. In this course, you'll learn about the types of Azure Data Flow transformations that are available. You'll explore how to transform, split, and flatten data, as well as handle duplicate data, using Azure Data Factory mapping data flows. Next, you'll examine the types of expression functions available in Azure Data Flow and how to perform error handling for data rows that would truncate data. Finally, you'll learn how to transform and use derived columns to normalize data values, and how to ingest and transform data using Azure Spark and Scala. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
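Mapping data flow transformations are authored visually with their own expression language, so there is no single canonical code form; as an analogy, this PySpark sketch performs the same kind of derived-column normalization described above (the file path and column names are illustrative assumptions).

```python
# A PySpark analogy to a derived-column transformation; path and column
# names are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("derived-columns").getOrCreate()
df = spark.read.option("header", True).csv(
    "abfss://raw@account.dfs.core.windows.net/customers.csv")

# Normalize a free-text country column: trim, upper-case, default nulls.
normalized = df.withColumn(
    "country_code",
    F.coalesce(F.upper(F.trim(F.col("country_code"))), F.lit("UNKNOWN")),
)
normalized.show(5)
```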
DP-203 - Data Engineering on Microsoft Azure: Data Lake Storage
- discover the key concepts covered in this course
- describe Azure Data Lake Storage Gen2 features and when to use this storage type
- create an Azure Data Lake Storage Gen2 storage account
- manage directories, files, and Access Control Lists in Azure Data Lake Storage Gen2 using the .NET framework
- perform extract, transform, and load operations using Azure Databricks from Azure Data Lake Storage Gen2
- transform data using Apache Spark
- transform data by using Data Factory and data flows
- integrate pipelines using Synapse Studio
- stream data into Azure Databricks by using Event Hubs
- transform data using Azure Databricks
- summarize the key concepts covered in this course
Azure Data Lake Storage Gen2 provides big data analytics capabilities built on Azure Blob Storage, which delivers performance, management, and security functionality. In this course, you'll learn about the features of Azure Data Lake Storage Gen2 and when to use this storage type. You'll explore features and methods for securing data at rest in the Azure Data Lake Storage Gen2 service. You'll examine methods for processing big data using the Azure Data Lake Storage Gen2 service and monitoring Azure Blob Storage. You'll learn how to manage directories, files, and Access Control Lists in Azure Data Lake Storage Gen2 using the .NET framework, as well as how to perform extract, transform, and load operations using Azure Databricks from Azure Data Lake Storage Gen2. Finally, you'll learn how to access Azure Data Lake Storage Gen2 data using Azure Databricks and Spark. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
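The course demonstrates directory and ACL management with the .NET framework; for reference, the same operations look like this in Python with the azure-storage-file-datalake package (account, filesystem, and directory names are placeholders).

```python
# A Python counterpart to the .NET demonstration; names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client(file_system="raw")

# Create a directory, then set a POSIX-style ACL on it.
directory = fs.create_directory("sales")
directory.set_access_control(acl="user::rwx,group::r-x,other::---")
```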
DP-203 - Data Engineering on Microsoft Azure: Data Partitioning
- discover the key concepts covered in this course
- recognize data partitioning concepts
- describe data partitioning strategies for different services
- compare transactional and analytical workloads to determine which data store and partitioning strategy to implement
- recognize criteria for determining how to partition files for efficient distribution and querying
- describe how to partition a table to ensure efficient scalability for analytical workloads
- describe how the index table and materialized view design patterns can increase efficiency and performance of queries
- describe table partitions used by Azure Synapse Analytics and how to size them, and recognize the differences from SQL Server
- describe when to implement partitioning at the storage layer
- describe how data sharding distributes load over multiple datastores
- summarize the key concepts covered in this course
Partitioning data is key to ensuring efficient processing. In this course, you'll explore what data partitioning is and the strategies for implementation. You'll learn about transactional and analytical workloads and how to determine the best strategy for your files and table storage. Then, you'll examine design patterns for efficiency and performance. You'll learn about partitioning dedicated SQL pools in Azure Synapse Analytics and partitioning data lakes. Finally, you'll learn how data sharding across multiple data stores can be used for improving transaction performance. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
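To make the sharding idea concrete, here is a minimal, self-contained Python sketch of hash-based routing of a partition key to one of several data stores; real deployments typically add a shard map so shards can be split or rebalanced without rehashing everything.

```python
# Hash-based sharding: the same key always routes to the same store.
import hashlib

SHARDS = ["store-0", "store-1", "store-2", "store-3"]

def shard_for(partition_key: str) -> str:
    # Use a stable hash; Python's built-in hash() is salted per process.
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

print(shard_for("customer-42"))
```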
DP-203 - Data Engineering on Microsoft Azure: Data Policies & Standards
- discover the key concepts covered in this course
- describe scenarios for encrypting data and the best practices for utilizing disk encryption on Azure
- describe how Transparent Data Encryption can be used to encrypt an Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics database at rest
- describe the Always Encrypted feature that client applications can use to encrypt data stored in Azure SQL Database
- describe how dynamic data masking and classification can identify sensitive data and mask sensitive data in Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics
- describe best practices for defining a data retention policy
- describe guidelines, processes, limitations, and considerations for performing a data purge in Azure Data Explorer
- recognize the options for controlling access to Azure Data Lake
- describe what should be audited for and how technologies on the Azure platform can enable an auditing strategy
- describe how Row-Level Security in Azure SQL Database implements restrictions on access to data rows
- summarize the key concepts covered in this course
Data policies and standards help to ensure a repeatable security standard is maintained. In this course, you'll learn about data encryption scenarios and best practices. You'll explore how Azure Transparent Database Encryption and Always Encrypted can be used to ensure data at rest is protected. Next, you'll examine how data classification and data masking can protect data being viewed. You'll learn to configure data retention and purging to ensure data is retained or removed. You'll also explore the various means of controlling access to Azure Data Lake Storage Gen2. Finally, you'll learn how to plan a data auditing strategy and how to limit access to data at the row level in a database. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
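Dynamic data masking itself is configured on the database, not in application code; this small Python function merely illustrates the effect of the built-in email masking rule, which exposes only the first character of the address (the rule shown is an approximation).

```python
# Illustrative only: approximates the effect of SQL's built-in email() mask,
# which renders addresses in the style aXXX@XXXX.com.
def mask_email(email: str) -> str:
    first = email[:1]
    return f"{first}XXX@XXXX.com"

print(mask_email("jane.doe@contoso.com"))  # jXXX@XXXX.com
```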
DP-203 - Data Engineering on Microsoft Azure: Data Process Monitoring
- discover the key concepts covered in this course
- describe the features of the Azure Monitor tools and the concepts of continuous monitoring and visualization
- create metric charts using the Azure Monitor
- collect and analyze Azure resource log data
- perform queries against the Azure Monitor logs
- create and share dashboards that display data from Log Analytics
- describe the features of Azure HDInsight for ingesting, processing, and analyzing big data
- use the Azure Data Factory Analytics solution to monitor pipelines
- describe how to monitor and tune query performance
- query Azure Log Analytics and filter, sort, and group query results
- describe how to implement versioning control in Azure Data Factory
- describe how to monitor cluster performance
- describe how to implement logging for custom monitoring
- archive data from a Log Analytics workspace to Azure storage using a Logic App
- summarize the key concepts covered in this course
Being able to monitor data processes to ensure they are operational and working correctly is a crucial part of running your business. Azure provides the Azure Monitor service and the Azure Log Analytics service to perform this function. In this course, you'll learn about the features of the Azure Monitor tools and the concepts of continuous monitoring and visualization. Next, you'll examine how to create metric charts using the Azure Monitor, as well as how to collect and analyze Azure resource log data and perform queries against the Azure Monitor logs. You'll explore how to create and share dashboards that display data from Log Analytics, create Azure Monitor alerts, and use the Azure Data Factory Analytics solution to monitor pipelines. You'll learn how to send the Azure Databricks logs to the Azure Monitor and use the dashboard to analyze the Azure Databricks metrics. Finally, you'll learn how to enable monitoring for Azure Stream Analytics and configure alerts, and also query Azure Log Analytics and filter, sort, and group query results. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
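As a hedged sketch of querying Azure Monitor logs programmatically, the azure-monitor-query package runs KQL against a Log Analytics workspace; the workspace ID and the query shown are placeholders.

```python
# A minimal sketch, assuming azure-identity and azure-monitor-query are
# installed; the workspace ID and query are placeholders.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(
    workspace_id="<workspace-id>",
    query="AzureActivity | summarize count() by OperationNameValue",
    timespan=timedelta(days=1),
)
for table in response.tables:
    for row in table.rows:
        print(row)
```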
DP-203 - Data Engineering on Microsoft Azure: Data Solution Optimization
- discover the key concepts covered in this course
- describe the concept of cloud optimization and some best practices for optimizing data using data partitions, Azure Data Lake Storage tuning, Azure Synapse Analytics tuning, and Azure Databricks auto-optimizing
- describe how to optimize Azure Stream Analytics jobs
- describe various strategies for partitioning data using Azure-based storage solutions
- describe how to optimize the Azure Data Lake Storage Gen2 for performance
- describe methods for optimizing Azure Synapse Analytics
- describe how to optimize the Azure Cosmos DB using indexing and partitioning
- describe methods for monitoring Azure Databricks
- describe methods for optimizing Azure Databricks
- describe how to manage common data troubleshooting techniques
- summarize the key concepts covered in this course
Ensuring that data storage and processing systems operate efficiently will save your organization both time and money. There are several tips and tricks that can be used to optimize both Azure data storage services and processes. In this course, you'll learn about cloud optimization, as well as some best practices for optimizing data using data partitions, Azure Data Lake Storage tuning, Azure Synapse Analytics tuning, and Azure Databricks auto-optimizing. Next, you'll learn about strategies for partitioning data using Azure-based storage solutions. You'll learn about the stages of Azure Blob lifecycle management and how to optimize Azure Data Lake Storage Gen2, Azure Stream Analytics, and Azure Synapse Analytics. Finally, you'll explore how to optimize Azure data storage services such as Azure Cosmos DB using indexing and partitioning, as well as Azure Blob Storage and Azure Databricks. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
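As one concrete example of Azure Databricks auto-optimizing, this short sketch enables Delta Lake's optimized writes and auto compaction on a single table from a notebook; the table name is illustrative, and `spark` is the session a Databricks notebook predefines.

```python
# Run from a Databricks notebook, where `spark` is predefined; the table name
# is illustrative. optimizeWrite bin-packs files on write and autoCompact
# coalesces small files afterwards.
spark.sql("""
    ALTER TABLE sales_orders SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact' = 'true'
    )
""")
```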
DP-203 - Data Engineering on Microsoft Azure: Data Storage Monitoring
- discover the key concepts covered in this course
- describe the features of the Azure Monitor service and how it can be used to monitor storage data
- monitor Azure Blob storage
- access diagnostic logs to monitor Data Lake Storage Gen2
- monitor Azure Synapse Analytics jobs and the adaptive cache
- monitor Azure Cosmos DB using the portal and resource logs
- configure, manage, and view metric alerts using the Azure Monitor
- describe the features and concepts of Azure Log Analytics
- configure, manage, and view activity log alerts using the Azure Monitor
- monitor Azure Stream Analytics jobs
- describe types of Azure Cosmos DB indexes
- summarize the key concepts covered in this course
Being able to monitor data storage systems to ensure they are operational and working correctly is a crucial part of running your business. Azure provides the Azure Monitor service and the Azure Log Analytics service to perform this function. In this course, you'll learn about the features of Azure Log Analytics, as well as the Azure Monitor service and how it can be used to monitor storage data and monitor Azure Blob storage. Next, you'll explore how to access diagnostic logs to monitor Data Lake Storage Gen2, monitor the Azure Synapse Analytics jobs and the adaptive cache, and monitor Azure Cosmos DB using the portal and resource logs. Finally, you'll examine how to configure, manage, and view metric alerts using the Azure Monitor and activity log alerts using the Azure Monitor. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
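For a programmatic angle on metric monitoring, this hedged sketch reads a blob-service metric through the azure-monitor-query package; the resource ID is a placeholder.

```python
# A minimal sketch with azure-monitor-query; the resource ID is a placeholder.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

client = MetricsQueryClient(DefaultAzureCredential())
resource_id = (
    "/subscriptions/<sub>/resourceGroups/<rg>/providers"
    "/Microsoft.Storage/storageAccounts/<account>/blobServices/default"
)
response = client.query_resource(resource_id,
                                 metric_names=["Transactions"],
                                 timespan=timedelta(hours=1))
for metric in response.metrics:
    for point in metric.timeseries[0].data:
        print(metric.name, point.timestamp, point.total)
```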
DP-203 - Data Engineering on Microsoft Azure: Databricks Processing
- discover the key concepts covered in this course
- describe the types of available processing when using Azure Databricks, such as stream, batch, image, and parallel processing
- create an Azure Databricks workspace using an Apache Spark cluster
- run jobs in the Azure Databricks workspace using a service principal
- query data in SQL Server using an Azure Databricks notebook
- validate and handle failed batch loads
- implement a Cosmos DB service endpoint for Azure Databricks
- extract, transform, and load data using Azure Databricks
- perform sentiment analysis for stream data using Azure Databricks
- debug Spark Jobs running on HDInsight
- summarize the key concepts covered in this course
When working with big data, there needs to be a mechanism to process and transform this data quickly and efficiently. Azure Databricks is a service that provides the latest version of Apache Spark along with functionality for processing data from Azure Storage. In this course, you'll learn about the types of processing that can be performed with Azure Databricks, such as stream, batch, image, and parallel processing. Next, you'll learn how to create an Azure Databricks workspace using an Apache Spark cluster, run jobs in the Azure Databricks workspace using a service principal, and query data in SQL Server using an Azure Databricks notebook. Then, you'll learn how to retrieve data from Azure Blob Storage using Azure Databricks and the Azure Key Vault, implement a Cosmos DB service endpoint for Azure Databricks, and extract, transform, and load data using Azure Databricks. Finally, you'll learn how to stream data into Azure Databricks by using Event Hubs and perform sentiment analysis for stream data using Azure Databricks. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
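To ground the extract, transform, and load objective, here is a compact PySpark sketch of the kind of job a Databricks cluster runs: read raw CSV from Blob Storage, clean it, and write Delta (paths and columns are illustrative; `spark` is the notebook-provided session).

```python
# Runs in a Databricks notebook (`spark` predefined); paths and columns are
# illustrative.
from pyspark.sql import functions as F

# Extract: raw CSV landed in Blob Storage.
raw = (spark.read
       .option("header", True)
       .csv("wasbs://raw@account.blob.core.windows.net/orders/"))

# Transform: deduplicate, type the timestamp, drop non-positive amounts.
cleaned = (raw
           .dropDuplicates(["order_id"])
           .withColumn("order_ts", F.to_timestamp("order_ts"))
           .filter(F.col("amount").cast("double") > 0))

# Load: persist as a Delta table for downstream consumers.
cleaned.write.format("delta").mode("overwrite").save("/mnt/curated/orders")
```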
DP-203 - Data Engineering on Microsoft Azure: Databricks
- discover the key concepts covered in this course
- describe the features of Azure Databricks
- describe the features and concepts of Azure Databricks clusters
- capture stream data using Event Hubs
- describe notebooks and how they can be used with Azure Databricks
- create, open, use, and delete notebooks
- describe the features and concepts of Azure Databricks jobs
- create, open, use, and delete Azure Databricks jobs
- describe the concept of autoscaling local storage when configuring clusters
- describe how to configure checkpoints and watermarking during stream processing
- summarize the key concepts covered in this course
When working with big data, there needs to be a mechanism to process and transform this data quickly and efficiently. Azure Databricks is a service that provides the latest version of Apache Spark along with functionality for machine learning and data warehousing. In this course, you'll learn about the features of Azure Databricks such as clusters, notebooks, and jobs. Next, you'll learn about autoscaling local storage when configuring clusters. You'll then explore how to create, manage, and configure Azure Databricks clusters, as well as how to create, open, use, and delete notebooks. Finally, you'll learn how to create, open, use, and delete jobs. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
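The checkpointing and watermarking objective maps to two Structured Streaming settings, shown in this minimal PySpark sketch (source path, column names, and output locations are illustrative; `spark` is the notebook-provided session).

```python
# Runs in a Databricks notebook (`spark` predefined); names are illustrative.
from pyspark.sql import functions as F

events = spark.readStream.format("delta").load("/mnt/bronze/events")

counts = (events
          .withWatermark("event_time", "10 minutes")        # late-data bound
          .groupBy(F.window("event_time", "5 minutes"))
          .count())

query = (counts.writeStream
         .outputMode("append")
         .option("checkpointLocation", "/mnt/chk/event_counts")  # restart state
         .format("delta")
         .start("/mnt/silver/event_counts"))
```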
DP-203 - Data Engineering on Microsoft Azure: Designing Data Storage Structures
- discover the key concepts covered in this course
- describe key considerations for designing a data lake
- identify and evaluate criteria for selecting a file format for big data applications
- recognize the defining characteristics of the supported file formats in Azure Data Lake
- describe steps for efficient read operations for a table storage service
- describe the dynamic data pruning feature in Databricks at the file and partition level
- recognize an efficient folder structure design
- define the zones within a data lake for organizing data distribution
- describe the data access tiers in Azure Blob storage and how data can be moved between them for efficient and cost-effective storage
- describe the steps to archive data in an Azure Blob storage container, rehydrate blob data, and automate access tiers using life cycle management
- summarize the key concepts covered in this course
Planning the structure for data storage is integral to performance in big data operations. In this course, you'll learn about key considerations for data lakes and how to determine which file type and file format are the most appropriate for your use case. Then, you'll explore how to design table storage for efficient querying and how data pruning can remove unnecessary data to accelerate transactions. You'll examine folder structures and data lake zones for organizing data effectively. Finally, you'll learn how to define storage tiers and how to manage the life cycle of data. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
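One way to see the folder-structure guidance in practice: writing Parquet partitioned by date columns produces the year=/month= directory layout that query engines can prune. A minimal PySpark sketch, with illustrative paths:

```python
# Runs in a Databricks or Synapse notebook (`spark` predefined); paths and
# columns are illustrative.
df = spark.read.format("delta").load("/mnt/curated/sales")

# Produces .../sales/year=2024/month=5/... so engines can skip partitions.
(df.write
   .partitionBy("year", "month")
   .mode("overwrite")
   .parquet("abfss://lake@account.dfs.core.windows.net/curated/sales"))
```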
DP-203 - Data Engineering on Microsoft Azure: Designing the Serving Layer
- discover the key concepts covered in this course
- recognize the concepts of dimensional data modeling
- describe dimensional hierarchies in multidimensional data modeling
- describe slowly changing dimensions used to capture changing data over time, as well as the various types and their applications
- describe temporal databases and steps for designing a database to handle temporal data
- describe the differences between the star and snowflake schemas for data modeling
- recognize the rules and best practices to follow when designing a star schema
- implement incremental data loading using Azure Data Factory
- select the appropriate technology for analytical data storage
- describe the options for storing metadata external to Azure Synapse Analytics and Azure Databricks
- summarize the key concepts covered in this course
The serving layer is where data is stored for consumption by processing services. In this course, you'll explore dimensional data modeling and hierarchies. You'll learn how to define slowly changing dimensions and temporal design within databases. Then, you'll learn about the differences between the star and snowflake schemas as well as how to design a star schema. Next, you'll examine incremental data loading for stream processing and the options for analytical data stores. Finally, you'll learn about options for creating metastores for use by Azure Databricks and Azure Synapse Analytics. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
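To illustrate the incremental loading pattern referenced above, this hedged PySpark sketch applies a high-watermark filter so each run copies only rows changed since the previous run (table and column names are assumptions, and the watermark would normally live in a control table).

```python
# Runs in a notebook (`spark` predefined); the source table, column, and
# watermark storage are illustrative assumptions.
last_watermark = "2024-01-01 00:00:00"  # would be read from a control table

new_rows = spark.sql(f"""
    SELECT * FROM source_orders
    WHERE modified_at > TIMESTAMP '{last_watermark}'
""")
new_rows.write.format("delta").mode("append").save("/mnt/serving/orders")

# On success, persist max(modified_at) as the watermark for the next run.
```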
DP-203 - Data Engineering on Microsoft Azure: Logical Data Structures
- discover the key concepts covered in this course
- define key concepts for maturing data lake storage structures
- describe how system-versioned temporal tables are used for point-in-time analysis
- create and manage system-versioned temporal tables in an Azure SQL Database
- describe the different types of slowly changing dimensions
- build a slowly changing dimension type 1 deployment
- build a slowly changing dimension type 2 deployment
- define an effective logical file and folder structure for efficient data ingestion and manipulation
- use PolyBase to build an external table
- describe best practices for accelerating queries against data in Azure Data Lake Storage Gen2
- summarize the key concepts covered in this course
Logical data structures, also called entity-relationship models, define a high-level model of data and the relationships it contains. In this course, you'll learn about the stages of data lake maturity. You'll explore temporal database tables and how to manage them. You'll also learn how to define slowly changing dimensions and how to implement them. You'll then move on to explore logical file and folder structures for data ingestion. You'll discover how PolyBase can be used to connect to external tables. Finally, you'll explore the best practices for accelerating queries. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
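As a concrete sketch of a slowly changing dimension type 2 step, here is a condensed Delta Lake merge in PySpark: the current row is expired, then the new version is appended (paths and columns are illustrative, and the logic assumes every staged row is a genuine change).

```python
# Condensed SCD type 2 sketch in a Databricks notebook (`spark` predefined);
# paths and columns are illustrative.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

dim = DeltaTable.forPath(spark, "/mnt/serving/dim_customer")
updates = spark.read.format("delta").load("/mnt/staging/customer_changes")

# Step 1: expire the current version of each changed customer.
(dim.alias("d")
    .merge(updates.alias("u"),
           "d.customer_id = u.customer_id AND d.is_current = true")
    .whenMatchedUpdate(set={"is_current": F.lit(False),
                            "end_date": F.current_date()})
    .execute())

# Step 2: append the new versions as the current rows.
(updates
    .withColumn("is_current", F.lit(True))
    .withColumn("end_date", F.lit(None).cast("date"))
    .write.format("delta").mode("append").save("/mnt/serving/dim_customer"))
```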
DP-203 - Data Engineering on Microsoft Azure: Physical Data Storage Structures
- discover the key concepts covered in this course
- define considerations for implementing data compression technologies at the database and file level
- create a data partition in a SQL database
- describe how a shard map manager is used by an application to connect to the required Azure SQL Databases
- describe the key concepts for designing tables in an Azure Synapse Analytics dedicated SQL pool
- deploy Azure SQL Database geo-replication
- describe the options for redundancy in Azure Blob storage
- determine the appropriate distribution scheme for an Azure Synapse Analytics database and build it into the table on creation
- archive data in Azure storage and rehydrate it when necessary
- configure a long-term retention policy for an Azure SQL Database
- summarize the key concepts covered in this course
An effective storage structure is critical to big data implementation success. In this course, you'll explore data compression in databases and file storage. Then, you'll discover how partitioning and sharding are implemented in the database. Next, you'll explore designing tables in an Azure Synapse Analytics dedicated SQL pool, and implement geo-replication for redundancy in both databases and Azure Blob storage. You'll also discover implementing distribution schemes in Azure Synapse Analytics. Finally, you'll discover data archiving and long-term retention policies for Azure Blob storage and Azure SQL Databases. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
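To make the archive and rehydrate flow tangible, this minimal Python sketch moves a single blob to the archive tier and later requests rehydration with the azure-storage-blob package (account, container, and blob names are placeholders).

```python
# A minimal sketch with azure-storage-blob; names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient, StandardBlobTier

service = BlobServiceClient("https://<account>.blob.core.windows.net",
                            credential=DefaultAzureCredential())
blob = service.get_blob_client(container="backups", blob="2023/ledger.parquet")

blob.set_standard_blob_tier(StandardBlobTier.ARCHIVE)  # move to archive

# Later: rehydrate back to the hot tier (takes hours; High is faster).
blob.set_standard_blob_tier(StandardBlobTier.HOT, rehydrate_priority="High")
```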
DP-203 - Data Engineering on Microsoft Azure: Securing Data
- discover the key concepts covered in this course
- describe the tools used by Azure to encrypt data across the platform
- manage Transparent Data Encryption on Azure SQL Databases
- describe how Always Encrypted is used by the database engine to process queries on encrypted data
- enable Always Encrypted on an Azure SQL Database
- encrypt single columns in an Azure SQL Database
- use group membership to control access to Azure SQL Database rows
- use DataFrames in Databricks to perform mixed functions
- configure Advanced Threat Protection and dynamic data masking in an Azure SQL Database, Azure SQL Managed Instance, or Azure Synapse Analytics instance
- utilize immutable storage on Azure Blob storage to store business data in a manner that cannot be erased or modified to meet time-based or legal holds
- summarize the key concepts covered in this course
The final line of defense for protecting against a data breach is securing the data itself. With today's cloud environments, data is often in transit, duplicated, and stored in various data centers around the world, making effective data protection a challenge.
In this course, you'll explore the various methods available for encrypting data stored in SQL databases. You'll examine how to use DataFrames in Databricks, as well as how to implement Advanced Threat Protection and dynamic data masking in Azure databases. Finally, you'll learn how immutable blobs can be used to manage sensitive information.
This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
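Always Encrypted decryption happens in the client driver; as a hedged sketch, a Python client can opt in through ODBC Driver 17 or later for SQL Server by adding ColumnEncryption=Enabled to the connection string (server, database, and table names are placeholders).

```python
# A hedged sketch: requires pyodbc plus ODBC Driver 17+ for SQL Server;
# server, database, and table are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<server>.database.windows.net,1433;"
    "Database=<db>;"
    "Authentication=ActiveDirectoryInteractive;"
    "ColumnEncryption=Enabled;"  # decrypt Always Encrypted columns client-side
)
row = conn.cursor().execute("SELECT TOP 1 ssn FROM dbo.Patients").fetchone()
print(row)
```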
DP-203 - Data Engineering on Microsoft Azure: Securing Data Access
- discover the key concepts covered in this course
- recognize how Azure Key Vault can be used to store and manage keys and secrets used by multiple sources
- describe private endpoints used for ensuring data flows only within your private link and service endpoints used to provide secure direct connectivity
- utilize managed virtual networks and managed private endpoints to secure traffic between Azure Synapse Analytics and other Azure resources
- describe how resources and apps can utilize Azure managed identities to securely connect to Azure services
- manage access control lists on Azure Data Lake Storage Gen2
- manage access to resources using Azure role-based access control
- describe how token-based authentication can be utilized to manage authentication to Azure Databricks
- manage access to Azure Databricks workspaces using Azure Databricks token authentication
- manage retention policies for temporal tables in Azure SQL Database
- enable auditing on an Azure SQL Database
- summarize the key concepts covered in this course
Securing access to data is a fundamental part of any security strategy. In this course, you'll explore how Azure Key Vault can be used to store and manage keys and secrets for accessing data. You'll discover how to connect to Azure resources through private and service endpoints and managed virtual networks and how to use Azure managed identities for connections between Azure resources. Next, you'll learn how to utilize access control lists and Azure role-based access control to provide only the necessary permissions to users to access your data. You'll also learn how token-based authentication works in Azure Databricks. Finally, you'll examine how to audit an Azure SQL Database to monitor for unauthorized access. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
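As a brief sketch combining two of the lessons above, a managed identity can fetch a secret from Azure Key Vault with the azure-keyvault-secrets package; the vault URL and secret name are placeholders.

```python
# A minimal sketch with azure-identity and azure-keyvault-secrets; the vault
# URL and secret name are placeholders.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(vault_url="https://<vault>.vault.azure.net",
                      credential=DefaultAzureCredential())
secret = client.get_secret("storage-connection-string")
print(secret.name)  # avoid logging secret.value in real code
```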
DP-203 - Data Engineering on Microsoft Azure: Storage Accounts
- discover the key concepts covered in this course
- describe the storage capabilities Azure Blob storage provides
- recognize how to architect a blob storage deployment to meet performance and scalability requirements
- utilize the geo-redundancy features in Azure Storage to design highly available applications
- describe the available options and design considerations for Azure Storage account disaster recovery
- describe the role played by Azure Data Lake Storage Gen2 in managing big data used in analytics scenarios
- recognize the scenarios where Azure Data Lake Storage Gen2 would be applied for big data processing
- effectively plan an Azure Data Lake Storage Gen2 deployment
- apply best practices when designing a solution incorporating Azure Data Lake Storage Gen2
- deploy an Azure Data Lake Storage Gen2 storage account
- summarize the key concepts covered in this course
Microsoft Azure Blob storage is a container system for storing a variety of file types. In this course, you'll learn about the capabilities of blob storage and how to architect a deployment for optimal performance and scalability. Then, you'll explore the options for redundancy and how to recover from disasters. You'll discover where Azure Data Lake Storage Gen2, a feature set within blob storage, can be utilized for big data operations. You'll also learn how to plan for a data lake deployment, examine best practices, and explore how to deploy a Data Lake Storage Gen2 account on Azure. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
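A hedged sketch of the final objective: with the azure-mgmt-storage package, a Data Lake Storage Gen2 account is simply a StorageV2 account created with the hierarchical namespace enabled (names, region, and SKU are placeholder choices).

```python
# A hedged sketch with azure-mgmt-storage; names, region, and SKU are
# placeholder choices.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")
poller = client.storage_accounts.begin_create(
    "my-rg",
    "mydatalakeacct",
    {
        "location": "eastus",
        "kind": "StorageV2",
        "sku": {"name": "Standard_RAGRS"},  # read-access geo-redundant
        "is_hns_enabled": True,             # hierarchical namespace = Gen2
    },
)
print(poller.result().primary_endpoints.dfs)
```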
DP-203 - Data Engineering on Microsoft Azure: Stream Analytics
- discover the key concepts covered in this course
- describe how Azure Stream Analytics is used to process streaming data
- describe the available inputs that can be used with Azure Stream Analytics
- describe the available outputs that can be used with Azure Stream Analytics
- describe how to create user-defined functions for Azure Stream Analytics
- process data by using Spark structured streaming
- describe how to use Stream Analytics windowing functions
- set up and process time series data
- create a Stream Analytics job by using the Azure portal
- summarize the key concepts covered in this course
Azure Stream Analytics is a serverless, highly scalable, complex event processing engine that can be used to perform real-time analytics on multiple data streams. Jobs can be configured to forecast trends, trigger workflows, and detect irregularities, raising alerts when they occur. In this course, you'll learn to use Azure Stream Analytics to process streaming data. You'll examine how to implement security, create user-defined functions, and optimize jobs for Azure Stream Analytics, as well as explore the available inputs and outputs. Finally, you'll learn how to create an Azure Stream Analytics job, create an Azure Stream Analytics dedicated cluster, run Azure Functions from Azure Stream Analytics jobs, and monitor Azure Stream Analytics jobs. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
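Stream Analytics windowing is written in its own SQL dialect; since the course also covers Spark structured streaming, this PySpark sketch shows the equivalent of a tumbling-window count over event time (source path and column names are illustrative; `spark` is the notebook-provided session).

```python
# Runs in a Databricks or Synapse notebook (`spark` predefined); the source
# path and columns are illustrative.
from pyspark.sql import functions as F

clicks = spark.readStream.format("delta").load("/mnt/bronze/clicks")

# One-minute tumbling windows, comparable to TumblingWindow(minute, 1) in the
# Stream Analytics query language.
tumbling = (clicks
            .withWatermark("click_time", "5 minutes")
            .groupBy(F.window("click_time", "1 minute"))
            .count())

tumbling.writeStream.outputMode("append").format("console").start()
```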
DP-203 - Data Engineering on Microsoft Azure: Synapse Analytics
- discover the key concepts covered in this course
- describe the Azure Synapse Analytics platform and how it is used for data warehousing and big data analytics
- integrate pipelines using Synapse Studio
- visualize data using a Power BI workspace that is linked to an Azure Synapse Workspace
- monitor a Synapse Workspace
- recognize features of the Synapse Knowledge Center
- describe the features of Azure Synapse Analytics and PolyBase
- summarize the key concepts covered in this course
Azure Synapse Analytics is an analytics service that provides functionality for data integration, enterprise data warehousing, and big data analytics. Services provided include ingesting, exploring, preparing, managing, and serving data for BI and machine learning needs. In this course, you'll learn about the Azure Synapse Analytics platform and how it is used for data warehousing and big data analytics. Next, you'll learn how to create a Synapse Workspace, a dedicated SQL pool, and a serverless Apache Spark pool. You'll move on to explore how to analyze data using a dedicated SQL pool, Apache Spark for Azure Synapse, Serverless SQL Pools, and a Spark database, as well as how to analyze data that is in a storage account. You'll learn how to integrate pipelines using Synapse Studio, visualize data using a Power BI workspace, and monitor a Synapse Workspace. Finally, you'll learn about the Synapse Knowledge Center and the features of Azure Synapse Analytics and PolyBase. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
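As a hedged sketch of querying the lake through a serverless SQL pool from Python, OPENROWSET reads Parquet files directly from storage; the workspace endpoint, storage path, and authentication method are placeholders.

```python
# A hedged sketch: requires pyodbc plus ODBC Driver 17+ for SQL Server; the
# endpoint and storage path are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=<workspace>-ondemand.sql.azuresynapse.net;"
    "Database=master;"
    "Authentication=ActiveDirectoryInteractive;"
)
sql = """
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://<account>.dfs.core.windows.net/curated/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS sales;
"""
for row in conn.cursor().execute(sql):
    print(row)
```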
DP-203 - Data Engineering on Microsoft Azure: The Serving Layer
- discover the key concepts covered in this course
- compare the traditional relational schema model with the modern data lake star schema
- describe the steps required for modeling a star schema
- define what a Parquet file is and how it's structured
- design and query a dimensional hierarchy
- deploy an Azure Synapse Analytics workspace
- deploy an Azure Synapse Analytics Dedicated SQL pool
- deploy an Azure Synapse Analytics Apache Spark cluster and attach a notebook
- describe how Azure Synapse Analytics shares databases between Spark clusters
- deploy a shared database and a shared metadata table
- summarize the key concepts covered in this course
Implementing an effective serving layer requires consideration of design, methods, and tools. In this course, you'll learn how traditional relational models can be replaced by the star schema and how to design a star schema. Then, you'll explore the purpose and structure of Parquet files used by Azure Databricks. You'll learn how to design and query a dimensional hierarchy. You'll move on to examine Azure Synapse Analytics, including deploying dedicated SQL pools and Apache Spark clusters. Finally, you'll learn how to create shared metadata tables between Spark clusters. This course is one in a collection that prepares learners for the Microsoft Data Engineering on Microsoft Azure (DP-203) exam.
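To illustrate the shared metadata idea, a table saved from a Synapse Spark pool notebook lands in the shared metastore, where other Spark pools (and, for Parquet-backed tables, serverless SQL) can see it; a minimal sketch with illustrative names:

```python
# Runs in a Synapse Spark pool notebook (`spark` predefined); database, table,
# and path are illustrative.
df = spark.read.parquet("abfss://curated@account.dfs.core.windows.net/sales/")

spark.sql("CREATE DATABASE IF NOT EXISTS serving")

# Parquet-backed managed tables created here appear in the shared metastore.
df.write.mode("overwrite").saveAsTable("serving.fact_sales")
```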