Enterprise Database Systems
Data Modeling for Hadoop
Introduction to Data Modeling in Hadoop
Introduction to Hadoop

Introduction to Data Modeling in Hadoop

Course Number:
df_dmhp_a02_it_enus
Lesson Objectives

  • start the course
  • define data management
  • recognize important data modeling concepts in Hadoop
  • identify important issues for storing data in Hadoop
  • recognize important considerations and key design points for HDFS schemas (a brief directory-layout sketch follows this list)
  • identify basic concepts of data movement in Hadoop
  • list important factors to consider when importing data into Hadoop
  • identify tools and methods for moving data into Hadoop
  • recognize characteristics of a data stream
  • describe how data lakes enable batch processing
  • define data security management and its major domains
  • define Kerberos
  • describe the basics of authentication in Hadoop using Kerberos
  • identify central issues in processing and management of big data
  • identify important points in Hadoop data modeling

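The HDFS schema-design objectives above can be made concrete with a short example. The following sketch is not part of the course material itself; it uses the Hadoop FileSystem Java API to create a partitioned, per-source directory layout. The paths (/data/sales/orders, /staging/sales/orders) and the year=/month= partition naming are illustrative assumptions only.

```java
// HdfsLayoutSketch.java -- a minimal sketch, assuming a reachable HDFS
// configured via core-site.xml / hdfs-site.xml on the classpath.
// Directory names and the partition convention are illustrative only.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLayoutSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();   // picks up cluster settings from the classpath
        try (FileSystem fs = FileSystem.get(conf)) {
            // One directory per data source, partitioned by ingest date so that
            // downstream tools (Hive, Pig, MapReduce) can prune partitions.
            Path landing = new Path("/data/sales/orders/year=2024/month=06");
            Path staging = new Path("/staging/sales/orders");

            fs.mkdirs(landing);
            fs.mkdirs(staging);

            System.out.println("Created " + landing + " and " + staging);
        }
    }
}
```
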
Overview/Description
This course covers various data genres and management tools, the reasons behind the growing number of new big data platforms from the perspective of big data management systems, and the analytical tools that accompany them.

Target Audience
Individuals who are new to big data, Hadoop, and data modeling, and wish to understand key concepts and features of Hadoop and its tools

Introduction to Hadoop

Course Number:
df_dmhp_a01_it_enus
Lesson Objectives

  • start the course
  • recognize what Big Data is, its sources and types, its evolution and characteristics, and its use cases
  • identify Big Data infrastructure issues and explain the benefits of Hadoop
  • recognize the basics of Hadoop, including its history, milestones, and core components
  • set up a virtual machine
  • install Linux on a virtual machine
  • recognize basic and commonly used UNIX commands
  • identify Hadoop components
  • define HDFS components
  • recognize how to read and write in HDFS
  • use HDFS
  • recognize basics of YARN
  • define basics of MapReduce
  • identify how MapReduce processes information
  • use code that runs on Hadoop (a MapReduce sketch follows this list)
  • define Pig, Hive, and HBase
  • define Sqoop, Flume, Mahout, and Oozie
  • recognize how data is stored and modeled in Hadoop
  • identify available commercial distributions for Hadoop
  • recognize Spark and its benefits over traditional MapReduce
  • filter information in Hadoop

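Several objectives above (the basics of MapReduce, how MapReduce processes information, and running code on Hadoop) are easiest to picture with a small program. The sketch below is the classic word count written against the org.apache.hadoop.mapreduce API; it is an illustration added here rather than the course's own code, and the input and output paths are simply taken from the command line.

```java
// WordCount.java -- a minimal MapReduce sketch (the classic word count),
// offered as an illustration only; not the course's own example.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A typical run, assuming the class is packaged into a jar named wordcount.jar, would be: hadoop jar wordcount.jar WordCount /user/demo/input /user/demo/output.
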
Overview/Description
Hadoop is an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. This course introduces Hadoop, its key tools, and their applications.

Target Audience
Individuals who are new to big data, Hadoop, and data modeling, and wish to understand key concepts and features of Hadoop and its tools
