Enterprise Database Systems
Data Modeling for Hadoop
Introduction to Data Modeling in Hadoop
Introduction to Hadoop

Introduction to Data Modeling in Hadoop

Course Number:
df_dmhp_a02_it_enus
Lesson Objectives

  • start the course
  • define data management
  • recognize important data modeling concepts in Hadoop
  • identify important issues for storing data in Hadoop
  • recognize important considerations and key design points for HDFS schemas (a brief directory-layout sketch follows this list)
  • identify basic concepts of data movement in Hadoop
  • list important factors to consider when importing data into Hadoop
  • identify tools and methods for moving data into Hadoop
  • recognize characteristics of a data stream
  • describe how data lakes enable batch processing
  • define data security management and its major domains
  • define Kerberos
  • describe the basics of authentication in Hadoop using Kerberos
  • identify central issues in processing and management of big data
  • identify important points in Hadoop data modeling

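The HDFS schema-design objectives above can be made concrete with a short example. The following sketch is not part of the course material itself; it uses the Hadoop FileSystem Java API to create a partitioned, per-source directory layout. The paths (/data/sales/orders, /staging/sales/orders) and the year=/month= partition naming are illustrative assumptions only.

```java
// HdfsLayoutSketch.java -- a minimal sketch, assuming a reachable HDFS
// configured via core-site.xml / hdfs-site.xml on the classpath.
// Directory names and the partition convention are illustrative only.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLayoutSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();   // picks up cluster settings from the classpath
        try (FileSystem fs = FileSystem.get(conf)) {
            // One directory per data source, partitioned by ingest date so that
            // downstream tools (Hive, Pig, MapReduce) can prune partitions.
            Path landing = new Path("/data/sales/orders/year=2024/month=06");
            Path staging = new Path("/staging/sales/orders");

            fs.mkdirs(landing);
            fs.mkdirs(staging);

            System.out.println("Created " + landing + " and " + staging);
        }
    }
}
```
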
Overview/Description
This course covers various data genres and management tools, the reasons behind the growing number of new big data platforms from the perspective of big data management systems, and the analytical tools that accompany them.

Target Audience
Individuals who are new to big data, Hadoop, and data modeling, and wish to understand key concepts and features of Hadoop and its tools

Introduction to Hadoop

Course Number:
df_dmhp_a01_it_enus
Lesson Objectives

  • start the course
  • recognize what Big Data is, its sources and types, its evolution and characteristics, and its use cases
  • identify Big Data infrastructure issues and explain the benefits of Hadoop
  • recognize the basics of Hadoop, including its history, milestones, and core components
  • set up a virtual machine
  • install Linux on a virtual machine
  • recognize basic and commonly used UNIX commands
  • identify Hadoop components
  • define HDFS components
  • recognize how to read and write in HDFS
  • use HDFS
  • recognize basics of YARN
  • define basics of MapReduce
  • identify how MapReduce processes information
  • use code that runs on Hadoop (a MapReduce sketch follows this list)
  • define Pig, Hive, and HBase
  • define Sqoop, Flume, Mahout, and Oozie
  • recognize how data is stored and modeled in Hadoop
  • identify available commercial distributions for Hadoop
  • recognize Spark and its benefits over traditional MapReduce
  • filter information in Hadoop

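Several objectives above (the basics of MapReduce, how MapReduce processes information, and running code on Hadoop) are easiest to picture with a small program. The sketch below is the classic word count written against the org.apache.hadoop.mapreduce API; it is an illustration added here rather than the course's own code, and the input and output paths are simply taken from the command line.

```java
// WordCount.java -- a minimal MapReduce sketch (the classic word count),
// offered as an illustration only; not the course's own example.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A typical run, assuming the class is packaged into a jar named wordcount.jar, would be: hadoop jar wordcount.jar WordCount /user/demo/input /user/demo/output.
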
Overview/Description
Hadoop is an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. This course introduces Hadoop, its key tools, and their applications.

Target Audience
Individuals who are new to big data, Hadoop, and data modeling, and wish to understand key concepts and features of Hadoop and its tools
