Oracle’s newly released course, Oracle Big Data Fundamentals, is designed to enable you to understand:
When Big Data technologies should be used within the larger scope of a Big Data project, and
How they may be used together to provide the highest ROI to the enterprise.
A five-module training course
This redesigned, five-day course groups its lessons into five modules. The course is designed to convey Oracle’s three-phase strategy for working with, and leveraging, Big Data as an Oracle customer. These phases are:
Data Acquisition and Storage (Phase 1)
Data Access and Processing (Phase 2)
Data Unification and Analysis (Phase 3)
Each of these phases includes coverage of Hadoop’s core and ecosystem technologies, as well as Oracle Big Data technologies and products. The three phases comprise the middle three of the five course modules.
First, you learn how Oracle’s Information Management System takes a holistic approach to integrating Big Data (unstructured and semi-structured) with relational data (structured), in order to enable discovery of more of the value embedded in the big data pool.
Then, the three-phase design mentioned previously covers a combination of Big Data technologies – including Hadoop (core components and ecosystem), NoSQL, and Oracle – within the context of how big data is acquired, processed, and analyzed.
Data Acquisition and Storage
– Introduction to HDFS (Hadoop Distributed File System)
– The CLI, FuseDFS, and Flume
– Oracle NoSQL Database (topics include using and administering)
Data Access and Processing
– Introduction to MapReduce
– Apache Hive and Pig
– Cloudera Impala
– Oracle XQuery for Hadoop
– Apache Spark
Data Unification and Analysis
– Apache Sqoop
– Oracle Loader for Hadoop
– Copy to BDA (an Oracle technology)
– Oracle SQL Connector for Hadoop
– Oracle Data Integrator and Oracle GoldenGate
– Oracle Big Data SQL
– Oracle Advanced Analytics (Oracle Data Miner and Oracle R Enterprise)
– Oracle Big Data Discovery
To fully understand Big Data, it is critical to understand Hadoop and both its core and ecosystem components. The Oracle Big Data Fundamentals course presents this critical information in easy-to-understand diagrams and provides hands-on learning.
The last module of the course introduces the Oracle Big Data Appliance (BDA) engineered system, which provides many benefits over a “do-it-yourself” Hadoop cluster. This module provides a high-level overview of some of the features and configurations of the Oracle BDA, along with the tools that you can use to manage the BDA.
Finally, the course describes how to secure data on the Big Data Appliance using strong authentication with Kerberos.
Benefit from a hands-on, case-study approach
The course includes many hands-on exercises throughout and uses the Oracle Big Data Lite Virtual Machine version 4.0.1 for the practice sessions. Oracle Big Data Lite Virtual Machine provides an integrated environment to help you get started with the Oracle Big Data platform. Many Oracle Big Data platform components have been installed and configured – allowing you to begin using the system right away.
Hadoop is an open source project managed by the Apache Software Foundation. Hadoop is a fundamental building block both in capturing and processing big data. At a high level, Hadoop is designed to facilitate parallel processing of massive data volumes across numerous servers. Servers can be added or removed from the cluster dynamically because Hadoop is designed to be “self-healing.” In other words, Hadoop is able to detect changes, including failures, and adjust to those changes and continue to operate without interruption.
Apache Hadoop contains several core components, including:
Hadoop Distributed File System (HDFS) is a distributed file system that stores information and sits on top of the native operating system’s file system.
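The storage idea behind HDFS – a large file is divided into fixed-size blocks, and each block is replicated across several DataNodes for fault tolerance – can be sketched with a small toy model. This is purely illustrative (the block size and node names are hypothetical, scaled-down stand-ins for the real system):

```python
# Toy illustration of HDFS-style block splitting and replication.
# This is NOT the real HDFS implementation -- only a sketch of the idea
# that a file is divided into fixed-size blocks, and each block is
# stored on several DataNodes so that node failures do not lose data.

BLOCK_SIZE = 128          # real HDFS defaults to 128 MB; scaled down here
REPLICATION_FACTOR = 3    # HDFS's default replication factor

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Divide a byte string into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, datanodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct DataNodes (round-robin)."""
    placement = {}
    for idx in range(len(blocks)):
        placement[idx] = [datanodes[(idx + r) % len(datanodes)]
                          for r in range(replication)]
    return placement

data = b"x" * 300                     # a "file" of 300 bytes
blocks = split_into_blocks(data)      # 3 blocks: 128, 128, and 44 bytes
nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
placement = place_replicas(blocks, nodes)
print(len(blocks), placement[0])      # each block lives on 3 distinct nodes
```

In the real system, the NameNode tracks this block-to-DataNode mapping, and replica placement also accounts for rack topology.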
MapReduce and Spark are parallel processing frameworks that operate on local data whenever possible. They abstract the complexity of parallel processing. This enables developers to focus more on the business logic rather than on the processing framework.
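The map/shuffle/reduce pattern these frameworks implement can be illustrated with the classic word-count example, simulated in-process here (a real job would be distributed across a Hadoop cluster; the function names are ours, not a Hadoop API):

```python
# Minimal word-count sketch of the MapReduce programming model,
# run in a single process for illustration only.
from collections import defaultdict
from itertools import chain

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in a line of input.
    return [(word.lower(), 1) for word in line.split()]

def reducer(word, counts):
    # Reduce phase: sum all the counts emitted for one word.
    return word, sum(counts)

def map_reduce(lines):
    # Shuffle step: group intermediate pairs by key, then reduce each group.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(line) for line in lines):
        groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

result = map_reduce(["big data needs big tools",
                     "hadoop processes big data"])
print(result["big"])   # "big" appears three times across both lines
```

The developer writes only the map and reduce logic; the framework handles partitioning the input, shuffling intermediate results, and recovering from node failures.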
Hadoop enables parallel processing of large data sets because the distributed file system has been designed to work in conjunction with the distributed processing framework. This allows for clusters with thousands of nodes.