
The Big Data Bundle

Add to Cart - $45
$781 value (94% off)
Courses: 9
Lessons: 418
Enrolled: 1,937

What's Included


Product Details

Access: Lifetime
Content: 15 hours
Lessons: 86

From 0 to 1 : Hive for Big Data Processing

Connect the Dots Between SQL & Hive to Enhance Your Big Data Processing Skills

By LoonyCorn | in Online Courses

Hive is a Big Data processing tool that helps you leverage the power of distributed computing and Hadoop for analytical processing. Its interface is somewhat similar to SQL, but with some key differences. This course is an end-to-end guide to using Hive and connecting the dots to SQL, making it a great fit for professional and aspiring data analysts and engineers alike. Don't know SQL? No problem, there's a primer included in this course!
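
To give a flavor of what this looks like in practice, here is a minimal sketch (not taken from the course) of creating a partitioned, bucketed table and running an analytical query. It assumes the PyHive client, a local HiveServer2 on port 10000, and a made-up page_views table.

    # Illustrative only: HiveQL issued from Python via PyHive.
    from pyhive import hive

    conn = hive.Connection(host="localhost", port=10000, database="default")
    cursor = conn.cursor()

    # A partitioned, bucketed table -- the layout ideas the course uses
    # to prune and parallelize queries.
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS page_views (
            user_id BIGINT,
            url     STRING,
            dwell_s DOUBLE
        )
        PARTITIONED BY (view_date STRING)
        CLUSTERED BY (user_id) INTO 16 BUCKETS
        STORED AS ORC
    """)

    # An analytical query against a single partition.
    cursor.execute("""
        SELECT url, AVG(dwell_s) AS avg_dwell
        FROM page_views
        WHERE view_date = '2016-01-01'
        GROUP BY url
        ORDER BY avg_dwell DESC
        LIMIT 10
    """)
    for url, avg_dwell in cursor.fetchall():
        print(url, avg_dwell)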

  • Access 86 lectures & 15 hours of content 24/7
  • Write complex analytical queries on data in Hive & uncover insights
  • Leverage ideas of partitioning & bucketing to optimize queries in Hive
  • Customize Hive w/ user defined functions in Java & Python
  • Understand what goes on under the hood of Hive w/ HDFS & MapReduce

Loonycorn comprises four individuals (Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi, and Navdeep Singh) who have honed their tech expertise at Google and Flipkart. The team believes it has distilled complicated tech concepts into funny, practical, engaging courses, and is excited to share them with eager students.

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but knowledge of SQL and Java would be helpful

Compatibility

  • Internet required

Course Outline

  • You, Us & This Course
    • You, Us & This Course (2:02)
  • Introducing Hive
    • Hive: An Open-Source Data Warehouse (12:59)
    • Hive and Hadoop (9:19)
    • Hive vs Traditional Relational DBMS (13:52)
    • HiveQL and SQL (7:20)
  • Hadoop and Hive Install
    • Hadoop Install Modes (8:32)
    • Setting up a Virtual Linux Instance - For Windows Users (13:50)
    • Hadoop Install Step 1 : Standalone Mode (9:33)
    • Hadoop Install Step 2 : Pseudo-Distributed Mode (14:25)
    • Hive install (12:05)
    • Code-Along: Getting started (6:24)
  • Hadoop and HDFS Overview
    • What is Hadoop? (7:25)
    • HDFS or the Hadoop Distributed File System (11:01)
  • Hive Basics
    • Primitive Datatypes (17:07)
    • Collections - Arrays and Maps (9:28)
    • Structs and Unions (5:57)
    • Create Table (13:15)
    • Insert Into Table (12:05)
    • Insert into Table 2 (6:51)
    • Alter Table (7:22)
    • HDFS (9:25)
    • HDFS CLI - Interacting with HDFS (10:59)
    • Code-Along: Create Table (9:54)
    • Code-Along : Hive CLI (3:06)
  • Built-in Functions
    • Three types of Hive functions (6:45)
    • The Case-When statement, the Size function, the Cast function (10:09)
    • The Explode function (13:07)
    • Code-Along : Hive Built - in functions (4:28)
  • Sub-Queries
    • Quirky Sub-Queries (7:13)
    • More on subqueries: Exists and In (15:13)
    • Inserting via subqueries (5:23)
    • Code-Along : Use Subqueries to work with Collection Datatypes (5:57)
    • Views (12:18)
  • Partitioning
    • Indices (6:40)
    • Partitioning Introduced (6:36)
    • The Rationale for Partitioning (6:16)
    • How Tables are Partitioned (9:52)
    • Using Partitioned Tables (5:27)
    • Dynamic Partitioning: Inserting data into partitioned tables (12:44)
    • Code-Along : Partitioning (4:03)
  • Bucketing
    • Introducing Bucketing (11:56)
    • The Advantages of Bucketing (4:54)
    • How Tables are Bucketed
    • Using Bucketed Tables (7:22)
    • Sampling (11:13)
  • Windowing
    • Windowing Introduced (12:59)
    • Windowing - A Simple Example: Cumulative Sum (9:39)
    • Windowing - A More Involved Example: Partitioning (11:55)
    • Windowing - Special Aggregation Functions (15:08)
  • Understanding MapReduce
    • The basic philosophy underlying MapReduce (8:49)
    • MapReduce - Visualized and Explained (9:03)
    • MapReduce - Digging a little deeper at every step (10:21)
  • MapReduce logic for queries: Behind the scenes
    • MapReduce Overview: Basic Select-From-Where (11:33)
    • MapReduce Overview: Group-By and Having (9:12)
    • MapReduce Overview: Joins (14:17)
  • Join Optimizations in Hive
    • Improving Join performance with tables of different sizes (13:12)
    • The Where clause in Joins (4:52)
    • The Left Semi Join (12:11)
    • Map Side Joins: The Inner Join (9:41)
    • Map Side Joins: The Left, Right and Full Outer Joins (11:36)
    • Map Side Joins: The Bucketed Map Join and the Sorted Merge Join (7:52)
  • Custom Functions in Python
    • Custom functions in Python (10:40)
    • Code-Along : Custom Function in Python (5:45)
  • Custom functions in Java
    • Introducing UDFs - you're not limited by what Hive offers (4:38)
    • The Simple UDF: The standard function for primitive types (7:03)
    • The Simple UDF: Java implementation for replacetext() (8:34)
    • Generic UDFs, the Object Inspector and DeferredObjects (13:50)
    • The Generic UDF: Java implementation for containsstring() (9:11)
    • The UDAF: Custom aggregate functions can get pretty complex (14:09)
    • The UDAF: Java implementation for max() (9:21)
    • The UDAF: Java implementation for Standard Deviation (10:47)
    • The Generic UDTF: Custom table generating functions (7:38)
    • The Generic UDTF: Java implementation for namesplit() (10:21)
  • SQL Primer - Select Statements
    • Select Statements (11:46)
    • Select Statements 2 (14:11)
    • Operator Functions (6:55)
  • SQL Primer - Group By, Order By and Having
    • Aggregation Operators Introduced (18:15)
    • The Group By Clause (17:19)
    • More Group By Examples (19:46)
    • Order By (16:15)
    • Having (19:52)
  • SQL Primer - Joins
    • Introduction to SQL Joins (9:54)
    • Cross Joins aka Cartesian Joins (17:02)
    • Inner Joins (19:52)
    • Left Outer Joins (15:31)
    • Right, Full Outer Joins, Natural Joins, Self Joins (16:08)

View Full Curriculum


Access: Lifetime
Content: 13 hours
Lessons: 71

Learn By Example: Hadoop & MapReduce for Big Data Problems

Discover Mass Data Processing Methods by Using the Leading Data Frameworks

By Loonycorn | in Online Courses

Big Data sounds pretty daunting, doesn't it? Well, this course aims to make it a lot simpler for you. Using Hadoop and MapReduce, you'll learn how to process and manage enormous amounts of data efficiently. Any company that collects massive amounts of data, from startups to the Fortune 500, needs people fluent in Hadoop and MapReduce, making this course a must for anybody interested in data science.
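
As a taste of the MapReduce model (and of the Streaming API the course touches on), here is a small illustrative word-count sketch in Python, not taken from the course; the jar name and HDFS paths in the comment are placeholders.

    #!/usr/bin/env python3
    # Word count in the Hadoop Streaming style: run this file once as the
    # mapper and once as the reducer, e.g.
    #   hadoop jar hadoop-streaming.jar \
    #     -input /data/books -output /data/counts \
    #     -mapper "wordcount.py map" -reducer "wordcount.py reduce" \
    #     -file wordcount.py
    import sys

    def mapper():
        # Emit "word<TAB>1" for every word on stdin.
        for line in sys.stdin:
            for word in line.strip().split():
                print(f"{word}\t1")

    def reducer():
        # Streaming sorts by key, so all counts for one word arrive together.
        current, total = None, 0
        for line in sys.stdin:
            word, count = line.rstrip("\n").split("\t")
            if word != current:
                if current is not None:
                    print(f"{current}\t{total}")
                current, total = word, 0
            total += int(count)
        if current is not None:
            print(f"{current}\t{total}")

    if __name__ == "__main__":
        mapper() if sys.argv[1:] == ["map"] else reducer()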

  • Access 71 lectures & 13 hours of content 24/7
  • Set up your own Hadoop cluster using virtual machines (VMs) & the Cloud
  • Understand HDFS, MapReduce & YARN & their interaction
  • Use MapReduce to recommend friends in a social network, build search engines & generate bigrams
  • Chain multiple MapReduce jobs together
  • Write your own customized partitioner
  • Learn to globally sort a large amount of data by sampling input files

Loonycorn comprises four individuals (Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi, and Navdeep Singh) who have honed their tech expertise at Google and Flipkart. The team believes it has distilled complicated tech concepts into funny, practical, engaging courses, and is excited to share them with eager students.

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels

Compatibility

  • Internet required

Course Outline

  • Introduction
    • You, this course and Us (1:52)
  • Why is Big Data a Big Deal
    • The Big Data Paradigm (14:20)
    • Serial vs Distributed Computing (8:37)
    • What is Hadoop? (7:25)
    • HDFS or the Hadoop Distributed File System (11:01)
    • MapReduce Introduced (11:39)
    • YARN or Yet Another Resource Negotiator (4:00)
  • Installing Hadoop in a Local Environment
    • Hadoop Install Modes (8:32)
    • Setup a Virtual Linux Instance (For Windows users) (15:31)
    • Hadoop Standalone mode Install (9:33)
    • Hadoop Pseudo-Distributed mode Install (14:25)
  • The MapReduce "Hello World"
    • The basic philosophy underlying MapReduce (8:49)
    • MapReduce - Visualized And Explained (9:03)
    • MapReduce - Digging a little deeper at every step (10:21)
    • "Hello World" in MapReduce (10:29)
    • The Mapper (9:48)
    • The Reducer (7:46)
    • The Job (12:28)
  • Run a MapReduce Job
    • Get comfortable with HDFS (10:59)
    • Run your first MapReduce Job (14:30)
  • Juicing your MapReduce - Combiners, Shuffle and Sort and The Streaming API
    • Parallelize the reduce phase - use the Combiner (14:40)
    • Not all Reducers are Combiners (14:31)
    • How many mappers and reducers does your MapReduce have? (8:23)
    • Parallelizing reduce using Shuffle And Sort (14:55)
    • MapReduce is not limited to the Java language - Introducing the Streaming API (5:05)
    • Python for MapReduce (12:19)
  • HDFS and Yarn
    • HDFS - Protecting against data loss using replication (15:32)
    • HDFS - Name nodes and why they're critical (6:48)
    • HDFS - Checkpointing to backup name node information (11:10)
    • Yarn - Basic components (8:33)
    • Yarn - Submitting a job to Yarn (13:10)
    • Yarn - Plug in scheduling policies (14:21)
    • Yarn - Configure the scheduler (12:26)
  • Setting up a Hadoop Cluster
    • Manually configuring a Hadoop cluster (Linux VMs) (13:50)
    • Getting started with Amazon Web Services (6:25)
    • Start a Hadoop Cluster with Cloudera Manager on AWS (13:04)
  • MapReduce Customizations For Finer Grained Control
    • Setting up your MapReduce to accept command line arguments (13:47)
    • The Tool, ToolRunner and GenericOptionsParser (12:36)
    • Configuring properties of the Job object (10:41)
    • Customizing the Partitioner, Sort Comparator, and Group Comparator (15:16)
  • The Inverted Index, Custom Data Types for Keys, Bigram Counts and Unit Tests!
    • The heart of search engines - The Inverted Index (14:41)
    • Generating the inverted index using MapReduce (10:25)
    • Custom data types for keys - The Writable Interface (10:23)
    • Represent a Bigram using a WritableComparable (13:13)
    • MapReduce to count the Bigrams in input text (8:26)
    • Test your MapReduce job using MRUnit (13:41)
  • Input and Output Formats and Customized Partitioning
    • Introducing the File Input Format (12:48)
    • Text And Sequence File Formats (10:21)
    • Data partitioning using a custom partitioner (7:11)
    • Make the custom partitioner real in code (10:25)
    • Total Order Partitioning (10:10)
    • Input Sampling, Distribution, Partitioning and configuring these (9:04)
    • Secondary Sort (14:34)
  • Recommendation Systems using Collaborative Filtering
    • Introduction to Collaborative Filtering (7:25)
    • Friend recommendations using chained MR jobs (17:15)
    • Get common friends for every pair of users - the first MapReduce (14:50)
    • Top 10 friend recommendation for every user - the second MapReduce (13:46)
  • Hadoop as a Database
    • Structured data in Hadoop (14:08)
    • Running an SQL Select with MapReduce (15:31)
    • Running an SQL Group By with MapReduce (14:02)
    • A MapReduce Join - The Map Side (14:20)
    • A MapReduce Join - The Reduce Side (13:08)
    • A MapReduce Join - Sorting and Partitioning (8:49)
    • A MapReduce Join - Putting it all together (13:46)
  • K-Means Clustering
    • What is K-Means Clustering? (14:04)
    • A MapReduce job for K-Means Clustering (16:33)
    • K-Means Clustering - Measuring the distance between points (13:52)
    • K-Means Clustering - Custom Writables for Input/Output (8:26)
    • K-Means Clustering - Configuring the Job (10:50)
    • K-Means Clustering - The Mapper and Reducer (11:23)
    • K-Means Clustering : The Iterative MapReduce Job (3:40)

View Full Curriculum


Access: Lifetime
Content: 8 hours
Lessons: 52

From 0 to 1 : Spark for Data Science in Python

Make Your Data Fly Using Spark for Analytics, Machine Learning, & Data Science

By LoonyCorn | in Online Courses

Analysts and data scientists typically have to work with several systems to effectively manage massive data sets. Spark, by contrast, gives you a single engine for exploring and working with large amounts of data, running machine learning algorithms, and performing many other functions in one interactive environment. This course's focus on new and innovative technologies in data science and machine learning makes it an excellent choice for anyone who wants to work in the lucrative, growing field of Big Data.
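
For a sense of the RDD style the course teaches, here is a small illustrative PySpark sketch, not from the course itself; the flights.csv path and its column layout are invented for the example.

    # Average delay per origin airport with plain RDD operations.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "airline-delays")

    # Assume each line is: origin,destination,delay_minutes
    flights = sc.textFile("flights.csv").map(lambda line: line.split(","))
    delays = flights.map(lambda f: (f[0], float(f[2])))

    # combineByKey builds a (sum, count) pair per airport in one pass.
    totals = delays.combineByKey(
        lambda d: (d, 1),                          # create accumulator
        lambda acc, d: (acc[0] + d, acc[1] + 1),   # fold in a value
        lambda a, b: (a[0] + b[0], a[1] + b[1]),   # merge accumulators
    )
    averages = totals.mapValues(lambda s: s[0] / s[1])

    for airport, avg in averages.sortBy(lambda kv: -kv[1]).take(5):
        print(airport, avg)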

  • Access 52 lectures & 8 hours of content 24/7
  • Use Spark for a variety of analytics & machine learning tasks
  • Implement complex algorithms like PageRank & Music Recommendations
  • Work w/ a variety of datasets from airline delays to Twitter, web graphs, & product ratings
  • Employ all the different features & libraries of Spark, like RDDs, Dataframes, Spark SQL, MLlib, Spark Streaming & GraphX

Loonycorn comprises four individuals (Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi, and Navdeep Singh) who have honed their tech expertise at Google and Flipkart. The team believes it has distilled complicated tech concepts into funny, practical, engaging courses, and is excited to share them with eager students.

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but some knowledge of Python and Java is assumed

Compatibility

  • Internet required

Course Outline

  • You, This Course and Us
    • You, This Course and Us (2:15)
  • Introduction to Spark
    • What does Donald Rumsfeld have to do with data analysis? (8:45)
    • Why is Spark so cool? (12:23)
    • An introduction to RDDs - Resilient Distributed Datasets (9:39)
    • Built-in libraries for Spark (15:37)
    • Installing Spark (6:42)
    • The PySpark Shell (4:51)
    • Transformations and Actions (13:33)
    • See it in Action : Munging Airlines Data with PySpark - I (10:13)
  • Resilient Distributed Datasets
    • RDD Characteristics: Partitions and Immutability (12:35)
    • RDD Characteristics: Lineage, RDDs know where they came from (6:06)
    • What can you do with RDDs? (11:09)
    • Create your first RDD from a file (16:11)
    • Average distance travelled by a flight using map() and reduce() operations (5:50)
    • Get delayed flights using filter(), cache data using persist() (5:24)
    • Average flight delay in one-step using aggregate() (15:10)
    • Frequency histogram of delays using countByValue() (3:26)
    • See it in Action : Analyzing Airlines Data with PySpark - II (6:25)
  • Advanced RDDs: Pair Resilient Distributed Datasets
    • Special Transformations and Actions (14:45)
    • Average delay per airport, use reduceByKey(), mapValues() and join() (18:11)
    • Average delay per airport in one step using combineByKey() (11:53)
    • Get the top airports by delay using sortBy() (4:34)
    • Lookup airport descriptions using lookup(), collectAsMap(), broadcast() (14:03)
    • See it in Action : Analyzing Airlines Data with PySpark - III (4:58)
  • Advanced Spark: Accumulators, Spark Submit, MapReduce , Behind The Scenes
    • Get information from individual processing nodes using accumulators (13:35)
    • See it in Action : Using an Accumulator variable (2:41)
    • Long running programs using spark-submit (5:58)
    • See it in Action : Running a Python script with Spark-Submit (3:58)
    • Behind the scenes: What happens when a Spark script runs? (14:30)
    • Running MapReduce operations (13:44)
    • See it in Action : MapReduce with Spark (2:05)
  • Java and Spark
    • The Java API and Function objects (15:59)
    • Pair RDDs in Java (4:49)
    • Running Java code (3:49)
    • Installing Maven (2:20)
    • See it in Action : Running a Spark Job with Java (5:09)
  • PageRank: Ranking Search Results
    • What is PageRank? (16:44)
    • The PageRank algorithm (6:15)
    • Implement PageRank in Spark (12:01)
    • Join optimization in PageRank using Custom Partitioning (7:27)
    • See it Action : The PageRank algorithm using Spark (3:46)
  • Spark SQL
    • Dataframes: RDDs + Tables (16:05)
    • See it in Action : Dataframes and Spark SQL (4:50)
  • MLlib in Spark: Build a recommendations engine
    • Collaborative filtering algorithms (12:19)
    • Latent Factor Analysis with the Alternating Least Squares method (11:39)
    • Music recommendations using the Audioscrobbler dataset (7:51)
    • Implement code in Spark using MLlib (16:05)
  • Spark Streaming
    • Introduction to streaming (9:55)
    • Implement stream processing in Spark using Dstreams (10:54)
    • Stateful transformations using sliding windows (9:26)
    • See it in Action : Spark Streaming (4:17)
  • Graph Libraries
    • The Marvel social network using Graphs (18:01)

View Full Curriculum


Access: Lifetime
Content: 8.5 hours
Lessons: 51

Scalable Programming with Scala & Spark

Get Rich Using Scala & Spark for Data Analysis, Machine Learning & Analytics

By LoonyCorn | in Online Courses

The functional programming nature and the availability of a REPL environment make Scala particularly well suited for a distributed computing framework like Spark. Using these two technologies in tandem can allow you to effectively analyze and explore data in an interactive environment with extremely fast feedback. This course will teach you how to best combine Spark and Scala, making it perfect for aspiring data analysts and Big Data engineers.
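
The course writes its examples in Scala, but the Spark API reads almost the same from Python. As an illustration (not course code), here is the PageRank iteration from the outline sketched with RDDs; the tiny link graph is made up.

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "pagerank-sketch")

    # page -> list of pages it links to
    links = sc.parallelize([
        ("a", ["b", "c"]),
        ("b", ["c"]),
        ("c", ["a"]),
    ]).cache()
    ranks = links.mapValues(lambda _: 1.0)

    for _ in range(10):
        # Each page splits its current rank among its outgoing links...
        contribs = links.join(ranks).flatMap(
            lambda kv: [(dest, kv[1][1] / len(kv[1][0])) for dest in kv[1][0]])
        # ...and ranks are rebuilt with the usual damping factor.
        ranks = (contribs.reduceByKey(lambda a, b: a + b)
                         .mapValues(lambda r: 0.15 + 0.85 * r))

    print(sorted(ranks.collect(), key=lambda kv: -kv[1]))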

  • Access 51 lectures & 8.5 hours of content 24/7
  • Use Spark for a variety of analytics & machine learning tasks
  • Understand functional programming constructs in Scala
  • Implement complex algorithms like PageRank & Music Recommendations
  • Work w/ a variety of datasets from airline delays to Twitter, web graphs, & Product Ratings
  • Use the different features & libraries of Spark, like RDDs, Dataframes, Spark SQL, MLlib, Spark Streaming, & GraphX
  • Write code in Scala REPL environments & build Scala applications w/ an IDE

Loonycorn comprises four individuals (Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi, and Navdeep Singh) who have honed their tech expertise at Google and Flipkart. The team believes it has distilled complicated tech concepts into funny, practical, engaging courses, and is excited to share them with eager students.

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but some knowledge of Java or C++ is assumed

Compatibility

  • Internet required

Course Outline

  • You, This Course and Us
    • You, This Course and Us (2:16)
  • Introducing Scala
    • Scala - A "better Java"? (10:13)
    • Installing Scala and Hello World (9:43)
    • How do Classes work in Scala? (11:02)
    • Classes in Scala - continued (15:50)
    • Functions are different from Methods (7:31)
    • Collections in Scala (10:12)
    • Map, Flatmap - The Functional way of looping (11:36)
    • First Class Functions revisited (8:46)
    • Partially Applied Functions (7:31)
    • Closures (8:07)
    • Currying (10:34)
  • Introduction to Spark
    • What does Donald Rumsfeld have to do with data analysis? (8:45)
    • Why is Spark so cool? (12:23)
    • An introduction to RDDs - Resilient Distributed Datasets (9:39)
    • Built-in libraries for Spark (15:37)
    • Installing Spark (11:44)
    • The Spark Shell (6:55)
    • See it in Action : Munging Airlines Data with Spark (3:44)
    • Transformations and Actions (17:06)
  • Resilient Distributed Datasets
    • RDD Characteristics: Partitions and Immutability (12:35)
    • RDD Characteristics: Lineage, RDDs know where they came from (6:06)
    • What can you do with RDDs? (11:09)
    • Create your first RDD from a file (14:54)
    • Average distance travelled by a flight using map() and reduce() operations (6:59)
    • Get delayed flights using filter(), cache data using persist() (6:11)
    • Average flight delay in one-step using aggregate() (12:21)
    • Frequency histogram of delays using countByValue() (2:10)
  • Advanced RDDs: Pair Resilient Distributed Datasets
    • Special Transformations and Actions (14:45)
    • Average delay per airport, use reduceByKey(), mapValues() and join() (13:35)
    • Average delay per airport in one step using combineByKey() (8:23)
    • Get the top airports by delay using sortBy() (2:51)
    • Lookup airport descriptions using lookup(), collectAsMap(), broadcast() (10:57)
  • Advanced Spark: Accumulators, Spark Submit, MapReduce , Behind The Scenes
    • Get information from individual processing nodes using accumulators (9:25)
    • Long running programs using spark-submit (7:11)
    • Spark-Submit with Scala - A demo (6:10)
    • Behind the scenes: What happens when a Spark script runs? (14:30)
    • Running MapReduce operations (10:53)
  • PageRank: Ranking Search Results
    • What is PageRank? (16:44)
    • The PageRank algorithm (6:15)
    • Implement PageRank in Spark (9:45)
    • Join optimization in PageRank using Custom Partitioning (6:28)
  • Spark SQL
    • Dataframes: RDDs + Tables (15:48)
  • MLlib in Spark: Build a recommendations engine
    • Collaborative filtering algorithms (12:19)
    • Latent Factor Analysis with the Alternating Least Squares method (11:39)
    • Music recommendations using the Audioscrobbler dataset (5:38)
    • Implement code in Spark using MLlib (14:45)
  • Spark Streaming
    • Introduction to streaming (9:55)
    • Implement stream processing in Spark using Dstreams (9:19)
    • Stateful transformations using sliding windows (8:17)
  • Graph Libraries
    • The Marvel social network using Graphs (14:30)

View Full Curriculum


Access: Lifetime
Content: 4.5 hours
Lessons: 41

Learn by Example: HBase - The Hadoop Database

Create More Flexible Databases by Mastering HBase

By LoonyCorn | in Online Courses

For Big Data engineers and data analysts, HBase is an extremely effective database tool for organizing and managing massive data sets. HBase offers an increased level of flexibility, providing column-oriented storage, no fixed schema, and low latency to accommodate the dynamically changing needs of applications. With the 25 examples contained in this course, you'll get a complete grasp of HBase that you can leverage in interviews for Big Data positions.
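
To make the CRUD operations concrete, here is a small illustrative sketch using the happybase Python client rather than the shell or Java API used in the course. It assumes a local HBase Thrift server; the table, column family, and row keys are invented.

    import happybase

    connection = happybase.Connection("localhost")

    # One column family, "info"; assumes the table does not exist yet.
    connection.create_table("notifications", {"info": dict()})
    table = connection.table("notifications")

    # Create / update: a put against a row key.
    table.put(b"user1|2016-01-01", {b"info:type": b"email",
                                    b"info:status": b"unread"})

    # Read: one row, then a range scan over a single user's notifications.
    print(table.row(b"user1|2016-01-01"))
    for key, data in table.scan(row_start=b"user1|", row_stop=b"user1~"):
        print(key, data)

    # Delete: remove the row entirely.
    table.delete(b"user1|2016-01-01")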

  • Access 41 lectures & 4.5 hours of content 24/7
  • Set up a database for your application using HBase
  • Integrate HBase w/ MapReduce for data processing tasks
  • Create tables, insert, read & delete data from HBase
  • Get a complete understanding of HBase & its role in the Hadoop ecosystem
  • Explore CRUD operations in the shell, & with the Java API

Loonycorn comprises four individuals (Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi, and Navdeep Singh) who have honed their tech expertise at Google and Flipkart. The team believes it has distilled complicated tech concepts into funny, practical, engaging courses, and is excited to share them with eager students.

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but some knowledge of Java is assumed

Compatibility

  • Internet required

Course Outline

  • You, This Course and Us
    • You, This Course and Us (1:50)
  • Introduction to HBase
    • The problem with distributed computing (7:17)
    • Installing HBase (10:57)
    • The Hadoop ecosystem (8:01)
    • The role of HBase in the Hadoop ecosystem (9:42)
    • How is HBase different from RDBMS? (3:10)
    • HBase Data Model (10:44)
    • Introducing CRUD operations (8:32)
    • HBase is different from Hive (4:48)
  • CRUD operations using the HBase Shell
    • Example1 - Creating a table for User Notifications (5:24)
    • Example 2 - Inserting a row (19:52)
    • Example 3 - Updating a row (19:15)
    • Example 4 - Retrieving a row (20:25)
    • Example 5 - Retrieving a range of rows (3:48)
    • Example 6 - Deleting a row (2:11)
    • Example 7 - Deleting a table (2:17)
  • CRUD operations using the Java API
    • Example 8 - Creating a table with HBaseAdmin (6:36)
    • Example 9 - Inserting a row using a Put object (8:33)
    • Example 10 - Inserting a list of Puts (3:30)
    • Example 11 - Retrieving data - Get and Result objects (10:55)
    • Example 12 - A list of Gets (3:34)
    • Example 13 - Deleting a row (2:25)
    • Example 14 - A list of Deletes (2:36)
    • Example 15 - Mix and match with batch operations (6:02)
    • Example 16 - Scanning a range of rows (8:06)
    • Example 17 - Deleting a table (3:51)
  • HBase Architecture
    • HBase Architecture (9:20)
  • Advanced operations - Filters and Counters
    • Example 18 - Filter by Row id - RowFilter (8:56)
    • Example 19 - Filter by column value - SingleColumnValueFilter (5:13)
    • Example 20 - Apply multiple conditions - Filterlist (4:31)
    • Example 21 - Retrieve rows within a time range (2:11)
    • Example 22 - Atomically incrementing a value with Counters (7:31)
  • MapReduce with HBase
    • Example 23 : A MapReduce task to count Notifications by Type (10:24)
    • Example 23 continued: Implementing the MapReduce in Java (13:35)
    • Demo : Running a MapReduce task (2:21)
  • Build a Notification Service
    • Example 24 : Implement a Notification Hierarchy (13:30)
    • Example 25: Implement a Notifications Manager (12:05)
  • Installing Hadoop in a Local Environment
    • Hadoop Install Modes (8:32)
    • Setup a Virtual Linux Instance (For Windows users) (15:31)
    • Hadoop Standalone mode Install (9:33)
    • Hadoop Pseudo-Distributed mode Install (14:25)

View Full Curriculum


Access: Lifetime
Content: 5 hours
Lessons: 34

Pig for Wrangling Big Data

Become a Well-Paid Data Handler by Learning to Load, Transform & Extract Data Using Pig

By LoonyCorn | in Online Courses

Think about the last time you saw a completely unorganized spreadsheet. Now imagine that spreadsheet was 100,000 times larger. Mind-boggling, right? That's why there's Pig. Pig works with unstructured data to wrestle it into a more palatable form that can be stored in a data warehouse for reporting and analysis. With the massive sets of disorganized data many companies are working with today, people who can work with Pig are in major demand. By the end of this course, you could qualify as one of those people.
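
For a taste of what that wrangling looks like, here is an illustrative Pig Latin script for summarizing server-log errors (the log layout and file names are made up), wrapped in a little Python just to run it in Pig's local mode.

    import pathlib
    import subprocess

    script = r"""
    logs    = LOAD 'server.log' USING PigStorage('\t')
              AS (ip:chararray, time:chararray, status:int, url:chararray);
    errors  = FILTER logs BY status >= 400;   -- keep only error responses
    by_url  = GROUP errors BY url;
    summary = FOREACH by_url GENERATE group AS url, COUNT(errors) AS hits;
    ranked  = ORDER summary BY hits DESC;
    STORE ranked INTO 'error_summary' USING PigStorage(',');
    """

    pathlib.Path("errors.pig").write_text(script)
    subprocess.run(["pig", "-x", "local", "errors.pig"], check=True)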

  • Access 34 lectures & 5 hours of content 24/7
  • Clean up server logs using Pig
  • Work w/ unstructured data to extract information, transform it, & store it in a usable form
  • Write intermediate level Pig scripts to munge data
  • Optimize Pig operations to work on large data sets

Loonycorn comprises four individuals (Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi, and Navdeep Singh) who have honed their tech expertise at Google and Flipkart. The team believes it has distilled complicated tech concepts into funny, practical, engaging courses, and is excited to share them with eager students.

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but some knowledge of SQL, Hadoop, and MapReduce is assumed

Compatibility

  • Internet required

Course Outline

  • You, This Course and Us
    • You, This Course and Us (1:46)
  • Where does Pig fit in?
    • Pig and the Hadoop ecosystem (9:37)
    • Install and set up (8:50)
    • How does Pig compare with Hive? (10:15)
    • Pig Latin as a data flow language (6:17)
    • Pig with HBase (5:18)
  • Pig Basics
    • Operating modes, running a Pig script, the Grunt shell (9:52)
    • Loading data and creating our first relation (8:45)
    • Scalar data types (9:55)
    • Complex data types - The Tuple, Bag and Map (13:45)
    • Partial schema specification for relations (10:00)
    • Displaying and storing relations - The dump and store commands
  • Pig Operations And Data Transformations
    • Selecting fields from a relation (10:22)
    • Built-in functions (5:08)
    • Evaluation functions (10:31)
    • Using the distinct, limit and order by keywords (5:04)
    • Filtering records based on a predicate (11:01)
  • Advanced Data Transformations
    • Group by and aggregate transformations (12:12)
    • Combining datasets using Join (16:19)
    • Concatenating datasets using Union (4:32)
    • Generating multiple records by flattening complex fields (5:24)
    • Using Co-Group, Semi-Join and Sampling records (9:26)
    • The nested Foreach command (13:47)
    • Debug Pig scripts using Explain and Illustrate (12:55)
  • Optimizing Data Transformations
    • Parallelize operations using the Parallel keyword (8:02)
    • Join Optimizations: Multiple relations join, large and small relation join (10:34)
    • Join Optimizations: Skew join and sort-merge join (8:51)
    • Common sense optimizations (5:25)
  • A real-world example
    • Parsing server logs (7:55)
    • Summarizing error logs (8:47)
  • Installing Hadoop in a Local Environment
    • Hadoop Install Modes (8:32)
    • Setup a Virtual Linux Instance (For Windows users) (15:31)
    • Hadoop Standalone mode Install (9:33)
    • Hadoop Pseudo-Distributed mode Install (14:25)

View Full Curriculum


Access: Lifetime
Content: 5.5 hours
Lessons: 44

From 0 to 1 : The Cassandra Distributed Database

Learn the Cassandra Distributed Database & Greatly Improve Your Big Data Resume

By LoonyCorn | in Online Courses

Data sets can outgrow traditional databases, much like children outgrow clothes. Unlike children's growth patterns, however, massive amounts of data can be extremely unpredictable and unstructured. For Big Data, the Cassandra distributed database is the solution, using partitioning and replication to ensure that your data is structured and available even when nodes in a cluster go down. Children, you're on your own.
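
As an illustration of the kind of keyspace and primary-key design the course covers, here is a small sketch using the DataStax Python driver (the course itself uses the Java driver, but the CQL is the same). The local single-node cluster and the catalog/listings names are assumptions.

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS catalog
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)
    session.set_keyspace("catalog")

    # Partition key = product_id, clustering key = listing_date: all rows
    # for a product sit in one partition, ordered by date within it.
    session.execute("""
        CREATE TABLE IF NOT EXISTS listings (
            product_id   text,
            listing_date text,
            price        double,
            PRIMARY KEY (product_id, listing_date)
        )
    """)

    session.execute(
        "INSERT INTO listings (product_id, listing_date, price) VALUES (%s, %s, %s)",
        ("widget-42", "2016-01-01", 9.99),
    )
    for row in session.execute(
            "SELECT * FROM listings WHERE product_id = %s", ("widget-42",)):
        print(row.product_id, row.listing_date, row.price)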

  • Access 44 lectures & 5.5 hours of content 24/7
  • Set up & manage a cluster using the Cassandra Cluster Manager (CCM)
  • Create keyspaces, column families, & perform CRUD operations using the Cassandra Query Language (CQL)
  • Design primary keys & secondary indexes, & learn partitioning & clustering keys
  • Understand restrictions on queries based on primary & secondary key design
  • Discover tunable consistency using quorum & local quorum
  • Learn architecture & storage components: Commit Log, MemTable, SSTables, Bloom Filters, Index File, Summary File & Data File
  • Build a Miniature Catalog Management System using the Cassandra Java driver

Loonycorn comprises four individuals (Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi, and Navdeep Singh) who have honed their tech expertise at Google and Flipkart. The team believes it has distilled complicated tech concepts into funny, practical, engaging courses, and is excited to share them with eager students.

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but some knowledge of SQL, Hadoop, and MapReduce is assumed

Compatibility

  • Internet required

Course Outline

  • You, This Course and Us
    • You, This Course and Us (1:45)
  • Introduction: Cassandra as a distributed, decentralized, columnar database
    • A Column Oriented Database (10:40)
    • Requirements For A Product Catalog System (8:08)
    • What Is Cassandra (8:33)
    • Cassandra Vs HBase (4:37)
  • Install And Set Up
    • Install Cassandra (Mac and Unix based systems) (9:54)
    • Install the Cassandra Cluster Manager (Mac and Unix) (2:21)
    • Install Maven On Your Machine (2:20)
  • The Cassandra Cluster Manager
    • Create A Cassandra Cluster On Your Local Machine (11:54)
    • Basic CCM Commands (7:04)
  • The Cassandra Data Model
    • Column And Column Family (8:03)
    • Super Column Family And Keyspace (7:18)
    • Comparing Cassandra With A Relational Database (4:20)
  • Shell Commands
    • Connecting To Cassandra And Creating A Keyspace (6:55)
    • Column Families And Their Properties (12:02)
    • Modifying Column Families (2:42)
    • Insert Data Into A Column Family (6:52)
    • Advanced Data Types Collections And Counters (10:56)
    • Update Simple And Collection Data Types (15:54)
    • Manage Cluster Roles (5:01)
  • Keys And Indexes: Primary Keys, Partition Keys, Clustering Keys, Secondary Indexes
    • Partition Keys: Distributing Data Across Cluster Nodes (12:15)
    • Partition Keys: Properties (5:08)
    • Clustering Keys: Data Layout On A Node (3:36)
    • Restrictions On Partition Keys (14:38)
    • Restrictions On Clustering Keys (9:12)
    • Secondary Indexes (8:32)
    • Restrictions On Secondary Indexes (8:52)
    • Allow Filtering (2:27)
  • Tunable Consistency
    • Write Consistency Levels And Hinted Handoff (12:18)
    • Read Consistency Levels (11:19)
    • Replication Factors And Quorum Value (8:14)
  • Storage Systems
    • Overview Of Cassandra Storage Components (6:38)
    • The SS Table And Its Components (9:44)
    • Row Cache And Key Cache (3:14)
    • Anatomy Of A Write Request (8:33)
    • Anatomy Of A Read Request And The Gossip Protocol (7:25)
  • A Mini-Project: A Miniature Catalog Management System
    • Overview And Basic Setup (4:29)
    • Creating A Session And Executing Our First Query (7:40)
    • Create A Column Family (3:27)
    • Check If A Column Family Has Been Created (4:59)
    • Insert Data Into The Listings Column Family (9:13)
    • Insert Data Into The Products Column Family (9:59)
    • Search For Products (13:32)
    • Delete A Listing (4:17)
    • Update Multiple Column Families Using Logged Batch (14:42)

View Full Curriculum


Access: Lifetime
Content: 3 hours
Lessons: 23

Oozie: Workflow Scheduling for Big Data Systems

Streamline Your Big Data Workflow by Learning to Use Workflows, Coordinators & Bundles in Oozie

By LoonyCorn | in Online Courses

Working with Big Data can, obviously, be a very complex task, which is why it's important to master Oozie. Oozie makes managing a multitude of jobs on different time schedules, and managing entire data pipelines, significantly easier, as long as you know the right configuration parameters. This course will teach you how to best determine those parameters, so your workflow will be significantly streamlined.
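
As a glimpse of those configuration parameters, here is an illustrative sketch (not from the course) that writes a minimal job.properties for a workflow and submits it with the Oozie command-line tool. The host names, ports, and HDFS application path are placeholders.

    import pathlib
    import subprocess

    # Standard workflow properties; values here are placeholders.
    properties = "\n".join([
        "nameNode=hdfs://localhost:8020",
        "jobTracker=localhost:8032",
        "queueName=default",
        "oozie.wf.application.path=${nameNode}/user/hadoop/workflows/wordcount",
        "",
    ])
    pathlib.Path("job.properties").write_text(properties)

    # Submit and start the workflow on a local Oozie server.
    subprocess.run(
        ["oozie", "job", "-oozie", "http://localhost:11000/oozie",
         "-config", "job.properties", "-run"],
        check=True,
    )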

  • Access 23 lectures & 3 hours of content 24/7
  • Install & set up Oozie
  • Configure Workflows to run jobs on Hadoop
  • Create time-triggered & data-triggered Workflows
  • Build & optimize data pipelines using Bundles

Loonycorn comprises four individuals (Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi, and Navdeep Singh) who have honed their tech expertise at Google and Flipkart. The team believes it has distilled complicated tech concepts into funny, practical, engaging courses, and is excited to share them with eager students.

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but some knowledge of Hadoop and MapReduce is assumed

Compatibility

  • Internet required

Course Outline

  • Introduction
    • You, This Course and Us (1:38)
  • A Brief Overview Of Oozie
    • What is Oozie? (11:16)
    • Oozie architectural components (10:46)
  • Oozie Install And Set Up
    • Installing Oozie on your machine (16:29)
  • Workflows: A Directed Acyclic Graph Of Tasks
    • Running MapReduce on the command line (4:41)
    • The lifecycle of a Workflow (6:12)
    • Running our first Oozie Workflow MapReduce application (11:15)
    • The job.properties file (8:45)
    • The workflow.xml file (24:14)
    • A Shell action Workflow (7:46)
    • Control nodes, Action nodes and Global configurations within Workflows (9:57)
  • Coordinators: Managing Workflows
    • Running our first Coordinator application (12:27)
    • A time-triggered Coordinator definition (8:52)
    • Coordinator control mechanisms (7:09)
    • Data availability triggers (10:03)
    • Running a Coordinator which waits for input data (6:11)
    • Coordinator configuration to use data triggers (15:25)
  • Bundles: A Collection Of Coordinators For Data Pipelines
    • Bundles and why we need them (9:15)
    • The Bundle kick-off time (11:12)
  • Installing Hadoop in a Local Environment
    • Hadoop Install Modes (8:32)
    • Setup a Virtual Linux Instance (For Windows users) (15:31)
    • Hadoop Standalone mode Install (9:33)
    • Hadoop Pseudo-Distributed mode Install (14:25)

View Full Curriculum


Access: Lifetime
Content: 2 hours
Lessons: 16

Flume & Sqoop for Ingesting Big Data

Efficiently Import Data to HDFS, HBase & Hive From a Variety of Sources & Watch Your Job Prospects Grow

By LoonyCorn | in Online Courses

Flume and Sqoop are important elements of the Hadoop ecosystem, transporting data from sources like local file systems into data stores such as HDFS, HBase, and Hive. They are essential to organizing and effectively managing Big Data, making Flume and Sqoop great skills to set you apart from other data analysts.
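
To make that concrete, here is an illustrative sketch of a Sqoop import like the ones in the course, driven from Python; the MySQL connection string, credentials, and table name are placeholders.

    import subprocess

    # Copy a MySQL table into HDFS with four parallel map tasks.
    subprocess.run(
        [
            "sqoop", "import",
            "--connect", "jdbc:mysql://localhost/retail",
            "--username", "sqoop_user",
            "--password", "secret",
            "--table", "orders",
            "--target-dir", "/data/orders",
            "--num-mappers", "4",
        ],
        check=True,
    )

    # Swapping --target-dir for --hive-import would load the rows into a
    # Hive table instead of plain HDFS files.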

  • Access 16 lectures & 2 hours of content 24/7
  • Use Flume to ingest data to HDFS & HBase
  • Optimize Sqoop to import data from MySQL to HDFS & Hive
  • Ingest data from a variety of sources including HTTP, Twitter & MySQL

Loonycorn comprises four individuals (Janani Ravi, Vitthal Srinivasan, Swetha Kolalapudi, and Navdeep Singh) who have honed their tech expertise at Google and Flipkart. The team believes it has distilled complicated tech concepts into funny, practical, engaging courses, and is excited to share them with eager students.

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but knowledge of HDFS, HBase, and Hive shells is required

Compatibility

  • Internet required

Course Outline

  • You, This Course and Us
    • You, This Course and Us (1:46)
  • Why do we need Flume and Sqoop?
    • Why do we need Flume and Sqoop? (18:23)
  • Flume
    • Installing Flume (2:43)
    • Flume Agent - the basic unit of Flume (10:57)
    • Example 1 : Spool to Logger (14:34)
    • Flume Events are how data is transported (6:07)
    • Example 2 : Spool to HDFS (9:08)
    • Example 3: HTTP to HDFS (9:24)
    • Example 4: HTTP to HDFS with Event Bucketing (5:40)
    • Example 5: Spool to HBase (6:22)
    • Example 6: Using multiple sinks and Channel selectors (9:43)
    • Example 7: Twitter Source with Interceptors (10:48)
  • Sqoop
    • Installing Sqoop (4:25)
    • Example 8: Sqoop Import from MySQL to HDFS (7:49)
    • Example 9: Sqoop Import from MySQL to Hive (4:26)
    • Example 10: Incremental Imports using Sqoop Jobs (5:24)

View Full Curriculum



Terms

  • Instant digital redemption

15-Day Satisfaction Guarantee

We want you to be happy with every course you purchase! If you're unsatisfied for any reason, we will issue a store credit refund within 15 days of purchase.