Practical Hadoop Ecosystem Deepak Vohra

$109.79
Condition - New

Practical Hadoop Ecosystem Summary

Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools by Deepak Vohra

Learn how to use the Apache Hadoop projects, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Kafka, Apache Mahout, and Apache Solr. From setting up the environment to running sample applications, each chapter in this book is a practical tutorial on using an Apache Hadoop ecosystem project.

While several books on Apache Hadoop are available, most focus on the core projects, MapReduce and HDFS, and none discusses the other Apache Hadoop ecosystem projects and how they work together as a cohesive big data development platform.


What You Will Learn:
  • Set up the environment in Linux for Hadoop projects using Cloudera Hadoop Distribution CDH 5
  • Run a MapReduce job
  • Store data with Apache Hive and Apache HBase
  • Index data in HDFS with Apache Solr
  • Develop a Kafka messaging system
  • Stream logs to HDFS with Apache Flume
  • Transfer data from a MySQL database to Hive, HDFS, and HBase with Sqoop
  • Create a Hive table over Apache Solr
  • Develop a Mahout user-based recommender system

Who This Book Is For:
Apache Hadoop developers. Knowledge of Linux and some familiarity with Hadoop are required.

About Deepak Vohra

Deepak Vohra is a coder, developer, programmer, book author, and technical reviewer.

Table of Contents

Introduction
1. HDFS and MapReduce
  • Hadoop Distributed FileSystem
  • MapReduce Frameworks
  • Setting the Environment
  • Hadoop Cluster Modes
  • Running a MapReduce Job with MR1 Framework
  • Running MR1 in Standalone Mode
  • Running MR1 in Pseudo-Distributed Mode
  • Running MapReduce with YARN Framework
  • Running YARN in Pseudo-Distributed Mode
  • Running Hadoop Streaming
Section II Storing & Querying
2. Apache Hive
  • Setting the Environment
  • Configuring Hadoop
  • Configuring Hive
  • Starting HDFS
  • Starting the Hive Server
  • Starting the Hive CLI
  • Creating a Database
  • Using a Database
  • Creating a Managed Table
  • Loading Data into a Table
  • Creating a Table Using LIKE
  • Adding Data with INSERT INTO TABLE
  • Adding Data with INSERT OVERWRITE
  • Creating a Table Using AS SELECT
  • Altering a Table
  • Truncating a Table
  • Dropping a Table
  • Creating an External Table

3. Apache HBase
  • Setting the Environment
  • Configuring Hadoop
  • Configuring HBase
  • Configuring Hive
  • Starting HBase
  • Starting HBase Shell
  • Creating an HBase Table
  • Adding Data to an HBase Table
  • Listing All Tables
  • Getting a Row of Data
  • Scanning a Table
  • Counting the Number of Rows in a Table
  • Altering a Table
  • Deleting a Row
  • Deleting a Column
  • Disabling and Enabling a Table
  • Truncating a Table
  • Dropping a Table
  • Finding If a Table Exists
  • Creating a Hive External Table
Section III Bulk Transferring & Streaming
4. Apache Sqoop
  • Installing MySQL Database
  • Creating MySQL Database Tables
  • Setting the Environment
  • Configuring Hadoop
  • Starting HDFS
  • Configuring Hive
  • Configuring HBase
  • Importing into HDFS
  • Exporting from HDFS
  • Importing into Hive
  • Importing into HBase

5. Apache Flume
  • Setting the Environment
  • Configuring Hadoop
  • Configuring HBase
  • Starting HDFS
  • Configuring Flume
  • Running a Flume Agent
  • Configuring Flume for HBase Sink
  • Streaming MySQL Log to HBase Sink
Section IV Serializing
6. Apache Avro
  • Setting the Environment
  • Creating an Avro Schema
  • Creating a Hive Managed Table
  • Creating a Hive (version prior to 0.14) External Table Stored as Avro
7. Apache Parquet
  • Setting the Environment
  • Creating an Oracle Database Table
  • Exporting Oracle Database to a CSV File
  • Importing the CSV File into MongoDB
  • Exporting MongoDB Document as CSV File
  • Importing a CSV File into Oracle Database
Section V Messaging & Indexing
8. Apache Kafka
  • Setting the Environment
  • Starting the Kafka Server
  • Creating a Topic
  • Starting a Kafka Producer
  • Starting a Kafka Consumer
  • Producing and Consuming Messages
  • Streaming Log Data to Apache Kafka with Apache Flume
    • Setting the Environment
    • Creating Kafka Topics
    • Configuring Flume
    • Running Flume Agent
    • Consuming Log Data as Kafka Messages


9. Apache Solr
  • Setting the Environment
  • Configuring the Solr Schema
  • Starting the Solr Server
  • Indexing a Document in Solr
  • Deleting a Document from Solr
  • Indexing a Document in Solr with Java Client
  • Searching a Document in Solr
  • Creating a Hive Managed Table
  • Creating a Hive External Table
  • Loading Hive External Table Data
  • Searching Hive Table Data Indexed in Solr
Section VI Machine Learning
10. Apache Mahout
  • Setting the Environment
  • Starting HDFS
  • Setting the Mahout Environment
  • Running a Mahout Classification Sample
  • Running a Mahout Clustering Sample
  • Developing a User-Based Recommender System
    • The Sample Data
    • Setting the Environment
    • Creating a Maven Project in Eclipse
    • Creating a User-Based Recommender
    • Creating a Recommender Evaluator
    • Running the Recommender
    • Choosing a Recommender Type
    • Choosing a User Similarity Measure
    • Choosing a Neighborhood Type
    • Choosing a Neighborhood Size for NearestNUserNeighborhood
    • Choosing a Threshold for ThresholdUserNeighborhood
    • Running the Evaluator
    • Choosing the Split between Training Percentage and Test Percentage

Additional information

SKU: NLS9781484221983
ISBN-13: 9781484221983
ISBN-10: 1484221982
Title: Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools by Deepak Vohra
Condition: New
Binding: Paperback
Publisher: Apress
Publication date: 2016-10-01
Pages: 421
Book picture is for illustrative purposes only; actual binding, cover, or edition may vary.
This is a new book: be the first to read this copy. With untouched pages and a perfect binding, your brand-new copy is ready to be opened for the first time.
