Wednesday, April 8, 2015

ADM 201 - Hadoop Operations: Cluster Administration

This article gathers notes taken from MapR's ADM 201 class, which mainly covers:
- Testing & verifying hardware before installing MapR Hadoop
- Installing MapR Hadoop
- Benchmarking the MapR Hadoop cluster and configuring a new cluster for production
- Monitoring the cluster for failures & performance

Prerequisites

Install the clustershell utility, declare the slave nodes, and check that it can reach the nodes properly
$ sudo -i
$ apt-get install clustershell
$ mv /etc/clustershell/groups /etc/clustershell/groups.original
$ echo "all: 192.168.2.212 192.168.2.200" > /etc/clustershell/groups
$ clush -a date
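Note that clush relies on passwordless SSH from the admin node to every listed node. If that is not already in place, a minimal sketch (reusing the example IPs above):
$ ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa # skip if a root key already exists
$ for ip in 192.168.2.212 192.168.2.200; do ssh-copy-id root@$ip; done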

MapR Cluster Validation

Inconsistencies in the hardware (e.g. different disk sizes or CPU core counts) may not cause the installation to fail, but they can lead to poor cluster performance. The benchmarking tools from the cluster-validation GitHub repo make it possible to measure cluster performance.

The remainder of this section addresses pre-install cluster hardware tests:
1. Download Benchmark Tools
2. Prepare Cluster Hardware for Parallel Execution of Tests
3. Test & Measure Subsystem Components
4. Validate Component Software & Firmware

Grab the validation tools from the GitHub repo
$ curl -L -o cluster-validation.tgz http://github.com/jbenninghoff/cluster-validation/tarball/master
$ tar xvzf cluster-validation.tgz
$ mv jbenninghoff-cluster-validation-*/* ./
$ cd pre-install/

Copy the pre-install folder to all nodes, and check if it succeeded
$ clush -a --copy /root/pre-install/
$ clush -a ls /root/pre-install/

Test the hardware for specification heterogeneity
$ /root/pre-install/cluster-audit.sh | tee cluster-audit.log

Test whether the network bandwidth can handle MapReduce shuffle traffic:
First, set the IP addresses of the nodes in network-test.sh (divide them between half1 and half2), as sketched below.
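A minimal sketch of that edit, assuming the script defines half1 and half2 as bash arrays (the IPs are the example ones used above):
# in /root/pre-install/network-test.sh (illustrative values)
half1=(192.168.2.212)
half2=(192.168.2.200)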
$ /root/pre-install/network-test.sh | tee network-test.log

Test memory performance
$ clush -Ba '/root/pre-install/memory-test.sh | grep ^Triad' | tee memory-test.log

Test disk performance
The disk-test.sh script checks disk health and performance (i.e. throughput for sequential and random read/write I/O). Warning: it destroys any data on the tested disks.
$ clush -ab /root/pre-install/disk-test.sh
For each scanned disk there will be a result file of the form disk_name-iozone.log.

MapR Quick Install - link

Minimum requirements:
  • 2-4 cores (at least two: 1 CPU for the OS, 1 CPU for the filesystem)
  • 6 GB of RAM
  • 20 GB of raw disk (must not be formatted or partitioned)

First, download the installer script
$ wget http://package.mapr.com/releases/v4.1.0/ubuntu/mapr-setup
$ chmod 755 mapr-setup
$ ./mapr-setup

Second, configure the installation process (e.g. define data and control nodes). A sample configuration can be found in /opt/mapr-installer/bin/config.example
$ cd /opt/mapr-installer/bin
$ cp config.example config.example.original
Use the following commands to gather the node information to declare in the configuration
$ clush -a lsblk # list block devices (drive names)
$ clush -a mount # list mounted drives (clush prefixes each line with the node address)

Edit the config.example file (see the sketch after this list):
  • Declare the node information (IP addresses and data drives) under the Control_Nodes section.
  • Customize the cluster domain by replacing my.cluster.com with your own.
  • Set a new password (e.g. mapr).
  • Declare the disks and set ForceFormat to true.
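A purely illustrative sketch of the values to fill in; keep the syntax of the shipped config.example, and treat the layout below (including the Data_Nodes section) as hypothetical:
# hypothetical sketch only -- follow the syntax of the shipped config.example
#   cluster domain : my.cluster.com -> demo.mycompany.com
#   password       : mapr
#   Control_Nodes  : 192.168.2.212 with raw disks /dev/sdb /dev/sdc
#   Data_Nodes     : 192.168.2.200 with raw disks /dev/sdb /dev/sdc
#   ForceFormat    : true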
Install MapR (the installation script uses Ansible behind the scenes)
$ ./install --cfg config.example --private-key /root/.ssh/id_rsa -u root -s -U root --debug new
MapR Cluster Services - link
If the installation succeeds, you can log in to https://master-node:8443/ with mapr:mapr to access the MapR Control System (MCS) and then get a new license.
If the installation fails, remove the install folder and check the installation logs at /opt/mapr-installer/var/mapr-installer.log. Failures may be caused by, for example:
  • problems formatting disks for MapR-FS (check /opt/mapr/logs/disksetup.0.log)
  • a node with less than 4 GB of memory
  • disks with an LVM setup
As a last resort, you can remove all MapR packages and reinstall:

$ rm -r -f /opt/mapr/ # remove installation folder
$ dpkg --get-selections | grep -v deinstall | grep mapr
mapr-cldb                                       install
mapr-core                                       install
mapr-core-internal                              install
mapr-fileserver                                 install
mapr-hadoop-core                                install
mapr-hbase                                      install
mapr-historyserver                              install
mapr-mapreduce1                                 install
mapr-mapreduce2                                 install
mapr-nfs                                        install
mapr-nodemanager                                install
mapr-resourcemanager                            install
mapr-webserver                                  install
mapr-zk-internal                                install
mapr-zookeeper                                  install
$ dpkg -r --force-depends $(dpkg --get-selections | grep -v deinstall | grep mapr | awk '{print $1}') # remove all listed packages

To check if the cluster is running properly, we can run the following quick test job.
Note: check that the cluster node names are resolvable through DNS; otherwise, declare them in /etc/hosts on each node.
$ su - mapr
$ cd /opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/mapreduce/
$ yarn jar hadoop-mapreduce-examples-2.5.1-mapr-1501.jar pi 8 800

Benchmark the Cluster

1. Hardware Benchmarking
First, copy the post-install folder to all nodes
$ clush -a --copy /root/post-install
$ clush -a ls /root/post-install

Second, run tests to check drive throughput and establish a baseline for future comparison
$ cd /root/post-install
$ clush -Ba '/root/post-install/runRWSpeedTest.sh' | tee runRWSpeedTest.log

2. Application Benchmarking
Use specific MapReduce jobs to create test data and process it in order to challenge the performance limits of the cluster.
First, create a volume for the test data
$ maprcli volume create -name benchmarks -replication 1 -mount 1 -path /benchmarks

Second, generate a random sequence of data
$ su mapr
$ yarn jar /opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1-mapr-1501.jar teragen 5000000 /benchmarks/teragen1

Then, sort the data and write the output to a directory
$ yarn jar /opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1-mapr-1501.jar terasort /benchmarks/teragen1 /benchmarks/terasort1

To analyze how long each step takes, check the logs on the JobHistoryServer (the following command shows which node runs it):
$ clush -a jps | grep -i JobHistoryServer
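The job timings can then be browsed in the JobHistoryServer web UI (default Hadoop 2.x port 19888) or queried through its REST API; a quick check, assuming the default port and the node found above:
$ curl http://jobhistory-node:19888/ws/v1/history/mapreduce/jobs # JSON listing of finished jobs with start/finish times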

Cluster Storage Resources

MapR-FS organizes the drives of a cluster into storage pools. A storage pool is a group of drives (three by default) on a single physical node, and data is striped across the drives of the cluster's storage pools. If one drive fails, the entire storage pool is lost; to recover it, take all drives of the pool offline, replace the failed drive, and return them to the cluster.
Three drives per pool gives a good balance between read/write speed when ingesting large amounts of data and recovery time for failed drives.
Storage pools hold units called containers (32 GB by default), which are logically organized into volumes (a concept specific to MapR-FS). By default, containers inside a volume have a replication factor of three, and replication across containers can follow either a chain pattern or a star pattern.
$ maprcli volume create -name <volume_name> -type 0|1 # 0 = standard (read/write) volume, 1 = mirror volume

When writing a file, the Container Location Database (CLDB) is used to determine the first container where data is written. The CLDB replaces the function of a NameNode in MapR Hadoop; it stores container locations along with replication factor and pattern information. A file is divided into chunks (256 MB by default): a small chunk size leads to high write-scheduling overhead, while a large chunk size requires more memory.
A topology defines the physical layout of the cluster nodes. It's recommended to have two top-level topologies:
  • /data, the parent topology for active nodes in the cluster
  • /decommissioned, the parent topology used to segregate offline nodes or nodes awaiting repair.
Usually, the racks that house the physical nodes are used as sub-topologies of /data.
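For example, a node can be placed under a rack sub-topology with maprcli; a sketch, where the server id is hypothetical and the column names are assumptions (check the maprcli node list output on your cluster):
$ maprcli node list -columns id,hostname # find the node's server id
$ maprcli node move -serverids 547819249997313015 -topology /data/rack1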

Data Ingestion

Data can be ingested into MapR-FS through:
  • NFS (e.g. gateway strategy, co-location strategy), which lets traditional applications perform multiple concurrent reads/writes easily (see the sketch after this list) - link,
  • Sqoop, to transfer data between MapR-FS and relational databases,
  • Flume, a distributed service for collecting, aggregating & moving data into MapR-FS
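A minimal sketch of NFS ingestion, assuming the cluster is mounted at /mapr/MaprQuickInstallDemo (the cluster name used later in this article) and that a volume is mounted at /ingest (hypothetical):
$ cp /tmp/sales-2015-04.csv /mapr/MaprQuickInstallDemo/ingest/ # a plain POSIX copy lands the file in MapR-FS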

Snapshots are read-only images of volumes at a specific point in time; more precisely, a snapshot is a pointer that costs almost nothing to create. It's a good idea to take them regularly to protect the integrity of the data. By default, a snapshot schedule is set automatically when a volume is created; it can be customized through the MCS, or a snapshot can be created manually as follows:
$ maprcli volume snapshot create -volume DemoVolume -snapshotname 12042015-DemoSnapshot
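Existing snapshots can then be checked from the command line; a quick look, assuming the -volume filter of maprcli volume snapshot list:
$ maprcli volume snapshot list -volume DemoVolume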

Mirrors are volumes that represent an exact copy of a source volume from the same or a different cluster; unlike snapshots, they take extra resources and time to create. By default, a mirror is a read-only volume, but it can be made writable. Mirrors can be created through the MCS (where the replication factor can also be set) or managed manually as follows:
$ maprcli volume mirror start -name DemoVolumeMirror

$ maprcli volume mirror push -name DemoVolumeMirror
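The mirror volume itself can also be created from the command line; a sketch, assuming the -type mirror and -source flags of maprcli volume create (the source cluster name is illustrative):
$ maprcli volume create -name DemoVolumeMirror -type mirror -source DemoVolume@MaprQuickInstallDemo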

Configuring remote mirrors
First, edit the cluster configuration file (on both clusters) to include the location of the CLDB nodes of the remote cluster (each line of mapr-clusters.conf begins with the remote cluster's name; remote.cluster.com below is a placeholder):
$ echo "remote.cluster.com cldb_addr1:7222 cldb_addr2:7222 cldb_addr3:7222" >> /opt/mapr/conf/mapr-clusters.conf
Second, copy this new configuration to all nodes in the cluster
$ clush -a --copy /opt/mapr/conf/mapr-clusters.conf
Third, restart the Warden service so that the modification takes effect:
$ clush -a service mapr-warden restart
Finally, start the mirroring from the MCS interface.

Cluster Monitoring

Once a cluster is up and running, it has to be kept running smoothly. The MCS provides many tools to monitor cluster health and to investigate failure causes by providing:

  • alarms, which can send email or Nagios notifications, and
  • statistics about nodes (e.g. services), volumes, and jobs (stored in the MapR metrics database). MapR Hadoop also provides centralized logging and metrics files, described below.

Standard logs for each node are stored under /opt/mapr/hadoop/hadoop-2.5.1/logs, while centralized logs are stored at the cluster level, e.g. under /mapr/MaprQuickInstallDemo/var/mapr/local/c200-01/logs.

Centralized logging automates the gathering of logs from all cluster nodes and provides a job-centric view. The following command creates a centralized log directory populated with symbolic links to all log files (tasks, map attempts, reduce attempts) related to a specific job.
$ maprcli job linklogs -jobid JOB_ID -todir MAPRFS_DIR

For classic MapReduce (MRv1), the MapR centralized logging feature is enabled by default in /opt/mapr/hadoop/hadoop-0.20.2/conf/hadoop-env.sh through the environment variable HADOOP_TASKTRACKER_ROOT_LOGGER. The standard logs of each node are stored under /opt/mapr/hadoop/hadoop-0.20.2/logs, while the centralized logs are stored under the cluster-level /mapr mount.

Alarms
When a disk failure alarm is raised, the report at /opt/mapr/logs/faileddisk.log indicates which disks have failed, the reason for the failure, and the recommended resolution.
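Raised alarms can also be listed from the command line:
$ maprcli alarm list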


Cluster Statistics
MapR collects a variety of statistics about the cluster and running jobs. This information helps track cluster usage and health. Metrics can be written to an output file or consumed by Ganglia; the output type is specified in two hadoop-metrics.properties files:
  • /opt/mapr/hadoop/hadoop-0.20.2/conf/hadoop-metrics.properties for the output of standard Hadoop services
  • /opt/mapr/conf/hadoop-metrics.properties for the output of MapR-specific services
Collected metrics cover services, jobs, nodes, and the monitoring node.
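An illustrative excerpt, assuming the stock Hadoop metrics context classes (FileContext for file output, GangliaContext31 for Ganglia); the context names (dfs, mapred, cldb, fileserver, ...) depend on which of the two files is edited:
# hypothetical sketch of a hadoop-metrics.properties entry
dfs.class=org.apache.hadoop.metrics.file.FileContext # write metrics to a file...
dfs.period=10
dfs.fileName=/tmp/dfsmetrics.log
# dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31 # ...or send them to Ganglia instead
# dfs.servers=ganglia-node:8649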

Schedule Maintenance Jobs
The collected metrics give a good view of cluster performance and health, but the complexity of the cluster makes it hard to use them alone to optimize how the cluster is running.
Run test jobs regularly to gather job statistics and watch cluster performance. If a variance in performance appears, actions need to be taken to bring the cluster back to its baseline. By doing this in a controlled environment, we can try different options (e.g. tweaking the disk and role balancer settings) to optimize cluster performance.
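A minimal sketch of such a recurring baseline run, assuming a hypothetical wrapper script that replays the teragen/terasort pair from the benchmarking section and appends the timings to a log:
# /etc/cron.d/mapr-baseline (hypothetical)
0 3 * * 0 mapr /root/post-install/run-terasort-baseline.sh >> /var/log/mapr-baseline.log 2>&1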

Resources:

  • MapR installation - Lab Guide, Quick Installation Guide
  • Preparing Each Node - link
  • Setting up a MapR Cluster on Amazon Elastic MapReduce - link
  • Cluster service planning - link
  • Tuning cluster for MapReduce performance for specific jobs - link
  • MapR Hadoop data storage - link


Friday, April 3, 2015

Hadoop interview questions

1) An HDFS file can ...

  • ... be duplicated on several nodes
  • ... be compressed
  • ... combine multiple files
  • ... contain multiple blocks of different sizes

2) How does HDFS ensure the integrity of the stored data?
  • by comparing the replicated data blocks with each other
  • through error logs
  • using checksums
  • by comparing the replicated blocks to the master copy
3) HBase is ...
  • ... column oriented
  • ... key-value oriented
  • ... versioned
  • ... unversioned
  • ... using ZooKeeper for synchronization
  • ... using ZooKeeper for electing a master
4) An HBase table ...
  • ... needs a schema
  • ... doesn't need a schema
  • ... is served by only one server
  • ... is distributed by region
5) What does a major_compact do on an HBase table?
  • It compresses the table files.
  • It combines multiple existing store files to one for each family.
  • It merges regions to limit the number of regions.
  • It splits regions that are too big.
6) What is the relationship between Jobs and Tasks in Hadoop?
  • One job contains only one task
  • One task contains only one job
  • One Job can contain multiple tasks
  • One task can contain multiple jobs
7) The number of Map tasks to be launched in a given job mostly depends on...
  • the number of nodes in the cluster
  • property mapred.map.tasks
  • the number of reduce tasks
  • the size of input splits
8) If no custom partitioner is defined in Hadoop then how is data partitioned before it is sent to the reducer?
  • One by one on each available reduce slot
  • Statistically
  • By hash
9) In Hadoop, can you set ...
  • the number of map tasks
  • the number of reduce tasks
  • both the number of map and reduce tasks
  • None, it's automatic
10) What is the minimum number of Reduce tasks for a Job?
  • 0
  • 1
  • 100
  • As many as there are nodes in the cluster
11) When a task fails, Hadoop ...
  • ... tries it again
  • ... tries it again until a failure threshold stops the job
  • ... stops the job
  • ... continues without this particular task
12) How can you debug a MapReduce job?
  • By adding counters.
  • By analyzing log.
  • By running in local mode in an IDE.
  • You can't debug a job.
References:
  • Hadoop wiki - link
  • Hadoop tutorial - link