Friday, April 3, 2015

Hadoop interview questions

1) An HDFS file can ...

  • ... be duplicated on several nodes
  • ... be compressed
  • ... combine multiple files
  • ... contain multiple blocks of different sizes
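As background for this question, here is a toy model (plain Python, not the HDFS API) of how a file is usually stored: it is cut into blocks of the configured size, where typically only the last block is shorter, and each block is copied to several DataNodes. The 128 MB block size and replication factor of 3 are assumed Hadoop 2 defaults (`dfs.blocksize`, `dfs.replication`), and the round-robin placement is a simplification of HDFS's rack-aware policy. The function names are hypothetical.

```python
# Toy model of HDFS storage, not the HDFS API.
BLOCK_SIZE = 128 * 1024 * 1024  # assumed dfs.blocksize (Hadoop 2 default)
REPLICATION = 3                  # assumed dfs.replication default


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the block lengths for a file of file_size bytes.

    All blocks have block_size bytes except possibly the last one."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (round-robin).

    Real HDFS uses a rack-aware policy; this only shows that every
    block ends up on several different DataNodes."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return [[nodes[(b + r) % len(nodes)] for r in range(replication)]
            for b in range(num_blocks)]
```

With these assumptions, `split_into_blocks(300 * 1024 * 1024)` yields two 128 MB blocks followed by one 44 MB block, and `place_replicas` puts each of them on three different nodes.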

2) How does HDFS ensure the integrity of the stored data?
  • by comparing the replicated data blocks with each other
  • through error logs
  • using checksums
  • by comparing the replicated blocks to the master copy
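For background: HDFS stores a CRC checksum for every fixed-size chunk of a block (512 bytes by default, `dfs.bytes-per-checksum`) and verifies it when the data is read; on a mismatch the client reads the block from another replica instead. The sketch below models only the detection step. It uses `zlib.crc32` as a stand-in for the CRC32C that Hadoop 2 uses by default, and the function names are hypothetical.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # assumed value of dfs.bytes-per-checksum


def compute_checksums(data: bytes, chunk=BYTES_PER_CHECKSUM):
    """One CRC32 per chunk, like the checksum metadata kept beside a block."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def find_corrupt_chunks(data: bytes, checksums, chunk=BYTES_PER_CHECKSUM):
    """Return the indices of chunks whose recomputed CRC no longer matches."""
    return [i for i, expected in enumerate(checksums)
            if zlib.crc32(data[i * chunk:(i + 1) * chunk]) != expected]
```

Flipping a single byte at offset 700 of a 2048-byte block makes `find_corrupt_chunks` report chunk 1 only, since CRC32 is guaranteed to detect any single-byte change.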
3) HBase ...
  • ... is column oriented
  • ... is key-value oriented
  • ... is versioned
  • ... is unversioned
  • ... uses ZooKeeper for synchronization
  • ... uses ZooKeeper to elect a master
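As background, HBase addresses a value by row key, column (`family:qualifier`) and timestamp, and can keep several timestamped versions of the same cell. The class below is a toy model of that data model, not the HBase client API: the class name is hypothetical, and trimming old versions immediately on `put` is a simplification (real HBase hides surplus versions on read and physically removes them during compaction).

```python
import time
from collections import defaultdict


class ToyVersionedTable:
    """Toy HBase-like data model: (row, 'family:qualifier') -> {timestamp: value}."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # like a column family's VERSIONS setting
        self.cells = defaultdict(dict)

    def put(self, row, column, value, ts=None):
        # Default timestamp in milliseconds, as HBase uses.
        ts = ts if ts is not None else int(time.time() * 1000)
        versions = self.cells[(row, column)]
        versions[ts] = value
        # Keep only the newest max_versions timestamps.
        for old in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old]

    def get(self, row, column, versions=1):
        """Return up to `versions` values, newest first."""
        cell = self.cells.get((row, column), {})
        return [cell[ts] for ts in sorted(cell, reverse=True)[:versions]]
```

After three puts to the same cell with `max_versions=2`, a plain `get` returns only the newest value, and asking for more versions returns the two newest ones.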
4) An HBase table ...
  • ... needs a schema
  • ... doesn't need a schema
  • ... is served by only one server
  • ... is distributed by region
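For background: an HBase table is split into regions, each covering a contiguous, sorted range of row keys and served by one RegionServer at a time. The lookup below is a toy model with hypothetical start keys and server names. It finds a row's region with a binary search over region start keys, which is roughly what a client does with the region locations it reads from the catalog table (`hbase:meta` in recent versions). Real row keys are byte arrays; plain ASCII strings sort the same way here.

```python
import bisect

# Hypothetical layout: region i serves [REGION_START_KEYS[i], REGION_START_KEYS[i + 1]).
# The first region starts at the empty key and the last one is open-ended.
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]  # one server may host several regions


def locate_region(row_key: str) -> int:
    """Index of the region whose key range contains row_key."""
    return bisect.bisect_right(REGION_START_KEYS, row_key) - 1
```

For example, `"apple"` falls in region 0, `"melon"` in region 1 (served by `rs2`) and `"zebra"` in the last region.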
5) What does a major_compact do on an HBase table?
  • It compresses the table files.
  • It combines the existing store files into one per column family.
  • It merges regions to limit the number of regions.
  • It splits regions that are too big.
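For background: in HBase each column family of a region is a store made of several store files (HFiles) that accumulate as MemStores are flushed. A major compaction, triggered for example with `major_compact 'table'` in the HBase shell, rewrites all the store files of a store into a single file and drops deleted cells and surplus versions along the way. The function below is a simplified model of that merge, not HBase code: store files are lists of `(row, column, timestamp, value)` tuples, and `TOMBSTONE` is a hypothetical stand-in for HBase's several kinds of delete markers.

```python
from collections import defaultdict

TOMBSTONE = object()  # hypothetical stand-in for an HBase delete marker


def major_compact(store_files, max_versions=1):
    """Merge all store files of ONE column family into one sorted file.

    Simplified model: a tombstone hides every version of its cell written
    at or before the tombstone's timestamp, and only the newest
    max_versions live versions of each cell are kept."""
    cells = defaultdict(list)
    for store_file in store_files:
        for row, column, ts, value in store_file:
            cells[(row, column)].append((ts, value))

    merged = []
    for (row, column), versions in sorted(cells.items()):
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first
        deleted_at = max((ts for ts, v in versions if v is TOMBSTONE), default=None)
        live = [(ts, v) for ts, v in versions
                if v is not TOMBSTONE and (deleted_at is None or ts > deleted_at)]
        merged.extend((row, column, ts, v) for ts, v in live[:max_versions])
    return merged
```

Compacting two files where row `r2` was deleted after being written leaves a single file that contains only the newest version of `r1`.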
6) What is the relationship between Jobs and Tasks in Hadoop?
  • One job contains only one task
  • One task contains only one job
  • One Job can contain multiple tasks
  • One task can contain multiple tasks
7) The number of Map tasks to be launched in a given job mostly depends on...
  • the number of nodes in the cluster
  • the property mapred.map.tasks
  • the number of reduce tasks
  • the size of input splits
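For background: each map task processes one input split. In the new `org.apache.hadoop.mapreduce` API, `FileInputFormat` computes the split size as `max(minSize, min(maxSize, blockSize))` and lets the last split of a file be up to 10% larger than the others (`SPLIT_SLOP = 1.1`). The old `mapred` API also folds the `mapred.map.tasks` hint into that calculation. The sketch below reproduces the new-API arithmetic for non-empty, splittable files only; the function names are hypothetical.

```python
SPLIT_SLOP = 1.1  # the last split of a file may be up to 10% larger


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Same formula as FileInputFormat.computeSplitSize in the new API."""
    return max(min_size, min(max_size, block_size))


def count_map_tasks(file_sizes, block_size=128 * 1024 * 1024,
                    min_size=1, max_size=2**63 - 1):
    """Estimate the number of map tasks (one per split) for splittable files."""
    split_size = compute_split_size(block_size, min_size, max_size)
    total = 0
    for size in file_sizes:
        remaining = size
        # Cut full-size splits while more than 1.1 splits' worth remains.
        while remaining / split_size > SPLIT_SLOP:
            total += 1
            remaining -= split_size
        if remaining > 0:  # the tail becomes the last split
            total += 1
    return total
```

With 128 MB blocks, a 300 MB file gives 3 maps, while a 140 MB file gives only 1, because 140/128 stays under the 1.1 slop factor. Capping the split size with `max_size=64 MB` raises the 300 MB file to 5 maps.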
8) If no custom partitioner is defined, how does Hadoop partition the data before sending it to the reducers?
  • One by one on each available reduce slot
  • Statistically
  • By hash
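For background: without a custom partitioner, Hadoop uses `HashPartitioner`, whose `getPartition` returns `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The mask clears the sign bit so that negative hash codes still map to a valid reducer, and equal keys always land on the same reducer. Below is a Python transcription of that formula, assuming `IntWritable`-style keys, whose `hashCode()` is simply the wrapped int value.

```python
def to_int32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value, like a Java int."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x


def hash_partition(key_hash: int, num_reduce_tasks: int) -> int:
    """Transcription of HashPartitioner.getPartition:
    (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks."""
    return (to_int32(key_hash) & 0x7FFFFFFF) % num_reduce_tasks
```

With 3 reducers, key 7 goes to partition 1, and the negative key -5 is masked to 2147483643 and goes to partition 0 instead of producing a negative index.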
9) In Hadoop, can you set ...
  • ... the number of map tasks
  • ... the number of reduce tasks
  • ... both the number of map and reduce tasks
  • None of them, it's automatic
10) What is the minimum number of reduce tasks for a job?
  • 0
  • 1
  • 100
  • As many as there are nodes in the cluster
11) When a task fails, Hadoop ...
  • ... retries it
  • ... retries it until a failure threshold stops the job
  • ... stops the job
  • ... continues without this particular task
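For background: a failed task attempt is rescheduled, possibly on another node, and the job fails once a task has used up its attempts, 4 by default (`mapreduce.map.maxattempts`, `mapreduce.reduce.maxattempts`). A job can also be told to tolerate a percentage of failed tasks through `mapreduce.map.failures.maxpercent`, which defaults to 0. The loop below is a hypothetical single-process model of the retry policy, not the scheduler's code; `JobFailed` and `run_task_with_retries` are made-up names.

```python
MAX_ATTEMPTS = 4  # default of mapreduce.map.maxattempts / mapreduce.reduce.maxattempts


class JobFailed(Exception):
    """Raised when one task exhausts all of its attempts."""


def run_task_with_retries(task, max_attempts=MAX_ATTEMPTS):
    """Run task(attempt_number) and retry it on failure.

    Returns the first successful result; after max_attempts failed
    attempts the whole job is considered failed."""
    errors = []
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt)
        except Exception as exc:  # one failed task attempt
            errors.append(exc)
    raise JobFailed(f"task failed {max_attempts} times, last error: {errors[-1]!r}")
```

A task that fails twice and then succeeds returns its result on the third attempt; a task that always fails raises `JobFailed` after four attempts.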
12) How can you debug a MapReduce job?
  • By adding counters.
  • By analyzing the logs.
  • By running it in local mode in an IDE.
  • You can't debug a job.
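For background, two of these techniques combine well: keep the map logic in a plain function that can run in-process, the equivalent of local mode (`mapreduce.framework.name=local`, where the whole job runs in one JVM and IDE breakpoints work), and count unusual records with counters, the way a Java mapper calls `context.getCounter(group, name).increment(1)`. The word-count mapper below is a hypothetical Python example, with `collections.Counter` standing in for Hadoop counters.

```python
from collections import Counter


def map_line(line, counters):
    """Hypothetical mapper body: emits (word, 1) pairs.

    Empty lines are skipped and counted, as a Java mapper would do with
    context.getCounter(...).increment(1)."""
    if not line.strip():
        counters["EMPTY_LINES"] += 1
        return []
    return [(word.lower(), 1) for word in line.split()]


def run_locally(lines):
    """Run the mapper and a summing 'reduce' in-process, so a debugger can step through it."""
    counters = Counter()
    output = Counter()
    for line in lines:
        for word, one in map_line(line, counters):
            output[word] += one
    return output, counters
```

Running it on `["Hello world", "", "hello Hadoop"]` counts `hello` twice and reports one empty line through the counter, which is the kind of signal a real job's counters show in the job history.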
