Oracle® In-Database Container for Hadoop Java API Reference
Release 1.0.1

E54638-01

oracle.hadoop.indbmr.lib.input
Class HInputSplits

java.lang.Object
  extended by oracle.hadoop.indbmr.lib.input.HInputSplits

public class HInputSplits
extends java.lang.Object

A library that returns the InputSplits for a given JobContext and serializes them to stdout or a file.

To get splits from a Hive table, the input format in the configuration must be set to HiveToJavaInputFormat.


Constructor Summary
HInputSplits()
           
 
Method Summary
protected  org.apache.hadoop.mapreduce.InputFormat<?,?> createInputFormat(org.apache.hadoop.conf.Configuration conf)
           
 java.util.List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext)
          Reads the name of an input format class from a Hadoop Configuration, instantiates the InputFormat, and returns a list of InputSplits.
 void writeSplits(org.apache.hadoop.mapreduce.JobContext jobContext, java.io.OutputStream os)
          Serializes splits to an OutputStream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HInputSplits

public HInputSplits()

Method Detail

writeSplits

public void writeSplits(org.apache.hadoop.mapreduce.JobContext jobContext,
                        java.io.OutputStream os)
                 throws java.io.IOException,
                        java.lang.InterruptedException
Serializes splits to an OutputStream.

How to serialize a split: Hadoop has an API for pluggable serialization frameworks. See the book 'Hadoop: The Definitive Guide' for details. Here is the summary:

Old: In the mapred API, InputSplit was

Parameters:
jobContext - the JobContext whose splits are serialized
os - the OutputStream to which the splits are written
Throws:
java.io.IOException
java.lang.InterruptedException
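
As a rough illustration of what serializing splits to an OutputStream can involve, the sketch below uses only JDK classes. SimpleSplit is a hypothetical stand-in for a file-based split (path, start offset, length); it is not a Hadoop or Oracle class. The layout shown (a count followed by each split's fields) is an assumption made for the example and is not necessarily the format that writeSplits produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class SplitSerializationSketch {

    /** Hypothetical stand-in for a file-based split; not a Hadoop class. */
    static final class SimpleSplit {
        final String path;
        final long start;
        final long length;

        SimpleSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        /** Writes the fields in a fixed order, in the style of a Writable. */
        void write(DataOutputStream out) throws IOException {
            out.writeUTF(path);
            out.writeLong(start);
            out.writeLong(length);
        }

        /** Reads the fields back in the same order they were written. */
        static SimpleSplit read(DataInputStream in) throws IOException {
            String path = in.readUTF();
            long start = in.readLong();
            long length = in.readLong();
            return new SimpleSplit(path, start, length);
        }
    }

    /** Assumed layout: the number of splits, then each split in turn. */
    static void writeSplits(List<SimpleSplit> splits, OutputStream os) throws IOException {
        DataOutputStream out = new DataOutputStream(os);
        out.writeInt(splits.size());
        for (SimpleSplit split : splits) {
            split.write(out);
        }
        out.flush();
    }

    /** Inverse of writeSplits, used here to check the round trip. */
    static List<SimpleSplit> readSplits(InputStream is) throws IOException {
        DataInputStream in = new DataInputStream(is);
        int count = in.readInt();
        List<SimpleSplit> splits = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            splits.add(SimpleSplit.read(in));
        }
        return splits;
    }

    public static void main(String[] args) throws IOException {
        List<SimpleSplit> splits = new ArrayList<>();
        splits.add(new SimpleSplit("/data/part-00000", 0L, 1024L));
        splits.add(new SimpleSplit("/data/part-00000", 1024L, 512L));

        // Any OutputStream works here, for example System.out or a FileOutputStream.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeSplits(splits, buffer);

        for (SimpleSplit split : readSplits(new ByteArrayInputStream(buffer.toByteArray()))) {
            System.out.println(split.path + " " + split.start + " " + split.length);
        }
    }
}
```

Because the target is a plain OutputStream, the same call can write to stdout or to a file, which matches the two destinations named in the class description.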

getSplits

public java.util.List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext)
                                                                 throws java.io.IOException,
                                                                        java.lang.InterruptedException
Reads the name of an input format class from a Hadoop Configuration, instantiates the InputFormat, and returns a list of InputSplits.

Hadoop needs a JobContext to get splits. A Configuration is not enough. JobContext is a read-only view of a Job.

Parameters:
jobContext - the JobContext from which the input format is resolved and the splits are computed
Returns:
List of InputSplits
Throws:
java.io.IOException
java.lang.InterruptedException
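
The pattern described above (look up a class name in the configuration, instantiate it reflectively, then ask it for splits) can be sketched with JDK classes alone. Everything named here is hypothetical: java.util.Properties stands in for a Hadoop Configuration, SplitSource stands in for an InputFormat, and the key sketch.input.format.class is invented for the example. It is not the property that HInputSplits or Hadoop actually reads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class InputFormatLookupSketch {

    /** Hypothetical stand-in for an input format: it only lists splits. */
    public interface SplitSource {
        List<String> getSplits();
    }

    /** Hypothetical implementation returning two fixed split descriptions. */
    public static class FixedSplitSource implements SplitSource {
        public FixedSplitSource() { }

        @Override
        public List<String> getSplits() {
            List<String> splits = new ArrayList<>();
            splits.add("/data/part-00000:0+1024");
            splits.add("/data/part-00000:1024+512");
            return splits;
        }
    }

    /** Invented configuration key; not a real Hadoop property name. */
    static final String FORMAT_KEY = "sketch.input.format.class";

    /** Reads the class name from the configuration and instantiates it. */
    static SplitSource createSplitSource(Properties conf) throws ReflectiveOperationException {
        String className = conf.getProperty(FORMAT_KEY);
        if (className == null) {
            throw new IllegalArgumentException("Missing property " + FORMAT_KEY);
        }
        Class<? extends SplitSource> type =
                Class.forName(className).asSubclass(SplitSource.class);
        return type.getDeclaredConstructor().newInstance();
    }

    /** Resolves the configured source, then delegates split computation to it. */
    static List<String> getSplits(Properties conf) throws ReflectiveOperationException {
        return createSplitSource(conf).getSplits();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Properties conf = new Properties();
        conf.setProperty(FORMAT_KEY, FixedSplitSource.class.getName());
        for (String split : getSplits(conf)) {
            System.out.println(split);
        }
    }
}
```

Keeping instantiation in its own method mirrors the split between createInputFormat and getSplits in the method summary, so a subclass could change how the format is created without touching split computation.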

createInputFormat

protected org.apache.hadoop.mapreduce.InputFormat<?,?> createInputFormat(org.apache.hadoop.conf.Configuration conf)

Copyright © 2014, Oracle and/or its affiliates. All rights reserved.