@InterfaceAudience.LimitedPrivate(value="Tools") public class BulkLoadHFilesTool extends org.apache.hadoop.conf.Configured implements BulkLoadHFiles, org.apache.hadoop.util.Tool
Implementation of BulkLoadHFiles that can also be executed from the command line as a tool.

Nested classes/interfaces inherited from interface BulkLoadHFiles: BulkLoadHFiles.LoadQueueItem

| Modifier and Type | Field and Description |
|---|---|
| static String | BULK_LOAD_HFILES_BY_FAMILY: HBASE-24221: support bulkLoadHFile by family to avoid long waits in bulkLoadHFile caused by server-side compaction. |
| static String | NAME |

Fields inherited from interface BulkLoadHFiles: ALWAYS_COPY_FILES, ASSIGN_SEQ_IDS, CREATE_TABLE_CONF_KEY, IGNORE_UNMATCHED_CF_CONF_KEY, MAX_FILES_PER_REGION_PER_FAMILY, RETRY_ON_IO_EXCEPTION

| Constructor and Description |
|---|
| BulkLoadHFilesTool(org.apache.hadoop.conf.Configuration conf) |

| Modifier and Type | Method and Description |
|---|---|
| Map<BulkLoadHFiles.LoadQueueItem,ByteBuffer> | bulkLoad(TableName tableName, Map<byte[],List<org.apache.hadoop.fs.Path>> family2Files): Perform a bulk load of the given map of column families to hfiles into the given pre-existing table. |
| Map<BulkLoadHFiles.LoadQueueItem,ByteBuffer> | bulkLoad(TableName tableName, org.apache.hadoop.fs.Path dir): Perform a bulk load of the given directory into the given pre-existing table. |
| protected void | bulkLoadPhase(AsyncClusterConnection conn, TableName tableName, Deque<BulkLoadHFiles.LoadQueueItem> queue, org.apache.hbase.thirdparty.com.google.common.collect.Multimap<ByteBuffer,BulkLoadHFiles.LoadQueueItem> regionGroups, boolean copyFiles, Map<BulkLoadHFiles.LoadQueueItem,ByteBuffer> item2RegionMap): This takes the LQIs grouped by likely regions and attempts to bulk load them. |
| void | disableReplication(): Disables replication for all bulkloads done via this instance, when bulkload replication is configured. |
| protected Pair<List<BulkLoadHFiles.LoadQueueItem>,String> | groupOrSplit(AsyncClusterConnection conn, TableName tableName, org.apache.hbase.thirdparty.com.google.common.collect.Multimap<ByteBuffer,BulkLoadHFiles.LoadQueueItem> regionGroups, BulkLoadHFiles.LoadQueueItem item, List<Pair<byte[],byte[]>> startEndKeys): Attempt to assign the given load queue item into its target region group. |
| static byte[][] | inferBoundaries(SortedMap<byte[],Integer> bdryMap): Infers region boundaries for a new table. |
| void | initialize() |
| boolean | isReplicationDisabled() |
| void | loadHFileQueue(AsyncClusterConnection conn, TableName tableName, Deque<BulkLoadHFiles.LoadQueueItem> queue, boolean copyFiles): Used by the replication sink to load the hfiles from the source cluster. |
| static void | main(String[] args) |
| static void | prepareHFileQueue(AsyncClusterConnection conn, TableName tableName, Map<byte[],List<org.apache.hadoop.fs.Path>> map, Deque<BulkLoadHFiles.LoadQueueItem> queue, boolean silence): Prepare a collection of LoadQueueItem from the source hfiles listed in the passed map and validate whether the prepared queue has all the valid table column families in it. |
| static void | prepareHFileQueue(org.apache.hadoop.conf.Configuration conf, AsyncClusterConnection conn, TableName tableName, org.apache.hadoop.fs.Path hfilesDir, Deque<BulkLoadHFiles.LoadQueueItem> queue, boolean validateHFile, boolean silence): Prepare a collection of LoadQueueItem from the list of source hfiles contained in the passed directory and validate whether the prepared queue has all the valid table column families in it. |
| int | run(String[] args) |
| void | setBulkToken(String bulkToken) |
| void | setClusterIds(List<String> clusterIds) |
| protected CompletableFuture<Collection<BulkLoadHFiles.LoadQueueItem>> | tryAtomicRegionLoad(AsyncClusterConnection conn, TableName tableName, boolean copyFiles, byte[] first, Collection<BulkLoadHFiles.LoadQueueItem> lqis): Attempts to do an atomic load of many hfiles into a region. |

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface BulkLoadHFiles: create

public static final String NAME
public static final String BULK_LOAD_HFILES_BY_FAMILY
public BulkLoadHFilesTool(org.apache.hadoop.conf.Configuration conf)
public void initialize()

public static void prepareHFileQueue(AsyncClusterConnection conn, TableName tableName, Map<byte[],List<org.apache.hadoop.fs.Path>> map, Deque<BulkLoadHFiles.LoadQueueItem> queue, boolean silence) throws IOException
Prepare a collection of LoadQueueItem from the source hfiles listed in the passed map and validate whether the prepared queue has all the valid table column families in it.
Parameters:
map - map of family to List of hfiles
tableName - table to which hfiles should be loaded
queue - queue which needs to be loaded into the table
silence - true to ignore unmatched column families
Throws:
IOException - If any I/O or network error occurred

public static void prepareHFileQueue(org.apache.hadoop.conf.Configuration conf,
AsyncClusterConnection conn,
TableName tableName,
org.apache.hadoop.fs.Path hfilesDir,
Deque<BulkLoadHFiles.LoadQueueItem> queue,
boolean validateHFile,
boolean silence)
throws IOException
Prepare a collection of LoadQueueItem from the list of source hfiles contained in the passed directory and validate whether the prepared queue has all the valid table column families in it.
Parameters:
hfilesDir - directory containing list of hfiles to be loaded into the table
queue - queue which needs to be loaded into the table
validateHFile - if true, hfiles will be validated for their format
silence - true to ignore unmatched column families
Throws:
IOException - If any I/O or network error occurred

public void loadHFileQueue(AsyncClusterConnection conn, TableName tableName, Deque<BulkLoadHFiles.LoadQueueItem> queue, boolean copyFiles) throws IOException
Used by the replication sink to load the hfiles from the source cluster.
Parameters:
conn - Connection to use
tableName - Table into which these hfiles should be loaded
queue - LoadQueueItems for the hfiles yet to be loaded
Throws:
IOException

@InterfaceAudience.Private
protected CompletableFuture<Collection<BulkLoadHFiles.LoadQueueItem>> tryAtomicRegionLoad(AsyncClusterConnection conn, TableName tableName, boolean copyFiles, byte[] first, Collection<BulkLoadHFiles.LoadQueueItem> lqis)
Attempts to do an atomic load of many hfiles into a region.
Parameters:
conn - Connection to use
tableName - Table into which these hfiles should be loaded
copyFiles - whether to replicate to the peer cluster while bulk loading
first - the start key of the region
lqis - the hfiles to be loaded

@InterfaceAudience.Private
protected void bulkLoadPhase(AsyncClusterConnection conn, TableName tableName, Deque<BulkLoadHFiles.LoadQueueItem> queue, org.apache.hbase.thirdparty.com.google.common.collect.Multimap<ByteBuffer,BulkLoadHFiles.LoadQueueItem> regionGroups, boolean copyFiles, Map<BulkLoadHFiles.LoadQueueItem,ByteBuffer> item2RegionMap) throws IOException
This takes the LQIs grouped by likely regions and attempts to bulk load them.
Throws:
IOException

@InterfaceAudience.Private
protected Pair<List<BulkLoadHFiles.LoadQueueItem>,String> groupOrSplit(AsyncClusterConnection conn, TableName tableName, org.apache.hbase.thirdparty.com.google.common.collect.Multimap<ByteBuffer,BulkLoadHFiles.LoadQueueItem> regionGroups, BulkLoadHFiles.LoadQueueItem item, List<Pair<byte[],byte[]>> startEndKeys) throws IOException
Attempt to assign the given load queue item into its target region group.
Throws:
IOException - if an IO failure is encountered

public static byte[][] inferBoundaries(SortedMap<byte[],Integer> bdryMap)
Infers region boundaries for a new table.
Algo:
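
The algorithm steps are not reproduced here, so the following is only a sketch of one plausible approach, not necessarily what inferBoundaries does. It assumes that bdryMap maps each HFile start key to +1 and each end key to -1, that a running sum returning to zero closes a group of overlapping files, and that the first group's start key is skipped because the first region implicitly starts at the empty key. The class name InferBoundariesSketch and the use of String keys in place of byte[] row keys are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class InferBoundariesSketch {

    /**
     * Sketch: walk the keys in sorted order, keeping a running sum of the
     * +1 (file start) and -1 (file end) markers. Each time the sum returns
     * to zero, a group of overlapping files has closed; the key at which
     * that group started becomes a region boundary (except for the first group).
     */
    public static List<String> inferBoundaries(SortedMap<String, Integer> bdryMap) {
        List<String> boundaries = new ArrayList<>();
        int runningSum = 0;
        String groupStartKey = null;
        boolean firstGroup = true;

        for (Map.Entry<String, Integer> entry : bdryMap.entrySet()) {
            if (runningSum == 0) {
                // No file is currently open, so a new group starts at this key.
                groupStartKey = entry.getKey();
            }
            runningSum += entry.getValue();
            if (runningSum == 0) {
                // Assumption: the first region starts at the empty key,
                // so the first group's start key is not emitted.
                if (!firstGroup) {
                    boundaries.add(groupStartKey);
                }
                firstGroup = false;
            }
        }
        return boundaries;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> bdryMap = new TreeMap<>();
        // Three hypothetical files covering [a, c], [b, d] and [f, h].
        String[][] files = { {"a", "c"}, {"b", "d"}, {"f", "h"} };
        for (String[] file : files) {
            bdryMap.merge(file[0], 1, Integer::sum);   // start key
            bdryMap.merge(file[1], -1, Integer::sum);  // end key
        }
        // The first two files overlap and form one group; the third starts a new one.
        System.out.println(inferBoundaries(bdryMap));  // prints [f]
    }
}
```

In this sketch the overlapping files [a, c] and [b, d] fall into a single region, and the only inferred split point is the start key of the third file.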
public Map<BulkLoadHFiles.LoadQueueItem,ByteBuffer> bulkLoad(TableName tableName, Map<byte[],List<org.apache.hadoop.fs.Path>> family2Files) throws IOException
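Description copied from interface: BulkLoadHFiles
Perform a bulk load of the given map of column families to hfiles into the given pre-existing table.
Specified by:
bulkLoad in interface BulkLoadHFiles
Parameters:
tableName - the table to load into
family2Files - map of family to List of hfiles
Throws:
TableNotFoundException - if table does not yet exist
IOException

public Map<BulkLoadHFiles.LoadQueueItem,ByteBuffer> bulkLoad(TableName tableName, org.apache.hadoop.fs.Path dir) throws IOException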
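Description copied from interface: BulkLoadHFiles
Perform a bulk load of the given directory into the given pre-existing table.
Specified by:
bulkLoad in interface BulkLoadHFiles
Parameters:
tableName - the table to load into
dir - the directory that was provided as the output path of a job using HFileOutputFormat
Throws:
TableNotFoundException - if table does not yet exist
IOException

public void setBulkToken(String bulkToken)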
public int run(String[] args) throws Exception
Specified by:
run in interface org.apache.hadoop.util.Tool
Throws:
Exception

public void disableReplication()
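Description copied from interface: BulkLoadHFiles
Disables replication for all bulkloads done via this instance, when bulkload replication is configured.
Specified by:
disableReplication in interface BulkLoadHFiles

public boolean isReplicationDisabled()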
Specified by:
isReplicationDisabled in interface BulkLoadHFiles

Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.