Apache HBase Reference Guide

13.4.1. Changes of Note!
First we’ll cover deployment / operational changes that you might hit when upgrading to HBase 2.0+. After that we’ll call out changes for downstream applications. Please note that Coprocessors are covered in the operational section. Also note that this section is not meant to convey information about new features that may be of interest to you. For a complete summary of changes, please see the CHANGES.txt file in the source release artifact for the version you are planning to upgrade to.

Update to basic prerequisite minimums in HBase 2.0+

As noted in the section Basic Prerequisites, HBase 2.0+ requires a minimum of Java 8 and Hadoop 2.6. The HBase community recommends ensuring you have already completed any needed upgrades in prerequisites prior to upgrading your HBase version.

HBCK must match HBase server version

You must not use an HBase 1.x version of HBCK against an HBase 2.0+ cluster. HBCK is strongly tied to the HBase server version. Using the HBCK tool from an earlier release against an HBase 2.0+ cluster will destructively alter said cluster in unrecoverable ways.

As of HBase 2.0, HBCK (A.K.A HBCK1 or hbck1) is a read-only tool that can report the status of some non-public system internals but will often misread state because it does not understand the workings of hbase2.

Related, before you upgrade, ensure that hbck1 reports no INCONSISTENCIES. Fixing hbase1-type inconsistencies post-upgrade is an involved process.

Configuration settings no longer in HBase 2.0+

The following configuration settings are no longer applicable or available. For details, please see the detailed release notes.

Configuration properties that were renamed in HBase 2.0+

The following properties have been renamed. Attempts to set the old property will be ignored at run time.

Table 5. Renamed properties

Old name | New name
hbase.rpc.server.nativetransport | hbase.netty.nativetransport
hbase.netty.rpc.server.worker.count | hbase.netty.worker.count
hbase.hfile.compactions.discharger.interval | hbase.hfile.compaction.discharger.interval
hbase.hregion.percolumnfamilyflush.size.lower.bound | hbase.hregion.percolumnfamilyflush.size.lower.bound.min
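Because the old names are silently ignored at run time, it can help to rewrite them before deploying. The following is a minimal sketch of such a rewrite over an in-memory map of settings. The class and method names are hypothetical, nothing here is part of HBase, and the rule that an explicitly set new name wins over a migrated old one is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative helper (not part of HBase): maps renamed 1.x property names to their 2.0+ names. */
final class RenamedPropertyMigrator {
    // Old name -> new name, copied from the renamed-properties table.
    static final Map<String, String> RENAMES = new LinkedHashMap<>();
    static {
        RENAMES.put("hbase.rpc.server.nativetransport", "hbase.netty.nativetransport");
        RENAMES.put("hbase.netty.rpc.server.worker.count", "hbase.netty.worker.count");
        RENAMES.put("hbase.hfile.compactions.discharger.interval",
                "hbase.hfile.compaction.discharger.interval");
        RENAMES.put("hbase.hregion.percolumnfamilyflush.size.lower.bound",
                "hbase.hregion.percolumnfamilyflush.size.lower.bound.min");
    }

    /** Returns a copy of the settings in which old names are replaced by new names. */
    static Map<String, String> migrate(Map<String, String> settings) {
        Map<String, String> out = new LinkedHashMap<>();
        // First pass: keep every setting whose name was not renamed.
        for (Map.Entry<String, String> e : settings.entrySet()) {
            if (!RENAMES.containsKey(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        // Second pass: carry old values over, unless the new name is already set (assumption).
        for (Map.Entry<String, String> e : settings.entrySet()) {
            String newName = RENAMES.get(e.getKey());
            if (newName != null) {
                out.putIfAbsent(newName, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> site = new LinkedHashMap<>();
        site.put("hbase.rpc.server.nativetransport", "true");
        System.out.println(migrate(site));
    }
}
```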

Configuration settings with different defaults in HBase 2.0+

The following configuration settings changed their default value. Where applicable, the value to set to restore the behavior of HBase 1.2 is given.

* hbase.security.authorization now defaults to false. Set it to true to restore the previous default behavior.

* hbase.client.retries.number is now set to 10. Previously it was 35. Downstream users are advised to use client timeouts as described in section Timeout settings instead.

* hbase.client.serverside.retries.multiplier is now set to 3. Previously it was 10. Downstream users are advised to use client timeouts as described in section Timeout settings instead.

* hbase.master.fileSplitTimeout is now set to 10 minutes. Previously it was 30 seconds.

* hbase.regionserver.logroll.multiplier is now set to 0.5. Previously it was 0.95. This change is tied with the following doubling of block size. Combined, these two configuration changes should make for WALs of about the same size as those in hbase-1.x but there should be less incidence of small blocks because we fail to roll the WAL before we hit the blocksize threshold. See HBASE for discussion.

* hbase.regionserver.hlog.blocksize defaults to 2x the HDFS default block size for the WAL dir. Previously it was equal to the HDFS default block size for the WAL dir.

* hbase.client.start.log.errors.counter changed to 5. Previously it was 9.

* hbase.ipc.server.callqueue.type changed to ‘fifo’. In HBase versions 1.0 – 1.2 it was ‘deadline’. In prior and later 1.x versions it already defaults to ‘fifo’.

* hbase.hregion.memstore.chunkpool.maxsize is 1.0 by default. Previously it was 0.0. Effectively, this means previously we would not use a chunk pool when our memstore is onheap and now we will. See the section Long GC pauses for more information about the MSLAB chunk pool.

* hbase.master.cleaner.interval is now set to 10 minutes. Previously it was 1 minute.

* hbase.master.procedure.threads will now default to 1/4 of the number of available CPUs, but not less than 16 threads. Previously it would be number of threads equal to number of CPUs.

* hbase.hstore.blockingStoreFiles is now 16. Previously it was 10.

* hbase.http.max.threads is now 16. Previously it was 10.

* hbase.client.max.perserver.tasks is now 2. Previously it was 5.

* hbase.normalizer.period is now 5 minutes. Previously it was 30 minutes.

* hbase.regionserver.region.split.policy is now SteppingSplitPolicy. Previously it was IncreasingToUpperBoundRegionSplitPolicy.

* replication.source.ratio is now 0.5. Previously it was 0.1.
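To make the interplay between hbase.regionserver.logroll.multiplier and hbase.regionserver.hlog.blocksize concrete, the sketch below computes a WAL roll threshold as block size times multiplier. The class and method names, the 128 MB HDFS block size, and the simple product formula are assumptions made for illustration and are not taken from the HBase source.

```java
/** Illustrative arithmetic only: compares WAL roll thresholds under the old and new defaults. */
final class WalRollSizeSketch {
    /** Assumed formula: roll the WAL once it reaches blockSize * multiplier bytes. */
    static long rollThresholdBytes(long walBlockSizeBytes, double logRollMultiplier) {
        return (long) (walBlockSizeBytes * logRollMultiplier);
    }

    public static void main(String[] args) {
        long hdfsBlock = 128L * 1024 * 1024; // assumed HDFS default block size for the WAL dir

        // 1.x defaults: WAL block size equals the HDFS block size, multiplier 0.95.
        long oldThreshold = rollThresholdBytes(hdfsBlock, 0.95);
        // 2.0+ defaults: WAL block size is twice the HDFS block size, multiplier 0.5.
        long newThreshold = rollThresholdBytes(2 * hdfsBlock, 0.5);

        System.out.println("1.x roll threshold:  " + oldThreshold + " bytes");
        System.out.println("2.0+ roll threshold: " + newThreshold + " bytes");
    }
}
```

Under these assumptions the 2.0+ threshold is exactly one HDFS block while the WAL block is two HDFS blocks, so the roll happens well before the WAL block size is reached. The 1.x threshold sits just under the block size, leaving little margin.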

“Master hosting regions” feature broken and unsupported

The feature “Master acts as region server” and associated follow-on work available in HBase 1.y is non-functional in HBase 2.y and should not be used in a production setting due to deadlock on Master initialization. Downstream users are advised to treat related configuration settings as experimental and the feature as inappropriate for production settings.

A brief summary of related changes:

* Master no longer carries regions by default

* hbase.balancer.tablesOnMaster is a boolean, default false (if it holds an HBase 1.x list of tables, will default to false)

* hbase.balancer.tablesOnMaster.systemTablesOnly is boolean to keep user tables off master. default false

* those wishing to replicate old list-of-servers config should deploy a stand-alone RegionServer process and then rely on Region Server Groups

“Distributed Log Replay” feature broken and removed

The Distributed Log Replay feature was broken and has been removed from HBase 2.y+. As a consequence all related configs, metrics, RPC fields, and logging have also been removed. Note that this feature was found to be unreliable in the run up to HBase 1.0, defaulted to being unused, and was effectively removed in HBase 1.2.0 when we started ignoring the config that turns it on (HBASE-14465). If you are currently using the feature, be sure to perform a clean shutdown, ensure all DLR work is complete, and disable the feature prior to upgrading.

prefix-tree encoding removed

The prefix-tree encoding was removed from HBase 2.0.0 (HBASE-19179). It was (late!) deprecated in hbase-1.2.7, hbase-1.4.0, and hbase-1.3.2.

This feature was removed because it was not being actively maintained. If you are interested in reviving this sweet facility, which improved random read latencies at the expense of slowed writes, write the HBase developers list at dev at hbase dot apache dot org.

The prefix-tree encoding needs to be removed from all tables before upgrading to HBase 2.0+. To do that first you need to change the encoding from PREFIX_TREE to something else that is supported in HBase 2.0. After that you have to major compact the tables that were using PREFIX_TREE encoding before. To check which column families are using incompatible data block encoding you can use Pre-Upgrade Validator.

The following metrics have changed names:

* Metrics previously published under the name “AssignmentManger” [sic] are now published under the name “AssignmentManager”

The following metrics have changed their meaning:

* The metric ‘blockCacheEvictionCount’ published on a per-region server basis no longer includes blocks removed from the cache due to the invalidation of the hfiles they are from (e.g. via compaction).

* The metric ‘totalRequestCount’ increments once per request; previously it incremented by the number of Actions carried in the request; e.g. if a request was a multi made of four Gets and two Puts, we’d increment ‘totalRequestCount’ by six; now we increment by one regardless. Expect to see lower values for this metric in hbase-2.0.0.

* The ‘readRequestCount’ now counts reads that return a non-empty row, whereas in older HBase versions we’d increment ‘readRequestCount’ whether or not a Result was returned. This change will flatten the profile of the read-requests graphs if there are requests for non-existent rows. A YCSB read-heavy workload can do this, depending on how the database was loaded.
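The change in ‘totalRequestCount’ can be illustrated with a small counting sketch. It models each request only by the number of Actions it carries. The class and method names are hypothetical and the code does not touch any HBase metrics API.

```java
/** Illustrative only: contrasts per-Action and per-request counting of totalRequestCount. */
final class TotalRequestCountSketch {
    /** Pre-2.0 semantics as described above: add the number of Actions in each request. */
    static long countPerAction(int[] actionsPerRequest) {
        long total = 0;
        for (int actions : actionsPerRequest) {
            total += actions;
        }
        return total;
    }

    /** 2.0+ semantics: add one per request, regardless of how many Actions it carries. */
    static long countPerRequest(int[] actionsPerRequest) {
        return actionsPerRequest.length;
    }

    public static void main(String[] args) {
        int[] requests = {6}; // a single multi made of four Gets and two Puts
        System.out.println("pre-2.0: " + countPerAction(requests)
                + ", 2.0+: " + countPerRequest(requests));
    }
}
```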

The following metrics have been removed:

The following metrics have been added:

* ‘totalRowActionRequestCount’ is a count of region row actions summing reads and writes.

HBase-2.0.0 now uses slf4j as its logging frontend. Previously, we used log4j (1.2). For most users the transition should be seamless; slf4j does a good job interpreting log4j.properties logging configuration files such that you should not notice any difference in your log system emissions.

That said, your log4j.properties may need freshening. See HBASE for example, where a stale log configuration file manifested as netty configuration being dumped at DEBUG level as preamble on every shell command invocation.

ZooKeeper configs no longer read from zoo.cfg

HBase no longer optionally reads the ‘zoo.cfg’ file for ZooKeeper related configuration settings. If you previously relied on the ‘hbase.config.read.zookeeper.config’ config for this functionality, you should migrate any needed settings to the hbase-site.xml file while adding the prefix ‘hbase.zookeeper.property.’ to each property name.
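As a rough illustration of that migration, the sketch below takes settings loaded from a zoo.cfg file and produces the equivalent prefixed names for hbase-site.xml. The class and method names are hypothetical. Skipping the `server.N` ensemble entries is an assumption of this sketch, on the basis that the quorum is configured separately in HBase (hbase.zookeeper.quorum).

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Illustrative helper (not part of HBase): prefixes zoo.cfg settings for use in hbase-site.xml. */
final class ZooCfgMigrationSketch {
    static final String PREFIX = "hbase.zookeeper.property.";

    /** Returns the prefixed name/value pairs, sorted by name for stable output. */
    static Map<String, String> toHBaseSiteProperties(Properties zooCfg) {
        Map<String, String> out = new TreeMap<>();
        for (String name : zooCfg.stringPropertyNames()) {
            if (name.startsWith("server.")) {
                continue; // assumption: ensemble members are handled via the quorum setting instead
            }
            out.put(PREFIX + name, zooCfg.getProperty(name));
        }
        return out;
    }

    public static void main(String[] args) {
        Properties zooCfg = new Properties();
        zooCfg.setProperty("clientPort", "2181");
        zooCfg.setProperty("tickTime", "2000");
        toHBaseSiteProperties(zooCfg)
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```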

The following permission related changes either altered semantics or defaults:

* Permissions granted to a user now merge with existing permissions for that user, rather than over-writing them. (see the release note on HBASE for details)

* Region Server Group commands (added in 1.4.0) now require admin privileges.

Most Admin APIs don’t work against an HBase 2.0+ cluster from pre-HBase 2.0 clients

A number of admin commands are known to not work when used from a pre-HBase 2.0 client. This includes an HBase Shell that has the library jars from pre-HBase 2.0. You will need to plan for an outage of use of admin APIs and commands until you can also update to the needed client version.

The following client operations do not work against HBase 2.0+ cluster when executed from a pre-HBase 2.0 client:

* list_procedures

* split

* merge_region

* list_quotas

* enable_table_replication

* disable_table_replication

* Snapshot related commands

Deprecated in 1.0 admin commands have been removed.

The following commands that were deprecated in 1.0 have been removed. Where applicable the replacement command is listed.

* The ‘hlog’ command has been removed. Downstream users should rely on the ‘wal’ command instead.

Additionally, HBase 2.0 has changed how memstore memory is tracked for flushing decisions. Previously, both the data size and overhead for storage were used to calculate utilization against the flush threshold. Now, only data size is used to make these per-region decisions. Globally the addition of the storage overhead is used to make decisions about forced flushes.
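The difference between the two accounting schemes can be sketched as three predicates. This is a simplified model under stated assumptions: the names are hypothetical, the comparison is a plain "greater than or equal" check, and real flush decisions involve more inputs than shown here.

```java
/** Illustrative model only: which sizes feed which flush decision before and after HBase 2.0. */
final class MemstoreFlushSketch {
    /** Pre-2.0 per-region check: data size plus storage overhead against the flush threshold. */
    static boolean regionShouldFlushPre2(long dataBytes, long overheadBytes, long flushThreshold) {
        return dataBytes + overheadBytes >= flushThreshold;
    }

    /** 2.0+ per-region check: data size only. */
    static boolean regionShouldFlush2(long dataBytes, long overheadBytes, long flushThreshold) {
        return dataBytes >= flushThreshold;
    }

    /** 2.0+ global check for forced flushes: data size plus storage overhead. */
    static boolean globalForcedFlush(long totalDataBytes, long totalOverheadBytes, long globalLimit) {
        return totalDataBytes + totalOverheadBytes >= globalLimit;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // Hypothetical region: 100 MB of data, 40 MB of overhead, 128 MB threshold.
        System.out.println("pre-2.0 flush: " + regionShouldFlushPre2(100 * mb, 40 * mb, 128 * mb));
        System.out.println("2.0+ flush:    " + regionShouldFlush2(100 * mb, 40 * mb, 128 * mb));
    }
}
```

In this model a region with the same contents flushes later under 2.0+, because overhead no longer counts toward its per-region threshold.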

Web UI for splitting and merging operate on row prefixes

Previously, the Web UI included functionality on table status pages to merge or split based on an encoded region name. In HBase 2.0, instead this functionality works by taking a row prefix.

The HBase shell command relies on a bundled JRuby instance. This bundled JRuby has been updated from version 1.6.8 to version 9.1.10.0. This represents a change from Ruby 1.8 to Ruby 2.3.3, which introduces non-compatible language changes for user scripts.

The HBase shell command now ignores the ‘--return-values’ flag that was present in early HBase 1.4 releases. Instead the shell always behaves as though that flag were passed. If you wish to avoid having expression results printed in the console you should alter your IRB configuration as noted in the section irbrc.

Coprocessor APIs have changed in HBase 2.0+

All Coprocessor APIs have been refactored to improve supportability around binary API compatibility for future versions of HBase. If you or applications you rely on have custom HBase coprocessors, you should read the release notes for HBASE for details of changes you will need to make prior to upgrading to HBase 2.0+.

For example, if you had a BaseRegionObserver in HBase 1.2 then at a minimum you will need to update it to implement both RegionObserver and RegionCoprocessor and add the method


@Override
public Optional<RegionObserver> getRegionObserver() {
  return Optional.of(this);
}
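For context, the following self-contained sketch shows where that method sits in a class. To keep it compilable without HBase on the classpath, it declares empty stand-in interfaces with the same simple names as the HBase types. The stand-ins and the class name `MyRegionObserver` are assumptions for illustration; a real coprocessor would import the interfaces from org.apache.hadoop.hbase.coprocessor instead.

```java
import java.util.Optional;

// Stand-in for the HBase RegionObserver interface (hooks omitted), declared only for this sketch.
interface RegionObserver {
}

// Stand-in for the HBase RegionCoprocessor interface, declared only for this sketch.
interface RegionCoprocessor {
    default Optional<RegionObserver> getRegionObserver() {
        return Optional.empty();
    }
}

// HBase 1.2 style would have been: class MyRegionObserver extends BaseRegionObserver { ... }
// HBase 2.0+ style: implement both interfaces and hand out the observer explicitly.
final class MyRegionObserver implements RegionCoprocessor, RegionObserver {
    @Override
    public Optional<RegionObserver> getRegionObserver() {
        return Optional.of(this); // this class is its own observer
    }

    public static void main(String[] args) {
        MyRegionObserver coprocessor = new MyRegionObserver();
        System.out.println("observer present: " + coprocessor.getRegionObserver().isPresent());
    }
}
```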

HBase 2.0+ can no longer write HFile v2 files.

HBase has simplified our internal HFile handling. As a result, we can no longer write HFile versions earlier than the default of version 3. Upgrading users should ensure that hfile.format.version is not set to 2 in hbase-site.xml before upgrading. Failing to do so will cause Region Server failure. HBase can still read HFiles written in the older version 2 format.
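A check along these lines could be run against the settings read from hbase-site.xml before upgrading. It is a minimal sketch that works on a plain map of property names to values. The class and method names are hypothetical, and treating an unparseable value as unsafe is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative pre-upgrade check (not part of HBase) for the hfile.format.version setting. */
final class HFileVersionPreCheck {
    static final String KEY = "hfile.format.version";

    /** Returns true when the setting is absent or names version 3 or later. */
    static boolean isSafeForUpgrade(Map<String, String> hbaseSite) {
        String value = hbaseSite.get(KEY);
        if (value == null) {
            return true; // unset, so the default of version 3 applies
        }
        try {
            return Integer.parseInt(value.trim()) >= 3;
        } catch (NumberFormatException e) {
            return false; // assumption: flag anything unparseable for manual review
        }
    }

    public static void main(String[] args) {
        Map<String, String> site = new HashMap<>();
        site.put(KEY, "2");
        System.out.println("safe to upgrade: " + isSafeForUpgrade(site));
    }
}
```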

HBase 2.0+ can no longer read Sequence File based WAL file.

HBase can no longer read the deprecated WAL files written in the Apache Hadoop Sequence File format. The hbase.regionserver.hlog.reader.impl and hbase.regionserver.hlog.writer.impl configuration entries should be set to use the Protobuf based WAL reader / writer classes. This implementation has been the default since HBase 0.96, so legacy WAL files should not be a concern for most downstream users.

A clean cluster shutdown should ensure there are no WAL files. If you are unsure of a given WAL file’s format you can use the hbase wal command to parse files while the HBase cluster is offline. In HBase 2.0+, this command will not be able to read a Sequence File based WAL. For more information on the tool see the section WALPrettyPrinter.

Change in behavior for filters

The Filter ReturnCode NEXT_ROW has been redefined as skipping to the next row in the current family, not to the next row in all families. This is more reasonable because ReturnCode is a store-level concept, not a region-level one.

Downstream HBase 2.0+ users should use the shaded client

Downstream users are strongly urged to rely on the Maven coordinates org.apache.hbase:hbase-shaded-client for their runtime use. This artifact contains all the needed implementation details for talking to an HBase cluster while minimizing the number of third party dependencies exposed.

Note that this artifact exposes some classes in the org.apache.hadoop package space (e.g. o.a.h.configuration.Configuration) so that we can maintain source compatibility with our public API. Those classes are included so that they can be altered to use the same relocated third party dependencies as the rest of the HBase client code. In the event that you need to also use Hadoop in your code, you should ensure all Hadoop related jars precede the HBase client jar in your classpath.

Downstream HBase 2.0+ users of MapReduce must switch to new artifact

Downstream users of HBase’s integration for Apache Hadoop MapReduce must switch to relying on the org.apache.hbase:hbase-shaded-mapreduce module for their runtime use. Historically, downstream users relied on either the org.apache.hbase:hbase-server or org.apache.hbase:hbase-shaded-server artifacts for these classes. Both uses are no longer supported and in the vast majority of cases will fail at runtime.

Note that this artifact exposes some classes in the org.apache.hadoop package space (e.g. o.a.h.configuration.Configuration) so that we can maintain source compatibility with our public API. Those classes are included so that they can be altered to use the same relocated third party dependencies as the rest of the HBase client code. In the event that you need to also use Hadoop in your code, you should ensure all Hadoop related jars precede the HBase client jar in your classpath.

Significant changes to runtime classpath

A number of internal dependencies for HBase were updated or removed from the runtime classpath. Downstream client users who do not follow the guidance in Downstream HBase 2.0+ users should use the shaded client will have to examine the set of dependencies Maven pulls in for impact. Downstream users of LimitedPrivate Coprocessor APIs will need to examine the runtime environment for impact. For details on our new handling of third party libraries that have historically been a problem with respect to harmonizing compatible runtime versions, see the reference guide section The hbase-thirdparty dependency and shading/relocation.

Multiple breaking changes to source and binary compatibility for client API

The Java client API for HBase has a number of changes that break both source and binary compatibility. For details, see the Compatibility Check Report for the release you’ll be upgrading to.

Tracing implementation changes

The backing implementation of HBase’s tracing features was updated from Apache HTrace 3 to HTrace 4, which includes several breaking changes. While HTrace 3 and 4 can coexist in the same runtime, they will not integrate with each other, leading to disjoint trace information.

The internal changes to HBase during this upgrade were sufficient for compilation, but it has not been confirmed that there are no regressions in tracing functionality. Please consider this feature experimental for the immediate future.

If you previously relied on client side tracing integrated with HBase operations, it is recommended that you upgrade your usage to HTrace 4 as well.

After the Apache HTrace project moved to the Attic/retired, the traces in HBase are left broken and unmaintained since HBase 2.0. A new project HBASE will replace HTrace with OpenTelemetry. It will be shipped in 3.0.0 release. Please see the reference guide section Tracing for more details.

HFiles lose forward compatibility

HFiles generated by 2.0.0, 2.0.1, 2.1.0 are not forward compatible to 1.4.6-, 1.3.2.1-, 1.2.6.1-, and other inactive releases. HFiles lose compatibility because HBase in the new versions (2.0.0, 2.0.1, 2.1.0) uses protobuf to serialize/deserialize TimeRangeTracker (TRT) while old versions use DataInput/DataOutput. To solve this, we have to put HBASE to 2.x and put HBASE in 1.x. For more information, please check HBASE-21008.

You will likely see a change in the performance profile on upgrade to hbase-2.0.0 given read and write paths have undergone significant change. On release, writes may be slower with reads about the same or much better, dependent on context. Be prepared to spend time re-tuning (See Apache HBase Performance Tuning). Performance is also an area that is now under active review so look forward to improvement in coming releases (See HBASE TESTING Performance).

Integration Tests and Kerberos

Integration Tests (IntegrationTests*) used to rely on the Kerberos credential cache for authentication against secured clusters. This used to lead to tests failing due to authentication failures when the tickets in the credential cache expired. As of hbase-2.0.0 (and hbase-1.3.0+), the integration test clients will make use of the configuration properties hbase.client.keytab.file and hbase.client.kerberos.principal. They are required. The clients will perform a login from the configured keytab file and automatically refresh the credentials in the background for the process lifetime (See HBASE-16231).

Default Compaction Throughput

HBase 2.x comes with default limits to the speed at which compactions can execute. This limit is defined per RegionServer. In previous versions of HBase earlier than 1.5, there was no limit to the speed at which a compaction could run by default. Applying a limit to the throughput of a compaction should ensure more stable operations from RegionServers.

Take care to notice that this limit is per RegionServer, not per compaction.

The throughput limit is defined as a range of bytes written per second, and is allowed to vary within the given lower and upper bound. RegionServers observe the current throughput of a compaction and apply a linear formula to adjust the allowed throughput, within the lower and upper bound, with respect to external pressure. For compactions, external pressure is defined as the number of store files with respect to the maximum number of allowed store files. The more store files, the higher the compaction pressure.

Configuration of this throughput is governed by the following properties.

* The lower bound is defined by hbase.hstore.compaction.throughput.lower.bound and defaults to 50 MB/s ( ).

* The upper bound is defined by hbase.hstore.compaction.throughput.higher.bound and defaults to 100 MB/s ( ).
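The linear adjustment between the two bounds can be sketched as follows. This is an illustration under stated assumptions, not the RegionServer's actual controller code: the names are hypothetical, the pressure is taken to be normalized to the range 0.0 to 1.0 and clamped to it, and the defaults are written as 50 and 100 times 1024 * 1024 bytes per second.

```java
/** Illustrative sketch of a linear throughput limit between a lower and an upper bound. */
final class CompactionThroughputSketch {
    static final double LOWER_BOUND = 50.0 * 1024 * 1024;  // assumed byte value of 50 MB/s
    static final double UPPER_BOUND = 100.0 * 1024 * 1024; // assumed byte value of 100 MB/s

    /**
     * Interpolates linearly between the bounds. A pressure of 0.0 yields the lower bound and
     * a pressure of 1.0 yields the upper bound; values outside that range are clamped here.
     */
    static double allowedBytesPerSecond(double lower, double upper, double pressure) {
        double clamped = Math.max(0.0, Math.min(1.0, pressure));
        return lower + (upper - lower) * clamped;
    }

    public static void main(String[] args) {
        for (double pressure : new double[] {0.0, 0.5, 1.0}) {
            double limit = allowedBytesPerSecond(LOWER_BOUND, UPPER_BOUND, pressure);
            System.out.println("pressure " + pressure + ": " + (limit / (1024 * 1024)) + " MB/s");
        }
    }
}
```

With these defaults, a pressure halfway between no store-file backlog and the maximum allowed store files gives a limit of 75 MB/s for the whole RegionServer.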

To revert this behavior to the unlimited compaction throughput of earlier versions of HBase, please set the following property to the implementation that applies no limits to compactions.

hbase.regionserver.throughput.controller=org.apache.hadoop.hbase.regionserver.throttle.NoLimitThroughputController

13.4.2. Upgrading Coprocessors to 2.0
1. Pass Interfaces instead of Implementations; e.g. TableDescriptor instead of HTableDescriptor and Region instead of HRegion (HBASE Change client.Table and client.Admin to not use HTableDescriptor).

2. Design refactor so implementers need to fill out less boilerplate and so we can do more compile-time checking (HBASE-17732)

3. Purge Protocol Buffers from Coprocessor API (HBASE-18859, HBASE-16769, etc)

4. Cut back on what we expose to Coprocessors, removing hooks on internals that were too private to expose (e.g. HBASE CompactionRequest should not be exposed to user directly; HBASE RegionServerServices Interface cleanup for CP expose; etc)

To use coprocessors in 2.0, they should be rebuilt against new API otherwise they will fail to load and HBase processes will die.

Suggested order of changes to upgrade the coprocessors:

1. Directly implement observer interfaces instead of extending Base*Observer classes. Change Foo extends BaseXXXObserver to Foo implements XXXObserver. (HBASE-17312).

2. Adapt to design change from Inheritance to Composition (HBASE-17732) by following this example.

3. getTable() has been removed from the CoprocessorEnvironment; coprocessors should self-manage Table instances.

Some examples of writing coprocessors with the new API can be found in the hbase-example module.

Lastly, if an API has been changed/removed that breaks you in an irreparable way, and if there’s a good justification to add it back, bring it to our notice ().

13.4.3. Rolling Upgrade from 1.x to 2.x
Rolling upgrades are currently an experimental feature. They have had limited testing. There are likely corner cases as yet uncovered in our limited experience so you should be careful if you go this route. The stop/upgrade/start described in the next section, Upgrade process from 1.x to 2.x, is the safest route.

That said, the below is a prescription for a rolling upgrade of a 1.4 cluster.

* Upgrade to the latest 1.4.x release. Pre 1.4 releases may also work but are not tested, so please upgrade to 1.4.3+ before upgrading to 2.x, unless you are an expert and familiar with the region assignment and crash processing. See the section Upgrading from pre-1.4 to 1.4+ on how to upgrade to 1.4.x.

* Make sure that the zk-less assignment is enabled, i.e, set hbase.assignment.usezk to false. This is the most important thing. It allows the 1.x master to assign/unassign regions to/from 2.x region servers. See the release note section of HBASE on how to migrate from zk based assignment to zk less assignment.

* Before you upgrade, ensure that hbck1 reports no INCONSISTENCIES. Fixing hbase1-type inconsistencies post-upgrade is an involved process.

* We have tested rolling upgrading from 1.4.3 to 2.1.0, but it should also work if you want to upgrade to 2.0.x.

1. Unload a region server and upgrade it to 2.1.0. With HBASE in place, the meta region and regions for other system tables will be moved to this region server immediately. If not, please move them manually to the new region server. This is very important because:

   * The schema of the meta region is hard coded. If meta is on an old region server, then the new region servers can not access it as it does not have some families, for example, table state.

   * A client with a lower version can communicate with a server with a higher version, but not vice versa. If the meta region is on an old region server, the new region server will use a client with a higher version to communicate with a server with a lower version, which may introduce strange problems.

2. Rolling upgrade all other region servers.

3. Upgrading masters.

It is OK that during the rolling upgrading there are region server crashes. The 1.x master can assign regions to both 1.x and 2.x region servers, and HBASE fixed a problem so that 1.x region server can also read the WALs written by 2.x region server and split them.

Please read the Changes of Note! section carefully before rolling upgrading. Make sure that you do not use the removed features in 2.0, for example, the prefix-tree encoding, the old hfile format, etc. They could both fail the upgrade and leave the cluster in an intermediate state that is hard to recover from.

If you have success running this prescription, please notify the dev list with a note on your experience and/or update the above with any deviations you may have taken so others going this route can benefit from your efforts.

13.4.4. Upgrade process from 1.x to 2.x
To upgrade an existing HBase 1.x cluster, you should:

* Ensure that hbck1 reports no INCONSISTENCIES. Fixing hbase1-type inconsistencies post-upgrade is an involved process. Fix all hbck1 complaints before proceeding.

* Clean shutdown of existing 1.x cluster

* Update coprocessors

* Upgrade Master roles first

* Upgrade RegionServers

* (Eventually) Upgrade Clients