
HADOOP-17461. Collect thread-level IOStatistics. #4639

Merged: 1 commit into apache:branch-3.3, Jul 27, 2022


@mehakmeet (Contributor)

This adds a thread-level collector of IOStatistics, IOStatisticsContext,
which can be:

  • Retrieved for a thread and cached for access from other
    threads.
  • reset() to record new statistics.
  • Queried for live statistics through the
    IOStatisticsSource.getIOStatistics() method.
  • Queried for a statistics aggregator for use in instrumented
    classes.
  • Asked to create a serializable copy through snapshot().
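As an illustration of the operations listed above, here is a minimal, self-contained Java sketch of a per-thread statistics context. It is not Hadoop's IOStatisticsContext: the class ThreadStatsContext, its methods and the counter name are hypothetical, and a plain map of counters stands in for the real IOStatistics types.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical, simplified per-thread statistics context.
 * Illustrative only; this is not Hadoop's IOStatisticsContext API.
 */
final class ThreadStatsContext {

  // One context per thread, created lazily on first access from that thread.
  private static final ThreadLocal<ThreadStatsContext> CURRENT =
      ThreadLocal.withInitial(ThreadStatsContext::new);

  // Live counters; a concurrent map so other threads holding a cached
  // reference to this context can read them safely.
  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

  /** Returns the calling thread's context; the reference may be cached and shared. */
  static ThreadStatsContext current() {
    return CURRENT.get();
  }

  /** Aggregator role: instrumented code adds a value to a named counter. */
  void increment(String key, long delta) {
    counters.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(delta);
  }

  /** Live query of one counter; unknown keys read as zero. */
  long get(String key) {
    AtomicLong value = counters.get(key);
    return value == null ? 0L : value.get();
  }

  /** Discards all counters so that new statistics can be recorded. */
  void reset() {
    counters.clear();
  }

  /** Returns a detached, read-only copy that later updates do not affect. */
  Map<String, Long> snapshot() {
    Map<String, Long> copy = new HashMap<>();
    counters.forEach((key, value) -> copy.put(key, value.get()));
    return Collections.unmodifiableMap(copy);
  }
}
```

Because the live counters sit in a ConcurrentHashMap, a context obtained on one thread can be cached and read from another, while snapshot() hands back a copy that is unaffected by later increments or a reset().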

The goal is to let applications with multiple threads
performing different work items simultaneously
collect statistics on the individual threads,
and so generate aggregate reports on the total work performed
for a specific job, query or similar unit of work.
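A job-level report of this kind can be pictured as a merge of per-thread snapshots. The sketch below assumes each snapshot is a simple map from statistic name to count; JobStatsReport and aggregate() are hypothetical names, not part of hadoop-common.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that merges per-thread statistics snapshots into one job report. */
final class JobStatsReport {

  private JobStatsReport() {
  }

  /** Sums each counter across all snapshots; a key absent from a snapshot contributes zero. */
  static Map<String, Long> aggregate(List<Map<String, Long>> perThreadSnapshots) {
    Map<String, Long> total = new HashMap<>();
    for (Map<String, Long> snapshot : perThreadSnapshots) {
      snapshot.forEach((key, value) -> total.merge(key, value, Long::sum));
    }
    return total;
  }
}
```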

Some changes to IOStatistics-gathering classes are needed for
this feature:

  • Caching the active context's aggregator in the object's
    constructor.
  • Updating that aggregator with the object's statistics in close().
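The constructor/close() pattern can be sketched as follows. ThreadAggregator and CountingStream are hypothetical stand-ins for the active context's aggregator and an instrumented stream such as the S3A input stream; the sketch assumes the stream's statistics only need to be published once, when it is closed.

```java
import java.io.Closeable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the active context's statistics aggregator. */
final class ThreadAggregator {
  private static final ThreadLocal<ThreadAggregator> ACTIVE =
      ThreadLocal.withInitial(ThreadAggregator::new);

  private final Map<String, AtomicLong> totals = new ConcurrentHashMap<>();

  /** Aggregator of the context active on the calling thread. */
  static ThreadAggregator active() {
    return ACTIVE.get();
  }

  void add(String key, long value) {
    totals.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(value);
  }

  long get(String key) {
    AtomicLong value = totals.get(key);
    return value == null ? 0L : value.get();
  }
}

/** Hypothetical instrumented stream following the constructor/close() pattern. */
final class CountingStream implements Closeable {
  private final ThreadAggregator aggregator;
  private long bytesRead;   // per-stream statistic, a plain field updated on every read
  private boolean closed;

  CountingStream() {
    // Cache the aggregator of the context active on the opening thread.
    this.aggregator = ThreadAggregator.active();
  }

  void read(int bytes) {
    bytesRead += bytes;
  }

  @Override
  public void close() {
    if (closed) {
      return;   // a second close() must not count the bytes twice
    }
    closed = true;
    // Publish this stream's statistics into the cached aggregator.
    aggregator.add("bytes_read", bytesRead);
  }
}
```

Capturing the aggregator in the constructor matters because close() may run on a different thread from the one that opened the stream; looking up the active context at that point could attribute the statistics to the wrong thread.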

Slightly more work is needed in multithreaded code,
such as the S3A committers, which collect statistics across
all threads used in task and job commit operations.

Currently the IOStatisticsContext-aware classes are:

  • The S3A input stream, output stream and list iterators.
  • RawLocalFileSystem's input and output streams.
  • The S3A committers.
  • The TaskPool class in hadoop-common, which propagates
    the active context into scheduled worker threads.
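Propagating the active context into worker threads, as TaskPool is described as doing, can be sketched with a Runnable wrapper. WorkContext, PropagatingPool and their methods are hypothetical; the real TaskPool API is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-thread context holding a single counter. */
final class WorkContext {
  private static final ThreadLocal<WorkContext> CURRENT =
      ThreadLocal.withInitial(WorkContext::new);

  final AtomicLong operations = new AtomicLong();

  static WorkContext current() {
    return CURRENT.get();
  }

  static void setCurrent(WorkContext ctx) {
    CURRENT.set(ctx);
  }
}

/** Hypothetical pool helper that runs tasks under the submitting thread's context. */
final class PropagatingPool {
  private PropagatingPool() {
  }

  /** Wraps a task so it runs with the submitting thread's context active. */
  static Runnable withCallerContext(Runnable task) {
    final WorkContext caller = WorkContext.current();   // captured on the submitting thread
    return () -> {
      WorkContext previous = WorkContext.current();
      WorkContext.setCurrent(caller);
      try {
        task.run();
      } finally {
        WorkContext.setCurrent(previous);   // pooled threads are reused: restore their own context
      }
    };
  }

  /** Runs every task on a fixed-size pool and waits for all of them to finish. */
  static void runAll(List<Runnable> tasks, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (Runnable task : tasks) {
        futures.add(pool.submit(withCallerContext(task)));
      }
      for (Future<?> future : futures) {
        future.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Restoring the worker's previous context in the finally block keeps a pooled thread from carrying one job's context into an unrelated work item.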

Collection of statistics in the IOStatisticsContext
is disabled process-wide by default until the feature
is considered stable.

To enable the collection, set the option
fs.thread.level.iostatistics.enabled
to "true" in core-site.xml.
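The default-off behaviour of this option can be modelled as below. The sketch is hypothetical: it uses java.util.Properties in place of Hadoop's Configuration class and does not show how Hadoop itself reads the option; it only illustrates that an unset or unrecognised value leaves collection disabled.

```java
import java.util.Properties;

/** Hypothetical gate modelling the default-off semantics of the option. */
final class ThreadStatsGate {
  static final String KEY = "fs.thread.level.iostatistics.enabled";

  private ThreadStatsGate() {
  }

  /** True only when the option is explicitly "true" (case-insensitive, surrounding spaces ignored). */
  static boolean isEnabled(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(KEY, "false").trim());
  }
}
```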

Contributed by Mehakmeet Singh and Steve Loughran


How was this patch tested?

Tested against an S3 endpoint in region ap-south-1.
All tests ran successfully.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue ID (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible with inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@mehakmeet (Contributor, Author)

Backport of #4352 to branch-3.3.
CC: @steveloughran

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 🆗 | reexec | 7m 40s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 6 new or modified test files. |
| | | | | _ branch-3.3 Compile Tests _ |
| +0 🆗 | mvndep | 15m 13s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 28m 48s | | branch-3.3 passed |
| +1 💚 | compile | 21m 19s | | branch-3.3 passed |
| +1 💚 | checkstyle | 3m 46s | | branch-3.3 passed |
| +1 💚 | mvnsite | 3m 35s | | branch-3.3 passed |
| +1 💚 | javadoc | 2m 33s | | branch-3.3 passed |
| +1 💚 | spotbugs | 5m 0s | | branch-3.3 passed |
| +1 💚 | shadedclient | 26m 55s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 31s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 50s | | the patch passed |
| +1 💚 | compile | 17m 25s | | the patch passed |
| +1 💚 | javac | 17m 25s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 3m 13s | | root: The patch generated 0 new + 97 unchanged - 1 fixed = 97 total (was 98) |
| +1 💚 | mvnsite | 3m 35s | | the patch passed |
| +1 💚 | javadoc | 2m 37s | | the patch passed |
| +1 💚 | spotbugs | 5m 4s | | the patch passed |
| +1 💚 | shadedclient | 26m 25s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 18m 16s | | hadoop-common in the patch passed. |
| +1 💚 | unit | 3m 12s | | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 1m 34s | | The patch does not generate ASF License warnings. |
| | | 202m 11s | | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4639/1/artifact/out/Dockerfile |
| GITHUB PR | #4639 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 60b0e0809438 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.3 / 6483021 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4639/1/testReport/ |
| Max. process+thread count | 1674 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4639/1/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@steveloughran merged commit 363f813 into apache:branch-3.3 on Jul 27, 2022.