
don't create new reader in recvMsg #940

Merged
merged 1 commit into grpc:master on Jun 22, 2017

Conversation

@apolcyn (Contributor) commented Oct 21, 2016

Looking at the total memory allocated over the 30-second protobuf streaming benchmark period, we see that 15-16GB is allocated in total.

It seems simple to eliminate the roughly 1GB of that allocated within the (*recvBufferReader).Read method. The functions that currently allocate significant amounts are:

      flat  flat%   sum%        cum   cum%
 4513.90MB 28.62% 28.62%  4681.40MB 29.68%  github.com/golang/protobuf/proto.Marshal
 4419.88MB 28.02% 56.64%  5088.90MB 32.26%  github.com/golang/protobuf/proto.UnmarshalMerge
 1066.05MB  6.76% 63.40%  1066.05MB  6.76%  google.golang.org/grpc/transport.(*recvBufferReader).Read
 1041.55MB  6.60% 70.00%  1041.55MB  6.60%  golang.org/x/net/http2.parseDataFrame
 1017.55MB  6.45% 76.45% 13544.44MB 85.87%  google.golang.org/grpc/benchmark.(*testServer).StreamingCall
  994.05MB  6.30% 82.75%   994.05MB  6.30%  google.golang.org/grpc/transport.(*Stream).write
  674.52MB  4.28% 87.03%   674.52MB  4.28%  google.golang.org/grpc/benchmark.newPayload
  669.02MB  4.24% 91.27%   669.02MB  4.24%  reflect.unsafe_New
  658.52MB  4.17% 95.45%  6815.47MB 43.21%  google.golang.org/grpc/benchmark/grpc_testing.(*benchmarkServiceStreamingCallServer).Recv
     182MB  1.15% 96.60%  5036.90MB 31.93%  google.golang.org/grpc.(*serverStream).SendMsg
     168MB  1.07% 97.67%  1162.05MB  7.37%  google.golang.org/grpc/transport.(*http2Server).handleData
  167.50MB  1.06% 98.73%   167.50MB  1.06%  github.com/golang/protobuf/proto.(*Buffer).enc_struct_message
     167MB  1.06% 99.79%  4848.40MB 30.74%  google.golang.org/grpc.encode
    1.50MB 0.0095% 99.80% 13545.94MB 85.88%  google.golang.org/grpc.(*Server).processStreamingRPC
    1.50MB 0.0095% 99.81%  1068.05MB  6.77%  google.golang.org/grpc.(*parser).recvMsg

But

1066.05MB  6.76% 63.40%  1066.05MB  6.76%  google.golang.org/grpc/transport.(*recvBufferReader).Read

appears to come just from the frequent creation of a new Reader via bytes.NewReader: when the stream carries small messages, a Reader ends up being allocated for every data frame (a rough sketch of the pattern is below).
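For illustration, here's a rough, self-contained sketch of the pattern; this is not the actual grpc-go transport code, and the type and field names are made up:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// recvReaderSketch is a stand-in for the real recvBufferReader: it drains a
// queue of received data frames through its Read method.
type recvReaderSketch struct {
	frames [][]byte      // pending data frames
	last   *bytes.Reader // reader over the most recent frame
}

func (r *recvReaderSketch) Read(p []byte) (int, error) {
	if r.last != nil && r.last.Len() > 0 {
		// Finish the frame we are already reading.
		return r.last.Read(p)
	}
	if len(r.frames) == 0 {
		return 0, io.EOF
	}
	data := r.frames[0]
	r.frames = r.frames[1:]
	// This is the kind of allocation being eliminated: a new Reader per frame.
	r.last = bytes.NewReader(data)
	return r.last.Read(p)
}

func main() {
	r := &recvReaderSketch{frames: [][]byte{[]byte("hello "), []byte("world")}}
	buf := make([]byte, 4)
	for {
		n, err := r.Read(buf)
		if err != nil {
			break
		}
		fmt.Printf("%q\n", buf[:n])
	}
}
```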

After this change, no significant amount is allocated in that function. See the updated profile:

15074.53MB of 15106.60MB total (99.79%)
Dropped 50 nodes (cum <= 75.53MB)
      flat  flat%   sum%        cum   cum%
 4617.42MB 30.57% 30.57%  5298.94MB 35.08%  github.com/golang/protobuf/proto.UnmarshalMerge
 4517.90MB 29.91% 60.47%  4707.40MB 31.16%  github.com/golang/protobuf/proto.Marshal
 1084.05MB  7.18% 67.65% 12862.43MB 85.14%  google.golang.org/grpc/benchmark.(*testServer).StreamingCall
 1028.05MB  6.81% 74.45%  1028.05MB  6.81%  golang.org/x/net/http2.parseDataFrame
 1014.55MB  6.72% 81.17%  1014.55MB  6.72%  google.golang.org/grpc/transport.(*Stream).write
  709.52MB  4.70% 85.87%   709.52MB  4.70%  google.golang.org/grpc/benchmark.newPayload
  706.02MB  4.67% 90.54%  6006.46MB 39.76%  google.golang.org/grpc/benchmark/grpc_testing.(*benchmarkServiceStreamingCallServer).Recv
  681.52MB  4.51% 95.05%   681.52MB  4.51%  reflect.unsafe_New
  189.50MB  1.25% 96.31%   189.50MB  1.25%  github.com/golang/protobuf/proto.(*Buffer).enc_struct_message
     185MB  1.22% 97.53%  5062.40MB 33.51%  google.golang.org/grpc.(*serverStream).SendMsg
     174MB  1.15% 98.68%  1188.55MB  7.87%  google.golang.org/grpc/transport.(*http2Server).handleData
     166MB  1.10% 99.78%  4873.40MB 32.26%  google.golang.org/grpc.encode
       1MB 0.0066% 99.79% 12863.43MB 85.15%  google.golang.org/grpc.(*Server).processStreamingRPC

Benchmarks don't currently show any improvement beyond noise, but I think this should become more significant after #939.

@tamird (Contributor) commented Oct 22, 2016

Note: it is possible to do this with a smaller diff using https://godoc.org/bytes#Buffer.Reset

@apolcyn (Contributor, Author) commented Oct 22, 2016

@tamird thanks, but since the underlying slice passed to bytes.NewReader changes on every call, I'm actually having trouble seeing how that could work without first copying into the reset buffer and then reading back out of it, which would cause an extra copy.

@tamird (Contributor) commented Oct 22, 2016

The issue here is not the copy of the slice, but the allocation of the bytes.Reader - Reset allows you to reuse the reader by replacing its backing slice.
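For concreteness, a minimal sketch of that reuse, using (*bytes.Reader).Reset (the method the thread settles on just below); the setup here is illustrative, not the actual transport code:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

func main() {
	var r bytes.Reader // one Reader, reused for every frame
	buf := make([]byte, 4)

	for _, frame := range [][]byte{[]byte("hello "), []byte("world")} {
		// Reset points the existing Reader at the new backing slice;
		// no new Reader is allocated for this frame.
		r.Reset(frame)
		for {
			n, err := r.Read(buf)
			if n > 0 {
				fmt.Printf("%q\n", buf[:n])
			}
			if err == io.EOF {
				break
			}
		}
	}
}
```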

@apolcyn (Contributor, Author) commented Oct 22, 2016

ah I think I see, it was https://godoc.org/bytes#Reader.Reset rather than https://godoc.org/bytes#Buffer.Reset :), thanks!

@apolcyn (Contributor, Author) commented Oct 22, 2016

On second thought, I'd actually lean towards keeping the current code. It's a minor detail, but the change ends up being about the same size since we need to change the initialization of recvBuffer, and it's another object to allocate per stream. Open to it, though...

@tamird (Contributor) commented Oct 25, 2016

Ah, yeah, that's the one :)


@tamird (Contributor) commented Dec 22, 2016

Why is it another object to allocate? You can make recvBufferReader.last a value rather than a pointer.
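A tiny sketch of that suggestion (illustrative names only): with last held as a value, the Reader lives inside the struct, so there is no separate heap object per stream, and Reset still retargets it per frame.

```go
package main

import (
	"bytes"
	"fmt"
)

// recvBufferReaderSketch holds the reader by value: it is allocated as part
// of the struct itself rather than as a separate *bytes.Reader object.
type recvBufferReaderSketch struct {
	last bytes.Reader
}

func main() {
	var r recvBufferReaderSketch
	buf := make([]byte, 8)
	for _, frame := range [][]byte{[]byte("one"), []byte("two")} {
		r.last.Reset(frame) // reuse the embedded Reader for each frame
		n, _ := r.last.Read(buf)
		fmt.Printf("%q\n", buf[:n])
	}
}
```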

@apolcyn apolcyn force-pushed the remove_buffer_recreation_in_recv branch 2 times, most recently from 9bd2f6b to 5d292f3 on December 23, 2016 at 00:05
@apolcyn (Contributor, Author) commented Dec 23, 2016

@tamird for the extra allocation, I was just thinking about the per-stream overhead, where a new recvBufferReader is created per stream. But as noted, this can probably be avoided some other way if necessary, and I don't have any evidence of it being a potential issue in that case anyway...

The latest updates use Reader.Reset instead of the copy functions; a small detail, but I used an extra nil check instead of changing the constructors (rough sketch below).
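Roughly the shape of that nil-check variant (field and method names here are illustrative, not the exact diff): keep last as a pointer so the constructor stays untouched, allocate the Reader once on first use, and Reset it for every later frame.

```go
package main

import (
	"bytes"
	"fmt"
)

type readerSketch struct {
	last *bytes.Reader // still a pointer, so no constructor changes are needed
}

// setFrame points r.last at the next data frame, allocating the Reader only
// the first time it is needed.
func (r *readerSketch) setFrame(data []byte) {
	if r.last == nil {
		r.last = bytes.NewReader(data) // first frame: allocate once
		return
	}
	r.last.Reset(data) // later frames: reuse the existing Reader
}

func main() {
	r := &readerSketch{}
	buf := make([]byte, 8)
	for _, frame := range [][]byte{[]byte("one"), []byte("two")} {
		r.setFrame(frame)
		n, _ := r.last.Read(buf)
		fmt.Printf("%q\n", buf[:n])
	}
}
```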

@tamird (Contributor) commented Dec 23, 2016 via email

@apolcyn (Contributor, Author) commented Jan 5, 2017

It's looking like the problem on Travis right now is that (*bytes.Reader).Reset is not available on earlier Go versions (1.5.4 and 1.6.3 are failing); unfortunately, I think I may have to stick with the original copy approach used earlier.
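A hedged sketch of what such a copy-based fallback can look like (again with made-up names): keep the unread remainder of the current frame as a plain []byte and serve Read with copy, so no bytes.Reader (and no Reset, which only exists from Go 1.7 on) is needed at all.

```go
package main

import (
	"fmt"
	"io"
)

type copyReaderSketch struct {
	frames [][]byte // pending data frames (stand-in for the recv buffer)
	last   []byte   // unread remainder of the current frame
}

func (r *copyReaderSketch) Read(p []byte) (int, error) {
	if len(r.last) == 0 {
		if len(r.frames) == 0 {
			return 0, io.EOF
		}
		r.last, r.frames = r.frames[0], r.frames[1:]
	}
	n := copy(p, r.last) // no Reader allocation anywhere on this path
	r.last = r.last[n:]
	return n, nil
}

func main() {
	r := &copyReaderSketch{frames: [][]byte{[]byte("hello "), []byte("world")}}
	buf := make([]byte, 4)
	for {
		n, err := r.Read(buf)
		if err != nil {
			break
		}
		fmt.Printf("%q\n", buf[:n])
	}
}
```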

@MakMukhi (Contributor) left a comment

Looks good. Found an interesting blog post while reading up for this PR: https://blog.kowalczyk.info/article/d/Improving-speed-of-pure-Go-SMAZ-compressor-by-26.html

@MakMukhi (Contributor) commented:

Re-running the tests since it's an old change.

@MakMukhi (Contributor) commented:

The tests are failing because the latest travis.yml changes aren't pulled in. Can you rebase, please?

@apolcyn apolcyn force-pushed the remove_buffer_recreation_in_recv branch from cffd5df to df3e202 on June 21, 2017 at 23:05
@apolcyn (Contributor, Author) commented Jun 21, 2017

Just squashed and rebased; hopefully that fixes it.

@MakMukhi MakMukhi merged commit 79fc236 into grpc:master Jun 22, 2017
@menghanl menghanl added the 1.5 and Type: Performance (performance improvements: CPU, network, memory, etc.) labels on Jun 22, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019