
Context lost after running in loop in an interval. GPU memory leak? #572

Closed
CipSoft-Components opened this issue Jan 28, 2020 · 43 comments

@CipSoft-Components

What is wrong?

WebGL context lost. When running a kernel many times in a loop inside an interval, after some time the WebGL context gets lost and nothing works any more.
If you open the Windows Task Manager, in the Performance tab in the GPU section, you can see that the GPU memory rises, stays very high for a while, and then drops when the context is lost.

Where does it happen?

In GPU.js in the browser on Windows, when running in an interval and loop.

How do we replicate the issue?

Here is a jsfiddle to replicate the issue: https://jsfiddle.net/zbo8q3wv/
I used a special argument type in this example, but the issue also appears without any arguments.

How important is this (1-5)?

5

Expected behavior (i.e. solution)

Running smoothly without a GPU memory leak.

Other Comments

I tried to update gpu.js from 2.0.4 to 2.6.5 because of a bug that occurs when loading an image into a kernel and afterwards using other kernels with arguments of the type "Array1D(2)": in 2.0.4, if the arguments have more than 2 entries, the entries come back strangely ordered. I tested that bug against gpu.js 2.6.5 and realised it is fixed, but I found this issue while actually updating my project.
Here is also a jsfiddle with the same code as above, but with gpu.js 2.0.4: https://jsfiddle.net/w0Ljs735/
If you look at the GPU memory here, you will see that it does not rise at all.

@robertleeplummerjr
Member

I believe you are running out of memory, and the context crashes. You need to clean up your previous result by calling texture.delete().
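A minimal sketch of that pattern for an interval like the one in the jsfiddle (the kernel and sizes here are illustrative):

const gpu = new GPU();
const kernel = gpu.createKernel(function() {
  return 1;
}, { output: [100, 100], pipeline: true });

let previous = null;
setInterval(() => {
  const result = kernel();
  if (previous) {
    previous.delete(); // release the old texture before dropping our reference to it
  }
  previous = result;
}, 16);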

@CipSoft-Components
Author

Is this new? In gpu.js 2.0.4 I didn't need to call texture.delete(). When do I have to do it, given that I use the last result as input for the next run? In the old version it looks like a texture is cleaned up once no reference to it remains. Can you adjust gpu.js to automatically clean up textures like in 2.0.4?

@robertleeplummerjr
Member

This is new functionality, here is a version of the jsfiddle that cleans up textures: https://jsfiddle.net/robertleeplummerjr/Lhqrod8f/1/

Can you adjust gpu.js to automatically clean up textures like in 2.0.4?

Currently, no. In your exact use case there is no need to use pipeline; turning it off will revert to the previous behavior.
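For reference, that just means dropping the flag (a sketch; with pipeline off, the kernel returns plain JavaScript arrays, which the garbage collector reclaims like any other value):

const gpu = new GPU();
const kernel = gpu.createKernel(function() {
  return 1;
}, { output: [100, 100], pipeline: false });

const result = kernel(); // a regular array, nothing to delete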

Why was it changed? Textures that were outputs from a single kernel would all be overwritten, and passing a texture back into the kernel it came from required it to be cloned. In short, it was for GPU.js to mimic JavaScript better (you can now just pass the same texture into the kernel from which it came) and to be more performant (textures are automatically recycled internally).

@CipSoft-Components
Author

CipSoft-Components commented Jan 28, 2020

@robertleeplummerjr
Thank you for the information. Unfortunately, in my use case I cannot do without the pipeline setting, because I need the speed. When I turn off pipeline, the results are read back from the GPU, which is too expensive.
Is it possible to use the output textures from pipelining without deleting them?

What do you mean by "be more performant (textures are automatically recycled internally)"? When I don't use pipeline, calling a kernel is very slow, because the texture data is read back from the GPU into JS.

@robertleeplummerjr
Member

Is it possible to use the output textures from pipelining without deleting them?

Yes, you can use them; it's just that when you no longer need a texture, go ahead and delete it.

What do you mean by "be more performant (textures are automatically recycled internally)"?

Here is a use case where the input and output textures are recycled:

const gpu = new GPU();
const onesKernel = gpu.createKernel(function() { return 1; }, { output: [100, 100], pipeline: true });
const addOneKernel = gpu.createKernel(function(input) {
  return input[this.thread.y][this.thread.x] + 1;
}, { output: [100, 100], pipeline: true });

const ones = onesKernel();
const twos = addOneKernel(ones);
// NOTE: here is where things get tricky!
const threes = addOneKernel(twos);

Why is this "tricky"? Because twos is the result from addOneKernel, and now we're sending it in as an argument to addOneKernel. addOneKernel still has a reference to it's output, and understands that what is about to come in as an argument is a reference to the previous output.
Normally the GPU would just throw an error, saying something cryptic that would allude to "you cannot write to the same texture you are reading from", which is correct.
BUT, what GPU.js does, it detects what we want to do, creates a copy on the internal internal GL texture, and prevents there from being an issue. In addition to that, we now want to do the following, since twos is no longer needed:

twos.delete();

This doesn't actually delete the internal GL texture; it removes a reference to it and allows GPU.js to recycle textures. The internal textures won't be deleted until there are no references to them.
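Putting the pieces together, a longer chain under this model looks something like the following (a sketch reusing onesKernel and addOneKernel from above):

let current = onesKernel();
for (let i = 0; i < 1000; i++) {
  const next = addOneKernel(current); // GPU.js clones internally when it detects the reuse
  current.delete();                   // release our reference so the texture can be recycled
  current = next;
}
console.log(current.toArray());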

@CipSoft-Components
Author

It looks like every time a kernel is called a new texture is created, instead of textures being recycled. Is that right?
Can I somehow configure gpu.js or a kernel to get the old behaviour? In the old version, I could recycle the textures by having two instances of the same kernel and switching between them. That way I no longer had the "read from and write to the same texture" problem, and did not have to clean up textures myself.

@CipSoft-Components
Author

@robertleeplummerjr
If I understand you correctly, GPU.js should only make a copy of the texture if it detects that the input texture is the same as the output texture, or did I misunderstand? If I got it right, then in my jsfiddle there should be no memory leak even if I don't delete the texture, because I never use it as input again, right? It looks like no texture is ever recycled.

Normally, I think, a new texture should only be created when a kernel is called for the first time, or when it gets its own output as input; otherwise you have a memory leak if you don't manually delete every texture.
Or there should be a configurable setting to recycle textures, or auto cleanup, or something similar.

@robertleeplummerjr
Member

If I understand you correctly, GPU.js should only make a copy of the texture if it detects that the input texture is the same as the output texture, or did I misunderstand?

It will only copy a texture if it is sent in as an argument. However, if the texture has not been deleted, a new texture is created for the output. Hence the memory leak you are potentially experiencing.

Or there should be a configurable setting to recycle textures, or auto cleanup, or something similar.

We're open to ideas; proposals that could help the community are welcome.

@CipSoft-Components
Author

@robertleeplummerjr
I made a copy of my project and tried changing everything to delete the textures when they are no longer needed. With this I no longer have the memory leak, but sadly the performance is bad (and the overall memory usage is higher). And in Firefox, I don't know why, the CPU and memory usage go haywire. Sadly, because of this performance issue, I have to stay on version 2.0.4.

I would be happy if GPU.js got a configuration setting to enable the old behaviour, because it looks like the new behaviour of always creating a new texture and deleting textures costs more performance than the old one.

@robertleeplummerjr
Member

Let's just come up with a setting name for this behavior, and I'll get it in. Previously it was called 'immutable'; perhaps just resurrecting this property would be ideal.

@CipSoft-Components
Author

I'm not good with names, perhaps:

  • recycleOutput, reuseOutput: the kernel reuses its output texture to write into, to save performance.
  • jsLikeOutput: when reuseOutput is set to true, the kernel creates a new texture when it gets its own output as input, to avoid a WebGL error from reading and writing the same texture at the same time.

I think "jsLikeOutput" is not a good name, but I don't have a better one right now. Still, I think it is best to have both settings. What do you think @robertleeplummerjr ?

@robertleeplummerjr
Member

Sadly, because of this performance issue, I have to stay on version 2.0.4.

Performance should only be affected if cloning is being done excessively. The other thing you can do is use kernel.setDebug(true) and check the console: there will be a log every time a clone happens, fyi.
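For example:

kernel.setDebug(true); // per the above, every texture clone is now logged to the console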

@Steve73

Steve73 commented Mar 11, 2020

I'm experiencing the same performance drawback with the newer versions of gpu.js. The following example is blisteringly fast with version 2.0:

const gpu = new GPU();

const createTexture = gpu.createKernel(function() {
  return 1;
}, { output: [100, 100], pipeline: true});

var changeTextureKernel = function(input) {
  return input[this.thread.y][this.thread.x] + 1;
}
const changeTexture1 = gpu.createKernel(changeTextureKernel, { output: [100, 100], pipeline: true});
const changeTexture2 = gpu.createKernel(changeTextureKernel, { output: [100, 100], pipeline: true});

var t1 = createTexture();
var t2 = changeTexture1(t1);
t1 = changeTexture2(t2); // reassign, not redeclare

var start = performance.now();
for (var i = 0; i < 50000; i++) {
  t2 = changeTexture1(t1);
  t1 = changeTexture2(t2);
}
var end = performance.now();
console.log(end - start);

Using this code with version 2.7 leads to unnecessary texture cloning. Adding t2.delete() and t1.delete() to the loop avoids the cloning, but there's still a performance penalty of around 10% compared to version 2.0.
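That is, roughly (the two textures created before the loop are ignored here for brevity):

for (var i = 0; i < 50000; i++) {
  t2 = changeTexture1(t1);
  t1.delete(); // t1 has been consumed, release it
  t1 = changeTexture2(t2);
  t2.delete(); // likewise for t2
}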

Any idea why that is?

If the old 'immutable' flag would bring back the old behaviour, I would like to vote for resurrecting it. :)

@robertleeplummerjr
Member

robertleeplummerjr commented Mar 11, 2020

Ok, I think we have enough of a need here to go ahead and bring back immutable.

@robertleeplummerjr
Member

FYI, I already have it working locally, just have to fix some broken tests.

#NobodyLeftBehind!

@robertleeplummerjr
Member

Released: https://github.com/gpujs/gpu.js/releases/tag/2.8.0

@robertleeplummerjr
Member

@CipSoft-Components, I've forked and updated your jsfiddle: https://jsfiddle.net/robertleeplummerjr/0pqcsvy8/

Ty for including a detailed example.

@robertleeplummerjr
Member

As a precaution, I went ahead and traced memory as well, and couldn't see anything immediately apparent in the way of memory leaks:
[screenshot: memory trace, Mar 11 2020]

There is a slight memory increase, and I don't yet have a handle on the GPU side, but I'm still digging to ensure this issue is properly put to rest.

@robertleeplummerjr
Member

I profiled in Chrome as well, and couldn't even see the GPU memory allocations. Also, the memory initially used by the kernel shrank and stabilized over about a 5-minute run with the interval set to 60 milliseconds. I believe we are set here. Please let me know if this becomes an issue again.

@Steve73

Steve73 commented Mar 11, 2020

Wow, that was quick! Thank you for the awesome support!!

I just tested the new version 2.8. It works right away with my old code, so v2.8 with default options behaves exactly like v2.0. Great!

However, for some reason v2.0 is still around 10% faster than v2.8 for me. You can test it with my code example above (it works with v2.0 and v2.8 without modifications). Any idea why that is?

@Steve73

Steve73 commented Mar 11, 2020

Ok, I've done some more performance testing with the different versions. Version 2.3.1 is the last version without any performance penalty. Version 2.4, which introduced the new memory pipeline management, shows the 10% performance drawback (this version also needs texture.delete() to be compatible).

@CipSoft-Components
Author

Thank you! Now it works like before, with the benefits of all the changes from the versions in between.

@robertleeplummerjr
Member

@Steve73

Version 2.4, which introduced the new memory pipeline management, shows the 10% performance drawback (this version also needs texture.delete() to be compatible).

I'll see if I can run a profile today or tomorrow to identify the areas where this penalty is introduced, but if you want to help as a contributor, we welcome it.

My guess is here:

We simply need to check kernel.immutable in addition to the existing kernel.pipeline check.
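Hypothetically, something like this (a sketch of the idea, not the actual source):

// only pay for a texture clone when the kernel is both pipelined and immutable
return (this.pipeline && this.immutable) ? this.texture.clone() : this.texture;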

@robertleeplummerjr
Member

@Steve73 since I knew it was doing a bit of work that is completely irrelevant to mutable kernels, I went ahead and added the boolean to the related if statements and released it as a quick fix/patch: https://github.com/gpujs/gpu.js/releases/tag/2.8.1

Can you test performance and see what it is like for us now?

TY again for your hard work!

@Steve73

Steve73 commented Mar 12, 2020

Just tested version 2.8.1. Unfortunately, the problem is still there. Are there any other relevant changes in 2.4 that could be the reason?

@robertleeplummerjr
Member

@Steve73 a once-over on 5738698 doesn't reveal anything glaring; we just need a profiler to tell us. How are you profiling?

@Steve73

Steve73 commented Mar 12, 2020

I'm profiling with Chrome's performance monitor. I looked a bit more into it and found that in v2.3.1 _setupOutputTexture() doesn't get called. In v2.8.1 it is called, and it is quite expensive.

In v2.8.1 the code looks like this:

    if (this.graphical) {
      if (this.pipeline) {
        gl.bindRenderbuffer(gl.RENDERBUFFER, null);
        gl.bindFramebuffer(gl.FRAMEBUFFER, this.framebuffer);
        this._setupOutputTexture();
        gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
        return this.immutable ? this.texture.clone() : this.texture;
      }
      gl.bindRenderbuffer(gl.RENDERBUFFER, null);
      gl.bindFramebuffer(gl.FRAMEBUFFER, null);
      gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
      return;
    }

    gl.bindFramebuffer(gl.FRAMEBUFFER, this.framebuffer);
    this._setupOutputTexture();

    if (this.subKernels !== null) {
      this._setupSubOutputTextures();
      this.drawBuffers();
    }

    gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
  }

In v2.3.1 the same code looked like this:

    if (this.graphical) {
      if (this.pipeline) {
        gl.bindRenderbuffer(gl.RENDERBUFFER, null);
        gl.bindFramebuffer(gl.FRAMEBUFFER, this.framebuffer);
        if (!this.outputTexture || this.immutable) {
          this._setupOutputTexture();
        }
        gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
        return new this.TextureConstructor({
          texture: this.outputTexture,
          size: texSize,
          dimensions: this.threadDim,
          output: this.output,
          context: this.context,
          internalFormat: this.getInternalFormat(),
          textureFormat: this.getTextureFormat(),
        });
      }
      gl.bindRenderbuffer(gl.RENDERBUFFER, null);
      gl.bindFramebuffer(gl.FRAMEBUFFER, null);
      gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
      this.garbageCollect();
      return;
    }

    gl.bindFramebuffer(gl.FRAMEBUFFER, this.framebuffer);
    if (this.immutable) {
      this._setupOutputTexture();
    }

    if (this.subKernels !== null) {
      if (this.immutable) {
        this._setupSubOutputTextures();
      }
      this.drawBuffers();
    }

    gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
    this.garbageCollect();
  }

What I see is that in v2.8.1 the "if (this.immutable) {" check is missing in several places.
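In other words, the fix would presumably be to restore the v2.3.1-style guards in the v2.8.1 path, along these lines (sketch):

gl.bindFramebuffer(gl.FRAMEBUFFER, this.framebuffer);
if (this.immutable) {
  // only allocate a fresh output texture when the kernel is immutable;
  // mutable kernels can keep rendering into the existing one
  this._setupOutputTexture();
}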

@robertleeplummerjr
Member

Fantastic! I will implement it by tomorrow morning if you don't beat me to it.

@robertleeplummerjr
Member

Ok, though the lines you mention are not 1-to-1, I did find that I was still cloning textures, and that has been resolved in https://www.npmjs.com/package/gpu.js/v/2.8.2

Added unit tests:

function testMutableLeak(mode) { /* … */ }

@robertleeplummerjr
Member

Can you retry and let me know your results?

@Steve73

Steve73 commented Mar 20, 2020

Thank you. Unfortunately the problem still persists. Here are my profiling reports of v2.3.1 and v2.8.5. Maybe this helps.

[screenshots: Chrome profiling reports for v2.3.1 and v2.8.5, Mar 20 2020]

@robertleeplummerjr
Member

That helps considerably. I worked on this last night and plan to work on it today as well.

@robertleeplummerjr
Member

I have found the culprit, one I had not seen before, and I believe the net result will be faster than v2.3.1. I should have a fix out later today or tomorrow.

robertleeplummerjr added a commit that referenced this issue Mar 22, 2020
feat: introduce WebGL._replaceOutputTexture and WebGL._replaceSubOutputTextures to cut down on resource usage
feat: All supportable Math.methods added
fix: Safari not able to render texture arguments
feat: CPU gets a pipeline that acts like GPU with/without immutable
@robertleeplummerjr
Member

robertleeplummerjr commented Mar 22, 2020

Script used for measuring:

  const size = 4096;
  const gpu = new GPU({ mode: 'gpu' });
  const kernel = gpu.createKernel(function(v) {
    return v[this.thread.x] + 1;
  }, {
    output: [size],
    pipeline: true,
    immutable: true,
  });
  console.time('run');
  let lastResult = null;
  let result = kernel(new Float32Array(size));
  for (let i = 0; i < 10000; i++) {
    result = kernel(lastResult = result);
    if (lastResult.delete) {
      lastResult.delete();
    }
  }
  console.log(result.toArray ? result.toArray() : result);
  console.timeEnd('run');

v2.3.1 run: 2945.3193359375ms

v2.9.0 run: 165.89599609375ms

You have helped me sincerely make this project faster. I cannot BELIEVE the difference in performance.

@robertleeplummerjr
Member

Nearly 18 times faster!
2945.3193359375 / 165.89599609375 = 17.75401097850456

@robertleeplummerjr
Member

robertleeplummerjr commented Mar 22, 2020

And as well, a test for when immutable is off:
Script used for measuring:

  const size = 4096;
  const gpu = new GPU({ mode: 'gpu' });
  const kernel = gpu.createKernel(function(v) {
    return v[this.thread.x] + 1;
  }, {
    output: [size],
    pipeline: true,
    immutable: false,
  });
  console.time('run');
  let argument = new Float32Array(size);
  let result = kernel(argument);
  for (let i = 0; i < 10000; i++) {
    result = kernel(argument);
  }
  console.log(result.toArray ? result.toArray() : result);
  console.timeEnd('run');

v2.3.0: run: 211.3349609375ms

v2.9.0 run: 203.379150390625ms

Still a bit of an improvement.

@Steve73

Steve73 commented Mar 23, 2020

Thank you & great work! v2.9.0 is now even faster than v2.3.1 in my test case. Congrats!

Unfortunately, my actual application is still somewhat slower with v2.9.0 than with 2.3.1. Also, I discovered some unexpected behavior with the newer versions. I'll look into it...

@Steve73

Steve73 commented Mar 23, 2020

I've pinned down the unexpected behaviour to the following bug.

const gpu = new GPU({ mode: 'gpu' });

const createTexture1 = gpu.createKernel(function() {
  return 1;
}, { output: [2, 2], pipeline: false});

const createTexture2 = gpu.createKernel(function() {
  return 1;
}, { output: [4, 4], pipeline: true});

var t1 = createTexture1();
var t2 = createTexture2();

console.log(t2.toArray());

v2.9.0 shows the following output:

0: Float32Array(4) [1, 1, 0, 0]
1: Float32Array(4) [1, 1, 0, 0]
2: Float32Array(4) [0, 0, 0, 0]
3: Float32Array(4) [0, 0, 0, 0]

This is incorrect; all entries should be 1. The bug seems to occur when using a kernel with a small output size and pipeline disabled, followed by a kernel with a bigger output size and pipeline enabled.

@robertleeplummerjr
Member

Reproduced, working to resolve.

@robertleeplummerjr
Member

I've created #586 for tracking.

@robertleeplummerjr
Member

Fyi fixed.

@Steve73

Steve73 commented Mar 24, 2020

Awesome! Now everything works in v2.9.1. Thank you for your perfect support!
