
Different memory usage behavior with Nim 2.0.0 vs 1.6.x #22510

Closed
guzba opened this issue Aug 19, 2023 · 14 comments

Comments

@guzba
Contributor

guzba commented Aug 19, 2023

Description

I have an HTTP server written in Nim using Mummy and several other libs I wrote (some info here). It is multi-threaded and, on 1.6.x, is built with --mm:orc and --threads:on.

I have been running this server for months now built with Nim 1.6.10, 1.6.12 and now 1.6.14 and have had consistent memory behavior and nothing that appears to be a memory leak.

Yesterday I built the server using Nim 2.0.0 and deployed it into production to see how things went. Unfortunately, I noticed a difference in memory usage behavior right away. The Nim 2.0.0 build showed steadily increasing memory usage: within a handful of minutes it grew to 3x what the Nim 1.6.x build would use after being stable for hours, and it was still rising.

I took the Nim 2.0.0 build offline and then tested with Nim 2.0.0 + -d:useMalloc. In this case, the memory behavior was as expected based on my previous 1.6.x experience, so it seemed to "fix" the behavior.

I wanted to report this. Each build is a clean build using the same deps and code; the only change is the Nim version (and, where noted, adding -d:useMalloc).

The memory usage change appearing only with Nim 2.0.0 and without -d:useMalloc, combined with a long history of 1.6.x working well, suggests the leak is probably not in the server code itself, though being sure of this is not straightforward. Are there any known behavior differences in the new Nim 2.0.0 memory management that could explain this, or are there known or reported issues that I could look into as a possible cause?
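
For reference, here is what those build settings look like as a config.nims sketch. This file is hypothetical and not part of the issue; the actual project configuration isn't shown here.

# config.nims -- hypothetical sketch of the build flags described above.
switch("mm", "orc")      # spelled out for 1.6.x; ORC is already the default in 2.0
switch("threads", "on")  # likewise the default in 2.0
# The workaround discussed in this issue: bypass Nim's built-in allocator
# and use the system malloc instead.
# switch("define", "useMalloc")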

Nim Version

Built using this Docker image: https://hub.docker.com/layers/nimlang/nim/2.0.0-alpine/images/sha256-94cfb2d2d31e23759dfb02b50995b9e24c2cde8cfe3c07298addb3d6b4755457

Current Output

No response

Expected Output

No response

Possible Solution

No response

Additional Information

No response

@Araq
Member

Araq commented Aug 19, 2023

The allocator in 2.0 did change to use a shared heap (see the sketch below); I think this change is only in the 1.9/2.0 line, but I cannot remember for sure.

or are there known or reported issues that I could look into as a possible cause?

I created an "artificial" problem showing a leak, let's see if I can find it again...
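
As an illustration of what the shared heap has to handle, here is a minimal sketch (not taken from this thread) of cross-thread traffic: a seq placed into a Channel by the main thread is received, used, and eventually destroyed on another thread, so allocation and deallocation happen on different threads. Compile with --threads:on (the default in 2.0).

var chan: Channel[seq[int]]
chan.open()

proc consumer() {.thread.} =
  # The payload placed into the channel by the main thread is received here
  # and eventually destroyed on this thread.
  let data = chan.recv()
  echo "received ", data.len, " ints"

var t: Thread[void]
createThread(t, consumer)
chan.send(newSeq[int](10_000))  # allocated on the main thread
joinThread(t)
chan.close()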

@cvanelteren

I want to chime in that I am experiencing a similar issue when moving from 1.6.4 to 2.0. I am having a hard time reducing the bug to a minimal example. The code is part of a larger computational model; it does not show an increased memory footprint when running under 1.6.x, but it does on 2.0.

@Araq
Member

Araq commented Aug 29, 2023

@cvanelteren Does -d:useMalloc solve the problem for you?

@cvanelteren

It does @Araq!

@ghost

ghost commented Oct 26, 2023

@hamidb80 just to be sure, does it also leak with ORC?

@Araq
Member

Araq commented Dec 4, 2023

It turned out my test program was invalid and had no leak. Are there any small examples reproducing the problem?

@cvanelteren

Unfortunately my code is part of a simulator, and I'm not sure exactly where the error originates.

@PhilippMDoerner
Contributor

PhilippMDoerner commented Dec 14, 2023

I just ran into this with channels and (wrongly) opened an issue for it, thinking it was a channel-related problem: #23078

I have since learned it is instead a memory allocation issue, because the example code provided there ate through 16 GB of memory, crashed, and freed it again in under a second. Based on advice from @beef331 and griffith1deadly, I'm linking to it here.

For context:
The example in that issue ping-pongs messages containing seqs 10000 ints long between 2 threads, 300 at a time. The message is copied once on the other thread at one point, because that thread passes it to a proc with the signature proc routeMessage*(msg: BackendMessage; container: ptr Container). That somehow manages to eat through 16 GB of RAM.

This problem does not occur when compiling with -d:useMalloc or when changing routeMessage to proc routeMessage*(msg: sink BackendMessage; container: ptr Container). Either change reduces memory consumption to somewhere below 100 MB.
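
For clarity, here is a small self-contained sketch of the two variants just described, using the types from the test program further down in this thread (compile with --threads:on). It only restates the reported observations, not the underlying cause.

type
  BackendMessage* = object
    field*: seq[int]
  Container = ref object
    chan1: Channel[BackendMessage]
    chan2: Channel[BackendMessage]

# As reported above, the by-value variant triggers the runaway memory usage
# when compiled without -d:useMalloc:
#
#   proc routeMessage*(msg: BackendMessage; container: ptr Container) =
#     discard container[].chan2.trySend(msg)

# ...while marking the parameter `sink` (allowing the message to be moved
# rather than copied) is reported to keep memory consumption below ~100 MB:
proc routeMessage*(msg: sink BackendMessage; container: ptr Container) =
  discard container[].chan2.trySend(msg)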

As a secondary problem, even with sink provided, you'll still see crashes (when running the example in the linked issue) that do not occur under -d:useMalloc.
Namely something like this:

/home/philipp/dev/threadbutler/src/threadButler/channelHub.nim(42) serverLoop
/usr/lib/nim/system/alloc.nim(1052) alloc
/usr/lib/nim/system/alloc.nim(890) rawAlloc
/usr/lib/nim/system/alloc.nim(810) freeDeferredObjects
/usr/lib/nim/system/alloc.nim(767) addToSharedFreeListBigChunks
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

I don't have a solid understanding of this myself, as memory management at this low a level is still beyond me, but apparently the allocator is not reusing memory blocks or something?

@Araq
Member

Araq commented Dec 15, 2023

So we now have a small test program?

import std/[sequtils]

type
  BackendMessage* = object
    field*: seq[int]

type Container = ref object
  chan1: Channel[BackendMessage]
  chan2: Channel[BackendMessage]

proc routeMessage*(msg: BackendMessage; container: ptr Container) =
  discard container[].chan2.trySend(msg)

proc setupChannelReceiver(cont: var Container): Thread[ptr Container] =
  proc recvMsg(container: ptr Container) =
    while true:
      let resp = container[].chan1.tryRecv()
      if resp.dataAvailable:
        routeMessage(resp.msg, container)
    
  createThread(result, recvMsg, cont.addr)

const MESSAGE_COUNT = 100

proc main() =
  var cont = Container()
  cont.chan1.open()
  cont.chan2.open()
  
  let msg: BackendMessage =  BackendMessage(field: (0..500).toSeq())

  let channelReceiverThread = setupChannelReceiver(cont)
  while true:
    echo "New iteration"
    
    var counter = 0
    for _ in 1..MESSAGE_COUNT:
      discard cont.chan1.trySend(msg)
    echo "After sending"
    
    while counter < MESSAGE_COUNT:
      let resp = cont.chan2.tryRecv()
      if resp.dataAvailable:
        counter.inc
    echo "After receiving"

  joinThreads(channelReceiverThread)
  
main()

@PhilippMDoerner
Contributor

PhilippMDoerner commented Dec 15, 2023

Aye. The latter bug (the crash without the insane memory consumption) is pretty flaky in its occurrence, but the memory issue occurs reliably.

And the stack trace you see when crashing due to the memory spike looks identical to the one I get when crashing without the memory spike (at least I believe it's without the spike, since my system monitor doesn't detect memory consumption spikes within its 500 ms polling interval), so maybe that also helps.

Edit: I'll likely need to derive a second example from my own code that triggers the second error more reliably. That other code (not the example above) reproduces the error nicely, but it has a ton of "noise" associated with it that would make debugging harder.

@PhilippMDoerner
Contributor

PhilippMDoerner commented Dec 15, 2023

Does the example I provided suffice to start troubleshooting, at least for the first memory allocation issue?

Just trying to avoid communication errors: deriving a new example from the code where the second issue occurs (the segfault without it eating 15 GB of RAM) will likely take a bit. Cutting down the first example enough was a multiple-hour process, which hopefully will be quicker this time around ^^'

@Araq
Member

Araq commented Dec 17, 2023

Better test program that doesn't misuse the threading API:

import std / [atomics, strutils, sequtils]

type
  BackendMessage* = object
    field*: seq[int]

var
  chan1: Channel[BackendMessage]
  chan2: Channel[BackendMessage]

chan1.open()
chan2.open()

proc routeMessage*(msg: BackendMessage) =
  discard chan2.trySend(msg)

var
  recv: Thread[void]
  stopToken: Atomic[bool]

proc recvMsg() =
  while not stopToken.load(moRelaxed):
    let resp = chan1.tryRecv()
    if resp.dataAvailable:
      routeMessage(resp.msg)
      echo "child consumes ", formatSize getOccupiedMem()

createThread[void](recv, recvMsg)

const MESSAGE_COUNT = 100

proc main() =
  let msg: BackendMessage = BackendMessage(field: (0..500).toSeq())
  for j in 0..10:
    echo "New iteration"

    var counter = 0
    for _ in 1..MESSAGE_COUNT:
      discard chan1.trySend(msg)
    echo "After sending"

    while counter < MESSAGE_COUNT:
      let resp = chan2.tryRecv()
      if resp.dataAvailable:
        counter.inc
    echo "After receiving ", formatSize getOccupiedMem()

  stopToken.store true, moRelaxed
  joinThreads(recv)

main()

Araq added a commit that referenced this issue Dec 19, 2023
@erdman

erdman commented Mar 20, 2024

I believe I'm also running into this bug. While I don't have experience with Nim 1.6, I am experiencing slowly increasing memory usage (where there should be none) while using threads in Nim 2.0.2. While -d:useMalloc does seem to solve the memory problem, the program also runs 25%-50% slower :(

@simonkrauter
Contributor

Related issue: #23361.
I posted a small test program there.

Araq closed this as completed in 69d0b73 on Jun 5, 2024
narimiran pushed a commit that referenced this issue Jun 6, 2024
(cherry picked from commit 69d0b73)
narimiran added a commit that referenced this issue Jun 16, 2024
This reverts commit d6bc869.
narimiran pushed a commit that referenced this issue Aug 13, 2024
(cherry picked from commit 69d0b73)