
Is the state gradient not implemented yet for the CUDA kernel? (hence bptt_truncated_learning still forced to be True?) #102

Open
shouldsee opened this issue Jul 31, 2024 · 1 comment

shouldsee commented Jul 31, 2024

Thanks for sharing this great project!

```python
# Add warning that bptt_truncated_learning is forced to be true
# due to incomplete implementation of CUDA kernel for bptt_learning
#
# @TODO : remove this warning once the CUDA kernel, with state gradient, is implemented
if self.bptt_truncated_learning == False:
    print("====================================================================")
    print("[WARNING]: bptt_truncated_learning is set as true (was configured as false), due to incomplete implementation of CUDA kernel for bptt_learning")
    print("====================================================================")
    self.bptt_truncated_learning = True
```

https://github.com/RWKV/RWKV-infctx-trainer/blob/70d02c4997578a027d110e3acb03a523d3986448/RWKV-v6/src/model.py#L291C1-L300C1

Just to confirm: when doing truncated BPTT here, is this essentially the same gradient estimator as the one used in Transformer-XL, where the carried state is detached so no gradient flows across segment boundaries?
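To make the question concrete, here is a minimal pure-Python sketch (not RWKV code, and the recurrence is a toy scalar one chosen for illustration) of what dropping the state gradient at chunk boundaries does. It computes the gradient of the final state with respect to the recurrence weight in forward mode, once with full BPTT and once with the carried state "detached" every `detach_every` steps: the forward values are identical, but the truncated gradient loses the cross-chunk terms.

```python
def run_grad(w, xs, detach_every=None):
    """Forward-mode gradient of the final state s_T w.r.t. w for the
    toy scalar recurrence s_t = w * s_{t-1} + x_t.

    detach_every=k mimics truncated BPTT: every k steps the carried
    state is treated as a constant, so its gradient is dropped."""
    s, ds = 0.0, 0.0  # state and d(state)/dw
    for t, x in enumerate(xs, start=1):
        if detach_every is not None and t > 1 and (t - 1) % detach_every == 0:
            ds = 0.0  # "detach": stop gradient flowing through the carried state
        # product rule: d(w*s + x)/dw = s + w * ds/dw
        s, ds = w * s + x, s + w * ds
    return s, ds

xs = [1.0, 1.0, 1.0, 1.0]
w = 0.5
s_full, g_full = run_grad(w, xs)                    # full BPTT: g_full = 2.75
s_trunc, g_trunc = run_grad(w, xs, detach_every=2)  # truncated: g_trunc = 2.5
# s_full == s_trunc == 1.875: truncation never changes the forward pass,
# only the gradient estimate.
```

If this reading is right, the forced `bptt_truncated_learning = True` only biases the gradient (like Transformer-XL's stop-gradient on cached states), not the forward computation over the infinite context.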

BlinkDL (Contributor) commented Sep 8, 2024
