
Is the state gradient not implemented yet for the CUDA kernel? (hence bptt_truncated_learning still forced to be True?) #102

Open
shouldsee opened this issue Jul 31, 2024 · 1 comment

shouldsee commented Jul 31, 2024

Thanks for sharing this great project!

```python
# Add warning that bptt_truncated_learning is forced to be true
# due to incomplete implementation of CUDA kernel for bptt_learning
#
# @TODO : remove this warning once the CUDA kernel, with state gradient, is implemented
if self.bptt_truncated_learning == False:
    print("====================================================================")
    print("[WARNING]: bptt_truncated_learning is set as true (was configured as false), due to incomplete implementation of CUDA kernel for bptt_learning")
    print("====================================================================")
    self.bptt_truncated_learning = True
```

https://github.com/RWKV/RWKV-infctx-trainer/blob/70d02c4997578a027d110e3acb03a523d3986448/RWKV-v6/src/model.py#L291C1-L300C1

Just to confirm: when doing truncated BPTT here, is this essentially the same gradient estimator as the one used in Transformer-XL, where the carried state is detached so no gradient flows across segment boundaries?
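To make the question concrete, here is a minimal pure-Python sketch (not RWKV code, and the recurrence is a toy scalar one chosen for illustration) of what dropping the state gradient at chunk boundaries does. It computes the gradient of the final state with respect to the recurrence weight in forward mode, once with full BPTT and once with the carried state "detached" every `detach_every` steps: the forward values are identical, but the truncated gradient loses the cross-chunk terms.

```python
def run_grad(w, xs, detach_every=None):
    """Forward-mode gradient of the final state s_T w.r.t. w for the
    toy scalar recurrence s_t = w * s_{t-1} + x_t.

    detach_every=k mimics truncated BPTT: every k steps the carried
    state is treated as a constant, so its gradient is dropped."""
    s, ds = 0.0, 0.0  # state and d(state)/dw
    for t, x in enumerate(xs, start=1):
        if detach_every is not None and t > 1 and (t - 1) % detach_every == 0:
            ds = 0.0  # "detach": stop gradient flowing through the carried state
        # product rule: d(w*s + x)/dw = s + w * ds/dw
        s, ds = w * s + x, s + w * ds
    return s, ds

xs = [1.0, 1.0, 1.0, 1.0]
w = 0.5
s_full, g_full = run_grad(w, xs)                    # full BPTT: g_full = 2.75
s_trunc, g_trunc = run_grad(w, xs, detach_every=2)  # truncated: g_trunc = 2.5
# s_full == s_trunc == 1.875: truncation never changes the forward pass,
# only the gradient estimate.
```

If this reading is right, the forced `bptt_truncated_learning = True` only biases the gradient (like Transformer-XL's stop-gradient on cached states), not the forward computation over the infinite context.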

BlinkDL (Contributor) commented Sep 8, 2024
