Stars
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
Structured state space sequence models
Fast, collaborative live terminal sharing over the web
Package for extracting and mapping the results of every single tensor operation in a PyTorch model in one line of code.
Proving the missing direction of Abel-Ruffini's Theorem
Tool for data extraction and interacting with Lean programmatically.
morph-labs / llm-verified-with-monte-carlo-tree-search
Forked from namin/llm-verified-with-monte-carlo-tree-search
LLM verified with Monte Carlo Tree Search
Experiments in Mechanistic Interpretability and AI Safety in general
Semantic search for competitive programming problems
Interactively grep source code. Source for http://livegrep.com/
Lean 4 programming language and theorem prover
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Steering Llama 2 with Contrastive Activation Addition
Using sparse coding to find distributed representations used by neural networks.
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
Language model alignment-focused deep learning curriculum
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Embedded Firmware for the CATS Flight Computers