Stars
An open-source RAG-based tool for chatting with your documents.
A QoS-based scheduling system that brings optimal layout and status to workloads such as microservices, web services, big data jobs, AI jobs, etc.
A Cloud Native Batch System (Project under CNCF)
Official implementation for HyenaDNA, a long-range genomic foundation model built with Hyena
Unofficial Python client library for Semantic Scholar APIs.
LLM-based autonomous agent that does comprehensive online research on any given topic
The AI-native open-source embedding database
Biological foundation modeling from molecular to genome scale
A streaming SQL engine, a fast and lightweight alternative to ksqlDB and Apache Flink, 🚀 powered by ClickHouse.
The official repository of "ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory".
Distributed DataFrame for Python designed for the cloud, powered by Rust
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
KubeGene - A turn-key Genome Sequencing workflow management framework
A high-performance, zero-overhead, extensible Python compiler using LLVM
Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Unsupervised text tokenizer for Neural Network-based text generation.
DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA language in the genome
Implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTorch (NeurIPS 2021)
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Official code repository for GATK versions 4 and up
Orchestrate Spark Jobs from Kubeflow Pipelines and poll for the status.
An open-source toolkit for large-scale genomic analysis
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.