Stars
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Tensors and dynamic neural networks in Python with strong GPU acceleration
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
💫 Industrial-strength Natural Language Processing (NLP) in Python
🔍 AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your da…
State-of-the-Art Text Embeddings
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Hydra is a framework for elegantly configuring complex applications
Hackable and optimized Transformers building blocks, supporting a composable construction.
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
A system for quickly generating training data with weak supervision
A data augmentations library for audio, image, text, and video.
A Unified Toolkit for Deep Learning Based Document Image Analysis
A Python implementation of LightFM, a hybrid recommendation algorithm.
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Pampy: The Pattern Matching for Python you always dreamed of.
Models, data loaders and abstractions for language processing, powered by PyTorch
Flexible Python configuration system. The last one you will ever need.
A fast, efficient universal vector embedding utility package.
Code and implementation details for the paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
DeText: A Deep Neural Text Understanding Framework for Ranking and Classification Tasks
A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.