Organizations

@jina-ai

Starred repositories

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Python 217 7 Updated Sep 19, 2024

Official code for infimm-hd

Python 15 Updated Sep 4, 2024
Python 3,266 203 Updated Sep 20, 2024

A fast yet powerful Python Markdown parser with renderers and plugins.

Python 2,550 250 Updated Aug 15, 2024

Things you can do with the token embeddings of an LLM

Python 1,052 30 Updated Sep 20, 2024

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 3,718 282 Updated Sep 19, 2024

This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation. It allows users to compare different chunking methods and i…

Python 96 12 Updated Jul 14, 2024

Data-Driven Evaluation for LLM-Powered Applications

Python 433 27 Updated Sep 2, 2024

Efficient Triton Kernels for LLM Training

Python 2,971 152 Updated Sep 20, 2024

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 2,532 244 Updated Sep 14, 2024

An open-source RAG-based tool for chatting with your documents.

Python 12,083 900 Updated Sep 18, 2024

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 663 31 Updated Sep 19, 2024

Package and scripts used to build a dataset of Wikipedia articles in Markdown.

Jupyter Notebook 17 1 Updated Sep 11, 2023

highlight.io: The open source, full-stack monitoring platform. Error monitoring, session replay, logging, distributed tracing, and more.

TypeScript 7,466 350 Updated Sep 20, 2024

Distributed LLM and StableDiffusion inference for mobile, desktop and server.

Rust 2,470 131 Updated Aug 30, 2024

OCR, layout analysis, reading order, line detection in 90+ languages

Python 9,871 644 Updated Sep 20, 2024

Convert HTML to Markdown

Python 1,015 135 Updated Jul 14, 2024

A tool to get summaries and get past paywalls

TypeScript 345 22 Updated Jun 11, 2024

[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

Python 105 8 Updated Sep 6, 2024
Go 159 9 Updated Sep 20, 2024

High-quality datasets, tools, and concepts for LLM fine-tuning.

1,682 157 Updated Aug 18, 2024

It's a cooler way to store simple linear models.

Python 28 1 Updated Jul 15, 2024

the AI-native open-source embedding database

Rust 14,619 1,219 Updated Sep 20, 2024

Code for the paper "Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation"

Python 3 Updated Jul 6, 2023

The code of paper "Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation" published at NeurIPS 2022

Python 39 6 Updated Oct 9, 2022

MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.

Python 11 5 Updated Aug 2, 2024

Official Implementation of EAGLE-1 and EAGLE-2

Python 756 74 Updated Aug 28, 2024

A large-scale language model for the scientific domain, trained on the RedPajama arXiv split

Python 120 14 Updated Mar 1, 2024

Multimodal language model benchmark, featuring challenging examples

Python 144 6 Updated Aug 13, 2024