AIME API Server

AIME API Server - The Scalable AI Model Inference API Server

With AIME API, you can deploy deep learning models (PyTorch, TensorFlow) through a job queue as scalable API endpoints capable of serving millions of model inference requests.

Turn a console Python script into a secure and robust web API that acts as your interface to the mobile, browser and desktop world.

Features

Fast - asynchronous and multi-process API server
Scalable & Robust - distributed, cluster-ready architecture
Secure - type-safe interface and input validation
Aggregates API requests into GPU batch jobs for maximum throughput
Easy to integrate into existing Python and TensorFlow projects
High-performance image and audio input/output conversion for common web formats
Pythonic - easily extensible in your favourite programming language
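To make the batching feature above more concrete, here is a minimal sketch of grouping individual requests into fixed-size batches. It is not the AIME API server's implementation; the function name `batch_requests` and the parameter `max_batch_size` are hypothetical and only illustrate the general idea.

```python
from typing import Any, Iterable, Iterator, List


def batch_requests(requests: Iterable[Any], max_batch_size: int) -> Iterator[List[Any]]:
    """Group incoming requests into lists of at most max_batch_size items.

    Hypothetical helper for illustration; not part of the AIME API.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch: List[Any] = []
    for request in requests:
        batch.append(request)
        if len(batch) == max_batch_size:
            yield batch  # a full batch is ready for one GPU job
            batch = []
    if batch:
        yield batch  # flush the final, partially filled batch


if __name__ == "__main__":
    prompts = [{"text": f"prompt {i}"} for i in range(5)]
    for batch in batch_requests(prompts, max_batch_size=2):
        print(len(batch), batch)
```

A real server would presumably also flush a partial batch after a timeout so that a single request is not kept waiting; that is left out here to keep the sketch short.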

Overview of the AIME API Architecture

The AIME API server solution implements a distributed server architecture with a central API Server communicating through a job queue with a scalable GPU compute cluster. The GPU compute cluster can be heterogeneous and distributed at different locations without requiring an interconnect.

The central part is the API Server, an efficient asynchronous HTTP/HTTPS web server which can be used as a stand-alone web server or integrated into Apache, NGINX or similar web servers. It takes the client requests, load-balances them and distributes them to the API compute workers.
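The job queue pattern described above can be sketched in a few lines. This is an in-process stand-in only: in the AIME API the workers are separate processes that talk to the server over the network, whereas here threads pull from a shared `queue.Queue`. All names (`run_workers`, `process`) are hypothetical.

```python
import queue
import threading
from typing import Any, Callable, Dict, List


def run_workers(jobs: List[Any], num_workers: int, process: Callable[[Any], Any]) -> List[Any]:
    """Distribute jobs over num_workers threads and return results in job order."""
    job_queue: "queue.Queue" = queue.Queue()
    results: Dict[int, Any] = {}
    lock = threading.Lock()

    # The "server" side: enqueue every job together with its id.
    for job_id, payload in enumerate(jobs):
        job_queue.put((job_id, payload))

    def worker() -> None:
        # The "worker" side: keep pulling jobs until the queue is empty.
        while True:
            try:
                job_id, payload = job_queue.get_nowait()
            except queue.Empty:
                return
            output = process(payload)
            with lock:
                results[job_id] = output

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [results[job_id] for job_id in range(len(jobs))]


if __name__ == "__main__":
    print(run_workers(["a", "b", "c"], num_workers=2, process=str.upper))
```

Because workers pull jobs rather than having jobs pushed to them, a faster worker naturally takes more jobs, which is one common way a heterogeneous cluster balances load.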

Compute Workers

The model compute jobs are processed by so-called compute workers, which connect to the API server through a secure HTTPS interface.

You can easily turn your existing PyTorch and TensorFlow scripts into API compute workers by integrating the AIME API Worker Interface.

Clients

Clients such as web browsers, smartphones and desktop apps can easily integrate model inference API calls with the AIME API Client Interfaces.

Example Endpoints

To illustrate the usage and capabilities of AIME API, we currently run the following GenAI (generative AI) demo API services:

Llama3 Instruct Chat

Chat with 'Steve', our Llama 3 based instruct chat-bot.

AIME Demo Server: Llama3 Chat
Your Local Server: Llama3 Chat
Source: https://github.com/aime-labs/llama3_chat

Mixtral 8x7B / 8x22B Instruct Chat

Chat with 'Chloe', our Mixtral 8x7B or 8x22B based instruct chat-bot.

AIME Demo Server: Mixtral Chat
Your Local Server: Mixtral Chat
Source: https://github.com/aime-labs/mixtral_chat

Stable Diffusion XL

Create photorealistic images from text prompts.

AIME Demo Server: Stable Diffusion XL
Your Local Server: Stable Diffusion XL
Source: https://github.com/aime-labs/stable_diffusion_xl

Seamless Communication

Translate between 36 languages in near realtime: Text-to-Text, Speech-to-Text, Text-to-Speech and Speech-to-Speech!

AIME Demo Server: Seamless Communication
Your Local Server: Seamless Communication
Source: https://github.com/aime-labs/seamless_communication

Implementations for the following model endpoints are also available

Stable Diffusion 3

Create photorealistic images with Stable Diffusion 3

Local Endpoint: Stable Diffusion 3
Worker Implementation: https://github.com/aime-labs/stable_diffusion_3

Tortoise TTS

Tortoise TTS: high quality Text-To-Speech Demo

Local Endpoint: Tortoise TTS
Worker Implementation: https://github.com/aime-labs/tortoise-tts

Llama2 Chat

Chat with 'Dave', the Llama2 based chat-bot.

Local Endpoint: Llama2 Chat
Worker Implementation: https://github.com/aime-labs/llama2_chat

How to set up and start the AIME API Server

Set up the environment

We recommend creating a virtual environment for local development. Create and activate a virtual environment, for example one named 'venv', with:

python3 -m venv venv
source ./venv/bin/activate

Download or clone the AIME API server:

git clone --recurse-submodules https://github.com/aime-team/aime-api-server.git

Alternatively, to exclude the Worker Interface and Client Interfaces submodules, which are not needed to run the API server itself, use:

git clone https://github.com/aime-team/aime-api-server.git

Then install the required pip packages:

pip install -r requirements.txt

Optional: install ffmpeg (required for image and audio conversion)

Ubuntu/Debian:

sudo apt install ffmpeg

Starting the server

To start the API server run:

python3 run_api_server.py [-H HOST] [-p PORT] [-c EP_CONFIG] [--dev]
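If you want to launch the server from another Python script, the command line above can be assembled programmatically. The following is a small sketch under the assumption that the flags behave as their names suggest; `build_server_command` is a hypothetical helper, not part of the AIME API, and only the flags shown above are used.

```python
import sys
from typing import List, Optional


def build_server_command(
    host: Optional[str] = None,
    port: Optional[int] = None,
    ep_config: Optional[str] = None,
    dev: bool = False,
) -> List[str]:
    """Build the argument list for starting run_api_server.py.

    Hypothetical helper: only the flags -H, -p, -c and --dev from the
    usage line are emitted, and only when a value is given.
    """
    command = [sys.executable, "run_api_server.py"]
    if host is not None:
        command += ["-H", host]
    if port is not None:
        command += ["-p", str(port)]
    if ep_config is not None:
        command += ["-c", ep_config]
    if dev:
        command.append("--dev")
    return command


if __name__ == "__main__":
    print(build_server_command(host="0.0.0.0", port=7777, dev=True))
```

The resulting list can be passed to `subprocess.run` or `subprocess.Popen` from the repository root.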

The server boots and loads the example endpoint configurations defined in the "/endpoints" directory.

Once started, it is reachable at http://localhost:7777 (or the port given). By default this README.md file is served. The example endpoints are available and accept requests.

The server is now ready to connect corresponding compute workers.
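A quick way to confirm that the server is up is to request its root URL. The sketch below uses only the Python standard library; the helper names `server_url` and `is_server_reachable` are hypothetical, and the default port 7777 is taken from the text above.

```python
import urllib.error
import urllib.request


def server_url(host: str = "localhost", port: int = 7777) -> str:
    """Return the base URL of a locally started API server."""
    return f"http://{host}:{port}"


def is_server_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on url succeeds, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False


if __name__ == "__main__":
    url = server_url()
    print(url, "reachable:", is_server_reachable(url))
```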

Compute Workers

You can easily turn your existing PyTorch and TensorFlow scripts into API compute workers by integrating the AIME API Worker Interface.

The following example worker implementations are available as open source and can easily be adapted to similar use cases:

How to run a Llama3 Chat Worker (Large Language Model Chat)

https://github.com/aime-labs/llama3_chat

How to run a Stable Diffusion Worker (Image Generation)

https://github.com/aime-labs/stable_diffusion_xl

How to run a Seamless Communication Worker (Text2Text, Speech2Text, Text2Speech, Speech2Speech)

https://github.com/aime-labs/seamless_communication

Available Client Interfaces

Javascript

Simple single call example for an AIME API Server request on endpoint Llama 2 with JavaScript:

<script src="/js/model_api.js"></script>
<script>
function onResultCallback(data) {
	console.log(data.text); // print generated text to console
}

// declare params explicitly instead of creating an implicit global
const params = {
	text: 'Your text prompt'
};

doAPIRequest('llama2_chat', params, onResultCallback, 'user_name', 'user_key');
</script>

Python

Simple synchronous single call example for an AIME API Server request on endpoint Llama 2 with Python:

from aime_api_client_interface import do_api_request

params = {'text': 'Your text prompt'}

result = do_api_request('https://api.aime.info', 'llama2_chat', params, 'user_name', 'user_key')
print(result.get('text')) # print generated text to console
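Since the call above is synchronous, sending several prompts one after another waits for each result in turn. As a sketch, the requests can be issued from a small thread pool. The helper `request_many` is hypothetical, and the request function is passed in as a parameter so the sketch does not depend on the client library; whether `do_api_request` is safe to call from multiple threads is an assumption you should verify.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def request_many(
    prompts: List[str],
    request_fn: Callable[[Dict[str, Any]], Dict[str, Any]],
    max_parallel: int = 4,
) -> List[Dict[str, Any]]:
    """Send one request per prompt in parallel; results keep the prompt order.

    request_fn takes a params dict and returns the result dict, e.g.
    lambda params: do_api_request('https://api.aime.info', 'llama2_chat',
                                  params, 'user_name', 'user_key')
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda prompt: request_fn({'text': prompt}), prompts))


if __name__ == "__main__":
    # Stand-in request function so the sketch runs without a server.
    def fake_request(params: Dict[str, Any]) -> Dict[str, Any]:
        return {'text': params['text'].upper()}

    for result in request_many(['hello', 'world'], fake_request):
        print(result.get('text'))
```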

More to come...

We are currently working on sample interfaces for iOS, Android, Java, PHP, Ruby and C/C++.

Documentation

For more information about the AIME API, read our blog article about AIME API.

The AIME API is free of charge for AIME customers. Details can be found in the LICENSE file. We look forward to hearing from you regarding collaboration or licensing on other devices: [email protected].

Or consult the AIME API documentation.
