Skip to content

Commit

Permalink
Docs: Update readme with quickstart section and new links to guides (a…
Browse files Browse the repository at this point in the history
…rgilla-io#2333)

# Description

- [x] Documentation update

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  • Loading branch information
dvsrepo and pre-commit-ci[bot] committed Feb 14, 2023
1 parent 5f0627c commit 91a77ad
Showing 1 changed file with 11 additions and 70 deletions.
81 changes: 11 additions & 70 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@

### Advanced NLP labeling

- Programmatic labeling using [weak supervision](https://docs.argilla.io/en/latest/guides/techniques/weak_supervision.html). Built-in label models (Snorkel, Flyingsquid)
- Programmatic labeling using [rules and weak supervision](https://docs.argilla.io/en/latest/guides/programmatic_labeling_with_rules.html). Built-in label models (Snorkel, Flyingsquid)
- [Bulk-labeling](https://docs.argilla.io/en/latest/reference/webapp/features.html#bulk-annotate) and [search-driven annotation](https://docs.argilla.io/en/latest/guides/features/queries.html)
- Iterate on training data with any [pre-trained model](https://docs.argilla.io/en/latest/tutorials/libraries/huggingface.html) or [library](https://docs.argilla.io/en/latest/tutorials/libraries/libraries.html)
- Efficiently review and refine annotations in the UI and with Python
Expand All @@ -71,93 +71,34 @@
### Monitoring

- Close the gap between production data and data collection activities
- [Auto-monitoring](https://docs.argilla.io/en/latest/guides/steps/3_deploying.html) for [major NLP libraries and pipelines](https://docs.argilla.io/en/latest/tutorials/libraries/libraries.html) (spaCy, Hugging Face, FlairNLP)
- [Auto-monitoring](https://docs.argilla.io/en/latest/guides/log_load_and_prepare_data.html) for [major NLP libraries and pipelines](https://docs.argilla.io/en/latest/tutorials/libraries/libraries.html) (spaCy, Hugging Face, FlairNLP)
- [ASGI middleware](https://docs.argilla.io/en/latest/tutorials/notebooks/deploying-texttokenclassification-fastapi.html) for HTTP endpoints
- Argilla Metrics to understand data and model issues, [like entity consistency for NER models](https://docs.argilla.io/en/latest/guides/steps/4_monitoring.html)
- Argilla Metrics to understand data and model issues, [like entity consistency for NER models](https://docs.argilla.io/en/latest/guides/measure_datasets_with_metrics.html)
- Integrated with Kibana for custom dashboards

### Team workspaces

- Bring different users and roles into the NLP data and model lifecycles
- Organize data collection, review and monitoring into different [workspaces](https://docs.argilla.io/en/latest/getting_started/installation/user_management.html#workspace)
- Organize data collection, review and monitoring into different [workspaces](https://docs.argilla.io/en/latest/getting_started/installation/configurations/user_management.html)
- Manage workspace access for different users

## Quickstart
Argilla is composed of a `Python Server` with Elasticsearch as the database layer, and a `Python Client` to create and manage datasets.
👋 Welcome! If you have just discovered Argilla this is the best place to get started. Argilla is composed of:

To get started you need to **install the client and the server** with `pip`:
```bash

pip install "argilla[server]"

```

Then you need to **run [Elasticsearch (ES)](https://www.elastic.co/elasticsearch)**.

The simplest way is to use`Docker` by running:

```bash
* Argilla Client: a powerful Python library for reading and writing data into Argilla, using all the libraries you love (transformers, spaCy, datasets, and any other).

docker run -d --name elasticsearch-for-argilla --network argilla-net -p 9200:9200 -p 9300:9300 -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:8.5.3

```
> :information_source: **Check [the docs](https://docs.argilla.io/en/latest/getting_started/quickstart.html) for further options and configurations for Elasticsearch.**
Finally you can **launch the server**:

```bash

python -m argilla

```
> :information_source: The most common error message after this step is related to the Elasticsearch instance not running. Make sure your Elasticsearch instance is running on http://localhost:9200/. If you already have an Elasticsearch instance or cluster, you point the server to its URL by using [ENV variables](#)
* Argilla Server and UI: the API and UI for data annotation and curation.

To get started you need to:

🎉 You can now access Argilla UI pointing your browser at http://localhost:6900/.
1. Launch the Argilla Server and UI.

**The default username and password are** `argilla` **and** `1234`.

Your workspace will contain no datasets. So let's use the `datasets` library to create our first datasets!

First, you need to install `datasets`:
```bash

pip install datasets

```

Then go to your Python IDE of choice and run:
```python

import pandas as pd
import argilla as rg
from datasets import load_dataset

# load dataset from the hub
dataset = load_dataset("argilla/gutenberg_spacy-ner", split="train")

# read in dataset, assuming its a dataset for text classification
dataset_rg = rg.read_datasets(dataset, task="TokenClassification")

# log the dataset to the Argilla web app
rg.log(dataset_rg, "gutenberg_spacy-ner")

# load dataset from json
my_dataframe = pd.read_json(
"https://raw.githubusercontent.com/recognai/datasets/main/sst-sentimentclassification.json")

# convert pandas dataframe to DatasetForTextClassification
dataset_rg = rg.DatasetForTextClassification.from_pandas(my_dataframe)

# log the dataset to the Argilla web app
rg.log(dataset_rg, name="sst-sentimentclassification")
```
2. Pick a tutorial and start rocking with Argilla using Jupyter Notebooks, or Google Colab.

This will create two datasets that you can use to do a quick tour of the core features of Argilla.
To get started follow the steps [on the Quickstart docs page](https://docs.argilla.io/en/latest/getting_started/quickstart.html).

> 🚒 **If you find issues, get direct support from the team and other community members on the [Slack Community](https://join.slack.com/t/rubrixworkspace/shared_invite/zt-whigkyjn-a3IUJLD7gDbTZ0rKlvcJ5g)**
For getting started with your own use cases, [go to the docs](https://docs.argilla.io).
## Principles
- **Open**: Argilla is free, open-source, and 100% compatible with major NLP libraries (Hugging Face transformers, spaCy, Stanford Stanza, Flair, etc.). In fact, you can **use and combine your preferred libraries** without implementing any specific interface.

Expand Down

0 comments on commit 91a77ad

Please sign in to comment.