GitHub - exu24/financial-datasets: Financial datasets for LLMs 🧪

Usage

To generate datasets using this library, follow these steps:

Set up your environment: Ensure you have Python installed on your system. This library requires Python 3.6 or later.
Install the dependencies:

 pip install -r requirements.txt

Prepare your API key: The library uses OpenAI's GPT models for generating datasets. You need to have an OpenAI API key. Set your API key as an environment variable:
```
export OPENAI_API_KEY='your_api_key_here'
```
Generate datasets: Use the create_dataset.py script to generate datasets. This script supports generating datasets from PDF documents or directly from SEC filings using a ticker symbol and year.

Available Variables:
- --model_name: The model name to use for generating the dataset. Default is 'gpt-3.5-turbo'.
- --pdf_path: The path to the PDF document you want to generate the dataset from.
- --ticker: The stock ticker symbol for generating a dataset from a 10-K filing.
- --year: The year of the 10-K filing to generate from.
- --max_questions: The maximum number of questions to generate from the document or filing.
Example command to generate a dataset from a PDF document:
```
python create_dataset.py --model_name 'gpt-3.5-turbo' --pdf_path '/path/to/your/document.pdf' --max_questions 100
```
Example command to generate a dataset from a 10-K filing:
```
python create_dataset.py --model_name 'gpt-3.5-turbo' --ticker 'AAPL' --year 2021 --max_questions 100
```
The script will generate a dataset and save it as a JSON file in the data directory.

Note: Replace 'gpt-3.5-turbo' with the model of your choice supported by OpenAI. Adjust the --max_questions parameter based on how many questions you want to generate from the document or filing.

Based on:

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
data		data
financial_datasets		financial_datasets
tests		tests
.DS_Store		.DS_Store
.gitignore		.gitignore
README.md		README.md
create_dataset.py		create_dataset.py
pplx.pdf		pplx.pdf
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Usage

About

Releases

Packages

Languages

exu24/financial-datasets

Folders and files

Latest commit

History

Repository files navigation

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages