Skip to content

A simple CLI tool for extracting common vocabulary from text files

License

Notifications You must be signed in to change notification settings

russellsteadman/vocab

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Vocab Tool

Vocab is a simple CLI utility that takes a text file and finds the most frequent words.

Motivation

I wanted an easy way to extract key vocabulary terms from a story in Spanish so I could learn the most common terms.

Online Version

If you want to try out this tool or use it without installation, try the experimental web version. You can use all of the CLI features in your browser without any information leaving your device.

Installation

Using go

This requires go 1.18+. Replace <version> with the latest release.

go install github.com/russellsteadman/vocab@<version>

If the vocab term is not available in your terminal, make sure your $GO_PATH/bin is in your $PATH.

Via releases

Download the executable for your operating system and architecture from releases. You can then move the binary to a location in the path, or just use it locally.

# For MacOS (darwin)/Linux platforms
chmod +x ./vocab
./vocab --help
:: For Windows platforms
.\vocab --help

Note that these binaries are not signed and will raise "unidentified developer" errors. Code signing may be added in the future with enough usage.

Usage (Command-line)

If you have an .mobi, .epub, or other format, convert the file into a .txt text file. There are many online converters available to do so.

vocab --help
Usage: vocab v0.2.0
Vocab is a tool for generating a vocabulary file from a text file.

Options:
  -h, --help                    Show this help message and exit
  -v, --version                 Show the version number and exit
  -o, --output                  The output file to write to (required)
  -i, --input                   The input file to read from (required)
  -t, --thresh                  The minimum word count to include in the output
  -c, --count                   The maximum number of words to include in the output
  -s, --simple                  Only emit the words, not the counts
  -g, --group                   Group words into # word groups

Examples

Get all words in order of usage:

vocab -i book.txt -o book-vocab.txt

Get top 100 words:

vocab -i book.txt -o book-vocab.txt -c 100

Get words with 10 or more uses:

vocab -i charlie.txt -o charlie-vocab.txt -t 10

Get all words in order of count without additional markup:

vocab -i charlie.txt -o charlie-vocab.txt -s

Get groups of 3 words sorted by usage:

vocab -i charlie.txt -o charlie-vocab.txt -g 3

License

Open source. Released under an MIT License, see the LICENSE file for details.