initial commit
guidocella committed Jan 2, 2021
0 parents commit 6150f8f
Showing 5 changed files with 202 additions and 0 deletions.
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) Guido Cella

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
35 changes: 35 additions & 0 deletions README.md
@@ -0,0 +1,35 @@
Some simple tools for learning Japanese in the terminal.

## sdcv

You can use sdcv as a terminal dictionary. Install it from your package manager and download JMDict-ja-en and Kanjidic2 from http://download.huzheng.org/ja/ to `~/.local/share/stardict/dic`.

Once you reach a high enough level, you will want the monolingual Daijirin dictionary since it has the most information, but this requires some work since it's copyrighted and in an esoteric format:

- Look up rutracker "epwing" on a search engine, and only download the KOKUGO directory (the kenkyuusha and kotowaza dictionaries are not worth it in my experience)
- Download Yomichan Import from https://foosoft.net/projects/yomichan-import/, and use it to convert Daijirin to JSON
- Extract the resulting zip
- From the directory with the extracted JSON files, execute this repository's `convert-daijirin.php`
- Install Stardict's conversion tools (`yay -S stardict-tools-git` on Arch, `apt install stardict-tools` on Debian)
- Execute `stardict-tabfile daijirin.tab` (Arch) / `tabfile daijirin.tab` (Debian)
- Execute `mv daijirin.{dict,idx,ifo} ~/.local/share/stardict/dic`
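
The tab file that `stardict-tabfile` consumes has one entry per line: the headword, a Tab character, then the definition, with newlines written as `\n` and backslashes written as `\\`. The following is a minimal sketch of that escaping in POSIX shell, which can help when checking what a converted entry should look like. `escape_definition` and `tab_entry` are hypothetical helper names, not part of this repository or of Stardict:

```shell
# Escape a definition read from standard input for a Stardict tab file:
# double every backslash, then join the lines with a literal "\n".
escape_definition() {
    sed 's/\\/\\\\/g' |
        awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 } END { print "" }'
}

# Print one tab-file entry: headword, Tab, escaped definition.
tab_entry() {
    printf '%s\t%s\n' "$1" "$(printf '%s' "$2" | escape_definition)"
}

# A two-line definition containing a backslash.
tab_entry 'b' '4\5
6'
```

The call at the end prints `b`, a Tab, then `4\\5\n6` on a single line, which is the second entry of the example in Stardict's documentation.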

## Kanji lookup by selecting radicals

This consists of a zsh script that parses the [RADKFILE](http://www.edrdg.org/krad/kradinf.html), and a shell function that lets you pick a kanji with fzf and looks it up in sdcv.

- Download the RADKFILE: `curl ftp://ftp.monash.edu/pub/nihongo/radkfile.gz | gunzip | iconv -f EUC-JP -t UTF-8 -o ${XDG_DATA_HOME:-~/.local/share}/radkfile`
- Copy `radicals.zsh` to `~/.local/lib`
- Define this function in `zshrc`/`bashrc`: `radicals() { ~/.local/lib/radicals.zsh $(awk '/^\$/ {print NR,$2,$3 }' ${XDG_DATA_HOME:-~/.local/share}/radkfile | fzf -m --with-nth=2,3 --bind=ctrl-l:jump --preview='~/.local/lib/radicals.zsh {+1}' | cut -d ' ' -f 1) | fzf --bind=ctrl-l:jump-accept | sdcv; }`

To use it, select each radical of the kanji you want to look up by pressing Tab. You can filter the list by typing the stroke count, and you can move down with Ctrl+j or jump to an entry by showing labels with Ctrl+l (mnemonic: label). As you select radicals, the preview window shows the kanji that contain all of them. Once you see your kanji, press Enter, select it in the new fzf instance, and sdcv will show its definition.
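
The two building blocks of that function can be tried on their own. This is a minimal sketch, assuming the RADKFILE layout the commands above rely on: a header line of the form `$ RADICAL STROKES`, followed by lines of kanji that contain that radical. `radical_index` and `kanji_block` are hypothetical names used only for illustration:

```shell
# List every radical header as "LINE RADICAL STROKES"; this is what fzf is fed.
radical_index() {
    awk '/^\$/ { print NR, $2, $3 }' "$1"
}

# Print the kanji lines that follow the header on line $2, up to the next header.
kanji_block() {
    awk -v start="$2" 'NR > start && /^\$/ { exit } NR > start' "$1"
}
```

For example, `radical_index "${XDG_DATA_HOME:-$HOME/.local/share}/radkfile" | head` lists the first few headers, and passing one of the printed line numbers to `kanji_block` prints the kanji for that radical. `radicals.zsh` does the equivalent of `kanji_block` for each selected line number and keeps only the kanji common to all of them.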

## Minimalistic IME

A lightweight alternative to IBus that interacts with Anthy on standard input and output. This converts one word at a time, so it's only viable if you mostly just read Japanese rather than writing it.

- Download Anthy's .tar.gz archive from https://packages.debian.org/sid/anthy (as the CLI binary in the old SourceForge version is broken), extract it, and cd to the extracted directory. The example keybinding below assumes it is in `/opt/anthy`
- Apply `anthy.patch` with, for example, `patch -p1 < ../japanese-cli/anthy.patch`. This removes most printf calls and makes it read from standard input
- Execute `./configure && make`
- Install Rust and execute `CARGO_HOME=~/.local/share/cargo cargo install to-kana` for a program that converts romaji to hiragana
- Add a keybinding that uses [fzfmenu](https://github.com/junegunn/fzf/wiki/Examples#fzf-as-dmenu-replacement) or dmenu with the dynamic options patch. An example for Wayland is: `wtype $(fzfmenu --phony --bind='change:reload(cd /opt/anthy/test; ~/.local/share/cargo/bin/to-kana hira {q} | ./anthy),ctrl-l:jump-accept' < /dev/null)`
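
The `reload` command in that keybinding can be pulled out into a small function, which makes it easier to try the conversion from a plain terminal before wiring it into fzfmenu. This is a sketch under the same assumptions as the example above: the patched binary is at `/opt/anthy/test/anthy` and `to-kana` is installed under `~/.local/share/cargo/bin`. The `candidates` function and the `ANTHY_DIR` and `TO_KANA` variables are hypothetical names, not something either tool defines:

```shell
# Locations used by the README example; override them if yours differ.
ANTHY_DIR=${ANTHY_DIR:-/opt/anthy/test}
TO_KANA=${TO_KANA:-$HOME/.local/share/cargo/bin/to-kana}

# Print Anthy's conversion candidates for a romaji word.
candidates() {
    # An empty query has nothing to convert.
    [ -n "$1" ] || return 0
    # Run in a subshell so the caller's working directory is left alone.
    (cd "$ANTHY_DIR" && "$TO_KANA" hira "$1" | ./anthy)
}
```

With the function defined, `candidates nihongo` should print the same candidates that fzfmenu would list for that query.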
108 changes: 108 additions & 0 deletions anthy.patch
@@ -0,0 +1,108 @@
diff --git a/src-main/context.c b/src-main/context.c
index 5efdba3..9cdbfe3 100644
--- a/src-main/context.c
+++ b/src-main/context.c
@@ -562,6 +562,7 @@ anthy_print_candidate(struct cand_ent *ce)
seg_score = ce->mw->score;
}
anthy_putxstr(&ce->str);
+ return;
printf(":(");
/*if (ce->nr_words == 1) {printf("%d,", ce->elm[0].id); }*/
if (ce->flag & CEF_OCHAIRE) {
@@ -613,14 +614,11 @@ print_segment(struct seg_ent *e)
{
int i;

- anthy_putxstr(&e->str);
- printf("(");
for ( i = 0 ; i < e->nr_cands ; i++) {
anthy_print_candidate(e->cands[i]);
- printf(",");
+ if (i < e->nr_cands - 1)
+ printf("\n");
}
- printf(")");
- printf(":\n");
}

/** コンテキストを表示する */
@@ -637,18 +635,10 @@ anthy_do_print_context(struct anthy_context *ac, int encoding)
return ;
}
/* 各文字を表示する */
- for (i = 0, ce = ac->split_info.ce; i < ac->str.len; i++, ce++) {
- if (ce->seg_border) {
- printf("|");
- }
- anthy_putxchar(*(ce->c));
- }
- printf("\n");
/* 各文節を表示する */
for (i = 0; i < ac->seg_list.nr_segments; i++) {
print_segment(anthy_get_nth_segment(&ac->seg_list, i));
}
- printf("\n");
}

void
diff --git a/test/main.c b/test/main.c
old mode 100755
new mode 100644
index 644c0e2..428b9ef
--- a/test/main.c
+++ b/test/main.c
@@ -27,6 +27,7 @@
#include <anthy/anthy.h>
#include <anthy/convdb.h>
#include <config.h>
+#include <unistd.h>

/* Makefile の $(srcdir) (静的データファイルの基準ディレクトリ) */
#ifndef SRCDIR
@@ -268,7 +269,6 @@ set_string(struct condition *cond, struct res_db *db,
}

if (pr) {
- printf("%d:(%s)\n", in->serial, in->str);
anthy_print_context(ac);
}
anthy_reset_context(ac);
@@ -379,24 +379,12 @@ main(int argc,char **argv)
db = create_db();
read_db(db, expdata);

- printf("./test_anthy --help to print usage.\n");
-
- print_run_env();
-
- fp = fopen(testdata, "r");
- if (!fp) {
- printf("failed to open %s.\n", testdata);
- return 0;
- }
-
ac = init_lib(cond.use_utf8);

- /* ファイルを読んでいくループ */
- while (!read_file(fp, &cur_input)) {
- if (check_cond(&cond, &cur_input)) {
- set_string(&cond, db, &cur_input, ac);
- }
- }
+ char buf[31] = {0}; /* zero-filled so the 30 bytes read below stay NUL-terminated */
+ read(STDIN_FILENO, buf, 30);
+ cur_input.str = buf;
+ set_string(&cond, db, &cur_input, ac);

anthy_release_context(ac);
anthy_quit();
@@ -406,8 +394,5 @@ main(int argc,char **argv)
ask_results(db);
}

- show_stat(db);
- save_db(expdata, db);
-
return 0;
}
29 changes: 29 additions & 0 deletions convert-daijirin.php
@@ -0,0 +1,29 @@
<?php

// From Stardict's documentation:
//
// Here is a example dict.tab file:
// ============
// a 1\n2\n3
// b 4\\5\n6
// c 789
// ============
// It means: write the search word first, then a Tab character, and the definition. If the definition contains new line, just write \n, if contains \ character, just write \\.

$files = glob('term_bank*');
natsort($files);

foreach ($files as $file) {
    echo "$file\n";

    $daijirin = '';

    foreach (json_decode(file_get_contents($file)) as $row) {
        $daijirin .= "{$row[0]}\t".str_replace(['\\', "\n"], ['\\\\', '\n'], $row[5][0])."\n";
        if ($row[1]) {
            $daijirin .= "{$row[1]}\t".str_replace(['\\', "\n"], ['\\\\', '\n'], $row[5][0])."\n";
        }
    }

    file_put_contents('daijirin.tab', $daijirin, FILE_APPEND);
}
9 changes: 9 additions & 0 deletions radicals.zsh
@@ -0,0 +1,9 @@
#!/bin/zsh

for line_number; do
    new_kanji=(${(s..)$(sed -n "$(( $line_number + 1 )),/\\$/p" ${XDG_DATA_HOME:-~/.local/share}/radkfile | head -n -1 | tr -d '\n')})
    [[ $kanji ]] || kanji=($new_kanji)
    kanji=(${kanji:*new_kanji})
done

print -l $kanji
