Commit
Release 1.4.5; Document profiling/annotating
1 parent 24840f1 commit 4c92ab6
Showing 6 changed files with 111 additions and 15 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
<?xml version="1.0" encoding="utf-8"?> | ||
<!DOCTYPE chapter SYSTEM "docbook-dtd-45/docbookx.dtd"> | ||
|
||
<chapter id="profiling"> | ||
<title>Profiling / Code Coverage</title> | ||
|
||
<section id="prof-what-why"> | ||
<title>What and why</title> | ||
<para> | ||
Grammars tend to accumulate rules and conditions over time, as exceptions and corner cases are discovered. These are very rarely removed again: they may still be useful, but nobody knows whether they really are. The profiling tools aim to solve that problem by letting you test a grammar against a large corpus and see exactly which rules and contexts are used, how often they are used (or not), and examples of contexts in which they are used. | ||
</para> | ||
</section> | ||
|
||
<section id="prof-gather"> | ||
<title>Gathering profiling data</title> | ||
<para> | ||
When running a corpus through a grammar, the extra cmdline flag <code>--profile data.sqlite</code> will gather code coverage data, with hits and misses for every rule and condition, into an SQLite 3 database. Each run must use its own database, but the databases can subsequently be merged with <code>cg-merge-annotations output.sqlite input-one.sqlite input-two.sqlite input-three.sqlite ...</code>. | ||
</para> | ||
</section> | ||
|
||
<section id="prof-annotate"> | ||
<title>Annotating</title> | ||
<para> | ||
Use <code>cg-annotate data.sqlite /path/to/output</code> to render the gathered data as HTML. This will create a <code>/path/to/output/index.html</code> file that you can open in a browser, alongside files with hit examples for each rule and context. | ||
</para> | ||
<para> | ||
If the grammar includes other grammars, each grammar is rendered separately. In each rendering, the rules and conditions that matched are clickable and lead to a page where an example context is shown. The example has <code># RULE TARGET BEGIN</code> and <code># RULE TARGET END</code> to mark exactly which cohort triggered the rule/condition. | ||
</para> | ||
</section> | ||
</chapter> |
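
The workflow described in the chapter above (one profiled run per corpus, each with its own database, then a merge, then HTML rendering) can be sketched as a small command planner. This is only an illustration under stated assumptions: the binary name `vislcg3`, its `--grammar` flag, reading the corpus from stdin, the `run-N.sqlite` naming, and all function and file names are not taken from the chapter. Only `--profile`, `cg-merge-annotations`, and `cg-annotate` with their argument order come from the text. The script prints the command lines and does not execute anything.

```python
import shlex
from pathlib import PurePosixPath


def profile_cmd(grammar, database):
    """argv for one profiled run; the corpus is expected on stdin."""
    # Assumption: binary name and --grammar flag. --profile is from the chapter.
    return ["vislcg3", "--grammar", str(grammar), "--profile", str(database)]


def merge_cmd(output, inputs):
    """argv for merging per-run databases: output first, then the inputs."""
    return ["cg-merge-annotations", str(output), *(str(p) for p in inputs)]


def annotate_cmd(database, out_dir):
    """argv for rendering a database as HTML into out_dir."""
    return ["cg-annotate", str(database), str(out_dir)]


def plan(grammar, corpora, work_dir, html_dir):
    """Return (argv, stdin_file) steps: profile each corpus, merge, annotate."""
    work = PurePosixPath(work_dir)
    databases, steps = [], []
    for number, corpus in enumerate(corpora, start=1):
        # Each run gets its own database, as the chapter requires.
        database = work / f"run-{number}.sqlite"
        databases.append(database)
        steps.append((profile_cmd(grammar, database), str(corpus)))
    merged = work / "merged.sqlite"  # hypothetical name for the merged data
    steps.append((merge_cmd(merged, databases), None))
    steps.append((annotate_cmd(merged, html_dir), None))
    return steps


def render(argv, stdin_file):
    """Format one step as a shell command line, quoting where needed."""
    line = shlex.join(argv)
    if stdin_file is not None:
        line += " < " + shlex.quote(stdin_file)
    return line


# Print the planned commands for two hypothetical corpus files.
for argv, stdin_file in plan("grammar.cg3", ["one.txt", "two.txt"], "prof", "prof/html"):
    print(render(argv, stdin_file))
```

Keeping the planning separate from execution makes it easy to review the commands before running them by hand or passing each `argv` to `subprocess.run`.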
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
A new release of CG-3 has been tagged v1.4.5 (binary rev 13897). | ||
|
||
Haven't done one of these rundowns since the last NoDaLiDa workshop in 2019, and didn't quite make it in time for last week's NoDaLiDa. | ||
|
||
Authoritative repository is now on GitHub: https://github.com/GrammarSoft/cg3 | ||
|
||
Notable new features: | ||
- Nested rules keyword With implemented by Daniel Glen Swanson. See https://visl.sdu.dk/cg3/chunked/rules.html#with | ||
- Implemented code coverage / profiling to find annotated examples. See https://visl.sdu.dk/cg3/chunked/profiling.html | ||
|
||
New features: | ||
- Added rule flags NoMapped and NoParent which will cause the rule to skip mapped readings or cohorts with a dependency parent. | ||
- Cmdline flag --dep-absolute will cause dependency to be written with globally unique cohort IDs. | ||
- Added rule flag Ignored that will make Remove hide away readings for the current grammar. See https://visl.sdu.dk/cg3/chunked/rules.html#rule-options-ignored | ||
- RemCohort can take Ignored to hide away whole cohorts. And Ignored WithChild to hide away whole dependency sub-trees. | ||
- Added rule flag LookIgnored and context modifier 'I' to allow rules and contexts to look at ignored readings. | ||
- Added rule type Restore to revive previously deleted/ignored readings. See https://visl.sdu.dk/cg3/chunked/rules.html#restore | ||
- Section headers can now have rule flags, which will then apply to all rules in that section. | ||
- Cmdline flag -B will inhibit and trim whitespace between/after cohorts. | ||
- Cmdline flag -T will delimit based on a regex of non-CG data. Defaults to /(^|\n)<s/. See also https://visl.sdu.dk/cg3/chunked/cgkeywords.html#keyword-text-delimiters | ||
- Environment variables CG3_DEFAULT and CG3_OVERRIDE can set and override CG-3 cmdline parameters. Ditto CG3_CONV_DEFAULT and CG3_CONV_OVERRIDE for cg-conv. | ||
- Added context modifier 't' to look at non-target readings, and 'T' to only look at target readings. See https://visl.sdu.dk/cg3/chunked/contexts.html#test-active | ||
- Added global option addcohort-attach to make all AddCohort rules automatically attach to the nearest neighbour. See https://visl.sdu.dk/cg3/chunked/grammar.html#grammar-options | ||
- cg-sort can now sort by weight (-w), reverse (-r), and keep only the first reading (-1). | ||
- cg-conv can now convert back to FST format with -F. | ||
- List += can append tags to an existing set. | ||
- New directive Undef-Sets to delete sets and allow their redefinition. Mostly used when including a common grammar that you want to make a few exceptions to. | ||
- Implemented window-local stream variables. See https://visl.sdu.dk/cg3/chunked/tags.html#local-variables | ||
- Cmdline flags --nrules and --nrules-v to filter which named rules to include in the parse. | ||
- Tag type line match to match the literal whole reading line. See https://visl.sdu.dk/cg3/chunked/tags.html#line-match | ||
|
||
Changes: | ||
- Jump targets can now be constructed from unification and varstrings. | ||
- Relation queries can now be constructed from varstrings. | ||
- Relations now exist as tags during the run so they can be captured with regex. | ||
- Relation queries themselves can also be captured with regex. | ||
- Binary grammars should now be reproducible. | ||
- Baseforms may now be empty strings. | ||
- SetVariable/RemVariable now allow varstrings for variable names and values. | ||
- Stream variables can now have their values tested by equality and regex. See https://visl.sdu.dk/cg3/chunked/tags.html#global-variables | ||
- Lots of updates and new features to the Emacs mode by Kevin Brubeck Unhammer. | ||
- On Posix platforms, Include paths are now shell-expanded so tilde and environment variables can be used. | ||
- Codebase now requires C++17 | ||
|
||
Fixed Bugs: | ||
- Shorthand @< and @> will now fail if there is no previous/next window to look at. | ||
|
||
Main site is https://visl.sdu.dk/cg3.html | ||
Google Group is https://groups.google.com/group/constraint-grammar | ||
Source is at https://github.com/GrammarSoft/cg3 | ||
OS X binaries are at https://apertium.projectjj.com/osx/ | ||
RHEL/Fedora/CentOS/OpenSUSE packages are at https://apertium.projectjj.com/rpm/howto.txt | ||
Debian/Ubuntu packages are at https://apertium.projectjj.com/apt/howto.txt | ||
|
||
-- Tino Didriksen |