Correct handling of ampersands in bibliography entries #2050

Closed
Omikhleia opened this issue Jun 9, 2024 · 5 comments · Fixed by #2048
Labels: bug, inputters:xml, modules:packages, packages:bibliography

@Omikhleia
Member

For strings such as On some stuff & other things, we currently have to format our bibtex files as follows for use with SILE:

@book{key,
  title = {On some stuff &amp; other things},
...

If we don't XML-escape the &, we get an error...
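
The failure is easy to reproduce outside SILE. The sketch below is not SILE's code: it uses Python's standard XML parser and assumes, for illustration only, that a field value ends up wrapped in a `<sile>` element before being parsed. The helper name `parses_as_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(field_value: str) -> bool:
    """Return True if the value is well-formed XML once wrapped in an element."""
    try:
        ET.fromstring(f"<sile>{field_value}</sile>")
        return True
    except ET.ParseError:
        return False


raw = "On some stuff & other things"
escaped = "On some stuff &amp; other things"

print(parses_as_xml(raw))      # False: a bare & is not well-formed XML
print(parses_as_xml(escaped))  # True
```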

This is due to SILE supporting XML entries in bibliography, which is non-standard... albeit interesting, e.g. if one wants to mark up parts of entries in SIL XML. (The true use case however is not the user inserting markup, it's for the internal logic of rendering titles in italic, etc.)

It's not completely obvious, as:

  • Bib(La)TeX would actually require {On some stuff \& other things} (and the big picture here is that it allows any (La)TeX constructs in field values, but also suffers from its own rules, hence the ampersand escaping I guess...)
  • Most bibtex processors in other tools (citeproc, zotero, web-based solutions, etc.) are just happy with no escaping at all in {On some stuff & other things}

Having to manually edit bibliographies to replace & by &amp; is cumbersome; we'd need to avoid it, or at least have some way to bypass it. (I'd also be interested in supporting Djot/Markdown in bibTeX files, but that's another hornet's nest :p )

@Omikhleia Omikhleia added bug Software bug issue modules:packages Issue relates to core or 3rd party packages labels Jun 9, 2024
@Omikhleia Omikhleia changed the title Correct handing of ampersands in biblography entries Correct handing of ampersands in bibliography entries Jun 9, 2024
@Omikhleia
Member Author

Slightly relates to #1860 (as minimal TeX-like stuff people might expect in a bibTeX file, the \& and ~ are maybe both common enough to be properly handled).

@Omikhleia
Member Author

Slightly relates to #1860 (as minimal TeX-like stuff people might expect in a bibTeX file, the \& and ~ are maybe both common enough to be properly handled).

Along the same line of thinking, - vs. -- in page ranges might need to be checked for consistency (also as argument to \cite)
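
One way such a consistency check could look, as a minimal Python sketch rather than anything SILE does: treat `-` and `--` between two numbers as the same range separator and emit an en dash (which is what `--` produces in TeX). The helper name `normalize_page_range` is hypothetical.

```python
import re

# One or two hyphens between digits, with optional surrounding spaces.
_RANGE_SEPARATOR = re.compile(r"(?<=\d)\s*-{1,2}\s*(?=\d)")


def normalize_page_range(pages: str) -> str:
    """Replace '-' or '--' between numbers with an en dash (U+2013)."""
    return _RANGE_SEPARATOR.sub("\u2013", pages)


for value in ("12-34", "12--34", "12 -- 34", "xii"):
    print(repr(normalize_page_range(value)))
```

Values without a numeric range (such as `xii`) pass through unchanged.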

@alerque
Member

alerque commented Jun 12, 2024

Having to manually edit bibliographies to replace & by &amp; is cumbersome; we'd need to avoid it, or at least have some way to bypass it.

This is definitely not something we should expect to be in the input; we need to apply XML character escaping ourselves.

I'm not familiar with other issues with inputs, but if TeX-isms like \& and ~ are standard, we need to decode those too. If they are common but not standard, maybe we need an optional setting to handle them (or not) when loading bibliographies. Perhaps an argument to the loader or a setting for whether the input is expected to be plain, XML, SIL, TeX, Markdown, or whatever is in order, defaulting to plain or whatever is standard or most common.

@Omikhleia Omikhleia self-assigned this Jun 12, 2024
@Omikhleia
Member Author

Perhaps an argument to the loader or a setting for whether the input is expected to be plain, ... or whatever

Food for thought:

  • A setting is not really adequate (One might have several bib files obtained from various sources)
  • An option on \loadbibliography could work, but it is not very user-friendly (One would have to check the input files carefully and knowingly...)

The crux of the matter is that the bibtex format was designed with TeX in mind, hence it cannot be made completely portable. (The original need to escape & as \&, I'd guess, came from & being a special character in TeX, the alignment tab used in arrays and tables...).

I think that the safest approach (to start with) is to consider by default that the input does not contain any markup. (We are not going to be able to support TeX/LaTeX, or @preamble blocks with TeX-like instructions, anyway).

IMHO, the best course of action is to assume the bib file is self-defined, written in a minimal "portable" subset, i.e. not containing any TeX, XML, SIL or whatever constructs, with the exception of the really common ones (--, \& and ~).

FWIW, the other most "common" input issues (I might comment on them separately at some point) are likely:

  • The dubious construct for knowing how to tune some short forms for names, e.g. author = {{\relax Ch}ristopher Doe} (we might want BibLaTeX's §3.4 extended name syntax before, though it doesn't fix it all)
  • The internal braces for marking bits of text that shouldn't be affected by text transformation such as casing (e.g. title case, first case, etc.) -- but we are not yet there for casing and sorting ;)

@Omikhleia
Member Author

We are not going to be able to support TeX/LaTeX

BTW, for the record, Typst does support some minimal interpretation of TeX-like input.

The problem I see there is that we'll never know what's really minimal...

Omikhleia added a commit to Omikhleia/resilient-types that referenced this issue Jun 15, 2024
Accept `\&` for compatibility with legacy BibTeX, but do not
mandate it to be escaped for compatibility with other engines.
Support unescaped `~` as a non-breaking space for compatibility with
TeX; this is often found in existing bibliography files.
Support `\~` to render a tilde.
XML-escape the input so it can safely be wrapped in a `<sile>`
construct.

Closes sile-typesetter#2050

Closes sile-typesetter#1860 (replaced by this implementation)
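
The rules listed in the commit message can be sketched as follows. This is a Python illustration only, not the actual implementation from the commit; the names `normalize_field` and `wrap_for_sile` are hypothetical. It assumes the field contains no markup, so an existing `&amp;` in the input would be escaped a second time.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

NBSP = "\u00a0"

# Escaped forms come first so that `\~` is never mistaken for a bare `~`.
_TOKEN = re.compile(r"\\&|\\~|~")
_REPLACEMENTS = {"\\&": "&", "\\~": "~", "~": NBSP}


def normalize_field(value: str) -> str:
    """Decode the few accepted TeX-isms, then XML-escape the result."""
    text = _TOKEN.sub(lambda m: _REPLACEMENTS[m.group(0)], value)
    return escape(text)  # escapes &, < and >


def wrap_for_sile(value: str) -> str:
    """Wrap the normalized field in a <sile> element."""
    return f"<sile>{normalize_field(value)}</sile>"


# Escaped and unescaped ampersands produce the same well-formed XML.
for title in (r"On some stuff \& other things", "On some stuff & other things"):
    root = ET.fromstring(wrap_for_sile(title))
    print(root.text)  # On some stuff & other things
```

Doing the substitutions in a single regex pass avoids the ordering problem of chained `replace` calls, where the tilde produced by `\~` could be turned into a non-breaking space by the bare-`~` rule.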
@alerque alerque added this to the v0.15.4 milestone Jun 15, 2024
alerque pushed a commit to Omikhleia/resilient-types that referenced this issue Jun 23, 2024