Missing Data in Higher Taxonomy #32

Open
RogerBurkhalter opened this issue May 2, 2019 · 2 comments
Labels
Taxon used to denote issues related to terms in the DwC taxon class

Comments

@RogerBurkhalter

This issue is also related to Issue 12.

How do we represent, in dwc:higherClassification, data that is “Uncertain” or otherwise not known? The classification is unknown or uncertain not because we couldn’t Google it, but because scientific authorities have not been able to arrive at a consensus, because the group has been shown to be polyphyletic and/or paraphyletic, or because there is insufficient information to tell where the taxon belongs. We may have Family -> Class (missing Order). A simple example: according to Adrain, 2011, the trilobite Irvingella angustilimbata should be listed in dwc:higherClassification as:

Animalia | Arthropoda | Trilobita | Uncertain | Elviniidae | Irvingella | angustilimbata
Or, it could be:
Animalia | Arthropoda | Trilobita | | Elviniidae | Irvingella | angustilimbata

The PBDB lists this genus under Order Ptychopariida (citing Whittington et al. 1997). The PBDB did not pick up on Adrain’s paper because it is a taxonomic revision, published as “Class Trilobita Walch, 1771. In: Zhang, Z.-Q. (Ed.) Animal biodiversity: An outline of higher-level classification and survey of taxonomic richness”, and it does not list occurrences. Recording occurrences, not taxonomy, is the primary function of the PBDB.

Others, particularly in palynology/paleobotany, may have Genus -> Kingdom (missing nearly all higher-level data). The most extreme example I can think of is certain palynomorph genera with conservative morphology, such as Inapertisporites, whose species are attributed to different kingdoms.

Do we just count on “incomplete data” being ignored or rejected because a species may not have an entire classification associated with it? Should a standard vocabulary be used to indicate missing classification elements, e.g. “unknown”, “incertae sedis”, “problematica”? Would a standard vocabulary run the risk of grouping unrelated entities in a search? Would an empty entry?
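
To make the two options concrete, here is a minimal Python sketch of how a consumer might read either form of the string. The rank list and the placeholder set are assumptions made for this illustration only; dwc:higherClassification does not carry rank labels, and no such vocabulary has been standardized in this thread.

```python
# Assumed rank order for the example strings in this issue; the
# dwc:higherClassification value itself carries no rank labels.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "specificEpithet"]

# Hypothetical placeholder vocabulary, taken from the terms floated above.
PLACEHOLDERS = {"uncertain", "unknown", "incertae sedis", "problematica"}


def parse_higher_classification(value, ranks=RANKS):
    """Split a pipe-delimited string; map empty slots and placeholders to None."""
    parts = [part.strip() for part in value.split("|")]
    if len(parts) != len(ranks):
        raise ValueError(f"expected {len(ranks)} elements, got {len(parts)}")
    return {
        rank: None if part == "" or part.lower() in PLACEHOLDERS else part
        for rank, part in zip(ranks, parts)
    }


with_placeholder = parse_higher_classification(
    "Animalia | Arthropoda | Trilobita | Uncertain | Elviniidae | Irvingella | angustilimbata"
)
with_empty = parse_higher_classification(
    "Animalia | Arthropoda | Trilobita | | Elviniidae | Irvingella | angustilimbata"
)
```

Under these assumptions both strings parse to the same record, with the order slot set to None. A consumer that normalizes this way would not treat "Uncertain" as a shared order name, though a search for records with a missing order would still return unrelated taxa together.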

@ekrimmel
Collaborator

We should get clarification from iDigBio/GBIF on this, but in the meantime I will answer to the extent that I understand what happens in situations with uncertain higher taxonomy. Anyone else with better insight, please feel free to add to or correct this!

Whenever you provide an identification at the level of specific epithet, the data in dwc:higherClassification matters less than it does when the identification you provide is above the level of genus. This means that "incomplete" data are not being rejected or ignored. For the Irvingella angustilimbata example, iDigBio will go look for the genus Irvingella and then species angustilimbata in GBIF's taxonomic backbone. If you had provided the value "species" for dwc:taxonRank along with this identification, then iDigBio would know that it was dealing with a specific epithet and weight its taxonomic name resolution algorithm accordingly.

As it turns out, GBIF does know the genus Irvingella, but it knows it as both a trilobite and a plant. The data you provide in taxonomic rank fields, e.g. dwc:class, help iDigBio match your data to an (ideally) appropriate taxon concept in the backbone, in this case probably one of the two trilobite genus-level records for Irvingella.
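
As a toy illustration of why the rank fields help with homonyms, the Python sketch below filters candidate records by the rank values a provider supplies. This is not iDigBio's or GBIF's actual matching algorithm; the candidate list, its ids, and the function name are all hypothetical stand-ins for backbone entries.

```python
# Toy stand-in for backbone entries; NOT real GBIF data or keys.
CANDIDATES = [
    {"id": "toy-1", "genus": "Irvingella", "kingdom": "Animalia", "class": "Trilobita"},
    {"id": "toy-2", "genus": "Irvingella", "kingdom": "Animalia", "class": "Trilobita"},
    # The plant homonym; its class is left unset in this toy data.
    {"id": "toy-3", "genus": "Irvingella", "kingdom": "Plantae", "class": None},
]


def filter_homonyms(genus, supplied, candidates=CANDIDATES):
    """Keep candidates for `genus` that do not contradict any supplied rank value."""
    matches = []
    for cand in candidates:
        if cand["genus"] != genus:
            continue
        # A contradiction needs a value on both sides that differs;
        # a missing value on either side is treated as "no evidence".
        contradicts = any(
            value and cand.get(rank) and cand[rank] != value
            for rank, value in supplied.items()
        )
        if not contradicts:
            matches.append(cand)
    return matches


trilobites = filter_homonyms("Irvingella", {"kingdom": "Animalia", "class": "Trilobita"})
unfiltered = filter_homonyms("Irvingella", {})
```

With kingdom and class supplied, only the two trilobite entries remain; with no rank fields supplied, all three homonyms stay in play, which is the ambiguity described above.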

When we record higher classification uncertainty, either by leaving a rank empty or by including an "uncertain"/"incertae sedis" placeholder, this exists in the raw specimen data we publish to iDigBio/GBIF, and each collection is able to assert its own taxonomic classification preferences here. But if we want higher classification information to show up when users search for and view records on iDigBio/GBIF, then that requires changes at the level of the GBIF taxonomic backbone, not our specimen data and not anything iDigBio is doing when it matches our data to the backbone. For Irvingella angustilimbata, both genus-level records have the same higher taxonomy (Animalia | Arthropoda | Trilobita | Ptychopariida | Elviniidae | Irvingella) and so there is no option for our data to show up with either nothing or an "uncertain"/"incertae sedis" placeholder in the order-level field...

@DimEvil

DimEvil commented Oct 23, 2019

We publish only scientificName and kingdom in our DwC so the GBIF taxonomic backbone can actually make the distinction between plants and animals. We let GBIF decide all the higher taxonomy.
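
A minimal Python sketch of what such a pared-down export could look like, assuming a plain CSV with the two Darwin Core terms as column headers. The helper name and the sample row (borrowed from the trilobite example earlier in this thread) are illustrative, not this commenter's actual data or workflow.

```python
import csv
import io

# Only the two Darwin Core terms mentioned above are published.
FIELDS = ["scientificName", "kingdom"]

# Sample row for illustration, reusing the example taxon from this issue.
records = [{"scientificName": "Irvingella angustilimbata", "kingdom": "Animalia"}]


def to_csv(rows, fields=FIELDS):
    """Serialize rows to CSV text with a header line and LF line endings."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


minimal_csv = to_csv(records)
```

The trade-off of this approach is that the published data carries no local opinion about order or family at all, so the uncertainty question raised at the top of the issue is deferred entirely to the backbone.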

@hollyel hollyel added the Taxon used to denote issues related to terms in the DwC taxon class label Apr 29, 2020
4 participants