BITC / Biodiversity Diagnoses - Taxonomy Checks 2

We have said that we need to clean the taxon data, and we need to look at how to make these data homogeneous at a given point in time, because the data can actually change. Basically, what we do is try to reconcile our data with some known nomenclature or taxonomy. Probably one of the best taxonomies available is Species 2000. It is not the only one, but it collects a lot of taxonomies and has reconciled a lot of names. But it might not have everything. Something like uBio might have more names, or more ancestors of names, but those might not be linked to the real.

For instance, this map here shows the taxon space in that data set. That means every single one here has a name, so there is the first taxon name that was ever entered into the database, in the columns left to right. So the column means that this particular taxon is for a record of an XXXX [okay]. So those are the names that were entered from literature records related to that species. But if we take this list of names and try to find the currently accepted names, which we can do by querying the taxonomic literature, Species 2000 or uBio, we see that things change over time. They might even change group. So this change that you see represents how the taxonomy has changed over time when we compare the data to Species 2000, in this case.

And we might resort to some correction techniques, some of which might be more reliable than others. If we look at how fast we can do that, we see that automatic checking is quite fast. You need your list of names, you submit this list of names to a service, and it checks the names and tells you whether those names are valid or not. That's the easiest way. So it's fast, but it might not be too reliable; it might be free, but it may cost you your time. You could use a local nomenclature, which means that you have your own copy of the taxonomy, or of the valid names, and you can check the names on your list against your copy of valid names.
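As a rough illustration of checking a list against a local copy of valid names, here is a minimal Python sketch. The function names, the small `local_copy` list and the normalization rule (ignoring case and extra spaces) are assumptions made for this example; they are not taken from the talk or from any particular service.

```python
def normalize(name):
    """Collapse whitespace and ignore case so trivial differences do not count."""
    return " ".join(name.split()).lower()


def check_against_local(names, accepted_names):
    """Split submitted names into those found in the local copy and those not found."""
    lookup = {normalize(n) for n in accepted_names}
    found, not_found = [], []
    for name in names:
        if normalize(name) in lookup:
            found.append(name)
        else:
            not_found.append(name)
    return found, not_found


# Hypothetical local copy of accepted names (illustration only).
local_copy = ["Arnica montana", "Prunus avium", "Pinus sylvestris"]
submitted = ["Arnica montana", "Arnica montanus", "prunus  avium"]

found, not_found = check_against_local(submitted, local_copy)
print("found:", found)
print("to review:", not_found)
```

An exact lookup like this only says whether a name is present in the local copy. Anything in `not_found` still has to be reviewed, since it may be a misspelling, a synonym, or a name the local copy simply lacks.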
You might want to order your list alphabetically to see where a name changes. You can detect spelling errors that way. But it is extremely labor intensive; red means bad and green is good. So going by hand is extremely labor intensive, but at least it is probably more reliable when you have something really obnoxious. If I want to check a bird list, I could probably send the list to Town, and I can be confident that Town will get things right. But it will take an enormous amount of work. In the case of a review, it will come at an enormous cost. As opposed to manual name changes, which is something I do: it doesn't cost me much, but it will cost a lot of labor. But if I send it to Town, it doesn't cost much labor, but it will cost a lot of money. Or you might do something called crowdsourcing. Crowdsourcing is having a lot of people checking names. This is something that you can do with birds or plants, but as for reliability, you cannot be sure who has checked the names, so you cannot rely on that as much.

Which tools might we use? This is one of the possible tools, and I suggest that you use it to check the names you have on your list. It will allow you to compare a list of names with its known names. You simply prepare a file, which is basically a list, submit the file, and you will get back those names and the proper names they should have been listed under. So this is an example of the file you will prepare; in this case it's plants, so you all know what it is. So I submit it to the list checker, and what I get are the matches in this list with the species, its accepted name, etc. But there are names which might have been moved, or at least they don't appear in ITIS. For instance, Arnica montanus, which is an alternative Latin name for Arnica montana, is wrong because Arnica is feminine and montana is XXX, so it's spelt wrongly.
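The idea of ordering the list alphabetically so that spelling variants end up next to each other can be sketched in Python as follows. This is only an illustration: the function name, the sample names and the 0.85 similarity threshold are assumptions chosen for the example, and the similarity measure comes from the standard library's `difflib`, not from ITIS or any other checker.

```python
from difflib import SequenceMatcher


def flag_near_duplicates(names, threshold=0.85):
    """Sort the unique names and flag neighbours that are almost identical."""
    ordered = sorted(set(names))
    flagged = []
    for previous, current in zip(ordered, ordered[1:]):
        similarity = SequenceMatcher(None, previous, current).ratio()
        if similarity >= threshold:
            flagged.append((previous, current))
    return flagged


# Sample names for illustration; the threshold above is an arbitrary choice.
names = [
    "Arnica montana",
    "Prunus avium",
    "Arnica montanus",
    "Arnica montana",
    "Pinus sylvestris",
]
suspects = flag_near_duplicates(names)
print(suspects)
```

A flagged pair only means the two spellings are close. Deciding which one is correct, or whether both are valid names, is still a manual step.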
So I can use this list here (I don't have the vertebrate list): I might see whether a species name matches the list or not, and then automatically mark that name as something I have to review.

Another taxon checker you might use for plants is iPlant; you might already know it. It works similarly to ITIS, but it has an added value, which is that if I submit a list of species to be checked, it will eventually check against XXXXXX, such as Tropicos or XXXXX. But then it will also mark things that it doesn't understand, and tell you the level of disagreement, which you might use to check again the validity of those species. You're confident that this is the right name, but then there is anything that goes below (e.g. this one goes down to 94). But it has matched most of it. For example, Arnica montanus has been matched to Arnica montana.

So you might actually download this list and use these numbers, filtering by, say, 95%, and then accept (this is risky) the name in the list as a synonym or as the proper name for what you submitted. In this way you could correct a lot. But still, remember you're making a choice here. You're trusting that there is a magical number here that will save you a lot of work. Which magical number is this? I won't tell you. I am not sure, I don't know. I can trust this one. I will distrust 50 and 48%, because these species have been equated to a genus. But where does this one go? Well, 97, 99, etc. were misspellings, so they were detected. This one is at a lower level; it was also detected. It's up to me to decide where I put the limits. But once that's corrected, you might have a long list checked with the right names in a very short time.

Another possibility here is the low scores (0.4, 0.5, 0.4): you see Prunus has been equated to Pinus, a completely different genus. Because of these errors, you might want to discard all the lower matches. Okay, and those are the species you might need to check one by one.
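Filtering matches by a score cut-off, as described above, might look like the following Python sketch. Everything here is illustrative: the `Match` record, the rows and their scores are invented for the example and are not real output from iPlant or ITIS, and the 0.95 threshold is just the "say 95%" from the talk, not a recommended value.

```python
from dataclasses import dataclass


@dataclass
class Match:
    submitted: str
    matched: str
    score: float  # match score on a 0-1 scale


def split_by_score(matches, threshold):
    """Return (accepted, to_review); nothing is discarded or overwritten."""
    accepted = [m for m in matches if m.score >= threshold]
    to_review = [m for m in matches if m.score < threshold]
    return accepted, to_review


# Invented rows for illustration; the scores are not real service output.
results = [
    Match("Arnica montanus", "Arnica montana", 0.97),
    Match("Arnica montna", "Arnica montana", 0.94),
    Match("Prunus", "Pinus", 0.40),
]

accepted, to_review = split_by_score(results, threshold=0.95)
print("accepted:", [m.submitted for m in accepted])
print("check one by one:", [m.submitted for m in to_review])
```

Keeping the submitted name next to the matched name in every row means the choice of threshold can be revisited later without losing what was originally entered.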
It also accepts queries through the web service, which is simply to write the names that you want to be checked. You will get your results in this file, which you can easily put into a data sheet or whatever, and you will get all the data. So basically this can be done as a single query, which allows a lot of automatic searches that construct the queries and submit a set of queries to the service. Basically, this is the most efficient way to do this. But still, this is a way to automate correcting names and to reduce the spectrum of different names to the set of unique names, which gives you an idea of how many species you have in your data set.

How you do that is up to you, but just remember there are three golden rules in this: 1. Always keep your original data set. 2. Always record whatever changes you make. 3. Always apply rules one and two. You record what you do so that you can always go back, and you never miss or lose the original data.

So what is the workflow for taxonomists? Get a new list of names, which you normally do by pivoting the names: you put your list in Excel and use a pivot table, which will tell you how many cases of each different name there are. Then you try to check this pivot table against the taxonomy. We then go ahead and check the names and remove misspellings, and we should be careful with homonyms and mismatches; whatever doesn't fit in the first part will have to be remodeled.

Another problem that I haven't covered because of time is checking the range. The species might be recorded in a place that is outside its range. This can mean two things: either it's an error, or it's a new paper for you. So you have to check. You might extend the range of a species, but you have to be sure that the species was there. And so that's what I want to say about taxon names.
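The three golden rules and the pivot step of the workflow can be sketched in Python as follows. This is only a sketch under assumptions: the function names, the field names `original_name` and `checked_name`, and the sample records are made up for illustration, and `Counter` stands in for the Excel pivot table mentioned in the talk.

```python
from collections import Counter


def pivot_names(records):
    """Count how many times each distinct name occurs (a stand-in for a pivot table)."""
    return Counter(records)


def apply_corrections(records, corrections):
    """Build corrected rows and a change log without touching the original list."""
    corrected, change_log = [], []
    for row_number, original in enumerate(records):
        checked = corrections.get(original, original)
        # Rule 1: the original name is kept alongside the checked name.
        corrected.append({"original_name": original, "checked_name": checked})
        # Rule 2: every change is recorded.
        if checked != original:
            change_log.append((row_number, original, checked))
    return corrected, change_log


# Sample data for illustration only.
records = ["Arnica montana", "Arnica montanus", "Arnica montana"]
corrections = {"Arnica montanus": "Arnica montana"}

counts = pivot_names(records)
corrected, change_log = apply_corrections(records, corrections)
unique_checked = {row["checked_name"] for row in corrected}

print(counts)
print(change_log)
print(len(unique_checked), "unique name(s) after checking")
```

Because the original list is never modified and each change is logged with its row number, any correction can be traced back or undone later.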

Video Details

Duration: 13 minutes and 45 seconds
Language: English
License: Dotsub - Standard License
Genre: None
Views: 2
Posted by: townpeterson on Jul 26, 2016

This talk was presented in the course on National Biodiversity Diagnoses, an advanced course focused on developing summaries of the state of knowledge of particular taxa for countries and regions. The workshop was held in Entebbe, Uganda, during 12-17 January 2015. The workshop was organized by the Biodiversity Informatics Training Curriculum, with funding from the JRS Biodiversity Foundation.
