
BITC / Biodiversity Inventories - Introduction to the BITC II

Five West African countries collaborating to capture the data from 12 world herbaria. They will capture those data that are relevant to West Africa, process and digitize those data in West Africa, quality control the data, georeference the data, and then return the data to the institutions that hold the specimens under some understanding that those institutions will then make those data available globally, particularly to West Africa. And so that's going to rely on this sort of technique. If you did it by hand without these assists from technology, it would take too long. It would not be feasible. A still harder task is to deal with three-dimensional collections. The big nightmare in the biodiversity world is insects because they're small, collections are huge, and worst of all, a specimen is a pin that goes down through the insect and then has a stack of tags. Sometimes it will be 5 or 10 tags. And each tag has one little bit of information. And they're written in this microscopic script. I could never be an entomologist just because I can't see and write that small. The technology assists get even better. I'm going to show you a series of slides from an initiative within Brazil. I've had the good fortune to be connected with this initiative a bit, but this is the Virtual Herbarium of Flora and Fungi of Brazil. It's a national virtual institute of Brazil. It connects a huge number of collections and so you can very easily do a query. This was for some plant. From each of these collections, we can see this total number of records. There were 81 records in total. And then there were two forms of georeferencing. You can see 60 of the 80 have a latitude-longitude coordinate pair. There's some information about the content and the format. Then there's also the ability to map the data. And, this is really exciting - it's photographs. One of the things we'll be talking about later in the week, officially in a separate course, will be species descriptions. 
One of the biggest constraints on being able to do those descriptions is access to the literature. Here in Cameroon you have the biodiversity right here. You have some access to the literature. Not global. Not ideal. But some access. But in many cases what you need to do is to take your specimen -it might be a bird, it might be a moth, it might be a plant- you need to take your specimen and put it right next to the type specimen. And we'll talk about what types are. But that type specimen, sometimes it's just lost, in which case you can't make that comparison. And sometimes it's sitting in a museum in Britain or in the US or what have you. And in that case, you really end up waiting until you can go and make that comparison or a friend can make that comparison. And that ends up being a huge bottleneck in this process. These images are not a replacement for actually looking at the specimen, but they can make a huge difference. They're very high quality images. In a situation like this, I pulled up a holotype and a paratype. But, using the software tools of the virtual herbarium, you could put up your specimen here next to the holotype. And these are images that are taken with very, very good precision. You can zoom in and look at, at least some of the mesostructure, if not the microstructure of the plant. This is a very exciting time where we start to make museums that used to be in a single building in a single city in a single country, we start to make those global. This is far afield from the topic of this course, but there are other technology assists. There are now software facilities and internet facilities that allow us to georeference localities in an automated or semi-automated way. So, this is in Brazil and we fed in "Brazil, Rio de Janeiro." You all know Rio de Janeiro as a city, which is actually down here. Right here. But, it's also a state. I figured that out because it says 5 km north of Teresópolis. 
And, Teresópolis is a city up in the mountains of Rio de Janeiro state. So feeding that string of text into this facility, the facility came out with three different interpretations. It was pretty easy to figure out that that was not Rio de Janeiro city because then there would be two cities in the string. So if that's the state, then the next question is whether Teresópolis is a city within that state. Answer: 'yes'. Then, 5 km north. That's pretty easy. But, the question can be, 'what is Teresópolis?' Is it a point? It's actually a polygon because Teresópolis is a medium-sized city. So you can come up with a number of different interpretations. That's where we jump between full automation, which is to say just feed all of the localities in Cameroon or all of the localities in Brazil into a facility like this and take the latitude longitude as it comes out. Or, we probably want a supervisory step where somebody who knows the region and maybe even the taxon can step in and look and say, 'I'm guessing it's actually B and not A or C'. And so we can automate or we can semi-automate this process, but it's a lot faster than doing it by hand. Okay? Other technology assists come out of sharing the data. If we have data in three or four herbaria around Cameroon -obviously, those herbarium sheets are wonderful and rich documentation of the plants of Cameroon- but, they become more when we put the data together. They become still more when we put the data together and allow other people around the world -scientists- to access those data and play with the data. That's what's been done in Brazil. This is the SpeciesLink network that is the basis for the virtual herbarium. SpeciesLink does animals as well as plants and fungi. This is the raw data from SpeciesLink; and there are some things that you should see in here. Obviously, that's Brazil; that makes sense. Most of the data where you see darker, warmer colors, that's a lot of data records. 
And, where you see light colors, that's few data records. And, where you see the background, that's zero. You can see most of the data in these Brazilian institutions is in Brazil. That's good. There's some data up into Venezuela, Colombia, Peru, and Argentina. That's to be expected. There's some marine data. And, indeed, there are collections of marine algae and marine plants and marine fishes and things like that. So that's good. So can anybody tell me what this is? Notice that there is not intense sampling, but there is some sampling along the east coast of Africa and then broadly out into the Indian Ocean. Anybody have an idea what that is? [to instructors] Don't answer. [laughter] [to participants] Anybody work with GIS? Okay, I see some hands. You guys have got to start asking questions. When you represent the Eastern Hemisphere, this hemisphere, in GIS, you use a positive longitude. When you represent our side of the world in GIS, you use a negative longitude. If you make a mistake and forget the negative sign before a Brazilian longitude you get this. Here is the prime meridian. Notice that this is the shape of Brazil reflected across the prime meridian. Just like that. Those are places where people in the Western Hemisphere forgot that we are on the negative side of the world. And in fact, you will also sometimes see it reflected like this and like this because we mess up the signs amongst other hemispheres. So, that's one thing that you see from this immediately -that we have some problems with longitude data. You can also see this line and this line. Anybody have an idea what those are? [unidentifiable participant response] [Town] That looks like it goes through London, so I'm guessing it's the prime meridian. And this is the equator. How do you generate that error? If either latitude or longitude, instead of having the correct value, is just put at zero. Sometimes people represent missing data with a zero. 
They should represent missing data with a missing datum, but we frequently get a latitude-longitude pair that might be 32-20, but the 20 gets lost and it becomes a zero. That generates these crosses. They're usually centered right on 0-0. There are software tools that allow us to pick out these errors and think about them. This is still this Brazilian project, SpeciesLink, and this is a diagnostic of one collection. I don't know if you guys have seen this, but back when I was little, you would sometimes see government documents that were redacted. Where they blacked out the names of individuals and things like that. I blacked out the name of the collection just to not make anybody feel bad. This collection serves 235,000 records. And then, all the rest of this is a diagnostic. Of the 235,000 records, 158,000 have no coordinates. Right away, you see this is a collection that could benefit hugely from a georeferencing effort. These 873 records are in the ocean. Maybe that's fine. I help run a bird collection and most of our specimens are on land; some of them are from the water. That's fine. But if you say, 'we're a terrestrial collection,' then those records from the ocean are probably errors. You can look at repeated records. For example, there may be two records with the same catalog number, when catalog numbers are supposed to be unique. Or, there are 1809 records where the entire record is repeated. All of the data are the same. Those are just errors. Then there are outliers in terms of country. There are records where latitude and longitude are the same number, which does happen, but not frequently. For each of these categories of reasons why you should worry, you can get a diagnostic. This is a diagnostic through time of just the number of records. You can see this collection did a big initial digitization push, and then accumulated records at a slower pace through time. But then, these are suspicious genera. Genera that maybe weren't on an authority list. 
You can see they built up 2000 or more records that had suspicious genera, and then somebody went through and cleaned the data up. And then, 10,000 or 11,000 records accumulated that had suspicious genera, and somebody went in and cleaned those up. This is what Mark and I have been doing together for 21 years where you look at the catalog and say, 'you know, we've got a mess in this field. Maybe we should go through and do some cleaning.' And, that cleaning produces this. Or here for duplicate records, they had built up quite a large number of duplicate records, probably by some database duplication error, and then somebody went in and cleaned it up. So, these tools really are a huge assist in the curation of collections. That's all internal housekeeping; that's making data better and better. But it still doesn't solve the problem of access mentioned by Dr. Mafane. You're here in Cameroon. There's unique biodiversity all over the country, but especially right here in your backyard. And all the information is sitting on another continent. So, a next step is to integrate and share those data. SpeciesLink in Brazil integrates 250 collections -I think the number is quite a bit higher now- and 5.2 million records online that correspond to almost 400,000 species. This is what the network looks like. The center is in a little city called Campinas. 'Little' meaning several million people. Each of these points is a node that contributes data to their national network. And their national network contributes data to the global network.
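
The coordinate checks described in the talk (missing coordinates, a zero standing in for a missing value, latitude equal to longitude, a dropped negative sign on a Brazilian longitude, and repeated catalog numbers) can be sketched in a few lines of Python. This is only an illustration, not the SpeciesLink diagnostic itself: the record fields (`catalog_number`, `lat`, `lon`), the function names, and the rough longitude range used for Brazil are all assumptions made for the example.

```python
from collections import Counter

# Assumed, approximate longitude span for Brazilian records (degrees east).
# Hypothetical value for illustration; a real check would use proper boundaries.
BRAZIL_LON_RANGE = (-75.0, -28.0)


def flag_record(rec, lon_range=BRAZIL_LON_RANGE):
    """Return a list of warning labels for one record (a dict)."""
    lat, lon = rec.get("lat"), rec.get("lon")
    if lat is None or lon is None:
        return ["no_coordinates"]

    flags = []
    # A zero often means "missing" was stored as 0; these records form
    # the lines along the equator and the prime meridian.
    if lat == 0 or lon == 0:
        flags.append("zero_coordinate")
    # Identical latitude and longitude can happen, but only rarely.
    if lat == lon:
        flags.append("lat_equals_lon")
    # Outside the expected range, but inside it once the sign is flipped:
    # the "Brazil reflected across the prime meridian" pattern.
    lo, hi = lon_range
    if not (lo <= lon <= hi) and (lo <= -lon <= hi):
        flags.append("possible_longitude_sign_flip")
    return flags


def duplicate_catalog_numbers(records):
    """Return catalog numbers that appear on more than one record."""
    counts = Counter(r["catalog_number"] for r in records)
    return sorted(num for num, n in counts.items() if n > 1)


# Made-up records, for demonstration only.
records = [
    {"catalog_number": "A1", "lat": -22.4, "lon": -42.97},
    {"catalog_number": "A2", "lat": -22.4, "lon": 42.97},
    {"catalog_number": "A3", "lat": 0, "lon": -42.97},
    {"catalog_number": "A3", "lat": None, "lon": None},
    {"catalog_number": "A4", "lat": -30.0, "lon": -30.0},
]

for rec in records:
    print(rec["catalog_number"], flag_record(rec))
print("duplicates:", duplicate_catalog_numbers(records))
```

A flag here only marks a record for a person to look at, in the same spirit as the supervisory step described for georeferencing. A marine collection, for example, may hold perfectly valid records that a land-based check would question.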

Video Details

Duration: 18 minutes and 3 seconds
Language: English
License: Dotsub - Standard License
Posted by: townpeterson on Jun 22, 2016

This talk was presented in the course on Biodiversity Inventories, an advanced course focused on developing complete inventories of species present at sites. The workshop was held in Buea, Cameroon, during 2-5 March 2015. The workshop was organized by the Biodiversity Informatics Training Curriculum, with funding from the JRS Biodiversity Foundation. Instructors included David Blackburn, Rafe Brown, Town Peterson, Mark Robbins, and Moses Sainge.
