
BITC / Biodiversity Diagnoses - Basic Patterns 2

But usually you go on, and there is a set of species accounts that gives detailed information. And really I didn't find that in this set of monographs; it was more of a summary. So I went to another monograph that I know a bit better: this is the birds of the northern Petén. The Petén is the base of the Yucatán Peninsula, in Mexico and Guatemala. So here, this is a very nicely structured monograph with scientific name, authority, common names, etc. And then look at this: a locality and some information; date, numbers; another locality, date and numbers; we don't care about measurements for this purpose. And you can see this is a nice list of biodiversity data, and your only challenge is how to get those data out. So, whereas the specimens were highly distributed and dispersed, the monograph is highly centralized. You've got some technical problems in getting your data out of the monograph, because these are usually old books. Those technical problems can be addressed either by brute force, which is sitting down and capturing the data by hand, or potentially by technological means (scanning, optical character recognition, and searching the raw text for records, e.g. from the Cameroon Mts.). So, the drawback of the monograph is that it ends at some point (1932 for the birds of the Belgian Congo), but you at least have one expert giving you a consistent view of the taxonomy. Many regions don't have a monograph, and many monographs don't provide the data in the format you really want. This is somewhat secondary. So then, as you well know, we can also go online and get the data we want. Essentially the progression, approximately, was that by the 1980s a lot of major collections were capturing their data digitally, but there wasn't a really good way of sharing those data. So people would do things like write letters and ask for a disc with an Excel file on it to be mailed back. And then, into the 1990s, collections were able to put up searchable databases online.
But it was extremely inefficient, because one collection would use one field structure and another collection would use another, so you'd get a date in one format and a date in another format, and you'd have to combine those very different data sets. So a very key step was the development of a common vocabulary called Darwin Core. And that was done in the building next to my office. The first version was done by a bunch of collection experts who were concerned about how to communicate the essence, not every last detail; for example, mammal people want to measure every detail of a specimen (e.g. ear, foot), and that's not really the essence. As Arturo said, you want to know what species, where, and when. And then you can throw in some ancillary data such as number of individuals, sex, age, etc. But really the essence is what, where, and when. And so Darwin Core revolved around developing a relatively small list of fields that could describe the what, where, and when of biodiversity records fairly efficiently. And Darwin Core grew over the years because people working on different systems made their inputs. And then eventually, Darwin Core was turned into an international standard. And that's very nice, because now it's documented and quite formal. And so, building on that, first there were North American efforts (Species Analyst) and South American efforts (speciesLink). Species Analyst turned into the Vertebrate Network (VertNet) in North America, which gave birth to GBIF. And now you have networks all around the world; the Australians were very big also. The biggest, which you've already seen, though not necessarily the most quality-controlled, is GBIF, which has half a billion records, so it's unbelievably massive. You might want to worry about the amount of error, but what really matters is not the amount of error but rather the ratio of signal to noise.
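To make the "what, where, when" essence concrete, here is a minimal sketch of one occurrence expressed with actual Darwin Core term names (scientificName, decimalLatitude, decimalLongitude, eventDate, etc. are real terms from the standard; the record itself is invented for illustration).

```python
# One occurrence record using Darwin Core term names.
# The values are illustrative, not a real museum record.
occurrence = {
    "scientificName": "Ortalis vetula",   # what
    "country": "Guatemala",               # where
    "decimalLatitude": 17.2,
    "decimalLongitude": -89.6,
    "eventDate": "1931-03-14",            # when
    "individualCount": 2,                 # ancillary data
    "sex": "male",
}

def essence(rec):
    """Reduce a full record to its what/where/when core."""
    return (
        rec["scientificName"],
        (rec["decimalLatitude"], rec["decimalLongitude"]),
        rec["eventDate"],
    )
```

Because every provider maps its internal fields onto these shared term names, records from very different collections can be combined without per-collection format juggling.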
And there are certainly ways that GBIF could invest time and resources in reducing the noise, but very generally it's a spectacular resource. So just for fun I did a GBIF query on Uganda: 345 data sets contribute data on Uganda biodiversity, and there were 171,304 records coming from 26 countries. Now, as I said, those data can be frustrating. Worldwide, these are the latitude/longitude coordinates associated with the country code Uganda, so you see we have some problems, regionally and Africa-wide. For some of these we might be able to guess what the problem is; I'm wondering about a north-south problem. But right away you can see the problems, which means you need to work a lot more on data cleaning. Contrast this with working with data from a monograph: someone a hundred years ago dedicated his or her life to developing that monographic synthesis, the birds of, or the reptiles of, some region, and did the data cleaning for you. You will still have to do data cleaning in the sense of making sure the georeferencing is correct, catching typos, etc., but really, that is the value of working with a curated dataset. Now, the value of working with this is that it's a huge amount of data, and you can take advantage of sources that the monographer didn't have access to. So it's all a balance. And then the last source that I want to lay out for you is de novo field work, and so I figured I should show you a few pictures of my own field work. This was my most recent expedition, too long ago now, in Mongolia in Central Asia. This belt is taiga, the northern boreal forest, and this belt is the Gobi Desert. And what I want you to focus on is this third of Mongolia, the Gobi Desert; those are all the records of birds in the global digital accessible knowledge store. And so, out of those five hundred million records of anything, about two hundred and ten million are records of birds.
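The Uganda coordinate problems described above, including the suspected north-south (latitude sign) errors, can be screened with a simple bounding-box check. This is a rough sketch: the bounding box is an approximation I am assuming for illustration, and real cleaning would use actual country polygons, not a rectangle.

```python
# Approximate bounding box for Uganda (an assumption for this
# sketch; real work would test against the country polygon).
UGANDA = {"min_lat": -1.5, "max_lat": 4.3, "min_lon": 29.5, "max_lon": 35.0}

def in_uganda(lat, lon, box=UGANDA):
    """True if the point falls inside the (rough) bounding box."""
    return (box["min_lat"] <= lat <= box["max_lat"]
            and box["min_lon"] <= lon <= box["max_lon"])

def diagnose(lat, lon):
    """Flag a record, and test for the common latitude sign flip."""
    if in_uganda(lat, lon):
        return "ok"
    if in_uganda(-lat, lon):
        return "latitude sign flipped?"
    return "outside country - needs cleaning"
```

A record near Kampala passes, a record mirrored south of the equator is flagged as a likely sign flip, and a record in Europe carrying the country code Uganda is sent for cleaning.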
And out of two hundred and ten million records of birds, not one record is there in that part of Mongolia. So over the course of four expeditions, we assembled this set of sampling, and essentially all of it is in that quarter from which there were no data. At the end of the expedition that I did in 2011, I thought maybe I would put together a compendium of the new records at the level of states within Mongolia. And there were so many new records that I wasn't able to finish the compilation, and we sent another expedition two years later, which brought back a whole bunch of new records. So it's literally to the point where almost all of the information we brought back was a surprising, useful record. The field work was pretty brutal. This is an old Soviet missile transport, which ate gasoline like you can't imagine; it broke down every third day, but it took us across the country. At this site, the first night, the wind blew so hard that it would pick up a tent and carry it off. I was walking around the truck around midnight and felt something really big move by me, and I thought it was a horse or something like that, and then I realized it was a tent, with a backpack and sleeping bag inside, rolling off into the desert. Two guys had to run to catch it and bring it back. So this was a pretty amazing adventure. Those are some of the landscapes, and my favorite was a day spent in an oasis, where I spent the whole day photographing salt crystals; there were hundreds of different shapes. Some of the birds... and then we ended up in the Altai Mountains. This was a highland site, but there were minute forest patches on some of the highland slopes, and then there was this gallery forest, so that was the real treat, because those were seriously isolated islands. That Altai site is right there; it took us seven days to get back to the capital.
So if you want to do this de novo field work: many times the holes in coverage are holes for a very good reason. They are hard to get to, so it's very expensive and time-consuming. But the advantage is you control your own fate: you get good data quality, you check the identifications, and you check the georeferences. So all I'm trying to emphasize to you is that we have multiple sources of information, and every single one of them has advantages and disadvantages. With these regional diagnoses, ideally what we do is get comprehensive coverage of existing data. So really what I'm saying is take all three of the sources that I outlined for you; with that you can develop an analysis of inventory completeness. Once you have that, you go ahead and identify gaps in coverage, and we'll talk about that for the next two days. Once you understand completeness and gaps, you can essentially see what biological lessons can be learned, and then go on to translate that into things of interest for biodiversity conservation. So that's an umbrella picture for today and the succeeding days. Any questions about anything I have thrown out?
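One common way to turn pooled records into the inventory-completeness analysis mentioned above is a nonparametric richness estimator. As a hedged sketch (Chao1 is one standard choice among several; the lecture does not specify which estimator the course will use), completeness can be expressed as observed species richness divided by estimated richness:

```python
from collections import Counter

def chao1(counts):
    """Chao1 richness estimate from per-species record counts.
    Uses singletons (f1) and doubletons (f2) to estimate how many
    species the inventory has missed."""
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)  # species seen exactly once
    f2 = sum(1 for c in counts if c == 2)  # species seen exactly twice
    if f2 == 0:
        # bias-corrected form when there are no doubletons
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + f1 * f1 / (2 * f2)

def completeness(records):
    """Observed richness / estimated richness, in [0, 1]."""
    counts = list(Counter(records).values())
    return len(set(records)) / chao1(counts)
```

Many singletons push the estimate up and completeness down, which matches the intuition that an inventory full of species seen only once is probably still missing species.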

Video Details

Duration: 15 minutes and 12 seconds
Language: English
License: Dotsub - Standard License
Posted by: townpeterson on Aug 30, 2016

This talk was presented in the course on National Biodiversity Diagnoses, an advanced course focused on developing summaries of state of knowledge of particular taxa for countries and regions. The workshop was held in Entebbe, Uganda, during 12-17 January 2015. Workshop organized by the Biodiversity Informatics Training Curriculum, with funding from the JRS Biodiversity Foundation.
