
BITC / Biodiversity Inventories - Introduction to the BITC III

So, we can just keep making this challenge bigger and bigger. One thing is just piling data into a big, global stash, where we all can go and start playing with the data. That's at this level. We have -in Brazil, we have millions- and now globally, it's more than half a billion records that are online and openly available that we can access from this room.
But, then we can start being smarter about it. For example, this is an integration diagram about Burma. You can see this is Burma misspelled because it has two r's. That can be put through some services that look up, standardize, and pick out errors so that the incorrect version can then be equated to correct names. In fact, Burma bounces between two names. And, Cameroon, for example, with two o's or 'ou'. And we all know what those two names mean. It's easier than 'Burma' and 'Myanmar'. We know that it's Cameroon. But if you're searching on a database somewhere, and you use 'C-a-m-e-r-o-u-n', you may not see the records that are stored under 'C-a-m-e-r-o-o-n'. So we can do technologically assisted data cleaning that really makes our data better and better, particularly once they're shared. Because, then we can look across a bigger and bigger population of data.
So, now we've got a ton of data available, cleaned up, and integrated. But we still need to see the bigger picture. This can take us into the challenges of analysis and interpretation. A few years ago, several of us published this diagram. I kind of like the diagram less and less with time. But, you can think of these big processes. We have genotype, which translates into phenotype. And, phenotypes of different species can interact. And species can interact with their environments and with humans. And some of the second-level phenomena would be ecology, and biodiversity loss, which is frequently ecology crossed with human activity. And then, we have all of these data products. Phylogenetic trees. Maps. Conservation strategies.
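The country-name cleaning described above (a misspelled 'Burma', 'Cameroun' versus 'Cameroon') can be sketched in a few lines of Python. This is only an illustration, not the service shown in the talk: the reference list, the synonym table, and the function name `standardize_country` are all assumptions made up for the example, and the fuzzy matching uses the standard library's `difflib` rather than any particular biodiversity tool.

```python
import difflib

# Hypothetical reference data, for illustration only.
CANONICAL = ["Cameroon", "Myanmar", "Brazil", "Mexico"]
# Known alternative names or spellings mapped to the name used for storage.
SYNONYMS = {"cameroun": "Cameroon", "burma": "Myanmar"}


def standardize_country(raw, cutoff=0.8):
    """Return the standard name for a raw country string, or None if unknown."""
    key = raw.strip().lower()

    # 1. Exact hit on a known alternative name.
    if key in SYNONYMS:
        return SYNONYMS[key]

    # 2. Exact hit on a standard name, ignoring case and stray spaces.
    by_lower = {name.lower(): name for name in CANONICAL}
    if key in by_lower:
        return by_lower[key]

    # 3. Fuzzy match against everything we know, to catch typing errors.
    candidates = list(by_lower) + list(SYNONYMS)
    match = difflib.get_close_matches(key, candidates, n=1, cutoff=cutoff)
    if not match:
        return None  # leave for a person to check rather than guess
    best = match[0]
    return SYNONYMS.get(best, by_lower.get(best))


for raw in ["Cameroun", "Burrma", " cameroon ", "Atlantis"]:
    print(repr(raw), "->", standardize_country(raw))
```

Records stored under either spelling can then be found with a single query on the standardized name. Strings that match nothing are returned as `None` so they can be flagged for review instead of being silently "corrected".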
And, really, we're not going to get to really good quality products -like a conservation strategy- unless we have a lot of good information about all of this. And so, we only have little bits of that information. For example, GenBank with genotype data. And, for distributions of species, we just talked about that. But really, our view of this broader perspective is very fragmented. For example, there's almost no organized information about interactions among species. Maybe because it's hard to organize. Okay? Or, something that Dave Blackburn can talk more about than I can is good information about phenotype. It's hard to describe a frog, a beetle, and a plant in the same terms. So this is one big picture. But, the point is that this is a big world, of very diverse information. And the big payoffs in this world of information will come from linking across these worlds. For example, a conservation prioritization might link between geographic distributions of species, taxonomy, and biodiversity loss. If you can link those three, you can get to an effective conservation and management strategy. But that's not easy. Linking these worlds of information is hard.
That was all an introduction to biodiversity informatics. We can call this field the application of informatics techniques to biodiversity information for improved capture, cleaning, management, improvement, analysis, and interpretation. A very useless definition. But, I think it helps us to think about the full breadth of this challenge. Essentially, what this pair of courses is about is a lot of capture. We're going to go out to the field. Here in Cameroon. Or those of you who come from other countries, in your home countries. You go out to the field. You make observations. And you turn that into data. Okay? There will also be some analysis and interpretation when we start talking about species descriptions.
Just a couple of comments about the field. It is a young field.
But I also think it's a field that has been born rather poorly. You can think about this field from the definition I just gave you. It's going to bounce between data availability (what data are digital?) and technology. I just provided many comments and ideas about available technologies. But then, we have these ideas and concepts. Things like the theory of evolution and basic ideas from ecology. And so we have these three (not the only three) worlds of possibilities. And I think you could build an argument that, in biodiversity informatics, much of the activity has been over here on the technology and data side. And the technology and data have driven which ideas and concepts we have explored. That's not how science should happen. Science should happen by us coming in with ideas and concepts. An idea we want to test. Some mechanism. Some process that we want to explore. The ideas and concepts should be driving the technology and data that get developed. But biodiversity informatics has run the other way. The ideas and the concepts have been driven, not exclusively, but in large part, by the data and the technology. That, to me, is a criticism of this young field.
Another point that I'd like to make -each of these points is the subject of another course a week or two long- but, we can ask where we lose this information. Okay? If we want to know, right now, about the beetles of Cameroon, -right now, in this room- there's going to be a lot of literature we don't have access to. There's going to be a lot of specimens we don't have access to. But, there's also going to be a lot of information that you won't have access to if you're in London. Or in New York City. So, we can talk about digital accessible knowledge. Think about this as the pipe that brings water into this hotel. It may bring in a cubic meter per second of water. Which is far more than the hotel needs. But, maybe there's a leak in the main pipe that comes into the hotel.
Or, maybe there are leaks all along the plumbing system. And so up in my room, I open the faucet and nothing comes out. Actually, the water here is fine.
But, my point is, here's the main feed of biodiversity information. Some of it has been studied. This box has been studied. A huge amount of biodiversity information is 'work left to do.' And that's a career for each of you, and the rest of a career for me. That's what's really left to learn. But, somebody went out and sampled some subset of biodiversity. Some of that exists as specimens and observational records. This is specifically about distributional data. But, some of that sampling got lost. A museum burns down. A researcher dies without documenting all of their data. There are all sorts of ways that data that once existed can get 'lost.'
But, let's say the specimen exists in a museum somewhere. Some of those specimens have been determined. They've been identified. And some haven't. Right away, those data are lost. Of the data that have been identified, most are still in analogue format. They're still in the form of paper tags with permanent ink on them. Some have been digitized. A lot haven't. So, in terms of digital accessible knowledge, all of those data are lost. Right? They're not part of the water that gets up to my room. Of the data that are digitized, some have just been typed in by a technician and never looked at again. Those will be 'dirty' data. They'll be full of typographical errors, inconsistent descriptions, inconsistent localities. And, so, even digital data can get lost because they're 'dirty.' By a similar token, digital data can get lost for some purposes because they're not georeferenced. So, these represent really big leaks where data just get lost. Even when data are georeferenced and digital and clean, they may not be published. When I say 'published', I don't mean as a paper in a journal. I mean that they've been shared. If they haven't, they aren't available publicly.
They can be made available with adherence to global standards, or just in some local format that's very difficult to integrate. And then, they could be integrated in what we could call a world museum. One virtual store of data where we could access all of this information. Each of these blue circles is a leak in our information plumbing. The red circle is the information that you can use at the end of the day.
In these courses, I've built various examples. I should do one with Cameroon, but it's not the subject of this course. I've built examples about information loss. Usually, of the digital data -just from here to here- you lose about 80% of the data. From here to here, you may lose 90% of the data. So this is 20% of 10% of the original. Okay? I say this to you because, in this course, we're going to go out and we're going to add to this arrow. There will be biodiversity in Korup National Park that we're going to turn into sampled biodiversity. You might be able to make this arrow ten-fold larger. Or 1000-fold larger. But, if these leaks are still big, you make no difference here at the end. The data that Moses collects on the plants of Korup National Park, where he's been working for a long time, if those data are determined, digitized, cleaned, georeferenced, published, shared, and integrated, then the amount of usable information increases. But if those data get stopped or leaked at some point, nothing changes. One of the things that I've been pushing over the last several years is to analyze the leaks. Analyze where we're losing information. Because sometimes you fix something here, and even though you fix it, it makes no difference. We could increase the amount of water coming into this hotel from 1 cubic meter per second to 2 cubic meters per second, but if there are leaks all throughout the plumbing system, there will still be no water in your room. Okay?
One last comment. Sorry, but I never have time to translate slides. This is coming into Mexico.
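The arithmetic behind the leak figures quoted above ("20% of 10% of the original") can be checked with a short Python sketch. Only the two loss rates come from the talk. The transcript does not say which stages of the slide they belong to, so the stage labels, the function name `usable_records`, and the fully blocked stage are assumptions added for illustration.

```python
def usable_records(sampled, stages):
    """Records that survive every stage, given the fraction each stage keeps."""
    remaining = float(sampled)
    for _label, kept in stages:
        remaining *= kept
    return remaining


# The two loss rates quoted in the talk; the stage labels are placeholders.
quoted = [
    ("one leak, about 90% lost", 0.10),
    ("another leak, about 80% lost", 0.20),
]

# Fraction of the original that reaches the end: 20% of 10%, about 2%.
fraction = usable_records(1, quoted)
print(f"fraction surviving both leaks: {fraction:.0%}")

# Hypothetical extra stage where the data stop completely, for example
# records that are never shared.
blocked = quoted + [("hypothetical stage where nothing gets through", 0.0)]

for sampled in (1_000, 10_000):
    print(sampled, "sampled ->", usable_records(sampled, blocked), "usable")
```

With the two quoted leaks, about 2% of the original information reaches the end of the pipe. With a stage that passes nothing, the usable total stays at zero however much the sampling arrow grows, which is the case the talk describes as "nothing changes."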
Another comment is about developing countries taking control of their own destinies. This is purely correlational. And, if you've taken a beginning statistics class, you know that correlation does not equal causation. In this case, though, I believe that correlation does reflect causation. This is Mexico from 1825 to almost the present. And, this is the number of publications per year about Mexican birds. There's the Mexican revolution. Relatively little happening. It wasn't a good time to be a scientist in Mexico. Then, what I want you to see is the proportion of those publications authored by Mexican scientists. That's the red line. Notice that that goes from mostly zero... There's the revolution where no Mexican scientists were participating. And that proportion comes up and up. You get into the 1990s and it's near half. By the 2000s, it's well above 50%. And, the proportion of non-Mexican authors drops below 50%. My point here is, in this period -1980 to present- Mexico took control of its biodiversity information. It created a group called CONABIO. It's a national commission. By various means, CONABIO united, shared, and integrated millions of records about Mexican biodiversity. In the National University and several other universities, there were big efforts aimed at data sharing. Again, this is correlation, not causation. But my firm belief is that those information sharing exercises are translating into Mexican scientists taking control of their own science. So, that's an introduction to biodiversity informatics.

Video Details

Duration: 18 minutes and 7 seconds
Language: English
License: Dotsub - Standard License
Posted by: townpeterson on Jun 22, 2016

This talk was presented in the course on Biodiversity Inventories, an advanced course focused on developing complete inventories of species present at sites. The workshop was held in Buea, Cameroon, during 2-5 March 2015. It was organized by the Biodiversity Informatics Training Curriculum, with funding from the JRS Biodiversity Foundation. Instructors included David Blackburn, Rafe Brown, Town Peterson, Mark Robbins, and Moses Sainge.
