BITC / Data Capture - What is Biodiversity Data

So, we have biodiversity out there in the world. We know it's somewhere between 5 million and maybe 50 million species. Something like that. Nobody has a good idea. But, the question is how do we structure really good quality data to describe that biodiversity as efficiently as possible? And so, there are some things I want to talk with you about as far as what is really good information. What should it look like? And, what can you look for when you see a new initiative or a new idea and you need to decide 'is this worth my time?' Or, 'is this not worth my time?' We're going to talk a little bit about the difference between real data and what I call 'smoke and mirrors'. Illusions.
We can call this field biodiversity informatics: the application of informatics techniques to biodiversity information for improved capture, management, analysis, and interpretation. That's this field that we're talking about: biodiversity informatics.
Next, John is going to give you a detailed presentation of this idea: Darwin Core. There are some really interesting aspects of it; but, what Darwin Core comes down to is seeking an essential set of descriptors. If I want to communicate to you the essence of a biodiversity data record, the idea is that with this set of fields I can communicate that essence. It kind of distills down to just taxon, place, and time. Is that fair, John? [John] Yeah. [Town] It just tells us who, where, and when. That's really what we're talking about. Of course, there's a lot of detail behind those big generalities.
This is a very good paper to read. We'll put it on a USB key for you all. This is a documentation of the Darwin Core idea. This is a little bit old, but here's what I was just talking to you about with Darwin Core summarizing the taxonomy, the place in space and time; and, then you can see some links out to other types of data. And, you can see some links out to sharing data.
You can see Darwin Core allows us to link very different things (like genomics with taxonomy) by being this nexus of descriptors of primary biodiversity data. We're going to talk about what primary biodiversity data is now. A primary biodiversity datum places a particular individual or population of a taxon at a particular place at a particular time. In some sense, it's related to the observation of an individual or a population at one point in space and time. That's a really critical definition.
The contrast is secondary biodiversity data. This is information that has somehow been synthesized, processed, or interpreted. Usually, secondary biodiversity data are based on primary biodiversity data. But there's a world of difference between these two types of data. Primary data offer information without any subjectivity, without assumptions, without interpretation, and perhaps most importantly, without information loss. Which is to say, if the Minister of the Environment of your country asks for a summary of distributions of endangered species in your country, it's perfectly good to create secondary biodiversity data products. The Minister of the Environment doesn't want to see data. Right? He or she wants to see information synthesized. So you end up giving that person maps and hotspots and interpretations. That does, in some sense, describe where the species is or the species are, but there's a lot of interpretation in there. If we're talking among scientists, like the people in this room, we want that unitary, fundamental primary data; because, that doesn't depend on assumptions and interpretation.
Another set of qualities that's very well connected to primary versus secondary is what I would call 'research grade' data. This is data that is of sufficient quality that you can base publishable, cutting-edge research on those data. What you will find is that biodiversity information is often converted to secondary information. I'll call it dumbed down.
There's a lot of information content in the label of each insect in the collection. Or, each herbarium sheet. In some sense, when initiatives aim to share information, they sometimes fall into the trap of simplifying, organizing, synthesizing, interpreting, summarizing. That's all great - for the policymakers. For the decision-makers. For the general public. But scientists need research-grade data. We don't need that interpretation; we want to do our own interpretations. This is something I want you to think about when you look at the broader world of biodiversity informatics. Some of the things that you look at, if you think about it really hard, you're going to say, 'that's secondary information. That's not primary'. Or, you're going to say, 'that's not research-grade data.'
Real improvements to biodiversity informatics infrastructure will have these kinds of qualities. This is not my usual style. Effective, efficient, novel, inspiring... but the idea is very, very important. Real improvements to our data world as scientists need to be effective and efficient. They need to be founded on primary research-grade data. They need to be sustainable and permanent. One of the things we're going to be talking about a lot in this course is not just fixing up a dataset so that you can do an analysis and publish your paper. But rather, digitizing, preparing, improving, and documenting a dataset so that you and a million others into the future can use those data. So, sustainable and permanent. Reliable. Publishable. Remember, we talked about that. But also, these crazy things like novel and inspiring. Really good data will end up producing future science just because those data are so information rich. That's a set of qualities that we should always be thinking about.
I want to give you two examples. And, I have the bad habit of speaking my mind. It gets me in trouble a lot. So, here we go. Here's the National Science Foundation website.
We have an award of almost a half million dollars. And the name of the project: Map of Life: An Infrastructure for Integrating Global Species Distribution Knowledge. Am I going to get myself in trouble, John? [John] Of course. [Town] Okay. So, that sounds great. Right? Global species distribution knowledge. Map of life. This sounds wonderful. So I went to their site and I took a very typically African species. Of course a bird because I'm an ornithologist. This is Afropavo congensis. It's a species of pheasant that was discovered only in the mid-20th century. Its closest relatives are in Asia. It's a really cool bird. It's a peacock that's endemic to the Congo basin.
And look at this. I got five data sources. And, I've got these different maps. A couple of different ones in green. I see some points in there. I see some less smooth shapes. A lot of information there. But there are some really interesting things about it. First of all, this shape. Why is it shaped that way? Does Afropavo congensis occur literally in every one of those places within that shape? Probably not. There are probably some clearings for farm fields; and, I doubt this forest pheasant is out in those fields. There are some cities. I doubt the species is there. So, somebody did some interpretation. Sometimes it's an expert who draws a line. I've been involved in those exercises. Sometimes it's a model. Maybe this came from an ecological niche model. These point data are somehow more fundamental, more primary. But why do we have point data only from the eastern side? There's a lot that I don't understand about this. I could go back to the sources. But also remember, I want those research-grade data. I want that primary data. What I am seeing here is a lot of secondary data. There's a lot of interpretation that went into making those nice maps. My more fundamental question is, can I download any of this? I still haven't found a button on this webpage that allows me to download anything.
Remember my list of qualities of biodiversity data infrastructure improvements. Can I do research with these data? No. I can just look at them. I pull it up on my web browser. I look at it. I feel richer. And, I go away. But, I can't do research. A lot of these data are secondary, so I don't want to do research with them. I want the primary data. This project, for me at least, doesn't make the grade as far as being a real infrastructure improvement for biodiversity.
Here's a different project. John knows a little bit about this project. This is VertNet. [John] Are you going to get yourself in trouble again, Town? [Town] No. This is a day and night sort of thing. VertNet is all about primary data. There's no interpretation. The data flow through the VertNet facility and out to the researcher. This is a new portal that John and others have developed. You go straight to the data. I picked another African bird. Here's a record from Ivory Coast. You get the full locality. You can get all of the data very conveniently. You can get a map. In some cases, you can even link to media. You can see your data geographically or textually. And, perhaps most important is that all of the data are available. There is no filtering. There is no preprocessing. There is no dumbing down. And, nothing's held back.
Just to sum up these ideas... In biodiversity informatics initiatives, we really, really need primary data. No interpretation. No processing. We need research-grade data. We need access to all of this. Access as far as getting the data into your realm of being able to work. Onto your computer. Or onto your workstation. We're really looking for genuine infrastructure improvements that make a difference to research and that are not just what we call smoke and mirrors - illusions. Something that you can look at and then, when your Internet connection goes away, so do the data. Always, when you're looking at these initiatives -be it what our experts are going to show you.
Be it what I'm going to show you. Be it what you see when you go back to your country and you read some newsletter that announces a new initiative: 'Biodiversity of the World.' Whatever the flavor of the day is. You need to ask whether that initiative makes your research world more effective. Okay? That's just a commentary about why we are doing this. This is a complex new field. There's a lot of movement; but, not all of the movement is forward movement. I hope you guys will take these ideas to heart.
So, next time you read the newsletter that says "new initiative", -I can just say things like, 'GBIF.' That's new as of 12 years ago. The new flavor is IPBES. Some of you will have heard of IPBES. It's the Intergovernmental Platform for Biodiversity and Ecosystem Services. You have to ask yourself, 'does this make my research world better?' If you read the different newsletters and websites and things like that, you're going to see a lot of new initiatives. And, just ask yourself, 'is it real? Is it primary? Is it research-grade? And, does it make a difference?' Any questions?
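
A minimal sketch of the primary versus secondary distinction drawn in the talk, assuming Python. The field names are Darwin Core terms (scientificName, decimalLatitude, decimalLongitude, eventDate); the class name, the function, the grid-cell summary, and the three records are hypothetical and invented for illustration, not real occurrence data.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One primary record: a taxon at a place at a time."""
    scientificName: str      # taxon ("who")
    decimalLatitude: float   # place ("where")
    decimalLongitude: float
    eventDate: str           # time ("when"), ISO 8601 date string


def records_per_cell(records, cell_size=1.0):
    """Secondary product: number of records in each grid cell.

    Each cell is keyed by (floor(lat / cell_size), floor(lon / cell_size)).
    Exact coordinates and dates are discarded, which is the
    information loss the talk warns about.
    """
    counts = Counter()
    for r in records:
        cell = (
            math.floor(r.decimalLatitude / cell_size),
            math.floor(r.decimalLongitude / cell_size),
        )
        counts[cell] += 1
    return counts


# Invented example records (NOT real data).
records = [
    Occurrence("Afropavo congensis", -0.25, 25.5, "1950-03-14"),
    Occurrence("Afropavo congensis", -0.75, 25.1, "1962-07-02"),
    Occurrence("Afropavo congensis", 1.2, 27.9, "1988-11-30"),
]

summary = records_per_cell(records)
for cell, n in sorted(summary.items()):
    print(cell, n)
```

The summary can always be rebuilt from the records, but the records cannot be recovered from the summary. That asymmetry is why the talk asks for access to the primary data rather than only to maps derived from them.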

Video Details

Duration: 17 minutes and 48 seconds
Language: English
License: Dotsub - Standard License
Posted by: townpeterson on Jun 22, 2016

In English. Portion of course that covers biodiversity data capture, held 13-22 January 2014, in Accra, Ghana. Experts included Melissa Tulig, Kim Watson, Christiane Weirauch, John Wieczorek, and Town Peterson.
