Social Translation

Thank you very much for coming this morning. As we are talking about Arabic content, I would like to focus on Arabic digital content. In the next 30 minutes, I would like to talk about the changes that the information and communication revolution has caused in the way we deal with knowledge and content in general, and see how we need to adjust our way of thinking to cope with these changes. So I will start by talking a bit about these changes, and then I would like to suggest a new idea that may help us translate and transfer knowledge and content from any language into Arabic. I will focus on English, as the vast majority of the content currently available on the Internet is in English. (The technology did not work, so we will go back to the wired microphone. Sorry.) Let me give you a quick glimpse into the way we used to innovate. In the past, innovation was a solitary process. The scientist, poet or author used to spend countless hours alone writing an encyclopedia, a book on medicine, a novel or a play. The process was very solitary, and the result was deposited mostly in a paper medium, a book or an encyclopedia. First there were manuscripts; then they were printed, with all the limitations on distribution that arise from confining content to a physical (paper) medium. Then came the Internet, and the story started to change. Fundamental changes happened: we became able to communicate with each other without the need for a paper medium. This was the single most significant change in the processes of innovation, authoring, and content and knowledge production, and it was caused by digital technology, i.e. saving information in digital format, and by the Internet, which enabled us to transfer this digital information around the world. So what happened?
The process of creativity and knowledge production was transformed from a solitary process into a social experience. We became able to work together: for example, I live in the UK, my friend Rami lives in Jordan and my friend Ahmad lives in San Francisco, and we are now able to work together because a physical medium is no longer needed and communication and knowledge production have become amazingly fast. I can, for example, do a specific part of the work we need to produce and send it to Rami to review and to add his own part (the legal aspect, for example); he then sends it back to me, then on to Ahmad, and so on. All of that happens at the speed of light. I will call this concept social creativity. There are plenty of examples, Wikipedia for instance. Wikipedia could not have happened without two things: digital technology, which allows information to be stored in digital format, and the Internet, which allows this digital information to be transferred around the world at the speed of light. Unfortunately, the Arabic Wikipedia currently suffers from a serious weakness in content, which does not match the potential of the language itself, but we hope this will change. In other languages, especially English, Wikipedia has become the 8th most popular website on the Internet, which is very significant. Another example of social creativity I would like to present is the Linux operating system and the wide spectrum of Free and Open Source Software, which were only possible because tens of thousands of developers were able to collaborate over the Internet to develop software applications. I think we are now beyond debating their quality or fitness for purpose, because they are now viable competitors; on the web, for example, there is Apache, an open source web server.
No one can claim that Apache is not good, or that there is a better commercial alternative. There are many other examples, but I wanted to mention these two because they are well known. And the result? The result is that this year we expect to produce 1.5 exabytes of content. 1.5 exabytes is 15 followed by 17 zeros in bytes (1.5 × 10^18 bytes), an enormous number. To show you the significance of this number, let us compare it: in one year, we will create content equivalent to everything created in 5000 years of human history. Next year, the number will be even larger, and we will produce much more content. My point here is that the factors affecting content production and distribution, and the massive amounts being produced, should be taken into account when we think about enriching Arabic content. They give us an idea of the scale of the acceleration in content production, be it knowledge, art, literature or entertainment, and at the same time an idea of the new mechanisms of knowledge production, most of which are social and exploit the best of both worlds, digital technology and the Internet, to let us build websites like Wikipedia and software like Linux. I will now give you a quick view of Arabic digital content in particular. I love this graph because it shows the growth of Internet users from 2000 to 2007. The violet line is the growth in the Arab region, almost 950% in 7 years, compared to the global growth in blue. As we can see, the growth in the number of Arab Internet users is very high, very, very high indeed. People learn about the Internet, their interest grows, and then they get hooked. At the same time, while Arab users constitute 2.5% of Internet users, Arabic content does not constitute more than 0.3% of the content on the Internet. We are not asking for much, but we ask that these two lines at least become
closer to each other. As we constitute 2.5% of Internet users, we should have at least 2-3% of the Internet's content, especially since content production is no longer a solitary process requiring individuals to spend countless hours alone in an office writing or creating content. We can all work together, so each of us can contribute a small amount according to his or her circumstances and commitments, using social technologies like Wikipedia and Free and Open Source Software development processes. Yesterday, Dr. Mohammad Al Kanhal showed us a great slide indicating that 70% of Internet users in Saudi Arabia use an Arabic interface for their operating system to access the Internet. The statistics I have are not for Saudi Arabia only but for Arab Internet users across the Arab world, 65% of whom do not speak English. This is really significant. No one can now claim that every Internet user must by default know English, and that there is no need to develop Arabic content because people who use computers and the Internet can read the English Wikipedia and all the other information resources available in English. Unfortunately, this is very far from the truth as long as 65% of Arab Internet users do not speak English. I tried to summarize the problems we face with Arabic content in this cloud. Such a cloud should be called "the wisdom of the crowds", but I will call this one "the stupidity of the individual", because I was the only one who wrote it. I do not know whether the words and ideas here ring a bell as some of the reasons for the weakness of Arabic content: for example, the lack of linguistic tools capable of dealing properly with Arabic; the lack of incentives to produce Arabic content at either the volunteer or the commercial level, with not even companies that build businesses and make profits from creating and distributing Arabic content; and copyright laws. Mr.
Rami told us yesterday about the differences in these laws among Arab countries; generally the laws exist, but their enforcement is weak. Search and indexing technologies are another factor. We are very happy to have our friends from Google here; hopefully they will take these ideas to help us, and within 6 months we will have good, strong Arabic search in Google. The weakness of the collaborative and volunteer work philosophy in the Arab region also has an impact, especially now that the vast majority, or at least a very large portion, of content on the Internet is being produced socially, collaboratively and by volunteers. We also do not have much linguistic research on Arabic language issues. We have a generally weak reading culture in the Arab world, and we do not have a strong translation movement: translation is confined to a few publishing houses in Egypt, Saudi Arabia, the UAE, Syria and Morocco, which is not enough to meet the requirements of the Internet age and the massive growth in the amount of content we are producing. I do not like to think negatively, so I will not call these problems; let us call them opportunities, especially since we are gathered here to solve these problems, if they exist, and to overcome the challenges facing Arabic content on the Internet. Let us consider this cloud a starting point that we can use, improve and add ideas to, and from which we can work to reduce each of these factors until it no longer constitutes a challenge to the development of Arabic content. What I want to suggest, and ask for your feedback on, is a new technology designed specifically for the requirements of content creation in the digital and Internet era, whose approaches are generally social and based on the wisdom of the crowds. My own view is that traditional translation approaches do not match social content production processes.
For example, it is very difficult to go to Wikipedia and pay 5000 translators to translate it from English into Arabic, because by the time they finish the first translation pass, Wikipedia will have changed so much that the value of the translation will be seriously affected. Texts and content on the Internet are no longer static; they change every second. Yesterday I was very excited by our friend Wael Al Ghonim from Google, because for every statistic he showed, he said: this was from 3 months ago and the numbers have changed since; this was from 2 months ago and the numbers have changed since. Mr. Souliman also said that he had to update his statistics yesterday because they had changed. So, as we can see, change is accelerating. To give you a quick idea of how social translation works as a concept, I will start with a piece of content, be it a wiki page (Wikipedia or another), a discussion forum, a news website such as the BBC, a blog, or even a paper book. I take this source and extract exactly the content that needs to be translated, and I break it down statistically into small fragments based on what I have translated in the past; you will see in a moment how the picture comes together. So I get small fragments of the text to be translated, say from English into Arabic, and I distribute them to a network of machine translators, which are also distributed and have varying capabilities. For example, there may be a machine translator specialized in engineering, say civil engineering. If the source text is about civil engineering, I will not send it to a machine translator specialized in the arts; I will send it to one specialized in engineering. I think we all agree that despite the efforts invested in machine translation, it is still
incomparable to human translation. When you enter some text into a machine translator, the result, as we say in Syria, reads as if it were speaking Armenian: in machine translation the subject becomes the object, for example. It is simply unreadable; if I had to describe it in one word, that word would be unreadable. To solve this problem, we use a hybrid approach that combines machine and human translation. After we receive the fragments, which may be sentences or paragraphs (the size changes according to our corpus), we retrieve their translations from the machine translators and distribute them to a network of human translators. This distribution is also based on the translators' skills and capabilities: for example, if I have a text on civil engineering, I will know which translators are experienced in civil engineering and will send the machine-translated text to one of them. The translator then reviews the text: if the quality is good, he approves it without changes; if it is not that good, he edits and changes sentences as required. I then receive the fragments translated into Arabic, for instance, and add them to the linguistic corpus. After that, again statistically, I reassemble the original content and give it to the reader. There is an essential point here: I give the reader the right to give feedback on the translation. Was the translation appropriate or not? Was it good? Was the result readable? This feedback, the reader's opinion, is returned to the linguistic corpus so that we know which parts of the corpus are of high quality and which are of lower quality. At the same time, I use it to measure the performance of human translators, so that I can know if a translator, X for example, keeps getting negative feedback. The system will adapt automatically and will stop sending this translator many translation tasks; at some point it can say: wait a
second, this person is unable to contribute. Let me give you an example from the Free and Open Source Software world. Someone may submit contributions to the Linux kernel by sending code fragments he wrote; if the code is bad the first time, the second time and the third time, he will reach a point where he is told: sorry, but you are not able to write code for the Linux kernel. This is the same concept; we borrow it from the Free and Open Source Software community and apply it to the translators' community. If I were to summarize the main components of this system, they include a network of machine translators running on computers, and a network of human translators in two categories: volunteers and professionals. I will pay the professionals, and there are many reasons to justify having them. In many successful Free and Open Source Software projects, you will see a foundation or a small company employing 20-30 developers and paying them salaries. Their task is to keep the momentum going and ensure the interaction between the volunteer community and the software itself. The same idea should be applied to the translators' community, so that if, for example, I have a text that needs translating and no volunteers are currently online to translate it, a professional translator steps in just in time. I also have an Arabic-English linguistic corpus aligned at the sentence and paragraph level, and statistical algorithms that determine how to assemble the fragments I retrieve from the corpus based on their rating and quality. There is also a communication protocol between the machine translators and the human translators on the Internet. This protocol is very important because it includes what we call "presence": I should know at any given time how many translators are available and able to volunteer and translate, and at the same time what their skill sets are. Do they work in arts, engineering or
computing? I should know that, and there is a protocol to ensure that this interaction happens properly and without problems. I also have a mechanism to manage translators' reputation, so that I can look at the translators' community and say: these are the top 10 translators; these are the ones you will get reasonable translations from; and these produce very bad translations, so I should reduce their engagement with the system. We also have an accounting component, so that if I have professional translators, I know how to pay them based on the volume of their work. Without the need for accountants, the system should be able to determine the amounts to be paid: at the end of the month, it should determine that this translator has translated a certain number of words and should be paid 50, 100 or 1000 riyals, based on the translation volume. I should also have a system to actually pay the translators for their work. I told you earlier about the presence management mechanism and the translation evaluation mechanism, through which the reader decides on the quality of the translation he received from the system. Generally speaking, the key feature of this system is that it combines the advantages of machine and human translation. Machine translation is fast, repeatable and inexpensive, but generally of low quality; human translation is of very high quality, but expensive and slow. With this system I can combine both and try to overcome the weaknesses of each. Over time, with the growing linguistic corpus and the feedback collected from readers, the quality of machine translation will improve, because I will have a high-quality corpus with a feedback system that tells me which translated fragments are really good. The quality of machine translation increases as this corpus grows, because all the information that we will
generate will be stored in this linguistic corpus. This corpus will become an invaluable resource for researchers in Arabic linguistics, who can use it in research to improve Arabic language indexing, for example, or Arabic search mechanisms, or to advance machine translation. Basically, there are no limits to what can be done with this corpus, which grows daily or even hourly, in the same way that Wikipedia and other interactive, collaborative websites grow. The approach is also suitable for Web 2.0 technologies. Why? Because, as I told you earlier, Wikipedia is based on collaborative knowledge creation. If I translate an article and someone later changes a single paragraph or a single sentence, I do not have to resend the complete page to a translator; I send only the changed section to be translated and inserted back into the existing translation. The system can also be used to translate any type of content through Application Programming Interfaces (APIs). The platform we have developed to test this system, which I will tell you about now, has an API, so you can send the source text without worrying about all the components I mentioned earlier, because they work transparently for you. All it takes is to send the text and receive it back, fully translated, after a short while. Now, how does this relate to open content, which is the central theme of our gathering? Open content was very important to the massive explosion of information on the Internet, and the two are like the classic chicken and egg: do we say that open content led to the development of Wikipedia, or that Wikipedia encouraged people to build more open content?
In my opinion, the two are intertwined and very closely related. What enabled Wikipedia to succeed was the existence of a set of licenses that overcame the limitations of traditional copyright regimes, which impose stringent restrictions on the reuse, development and modification of intellectual content regardless of its format. At the same time, the unprecedented growth of movements like Free and Open Source Software and Wikipedia led many legal researchers to develop appropriate legal instruments to support these movements and facilitate their growth. It all started with copyleft in Free and Open Source Software, which then evolved into Creative Commons, which Mr. Rami spoke to us about in detail yesterday. Here is a collection of the licenses available under Creative Commons, and here are a few examples of free and open content sites and initiatives, such as Arab Commons and the Global Text Project, which aims to create 1000 university textbooks in four languages: English, Arabic, Chinese and Spanish. There is currently one Arabic book on the website, and the sites are being continuously developed. The central element of the social translation system I told you about earlier is the linguistic corpus. The corpus we are building is available under an open license, so that any researcher, or anyone working in machine translation or Arabic linguistic research, can access it, use it and adapt it for their research free of charge. Currently it contains 600,000 sentences, the outcome of two years of work at Meedan. I will show you a quick example shortly. We hope this will be a modest contribution to enriching Arabic language research. This number will grow daily: as we use the system every day at Meedan, the linguistic corpus will grow. Anyone who wants to learn more, please contact me and I will do my best to help. This is the website in which
we are trying out the social translation technology; it is called Meedan. If you look at the translation quality, this is the quality you get from social translation, which combines machine and human translation. This is the TechCrunch website translated through Meedan using social translation. Do you know Twitter? Twitter offered a service called Twittrans: you write the update you want in English and ask for it to be translated into any language, and it is translated by human translators, who are paid, with the cost covered by advertising. Here at Meedan, this is the translation you can read; I will read a short section: "Today Twitter released a new service called Twittrans. Twitter users can use the new service to get a quick translation of any short message into different languages." This translation is much better than machine translation and costs less than human translation. That is all I wanted to tell you about today, and I am ready to discuss our project in greater detail. Thank you very much.
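The social translation loop described in the talk (fragmenting content, machine pre-translation, routing drafts to available human reviewers by domain skill and reputation, storing approved fragments in a corpus so that an edited page only resends its changed fragments, reader feedback, and per-word accounting) can be sketched roughly as follows. This is a hypothetical Python sketch, not Meedan's implementation: the class and function names, the naive sentence splitter (standing in for the statistical fragmentation the speaker mentions), the stub machine translator, the reputation threshold and the per-word rate are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Translator:
    """A human translator in a hypothetical social translation network."""
    name: str
    domains: Set[str]            # skill sets, e.g. {"engineering"}
    online: bool = True          # "presence": available right now?
    score: int = 0               # reputation built from reader feedback
    words_translated: int = 0    # volume used for per-word accounting


def split_fragments(text: str) -> List[str]:
    # Assumption: naive sentence splitting stands in for statistical fragmentation.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def machine_translate(fragment: str, domain: str) -> str:
    # Hypothetical stub for a domain-specialized MT engine; it only tags the text.
    return f"[MT:{domain}] {fragment}"


def pick_reviewer(translators: List[Translator], domain: str,
                  min_score: int = -3) -> Optional[Translator]:
    # Only online translators with the right skill and a reputation above the
    # (assumed) threshold are eligible; the best-rated one gets the task.
    eligible = [t for t in translators
                if t.online and domain in t.domains and t.score > min_score]
    return max(eligible, key=lambda t: t.score, default=None)


class SocialTranslation:
    def __init__(self, translators: List[Translator]):
        self.translators = translators
        # Corpus: source fragment -> (approved translation, reviewer name or None)
        self.corpus: Dict[str, Tuple[str, Optional[str]]] = {}

    def translate(self, text: str, domain: str,
                  review: Callable[[str], str] = lambda draft: draft) -> str:
        """Fragments already in the corpus are reused, so an edited page only
        sends its new or changed fragments out for translation and review."""
        result = []
        for frag in split_fragments(text):
            if frag not in self.corpus:
                draft = machine_translate(frag, domain)
                reviewer = pick_reviewer(self.translators, domain)
                if reviewer is None:
                    # In the talk a paid professional steps in; here we keep the MT draft.
                    self.corpus[frag] = (draft, None)
                else:
                    # `review` simulates the human approving or editing the draft.
                    reviewer.words_translated += len(frag.split())
                    self.corpus[frag] = (review(draft), reviewer.name)
            result.append(self.corpus[frag][0])
        return " ".join(result)  # reassemble in the original order

    def feedback(self, fragment: str, good: bool) -> None:
        """Reader feedback adjusts the reputation of the fragment's reviewer."""
        _, name = self.corpus[fragment]
        for t in self.translators:
            if t.name == name:
                t.score += 1 if good else -1


def payout(translator: Translator, rate_per_word: float) -> float:
    # Monthly accounting by translated volume; the rate is an assumption.
    return translator.words_translated * rate_per_word
```

For example, translating a page and then an edited version in which one sentence changed sends only the new sentence to a reviewer; the unchanged sentences come straight from the corpus. A real system would presumably key the corpus on normalized or statistically aligned segments rather than on exact strings.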

Video Details

Duration: 30 minutes and 13 seconds
Country: United States
Language: Arabic
Views: 414
Posted by: anas on Feb 9, 2009

Anas' presentation at the Arabic Open Content Workshop (Araboc) from 17-18 January 2009 at King Abdulaziz City for Science and Technology - Riyadh, Saudi Arabia
