>> From The Library of Congress in Washington, D.C. ^M00:00:06 [ Pause ] ^M00:00:29 >> My name is Jason Steinhauer, a Program Specialist with The John W. Kluge Center at The Library of Congress. Before we begin today's program please take a moment to check your cell phones and other electronic devices and please set them to silent. Thank you. I'll also make you aware that this afternoon's program is being filmed for placement on The Kluge Center website, as well as our YouTube and iTunes playlists. I encourage you to visit our website, loc.gov/kluge, k-l-u-g-e, to view other lectures delivered by current and past Kluge scholars here at The Kluge Center. Today's lecture is presented by The John W. Kluge Center at The Library of Congress. The Kluge Center is a vibrant scholar center on Capitol Hill that brings together scholars and researchers from around the world to stimulate and energize one another, to distill wisdom from The Library's rich resources, and to interact with policymakers and the public. The Center offers opportunities for senior scholars, postdoctoral fellows and Ph.D. candidates to conduct research in The Library of Congress collections. We also offer free public lectures, conferences, symposia, book talks, festivals, and we administer The Kluge Prize, which recognizes lifetime achievement in the study of humanity and which we will award in 2015. We are also celebrating our 15th Anniversary this year in 2015 with an event on June 11th called Scholar Fest, welcoming back 70 scholars from The Kluge Center for a day long series of panels and conversations. For more information about these events and to learn how to conduct your own research here in The Library of Congress collections I encourage you to sign up for our RSS feed, which is just on the table on the way out. Today's lecture is titled The Digital Traces of User-generated Content, How Social Media Data May Become the Historical Sources of the Future. 
Our speaker is Katrin Weller, one of the two inaugural Kluge Fellows in Digital Studies at The Library of Congress. The other Kluge Fellow in Digital Studies is Wendy Fok, who could not be with us today. This Fellowship was created in 2013 to examine the impact of the digital revolution on society, culture and international relations using The Library's collections and resources. Groundbreaking technological innovations can be agents of broad and profound change and their transformative effect on society can be greater than is anticipated or originally understood. Innovations, such as the printing press and aerial flight, continue to affect every level of the human experience and the digital revolution is another such transformation. Though it may be too early to know the full effects of the digital revolution on the human condition, to paraphrase former Director of The Kluge Center, Carolyn Brown, it is not too soon to ask the question. So Katrin's work lies at the intersection of digital humanities, information science, archival science, computational science and web science. Her work asks the question of what value user-generated data may be to the historians of the future and how we preserve and manage these new kinds of information sources. At The Library of Congress she has used The Library of Congress web archives to create case studies that recognize, decode and document the nature and characteristics of different social media platforms, and we will learn about her research and her work at The Library during her talk. Katrin Weller is a Senior Research Fellow at GESIS, the Leibniz Institute for the Social Sciences, and she's been there since 2013. There she works at the Data Archive for the Social Sciences and is leading the investigation of the role of social media data as new types of research data for social scientists. 
Her main research areas are social media, social networking services and web user collaboration, as well as social semantic web and web science and data management and documentation. She is the Author of Knowledge Representation in the Social Semantic Web and one of the Editors of the Volumes Science and the Internet and Twitter and Society. And now she has the distinction of being the first Kluge Fellow in Digital Studies to deliver a lecture at The Library of Congress. So please join me in welcoming Katrin Weller. ^M00:05:17 [ Applause ] ^M00:05:27 >> Katrin Weller: Thank you so much, Jason, for the kind introduction, and thanks to all of you here today for your interest in our topic. I'm happy to see so many people around today. When I talk to my family, my friends, sometimes even my colleagues about my work and study of social media data, studying social media, I still often get comments like, well, but isn't that just all people posting photos of their lunch or their coffee or something like that, maybe? And I brought to you a quote which is not from one of my family, I hope, I'm not sure, but a quote I found on CNN.com news articles, so not the article itself but a reader's comments to that article. And, obviously, that reader was surprised that Twitter had donated its entire database of tweets to The Library of Congress for preservation and that reader isn't very convinced that this is a good idea. So my job today is to convince you that this is, indeed, a good idea, that I really believe that social media data are something that should be preserved, they are something that will be a valuable source for historians in the future and that is already a very important source for other researchers today. 
Not only do I want to convince you that this is important, I also want to raise awareness for the fact that if you do not deal with this now, if you do not think about ways to preserve social media data and other online data, it might be too late and a lot of that might be lost in the next couple of years, some of that might be lost already. So that's kind of my task today, to deliver that message to you. Let me start with looking at what I mean by social media and user-generated content, what do these terms actually mean. There is no one single definition for social media, everyone kind of thinks of that in a slightly different way. The terms social media, social networking, Web 2.0, social web all came up around the early 2000s after a phase where the internet economy went through something that was called the dot com crisis, and some of the companies who were successful in reestablishing themselves after that time could do so because they found new ways to involve users into engaging with the platforms. So there were a lot of platforms coming up where you could actually, without any technical programming skills, without any kind of background in hypertext, contribute your own photos, your own text, could easily connect with other users, could share videos, audio files and so on and so forth. And this entire new scenario of users contributing their own content and connecting with other users was then described as social media. The term was selected mainly by people in the field of internet economy, computer science, so it's not - or it's often been criticized for not being something more social than any other form of media. That is not the case, but it's kind of the term they came up with. And that was something new in 2000. So we've got Twitter starting in 2006, YouTube 2005, Facebook 2004, something called Myspace in 2003, which already hints at one important aspect: these platforms may vanish again, not everything is here forever. 
Wikipedia 2001 and early forms of blogging were present since the 1990s. So, again, I started in 2006 with my countdown. In 2006 the Time magazine had this Person of the Year related and referring to this new age of the internet where everyone was contributing, was engaging with content in new forms. It says -- Yes, you. You control the Information Age, welcome to your world. This is kind of interesting because at that point it seemed kind of plausible, oh, it's all about you, getting connected, do interesting things. A couple of years later the new Time magazine cover, not the Person of the Year, but another Time magazine cover looked like this. So, okay, it is about you in some way or the other, but for most of the companies who run these platforms, who run these social media platforms it's also very much about getting your data, getting to know what you buy, what you like, what you want to do, who you are, and selling you new products, advertising things in one way or the other. ^M00:10:30 So this is kind of another dimension that goes beyond it. For people who, like me, want to do research with social media data the fact that it's all inside platforms that belong to single companies can be a really tough challenge because it's kind of hard to get in touch with them and hard to figure out things like that, and it's also like this image should also remind us and the entire users community that we clearly have to have a way to figure out how to deal with the sensitivity of this data that people contribute, that people share knowingly or not so knowingly through social media platforms. Okay, back to my coffee cups. 
So the commenter in the earlier quote used the phrase, worthless babble; there is a phrase often found in social media research, which is pointless babble, and as far as I know it goes back to that study by Pear Analytics, which was one of the earliest studies on Twitter, where researchers manually coded tweets for what they are about, what information purpose they fulfill, and they ended up with something like, oh, 40% of all tweets are pointless babble, nothing really interesting in there. Already at that time scholars would disagree with that, especially for example linguists, who would believe that even if it's probably not that purposeful for the majority it might still give some insights into one's specific perspective or into language use or into other features. So it's not pointless babble, there's more to it. And I've brought to you two or three extra reasons why I think information in social media channels is valuable, why there's something in it that you need to consider and something that needs to be preserved. And I'll start with the first one, that is it contains lots of information about everyday life, even starting from things like information about nutrition; actually posting your meals on a daily basis may be valuable information for some people, and even your coffee cups. But there's more, people are sharing such things as ultrasound images of their own children through Twitter. People are blogging about very personal events, from being diagnosed with cancer to having relationship issues; lots of these things are being shared with an online community and it may be interesting to read through for specific kinds of research questions. 
At the moment these publicly available social media contents, like blogs, everything, you can read when looking on Twitter, it's already being studied by a couple of researchers who are usually not historians, but for example social scientists, media and communications studies makes up a huge share of current social media research, and probably the biggest is computer science. So a lot of what we know about what's going on in social media environments has been done by computer scientists, who may have a very different perspective than a historian who would look at the same data. Second example, a lot of important public figures are using social media data to promote themselves, to broadcast their perspectives. The one I selected here is probably quite well known here, but I've selected it because of something else, that is because of what you can read there, this account is run by Organizing for Action staff. Tweets from the President are signed -BO. So actually there's a distinction whether Barack Obama is tweeting himself, then you should find a little -BO at the end of the tweet, or whether his public relations people are doing that. Yesterday I scrolled through, I don't know, a couple of months' worth of Barack Obama's tweets, I didn't find a single one signed -BO, but maybe there were a couple more in the earlier years. There's a lot about that that I could discuss today, which I don't have the time for, but let me point out the little blue ribbon with a little dash next to Barack Obama's name this suggests that this is a verified account. Twitter has actually checked that this account belongs to the actual Barack Obama and not someone else pretending to be him. This might be useful in some cases, but it might be some source of actually verifying whether you're dealing with content that actually claims to be what it claims to be, but it might not in other cases. So accounts get hacked, for example, user names may change. 
Maybe if Barack Obama today decides to suspend his account and someone else would pick up the Twitter handle Barack Obama in the next years, in some database in the future these will just become the same because you cannot distinguish them anymore. So there's more about verifying the truth and what's not the truth than just looking at it and seeing a symbol. Also, social media is affecting discourse on a lot of topics, sometimes even the course of an event. I've collected a couple of hashtags, which all refer to different types of events from the last couple of years, very different types. So some of them are scheduled events like, for example, the last one, the election example. There's a lot of work on studying social media during elections. Also, something like televised events, royal wedding, Olympics, football, world cup, viewer contests, all these kind of things are heavily discussed in social media and people are already looking into what users are actually doing when they discuss these kind of events. Then there's a couple of events which happen and you don't know of them before they happen, you don't know how they are really picked up in the community, you don't know what terminology will be used when discussing them. For example, the [inaudible] movement had different hashtags that were used in this context, the protests at Istanbul's Gezi Park, the terrorist attack in Paris, for which then there was the [inaudible] hashtag, the occupy Wall Street movement, and lots and lots of natural disasters, for which qldfloods for the Queensland floods is just one example of a big flood event in Australia, there's a lot more. If you look at that now it's already difficult to figure out which hashtags were used in which event, so if you do not keep track of that, and even if you know some of the hashtags, you will never get the entire conversation because some people will not use the hashtags. 
So there's a lot of things to keep in mind, to keep track of if you really want to capture these processes. In many cases people are simply discussing events; for example, during an election people may discuss some political party's current activities. In some cases actually being able to use social media in events makes a difference for the course of the event. I don't believe that much that the Iraq Revolution is the typical example because there's lots of discussion whether that was really driven by people using Facebook and Twitter to connect to one another. It's not really sorted out in the community yet. But, for example, during the Fukushima event in Japan in 2011 and the earthquake that took place there lots of people relied on things like Twitter in order to let their family know that they were okay, in order to figure out where to go to, and the same goes for lots of other natural disasters. It really makes a difference whether you find immediate response through the social media channels or not. There are other cases where the use of social media kind of affects what is going on, and one of them is elections, where social media are used by the political parties in order to promote themselves, but also where things happen like Facebook experimenting with where do I put a specific [inaudible] and it says I've already voted, and does that affect whether my friends vote, as well? And there are indications that the placement of this thing on Facebook can actually have an influence that may, in a very tight election, be responsible for people voting or not voting. So that's kind of interesting to keep in mind. ^M00:20:03 Something I won't go into more details here today is that, of course, social media can also be used to engage people with historical sources, with classical historical sources. 
So, for example, The Library of Congress shares old photographs through Flickr and it would encourage people to engage with this content, to take this content to identify who is on these pictures, what you can see there; that's a very useful way to bring classical historical sources to a broader audience to use the material and the context. There are also projects of historians who have started to copy old diaries from the 1st World War into blogs so that it appears like they were written today in that new medium, but that's kind of a different thing. So what is it actually that makes social media something special and why is it something different than traditional sources? I would say in many cases there's not that much of a difference. You can see social media in a row with a lot of other media, like for example in this photo, old postcards; people have always been sharing information, have always been communicating, so it's one step and there's a lot of similar content in social media as in traditional sources. For example, this is a blog by a war hero, which is kind of the digital equivalent of a war diary which would have been in other context, as well. So the content is kind of the same, but the ways to engage with it are different because people can immediately respond to a blog post here, to a diary entry, they can link to it from other platforms, they can share the link, they can like it, they can comment on it. These are things that you will not have in a classical non-digital format, which makes this digital source something special. 
I'm very grateful for being here at The Kluge Center and working with all these historians and humanities scholars who work with all these very different resources because it reminds me of similarities and different approaches, and from people I've met here so far I can in many cases translate that this might be something that will be digital in the future, like people studying manuscripts, lectures from for example office, politicians, we will find digital equivalents to that in the social web. We will find something like artists sharing their first drafts through social media. We find people responding to it. We find something like fan fiction. We may even find something like propaganda, but I think no one has really looked for it in the social web now, I'm not sure. So there's a lot that's kind of similar. We also find similar concepts in terms of what you can do with it. So everything which you find in classical history, which is referred to as oral history or collecting perspectives from different users, highly relates to what is today called crowd sourcing, where you actually have those perspectives from web users, like in the StoryCorps project here, where Americans are asked to contribute their own personal stories and which actually is also archived at The Library of Congress. So we've got lots of new forms, lots of new connections between individual contributions on the web, and with that come new modes of analysis, and that's probably what interests me most and what got me into this field: that you can actually look at different connections very easily because the data already has some useful metadata, like timestamps, like unique identifiers for users also, like the geo codes in some cases, and that you can use these bits and pieces of metadata to do new kinds of analysis. 
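As a minimal sketch of the kind of metadata-based analysis described above, the following Python example uses only the timestamp of each post to build a simple timeline of how often a hashtag was mentioned per hour. The records, the field names (`id`, `user_id`, `created_at`, `text`) and the hashtags are invented for illustration and are not any platform's real data format.

```python
from collections import Counter
from datetime import datetime

# Invented sample records; field names are assumptions, not a real platform schema.
posts = [
    {"id": "1", "user_id": "u1", "created_at": "2015-01-01T20:05:00", "text": "#teamA scores early"},
    {"id": "2", "user_id": "u2", "created_at": "2015-01-01T20:40:00", "text": "What a match #teamA #teamB"},
    {"id": "3", "user_id": "u3", "created_at": "2015-01-01T21:10:00", "text": "#teamB fans are stunned"},
    {"id": "4", "user_id": "u1", "created_at": "2015-01-01T21:15:00", "text": "#teamA again"},
]


def hourly_counts(records, keyword):
    """Count posts mentioning `keyword`, grouped by the hour of their timestamp."""
    counts = Counter()
    for record in records:
        if keyword.lower() in record["text"].lower():
            # Truncate the timestamp to the full hour so posts fall into hourly buckets.
            hour = datetime.fromisoformat(record["created_at"]).replace(
                minute=0, second=0, microsecond=0
            )
            counts[hour.isoformat()] += 1
    # Return the buckets in chronological order.
    return dict(sorted(counts.items()))


team_a_timeline = hourly_counts(posts, "#teamA")
team_b_timeline = hourly_counts(posts, "#teamB")
```

The same grouping idea could in principle be applied to user identifiers or geo codes instead of timestamps. Note that a keyword-based count only sees posts that use the chosen term, so it never covers the entire conversation.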
It's also a field where there's lots of exploration going on at the moment, so it's mostly computer scientists who explore what you can do with this kind of data, how you can draw networks, timelines and so on. And there's no standards yet, there's no real [inaudible] truth, when we do it this way or the other you get the exact results. I've brought some celebrity images of social media data analysis. This is a network based on Facebook friendship relations, and Paul Butler, who did it, actually just used the location of the person and drew a line to the location of his friends, which then on the map displayed this kind of network. And I want you to keep in mind and look at the darker areas in this image, they're going to be important at a later point, but I want to highlight that Paul Butler could only do so because he was an intern at Facebook, so no one else can have access to this kind of data except someone really within Facebook. Or this case, which is a timeline of tweets during a football or soccer World Cup, where Twitter collected tweets mentioning some of the teams. So you can see over the course of time which team is mentioned the most and this relates to the course of the event. Something else that [inaudible] did, and I selected this in order to realize that if you do networks, if you look at how users are connected to one another, for example through retweets, that's what it is here, it matters at what point of time you do this. So he has split up an entire event into subsets, and depending on which of these subsets you look at you get a very different result, and that is actually important if you consider collecting tweets from events like an election or like an earthquake: depending on which subset you select you will get very different results, for example of who is connected to whom and how this plays out. Social media is not just social media, there are different types of social media. 
So far I've only mentioned the very popular ones, there's lots and lots more besides the big players, YouTube, Facebook, Google+ isn't even that big, so a lot of things beyond that. But what I've tried recently is to identify a couple of features that are different in different types of social media and that are important for historians. So the data formats are the one thing that's important, whether you have to look at images, whether you have to consider video formats, whether you have those mixed up, and with this screenshot I just want to highlight that this may change at any time, so Twitter, for example, is frequently rolling out new features, like including a mobile camera recently, which changed the entire way users can interact with that platform and data can be produced through that platform. Currently there's hardly any way to keep track of which feature was available on which platform at which point of time and what users could actually do with it, and that's a big problem for future historians and even for researchers today who want to know what they were actually studying. The immediateness is a big factor, as well. I would start with a very timely response by a social media user, which is this one. So when a US Airways plane crashed into the Hudson River in 2009 it was actually an eyewitness who happened to just be in that place and to share that photo on Twitter, TwitPic, who was actually the first one to report on this event before media could take it up, so this is very immediate, very spontaneous. He may not have thought about what impact this photo would have at that point of time, how much it would be picked up by the news for example. This is the same event, it's also the US Airways flight landing in the Hudson River, but it's on Wikipedia, so people had to have some time to think about the events, to learn what was actually going on, and to write a report on that. 
It's much less immediate, it's much less spontaneous, and probably much more conscious of the users who did that. The interesting thing here is that it's still changing, so if you look at the history of that article the last edit was on May the 9th, so a couple of days ago, so people are still editing this story about the Hudson River landing. There's a lot of challenges, like at what point of time do you look at something, like a Wikipedia article, if you want to figure out how people today are perceiving events they are describing in these articles. ^M00:29:34 And actually there's a lot of articles which are in what is called edit wars, where people go back and forth changing things which are very controversial, and sometimes it's even very interesting to pick out the edit war articles because they'll probably point you to what is so controversial in some communities at the moment. Point three, we don't always know much about users' motivations for participating in social media platforms. So there are a couple of studies investigating why people take selfies of themselves in front of the Capitol and share that on Flickr or wherever, but we don't always know that for sure. We don't know how much people are aware of their actions becoming visible to everyone, becoming reused by people like me who do research on that or by people who want to archive that. So there's a lot we don't know. We could be quite certain that someone who participates in a Wikipedia article knows that people will read that, but we don't know whether someone who posted a photo on Twitter will be aware that this is public to everyone. And sometimes we don't even know that there are users, actually [inaudible] so I've selected a case of a Twitter bot that tells you whether you can currently see the International Space Station from your location, so it's kind of a friendly bot, it's not doing any harm. 
But there are a lot more automated accounts in social media who actually are there to do harm, who are there to spread spam, who are there to spread malware, who are there to probably influence things like political debates during an election, which you don't always know about. And these are usually deleted by, like, Twitter at some point if they find them. So if you look at that later on you may not be aware that there was such a bot in a conversation, but people who saw it at the time may have detected it, and we don't get that at a later point. And in the end it all comes down to the availability. So some data is publicly available on the web and that's usually the type of data that social media researchers today work with. Some data is available even beyond visibility if you do some tricks, which I wouldn't recommend, and some data is there for sale. So, for example, if you want to get access to Twitter data you can buy that from a company called Gnip. Some data may vanish after a certain point of time; this is the example of Politwoops, which collects tweets from politicians that have been deleted, so they constantly monitor tweets by politicians and if one of them vanishes they are going to post that on their platform. Mostly that's because of typos and things like that. In this case, it's kind of an example, someone had posted a longer text on Instagram and automatically Instagram also posted on Twitter, so that means that it will be cut off at some point. And the tweet that came out of it reads - took some time this morning for one of my favorite things to do as senator, calling young men, so they realized that this is probably not a good idea to have as a tweet, and deleted the tweet. But it was supposed to be - calling young men and women who have applied to service academies to let them know if they've received an appointment. 
So that's what it was in the original, but these things happen and these things get deleted, and in these cases it may not be too bad to have missed that deleted tweet. Other cases may have deleted critical information, like someone believing he has seen that a plane got shot down somewhere and posting that on Facebook, and suddenly this tweet is gone or this post is gone, this is much more critical. From my perspective the top three challenges for historians and librarians when dealing with social media content and web content are not to run into something like the Dark Ages of the internet, not to lose too much critical information that cannot be restored after a while, like what we also find for the [inaudible] of radio and television, where a lot of things have been lost already. So I've made this blurry on purpose in order to protect the people who are used in this example, but if you look at a Twitter timeline you'll usually see something like profile pictures, you usually see something like images in tweets, URLs in tweets. There's a lot going on in that user interface. If you study Twitter the way most of the people do it right now, what is left over from this looks something like this. So we never really work with a public interface anymore, but you collect the data in a specific form, transfer it to CSV files or Excel files in the end, and it might be something like this, with some information already being lost, like the profile pictures, like the other images in the tweet. Sometimes you will not be able to resolve the URLs included in the tweet anymore because the link is just broken, those things might already be lost, and people who are currently wanting to archive such a dataset which they've used in a study usually do it in a form like this, so all that remains in the end are the IDs of the tweets, which you again will have to resolve in order to retrieve the original tweet again. 
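To illustrate the reduction to tweet IDs described above, here is a minimal Python sketch, under stated assumptions, of what such an archived dataset keeps and what it drops. The records and field names are invented for illustration. Resolving the IDs back into full posts would need access to the platform, which this sketch deliberately leaves out.

```python
import csv
import io

# Invented sample records; field names are assumptions, not a real platform schema.
records = [
    {"id": "1001", "text": "Example post one", "profile_image": "a.png"},
    {"id": "1002", "text": "Example post two", "profile_image": "b.png"},
]


def extract_ids(items):
    """Keep only the post IDs, dropping text, images and profile data."""
    return [item["id"] for item in items]


def write_ids(ids, handle):
    """Write one ID per row to a CSV file-like object, with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["id"])
    for post_id in ids:
        writer.writerow([post_id])


def read_ids(handle):
    """Read the IDs back from a CSV file-like object written by write_ids."""
    return [row["id"] for row in csv.DictReader(handle)]


# An in-memory buffer stands in for the archived file.
buffer = io.StringIO()
write_ids(extract_ids(records), buffer)
buffer.seek(0)
restored_ids = read_ids(buffer)
```

Everything except the identifiers is gone from the archived file, so anything that has been deleted or changed on the platform in the meantime cannot be recovered from such a dataset alone.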
The Internet Archive has something called the Wayback Machine, which lets you view websites, how they looked a couple of years ago, for example. And I've been trying out a bit how much of a Twitter profile page you can actually find through the Internet Archive. The Internet Archive is not made for the purpose of looking at dynamic content that much, so it's a little tricky. But I was able to review a couple of old Twitter profile pages, like the one from The Library of Congress and what it looked like in 2007, it's quite different today, and it worked in that case to some degree. It might not work in other cases, for example, I did a study on soccer clubs in Germany and they frequently change their Twitter name, their user handle, and if I don't know that, I mentioned yucball [Assumed Spelling] used a different Twitter handle a couple of years ago than it does now, I will not be able to find the profile page of that Twitter account anytime today. So two suggestions I have for people in the future who want to study important events and important political figures through social media are to keep track of important hashtags for events and of important user names and even changes in these user names. This will be really, really hard to do once a couple of years have gone by. I've been talking quite a lot about Twitter, which is because that's kind of my focus, but there's a lot more social media platforms out there and there's a lot of people who are not on Twitter. And this is usually referred to as the digital divide, so you've got people who are not able to access the internet in the same way, you have people who live in countries where certain platforms have been banned, like in China you cannot access most of these platforms but you have very specific other social media channels. And it's important to keep in mind that they exist and to focus also on what's happening there. 
What also should be done from my perspective now is to keep track of what user base can be found in which social media platform, because even today it's difficult to keep track of which population you find on Twitter, you find on Facebook, you find somewhere else, but it's important in order to understand the biases that appear when you study these kinds of data. So keeping track of things like Twitter is predominantly used in these countries and that number of people are on there will help in the future to make sense of how important it really was in that time and country. And, finally, this is all just the tip of the iceberg, there's so much more underneath it, because if today we have something like someone finding a box of old photographs in the attic that his grandparents may have left there for a couple of years, that's already something historians cannot really deal with on a large scale, but some day you will find lots of hard drives full of digital photos, terabytes of digital photos, terabytes of other things, how can you deal with that? So even if you look at social media there's a lot more that didn't make it into social media and still may be important, so that's really becoming lots of data to look at and really a big challenge. I will leave off for today and I would be happy to discuss with you at some other point. For now I have to, yes, express my deepest gratitude to all the people I've been working with at The Library of Congress in the last month, and I see a couple of them here and I'm happy that you made it here today. It's been a really inspirational environment to work at The Kluge Center and to work kind of in the intersection of looking at technological topics, but also into more classical humanities topics while I was here. That was really, really useful for me and I thank you, all, for making that possible. 
I'm still somewhat in between with this research so I'm hoping to come back to continue, but for now thank you so much. And, also, thanks to my colleagues back in Cologne, who also helped me a lot. Thanks. ^M00:40:10 [ Applause ] ^M00:40:22 Okay, Zona [Assumed Spelling]. ^M00:40:27 [ Inaudible ] ^M00:42:18 >> Katrin Weller: Yes, [inaudible] thank you so much. So the first question was about [inaudible] diaries and maybe I mixed up two things? I don't know how clear it became. So there's the one case of people today who have been [inaudible] in Afghanistan or something like that, who write their own diaries but they do it in a blog. But that's not what you were asking about, you were asking about the other case where historians or sometimes also cultural heritage institutions, like libraries, take their own documents from what I respond - I personally know of a case from the First World War where they were doing such a project, where they have diaries from the First World War and publish these manuscripts as blog posts. I don't really know about the legal situation in that particular case. I would have to - I think it's in collaboration with the institution that maintains - so it's not just someone going into an archive and taking the copy and publishing it, it's a project run by the institution. There are other things like people creating something like a fake - well, fake - an account for a soldier on Facebook, like pretending that this is the account of a First World War soldier on Facebook and then writing things like, oh, I have to go now and leaving my family behind and things like that, and kind of making it look like it was a real person, but yes. 
Yes, the second question was about something that I really have to think about more, which is keeping an eye on the grassroots initiatives that could be useful in this context because much, of course, is first I focused on the big names, the big events and big things, but kind of involving more people in that, seeing different perspectives will be an important next step. I cannot say much about that now, but definitely it's something I have been thinking on a lot recently and will have to keep thinking on. Elliott [Assumed Spelling]? ^M00:44:39 [ Inaudible ] ^M00:46:11 >> Katrin Weller: So that's already been accomplished in that sense, but it's not enough to preserve the data, you will sometimes also have to preserve the computer programs that can run this kind of data and the computers with it in order to make this all kind of still work in the future. Of course, we have no idea what the future will be like. I don't have any idea, but, yes, my take on that is that we should make the best possible effort we can with learning from the past, with learning from what we have lost in the past, and try to prevent this happening with this new material, like I think the best we can do right now is just say we don't know what's going to be in the future so we'll just keep it at this and we'll see what survives. There are a couple of things we can try to preserve and that's kind of my call to do that as best as we can. ^M00:47:09 [ Inaudible ] ^M00:47:39 >> Katrin Weller: Yes, you're right. ^M00:47:44 [ Inaudible ] ^M00:48:34 >> Katrin Weller: So every platform has its own legal terms of service that you agree to when you sign up for it before you upload any content. And, in any case, you transfer a lot of rights to the platform, itself, and copyright is something that is very different in every country so it's very difficult to answer that question [inaudible] it always depends on where you are, which terms of service you've signed. 
It's very tricky and it's very complicated from my perspective, but I'm not a legal person and like my homeland situation in Germany what I'm planning to do is to get legal advisors onboard on how to deal with these issues. But, yes, the second question was the eventual removal? So what was the exact question? ^M00:49:31 [ Inaudible ] ^M00:49:48 >> Katrin Weller: Yes, that may happen in some cases, yes. ^M00:49:50 [ Inaudible ] ^M00:49:55 >> Katrin Weller: Yes, that definitely may happen depending on the terms of service for a specific platform and in some cases - so there have been cases of people trying to delete their Facebook profile, for example, [inaudible] and signing back up on Facebook at a later point and the data was still there, so they reactivated everything so it wasn't gone. So you cannot really be sure what happens with the data you think you have removed, when you think you have kind of shut down the account. ^M00:50:34 [ Inaudible ] ^M00:51:18 >> Katrin Weller: Yes, there's not much other than talking to these companies, I think, that you can do at the moment. You still can do a lot of things already by kind of keeping the background information so that you know how to deal with the data once it may become available, because we all don't know what happens if Twitter shuts down someday or if Facebook shuts down someday, it's kind of like Myspace and no one really cares for it anymore. There are other examples, for example Wikipedia is very open in sharing all the data, so there are things we call Wikipedia dumps, which are copies of different versions of Wikipedia over time, which are already available to the public for reuse for different purposes. 
And so focusing on the big ones may be better than looking at what is already there that can actually be reused where the policies are a bit more open towards sharing content and in these cases it may be useful to start and then see what is needed to preserve these, because even if the data from Wikipedia is there it might be the case that you cannot still read it in a couple of years because the formats may have become different. Yes, so my suggestion would be to start with the big names in that field, then of smaller projects, where you can actually talk to the people behind it and see what you can learn from that. Yes? ^M00:53:02 [ Inaudible ] ^M00:54:06 >> Katrin Weller: Starting with your first question, so the past couple of years, I don't even know how many, seven or so, I've been working on case studies with social media data for today's context, like looking into political events, looking into how scholars use social media to communicate with each other and how citations kind of come through social media environments, looking for how [inaudible] use social media, all these things are happening right now. And in current research, there's a huge community of researchers who work with this kind of data today to look into today's society, so social scientists are interested in that, to learn more about behavior and other type of religious beliefs. Computer scientists, as I mentioned, are doing a lot of work with this kind of data today. What I've found when working in this field and when talking with people who use this kind of data today for today's research questions, like physicists are using it for testing network models, this is all happening. But what is not really happening is that the research data that is being used in these projects is preserved for the future. 
So you've got a lot of work going on, a lot of research being conducted, but you will never probably get a chance to see the original data that it was based on because they cannot really archive it. So it's kind of, yes, around the process of what of this data will survive in the end and what you will still think. So kind of based on that I'm now trying to look a bit more into the future and I'm not sure whether political - maybe it's political scientists, as well, who want to work with this kind of data in the future, that's for sure, yes, but kind of that was the take on that. ^M00:56:05 [ Inaudible ] ^M00:56:33 >> Katrin Weller: Yes, I don't want to read too much into follower numbers. In terms of follower numbers the most popular people in the world are Katy Perry, Lady Gaga, Justin Bieber and the like. So that's a lot of things going on in terms of, for example in political context what you can see if you look at re-Tweet networks or follower networks, whether people are only re-Tweeting people from their own party or from other parties, whether people are following only people from their own party. And there are certainly different classes emerging, but it's really a difficult social dynamic and we have to see this over the course of time. I think the most interesting work in this area currently is the one that is studying the so-called filter bubbles or homophily effects, like looking at if you're only presented with information from people who are like you, who are in your own network and are like you, do you always live in that filter bubble, can you ever get to know anything outside your own perspective, outside your own world? Have you any chance to kind of get to that? It's very difficult to study, as well, but that's a problem that a lot of people in this area of network studies address. ^M00:58:00 [ Inaudible ] ^M00:58:03 [ Applause ] ^M00:58:16 >> We are back here next week, same time, same place. 
Our current [inaudible] Chair of American Law and Governance is the distinguished Sociologist, William Julius Wilson, and he will be discussing Race, Economic Class and its Determining Effects on Young Men and Women in America and Their Future Life Outcome. So he'll also be sharing some thoughts on the future of race relations in the US. So hopefully join us for that. Please sign the list on the way out to learn more. And thank you, again, for coming. ^M00:58:47 [ Applause ] ^M00:58:51 >> This has been a presentation of The Library of Congress. Visit us at loc.gov.