>> From the Library of Congress in Washington, D.C. ^M00:00:03 [ Pause ] ^M00:00:22 >> Meg Phillips: Hi, everybody. So continuing the theme of community, our panel discussion today is Community Approaches to Digital Preservation. So we thought this was a really germane topic because we're highlighting some of the heart of what the NDSA is all about. And one of the major themes that came out in the national agenda for digital stewardship, which is the need for collaborative work on digital preservation. So this panel discussion is highlighting four different projects that are emphasizing community approaches, collaborative approaches to doing good digital preservation work. So I'm just going to give you a quick rundown of who our panelists are and then I'll tell you a little bit about how we're going to manage the session today. So our panelists are Dr. Francine Berman, Chair of Research Data Alliance, Edward G. Hamilton, Distinguished Professor Computer Science and Director of the Center for a Digital Society at the Rensselaer Polytechnic Institute. Next, we have Bradley Daigle of the University of Virginia, Content Lead for the Academic Preservation Trust, and Aaron Rubinstein, University and Digital Archivist, Special Collections and University Archives at the University of Massachusetts Amherst. And, finally, Jaime Schumacher of Northern Illinois University who is here as Director of the Digital PowRR Project for University Libraries of Northern Illinois University. Okay, so we are going to have very short presentations in that order from these four panelists just to get you oriented on the projects that they're representing here today. We're hoping that that'll take maybe 20, 25 minutes for all of those presentations. And then we're going to have a panel discussion on a couple of these themes related to community, meeting the needs of your community, how their collaborative preservation approaches work, how they might scale, those kinds of issues. 
So we're going to have a little panel discussion on those themes for maybe 20 minutes, hopefully, leaving 10 or 15 minutes at the end for your questions of the panel. So let's get started. Our first speaker is Fran. >> Fran Berman: We have our choreography up here, so I hope you guys are appreciating how much work we've put into this. I'm here, and my buddy, Larry Lanam [Assumed Spelling], who was kind enough to send in the application to speak to you all, is virtually here. So think of Larry and I, Larry is about nine feet taller than me, for those of you that know him, so think of Larry as here, as well. What we thought we would talk about today is a little bit about the Research Data Alliance, which is an organization that's barely two years old, but has really grown precipitously, and just share with you just some of the basics about it. The library community, the information science community, archival community, the museum community are really important parts of the Research Data Alliance, so I want to talk about it a little bit with an open invitation that we would really love to see any of you use the organization for whatever is useful for you. So what is the Research Data Alliance? And this is an organization that was created a couple of years ago to accelerate the development of research data sharing worldwide. So for all of us in the research community we're finding that sharing data really accelerates virtually everything, whether you look in social science or chemistry or high energy physics or art or any area that you choose you find that researchers are sharing data. 
So, for example, if you want to look at the ethnography of how you get asthma, are you more likely, more at risk to get asthma in Mexico City than Los Angeles, which some of our researchers are looking at, you might want to combine data from patient records, from air quality studies, from where the infrastructure is built, et cetera. Creating the kind of infrastructure needed to be able to share those data collections and ask those questions is really important. Creating that infrastructure is part of what we do in Research Data Alliance. Now when we think about creating infrastructure you have to think broadly about what infrastructure is. So it's not just the technical infrastructure, pieces of code or approaches to systems for data centers, et cetera, it's the social and organizational infrastructure, as well, and you can think of that as community standards, as policy that's adopted, as practice that we all use, et cetera. I was talking to Steve Morales [Assumed Spelling] from DPN today about how libraries are coming together and sort of sharing their technologies, but also their practice and their policy, as well. So the idea is to bring together people who are worried about reducing barriers to data sharing and exchange and to develop, to accelerate the idea of coordinated global data infrastructure. Now RDA doesn't tell you what infrastructure you ought to be building, there's no, you know, the organization is not around, there's an Esperanto infrastructure and we all have to adopt it in order for it to work. The idea is that all communities are sort of working to develop the infrastructure that seems most appropriate to them. If we can share that infrastructure, great, if we can use that infrastructure, even better, but not everyone has to use everything. Communities in the real world are just doing what they can to develop their infrastructure. 
So the working groups in the RDA are working on focused pieces of infrastructure that's adopted and used by someone. You build a piece of infrastructure that's a technical piece of infrastructure, social or organizational, it's adopted and used by someone. These are efforts for which people are spending with volunteer effort about a year, year-and-a-half, and the idea is to eliminate a roadblock for data sharing. These efforts should have applicability to some group. It doesn't have to be absolutely everyone, but if there are more people for which that infrastructure is applicable that's really important. And the kinds of things we could start today because if I want to worry about asthma I don't want to wait for some number of years for the infrastructure to be in place. I want to answer that question and move on to the next question that's important to me. So that's kind of basically what the organization is trying to do, it's trying to create stuff, adopt stuff, use stuff. It's kind of the just do it of the data infrastructure world. So let me answer the next question, which is who is the RDA? And in less than two years, this has been sort of precipitous growth. Every time I show this slide it's a little out of date, so these days we have over 1,900 members from 80 countries. This gives you a sense about around the world. You know, this is not just our problem in the US, everybody is trying to share data, if you look at Europe and the whole digital rights and the kinds of things they're doing there around their digital agenda, Australia, Africa, India, Asia, et cetera. There's a number of people all over the world who are looking at this, so those people have found the RDA to be a useful vehicle for them and are attracted to the organization to build infrastructure. In the US we have about 41 states that are represented by our members. If you're from the Dakotas we would love to include you. So it's really been spread widely. And so that was a little bit about who. 
And, actually, let me just say who are these people? And it turns out that RDA is very broad in scope. We have folks from the library community, folks from the information science community. We have archivists, we have data center people, we have people from the domain sciences, we have people who are looking at agriculture and chemistry and toxicogenomics and material science. We have computer scientists, we have technologists. So it's really - we have policy makers, data policy makers - so it's a very broad community. So my last slide is I just thought I would show you what are they working on? And all of this is community driven, no one tells people what RDA should be a vehicle for, but here are the groups that have found RDA useful. My apologies for the smallness of font, but they kind of cluster in certain kinds of themes. So in domain science we have folks looking at wheat interoperability. We have the photon and neutron science community, who just joined RDA and are trying to develop infrastructure that's useful for them. Marine data, as I talked about, digital ethnography, you know, biodiversity, structural biology. So we have a number of different areas, and if there are domain researchers who need infrastructure to facilitate data sharing and they think that RDA might be a useful organization we welcome them. It's free to join, you go to the web and just check the box that says you believe in openness and good things in life, and you're part of the RDA. And it should be something that enhances the work that you do, so the idea is don't just join to join, join because you think it would be a good vehicle for you. For community needs the data, we are in, all of us are in arguably the most exciting area around these days. You know, data is a first class priority all over the world, and to really come together as a community, this is a wonderful thing. I think I'm less than 10 minutes, I think I'm like 30 seconds. 
A number of different things, what you'll find is nuts-and-bolts people are very first working groups delivering infrastructure are going to worry about things, like persistent identifiers, data type registries, data citation, metadata, et cetera, as well as things focused on stewardship. Very important, if you're going to share data the data has to live somewhere, somebody has got to be its steward, somebody has got to worry about preservation. So all of these things are really important, and we look forward to working with all of you and working with all of you as appropriate. Thanks. ^M00:10:36 [ Applause ] ^M00:10:42 >> Bradley Daigle: All right, so my name is Bradley Daigle, and I'm the Content Lead for the Academic Preservation Trust. And I'm going to just give you a brief overview of what the Academic Preservation Trust is because you've heard there's a lot of community based approaches to preservation of all kinds, and what I want to situate here is the thinking and the methodology behind what the AP Trust is trying to accomplish. So there are 17 members currently, and they are made up of a broad group of participatory actions. And largely what we've seen with our work is - here are the members - is we have disposed ourselves in a few critical ways. We have a team that's focused on the content issues related to preservation, a team that's focused on technical issues and implementation of preservation activities and actions, and then we have a governance board. And so, as you know, we are not alone in creating preservation environment, but what separates us in many ways from what you've heard from a lot of other initiatives is we're really a community based approach. We have a central philosophy, which is that many minds make for not only better solutions, but faster solutions. So you will know that there are other preservation solutions out there for singular institutions, other ways to approach preservation. 
We don't have any claims to being the definitive approach. What we're looking for is we look for people who want to engage with preservation issues at a variety of levels. So, for example, one of the tenets that we have as our group is we want to become a TDR, a trusted digital repository, and as such we have the content team has individual members. Steven Davis, sitting in the front, is leading that group. So what we try to do with AP Trust is establish a dialogue between practitioners and the future direction of preservation. If there's a small group that has an idea about how there's a new problem set that needs to be addressed, there's a new direction that we want to go, then their task is to create the data to bring back to the larger group, and then we determine how it fits into our preservation environment. So where we are now, we've been working towards production, so we've been in the idea stage, about a year-and-a-half ago is when we really started to ramp up, figuring out how we could get 17 people to agree on what a basic preservation baseline service suite should be. We're anticipating being in production later this summer. And then our next step with that is to really create a business model around preservation and basic preservation services because we feel strongly that you should be able to go to your budget manager, your administrator, your dean, your director, whoever, to say I can tell you how much it will cost, if you have X amount of data I can tell you the cost is Y amount of cost for that data. We're really looking at providing some real costs to that environment. The type of environment, of course, that we're proposing is not a singular environment. It really fits within the ecosystem of preservation and stewardship activities. 
And if you think about an ecosystem starting from content you keep in your local machine, maybe then you have content that's backed up by your organization in some way, AP Trust can function very much as a scratch disk space where you can read, write, but you know the content is going to have preservation activities done to it, auditing, fixity checks, things like that. And then you might have a long-term preservation solution along the lockbox model, you know, something along the lines of DPN. But really thinking about where preservation fits we're committed to providing the appropriate solution for a large, if not the largest, group of people who can find that useful. So that's us in a nutshell. Thank you. ^M00:14:29 [ Applause ] ^M00:14:36 >> Aaron Rubinstein: Hi, so thank you, everyone. I'm here not necessarily just as a representative of the University of Massachusetts, but actually as a representative of the Five Colleges Consortium, so I'm just going to describe briefly who we are. So on the slide here I have a depiction of Scooby-Doo. So there's a very popular urban legend that the characters of Scooby-Doo are actually modeled from the individual institutions that make up the Five Colleges Consortium. That, unfortunately, that urban legend has been denied by the creators of the actual show, but nonetheless it does say something I think interesting and also instructive about the various institutions that make up our consortium. So starting on the left, Velma represents Smith College. We have Shaggy representing Hampshire College. Scooby-Doo, of course, is my institution, the University of Massachusetts at Amherst. Fred representing Amherst College. And Wilma representing - Daphne [Assumed Spelling], sorry, I'm mixing my cartoon metaphors - Daphne representing Mount Holyoke College. So my presentation here is a bit of an ugly duckling, I think, compared to the other two presentations that you've heard so far. 
I'm not necessarily representing a specific project or even a thing, really what I'm talking about is a decision making process, actually, sort of a transitionary point in understanding the nature of collaborating around digital preservation in a sort of very small, localized community. So in a way the five colleges is a bit like a family in a lot of ways, we're a small community, a community that is in a lot of ways based on proximity and a community that has a fairly formal relationship that's represented by an incorporated business structure. So with those family-like aspects, there's also a lot of ways in which we find ourselves to be different. So sort of what I'm talking about is the moment sort of trapped in amber of these institutions trying to understand what it means to collaborate. So I'm going to sort of break this transition into two phases. So the initial phase of collaborative work around digital preservation in the five colleges started with the formation of a taskforce. So one of the things that makes our institution, I think, very, very privileged in a lot of ways is that we have a support structure of library directors and even presidents and chancellors of these institutions that understand and support the need for digital preservation. So we do, as a group of institutions, have a mandate to think about these issues and to strategize for solutions for them. So the Five Colleges Consortium put together a taskforce to specifically look at these issues. So the first thing that we did was, of course, define the scope, usually what you do when you start a project that's both intimidating and also abstract. We surveyed our institutions to kind of understand the domains of practice, what are we doing in those institutions in terms of creating digital objects, in terms of managing those digital objects. 
And then we did a formalized assessment of our organizational, technological and resource readiness using the AIDA self-assessment survey. And then, finally, we looked at other consortia, some similar, some different, and talked to them about their lessons learned, their experiences in terms of where the benefits, where the efficiencies were in terms of collaborating around digital preservation. So as we were going through this process we started to hear some very interesting things from the real stakeholders, the members of our institutions, who really had the most at stake in terms of digital preservation. Some of the things we heard were, oh, I'm so glad to hear that there's a taskforce, that is great, will you figure out digital preservation for me, right? Or, yes, we're not doing anything really right now, we're waiting for the taskforce to tell us what to do. So as we heard more things like that, and also talking with some of these other consortia, we realized that there was a little bit more nuance to finding a solution for digital preservation. Also, we started to hear from our administrators, and I don't want to skip over this because I think it's significant, especially when looking at smaller institutions and smaller consortia, is from our administrators we started hearing things like, well, doesn't our membership with HathiTrust solve this problem, or we implemented LOCKSS, right, aren't we doing digital preservation, and can't we just buy something that's going to solve this problem? 
And, you know, hearing this sort of wide range of sort of kneejerk responses to digital preservation made us realize that though we were building a foundation of understanding and the kind of foundation that you would build for any project that you hoped to implement some kind of product or some kind of solution, we realized that we had to sort of step back a little bit and think both more holistically and in a bit more nuanced fashion to figure out how to understand the benefits of collaboration among our institutions, for our institutions. ^M00:20:18 So that sort of gets us to phase two, where we developed a three-prong plan to help us understand what that foundation was and what that meant, right? So the first prong I actually have on the slide as education, and I also, I apologize, there's a typo in the PowRR Project, there's no e, apparently I literalized the acronym there, so excuse that. But education is really more along the lines of building expertise. So trying to find a way to empower the individuals and the stakeholders in our institutions so they can start making decisions for themselves about what their needs were and how to approach specific problems that are part of that overall process of digital preservation. So we held the week-long Nancy McGovern workshop, where we had hand selected representatives from each institution, who not only would take the workshop, but then become leaders for those institutions to move their projects forward. We are bringing the PowRR workshop to New England to bring our archivists into the fold in terms of training and empowerment. We developed a digital preservation guide that is not necessarily an instruction manual insofar as it is a bunch of thought provoking points to get institutions to start thinking about, well, what are my digital objects, how are they composed, can I get them, are they managed well? 
We're looking at outreach around that guide, building affinity groups, so groups that are implementing digital repositories and other aspects. Best practices and standardization across the five colleges, I don't really need to justify the importance of this, I think that should be pretty clear to this group, but it's something that doesn't exist up until this point. And is an opportunity to bring in stakeholders beyond just archivists or those who are designated as having to worry about digital objects. And then, finally, experimentation, right, so building all of this expertise, thinking about these issues is abstract unless there's a specific thing to collaborate around. So we are implementing an Archivematica pilot project, not to necessarily pilot the software, but to create an opportunity for a collaborative group among our institutions to come together and make decisions about what it takes to start the process of digitization by just that first step of accessioning and preparing and managing digital objects. So, hopefully, not only will it spur collaboration, but it will also give us the hands on experience to take the next steps. So thank you. ^M00:22:57 [ Applause ] ^M00:23:03 >> Jaime Schumacher: I feel like I'm on a leash. Okay, my name is Jaime Schumacher, and I am with the Digital PowRR Project that my colleague was referring to. And we have just a little bit of time, so I'm going to try to cover all these items in 90 seconds or less, so start your clocks. I want to start with just our background, so how the PowRR Project came to be. We are practitioners from small and medium size institutions, university archivists, the special collections and rare books librarian, a digital humanist. We are practitioners who are on the ground, and we knew that digital preservation was a thing that needed to be done. We thought the larger institutions had it all figured out. 
Our consortium wasn't able to help us, and so we came up with what we thought was a great solution. I think it might have had a flux capacitor in it, I'm not sure. We proposed it to the IMLS, and they said, you know what, you're right, smaller institutions do have special challenges while they're tackling digital preservation, however, your solution is not the best one. So instead of an implementation grant we got a figure it out grant. So we've been spending the last couple of years trying to figure out how we can create or identify sustainable, scalable solutions that are targeted to smaller and medium size institutions. And that's kind of how the ragtag team of archivists and curators and all that sort of thing became the PowRR Project. And a word on what restricted resources means, I just want to be clear on that. It's not just restricted funding. Some of our smaller institutions are private colleges that have pretty decent funding. We're talking about anything from very small staff sizes, lone rangers, outdated infrastructures. We have one organization that's still practically on dial-up speed. So there are really truly some restricted resources that they are working with, but they're still trying to make this happen nonetheless. So our mission was to investigate a handful of digital preservation tools and services and come up with potential solution models that were feasible and affordable. And all of this was with an eye towards solving our own problems at our institution so that we could get to the point where we were actually preserving our digital objects long term. And I know that was less than 90 seconds, but I'm going to bank that time. So what we discovered along the way, just to kind of see where we're at - so, yes, what we discovered, it's interesting, but it's not necessarily the availability of tools and services that are keeping these practitioners from making meaningful progress. Really what's happening is they're overwhelmed. 
To them digital objects are still a novel thing, it's still uncharted territory for them, and they Google digital preservation and they get the OAIS reference model, which we lovingly call the spaghetti monster, and you get TRAC certification documents. And they are consumed with this fear that they're already behind their peers and they've got so much to do to catch up that they're never going to. They're so concerned with doing it right the first time because we are librarians and archivists and we want to do it right the first time that they're paralyzed from making any progress, at all. So it's almost as if these practitioners need permission to take a pragmatic approach and to do good enough digital preservation. Okay, I was a little worried about that staying there. We also realized that we were too focused on the restricted resources part of our team. We whined a lot about there's no money, there's no people, there's no time, we don't have the right resources, when we completely overlooked the fact that because we're smaller institutions we have unique advantages. So I like to think of that as power, unique advantages. And what I mean by that is with smaller institutions you've got less red tape, you have more streamlined decision making processes, you very likely have lunch with the person just above you, probably once a week, and you can chat about things. That doesn't necessarily happen in a larger organization. You have fewer people to educate that this is something we need to do. Some of our systems, we don't have administrative lockdown on our systems, so if we want to download Archivematica and play with it we don't have to ask, we can just do it. And those are advantages, so we're trying to find the sweet spot of using those facts to our advantage while maneuvering around the challenges. But even with that optimistic approach as we were trying to figure this out we kept running into roadblocks. 
So what we decided is while we, through the generous funding of the IMLS national leadership grant - I know Bob is somewhere, thank you very much - while we had the time, while we had all these people together, these people, and we had great skill sets, we had a programmer at our disposal, we had a server administrator at our disposal. And while we had the funds, that we were going to try to remove as many roadblocks as we could for these practitioners at smaller institutions. So every time we found ourselves grinding to a halt we thought, okay, how are we going to fix this so that the ones that are coming behind us don't run into the same problem? And I'm just going to give you three quick examples of roadblocks that we eliminated or at least tried to. When we first started to decide which tools and services we were going to test in depth we Googled it because, you know, we're naught librarians. And, oh, my gosh, do you know how many tools and services purport to be digital preservation related? There are hundreds of them, and it was just like completely overwhelming. And if you think about the lone ranger at the university archives trying to figure this out for his or her institution, that right there you just stop and you're like I can't do this. So when we tried to research them, and they are mostly registries, which are nice, but just a long list. So we tried to put - we have at least 70 tools and services that we classified against all the functionality of, not all, but the most important to us, functionalities of the data curation lifecycle. And we did this for over 70 tools and services, that way if a practitioner is saying, look, I only want to look at something that does backend dark archive storage, they can look at the grid and say, okay, I need to focus on these five or six services and go from there. So that was just one thing that we did. 
^M00:30:04 Getting access to some of the services that are potentially very good fits for our target audience, we found that to be a roadblock, as well. So we wanted to get those services that were just out of reach, we wanted to pull those into reach for our target audience. One example of that is there's a Private LOCKSS Network that is run as a cooperative. They have a very affordable collaborative membership option, the problem being that you really almost have to already be involved in a consortium that's ready to take that on. We have institutions that wanted to join together, but didn't have their consortium's backing. So we hired a Jacopius lawyer, and we're having him draw up the legal framework and the business documents so that you can create your own collaborative and join something like a MetaArchive. Another roadblock we tried to remove was when we started testing a vendor had just come out with a cloud-based hosted solution that seemed to be incredibly elegant and it was a soup-to-nuts solution, which would be great for those lone rangers that had a budget, but they don't have an IT team to do like an Open Source tool. The problem was that the pricing was out of reach, the delivery model was out of reach, and you had to request a quote from that vendor which, as we know, can take a lot of time. We talked to the vendor about it and, by golly, they created a pricing model that is very attractive to smaller and mid-size institutions and the pricing is openly available on their website now. And that's just another example of some roadblocks we've tried to eliminate. The final roadblock I want to mention is the mental roadblock. For many practitioners, like I said, digital objects are still a novelty to them, and they're not like kids at Christmas, if they get something they don't understand they shake it and they poke at it and they bang on it to see what's going to happen. 
No, they're worried about things like integrity and provenance, so they're afraid to touch the digital stuff that they're starting to get in. So what we've figured out is that what they need is someone's hand to hold. So one of the things we decided to do was to create daylong workshops and take them out to the people in their regions because they don't have much of a travel budget, and spend a day with them and let them practice accessioning a digital collection, and let them practice harvesting metadata. They walk away from the day having generated checksums, having generated metadata, and they're like, wow, I can do this. And we're like, yes, you can, yes, you can. And to top it off we say and guess what, in three months we're going to contact you and see what else you've done. And we've had really good responses so far. And, as you mentioned, we're coming out to the East Coast, and we've been asked to go to South Dakota. So if I find someone in South Dakota, send them your way, yes. So really what you've got here is almost an accidental community. We didn't set out to become a community, this was supposed to be a project. But now we've got this community and the question is how do we make them self-sustaining and help them to help the others in their community that they need to bring onboard. And actually I'm hoping to learn a lot from you guys today about how to make that possible because PowRR, I'm not sure PowRR should live on. Is it something that becomes integrated with depot, is it something that the NDSA takes on or * Foundation? I don't know, but I do know this community has made it very clear that they want a voice and that they're ready to tackle this, they just don't know how. Thank you. ^M00:33:34 [ Applause ] ^M00:33:40 >> Meg Phillips: Okay, great. 
So these are all pretty different projects, but they all share this collaborative nature, where everybody found partners and somehow established common digital preservation goals that they wanted to work together on. So what I want to do is kind of drill down on how you did that, how you formed these collaborative partnerships. And so the first question I'd like you to address is how did you find the partners? Was it just a naturally occurring group or did you have to go out and recruit people who were interested - how did these groups come together? Yes, go ahead. >> Jaime Schumacher: Oh, can you hear me? I do this at home with my kids. Hey, go to bed. So, for us, it was interesting, our rare books and special collections librarian put forth a grant to digitize some dime novels, and the granting agency said, no, sorry, you don't have a digital preservation plan. And she was like a what? And she asked some colleagues, do you have a digital preservation plan? And they were like, no, we don't. So about five institutions in Illinois said, you know what, this is something that needs to happen sooner rather than later, and that's when they created what they thought was a great solution with the flux capacitor. And that didn't get accepted, and so then the PowRR Project was born, so it kind of happened organically. >> Meg Phillips: Kind of geographical. >> Jaime Schumacher: It was. >> Meg Phillips: It was a group of people who were talking anyway. >> Jaime Schumacher: And we have found that people appreciate the value of face-to-face time, and it's hard for practitioners from small organizations with no travel budgets, they can't afford to go to D.C. or to Las Vegas. And having a regional support system we're finding it's something that's kind of in demand. >> Meg Phillips: I'm going to actually call on Aaron because I think he's got a similar situation there. >> Aaron Rubinstein: Yes, well, it's interesting, there's sort of two sides to the coin. 
So one is more like the family aspect that I was talking about. So we have a consortium that has a certain amount of sort of inborn structure just because of the nature of Five Colleges, Incorporated. And the collaboration was something that sort of came down from on high, basically saying you five are going to figure out how you can do digital preservation collaboratively. So we had to sort of work backwards from that and figure out what does it mean to collaborate, and what is a way to do that efficiently so we can actually benefit from collaboration, as opposed to just collaborating for the sake of collaboration. And that's definitely been a challenge, and I'm not sure we really understand what that means. And I think part of that could be sort of figuring out, well, what is the thing that we have in common in terms of digital preservation, what's sort of that minimum requirement to have a functioning digital preservation program? And then think about, well, how do we incorporate that into our access systems, and is there collaboration there, as well, on standardization or maybe sharing some tools and expertise? So that aspect sort of was imposed. The community, however, on the other hand, even within that imposed group of five there are more organic collaborations that have come just because of the nature of maybe where two institutions are in terms of their digital programs. So Amherst College and UMass are the two institutions that have really sort of full-fledged digital repositories. Both of them happen to be Fedora based, although there's a lot of other software involved in Fedora-based repositories that is different for our institutions. But the general, the point of progress and a lot of aspects of the approach and certain aspects in tools are very similar. So there's natural collaborations there, whereas, there's also unnatural but potentially beneficial collaborations across all of the five institutions. 
So that's also something that we're feeling out, but there is that balance between, well, here we are, we have to do something versus organically developing types of relationships and collaborations. >> Bradley Daigle: So in terms of AP Trust, you know, the collaboration that formed was organic. There wasn't any specific method behind it. And one of the things I wanted to speak to specifically is that I don't - I would be cautious to draw a distinction between large and small organizations in that we have large organizations who are doing digital preservation at a very small scale, and we have very small organizations who are doing digital preservation at a very large scale. And one of the approaches we really wanted to look at is how can you create a methodology that's going to work for both of those and any situation in between, and that was really how we started thinking about it is once we had a group of people who kind of got together and were talking about things we had, okay, we've got some Fedora people, we have some DSpace people, and we have some CONTENTdm people, and then, lo and behold, we have some file system people. We were like we don't know what to do. So the whole approach is like, okay, well, how do we start developing workflows for that group to start mentoring others? So we started off with our group of organizations, but it's very similar to PowRR and others, where the train the trainer model is, okay, who is going to be the Fedora spokes group so that they can start working on workflows, then the other Fedora people are going to be like, oh, how did you do that? Oh, you did it like that. So I think our partnership started off as I think a lot of dean relationships and then a lot of interpersonal relationships, but it became much more organic in that it turned into workflow methodologies and approaches to digital preservation. 
Are you using this as a scratch disk space to put your content, say it's, you have a digital disk image that you don't have time to process, but you want to make sure it's in a preservation environment so that it doesn't get corrupted and become ultimately lost, so you need to use it in that way. So I think that's really how, for us, it's still evolving. So I'd like to say we have an open relationship. >> Meg Phillips: So was there a group, a smaller group of deans were talking all the time or was that whole list of people involved from the very beginning or I mean how ... >> Bradley Daigle: There was a small group, an affinity group that started off, and I think it was similar to some of the deep conversations, as well, that were happening at the time, where people I think there seems to be this watershed moment about two or three years ago where deans and presidents suddenly realized that digital preservation was something they had to pay attention to. And those of us who had been kind of waving our hands for years saying, hey, we're expensive, but it's hard to codify. They're starting to say, hey, wait. And I think it's an outcropping of that. >> Meg Phillips: It's actually cool to hear that, you know. >> Bradley Daigle: They're starting to get it. >> Meg Phillips: Yes, there's like some momentum. So what about you? >> Fran Berman: So I think we have kind of an interesting collaboration story because it starts at multiple levels, but for every group that ended up collaborating it was a matter of sort of hitting the tipping point. And so the first group that I think recognized the need for this was the funding agencies, the RDA funding agencies all over the world. 
And the RDA really started because the National Science Foundation and NIST and other kinds of agencies would meet their cohorts from the European Commission and the Australian Government and other countries, and they talked about how important data sharing is to their communities and how important it was for them to help their communities build the infrastructure they needed. And, as all of you know, your researchers don't always get money to build infrastructure in their grants, they need money to in some sense advance innovation, so the infrastructure often gets left behind. And so these funding agencies got together and said, well, what can we do to help the community? ^M00:41:29 And they did the equivalent of kind of creating a fertile soil in which a community organization could grow. Everyone went back to their global region, so in the US the National Science Foundation came back and approached me and others about how can we help the community? In Europe the European Commission did that, in Australia the Australian Government did that, and other places. And take one step forward and many of us wrote grants or something like that to staff the Research Data Alliance and to help the communities come together, hold the meetings, and we have two fundraisers a year, et cetera. So that collaboration was actually seeded I think by the recognition all over the world by the RDA agencies that this, you know, you're not going to put the data genie back in the bottle, the time has come for all of these communities to be accelerated by data. 
The community, itself, came because there was the attractor that there was now a place to talk to your colleagues all over the world who are looking at the same exact problem you are around data sharing or on digital object identification or around data citation or preservation, et cetera, and that you could have a broader community, a wider set of best practices, standards that are adopted by a bigger group of people. So the community has grown precipitously because people find it useful to have a broader, more diverse community of people to talk to about the things they care about. And then organizationally we're actually now partnering with other organizations because, of course, the RDA is not the only organization that does, you know, none of our organizations are the only organizations that do this. And so if we really want to forward and advance data driven innovation we need to partner effectively with each other. So we have a number of organizational partners and affiliates who are joining the RDA, anyone from a smaller library, to Microsoft, you know, CODATA to other kinds of organizations. And the idea is to do something that's in the intersection of our missions, so for us it's data sharing, for other organizations it's some other things. And we're finding that RDA, at least so far we've had three plenaries with another one coming up in Amsterdam in September, it's providing kind of a neutral space town square for organizations to get together and advance common agendas, which is very useful for us, for all of us. It's the fact that we're at this meeting and we can kind of create those connections, those connections are really important for the work we do. >> Meg Phillips: Yes. So I wonder if you could talk a little bit about how you feel your communities own digital preservation and might be distinct from other communities? 
Do you feel like all of these communities are interchangeable, we could just fold them all together and be one big thing, or do you really have distinct needs? And how is your particular project's work geared towards meeting the needs of your particular community? >> Bradley Daigle: I'm going to start from the agnostic point of view, which is that we agree, our partnership has a conceptual framework upon which we all set our ideals and our philosophy for how we want to operate, but then it's really up to the individual participants to figure out where that preservation solution fits in their own environment. So I don't have a clear idea how Columbia is going to use AP Trust versus Cincinnati versus Virginia, and I think that's how we want it because I think a lot of those organizations they're learning that, as well. So, for us, we agree upon, again, the high level principles of what we want our service and our activity to be, but how individual partners use it we're really not prescriptive in terms of even what they put in it. So you might use it as an institutional repository, you might use it as a dark archive, it's really - it has to fit the right model for your organization. >> Meg Phillips: Anybody else? >> Fran Berman: I think it's really interesting, if you look around the research community, and we focus on the open access research community primarily in the RDA, I think what you find is that the need for preservation and the infrastructure around preservation your mileage varies. So I'm a computer scientist, so as a computer scientist we don't see our data typically as a competitive advantage. I'm happy to share my data with anyone. I'm happy to think about preservation, although for a lot of computer science we're still a little behind the curve in thinking about preserving our own data. 
But if you look at folks, like the life science community, for a lot of life scientists your data is your competitive advantage, and so sharing your data is harder to do unless there is a piece of infrastructure called policy that encourages you to do that. So the Alzheimer's community, the autism community, the folks who are depositing stuff in the Protein Data Bank, those policies from NIH have really helped accelerate those communities because they've made that data available to a much broader set of people, so that kind of social infrastructure matters. With respect to preservation I think one of the most fascinating things that all of us are now dealing with is the public access memo from the White House that came in February 2013, that says as Federal funders we paid for it so let's see it. And if you want to see it, what are you going to see? Are you going to see my 300 terabytes of astrophysics data from my supercomputer simulation? Where am I going to keep that? Who is going to pay for that? How long are you going to see that? So I think our entire community is really, as you all have said, grappling with business models and the economics behind how do you preserve data. We are not ahead of the curve. I think everybody is struggling with that, and I think this is something that we will really have to focus on over the next decade if we really want that data to be around because just expecting it to be around doesn't mean it's going to be around. And I think that every agency, every researcher and beyond is really grappling with the business model behind preservation, but I think you're going to find up and down some communities, some projects have better solutions than others, and I think we need to raise the ocean. >> Aaron Rubinstein: Yes, this is an interesting question, too, from a practical standpoint. 
One of the things that we heard when we talked to consortiums during phase one of our planning was that there were complaints from the organizers of these consortia or these preservation services that it's like, well, we have a broad spectrum of people who are taking advantage of our services, but they're just shoving stuff in because they can. And I had a really, actually the whole committee had a really eye-opening experience at a meeting where we were talking to one of the library directors of the five colleges, who said, oh, yes, I love Archivematica because it's just like a washing machine, I just put the clothes in, and I press the button, and the thing lights up, and I know my clothes are getting cleaned. And we just all sort of put our heads in our hands, and realized that there is a certain set of services that they could potentially be universal, right? If you sort of break down the practicalities, the technical infrastructure, if we have a better communal sense of what digital preservation really means, and what's sort of an effective like sort of minimum barrier to entry digital preservation approach is, then that's I think possible. I mean I think that's possible on the large scale, but the question is more for those individual institutions, how - what do their policies look like, right? What is their organizational infrastructure? How are they thinking about not only the manipulating and the packaging and the managing of their digital objects, but, of course, the access aspect? And what is that - what are those workflows and what are those approaches, and what is that - those key aspects that can make a program sustainable look like for those individual institutions? And I think what we certainly saw in our very, very small microcosm of a community is that that has to be individualized work, right? It has to be work that's unique to the institution in order for more robust participation in sort of a shared collaborative system. 
So that, that balance I think is really interesting from a practical standpoint. >> Jaime Schumacher: Yes, I think, if you noticed at the beginning of my presentation I said that we thought that the larger institutions had it all figured out. Well, that's one of the things we came to realize is that we're not much different from the larger institutions. ^M00:50:37 And I think half the battle with what we're doing is convincing smaller organizations that, no, they have not been left behind, you have not missed the boat. And one of the tasks we were to do was find a way how smaller organizations could join the NDSA, so I set out on that task. And I said, okay, and I sent an e-mail to some folks at the Library of Congress, and I was like, excuse me, how can smaller organizations join the NDSA? They said, well, come on down and we'll talk to you. Butch, do you remember that? I e-mailed you, I was like you mean you actually work at the Library of Congress? And he's like, yes, I actually work at the Library of Congress. And they were so welcoming. And we have found that - I come from corporate consulting, that's my background, I've only been in this profession for two years. And, my, God, you guys are so nice, and you're so welcoming. And we have to convince these smaller organizations, some of them have chips on their shoulders, we're being left behind, nobody is helping us. Well, one, you can help yourself because the larger guys did and, two, all you have to do is ask and Butch will let you come to his office at the Library of Congress and you can text a picture to your mom and say look where I am. But so, no, I really don't think there is that much of a difference. Now part of the community we're trying to reach are those very, very small organizations, historical societies that are primarily run by volunteers, whose skillsets aren't maybe necessarily what you would need to even upload stuff to Internet Archive. 
So we're also trying to reach those types of organizations, and one of the things we did was create an Internet Archive tutorial for those, if they have digital pictures of the bicentennial celebration of their community, if they're in the public domain, guess what, you can upload it to Internet Archive for free, and here are pictures to show you how to do it, and they love it and they're using it. And now we know that there's just a few sets of digital objects now that have a better chance of being preserved, so. >> Meg Phillips: It seems like the communities can almost be differentiated by how intimidated they feel, not how big they are, how funded they are. So I think we've still got a few more minutes. I'm going to ask one last question of the panel, and then we'll open it up to everybody. So we don't want to spend a huge amount of time on this question, to leave some time for open discussion. But I'd like to see if any of you have any particular comments on what you've found the hardest about working collaboratively and whether you have any tips for the rest of us who are looking for good collaborative relationships? What were the challenges or hurdles to the collaboration, itself, not to digital preservation? And don't name names. >> Bradley Daigle: I think for AP Trust the challenges, it's ongoing, which is you need to have enough forward momentum and you have to have enough directive purpose and prioritization to be successful, but you also don't want to exclude other viewpoints. So that's the whole point about dialogue is that you don't want a few people who are passionate about something derail the entire collective, but you need to have the means to get them to go out and do fact finding and bring it back to discuss it on level terms. So I think having that dialogue back and forth and the means to implement and use it successfully is key. >> Fran Berman: We've been really interested in impact, you know, that's a metric of success. 
Can we build infrastructure that actually help people, use it, it helps them, you know, then they can move on? And I think whenever you get perhaps more than one person together politics is always a huge challenge, and it's a challenge for nascent and mature organizations. The really fun part, to me, about helping create RDA was that we wanted to structure it so that systemically it really focused people on the right things, that it wasn't about turf and credit, it was about doing something that has impact. And, as you know, there's two pieces of that. There's getting the system right, and then there's getting great people to be involved in it. And with great people and a bad system you can't do very much, and with a great system and not so great people you can't do that much. So the goal is to try to get both. And so systemically we really tried to engineer, and we're still at it, this is like not done and I don't think it'll ever be done, an agile evolutionary organization that's structured things towards impact and not towards sort of the personal and the turf and stuff like that. That's just really hard, it's just part of the human interaction, I think, to go in that sort of political direction. But for us it was challenging, but it's worthy of experimentation. And we looked at a lot of other organizations and tried to - we had good examples and we had bad examples, we tried to learn from them both. >> Aaron Rubinstein: We're really lucky to have on the taskforce a really strong group of people who worked particularly well together. I actually forgot to show my credit slide, so please check that out when the slides are online. And also Megan Bergen [Assumed Spelling], who is here, is actually one of the founding members of the taskforce, and Kelsey Shepherd [Assumed Spelling] was the previous chair, to me. 
The challenge wasn't necessarily that group of collaborators, those representatives from the institutions, but actually it was people who are stakeholders but outside of that group. And the challenge for them was to really take ownership of the process. It's sort of one thing to say, yes, we realize that digital preservation is important, and we realize that we need to do it, but there's an additional step of ownership that requires you to actually roll up your sleeves and really think about, well, what does it mean to preserve this material, what does it mean to sort of provide current, contemporary and sort of ongoing access to these digital objects? And once you start answering those questions it becomes much clearer how to solve specific problems and how to plan for the future, but until that ownership was taken it was very, very difficult to really think productively about what a sustainable response to digital preservation issues would be. So that ownership has been hard won, and it's not even complete. As you can see, we're sort of at this transition area point. So I mean we're hoping what we're doing now for the future is building that sense of ownership, sort of one individual stakeholder at a time. But there is a chasm for people to go from acknowledging the importance, to overcoming intimidation, or even just being willing to sort of think more than they have before about the implications of digital preservation, and that's been a hard gap to bridge. >> Jaime Schumacher: I think our biggest challenge is yet to come. We are a totally accidental community. We started out as a bull in a china shop trying to figure out how to make this work, and suddenly we looked and there were a lot of colleagues hanging under our coattail saying please take us with you. And now we're trying to figure out end-of-life planning for a project, it's not a program it's a project, without leaving these people behind and left to drift. We don't want to do that. 
This project actually ends on November 30th. We've had other organizations reach out to us, asking us can we have your tool grid, can we take ownership of that, can we have your one-page communication documents, can we - and mostly we're saying yes, yes, but we don't want anyone to be left behind. So I think the end-of-life planning or if PoWRR gets absorbed into another community I'm not sure how that's going to look, and I think that scares me the most is how I can keep these good colleagues of mine from not feeling adrift again, because we're making progress, yay. >> Meg Phillips: Well, a lot of these discussions are almost - well, definitely similar to conversations as a whole and a lot of the working groups have among ourselves about how to function in a way that's productive in this purely voluntary collaborative environment, so a lot of it sounds familiar. So at this point we'll open up the floor to questions from anybody, any questions for any members of the panel or all members of the panel, either about their projects or about our main theme of community? >> Yes. Thank you. This thing called digital preservation, I've heard a lot about it this afternoon, but what has come across, and I'm going to make a sweeping generalization, what has come across is an impression that it is an end in itself. But we all know that it has to be a means to an end. Maybe it's the community in the room that didn't feel it needed it, but I need it. I'm a scientist, and I want to know why you're doing this? Just one little statement, one vision from each of you? And community is big or a community is small, however you're going about it, you still need to have that ultimate vision of the new thing that will come out when the first time you can effectively knit together databases that have never talked to each other before is a very powerful thing, and that is what wins money. 
Now let me turn this around to you then, supposing, and life can treat you nastily sometimes, supposing there was just one grant available and the four of you have got to compete against each other, you need to have that vision there, it's got to be there in front of your eyes the whole time, why is it that you are the most important one, can you answer that? ^M01:00:56 >> Fran Berman: I think we're doing kind of different stuff. The data landscape is a big giant elephant, and you actually need all different kinds of pieces. So I suspect that I'm not sure we would be competing against each other. But, you know, I think the argument for preservation is not for preservation, per se. I think you make a really good point. And for those of you that were involved with us when we did the Blue Ribbon Taskforce on Sustainable Digital Preservation Access a few years ago, what we found is that when we talked to people it isn't about preservation, it's about access, and it's about access in the future. So my data is there today, if I don't preserve it it's not going to be there tomorrow or the day after or 10 years from now. And in the science community there's a lot of talk these days especially, and it's kind of interesting to see this fad come up, about reproducibility. The data collection has to be there for me to try to reproduce my results or build upon that for another set of results. Preservation is the guts of how that stuff happens, how you get access in the future. Now I think our problem, you know, I mean we have a lot of different problems making this happen, but to me one of the hardest problems we have is that infrastructure is just, to use a term in the vernacular, not sexy. Nobody writes news stories about water mains, you need water mains, no way we can get along without water mains. The most newsworthy thing about infrastructure is when it breaks. 
When you look at science the most newsworthy thing about science is the breakthrough, you know, you got the Higgs boson or you got a cure to a particular disease. So if you think about the whole infrastructure world, including the infrastructure you need for preservation, it's just a hard sell. The research is the breakthrough, they're the good news, they're the big New York Times article. The only time the infrastructure and preservation gets on there is when you lose something or when it breaks. And so, for us, I think the real Achilles heel of doing this isn't organizations competing and it isn't even, to my mind, the kind of infrastructure you build. It's really getting a sustainable business model so that data is going to be there tomorrow when we need it because the world is - we're in the 21st Century, it's the Information Age, you know, we have data driven innovation in just about everything. If that data is not there we can't do it. >> Bradley Daigle: Just to jump in on this, I would say, for me, preservation is a stub for a series of activities. Preservation is an asymptote, you never really get there. And I think most people in this room understand that you have to kind of get off on that. Like if you think about just like, okay, preservation, like it's done, you're not a good preservation practitioner, it's as simple as that. So to put it more colloquially I mean preservation and stewardship is about the journey, it's not about the destination. So we don't know, so we don't always necessarily know where we're going. We know that we have to do certain things and have certain things in place. We have to have good archival practice to maintain the cultural and scholarly record. We have to have these things there, but it's not with a clear end goal in mind because that goal is always shifting. And if that's, if we get fixated on a single goal state then we'll find that we've achieved it and we'll look and we'll say, oh, awesome, it's 1982. 
So, for us, we really need to keep - it's an ongoing iterative approach. >> Aaron Rubinstein: I mean I couldn't agree more with you, Bradley and Fran, in terms of your response to that. And I mean the only small thing that I would add is that forget about grants, right? When we're, you know, when you're thinking about large scale infrastructure grants could certainly play an important part in that and especially if it's national infrastructure and there's Federal support there's a natural give and take there. However, especially in the context of smaller institutions there's a huge amount of work that can be done in digital preservation that doesn't require large financial outlay, either from external sources or even from internal sources. And the activity of sort of rolling up your sleeves and starting to ask yourself questions and make some decisions, look and analyze, look at and analyze tools that are out there, figuring out what minimal steps you can make to sort of start this journey or continue on this journey of digital preservation. There's an amazing amount that can be accomplished. And really the work that's required is policy work and it is organizational work and it's intellectual work and it's experimentation. And I think if those are the first step forward then there's a lot more that can be accomplished if folks are just waiting around for grants or some kind of savior, either as a system or as money or even as external experts to solve problems, you know, there's going to be a gap and that gap could be dangerous, that could result in those water main leaks or data breaches or any of those things that can become negative. >> Meg Phillips: Are there any other questions? >> >> Jaime Schumacher: Something you can sell very easily, there are lots of things that you can determine unambiguously and from the first time from large collections of data, and I actually think it's a fairly easy sell. 
So I am a used car saleswoman, a scientist, because I will say that to anyone in any venue, but there you go. >> Fran Berman: I agree with you, big data are sexy, and everywhere you look, I can't tell you the number and it's really cool to make a PowerPoint of this, you know, the number of magazine covers with big data on the cover, The Economist, all this stuff. But data infrastructure and big data are not the same, and I think, of course, we're all trying to surf the wave. Big data, yes, we do big data. But the fact is there's a lot of folks who call themselves, say, long tail data folks, there are data infrastructure folks, data preservation folks, and the data world at large, you know, the big data guys need that, we need the big data guys, et cetera, but they aren't the same. And I think as you look at it, the big data guys are kind of the explorers of this new land of data, and data infrastructure is off in the water mains, and it's often easier to fund the explorers than the water mains. That's just my view. >> I have a follow-up question to address, and this has to do with, first, do you anticipate community based activities independent of technological breakthroughs, infusions? For example, for many years we have been working in the area of quantum computing, it's still not come to fruition. This question is more directed to Fran, since you are from the discipline. Do you anticipate at least in the decade to come that quantum computing will change the paradigm of digital preservation and access? >> Fran Berman: This is a great question, and I wish I had more detailed expertise to actually answer it I think at the level you're looking for. I'll tell you what excites me about some of the focus on next generation and maybe generations after that of computing strategies, and you can include quantum computing and biological computing, and exascale and all these kinds of things. 
And a lot of the things that we will want to do with these new technologies will really involve looking at data in a whole new way. And so in that sense I think the storage problems, the infrastructure problems, et cetera, will be really important. But for those of you, supercomputing geeks, an interesting and good community, they think of it often in terms of an old model * pyramid, where the highest end computing is on the top, the lowest level, the stuff you might do at your desktop and laptop is on the bottom, and the enterprise level that you might do in your company and university is in the middle. And as you see quantum computing, exascale, the new technology is at the top. What you're going to also see is a bunch filtered down into the middle. That means your university computers, your enterprise computers are going to be incredibly more capable and require more data, your smart devices are going to be incredibly more capable and do more with data. And I think we're going to find that really interesting storage problems are not just at the top, but there's going to be all kinds of new innovation and new ways of approaching it, social, organizational and technical, at the middle and the bottom of the pyramid, too. I just find that tremendously exciting, and I think over the next decade we're going to be seeing a lot of activity in those areas. >> Meg Phillips: So I think our time for this panel is up. I would like to wrap up by thanking the panelists and, also, reiterating that all this stuff is deeply, deeply sexy and I want you all to leave this room believing that it's sexy. Sorry, you're just going to have like take your hook and get me off the stage. But I mean we often focus on the challenges of digital preservation because we do have a do paper, the paper wouldn't go away if we left it in the closet. 
So digital preservation is hard, it's new, we're struggling with it still, but it's also incredibly cool because big collections of digital materials let us do stuff we've never been able to do before with the cultural record of government, culture, science, all these things. So I mean we are serving a real purpose, it's not digital preservation for preservation's sake. That data is going to tell us stuff. And if we preserve it right we're really going to be making a contribution to the culture of the country. So let's thank our panelists here. ^M01:11:28 [ Applause ] ^M01:11:31 >> This has been a presentation of the Library of Congress. Visit us at loc.gov.