Thursday, May 28, 2015

Current State of Repositories Report - SPARC - May 21, 2015

The report was produced on behalf of the COAR Aligning Repository Networks Committee, with significant input from many representatives of the repository community, including SPARC. It provides a high-level overview of the current international repository landscape and explores potential new directions that repositories might take.

The report was also submitted to the Global Research Council (GRC) and Research Councils UK as supplementary material for a GRC workshop on the future of scholarly communication, held in London in April, in which SPARC participated. It will be distributed to the member representatives of both organizations.

Feel free to share this with any interested people in your libraries and network. The full report is available here:

Wednesday, May 20, 2015

Organizations Around the World Denounce Elsevier’s New Policy That Impedes Open Access and Sharing

by Prue Adler | 202-296-2296 | on May 20, 2015

Image: “open times infinity” (linked to the statement against the Elsevier sharing policy on the COAR website), CC-BY-SA by Libby Levi
On April 30, 2015, Elsevier announced a new sharing and hosting policy for Elsevier journal articles. This policy represents a significant obstacle to the dissemination and use of research knowledge and creates unnecessary barriers for Elsevier-published authors seeking to comply with funders’ open access policies. In addition, the policy was adopted without any evidence that immediate sharing of articles has a negative impact on publishers’ subscriptions.
Despite the claim by Elsevier that the policy advances sharing, it actually does the opposite. The policy imposes unacceptably long embargo periods of up to 48 months for some journals. It also requires authors to apply a “non-commercial and no derivative works” license for each article deposited into a repository, greatly inhibiting the re-use value of these articles. Any delay in the open availability of research articles curtails scientific progress and places unnecessary constraints on delivering the benefits of research back to the public.
Furthermore, the policy applies to “all articles previously published and those published in the future,” making it even more punitive for both authors and institutions. It may also cause articles that are currently available to be suddenly embargoed and made inaccessible to readers.
As organizations committed to the principle that access to information advances discovery, accelerates innovation and improves education, we support the adoption of policies and practices that enable immediate, barrier-free access to and reuse of scholarly articles. This policy is in direct conflict with the global trend towards open access and serves only to dilute the benefits of openly sharing research results.
We strongly urge Elsevier to reconsider this policy and we encourage other organizations and individuals to express their opinions.


  • COAR: Confederation of Open Access Repositories
  • SPARC: Scholarly Publishing and Academic Resources Coalition, USA
  • ACRL: Association of College and Research Libraries, USA
  • ALA: American Library Association, USA
  • ARL: Association of Research Libraries, USA
  • ASERL: Association of Southeastern Research Libraries, USA
  • AOASG: Australian Open Access Support Group, Australia
  • IBICT: Brazilian Institute of Information in Science and Technology, Brazil
  • CARL: Canadian Association of Research Libraries, Canada
  • CLACSO: Consejo Latinoamericano de Ciencias Sociales, Argentina
  • COAPI: Coalition of Open Access Policy Institutions, USA
  • Creative Commons
  • Creative Commons, USA
  • EIFL: Electronic Information for Libraries, Netherlands
  • EFF: Electronic Frontier Foundation, USA
  • GWLA: Greater Western Library Alliance, USA
  • LIBER: European Research Library Association, Belgium
  • National Science Library, Chinese Academy of Sciences, China
  • OpenAIRE
  • Open Data Hong Kong
  • RLUK: Research Libraries UK
  • SANLiC: South African National Licensing Consortium
  • University of St Andrews Library, UK

The Association of Research Libraries (ARL) is a nonprofit organization of 124 research libraries in the US and Canada. ARL’s mission is to influence the changing environment of scholarly communication and the public policies that affect research libraries and the diverse communities they serve. ARL pursues this mission by advancing the goals of its member research libraries, providing leadership in public and information policy to the scholarly and higher education communities, fostering the exchange of ideas and expertise, facilitating the emergence of new roles for research libraries, and shaping a future environment that leverages its interests with those of allied organizations. ARL is on the web at

Friday, May 8, 2015

HathiTrust - Extracted Features Dataset Now Available for 4.8 Million Volumes/1.8 Billion Pages

The HathiTrust Research Center is pleased to announce the release of its Extracted Features Dataset (v.0.2), a dataset derived from 4.8 million public domain volumes currently available in the HathiTrust Digital Library collection, totaling over 1.8 billion pages. The dataset contains over 734 billion words in dozens of languages and spans multiple centuries. Features are informative, quantified characteristics of a text; they include:

  • Volume-level metadata
  • Page-level features
      ◦ Part-of-speech-tagged token counts
      ◦ Header and footer identification
      ◦ Sentence and line counts
      ◦ Algorithmic language detection
  • Line-level features
      ◦ Counts of the characters at the beginning and end of each line
      ◦ Maximum length of the sequence of capital characters starting a line
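To make the page-level features more concrete, here is a minimal Python sketch. It assumes each page is stored as a dictionary with separate `header`, `body`, and `footer` sections, each holding a `tokenPosCount` mapping of token to part-of-speech tag to count. These field names and the sample page are illustrative assumptions, not a description of the official schema; check the dataset's own documentation before relying on them. The function collapses the part-of-speech tags and counts only body text, which is one way the header and footer identification can reduce noise from running heads and page numbers.

```python
from collections import Counter

# Hypothetical page record; the field names are assumptions for illustration only.
page = {
    "seq": "00000042",
    "header": {"tokenPosCount": {"CHAPTER": {"NN": 1}}},
    "body": {
        "tokenPosCount": {
            "the": {"DT": 3},
            "whale": {"NN": 2},
            "ran": {"VBD": 1},
            "run": {"NN": 1, "VB": 1},  # same token, two different POS tags
        }
    },
    "footer": {"tokenPosCount": {"42": {"CD": 1}}},
}


def body_token_counts(page: dict) -> Counter:
    """Sum the POS-tagged counts for each token, using only the page body.

    Header and footer tokens (running heads, page numbers) are skipped.
    """
    counts = Counter()
    for token, pos_counts in page.get("body", {}).get("tokenPosCount", {}).items():
        counts[token] += sum(pos_counts.values())
    return counts


if __name__ == "__main__":
    print(body_token_counts(page))
```

Keeping the part-of-speech breakdown in the stored data and collapsing it only when needed lets the same page serve both tag-aware analyses and plain word-count models.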

These features allow large worksets of volumes in the HathiTrust public domain collection to be analyzed at scales previously intractable for most individual researchers. For example, page-level token (word) counts can be used to build topic models and classifiers and to perform other text analyses. Similarly, features can be used to evaluate the readability of a given volume or workset.
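The following Python sketch illustrates both uses mentioned above under an assumed page schema. It merges page-level token counts into one document-term vector per volume, the usual bag-of-words input to topic models or classifiers, and it computes average tokens per sentence as a crude readability proxy. The field names (`tokenCount`, `sentenceCount`, `tokenPosCount`), the sample volumes, and the choice of tokens per sentence as a readability measure are all hypothetical and for illustration only; they are not part of the HTRC announcement.

```python
from collections import Counter

# Hypothetical volumes, each a list of page records (schema assumed for illustration).
volumes = {
    "vol_a": [
        {"body": {"tokenCount": 12, "sentenceCount": 2,
                  "tokenPosCount": {"sea": {"NN": 4}, "the": {"DT": 8}}}},
        {"body": {"tokenCount": 6, "sentenceCount": 1,
                  "tokenPosCount": {"ship": {"NN": 2}, "the": {"DT": 4}}}},
    ],
    "vol_b": [
        {"body": {"tokenCount": 30, "sentenceCount": 2,
                  "tokenPosCount": {"law": {"NN": 10}, "of": {"IN": 20}}}},
    ],
}


def volume_term_counts(pages: list) -> Counter:
    """Merge page-level body token counts into a single volume-level bag of words."""
    total = Counter()
    for page in pages:
        for token, pos_counts in page["body"]["tokenPosCount"].items():
            total[token] += sum(pos_counts.values())
    return total


def tokens_per_sentence(pages: list) -> float:
    """Average sentence length for a volume; longer sentences loosely suggest harder text."""
    tokens = sum(p["body"]["tokenCount"] for p in pages)
    sentences = sum(p["body"]["sentenceCount"] for p in pages)
    return tokens / sentences if sentences else 0.0


if __name__ == "__main__":
    for vol_id, pages in volumes.items():
        print(vol_id, dict(volume_term_counts(pages)), round(tokens_per_sentence(pages), 1))
```

Because only counts are distributed, any analysis of this kind has to work with bags of words rather than running text. That is enough for topic models and simple readability proxies, but not for methods that need word order.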

How to get the data:
The entire dataset, as well as sample subsets and custom worksets, is available at:

How to cite:
Boris Capitanu, Ted Underwood, Peter Organisciak, Sayan Bhattacharyya, Loretta Auvil, Colleen Fallaw, J. Stephen Downie (2015). Extracted Feature Dataset from 4.8 Million HathiTrust Digital Library Public Domain Volumes (v0.2). [Dataset]. HathiTrust Research Center, doi:10.13012/j8td9v7m.

This feature dataset is provided under a Creative Commons Attribution 4.0 International License.

About the HathiTrust Research Center:
The HTRC is a collaborative research center launched jointly by Indiana University and the University of Illinois, along with the HathiTrust Digital Library. It helps researchers meet the technical challenges of working with massive amounts of digital text by developing cutting-edge software tools and cyberinfrastructure that enable advanced computational access to the growing digital record of human knowledge.

For more information about the HathiTrust Research Center, visit
