Article-Level Metrics (ALM) may have outgrown the infant stage, but the “terrible twos and threes” have now arrived, when it must learn to walk, run, and… do algebra.


As ALM begins to count toes, we are fortunate to have bibliometrics scholars Paul Wouters and Wolfgang Glänzel weigh in on the appropriate uses of bibliometrics for assessing individual researchers – the use of bibliometrics most people are probably interested in, but also the most controversial.

Paul Wouters and Wolfgang Glänzel gave a thoughtful and entertaining presentation on the topic at the ISSI (International Society for Scientometrics and Informetrics) conference in July, followed by commentaries from Henk Moed and Gunnar Sivertsen and a discussion in the plenary. Their slides are available on Slideshare, and Paul Wouters has also written a blog post about the session. The discussion continued today at the STI conference in Berlin.

The presentation was called **The dos and don’ts in individual-level bibliometrics** and is an attempt by the bibliometrics community to provide guidance and best practices for researchers and research administrators on how to use – and not use – bibliometrics to assess individuals. The recommendations complement the recent San Francisco Declaration on Research Assessment (DORA) and add the context of extensive bibliometrics research. We start with the 10 Don’ts, reprinted in full and interleaved with commentary from our perspective as a publisher.

10 DON’TS

  1. Don’t reduce individual research performance to a single number – Context is critical to honest metrics. While normalization aims to provide some level of context, the diversity of dependent and independent variables cannot be wholly captured. Furthermore, we believe multiple types – or, as Priem, Piwowar, and Hemminger suggest, flavors – of impact exist, applicable to a host of different needs and goals, none of which are best served by a single number used by all and for all.
  2. Don’t use impact factors as a measure of quality for individual researchers
  3. Don’t apply (hidden) bibliometric filters for selection, e.g. minimum IF for inclusion in publication lists
  4. Don’t apply arbitrary weights to co-authorship. Algorithms based on author position might be problematic – We foresee that the current trend of expanding research collaborations will continue, both in size and in the range of research specializations involved. In this light, the traditional list of authors on a paper will no longer suffice. A number of efforts have been underway to establish a more granular system of attribution beyond author order on papers; none has gained full adoption by the research communities, but broader adoption is a real possibility in the future. To the extent that such attribution may apply weights of some kind, we agree that they cannot be arbitrary (cf. Do #2).
  5. Don’t rank scientists according to one indicator. Ranking should not be merely based on bibliometrics – Very much related to Don’t #1; we would add that ranking on a single indicator would only fan the flames of activity that intentionally inflates counts for higher scores (i.e., gaming). Taking a multi-dimensional view of assessment not only better reflects the values supporting each type of evaluation effort, but also builds in additional protection for data integrity through a suite of indicators that matter.
  6. Don’t merge incommensurable measures, e.g. citation counts from different sources – Splicing incommensurable measures (and fruits) is a big taboo for us at PLOS. The intent behind and interest in aggregating sources is both practical and reasonable, offering a summary of the data that can be easily understood and communicated; nobody wants to tote around a suitcase full of 15+ metrics in their portfolio. That said, we do engage in considerable discussion (often heated!) about what commensurability entails. The work is less a balancing act – fidelity to the research activity captured always prevails – than a continued conversation with research communities about when it makes sense to showcase groups of metrics together, alongside continued bibliometrics research on how these metrics behave and whether they are compatible with one another.
  7. Don’t use flawed statistics
  8. Don’t blindly trust one-hit wonders. Don’t evaluate scientists on the basis of one top paper
  9. Don’t compare apples and oranges, e.g. research and communication in different domains – While there is much bibliometric work still to be done, we can already normalize for time within a single source (see the sketch after this list). This allows us to compare the views and downloads of a paper published in the early 2000s, at the advent of digital publishing of scholarly literature, with those of a paper published a decade later. That said, the instinct to normalize across types of activity (e.g., between tweets, Mendeley bookmarks, media references, etc.) does not a best practice make. Finally, the crystal ball is clouded when it comes to comparisons between research areas. Boundaries within life sciences and biomedical research are increasingly blurring as the trend toward specialization and cross-hybridization continues. Here, the authors refer to large domains of knowledge – “humanities, mathematics, and life sciences” – but these demarcations, too, are getting fuzzier at the edges. With respect to the changing research environment, we’re not certain what limitations may prove necessary as more bibliometric analysis is applied to larger sets of data.
  10. Don’t allow deadlines and workload to compel you to drop good bibliometric practices – Research impact measurement takes diligent and sustained work. Neither the advances of ALMs nor any simple, single metric will give us the answer we deserve without some degree of thoughtful interpretation of the data.
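
Don’t #9’s point about normalizing “within a single source for time” can be made concrete with a small sketch. The Python snippet below uses an invented helper and entirely made-up numbers – it is not how PLOS ALM computes anything – and shows only the simplest possible age normalization: dividing cumulative views by the number of months since publication.

```python
from datetime import date

def views_per_month(total_views, published, as_of=date(2013, 9, 1)):
    """Crude age normalization: average monthly views since publication."""
    months = (as_of.year - published.year) * 12 + (as_of.month - published.month)
    return total_views / max(1, months)

# Two hypothetical papers from the same source (numbers are made up):
# one from the advent of digital publishing, one published a decade later.
older = views_per_month(24000, date(2003, 6, 1))  # ~195 views/month over ~10 years
newer = views_per_month(4800, date(2012, 9, 1))   # 400 views/month over 1 year

print(f"older paper: {older:.0f} views/month, newer paper: {newer:.0f} views/month")
```

Real usage accrues unevenly over a paper’s lifetime, so a sounder comparison would use age-matched windows (e.g., views in each paper’s first twelve months) rather than a lifetime average, but even this crude version stops the newer paper from being penalized simply for having had less time to accumulate views.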

This list (and the related 10 dos) may satisfy our desire to simplify the world into discrete elements (and sometimes the items are inspiring!). But we see these lists as more than a set of edicts or commandments: they chart a general road toward healthy adulthood – a robust and responsible metrics ecosystem – even while rapid internal changes are an intrinsic part of any growth spurt.

The list focuses on key issues, and some details are inevitably lost in that simplification. The primary audience, however, is probably not bibliometricians but the rest of the research community: this initiative is an attempt to close the growing gap between what bibliometricians have learned about these issues over the years and how the metrics are increasingly – and often wrongly – applied by researchers and administrators to make important funding and hiring decisions.

This is an ongoing discussion that we hope to continue at the PLOS ALM workshop in October. We posted the preliminary conference agenda yesterday, and it is not too late to register here.

