Scope and Motivation

In this data-driven economy, as society makes increasing use of data mining technology, it is now more important than ever that our community has a shared understanding of how to assess the results coming out of those data miners. Recent experience shows that, in the arena of software analytics, we do not share that understanding.

We now have more than a decade of research on data mining in software repositories, reported at all major software engineering venues (ICSE, TSE, EMSE, MSR, ASE, ICSME, ESEM, …). Based on the organizers’ experience with their last dozen journal papers, we assert that conference and journal reviewers in SE share very few criteria on how to assess data miners. Simple low-level issues, such as which performance indicator to use, are still controversial: some reviewers eschew accuracy or precision; some demand SE (standardized error). Similarly, many higher-level issues remain unclear, such as which statistical test to use, on how many data sets, and where that data should come from. More generally, several recent papers have reported failed replications or problems with the data we use.

All the above hints at general and systemic problems with the way we evaluate and compare our research. This is a pressing and urgent open problem, and not just for researchers: we know many software developers who routinely ship some kind of analytics functionality as part of their delivery tools. If we, as academics, cannot agree on how to assess those tools, how can industrial practitioners ever certify that the analytics tools they ship to clients are useful (or, at the very least, not misleading)?

Accordingly, this workshop’s goal is the development of guidelines for assessing software analytics. We want to bring together the community to discuss anti-patterns as a first step towards guidelines for repeatable, comparable, and replicable software analytics research, e.g., on defect prediction and effort prediction. As such, we do not want to discuss new techniques, data sets, or ways to mine data, but instead focus solely on how we should actually evaluate our research. This will give researchers a forum to share anti-patterns they frequently observe and how to avoid them.

This project will take much longer than one day but, as a start, we plan to hold this one-day workshop to build lists of analysis anti-patterns which, if seen, should make a reader doubt the veracity of the conclusions reached by an analytics study.

Accordingly, we invite researchers to present lists of anti-patterns they see as problematic in current research. We are looking for two types of anti-patterns: those general to software analytics, and those specific to individual research areas, with a focus on defect prediction and effort prediction, but also other related areas that may have similar problems.

Our aim is to use these anti-patterns as a foundation for guidelines on how to conduct research. The workshop will serve as an open forum for the community to express their concerns and get involved in the development of guidelines for both authors and reviewers, to ensure consistent quality in software analytics research. Therefore, this will be a highly interactive workshop, with short 5-10 minute presentations of anti-patterns followed by breakout sessions where we will seek common ground and potential solutions.

Submissions

We are seeking short papers (up to 4 pages) that describe anti-patterns you frequently see, in published papers or during peer review, in work that involves software analytics. All submissions must follow the ACM formatting guidelines.

Papers will be evaluated based on the clarity of the description of the anti-patterns: not only what the anti-patterns are, but also their potential negative consequences. Solutions for avoiding the anti-patterns are not a requirement, but a bonus that can help shape future guidelines. Papers must not pillory individuals, but should provide a constructive analysis of the anti-patterns. While citing papers as positive or negative examples is in principle allowed, we instead recommend short, self-contained examples of the anti-patterns in the submissions.

Papers must be submitted electronically, in PDF format. Submissions should be made at the following website: https://easychair.org/conferences/?conf=apsa2018

Accepted papers will be published as ICSE 2018 Workshop Proceedings in the ACM and IEEE Digital Libraries and will be distributed to the workshop participants. The official publication date of the workshop proceedings is the date the proceedings are made available in the ACM Library. This date may be up to two weeks prior to the first day of ICSE 2018. The official publication date affects the deadline for any patent filings related to published work.

Important Dates

Submissions due Monday, February 5, 2018
Notification Monday, March 5, 2018
Camera-ready version Monday, March 19, 2018
Date of the workshop Saturday, June 2, 2018

Program

To be announced

Organization

General Chair

Organizers

Program Committee