My publications (Daniel Morales German)

1. Other references
- 1.1. My Google Scholar page.
- 1.2. My DBLP page.
2. Totals in the last 10 years
3. Batting Average
4. Currently under review
5. Journal Articles
6. Reviewed Conference Publications
7. Workshops
8. Short papers and other publications
9. Unpublished
- 9.1. A needle in the stack: efficient clone detection for huge collections of source code
10. Papers pre 2001.

These are my papers since I started working as a research professor at the University of Victoria. Most entries link to the latest version of the paper. If one does not and you are interested in it, please email me at dmg at uvic dot ca.

–daniel morales german

1 Other references

1.1 My Google Scholar page.

1.2 My DBLP page.

2 Totals in the last 10 years

Year	Journals	Conference	Shorts/Other	Workshops	Total
2012	3	2	1		6
2011		4	2	1	7
2010		5	2	2	9
2009	3	6	1		10
2008	2	6	1		9
2007		5	1	3	9
2006	2	1	1	4	8
2005		5		3	8
2004	2	3		4	9
2003		1		5	6
2002				3	3
total	12	38	9	25	84

3 Batting Average

3.1 Journal Papers

Year	Submitted	Accepted	Rejected	Avg	Actual Reject	Notes
2012	6	3		1.000		Sub. 1 TOSEM, Sub. 1 EMSE
2011	0					1 Submitted and accepted in 2011, but will count when it is published
2010	0
2009	1	0	1	.000	0	Rej. from TOSEM; part of it was eventually published at FASE'12
2008	3	2	1	.667	0	Rej. IEEE Software Special Issue, unpublished

3.2 Full Papers at Conferences

Year	Submitted	Accepted	Rejected	Avg	Actual Reject	Notes
2012	6	2	2	.500		Rej. 2 ICSE
2011	9	4	5	.444	2*	Rej. ICSE, published at MSR (invited paper for journal); Rej. ICSE; Rej. FSE and CSCW, published at FASE'12; Rej. WCRE; Rej. ICSM
2010	7	5	2	.714	0	Rej. ICSE, published at ASE'10; Rej. CSMR, published at ICSOFT'11
2009	7	6	1	.857	0	Rej. ICSM, published at ICSE'10
2008	7	6	1	.857	1	Rej. ICSM, never published

Actual rejects are papers that have not been published yet. In many cases, a rejected paper is reworked, sent to another venue, and then accepted. Such a paper counts in the "Rejected" tally, but not in the "Actual Reject" one.
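The Avg column in these tables appears to be the fraction of decided submissions that were accepted, that is, accepted / (accepted + rejected), with papers still under review left out. A minimal Python sketch of that arithmetic follows; the function name `batting_average` is only illustrative, and the formula is inferred from the rows above rather than stated anywhere on this page.

```python
def batting_average(accepted, rejected):
    """Return accepted / (accepted + rejected), or None when nothing has been decided yet."""
    decided = accepted + rejected
    if decided == 0:
        return None
    return round(accepted / decided, 3)

# Rows taken from the full-paper table above: (year, accepted, rejected)
rows = [(2012, 2, 2), (2011, 4, 5), (2010, 5, 2), (2009, 6, 1), (2008, 6, 1)]
averages = {year: batting_average(acc, rej) for year, acc, rej in rows}
```

Under this reading, a paper that was rejected once and later accepted elsewhere lowers the average for the year of its rejection, even though it never becomes an actual reject.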

3.3 Short Papers and Tech Briefings at Conferences

Year	Submitted	Accepted	Rejected	Avg	Actual Reject	Notes
2012	1	1		1.000
2011	2	2	0	1.000	0
2010	3	2	1	.667	1	Rejected from MSR, unpublished
2009	1	1	0	1.000	0

3.4 Papers at Workshops

Year	Submitted	Accepted	Rejected	Avg
2011	1	1	0	1.0
2010	2	2	0	1.0
2009	0
2008	0
2007	3	3	0	1.0

4 Currently under review

4.1 The Evolution of the R Software Ecosystem

Daniel M German, Bram Adams and Ahmed E. Hassan.

@<dl>@<dd> Software ecosystems form the heart of modern companies' collaboration strategies with open source developers and other companies. An ecosystem consists of a core platform and a halo of user contributions that provide value to a company or project. In order to sustain the level and number of high-quality contributions, it is crucial for companies and contributors to understand how their ecosystems can be maintained successfully over time. As a first step, this paper explores the evolution characteristics of a successful, modern desktop ecosystem, i.e., the statistical computing project GNU R. We find that the ecosystem of user-contributed packages has been growing steadily since R's conception, at a significantly faster rate than core packages, yet each individual package remains stable in size. We also identified differences in the way user-contributed and core packages are able to attract an active community of users. @</dl>

4.2 Mining for API usage: Lessons Learned from a Case Study on the GNOME platform

Robles, G., J.M. Gonzalez-Barahona, Daniel M German and German Poo-Caamaño.

@<dl>@<dd> Libraries, and the APIs they offer, are elements of great importance in the software development landscape. Many software platforms and ecosystems (Java, Windows, iOS, Android, etc.) rely on them, and it is really difficult to find a software development project that does not build on a sizable number of libraries. Acknowledging this view, most previous research on software libraries has been performed from the perspective of the users of the API: developers building an application. In this paper we have switched perspective, shedding some light on the libraries themselves, and their maintenance process. We have chosen GNOME, a large software platform with over a dozen libraries and hundreds of applications relying on them. By analyzing information on how the functions in the libraries are used in the whole GNOME platform, we have learned some lessons about how those libraries are built. The results show that, using this methodology, not only can library maintainers gain much insight into the maintenance of their libraries, but platform designers will also be able to address issues related to the evolution of the whole platform. @</dl>

4.3 Peer Review on Open Source Software Projects: Parameters, Statistical Models, and Theory

Peter C Rigby, Daniel M German, Laura Cowen, Margaret-Anne Storey

@<dl>@<dd> Peer review is seen as an important quality assurance mechanism in both industrial development and the open source software (OSS) community. The techniques for performing inspections have been well studied in industry; in OSS development, peer reviews are not as well understood. To develop an empirical understanding of OSS peer review, we examine the archival records of six large, mature, successful OSS projects. We construct a series of measures based on those used in traditional inspection experiments. We measure the frequency of review, the size and complexity of the contribution under review, the level of participation during review, the experience and expertise of the individuals involved in the review, the review interval, and the number of issues discussed during review. We create statistical models of the review efficiency, review interval, and effectiveness, the issues discussed during review, to determine which measures have the largest impact on review efficacy. We conclude that OSS reviews can be described as (1) early, frequent reviews (2) of small, independent, complete contributions (3) that, despite being asynchronously broadcast to a large group of stakeholders, are reviewed by a small group of self-selected experts (4) resulting in an efficient and effective peer review technique. @</dl>

4.4 Management of Community Contributions: A Case Study on the Android and Linux Software Ecosystems

Nicolas Bettenburg, Bram Adams, Ahmed E. Hassan, Daniel M. German

@<dl>@<dd> In recent years, many companies have realized that collaboration with a thriving user or developer community is a major factor in creating innovative technology driven by market demand. As a result, businesses have sought ways to stimulate contributions from developers outside their corporate walls, and integrate external developers into their development process. To support software companies in this process, this paper presents an empirical study on the contribution management processes of two major, successful, open source software ecosystems. We contrast a for-profit (Android) system with a hybrid contribution style, and a not-for-profit (Linux) system with an open contribution style. To guide our comparisons, we base our analysis on a conceptual model of contribution management that we derived from a total of seven major open-source software systems. A quantitative comparison based on data mined from the Android code review system and Linux code review mailing lists shows that both projects have significantly different contribution management styles, suited to their respective market goals, but with individual advantages and disadvantages that are important for practitioners. @</dl>

4.5 Integration of Software Projects in Large Software Distributions

Ryan Kavanagh, Bram Adams, Ahmed E. Hassan and Daniel M. German

@<dl>@<dd> Software integrators reuse 3rd party software to create custom products. Whereas the tools and processes to acquire and reuse 3rd party software have been analyzed in depth, the activities involved with maintaining integrated software components are relatively unknown. As middle-men between customers and 3rd party developers, the integrators are responsible for keeping the 3rd party artefacts up-to-date with new versions or bug fixes, for customizing the reused artefacts according to the integrators’ needs and policies, and for reporting (and, if possible, fixing) bugs in the reused artefacts. To understand the activities of integrators in practice, and identify the best practices used to manage these activities, we empirically studied three large-scale, successful open source distributions (Debian, Ubuntu and FreeBSD). Using grounded theory, we identified six major integration activities, and documented the major tasks and best practices involved with each activity. The documented activities provide a clear understanding of the maintenance needs of integrators in general, and highlight the need for better tool and process support. @</dl>

4.6 What Goes into an Executable? Identifying a Binary’s Sources by Tracing Build Processes

@<dl>@<dd> With an increasing proliferation of open source software components and libraries in the world, intellectual property in general, and copyright law in particular, has become a critical non-functional requirement for contemporary software systems. Existing research goes a long way toward understanding these intellectual property concerns as they relate to software engineering, and important “licensing software patterns” have emerged to help practitioners implementing new systems. But little work exists on systematically detecting and resolving intellectual property issues in existing systems. In this exploratory case study of seven open source systems we look at a key sub-problem of copyright detection: how can we reverse-engineer a system’s composition? This sounds trivial, but copyright law operates on fixed works of authorship, as opposed to components. Current legal thought conservatively recommends practitioners assume a “fixed work” to be a single file. By broadly defining ‘source’ as any file that can appear in a software system, with or without transformation, be it programming code, documentation, image files, sound clips, animations, or any other work where copyright can subsist, we propose a systematic way for determining exactly which sources are in the binary. Our method records and analyzes system calls (e.g., open) made during the software system’s build process. Results from the seven case studies, which together constitute over 6,000 source files, suggest our method is very effective. @</dl>
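The idea of recovering a binary's sources from the system calls made during a build can be illustrated with a small Python sketch. It is not the tool from the paper: the trace format is an assumption modeled on strace-style output, and the regular expression, the function `files_opened`, and the sample paths are all hypothetical.

```python
import re

# Matches lines such as: open("src/main.c", O_RDONLY) = 3
# The log format is an assumption, modeled on strace-style output.
OPEN_CALL = re.compile(r'\bopen(?:at)?\((?:[^,"]+,\s*)?"([^"]+)",[^)]*\)\s*=\s*(-?\d+)')

def files_opened(trace_lines):
    """Return the set of paths that were opened successfully in the trace."""
    opened = set()
    for line in trace_lines:
        match = OPEN_CALL.search(line)
        # A negative return value means the call failed, so the file was not read.
        if match and int(match.group(2)) >= 0:
            opened.add(match.group(1))
    return opened

# Hypothetical trace of a tiny build.
sample_trace = [
    'open("src/main.c", O_RDONLY) = 3',
    'open("include/missing.h", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'openat(AT_FDCWD, "docs/logo.png", O_RDONLY) = 4',
    'read(3, "int main", 8) = 8',
]
sources = files_opened(sample_trace)
```

A real build trace would also need to follow child processes and separate files that are read from files that are written; the sketch ignores both.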

5 Journal Articles

5.1 2012

5.1.1 EMSE: Software Bertillonage: Determining the Provenance of Software Development Artifacts

To be published by Empirical Software Engineering. Julius Davies, Daniel M. German, Michael W. Godfrey and Abram Hindle. Software Bertillonage: Determining the Provenance of Software Development Artifacts. April 2012, Journal of Empirical Software Engineering. DOI 10.1007/s10664-012-9199-7. Link to the journal version

@<dl>@<dd> Deployed software systems are typically composed of many pieces, not all of which may have been created by the main development team. Often, the provenance of included components, such as external libraries or cloned source code, is not clearly stated, and this uncertainty can introduce technical and ethical concerns that make it difficult for system owners and other stakeholders to manage their software assets. In this work, we motivate the need for the recovery of the provenance of software entities by a broad set of techniques that could include signature matching, source code fact extraction, software clone detection, call flow graph matching, string matching, historical analyses, and other techniques. We liken our provenance goals to those of Bertillonage, a simple and approximate forensic analysis technique based on biometrics that was developed in 19th century France before the advent of fingerprints. As an example, we have developed a fast, simple, and approximate technique called anchored signature matching for identifying the source origin of binary libraries within a given Java application. This technique involves a type of structured signature matching performed against a database of candidates drawn from the Maven2 repository, a 275GB collection of open source Java libraries. To show the approach is both valid and effective, we conduct an empirical study on 945 jars from the Debian GNU/Linux distribution, as well as an industrial case study on 81 jars from an e-commerce application. @</dl>

5.1.2 IEEE Software A Method for Open Source License Compliance of Java Applications

Daniel German and Massimiliano di Penta, A Method for Open Source License Compliance of Java Applications, IEEE Software, Special Issue on Legal Compliance. Volume 29, Issue 3, pages 58-63. DOI 10.1109/MS.2012.50.

@<dl>@<dd> Open source license compliance (OSLC) is the process of ensuring that an organization satisfies the licensing requirements of the open source software it reuses, whether for its internal use, or as a part of a product it ships. In this paper we describe the major challenges of OSLC: component identification, provenance discovery, license identification, and licensing requirements analysis. Then, we describe Kenen, an approach that assists organizations in the OSLC of Java components. We show its effectiveness by analyzing the licensing compliance of a commercial application. @</dl>

5.1.3 IEEE Software. Open Source Peer Review – Lessons and Recommendations for Closed Source

Peter C. Rigby, Brendan Cleary, Frederic Painchaud, Margaret-Anne Storey, Daniel M. German. To be published.

@<dl>@<dd> Although the effectiveness of software inspection (formal peer review) has a long and mostly supportive history, the thought of reviewing a large, unfamiliar software artifact over a period of weeks is something dreaded by both the author and the reviewers. The dislike of this cumbersome process is natural, but neither the formality nor the aversion are fundamental characteristics of peer review. The core idea of review is simply to get an expert peer to examine your work for problems that you are blind to. The actual process is much less important for finding defects than the expertise of the people involved. @</dl>

5.2 2009

5.2.1 INFSOF Change Impact Graphs: Determining the Impact of Prior Code Changes

German, D.M., Robles, G., and Hassan, A. "Change Impact Graphs: Determining the Impact of Prior Code Changes", Journal of Information and Software Technology (INFSOF), Volume 51, Number 10, pages 1394–1408, Oct 2009.

@<dl>@<dd> The source code of a software system is in constant change. The impact of these changes spreads out across the software system and may lead to the sudden manifestation of failures in unchanged parts. To help developers fix such failures, we propose a method that, in a pre-processing stage, analyzes prior code changes to determine what functions have been modified. Next, given a particular period of time in the past, the functions changed during that period are propagated throughout the rest of the system using the dependence graph of the system. This information is visualized using Change Impact Graphs (CIGs). Through a case study based on the Apache Web Server, we demonstrate the benefit of using CIGs to investigate several real defects. @</dl>
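The propagation step described in this abstract can be pictured as a reachability search over a reversed dependence graph. The following Python sketch is an illustration only, not the implementation from the paper: the call graph, the function names, and the helper `impacted_functions` are hypothetical, and it treats "impact" as plain transitive reachability from the changed functions to their callers.

```python
from collections import deque

def impacted_functions(depends_on, changed):
    """Return every function that can reach a changed function through the dependence graph.

    depends_on maps a function name to the set of functions it calls (hypothetical data).
    """
    # Reverse the edges so we can walk from a changed function to its dependents.
    dependents = {}
    for caller, callees in depends_on.items():
        for callee in callees:
            dependents.setdefault(callee, set()).add(caller)

    # Breadth-first search starting from the changed functions.
    impacted = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, ()):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted

# Hypothetical call graph: main -> handle_request -> parse_header; log_error is independent.
graph = {
    "main": {"handle_request"},
    "handle_request": {"parse_header"},
    "parse_header": set(),
    "log_error": set(),
}
result = impacted_functions(graph, {"parse_header"})
```

In this toy graph a change to `parse_header` reaches `handle_request` and `main` but not `log_error`, which is the kind of narrowing a developer would want when a failure appears in code that was not itself modified.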

5.2.2 Computer And Graphics. Improving scans of black and white photographs by recovering the print maker's artistic intent

German, D.M. and Rigau, J. "Improving scans of black and white photographs by recovering the print maker's artistic intent", Computer And Graphics, Volume 33, Issue 4, Pages 509–520 (August 2009). Journal version

@<dl>@<dd> In this paper we propose a method that reverse engineers the aesthetic decisions made by a print maker to produce a print from a negative, namely cropping, contrast selection, and dodging-and-burning. It then re-applies this process to the electronic negative in order to achieve an electronic version of such print with better tonal range and detail than one produced by scanning the print. We then extend this method to restore a print by combining scans of different versions of the same image. @</dl>

5.2.3 JESE Macro-level software evolution: a case study of a large software compilation

Robles, G., J.M. Gonzalez-Barahona, J.J. Amor, M. Michlmayr, and D.M. German, "Macro-level software evolution: a case study of a large software compilation" Extended version of Best Paper Award at MSR 2006. Available in electronic form at: http://www.springerlink.com/content/c516h8t6l16251l5/?p=642a63b621dc475dbe64817b2f4ff32b Journal of Empirical Software Engineering, Vol. 14, No. 3. (1 June 2009), pp. 262-285.

@<dl>@<dd> With the success of libre (free, open source) software, a new type of software compilation has become increasingly common: the 'software distribution'. Software distributions group hundreds, if not thousands, of software applications and libraries, written by independent parties, into an integrated system. Software evolution studies usually focus on the evolution of a single application. The study of software distributions provides a different evolutionary point of view. In this sense, we identify a dichotomy, similar to the one found in economics: software evolution in the small (the evolution of a single application) versus software evolution in the large (the evolution of compilations of software, composed of many different individual software applications that work together to form a system). In this paper we focus on this macro view by studying the evolution of Debian GNU/Linux (the well-known Linux distribution) over a period of 9 years. For each release of Debian we downloaded and analyzed the source code of each of the applications that compose it. We then proceeded to study their evolution in terms of the number of applications included, the size of each of these applications, the programming language used, and their interdependencies. Our findings demonstrate that Debian is an interesting ecosystem composed of applications of all sizes (with a large proportion of small, and few huge ones). Some evolve rapidly, while others do so more slowly. We also found that some applications never change, while others are removed from the distribution. We have also discovered that applications are hardly isolated: they depend upon many others to function, and other applications might depend upon them. We believe that the study of distributions such as Debian can be of great interest not only for understanding their own evolution, but also as a proxy to understand the evolution of libre software, and software in general. @</dl>

5.3 2008

5.3.1 IJBIS Managing legal risks associated with intellectual property on the web

Kienle, H.H., D.M. German, S. Tilley, and H. Muller, "Managing legal risks associated with intellectual property on the web," Int. Journal of Business Information Systems, vol. 3, no. 1, 2008.

@<dl>@<dd> Intellectual property (IP) has taken a prominent place on the Web. Today's organizations (and small and medium-sized enterprises (SMEs) in particular) need to know the ways in which their Web sites can be the target of costly IP litigation. Organizations also need to know how to manage and protect their own IP that they expose through their Web presence.

This paper provides an overview of the legal risks associated with IP on the Web. Managing such risks begins with gaining a clear understanding of how to address the salient issues related to IP that any organization has to take into account when it has a Web presence or provides a service using the Web. Towards this end, a comprehensive survey of existing IP case law in the context of Web content is provided. The survey focuses on three essential IP areas: copyright, patents, and trademarks. @</dl>

5.3.2 JSME A survey and evaluation of tool features for understanding reverse-engineered sequence diagrams

C. Bennett, D. Myers, M.-A. Storey, D. M. German, D. Ouellet, M. Salois, P. Charland "A survey and evaluation of tool features for understanding reverse-engineered sequence diagrams", Journal of Software Maintenance and Evolution. Vol. 20, no. 4. 2008.

@<dl>@<dd> Sequence diagrams can be valuable aids to software understanding. However, they can be extremely large and hard to understand even using modern tool support. Consequently, providing the right set of tool features is important if the tools are to help rather than hinder the user. This paper surveys research and commercial sequence diagram tools to determine the features they provide to support program understanding. Although there has been significant effort in developing these tools, many of them have not been evaluated using human subjects. To begin to address this gap, a preliminary study was performed with a specially designed sequence diagram tool that implements the features found during the survey. Based on an analysis of the study results, we discuss the features that were found useful and relate these to the tasks performed. The paper concludes by proposing how future tools can be improved to better support the exploration of large sequence diagrams. @</dl>

5.4 2006

5.4.1 JESE An empirical study of fine-grained software modifications

German, D.M., "An empirical study of fine-grained software modifications," in Journal of Empirical Software Engineering, vol. 11, no. 3, pp. 5-22, Feb. 2006.

@<dl>@<dd> Software is typically improved and modified in small increments (we refer to each of these increments as a modification record, or MR). MRs are usually stored in a configuration management or version control system and can be retrieved for analysis. In this study we retrieved the MRs from several mature open software projects. We then concentrated our analysis on those MRs that fix defects and provided heuristics to automatically classify them. We used the information in the MRs to visualize what files are changed at the same time, and who are the people who tend to modify certain files. We argue that these visualizations can be used to understand the development stage a project is in at a given time (new features are added, or defects are being fixed), the level of modularization of a project, and how developers might interact with each other and the source code of a system. @</dl>

5.4.2 SEKE Visualizing the evolution of software using softChange

German, D.M., and A. Hindle, "Visualizing the evolution of software using softChange," in Journal of Software Engineering and Knowledge Engineering, Special Issue of Best Papers SEKE 2004, vol. 16, no. 1, pp. 4-22, Feb. 2006.

@<dl>@<dd> A typical software development team leaves behind a large amount of information. This information takes different forms, such as mail messages, software releases, version control logs, defect reports, etc. softChange is a tool that retrieves this information, analyses and enhances it by finding new relationships amongst it, and then allows users to navigate and visualize this information. The main objective of softChange is to help programmers, their management and software evolution researchers in understanding how a software product has evolved since its conception. @</dl>

5.5 2004

5.5.1 JSME Using software trails to reconstruct the evolution of software

German, D.M., "Using software trails to reconstruct the evolution of software," Journal of Software Maintenance and Evolution: Research and Practice, vol. 16, no. 6, pp. 367-384, 2004.

@<dl>@<dd> This paper describes a method to recover the evolution of a software system using its software trails: information left behind by the contributors to the development process of the product, such as mailing lists, Web sites, version control logs, software releases, documentation, and the source code. This paper demonstrates the use of this method by recovering the evolution of Ximian Evolution, a mail client for Unix. By extracting useful facts stored in these software trails and correlating them, it was possible to provide a detailed view of the history of this project. This view provides interesting insight into how an open source software project evolves and some of the practices used by its software developers. @</dl>

5.5.2 JSPIP Decentralized open source global software development, the GNOME experience

German, D.M., "Decentralized open source global software development, the GNOME experience," Journal of Software Process: Improvement and Practice, vol. 8, no. 4, pp. 201-215, 2004.

5.5.3 TODO abstract and reference to paper

@<hr>

6 Reviewed Conference Publications

6.1 2012

6.1.1 FASE: Cohesive and Isolated Development with Branches

Earl T. Barr, Christian Bird, Peter C. Rigby, Abram Hindle, Daniel M. German, and Premkumar Devanbu. Acceptance Rate 24.6% (33 of 134 full papers). FASE 2012

@<dl>@<dd> The adoption of distributed version control (DVC), such as Git and Mercurial, in open-source software (OSS) projects has been explosive. Why is this and how are projects using DVC? This new generation of version control supports two important new features: distributed repositories and histories that preserve branches and merges. Through interviews with lead developers in OSS projects and a quantitative analysis of mined data from the histories of sixty projects, we find that the vast majority of the projects now using DVC continue to use a centralized model of code sharing, while using branching much more extensively than before their transition to DVC. We then examine the Linux history in depth in an effort to understand and evaluate how branches are used and what benefits they provide. We find that they enable natural collaborative processes: DVC branching allows developers to collaborate on tasks in highly cohesive branches, while enjoying reduced interference from developers working on other tasks, even if those tasks are strongly coupled to theirs. @</dl>

6.1.2 ICSE Educational Track: Five Days of Empirical Software Engineering: the PASED Experience

Massimiliano Di Penta, Giuliano Antoniol, Daniel M. German, Yann-Gael Gueheneuc, Bram Adams. Acceptance Rate 22% (11 out of 44 papers). ICSE Education 2012.

@<dl>@<dd> Acquiring the skills to plan and conduct different kinds of empirical studies is a mandatory requirement for graduate students working in the field of software engineering. These skills typically can only be developed based on the teaching and experience of the students’ supervisor, because of the lack of specific, practical courses providing these skills. To fill this gap, we organized the first Canadian Summer School on Practical Analyses of Software Engineering Data (PASED). The aim of PASED is to provide, using a “learning by doing” model of teaching, a solid foundation to software engineering graduate students on conducting empirical studies. This paper describes our experience in organizing the PASED school, i.e., what challenges we encountered, how we designed the lectures and laboratories, and what could be improved in the future based on the participants’ feedback. @</dl>

6.2 2011

6.2.1 ISVC. A Comparative Evaluation of Feature Detectors on Historic Repeat Photography

Christopher Gat, Alexandra Branzan Albu, Daniel German, Eric Higgs: "A Comparative Evaluation of Feature Detectors on Historic Repeat Photography" Advances in Visual Computing - 7th International Symposium, ISVC 2011. Proceedings, Part II. Lecture Notes in Computer Science 6939 Springer 2011, ISBN 978-3-642-24030-0.

@<dl>@<dd> XXX To be written XXX @</dl>

6.2.2 CaE. gamutHeatMap: Visualizing the Colour Shift of Rendering Intent Transformations

Christopher Gat, Hanyu Zhang, Daniel M. German and Melanie Tory. "gamutHeatMap: Visualizing the Colour Shift of Rendering Intent Transformations", in International Symposium on Computational Aesthetics in Graphics, Visualization, and Imaging 2010. To be presented. @42% Acceptance Rate (10 out of 24 full papers)@

@<dl>@<dd> When a photograph is printed, its original colours are converted to those of the output medium using a rendering intent transformation. This process takes into consideration the colour properties of the paper and the printer used. gamutHeatMaps are visualizations that highlight the perceptual difference between a soft-proof of a photograph in the intended output medium, and its original. They can be used to compare different output media to determine the one that most accurately renders the colours of a given photograph. @</dl>

6.2.3 ICSOFT. On The Distribution Of Program Sizes

Herraiz, Israel and German, Daniel M. and Hassan, Ahmed E. (2011) On the distribution of source code file sizes. In: ICSOFT 2011 - International Conference on Software and Data Technologies, 18-21 July 2011, Seville, Spain. ICSOFT'2011. 21% acceptance rate (214 full papers submitted, 46 accepted)

@<dl>@<dd> Source code size is an estimator of software effort. Size is also often used to calibrate models and equations to estimate the cost of software. The distribution of source code file sizes has been shown in the literature to be a lognormal distribution. In this paper, we measure the size of a large collection of software (the Debian GNU/Linux distribution version 5.0.2), and we find that the statistical distribution of its source code file sizes follows a double Pareto distribution. This means that large files are to be found more often than predicted by the lognormal distribution; therefore, the previously proposed models underestimate the cost of software. @</dl>

6.2.4 MSR: Software Bertillonage: Finding the provenance of an entity

Software Bertillonage: Finding the provenance of an entity, by Julius Davies, Daniel M. German, and Michael W. Godfrey, Working Conference in Mining Software Repositories MSR'2011, pages 183-192. @32.8% acceptance rate (21 out of 61 full papers)@

@<dl>@<dd> We liken our provenance goals to those of Bertillonage, a simple and approximate forensic analysis technique based on biometrics that was developed in 19th century France before the advent of fingerprints. As an example, we have developed a fast, simple, and approximate technique called anchored signature matching for identifying library version information within a given Java application. This technique involves a type of structured signature matching performed against a database of candidates drawn from the Maven2 repository, a 150GB collection of open source Java libraries. An exploratory case study using a proprietary e-commerce Java application illustrates that the approach is both feasible and effective. @</dl>

6.3 2010

6.3.1 ICSE. Tracking the Evolution of Software Licensing: An Empirical Study

Di Penta M., German, D., Gueheneuc Y. and Antoniol, G "Tracking the Evolution of Software Licensing: An Empirical Study", International Conference on Software Engineering (ICSE 2010), @13.7% acceptance rate (52 out of 380 full papers)@

@<dl>@<dd> Free and open source software (FOSS) is distributed and made available to users under different software licenses, mentioned in FOSS code by means of licensing statements. Various factors, such as changes in the legal landscape, commercial code licensed as FOSS, or code reused from other FOSS systems, lead to evolution of licensing, which may affect the way a system or part of it can be subsequently used. Therefore, it is crucial to monitor licensing evolution. However, manually tracking the licensing evolution of thousands of files is a daunting task. After presenting several cases about the effects of licensing evolution, we argue that developers and system integrators must monitor licensing evolution and that they need an automatic approach because of the sheer size of FOSS. We propose an approach to automatically track changes occurring in the licensing terms of a system and report an empirical study of the licensing evolution of six different FOSS systems. Results show that licensing underwent frequent and substantial changes. @</dl>

6.3.2 MSR. Identifying Licensing of Jar Archives using a Code-Search Approach

Di Penta M., German, D. and Antoniol, G. "Identifying Licensing of Jar Archives using a Code-Search Approach", International Working Conference on Mining Software Repositories (MSR 2010). Pages 86–89. 31% acceptance rate (17 out of 51).

@<dl>@<dd> Free and open source software strongly promotes the reuse of source code. Some open source Java components/libraries are distributed as jar archives containing only the bytecode and some additional information. For anyone wanting to integrate such a jar into their own project, it is important to determine the license(s) of the code from which the jar archive was produced, as this affects the way that the component can be used.

This paper proposes an automatic approach to determine the license of jar archives, combining the use of a code-search engine with the automatic classification of licenses contained in textual files enclosed in the jar.

Results of an empirical study performed on 37 jars, from 17 different systems, indicate that this approach is able to successfully infer the jar licenses in over 95% of the cases, but that in many cases the license in textual files may differ from the one of the classes contained in the jar. @</dl>

6.3.3 CAe. Pannini: A New Projection for Rendering Wide Angle Perspective Images

Sharpless T., Postle B. and German D. "Pannini: A New Projection for Rendering Wide Angle Perspective Images", International Symposium on Computational Aesthetics in Graphics, Visualization, and Imaging 2010, pages 9-16. Acceptance rate 38% (14 out of 37). Please email me for a copy; the copyright agreement of the conference does not allow me to post it for 6 months after the conference.

@<dl>@<dd> The widely used rectilinear perspective projection cannot render realistic looking flat views with fields of view much wider than 70 degrees. Yet 18th century artists known as "view painters" depicted wider architectural scenes without visible perspective distortion. We have found no written records of how they did that; however, quantitative analysis of several works suggests that the key is a system for compressing horizontal angles while preserving certain straight lines important for the perspective illusion.

We show that a simple double projection of the sphere to the plane, which we call the Pannini projection, can render images 150 degrees or more wide with a natural appearance, reminiscent of Vedutismo perspective. We give the mathematical formulas for realizing it numerically, in a general form that can be adjusted to suit a wide range of subject matter and field widths, and briefly compare it to other proposed alternatives to the rectilinear projection. @</dl>

6.3.4 ICPC. Understanding and Auditing the Licensing of Open Source Software Distributions

German D., Di Penta M. and Davies J. "Understanding and Auditing the Licensing of Open Source Software Distributions", International Conference on Program Comprehension (ICPC 2010). 20% acceptance rate (15 out of 76 full papers accepted).

@<dl>@<dd> Free and open source software (FOSS) applications, libraries, and components are very commonly distributed in binary packages, often part of GNU/Linux operating system distributions, or as part of software or hardware products distributed (and potentially sold) to users.

FOSS creates great opportunities for users, developers and integrators; however, it is important for them to understand the licensing requirements of any package they use.

Determining the license of a package and assessing whether it depends on other software with incompatible licenses is not trivial. Although this task has been done in a labor intensive manner by software distributions, such as Fedora and Debian, automatic tools to perform this analysis are highly desired.

This paper proposes a method to understand licensing compatibility issues in software packages, and reports an empirical study aimed at auditing licensing issues in binary packages of the Fedora-12 GNU/Linux distribution. The objective of this study is (i) to understand how the license declared in packages is consistent with those of source code files, and (ii) to audit the licensing information of Fedora-12, highlighting cases of incompatibilities between dependent packages.

The obtained results—supported by feedback received from Fedora contributors—show that there exist many nuances in determining the license of a binary package from its source code, as well as cases of license incompatibility issues due to package dependencies.

@</dl>

6.3.5 ASE. A sentence-matching method for automatic license identification of source code files

by D.M. German, Y. Manabe and K. Inoue. @Accepted for publication at Automated Software Engineering, 2010, final version will be different from this one.@ Paper in PDF

@<dl>@<dd> The reuse of free and open source software (FOSS) components is becoming more prevalent. One of the major challenges in finding the right component is finding one that has a license that is adequate for its intended use. The license of a FOSS component is determined by the licenses of its source code files. In this paper, we describe the challenges of identifying the license under which source code is made available, and propose a sentence-based matching algorithm to do so automatically. We demonstrate the feasibility of our approach by implementing a tool named Ninka. We performed an evaluation that shows that Ninka outperforms other methods of license identification in precision and speed. We also performed an empirical study on 0.8 million source code files of Debian that highlights interesting facts about the manner in which licenses are used by FOSS. @</dl>

6.4 2009

6.4.1 WCRE. Who are Source Code Contributors and How do they Change?

Di Penta, Massimiliano and German, Daniel M. "Who are Source Code Contributors and How do they Change?", in Proc. 16th Working Conf. on Reverse Engineering, pp. 11-20, 2009. Acceptance rate 25% (20 full papers accepted from 79 submissions).

@<dl>@<dd> Determining who the copyright owners of a software system are is important, as they are the individuals and organizations that license the software to its users, and ultimately the legal entities that can enforce its licensing terms and change its license. In this paper we describe the difficulties of identifying the explicit copyright owners of a system, and those who contribute source code to it, who could potentially claim to be copyright owners of it as well.

The paper introduces a method to track the names of contributors, including those explicitly listed as copyright owners in the licensing statements of source code files. Then, it reports an empirical study performed on four open source systems (ArgoUML, Mozilla, Samba, and Squid) aimed at investigating the characteristics of their contributors and how they relate to the commits recorded in the system and the users who perform them (its committers).

Results indicate that explicit contributors and copyright owners are not necessarily the most frequent committers. Also, they are often added during larger changes than average. @</dl>

6.4.2 MSR. The Promises and Perils of Mining Git

Christian Bird, Peter C. Rigby, Earl T. Barr, David J. Hamilton, Daniel M. German, and Prem Devanbu, The Promises and Perils of Mining Git, 6th International Working Conference on Mining Software Repositories (MSR 2009), May 2009. Acceptance rate 30% (14 out of 47 full papers)

@<dl>@<dd> We are now witnessing the rapid growth of decentralized source code management (DSCM) systems, in which every developer has her own repository. DSCMs facilitate a style of collaboration in which work output can flow sideways (and privately) between collaborators, rather than always up and down (and publicly) via a central repository. Decentralization comes with both the promise of new data and the peril of its misinterpretation. We focus on git, a very popular DSCM used in high-profile projects. Decentralization, and other features of git, such as automatically recorded contributor attribution, lead to richer content histories, giving rise to new questions such as "How do contributions flow between developers to the official project repository?" However, there are pitfalls. Commits may be reordered, deleted, or edited as they move between repositories. The semantics of terms common to SCMs and DSCMs sometimes differ markedly, potentially creating confusion. For example, a commit is immediately visible to all developers in centralized SCMs, but not in DSCMs. Our goal is to help researchers interested in DSCMs avoid these and other perils when mining and analyzing git data. @</dl>

6.4.3 MSR. Code siblings: Technical and Legal Implications

German, D., Di Penta M., Gueheneuc Y. and Antoniol, G. "Code Siblings: Technical and Legal Implications", 6th International Working Conference on Mining Software Repositories (MSR 2009), May 2009. Acceptance rate 30% (14 out of 47 full papers).

@<dl>@<dd> Software systems, like islands, create environments where code is developed and evolved. They can be very close or far apart from a technical and a legal point of view. Like finches, code fragments can move from one system to another.

We investigate the effect of software licenses on the migration of source code fragments between different systems developed under different licenses. Just as wind currents between two islands may prevent seagull migration in one or both directions, we posit that licenses may prevent code migration between systems: in the presence of incompatible licenses, code cannot legally migrate between them, or may only migrate in one direction.

In this paper, we use clone detection, license mining and classification, and change history techniques to understand how code fragments—under different licenses—flow in one direction or the other between Linux and two BSD Unixes, FreeBSD and OpenBSD, and to what extent third-party code is introduced into different kernels. @</dl>

6.4.4 ICPC. Automatic Classification of Large Changes into Maintenance Categories

Hindle, A., German, D., Godfrey M., and Holt R. "Automatic Classification of Large Changes into Maintenance Categories", 17th IEEE International Conference on Program Comprehension (ICPC), 2009. (27% acceptance rate, 20 full papers accepted from 74 papers submitted).

@<dl>@<dd> Large software systems undergo significant evolution during their lifespan, yet often individual changes are not well documented. In this work, we seek to automatically classify large changes into various categories of maintenance tasks — corrective, adaptive, perfective, feature addition, and non-functional improvement — using machine learning techniques. In a previous paper, we found that many commits could be classified easily and reliably based solely on the manual analysis of the commit metadata and commit messages (i.e., without reference to the source code). Our extension is the automation of classification by training Machine Learners on features extracted from the commit metadata, such as the word distribution of a commit message, commit author, and modules modified. We validated the results of the learners via 10-fold cross validation, which achieved accuracies consistently above 50%, indicating good to fair results. We found that the identity of the author of a commit provided much information about the maintenance class of a commit, almost as much as the words of the commit message. This implies that for most large commits, the Source Control System (SCS) commit messages plus the commit author identity is enough information to accurately and automatically categorize the nature of the maintenance task. @</dl>

6.4.5 ICSE. License Integration Patterns: Dealing with Licenses Mismatches in Component-Based Development

German, D.M., and Hassan, A. "License Integration Patterns: Dealing with Licenses Mismatches in Component-Based Development", International Conference on Software Engineering (ICSE) 2009, May 2009. Acceptance rate 13%.

@<dl>@<dd> In this paper we address the problem of combining software components with different and possibly incompatible legal licenses to create a software application that does not violate any of these licenses while potentially having its own. We call this problem the "license mismatch" problem. The rapid growth and availability of Open Source Software (OSS) components with varying licenses, and the existence of more than 70 OSS licenses increases the complexity of this problem. Based on a study of 124 OSS software packages, we developed a model which describes the interconnection of components in these packages from a legal point of view. We used our model to document integration patterns that are commonly used to solve the license mismatch problem in practice when creating both proprietary and OSS applications. Software engineers with little legal expertise could use these documented patterns to understand and address the legal issues involved in reusing components with different and possibly conflicting licenses. @</dl>

6.4.6 OSS. An empirical study of the reuse of software licensed under the GNU General Public License

German, D. and Gonzalez-Barahona J. "An empirical study of the reuse of software licensed under the GNU General Public License". IFIP Open Source Ecosystems: Diverse Communities Interacting, Advances in Information and Communication Technology, Volume 299/2009. Pages 185-198. Acceptance rate 49% (29 out of 59). Publisher's version.

@<dl>@<dd> Software licensing is a complex issue in free and open source software (FOSS), especially when it involves the redistribution of derived works. The creation of derivative works from components with different FOSS licenses poses complex challenges, particularly when one of the components is licensed under the terms of one of the versions of the GNU General Public License (GPL). This paper describes an empirical study of the manner in which GPL-licensed software is combined with components under different FOSS licenses. We have discovered that FOSS software developers have found interesting methods to create derivative works with GPLed software that legally circumvent the apparent restrictions of the GPL. In this paper we document these methods and show that FOSS licenses interact in complex and unexpected ways. In most of these cases the goal of the developers (both licensors and licensees) is to further increase the commons of FOSS. @</dl>

6.5 2008

6.5.1 MSR. What do large commits tell us? A taxonomical study of large commits

Hindle, A., German, D.M. and Holt, R. "What do large commits tell us? A taxonomical study of large commits", 5th International Working Conference on Mining Software Repositories (MSR 2008), May 2008, pages 99-108. Acceptance rate 40%.

@<dl>@<dd> Research in the mining of software repositories has frequently ignored commits that include a large number of files (we call these large commits). The main goal of this paper is to understand the rationale behind large commits, and if there is anything we can learn from them. To address this goal we performed a case study that included the manual classification of large commits of nine open source projects. The contributions include a taxonomy of large commits, which are grouped according to their intention. We contrast large commits against small commits and show that large commits are more perfective while small commits are more corrective. These large commits provide us with a window on the development practices of maintenance teams. @</dl>

6.5.2 CAe. Improving scans of black and white photographs by recovering the print maker's artistic intent

German, D.M. "Improving scans of black and white photographs by recovering the print maker's artistic intent", Computational Aesthetics in Graphics, Visualization, and Imaging (CaE’08), pages 99-106, June 2008. This paper was invited to the special issue of Best Papers of CaE’08, to appear in Computers and Graphics.

6.5.3 SCAM. Change Impact Graphs: Determining the Impact of Prior Code Changes

German, D.M., Robles, G., and Hassan, A. "Change Impact Graphs: Determining the Impact of Prior Code Changes", International Conference on Source Code Analysis and Manipulation (SCAM) 2008, pages 184-193, Sept. 2008. Acceptance rate 41%. Shortlisted for best paper award and invited to the special issue of Best Papers of SCAM’08, to appear in the Journal of Information and Software Technology.

@<dl>@<dd> The source code of a software system is in constant change. The impact of these changes spreads out across the software system and may lead to the sudden manifestation of failures in unchanged parts. To help developers fix such failures, we propose a method that, in a pre-processing stage, analyzes prior code changes to determine what functions have been modified. Next, given a particular period of time in the past, the functions changed during this period are propagated throughout the rest of the system using the dependence graph of the system. This information is visualized using Change Impact Graphs (CIGs). Through a case study based on the Apache Web Server, we demonstrate the benefit of using CIGs to investigate several real defects. @</dl>

6.5.4 ICSM. Remixing visualization to support collaboration in software maintenance

Storey, M.A.; Bennett, C.; Bull, R. I.; German, D.M.; "Remixing visualization to support collaboration in software maintenance". International Conference of Software Maintenance and Evolution (ICSM’08), Frontiers of Software Maintenance, 2008 (FoSM 2008). Pages 139-148. Invited paper.

@<dl>@<dd> We propose that collaborative software visualization can improve team software maintenance work. We first provide a brief review of how visualization can support software maintenance from the perspectives of system understanding, process understanding and software evolution. From this review, we conclude that visualization tools are rarely designed to provide explicit support for collaborative authoring and sharing of views. We then provide an overview of research from Computer Supported Cooperative Work (CSCW), and propose that this research should be applied to software visualization. We explore both the opportunities and challenges this research focus presents and conclude that more attention paid to the social aspects of software visualization should improve both individual and team processes in software maintenance. @</dl>

6.5.5 ICSM. The past, present, and future of software evolution

Godfrey, Michael W.; German, Daniel M.; "The past, present, and future of software evolution", International Conference of Software Maintenance and Evolution (ICSM’08), Frontiers of Software Maintenance, 2008 (FoSM 2008). Pages 129-138. Invited paper.

@<dl>@<dd> Change is an essential characteristic of software development, as software systems must respond to evolving requirements, platforms, and other environmental pressures. In this paper, we discuss the concept of software evolution from several perspectives. We examine how it relates to and differs from software maintenance. We discuss insights about software evolution arising from Lehman's laws of software evolution and the staged lifecycle model of Bennett and Rajlich. We compare software evolution to other kinds of evolution, from science and social sciences, and we examine the forces that shape change. Finally, we discuss the changing nature of software in general as it relates to evolution, and we propose open challenges and future directions for software evolution research. @</dl>

6.5.6 ICSE. Open source software peer review practices: a case study of the Apache server

Rigby, P., D.M. German, and M. A. Storey, "Open source software peer review practices: a case study of the Apache server", in ACM/IEEE Int. Conf. on Software Engineering (ICSE'08), 2008, pages 541-550. Acceptance rate 15% (56 of the 371 technical papers).

@<dl>@<dd> Peer review is seen as an important quality assurance mechanism in both industrial development and the open source software (OSS) community. The techniques for performing inspections have been well studied in industry; in OSS development, peer reviews are less well understood. We examine the two peer review techniques used by the successful, mature Apache server project: review-then-commit and commit-then-review. Using archival records of email discussion and version control repositories, we construct a series of metrics that produces measures similar to those used in traditional inspection experiments. Specifically, we measure the frequency of review, the level of participation in reviews, the size of the artifact under review, the calendar time to perform a review, and the number of reviews that find defects. We provide a comparison of the two Apache review techniques as well as a comparison of Apache review to inspection in an industrial project. We conclude that Apache reviews can be described as (1) early, frequent reviews (2) of small, independent, complete contributions (3) conducted asynchronously by a potentially large, but actually small, group of self-selected experts (4) leading to an efficient and effective peer review technique. @</dl>

6.6 2007

6.6.1 CAe. Flattening the viewable sphere

German, D.M., L. Burchill, A. Duret-Lutz, S. Pérez-Duarte, E. Pérez-Duarte, and J. Sommers, "Flattening the viewable sphere" (artistic paper), in Computational Aesthetics in Graphics, Visualization, and Imaging, 2007 (CAe 2007), (D. W. Cunningham, G. Meyer, L. Neumann, A. Dunning, and R. Paricio, Eds.), pp. 23-28. Eurographics Association, June 2007.

@<dl>@<dd> The viewable sphere corresponds to the space that surrounds us. The evolution of photography and panoramic software and hardware has made it possible for anybody to capture the viewable sphere. It is now up to the artist to determine what can be done with this raw material. In this paper we explore the underdeveloped field of flat panoramas from an artistic point of view. We argue that its future lies in the exploration of conformal mappings, specialized software, and the interaction of its practitioners via the Internet. @</dl>

6.6.2 CAe. New methods to project panoramas for practical and aesthetic purposes

German, D.M., Pablo d'Angelo, Michael Gross, and Bruno Postle, "New methods to project panoramas for practical and aesthetic purposes," in Computational Aesthetics in Graphics, Visualization, and Imaging, 2007 (CAe 2007), (D. W. Cunningham, G. Meyer, L. Neumann, A. Dunning, and R. Paricio, Eds.), pp. 13-22. Eurographics Association, June 2007.

@<dl>@<dd> Recent advances in digital photomontage have simplified the creation of extreme wide-angle views from a vantage point, including the recreation of the entire sphere (we will refer to these types of images as panoramas). In order to minimize the distortion from the point of view of the viewer, panoramas have typically been presented using curved displays (such as the original panoramas, by Barker, in 1787; or several cinematographic systems, such as Circle-Vision 360, still in use), and more recently with the help of the computer (such as the QuickTime VR format). Unfortunately, requiring such systems restricts their use, and little research has been done on the representation of panoramas on a flat surface. In this paper we propose the use of several geographic map projections to project a panorama onto a flat surface, both for realistic purposes (where the projection can be easily accepted as a faithful representation of the original image) and for artistic purposes (where the projection is used as an artistic tool intended for the creation of an innovative interpretation of the panorama). Finally, we explore the use of inclinometers and map projections to automatically project an image from a wide-angle lens (rectilinear or fisheye) into a new image that is more aesthetically pleasing.

We believe the projections discussed in this paper will be useful to photographers, artists, and the designers of virtual reality environments, all of whom might need to display images with a wide field of view. @</dl>

6.6.3 WCRE. A model to understand the building and running inter-dependencies of software

German, D.M., J.M. Gonzalez-Barahona, and G. Robles, "A model to understand the building and running inter-dependencies of software," in Proc. 14th Working Conf. on Reverse Engineering, pp. 130-139, 2007. Acceptance rate 31% (27 out of 87 technical papers)

@<dl>@<dd> The notion of functional or modular dependency is fundamental to understanding the architecture and inner workings of any software system. In this paper, we propose to extend that notion to consider dependencies at a larger scale, between software applications (usually programs or libraries themselves). These dependencies, which we call inter-dependencies, are of exceptional importance in free and open source software (FOSS), where it is common to build new applications by taking advantage of a rich and complex environment of programs and libraries whose functionality is available. To explore this concept, a methodology and visualization for studying inter-dependencies of a complex software system is presented and applied to one of the largest distributions of FOSS: Debian GNU/Linux. @</dl>

6.6.4 ICSM. On the prediction of the evolution of libre software projects

Herraiz, I., J.M. Gonzalez-Barahona, G. Robles, and D.M. German, "On the prediction of the evolution of libre software projects," in 23rd IEEE Int. Conf. on Software Maintenance (ICSM'07), 2007. Acceptance rate 21% (46 of 214 full papers).

@<dl>@<dd> Libre (free / open source) software development is a complex phenomenon. Many actors (core developers, casual contributors, bug reporters, patch submitters, users, etc.), in many cases volunteers, interact in complex patterns without the constraints of formal hierarchical structures or organizational ties. Understanding this complex behavior in enough detail to build explanatory models suitable for prediction is an open challenge, and few results have been published to date in this area. Therefore statistical, non-explanatory models (such as the traditional regression model) have a clear role, and have been used in some evolution studies. Our proposal goes in this direction, but uses a model that we have found more useful: time series analysis. Data available from the source code management repository is used to compute the size of the software over its past life, and this information is used to estimate the future evolution of the project. In this paper we present this methodology and apply it to three large projects, showing how in these cases predictions are more accurate than regression models, and precise enough to estimate with little error their near future evolution. @</dl>

6.6.5 WCRE. Visualizing software architecture evolution using change-sets

McNair, A., D.M. German, and J. Weber-Jahnke, "Visualizing software architecture evolution using change-sets," in Proc. 14th Working Conf. on Reverse Engineering, pp. 140-149, 2007. Acceptance rate 31% (27 out of 87 technical papers).

@<dl>@<dd> When trying to understand the evolution of a software system it can be useful to visualize the evolution of the system's architecture. Existing tools for viewing architectural evolution assume that what a user is interested in can be described in an unbroken sequence of time, for example the changes over the last six months. We present an alternative approach that provides a lightweight method for examining the net effect of any set of changes on a system's architecture. We also present Motive, a prototype tool that implements this approach, and demonstrate how it can be used to answer questions about software evolution by describing case studies we conducted on two Java systems. @</dl>

6.7 2006

6.7.1 BioFOSS: a survey of free/open source software in Bioinformatics

Shabawa, K., and D.M. German, "BioFOSS: a survey of free/open source software in Bioinformatics", 2006 IEEE Symp. on Computer Based Medical Systems, pp. 861-866, June 2006.

@<dl>@<dd> This paper discusses the current state of free/open source software (F/OSS) projects in the field of academic bioinformatics. The paper reports on a survey of the Bioinformatics journal that enumerates the number of Application Notes published between volumes 2004-20-17 and 2005-21-7. The purpose of this survey is to determine what percentage of bioinformatics applications are made available under open source licenses. Bioinformatics includes tools, databases, and organizations to support them. An overview is given for the EMBOSS project, the Open Bioinformatics Foundation, and GenBank. In addition, a short discussion of Linux distributions tailored to the needs of bioinformaticians is provided. @</dl>

6.8 2005

6.8.1 Metrics. Measuring fine-grained change: towards modification aware change metrics

German, D.M., and A. Hindle, "Measuring fine-grained change: towards modification aware change metrics," in Metrics ’05: 11th IEEE Int. Metrics Symp., p. 28 (10 pages), Sept. 2005.

@<dl>@<dd> In this paper we propose the notion of change metrics, those that measure change in a project or its entities. In particular we are interested in measuring fine-grained changes, such as those stored by version control systems (such as CVS). A framework for the classification of change metrics is provided. We discuss the idea of change metrics which are modification aware, that is, metrics which evaluate the change itself and not just the change in a measurement of the system before and after the change. We then provide examples of the use of these metrics on two mature projects. @</dl>

6.8.2 ICWE. A system of patterns for web navigation

Akanda, M.A.K., and D.M. German, "A system of patterns for web navigation," in Web Engineering: 5th Int. Conf. ICWE 2005, pp. 136-141, July 2005.

@<dl>@<dd> In this paper we propose a system of design patterns for Web navigation. We have collected patterns already published in the literature, selected ten of them, refined them and identified the relationships among them. The selected patterns are rewritten in the Gang of Four (GoF) notation. They are implemented and integrated together leading to a framework intended to be used as the central part in developing data intensive Web applications. @</dl>

6.8.3 Gild: an open-source environment to teach programming to novices

Rigby, P.C., D. Cubranic, S. Thompson, D.M. German, and M.-A. Storey, "Gild: an open-source environment to teach programming to novices," in Open Educational Symp. of 1st Int. Conf. on Open Source Systems, pp. 338-340, July 2005.

6.8.4 Experiences teaching a graduate course in open source software engineering

German, D.M., "Experiences teaching a graduate course in open source software engineering," in Open Educational Symp. of 1st Int. Conf. on Open Source Systems, pp. 326-329, July 2005.

6.8.5 SoftVis. On the use of visualization to support awareness of human activities in software development: a survey and a framework

Storey, M.-A., D. Cubranic, and D.M. German, "On the use of visualization to support awareness of human activities in software development: a survey and a framework," in Proc. of the 2nd ACM Symp. on Software Visualization, pp. 193-202, May 2005.

6.9 2004

6.9.1 SIGDOC. Intellectual property aspects of web publishing

Kienle, H.M., D.M. German, S. Tilley, and H.A. Müller, "Intellectual property aspects of web publishing," in SIGDOC ’04: Proc. of the 22nd Annual Int. Conf. on Design of Communication, ACM Press, pp. 136-144, 2004.

6.9.2 SEKE. Visualizing the evolution of software using softChange

German, D.M., A. Hindle, and N. Jordan, "Visualizing the evolution of software using softChange," in Proc. SEKE 2004, 16th Int. Conf. on Software Engineering and Knowledge Engineering, Knowledge Systems Institute, 3420 Main St., Skokie IL 60076, USA, pp. 336-341, June 2004.

6.9.3 ICSM. An empirical study of fine-grained software modifications

German, D.M., "An empirical study of fine-grained software modifications," in 20th IEEE Int. Conf. on Software Maintenance (ICSM’04), Sept. 2004.

Software is typically improved and modified in small increments. These changes are usually stored in a configuration management or version control system and can be retrieved. In this paper we retrieved each individual modification made to a mature software project and proceeded to analyze them. We studied the characteristics of these Modification Requests (MRs), the interrelationships of the files that compose them, and their authors. We propose several metrics to quantify MRs, and use these metrics to create visualization graphs that can be used to understand the interrelationships.

6.10 2003

6.10.1 ICWE Partitioning the navigational model: a component-driven approach

Kerr, S., and D.M. German, "Partitioning the navigational model: a component-driven approach," in Proc. of the Int. Conf. on Web Engineering, July 2003.


7 Workshops

7.1 2011

7.1.1 Web2SE Towards Understanding Twitter Use in Software Engineering: Preliminary Findings, Ongoing Challenges and Future Questions

Gargi Bougie, Jamie Starke, Margaret-Anne Storey and Daniel M. German. Towards Understanding Twitter Use in Software Engineering: Preliminary Findings, Ongoing Challenges and Future Questions. Second International Workshop on Web 2.0 for Software Engineering (Web2SE 2011)

7.2 2010

7.2.1 Lawful Software Engineering

D. German, M. Di Penta, and J. Weber-Jahnke. Lawful Software Engineering, 2010 FSE/SDP Workshop on the Future of Software Engineering Research, to appear, 2010.

Legislation is constantly affecting the way in which software developers can create software systems, and deliver them to their users. This raises the need for methods and tools that support developers in the creation and re-distribution of software systems with the ability of properly coping with legal constraints. We conjecture that legal constraints are another dimension software analysts, architects and developers have to consider, making them an important area of future research in software engineering.

7.2.2 RESER. Beyond Replication: An example of the potential benefits of replicability in the Mining Software Repositories Community

G. Robles and D. M. German, Beyond Replication: An example of the potential benefits of replicability in the Mining Software Repositories Community, 1st International Workshop on Replication in Empirical Software Engineering Research (RESER), 2010.

While in theory mining software repositories is an area where replication is easier to perform than in other empirical software engineering fields, a review of papers presented at the MSR workshop/working conference shows that the research studies presented do not satisfy the requirements for an easy replication. In this paper, we present some possibilities that replicability may provide to this community that go beyond the verification of results presented in the original study.

7.3 2007

7.3.1 PCODA. Working with 'monster' traces: building scalable, usable sequence viewer

Bennett, C., D. Myers, M.-A. Storey, and D. German, "Working with 'monster' traces: building scalable, usable sequence viewer," in 3rd Int. Workshop on Program Comprehension through Dynamic Analysis (PCODA'07), 2007.

7.3.2 TOSS. In what do you trust when you trust? the importance of dependencies in trust analysis

German, D.M., J.M. Gonzalez-Barahona, and G. Robles, "In what do you trust when you trust? the importance of dependencies in trust analysis," Proc. of the 1st Int. Workshop on Trust in Open Source Software (TOSS), 2007.

7.3.3 MSR. Using software distributions to understand the relationship among free and open source software projects

German, D.M., "Using software distributions to understand the relationship among free and open source software projects", in 4th Int. Workshop on Mining Software Repositories (MSR 2007), May 2007. Acceptance rate 38% (15 of 39 full papers).

Success in the open source software world has been measured in terms of metrics such as number of downloads, number of commits, number of lines of code, number of participants, etc. These metrics tend to discriminate against applications that are small and tend to evolve slowly. A problem is, however, how to identify applications in these latter categories that are important. Software distributions specify the dependencies needed to build and to run a given software application. We use this information to create a dependency graph of the applications contained in such a distribution. We explore the characteristics of this graph, and use it to define some metrics to quantify the dependencies (and dependents) of a given software application. We demonstrate that some applications that are invisible to the final user (such as libraries) are widely used by end-user applications. This graph can be used as a proxy to measure success of small, slowly evolving free and open source software.

7.4 2006

7.4.1 The challenges of automated quantitative analysis of open source software projects

German, D.M., G. Robles, and J.M. Gonzalez-Barahona, "The challenges of automated quantitative analysis of open source software projects," Workshop on Evaluation Frameworks for Open Source Software (EFOSS), Apr. 2006.

7.4.2 Using evolutionary annotations from change logs to enhance program comprehension

German, D.M., P. Rigby, and M.-A. Storey, "Using evolutionary annotations from change logs to enhance program comprehension," 3rd Int. Workshop on Mining Software Repositories (MSR 2006), pp. 159-162, May 2006.

7.4.3 The flow of knowledge in free and open source communities

German, D.M., "The flow of knowledge in free and open source communities," 2nd Int. Workshop in Supporting Knowledge Collaboration in Software Development (KCSD 2006), Sept. 2006.

7.4.4 The challenges of quantitative analysis of open source software projects

German, D.M., G. Robles, and J.M. Gonzalez-Barahona, "The challenges of quantitative analysis of open source software projects," EFOSS Workshop at the 2nd Open Source Conf., June 2006.

7.5 2005

7.5.1 A formal model and a query language for source control repositories

Hindle, A., and D.M. German, "A formal model and a query language for source control repositories," in 2nd Int. Workshop on Mining Software Repositories, pp. 100-105, May 2005.

7.5.2 Towards understanding requirements management in a special case of global software development: A study of asynchronous communication in the Open Source Community

Izquierdo, L., D. Damian, and D.M. German, "Towards understanding requirements management in a special case of global software development: A study of asynchronous communication in the Open Source Community," in Proc. of Int. Workshop on Distributed Software Development, pp. 171-185, 2005.

7.5.3 A framework for describing and understanding mining tools in software development

German, D.M., D. Cubranic, and M.-A. Storey, "A framework for describing and understanding mining tools in software development," in 2nd Int. Workshop on Mining Software Repositories, pp. 95-99, May 2005.

7.6 2004

7.6.1 Legal concerns of Web site reverse engineering

Kienle, H.M., D.M. German, and H.A. Müller, "Legal concerns of web site reverse engineering," in Proc. of the Int. Workshop in Web Site Evolution, pp. 41-50, Sept. 2004.

7.6.2 Developing market support within Eclipse

Myers, D., E. Hargreaves, J. Ryall, S. Thompson, M. Burgess, D.M. German, and M.-A. Storey, "Developing market support within Eclipse," in Proc. of OOPSLA Workshop on Eclipse Technology eXchange, Sept. 2004.

7.6.3 Mining CVS repositories, the softChange experience

German, D.M., "Mining CVS repositories, the softChange experience," in 1st Int. Workshop on Mining Software Repositories, pp. 17-21, 2004.

7.6.4 Using CVS historical information to understand how students develop software

Liu, Y., E. Stroulia, K. Wong, and D. German, "Using CVS historical information to understand how students develop software," in 1st Int. Workshop on Mining Software Repositories, pp. 32-36, May 2004.

7.7 2003

7.7.1 GNOME, a case of open source global software development

German, D.M., "GNOME, a case of open source global software development," in Proc. of the Int. Workshop on Global Software Development, May 2003.

7.7.2 Improving the usability of Eclipse for novice programmers

Storey, M.-A., J. Michaud, M. Mindel, M. Sanseverino, D. Damian, D. Myers, D. German, and E. Hargreaves, "Improving the usability of Eclipse for novice programmers," in Proc. of the OOPSLA Workshop on Eclipse Technology eXchange, pp. 35-39, Oct. 2003.

7.7.3 Automating the measurement of open source projects

German, D.M., and A. Mockus, "Automating the measurement of open source projects," in Proc. of the 3rd Workshop on Open Source Software Engineering, May 2003.

7.7.4 Adopting GILD: an integrated learning and development environment for programming

Storey, M.-A., M. Sanseverino, D.M. German, D. Damian, A. Damian, J. Michaud, A. Murray, R. Lintern, and J. Chisan, "Adopting GILD: an integrated learning and development environment for programming," in Proc. of the 3rd Int. Workshop on Adoption-Centric Software Engineering, May 2003.

7.7.5 ICWE. A component-oriented framework for the implementation of navigational design patterns

Akanda, M.A.K., and D.M. German, "A component-oriented framework for the implementation of navigational design patterns," in Proc. of the Int. Conf. on Web Engineering, pp. 449-450, July 2003.

7.8 2002

7.8.1 Using HadeZ to formally specify the Web museum of the National Gallery of Art

German, D.M., "Using HadeZ to formally specify the Web museum of the National Gallery of Art," in 2nd Int. Workshop on Web Oriented Software Technology, June 2002.

7.8.2 Architectural patterns for data mediation in web-centric information systems

Jahnke, J., D. German, and J. Wadsack, "Architectural patterns for data mediation in web-centric information systems," in Proc. of the ’02 ICSE Web Engineering Workshop, May 2002.

7.8.3 The evolution of the GNOME Project

German, D., "The evolution of the GNOME Project," in Proc. of the 2nd Workshop on Open Source Software Engineering, May 2002.


8 Short papers and other publications

8.1 2012

8.1.1 Open Source License Compliance

Daniel M. German and Massimiliano Di Penta. ICSE Technical Briefings proposal.

Open source license compliance (OSLC) is the process of ensuring that an organization satisfies the licensing requirements of the open source software it reuses, whether for its internal use, or as a part of a product it ships. In this paper we describe the major challenges of OSLC: architectural analysis, component identification, provenance discovery, license identification, and licensing requirements analysis. Then, we describe Kenen, an approach that assists organizations in the OSLC of Java components. We show its effectiveness by analyzing the licensing compliance of one open source and one commercial application.

8.2 2011

8.2.1 Source code licensing as an essential aspect of modern software development

Daniel M. German and Massimiliano Di Penta. FSE Technical Briefings proposal.

8.2.2 Apples Vs. Oranges? An exploration of the challenges of comparing the source code of two software systems

Daniel M. German and Julius Davies. Apples Vs. Oranges? An exploration of the challenges of comparing the source code of two software systems. International Working Conference on Mining Software Repositories (MSR 2011). MSR Challenge Report. Best Challenge Report.

8.3 2010

8.3.1 A Comparative Exploration of FreeBSD Bug Lifetimes

Gargi Bougie, Christoph Treude, Daniel M. German, and Margaret-Anne Storey. A Comparative Exploration of FreeBSD Bug Lifetimes. MSR Challenge, International Working Conference on Mining Software Repositories (MSR 2010). Pages 106–109.

In this paper, we explore the viability of mining the basic data provided in bug repositories to predict bug lifetimes. We follow the method of Lucas D. Panjer as described in his paper, Predicting Eclipse Bug Lifetimes. However, in place of Eclipse data, the FreeBSD bug repository is used. A comparative approach is taken to explore the difference in bug lifetimes and prediction accuracy between the Eclipse and FreeBSD projects. In addition, we question whether there is a more informative way of classifying bugs than is considered by current bug tracking systems.

8.3.2 Perspectives on Bugs in the Debian Bug Tracking System

Julius Davies, Hanyu Zhang, Lucas Nussbaum and Daniel M. German. Perspectives on Bugs in the Debian Bug Tracking System. MSR Challenge, International Working Conference on Mining Software Repositories (MSR 2010). Pages 86–89.

Bugs in Debian differ from regular software bugs. They are usually associated with packages, instead of software modules. They are caused and fixed by source package uploads instead of code commits. And Debian bugs deviate from conventional bugs in one other surprising aspect: a high bug-frequency might not indicate poor quality.

8.4 2009

8.4.1 Code Siblings: Phenotype Evolution

German, D., Di Penta, M., Antoniol, G. and Gueheneuc, Y. "Code Siblings: Phenotype Evolution", in 3rd International Workshop on Software Clones, IWSC'2009.

8.5 2008

8.5.1 Towards a simplification of the bug report form in eclipse

Herraiz, I., German, D.M., Gonzalez-Barahona, J. and Robles, G. "Towards a simplification of the bug report form in eclipse", Challenge Report, 5th International Working Conference on Mining Software Repositories (MSR 2008), May 2008, pp. 145-148.

8.6 2007

8.6.1 Intellectual property for software engineers

German, D.M., Tutorial: "Intellectual property for software engineers," in Proc. 14th Working Conf. on Reverse Engineering, page 297, 2007.

8.7 2006

8.7.1 A study of the contributors of PostgreSQL

German, D.M., "A study of the contributors of PostgreSQL," in 3rd Int. Workshop on Mining Software Repositories - MSR Challenge Reports (MSR 2006), received Best Challenge Report Award, May 2006.


9 Unpublished

9.1 A needle in the stack: efficient clone detection for huge collections of source code

by Simone Livieri, Daniel M. German, Katsuro Inoue. Under review.

One of the important uses of source code clone detection analysis is plagiarism detection, where a file is compared against a known corpus of source code to try to find potential matches. As the availability of Free and Open Source Software (FOSS) continues to increase it has become important to know if specific source code has been created from copies of FOSS software. Version 5.0.2 of Debian GNU/Linux contains approximately 323 million SLOC, distributed in approximately 1.45 million files. Current clone detection tools are incapable of dealing with a corpus of this size, and either can take literally months to complete a detection run, or simply crash due to lack of resources. In this paper we propose an efficient token-based method to detect clones of a source code file against a known corpus of source code that is time and space efficient. With an empirical study, we demonstrate that our method is capable of finding clones of a file in a corpus of 100,000 source code files in a few seconds.


10 Papers pre 2001.

The following are papers I published during my PhD.

– German, D., and D. Cowan, "Towards a unified catalog of hypermedia design patterns," in Proc. of the 33rd Hawaii Int. Conf. on System Sciences, Hawaii, U.S.A., Jan. 2000.

– German, D., and D. Cowan, "Three hypermedia design patterns," in 2nd Workshop in Hypermedia Design: Design Patterns in Hypermedia, Darmstadt, Germany, Feb. 1999.

– Fraser, B., J. Roberts, G. Pianosi, P. Alencar, D. Cowan, D. German, and L. Nova, "Dynamic Views of SGML Tagged Documents," in Proc. of the 7th Annual Int. Conf. of Computer Documentation (SIGDOC-99), pp. 93-98, New Orleans, U.S.A., Sep. 1999.

– German, D., and D. Cowan, "Formalizing the specification of web applications," in Advances in Conceptual Modelling: ER'99 Workshops on Evolution and Change in Data Management, Reverse Engineering in Information Systems, and the World Wide Web and Conceptual Modeling (P. Chen, D. Embley, J. Kouloumdjian, S. W. Liddle, and J. Roddick, eds.), vol. 1727 of Lecture Notes in Computer Science, pp. 281-292, Springer Verlag, (Paris, France), Nov. 1999.

– German, D., D. Cowan, and P. Alencar, "A framework for formal design of hypertext applications," in 4th Brazilian Symp. on Multimedia and Hypermedia Systems, Rio de Janeiro, Brazil, Jul. 1998.

– German, D., and D. Cowan, "A formal approach to the specification of hypermedia applications," in 1st Workshop in Hypermedia Design, Pittsburgh, U.S.A., Aug. 1998.

– German, D., and D. Cowan, "Hypermedia design patterns," in 7th Mini Euro Conf. on Decision Support Systems, Groupware, Multimedia and Electronic Commerce, (Brugge, Belgium), Apr. 1997.

– German, D., and D. Cowan, "Towards the definition of semantic hyperstructures to allow reader-defined instantiation of hypertext systems," in Proc. of the First Workshop on Flexible Hypertext, (Southampton, England), Apr. 1997.

– German, D., D. Cowan, and E. Mackie, "LivePAGE - a multimedia database system to support world-wide web development," in Proc. of the 2nd Australian Document Computing Symp., (Melbourne, Australia), Aug. 1997.

– German, D., and D. Cowan, "SGML-Lite - an SGML-based programming environment for literate programming", Proc. of the Int. Symp. on Applied Corporate Computing (ISACC) 96, Monterrey, Mexico, Oct. 1996.

– German, D., and D. Cowan, "A federated database for hypermedia for the WWW," in Cooperative Databases Applications, Proc. of Int. Symp. on Cooperative Database Systems for Advanced Applications, (Y. Kambayashi and K. Yokota, eds.), pp. 178-181, Kyoto University, World Scientific, Dec. 1996.

– German, D., and A. López-Ortiz, "A multi-cooperative push-caching scheme for WWW," Poster, 5th Int. World Wide Web Conf., (Paris, France), May 1996.

– German, D., and D. Cowan, "Experiments with the Z interchange format and SGML," in ZUM '95, the Z formal specification notation: the 9th Int. Conf. of Z Users, vol. 967 of Lecture Notes in Computer Science, pp. 224-233, Springer-Verlag, (Limerick, Ireland), Sep. 1995.

– German, D., D. Cowan, A. von Staa, and C. Lucena, "Enhancing code readability using SGML," in Proc. of the 1994 Int. Conf. in Software Maintenance, pp. 181-190, Victoria, B.C., Canada, Sep. 1994.

– German D. Morales, "An SGML based literate programming environment," in Proc. of the 1994 CAS Conf., Toronto, Nov. 1994.
