Open Source Software (OSS) development projects are online communities in which voluntary and heterogeneous software developers collaborate and exchange knowledge as they develop and enhance software products. OSS communities represent a fluid form of organizing: their members, the interactions among those members, and their outputs change constantly over time. In a sense, this fluidity provides the social basis for the generativity often observed in such communities, which produce surprising and unpredictable innovations. Yet while these new forms of fluid organizing reflect the dynamic nature of OSS communities, the communities also maintain a certain stability: they retain distinguishable identities, membership, and norms. How can we understand the simultaneous presence of dynamic and stable structures in open source communities? This is the question we seek to answer in this research.
Prior studies on OSS have emphasized development and coordination among developers from a structuralist perspective. These studies discuss the coordination mechanisms that support knowledge exchange among members who are dispersed in location and time. The literature has also examined principles of self-governance in OSS communities, such as authority, roles, and rules. However, limited scholarly effort in this regard has been extended to more fluid organizing settings such as OSS communities. Faraj et al. conceptualize online communities as “fluid organizational objects that are simultaneously morphing and yet retaining a recognizable shape” and highlight the importance of examining the dynamic changes in such fluid organizational objects. The ever-changing nature of OSS communities is embodied in the continuous changes in participating developers (different actors, or changes in the attributes of the same developers) and in the continuous evolution of artifacts (the software). At every moment, this flux affects interactions among members, the boundaries and norms of the communities, and the foci and goals of the OSS projects.
This paucity is probably due to the difficulty of capturing the stability of routines while accounting for the inherent fluidity of these settings, using concepts and methods designed for relatively stationary organizational routines. In the organizational science literature, organizational routines have been studied extensively in stable organizational settings, since they provide a way to understand organizational design, management, and change, and to predict outcomes. In this paper, we propose a routine-based approach to this paradox. In particular, we combine sequence analysis techniques with clustering analysis techniques to identify a set of routines that are enacted in open source communities. An important theoretical and methodological novelty of this work is a way to identify routine variants and their clusters as a means of resolving the tension between the stability and fluidity that coexist in these communities.
Specifically, our approach involves three steps. In the first step, we create, for each OSS project, a sequence of the actions carried out by its developers. Our study includes 12 types of actions, such as pushing a commit, opening an issue, or posting a comment. In the second step, we use a sub-sequence mining technique to detect frequently repeated sub-sequences in the set of sequences created in the first step. This technique identifies every identical sub-sequence that is shared by at least a given percentage of the action sequences. Each detected sub-sequence is conceptualized as an event, or a routine enactment, and each instance of that sub-sequence in a sequence of actions is an occurrence of that event. In the last step, we use clustering analysis to group these sub-sequences into several routine families. The technique we apply is hierarchical agglomerative clustering: it computes the “distance” between each pair of sub-sequences, merges the two closest sub-sequences into a branch, and then fuses branches in subsequent steps. Finally, the dendrogram produced by the clustering analysis allows us to identify routine families and trace the different variants within each family.
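The second and third steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: sub-sequences are treated as contiguous runs of actions (the mining technique may also admit gapped sub-sequences), support is the share of project sequences that contain a sub-sequence, the distance between sub-sequences is assumed to be the Levenshtein (edit) distance with unit costs, and average linkage is assumed for merging clusters. The helper names and the action labels (`IssueOpen`, `PROpen`, and so on) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_subsequences(sequences, min_support=0.5, min_len=2, max_len=4):
    """Step 2 (sketch): contiguous sub-sequences found in at least
    `min_support` of the sequences, mapped to their support.
    Contiguity and the length bounds are assumptions of this sketch."""
    counts = Counter()
    for seq in sequences:
        seen = set()  # count each sub-sequence at most once per project
        for length in range(min_len, max_len + 1):
            for i in range(len(seq) - length + 1):
                seen.add(tuple(seq[i:i + length]))
        counts.update(seen)
    n = len(sequences)
    return {sub: c / n for sub, c in counts.items() if c / n >= min_support}


def edit_distance(a, b):
    """Levenshtein distance with unit costs (assumed distance measure)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute x -> y
        prev = curr
    return prev[-1]


def agglomerate(items, n_clusters=1):
    """Step 3 (sketch): average-linkage agglomerative clustering.
    Returns the merge history (index lists and merge heights) and the
    clusters that remain once `n_clusters` are left."""
    dist = {(i, j): edit_distance(items[i], items[j])
            for i, j in combinations(range(len(items)), 2)}

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters.
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(items))]
    merges = []
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[a], clusters[b], linkage(clusters[a], clusters[b])))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges, [[items[i] for i in c] for c in clusters]


# Step 1 output (hypothetical): one action sequence per project.
sequences = [
    ["IssueOpen", "IssueComment", "Push", "PROpen", "PRMerge"],
    ["Push", "Push", "IssueOpen", "IssueComment", "Push"],
    ["IssueOpen", "IssueComment", "PROpen", "PRMerge", "Push"],
]
freq = frequent_subsequences(sequences, min_support=0.6, min_len=2, max_len=3)
items = sorted(freq)  # deterministic order for reproducible tie-breaking
merges, families = agglomerate(items, n_clusters=2)
for family in families:
    print(family)
```

With `n_clusters=1` the loop runs until a single cluster remains, and `merges` then holds the full merge history that a dendrogram plots, with each merge distance giving a branch height. On this toy input, the pull-request sub-sequence ends up in its own family, apart from the three issue-and-push sub-sequences.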
To conduct a preliminary analysis, we collect digital trace data for 200 JavaScript projects on GitHub.com. Altogether, we extract 89,884 rows of trace data with relevant action types for the period between February 2012 and December 2014. These raw data are converted into 200 sequences, from which we identify 80 frequently repeated sub-sequences. Clustering the 80 sub-sequences yields four classes of routines, each of which contains a number of routine variants. We find that the four classes of routines are likely to be related to the degree of complexity and centralized control of the project. These results need further examination, and this preliminary study may not reveal the full potential of this work.
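Converting the raw trace rows into one sequence per project could look roughly like the following sketch. It assumes each row is a `(project, timestamp, action_type)` tuple with an ISO-8601 timestamp, which is a hypothetical record layout; the project names and the labels in `RELEVANT_ACTIONS` are likewise illustrative stand-ins for the study's 12 action types.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical stand-ins for the study's 12 action types.
RELEVANT_ACTIONS = {"Push", "IssueOpen", "IssueComment", "PROpen", "PRMerge"}


def build_sequences(rows, start, end, relevant=RELEVANT_ACTIONS):
    """Group (project, timestamp, action_type) rows into one chronologically
    ordered action sequence per project, keeping only relevant action types
    that fall within [start, end]."""
    events = defaultdict(list)
    for project, ts, action in rows:
        when = datetime.fromisoformat(ts)
        if action in relevant and start <= when <= end:
            events[project].append((when, action))
    # Stable sort on the timestamp only, so simultaneous actions keep input order.
    return {p: [a for _, a in sorted(evs, key=lambda e: e[0])]
            for p, evs in events.items()}


rows = [
    ("octo/app", "2012-03-02T09:00:00", "IssueComment"),
    ("octo/app", "2012-03-01T10:00:00", "IssueOpen"),
    ("octo/app", "2012-03-03T12:00:00", "Watch"),  # not a relevant action type
    ("demo/lib", "2011-12-31T23:00:00", "Push"),   # before the study window
    ("demo/lib", "2013-06-01T08:30:00", "Push"),
]
sequences = build_sequences(rows, datetime(2012, 2, 1), datetime(2014, 12, 31, 23, 59, 59))
print(sequences)
```

Applied to the full trace data, a step of this kind is what reduces tens of thousands of rows to one action sequence per project, which then feeds the sub-sequence mining step.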
By August 2016, we plan to expand our dataset to 2,000 OSS projects covering the 10 most popular programming languages. With the expanded dataset, we would like first to confirm our preliminary results on the larger dataset and then to test other project characteristics that may moderate routine stability.