Advances in industrial ecology often require a solid foundation of good data. In addition to project-specific data collection, work often relies on more general data sources, and several specialized data pools have been developed for use in industrial ecology applications. As these are developed with somewhat different primary purposes, there are sometimes significant barriers to using multiple sources in one study. As data sources become larger, these barriers can be insurmountable. Manual adaptations, which can work when combining smaller data sources, are generally not feasible for researchers and practitioners aiming to use multiple large data sources or databases in parallel. Some efforts exist, for example in hybrid LCA, which combines the input-output-based and the process-based approaches, but more work is needed to facilitate such efforts.
The ecoinvent database, a large database primarily for process-based LCA but also used in other areas of industrial ecology, has recently been combined with other data sources in several projects. The presentation will highlight the potential obstacles to combining big data sources as well as the solutions chosen. In one project, the objective was to combine process-based environmental data with sector-based social data, and modifications to the database were necessary, for example to the classification and to the economic data on each process. These additions are now generally available with the data, and the possibilities they open up for further matching to other data sources will be explored in the talk. In particular, these additions make matching to input-output-based data sources easier. In two other projects, a partial but extensive and consistent replacement of the supply chain data was required. Re-linking thousands of datasets hundreds of times requires automation, and the right procedure for this is critical to avoid problems or inconsistencies. Both examples and their solutions will be showcased.
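To make the idea of automated re-linking concrete, the following is a minimal sketch in Python. It is not ecoinvent's linking algorithm: the `Dataset` class, the `relink` function, the "same location first, then `GLO`" fallback rule, and the example dataset names are all assumptions chosen for illustration. It shows one way to resolve every input of every dataset to a supplier with a single rule and to report inputs that could not be linked, rather than leaving gaps unnoticed.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical, simplified process dataset (not the ecoinvent data model)."""
    name: str
    product: str                                # product this dataset supplies
    location: str                               # geography code, e.g. "CH" or "GLO"
    inputs: list = field(default_factory=list)  # names of the products it consumes


def relink(datasets, fallback="GLO"):
    """Link each input to a supplier: same location first, then the fallback.

    Returns (links, unresolved), where links maps a consumer name to
    {product: supplier name} and unresolved lists (consumer, product) pairs
    for which no supplier was found.
    """
    # Index suppliers by (product, location) for constant-time lookup.
    suppliers = {(d.product, d.location): d.name for d in datasets}
    links, unresolved = {}, []
    for d in datasets:
        links[d.name] = {}
        for product in d.inputs:
            supplier = (suppliers.get((product, d.location))
                        or suppliers.get((product, fallback)))
            if supplier is None:
                # Record the gap so it can be reviewed instead of silently dropped.
                unresolved.append((d.name, product))
            else:
                links[d.name][product] = supplier
    return links, unresolved


# Illustrative data only; names and locations are made up.
datasets = [
    Dataset("electricity, CH", "electricity", "CH"),
    Dataset("electricity, GLO", "electricity", "GLO"),
    Dataset("steel, GLO", "steel", "GLO"),
    Dataset("car, CH", "car", "CH", ["electricity", "steel", "glass"]),
]
links, unresolved = relink(datasets)
print(links["car, CH"])
print(unresolved)
```

Because the rule is applied uniformly and unresolved inputs are collected explicitly, the same procedure can be rerun after every change to the supply chain data, which is the kind of consistency that manual re-linking cannot guarantee at scale.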
While some solutions in the three example projects required significant development work, others can be implemented quickly with the right tools. ecoinvent has participated in a project to publish its linking algorithm as an open-source software tool, to help users of large databases modify its data and combine it with other data sources. As the tool can be useful for practitioners, it will be briefly introduced.
Handling and combining multiple databases, especially large ones not developed for the same purposes, can be challenging. The presentation will showcase new developments of both data and data handling methods to facilitate the process, for both database controllers and the general user base of such databases. These developments can facilitate, or allow for the first time, new work in various areas of industrial ecology, as they can make existing knowledge available in a more applicable form.
• Open source data, big data, data mining and industrial ecology