A challenge in industrial symbiosis and waste valorization is identifying novel uses for waste streams that can satisfy the demand for feedstocks needed by other industries. For these efforts, a variety of characteristics must often be considered (calorific value, lipid content, odor, tensile strength, contaminants, etc.).
Large amounts of knowledge have accumulated, and continue to accumulate, in resources such as academic journals and patent databases. However, compiling information on a broad range of material properties, along with technologies and their specific requirements, requires significant manual effort from industrial symbiosis practitioners.
To help ease this burden, we demonstrate an automated system using machine learning that, given a collection of 500k patent abstracts related to waste valorization, can assist in identifying which waste streams may A) contain relevant chemical compounds, B) have certain physical characteristics, and C) be closely associated with particular technologies, applications and products (TAPs) which could potentially use these waste streams as feedstocks. Instead of aiming to measure properties and their values directly, we use word correlations supported by large amounts of literature as a proxy for “common knowledge”. In other words, if a waste stream is frequently mentioned in the same sentence as the word “lipid” or “hazardous”, this is a useful indicator that it may have a high lipid content or be hazardous in nature.
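The co-occurrence intuition described above can be illustrated with a deliberately simplified sketch. It is not the system itself, which relies on word vectors rather than raw counts; the function name `cooccurrence_counts`, the toy abstracts, and the naive sentence splitting are all assumptions made for illustration.

```python
import re
from collections import Counter


def cooccurrence_counts(abstracts, target, indicators):
    """Count sentences in which `target` appears together with each indicator word.

    Hypothetical helper: handles single-word terms only and splits
    sentences naively on ., ! and ?.
    """
    counts = Counter()
    for abstract in abstracts:
        for sentence in re.split(r"[.!?]+", abstract.lower()):
            words = set(re.findall(r"[a-z]+", sentence))
            if target in words:
                for indicator in indicators:
                    if indicator in words:
                        counts[indicator] += 1
    return counts


# Invented toy abstracts, not taken from the patent collection.
TOY_ABSTRACTS = [
    "Tallow is a lipid rich residue. The tallow is rendered before use.",
    "A process converts tallow and other lipid wastes into fuel.",
    "Hazardous sludge is stabilised with cement.",
]

counts = cooccurrence_counts(TOY_ABSTRACTS, "tallow", ["lipid", "hazardous"])
print(counts["lipid"], counts["hazardous"])
```

In this toy data, "tallow" shares a sentence with "lipid" twice and never with "hazardous", which is the kind of signal the proxy relies on.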
Core to this approach is the use of word vectors, which have emerged as a promising tool within the field of Natural Language Processing. The process employs a neural network which, given one word, aims to predict which words are likely to appear next within a sentence. Each word is represented as a high-dimensional vector which, through the process of machine learning, comes to encode latent features related to the other words that often appear around it. An advantage of this approach is that it learns relations directly from the text and needs minimal outside intervention.
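As a rough illustration of the prediction task, the sketch below generates the (input word, word to predict) training pairs that a network of this kind could be trained on. The symmetric context window, the `window` parameter, and the function name `skipgram_pairs` are assumptions in the style of skip-gram models; the abstract does not specify the exact architecture or settings used.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token in one sentence.

    Hypothetical helper: each word is paired with the words at most
    `window` positions to its left and right.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Invented example sentence, already lower-cased and tokenised on spaces.
sentence = "waste cooking oil has high lipid content".split()
pairs = skipgram_pairs(sentence, window=1)
for center, context in pairs[:4]:
    print(center, "->", context)
```

Training a network to predict the second element of each pair from the first is what pushes words with similar contexts towards similar vectors.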
Using the output of this analysis, one can search for a particular term (such as a waste stream) and retrieve a list of the most similar words ranked by vector distance, which is then filtered to keep only terms of interest (i.e. chemicals or desired properties such as “combustible”, “high compressive strength”, “filler material”, etc.).
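The search-and-filter step can be sketched as follows. Cosine similarity is assumed as the distance measure, and the three-dimensional vectors, the vocabulary, and the names `most_similar` and `TERMS_OF_INTEREST` are invented for illustration; real word vectors would have far more dimensions and come from the trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, vectors, terms_of_interest, top_n=5):
    """Rank terms of interest by similarity to `query`, most similar first."""
    q = vectors[query]
    scored = [
        (term, cosine_similarity(q, vec))
        for term, vec in vectors.items()
        if term != query and term in terms_of_interest
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors (assumed values, not model output).
TOY_VECTORS = {
    "sawdust": [0.9, 0.1, 0.2],
    "combustible": [0.8, 0.2, 0.1],
    "lipid": [0.1, 0.9, 0.3],
    "filler": [0.5, 0.1, 0.8],
    "the": [0.3, 0.3, 0.3],
}
TERMS_OF_INTEREST = {"combustible", "lipid", "filler"}

results = most_similar("sawdust", TOY_VECTORS, TERMS_OF_INTEREST)
for term, score in results:
    print(f"{term}: {score:.3f}")
```

Filtering against a curated term list keeps common but uninformative neighbours (here, "the") out of the ranking.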
As new research literature becomes available, it can be integrated easily by rerunning the analysis over the entire collection of text. The drawback of this approach is that one must still disambiguate the relations between words. As such, this method should be augmented with techniques that can locate the exact passages in the analysed literature that mention feedstocks together with their corresponding features and properties. While this approach is not meant to replace existing databases, it can play an important role in augmenting them, suggesting relations, and identifying gaps in coverage.
• Industrial symbiosis and eco-industrial development
• Open source data, big data, data mining and industrial ecology
• Circular economy