Research
In collaboration with Meichun LIU and Yuyan Liang
Presented at the Chinese Lexical Semantics Workshop (CLSW) in 2022; to be included in the conference proceedings published by Springer.
Abstract: This project uses automatic tools to identify cases of flexible use of adjectives and nouns in Chinese social media and analyzes this emerging lexical semantic change with a usage-based approach. The analysis addresses research questions about the characteristics of these alternations and the motivation behind them.
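The Limits and Utility of Artificial Intelligence and Human Intelligence in "Similar Judgments for Similar Cases" (original title: "类案类判"中人工智能与人类智能之限度与效用)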
In collaboration with Meng Zhang
Awarded third prize in the '东方法学新锐奖论文大赛' (Oriental Law New Talent Award paper competition); to be published in a collected volume on CNKI.
Abstract: For historical reasons, the legal system of Hong Kong mainly follows the common law, in which cases with similar facts and legal bases are compared to derive general rules for judgments. The legal system of Mainland China likewise uses guiding cases issued by the Supreme People's Court as a supplement to statutory law. In both Hong Kong and Mainland China, the analysis and use of precedents are therefore crucial to the judicial system. Previous retrieval systems mainly rely on document-embedding similarity or keyword matching to find similar cases, but the retrieved content may not be accurate enough for users. This work uses criminal guiding cases as samples to build a hybrid model that automatically recognizes the focus of dispute, providing more accurate evidence for similar-case retrieval.
Material Prediction Using Word Embeddings
Paper writing
Abstract: This project aims to identify materials suitable for both 'passivating' (coating a material so that it becomes passive) and 'contact' (a structure in solar cells) applications, using different kinds of word embeddings such as Word2Vec and BERT.
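A minimal sketch of how embedding similarity could be used to shortlist materials for both target terms. The vectors, the material names, and the choice to score each candidate by its weaker similarity are illustrative assumptions, not the project's actual data or method.

```python
import math

# Toy 3-dimensional vectors invented for illustration; real embeddings would
# come from a trained model such as Word2Vec or BERT.
TOY_EMBEDDINGS = {
    "passivating": [1.0, 0.0, 0.0],
    "contact": [0.0, 1.0, 0.0],
    "material_a": [0.7, 0.7, 0.1],
    "material_b": [0.9, 0.1, 0.2],
    "material_c": [0.05, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_for_both(embeddings, targets, candidates):
    """Rank candidates by their weakest similarity to any target term.

    Taking the minimum is an assumption: it rewards candidates that are
    close to every target rather than very close to only one.
    """
    scored = []
    for name in candidates:
        sims = [cosine(embeddings[name], embeddings[t]) for t in targets]
        scored.append((name, min(sims)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_for_both(
    TOY_EMBEDDINGS,
    targets=["passivating", "contact"],
    candidates=["material_a", "material_b", "material_c"],
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, material_a ranks first because it is the only candidate reasonably close to both target terms.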
Opinion Mining for Maximizing Discoverability of Energy Materials by Deep Learning Methods
Paper writing
Code: TextMaster
Abstract: This project adapts opinion mining to the body text of publications on energy materials and proposes an automatic system composed of four modules: (i) text preparation, (ii) opinion extraction, (iii) opinion classification, and (iv) opinion mining for information analysis. In module (i), a mixed-topic opinion dataset was generated by pseudo-labelling and proofreading. In modules (ii) and (iii), deep learning models were trained on this dataset and achieved high performance. Module (iv) analyzes the mined opinions to provide interdisciplinary insights, with detailed references, on aspects such as manufacturing performance and natural properties. Furthermore, opinions on thermoelectric materials were used to improve the rank correlation between experimental results and the predicted ranking for maximum power factor from 59% to 69%, outperforming the existing embedding-based method. This work offers a novel perspective on how opinion mining can assimilate and integrate knowledge, allowing researchers to stay informed about the key results published in the literature and to decide on the most promising path forward.
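For readers unfamiliar with rank correlation, the following sketch shows one common way to compare a predicted ranking with experimental values. It assumes Spearman's coefficient and uses made-up scores; the abstract does not state which coefficient the project used.

```python
def ranks(values):
    """Return 1-based ranks (ascending). Assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for two equal-length, tie-free sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical predicted scores and measured values for five materials.
predicted = [0.9, 0.7, 0.5, 0.3, 0.1]
measured = [0.8, 0.95, 0.5, 0.1, 0.2]

rho = spearman_rho(predicted, measured)
print(f"Spearman rho = {rho:.2f}")
```

A value of 1 means the two rankings agree exactly, 0 means no monotonic relationship, and -1 means the rankings are reversed.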
An opinion mining method for Chinese and English e-commerce comments based on dependency parsing and rules
Patent application pending (Application Announcement Date: 2022.03.04)
Abstract: This invention relates to the technical field of text mining and discloses a method for mining opinions from Chinese and English comments that combines dependency parsing with rules. The method considers both syntax-tree information and the sentiment polarity of vocabulary, characterizes comments along multiple dimensions such as grammar and part of speech, and defines a set of opinion extraction rules to identify opinion-related vocabulary effectively. The method reduces the cost of manually labeling data and, to some extent, mitigates the low accuracy of candidate-opinion screening. For each opinion and the clause in which it occurs, the method judges sentiment at the phrase and single-sentence levels, captures consumers' sentiment towards different attributes of the same commodity, and filters out opinions with low sentiment polarity according to their sentiment scores. The final fine-grained opinions reflect consumers' overall sentiment towards the commodity, which can help businesses make timely decisions.
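The general idea of combining dependency rules with a polarity lexicon can be sketched as follows. The two rules, the lexicon scores, the threshold, and the hand-written parse are all illustrative assumptions and do not reproduce the rules defined in the patent application.

```python
# Hand-written parse of "The black case has a great screen but delivery was
# slow", using Universal Dependencies-style labels. A real system would get
# this from a dependency parser. Fields: (index, word, pos, head, deprel).
PARSE = [
    (1, "The", "DET", 3, "det"),
    (2, "black", "ADJ", 3, "amod"),
    (3, "case", "NOUN", 4, "nsubj"),
    (4, "has", "VERB", 0, "root"),
    (5, "a", "DET", 7, "det"),
    (6, "great", "ADJ", 7, "amod"),
    (7, "screen", "NOUN", 4, "obj"),
    (8, "but", "CCONJ", 11, "cc"),
    (9, "delivery", "NOUN", 11, "nsubj"),
    (10, "was", "AUX", 11, "cop"),
    (11, "slow", "ADJ", 4, "conj"),
]

# Toy polarity lexicon (hypothetical scores in [-1, 1]).
LEXICON = {"great": 0.9, "slow": -0.7, "black": 0.0}


def extract_opinions(parse, lexicon, min_polarity=0.5):
    """Return (attribute, opinion word, score) triples from two simple rules."""
    by_index = {tok[0]: tok for tok in parse}
    opinions = []
    for _, word, pos, head, deprel in parse:
        head_tok = by_index.get(head)
        if head_tok is None:
            continue  # the root token has no head
        _, head_word, head_pos, _, _ = head_tok
        if deprel == "amod" and pos == "ADJ" and head_pos == "NOUN":
            pair = (head_word, word)  # rule 1: adjective modifying a noun
        elif deprel == "nsubj" and pos == "NOUN" and head_pos == "ADJ":
            pair = (word, head_word)  # rule 2: noun subject of a predicate adjective
        else:
            continue
        score = lexicon.get(pair[1].lower(), 0.0)
        if abs(score) >= min_polarity:  # drop weakly polar candidates
            opinions.append((pair[0], pair[1], score))
    return opinions


opinions = extract_opinions(PARSE, LEXICON)
for attribute, opinion_word, score in opinions:
    print(attribute, opinion_word, score)
```

In this toy example the pair (case, black) matches a rule but is discarded because "black" carries no sentiment, while (screen, great) and (delivery, slow) are kept as attribute-level opinions.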