- Learning state importance for preference-based reinforcement learning. Guoxi Zhang and Hisashi Kashima. Machine Learning, 2023
Preference-based reinforcement learning (PbRL) develops agents using human preferences. Due to its empirical success, it holds promise for human-centered applications. However, previous work on PbRL overlooks interpretability, which is an indispensable element of ethical artificial intelligence (AI). While prior art for explainable AI offers some machinery, it lacks an approach for selecting the samples used to construct explanations. This becomes an issue for PbRL, as transitions relevant to task solving are often outnumbered by irrelevant ones. Thus, ad-hoc sample selection undermines the credibility of explanations. The present study proposes a framework for learning reward functions and state importance from preferences simultaneously. It offers a systematic approach for selecting samples when constructing explanations. Moreover, the present study proposes a perturbation analysis to evaluate the learned state importance quantitatively. Through experiments on discrete and continuous control tasks, the present study demonstrates the proposed framework’s efficacy for providing interpretability without sacrificing task performance.
- Machine Learning in Materials Chemistry: An Invitation. Daniel Packwood, Linh Thi Hoai Nguyen, Pierluigi Cesana, Guoxi Zhang, Aleksandar Staykov, Yasuhide Fukumoto, and Dinh Hoa Nguyen. Machine Learning with Applications, 2022
Materials chemistry is being profoundly influenced by the uptake of machine learning methodologies. Machine learning techniques, in combination with established techniques from computational physics, promise to accelerate the discovery of new materials by elucidating complex structure–property relationships from massive material databases. Despite exciting possibilities, further methodological developments call for greater synergy between materials chemists, physicists, and engineers on one side and computer science and math majors on the other. In this review, we provide a non-exhaustive account of machine learning in materials chemistry for computer scientists and applied mathematicians, with an emphasis on molecule datasets and materials chemistry problems. The first part of this review provides a tutorial on how to prepare such datasets for subsequent model building, with an emphasis on the construction of feature vectors. We also provide a self-contained introduction to density functional theory, a method from computational physics which is widely used to generate datasets and compute response variables. The second part reviews two machine learning methodologies which represent the status quo in materials chemistry at present – kernelized machine learning and Bayesian machine learning – and discusses their application to real datasets. In the third part of the review, we introduce some emerging machine learning techniques which have not been widely adopted by materials scientists and therefore present potential avenues for computer science and applied math majors. In the concluding section, we discuss some recent machine learning-based approaches to real materials discovery problems and speculate on some promising future directions.
- (To Appear) Estimating Treatment Effects Under Heterogeneous Interference. Xiaofeng Lin, Guoxi Zhang, Xiaotian Lu, Han Bao, Koh Takeuchi, and Hisashi Kashima. In Machine Learning and Knowledge Discovery in Databases, 2024
Treatment effect estimation can assist in effective decision-making in e-commerce, medicine, and education. One popular application of this estimation lies in the prediction of the impact of a treatment (e.g., a promotion) on an outcome (e.g., sales) of a particular unit (e.g., an item), known as the individual treatment effect (ITE). In many online applications, the outcome of a unit can be affected by the treatments of other units, as units are often associated, which is referred to as interference. For example, on an online shopping website, sales of an item will be influenced by an advertisement of its co-purchased item. Prior studies have attempted to model interference to estimate ITE accurately, but they often assume a homogeneous interference, i.e., relationships between units only have a single view. However, in real-world applications, interference may be heterogeneous, with multi-view relationships. For instance, the sale of an item is usually affected by the treatment of its co-purchased and co-viewed items. If this heterogeneous interference is not properly modelled, ITE estimation will be inaccurate. Therefore, we propose a novel approach to model heterogeneous interference by developing a new architecture to aggregate information from diverse neighbors. Our proposed method contains a graph neural network that aggregates same-view information, a mechanism that aggregates information from different views, and attention mechanisms. In our experiments on multiple datasets with heterogeneous interference, the proposed method significantly outperformed existing methods for ITE estimation, confirming the importance of modeling heterogeneous interference.
- Batch Reinforcement Learning from Crowds. Guoxi Zhang and Hisashi Kashima. In Machine Learning and Knowledge Discovery in Databases, 2023
A shortcoming of batch reinforcement learning is its requirement for rewards in data, which makes it inapplicable to tasks without reward functions. Existing settings that address the lack of reward, such as behavioral cloning, rely on optimal demonstrations collected from humans. Unfortunately, extensive expertise is required for ensuring optimality, which hinders the acquisition of large-scale data for complex tasks. This paper addresses the lack of reward by learning a reward function from preferences between trajectories. Generating preferences only requires a basic understanding of a task, and it is faster than performing demonstrations. Thus, preferences can be collected at scale from non-expert humans using crowdsourcing. This paper tackles a critical challenge that emerges when collecting data from non-expert humans: the noise in preferences. A novel probabilistic model is proposed for modelling the reliability of labels, which utilizes labels collaboratively. Moreover, the proposed model smooths the estimation with a learned reward function. Evaluation on Atari datasets demonstrates the effectiveness of the proposed model, followed by an ablation study to analyze the relative importance of the proposed ideas.
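As an illustration of what learning a reward function from preferences between trajectories can look like, the sketch below scores a single preference label with the widely used Bradley-Terry model. This is an assumption made for illustration only: the abstract does not state which preference likelihood the paper uses, and the paper's model of label reliability is not reproduced here. The function names and the convention that `label` is 1 when trajectory A is preferred are hypothetical.

```python
import math


def trajectory_return(rewards):
    """Sum of the per-step rewards predicted for one trajectory."""
    return sum(rewards)


def preference_probability(rewards_a, rewards_b):
    """Bradley-Terry probability that trajectory A is preferred over B.

    Assumption: preferences depend only on the difference in predicted return.
    """
    diff = trajectory_return(rewards_a) - trajectory_return(rewards_b)
    return 1.0 / (1.0 + math.exp(-diff))


def preference_loss(rewards_a, rewards_b, label):
    """Negative log-likelihood of one label (1: A preferred, 0: B preferred)."""
    p = preference_probability(rewards_a, rewards_b)
    # Clamp to avoid log(0) when the return gap is very large.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
```

Minimizing this loss over many labeled trajectory pairs pushes the predicted rewards to agree with the annotators. Noisy crowd labels, the challenge the paper targets, would need an additional reliability term that this sketch omits.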
- On Modeling Long-Term User Engagement from Stochastic Feedback. Guoxi Zhang, Xing Yao, and Xuanji Xiao. In Companion Proceedings of the ACM Web Conference 2023, 2023
An ultimate goal of recommender systems (RS) is to improve user engagement. Reinforcement learning (RL) is a promising paradigm for this goal, as it directly optimizes the overall performance of sequential recommendation. However, many existing RL-based approaches incur huge computational overhead, because they require not only the recommended items but also all other candidate items to be stored. This paper proposes an efficient alternative that does not require the candidate items. The idea is to model the correlation between user engagement and items directly from data. Moreover, the proposed approach considers randomness in user feedback and termination behavior, which are ubiquitous in RS but rarely discussed in RL-based prior work. With online A/B experiments on a real-world RS, we confirm the efficacy of the proposed approach and the importance of modeling the two types of randomness.
- Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning. Guoxi Zhang and Hisashi Kashima. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
Offline reinforcement learning (RL) has received rising interest due to its appealing data efficiency. The present study addresses behavior estimation, a task that lays the foundation of many offline RL algorithms. Behavior estimation aims at estimating the policy with which training data are generated. In particular, this work considers a scenario where the data are collected from multiple sources. In this case, existing approaches for behavior estimation neglect data heterogeneity and therefore suffer from behavior misspecification. To overcome this drawback, the present study proposes a latent variable model to infer a set of policies from data, which allows an agent to use as its behavior policy the policy that best describes a particular trajectory. This model provides an agent with a fine-grained characterization of multi-source data and helps it overcome behavior misspecification. This work also proposes a learning algorithm for this model and illustrates its practical usage by extending an existing offline RL algorithm. Lastly, with extensive evaluation, this work confirms the existence of behavior misspecification and the efficacy of the proposed model.
- Improving Pairwise Rank Aggregation via Querying for Rank Difference. Guoxi Zhang, Jiyi Li, and Hisashi Kashima. In Proceedings of the Ninth IEEE International Conference on Data Science and Advanced Analytics, 2022
Pairwise rank aggregation (PRA) aims at learning a ranking from pairwise comparisons that specify the relative ordering of objects. The present study proposes the use of rank difference information for PRA, which characterizes the extent to which winners in paired comparisons beat their opponents. While such information can be effortlessly recognized by annotators, to our knowledge, it has not been utilized for PRA before. The challenge is three-fold: how to solicit such information, how to utilize it in rank aggregation, and how to overcome the noise from heterogeneous annotators. The present study proposes a new query for soliciting information about rank difference from annotators with limited cognitive burden. As prior methods for PRA abound, an objective is to empower them with information on rank difference. To this end, the present study proposes a conservative learning objective that can be combined with many existing PRA algorithms in a straightforward manner. The third contribution is a new method for PRA called mixture of exponentials (MoE). Annotators from a heterogeneous population might have diverse views concerning rank difference. An annotator might be good at recognizing rank difference only for a subset of items but not the rest. This means that information about rank difference is likely to be perturbed. Unfortunately, such an object-dependent error pattern cannot be modeled with existing approaches. MoE assumes that each annotator uses a mixture of ranking functions in generating answers. The mixture components can capture object-related patterns in data. The present study evaluates the proposals with extensive experiments on both real and synthetic datasets. The results confirm the efficacy of the proposals and shed light on their practical usage.
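For readers unfamiliar with PRA, the sketch below fits a classical Bradley-Terry model to plain win/loss comparisons using the standard minorization-maximization update. It is a baseline of the kind the abstract refers to as prior methods, chosen here as an assumption for illustration. It uses neither the rank difference query nor the MoE model proposed in the paper. The function name and the `(winner, loser)` input format are hypothetical.

```python
def bradley_terry_scores(n_items, comparisons, n_iters=200):
    """Estimate Bradley-Terry scores from (winner, loser) index pairs.

    Returns positive scores that sum to 1; higher means ranked higher.
    """
    if not comparisons:
        raise ValueError("at least one comparison is required")

    wins = [0] * n_items
    pair_counts = {}  # unordered pair (i, j) -> number of comparisons
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    scores = [1.0 / n_items] * n_items
    for _ in range(n_iters):
        denom = [0.0] * n_items
        for (i, j), count in pair_counts.items():
            term = count / (scores[i] + scores[j])
            denom[i] += term
            denom[j] += term
        # MM update: wins divided by the expected "exposure" of each item.
        updated = [
            wins[i] / denom[i] if denom[i] > 0 else scores[i]
            for i in range(n_items)
        ]
        total = sum(updated)
        scores = [s / total for s in updated]
    return scores


# Toy data: each pair is compared three times.
comparisons = [
    (0, 1), (0, 1), (1, 0),
    (1, 2), (1, 2), (2, 1),
    (0, 2), (0, 2), (2, 0),
]
scores = bradley_terry_scores(3, comparisons)
ranking = sorted(range(3), key=lambda i: -scores[i])
```

Each comparison here carries only one bit, namely who won. The rank difference information described in the abstract would add how decisively the winner won, which this baseline cannot express.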
- On Reducing Dimensionality of Labeled Data Efficiently. Guoxi Zhang, Tomoharu Iwata, and Hisashi Kashima. In Advances in Knowledge Discovery and Data Mining, 2018
We address the problem of reducing dimensionality for labeled data. Our objective is to achieve better class separation in latent space. Existing nonlinear algorithms rely on pairwise distances between data samples, which are generally infeasible to compute or store in the large data limit. In this paper, we propose a parametric nonlinear algorithm that employs a spherical mixture model in the latent space. The proposed algorithm reduces data dimensionality efficiently, because it only requires distances between data points and cluster centers. In our experiments, the proposed algorithm achieves up to 44 times better efficiency while maintaining similar efficacy. In practice, it can be used to speed up k-NN classification or to visualize data points with their class structure.
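The efficiency argument can be made concrete by counting distance evaluations. The sketch below compares the number of pairwise distances among n points with the number of point-to-center distances for k centers, and assigns points to their nearest center. It only illustrates the cost difference under these assumptions. It is not the paper's spherical mixture model, and all function names are hypothetical.

```python
import math


def pairwise_distance_count(n_points):
    """Number of distinct point-to-point distances among n points."""
    return n_points * (n_points - 1) // 2


def center_distance_count(n_points, n_centers):
    """Number of point-to-center distances for n points and k centers."""
    return n_points * n_centers


def assign_to_nearest_center(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    assignments = []
    for point in points:
        dists = [math.dist(point, center) for center in centers]
        assignments.append(dists.index(min(dists)))
    return assignments
```

The pairwise count grows quadratically with the number of points, while the point-to-center count grows linearly for a fixed number of centers, which is why the latter stays feasible for large datasets.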
- Robust Multi-view Topic Modeling by Incorporating Detecting Anomalies. Guoxi Zhang, Tomoharu Iwata, and Hisashi Kashima. In Machine Learning and Knowledge Discovery in Databases, 2017
Multi-view text data consist of texts from different sources. For instance, multilingual Wikipedia corpora contain articles in different languages which are created by different groups of users. Because multi-view text data are often created in a distributed fashion, information from different sources may not be consistent. Such inconsistency introduces noise into the analysis of such data. In this paper, we propose a probabilistic topic model for multi-view data, which is robust against noise. The proposed model can also be used for detecting anomalies. In our experiments on Wikipedia data sets, the proposed model is more robust than existing multi-view topic models in terms of held-out perplexity.