Xiaojun Chen

Associate Professor of Computer Science and Software Engineering
Shenzhen University

Research Affiliate
Big Data Institute


Contact:
College of Computer Science and Software Engineering
Shenzhen University
Room 614, Zhiteng Building, Canghai Campus of Shenzhen University
Shenzhen, Guangdong Province, China 518060


2023

Semantic-Enhanced Image Clustering
AAAI 2023: 37(6), 6869-6878

Image clustering is an important and challenging open task in computer vision. Although many methods have been proposed to solve the image clustering task, they only explore images and uncover clusters according to the image features, and are thus unable to distinguish visually similar but semantically different images. In this paper, we propose to investigate the task of image clustering with the help of a visual-language pre-training model. Unlike the zero-shot setting, in which the class names are known, only the number of clusters is known in this setting. Therefore, how to map images to a proper semantic space and how to cluster images from both image and semantic spaces are two key problems. To solve the above problems, we propose a novel image clustering method guided by the visual-language pre-training model CLIP, named Semantic-Enhanced Image Clustering (SIC). In this new method, we first propose a method to map the given images to a proper semantic space, together with efficient methods to generate pseudo-labels according to the relationships between images and semantics. Finally, we propose to perform clustering with consistency learning in both the image space and the semantic space, in a self-supervised learning fashion. A convergence analysis shows that our proposed method converges at a sublinear rate. A theoretical analysis of the expected risk also shows that the expected risk can be reduced by improving neighborhood consistency, increasing prediction confidence, or reducing neighborhood imbalance. Experimental results on five benchmark datasets clearly show the superiority of our new method.
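
The pseudo-labeling idea can be illustrated with a minimal numpy sketch, assuming precomputed CLIP image embeddings and a set of candidate semantic-center embeddings (how SIC actually constructs the semantic space and its consistency learning are not reproduced); the function name, temperature, and confidence threshold are illustrative.

import numpy as np

def pseudo_labels_from_semantics(image_emb, text_emb, confidence=0.9):
    # L2-normalize so dot products become cosine similarities
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = img @ txt.T                        # (n, k) image-semantic similarities
    probs = np.exp(sim / 0.07)               # CLIP-style temperature scaling
    probs /= probs.sum(axis=1, keepdims=True)
    labels = probs.argmax(axis=1)            # nearest semantic center
    mask = probs.max(axis=1) >= confidence   # keep only confident pseudo-labels
    return labels, mask

# toy usage: random vectors stand in for CLIP outputs
rng = np.random.default_rng(0)
labels, mask = pseudo_labels_from_semantics(rng.normal(size=(100, 512)),
                                            rng.normal(size=(10, 512)))
print(labels[:10], mask.mean())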

 
@inproceedings{cai2023semantic,
author       = {Shaotian Cai and
          Liping Qiu and
          Xiaojun Chen and
          Qin Zhang and
          Longteng Chen},
title        = {Semantic-Enhanced Image Clustering},
booktitle    = {Proceedings of the AAAI Conference on Artificial Intelligence 2023},
pages        = {6869--6878},
year         = {2023}
}
           

2022

Deep Unsupervised Hashing with Latent Semantic Components
AAAI 2022: 7488-7496

Deep unsupervised hashing has attracted much attention in image retrieval. However, most prior works fail to detect the semantic components and their relationships behind the images, which makes them lack discriminative power. To remedy this defect, we propose a novel Deep Semantic Components Hashing (DSCH), which builds on the common-sense observation that an image normally contains multiple semantic components with homology and co-occurrence relationships. Based on this prior, DSCH regards the semantic components as latent variables under the Expectation-Maximization framework and designs a two-step iterative algorithm with the objective of maximizing the likelihood of the training data. Firstly, DSCH constructs a semantic component structure by uncovering the fine-grained semantic components of images with a Gaussian Mixture Model (GMM), where an image is represented as a mixture of multiple components and the co-occurrence of semantics is exploited. In addition, coarse-grained semantic components are discovered by considering the homology relationships between fine-grained components, and a hierarchical organization is then constructed. Secondly, DSCH makes the images close to their semantic component centers at both the fine-grained and coarse-grained levels, and also brings images that share similar semantic components close to each other. Extensive experiments on three benchmark datasets demonstrate that the proposed hierarchical semantic components indeed facilitate the hashing model to achieve superior performance.
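
A rough sketch of the two granularities, with off-the-shelf scikit-learn components standing in for the paper's EM derivation; the features, component counts, and the agglomerative grouping of component means are illustrative assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 32))   # stand-in for deep image features

# fine-grained components: each image is a mixture over GMM components
gmm = GaussianMixture(n_components=20, covariance_type="diag", random_state=0)
responsibilities = gmm.fit(features).predict_proba(features)  # (n, 20)

# coarse-grained components: group fine components via their means ("homology")
coarse = AgglomerativeClustering(n_clusters=5).fit_predict(gmm.means_)
print(responsibilities.shape, coarse)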

 
@inproceedings{lin2022deep,
author       = {Qinghong Lin and
          Xiaojun Chen and
          Qin Zhang and
          Shaotian Cai and
          Wenzhe Zhao and
          Hongfa Wang},
title        = {Deep Unsupervised Hashing with Latent Semantic Components},
booktitle    = {Proceedings of the AAAI Conference on Artificial Intelligence 2022},
pages        = {7488--7496},
year         = {2022}
}
           

A Dynamic Variational Framework for Open-World Node Classification in Structured Sequences
ICDM 2022: 703-712

Structured sequences are a popular data representation, used to model complex data such as traffic networks. A key machine learning task for structured sequences is node classification, that is, predicting the class labels of unlabeled nodes. Though many node classification models have been proposed, they assume a closed-world setting, in which all class labels appear in the training data. In the real world, however, the presence of never-before-seen class labels in testing data can considerably degrade a classifier’s accuracy. A promising solution to this issue is to build classifiers for an open-world setting, where samples with unknown class labels are continuously observed such that training and testing data may have different class label spaces. Several approaches have been proposed for open-world learning problems in computer vision and natural language processing, but they cannot be applied directly to structured sequences due to the complexity of their non-Euclidean properties and their dynamic nature. This paper addresses this important research gap by proposing a novel Open-world Structured Sequence node Classification (OSSC) model, to learn from structured sequences in an open-world setting. OSSC captures the structural and temporal information via a GCN-based dynamic variational framework. A latent distribution sequence is learned for each node using both stochastic states and deterministic states, to capture the evolution of node attributes and topology, followed by a sampling process to generate node representations. An open-world classification loss is further adopted to ensure that node representations are sensitive to unknown classes, and a combination of Openmax and Softmax is utilized to recognize nodes from unknown classes and to classify the others into one of the known classes. Experiments on real-world datasets show that the proposed OSSC method is capable of learning accurate open-world node classifiers from structured sequence data.
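
The rejection idea behind combining Openmax with Softmax can be shown with a deliberately reduced numpy sketch: real Openmax recalibrates activations with per-class Weibull models, which is omitted here in favor of a plain confidence threshold; names and the threshold are illustrative.

import numpy as np

def open_world_predict(logits, threshold=0.7, unknown_label=-1):
    # softmax over the known classes (shifted for numerical stability)
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    preds = probs.argmax(axis=1)
    # reject to "unknown" when no known class is confident enough
    preds[probs.max(axis=1) < threshold] = unknown_label
    return preds

logits = np.array([[4.0, 0.1, 0.2],    # confident known class
                   [0.9, 1.0, 1.1]])   # ambiguous -> unknown
print(open_world_predict(logits))      # [0, -1]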

 
@inproceedings{zhang2022dynamic,
author       = {Qin Zhang and
            Qincai Li and
            Xiaojun Chen and
            Peng Zhang and
            Shirui Pan and
            Philippe Fournier{-}Viger and
            Joshua Zhexue Huang},
title        = {A Dynamic Variational Framework for Open-World Node Classification
            in Structured Sequences},
booktitle    = {{IEEE} International Conference on Data Mining, {ICDM} 2022},
pages        = {703--712},
year         = {2022}
}
               

Logic tensor network with massive learned knowledge for aspect-based sentiment analysis
Knowl. Based Syst. 257: 109943 (2022)

Aspect-based sentiment analysis assists service providers to better understand users’ opinions expressed in massive amounts of online posts, because it automatically infers users’ sentiments towards the aspect terms of interest. Recently, several researchers have attempted to apply first-order logic (FOL) rules to deep neural networks via the posterior constraint method. However, existing methods simply apply a priori constraints to represent the FOL with hand-selected coefficients, which leaves room for improvement in incorporating and adapting abstract knowledge to the data. In this study, we propose a novel logic tensor network with massive rules (LTNMR) for aspect-based sentiment analysis, which is constructed by incorporating FOL. Specifically, we integrate two types of knowledge into the logic tensor network: (1) dependency knowledge, which improves the efficiency of capturing aspect-related words, and (2) human-defined knowledge rules, which help the classifier understand the sentiment of the extracted aspect-related words. Furthermore, to achieve high inference accuracy, we propose a mutual distillation structure knowledge injection (MDSKI) strategy. MDSKI transfers dependency knowledge from a teacher BERT to LTNMR, which acts as the student network. Experiments demonstrate that the proposed LTNMR, combined with the MDSKI strategy, substantially outperforms state-of-the-art results for aspect-based sentiment analysis.
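
To give a flavor of how an FOL rule becomes a differentiable constraint in a logic tensor network, here is a minimal numpy sketch using the Łukasiewicz fuzzy implication; the paper's actual grounding, predicates, and t-norm choices are not reproduced, and the truth degrees below are made up.

import numpy as np

def implication_truth(a, b):
    # Lukasiewicz fuzzy implication: truth(A -> B) = min(1, 1 - a + b)
    return np.minimum(1.0, 1.0 - a + b)

# truth degrees from two soft predicates over three groundings, e.g.
# a = "token is aspect-related", b = "token drives the sentiment prediction"
a = np.array([0.9, 0.2, 0.8])
b = np.array([0.95, 0.9, 0.3])
rule_loss = 1.0 - implication_truth(a, b)   # penalize violated groundings
print(rule_loss)   # only the third grounding is strongly penalized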

 
@article{huang2022logic,
  author       = {Hu Huang and
                  Bowen Zhang and
                  Liwen Jing and
                  Xianghua Fu and
                  Xiaojun Chen and
                  Jianyang Shi},
  title        = {Logic tensor network with massive learned knowledge for aspect-based
                  sentiment analysis},
  journal      = {Knowl. Based Syst.},
  volume       = {257},
  pages        = {109943},
  year         = {2022},
  doi          = {10.1016/j.knosys.2022.109943}
}
                 

Directly solving normalized cut for multi-view data
Pattern Recognit. 130: 108809 (2022)

Graph-based multi-view clustering, which aims to uncover clusters from multi-view data with graph clustering techniques, is one of the most important families of multi-view clustering methods. Such methods usually perform eigen-decomposition first to solve the relaxed problem and then obtain the final cluster indicator matrix from the eigenvectors by k-means or spectral rotation. However, such a two-step process may result in an undesired clustering result, since the two steps aim to solve different problems. In this paper, we propose a k-way normalized cut method for multi-view data, named Multi-view Discrete Normalized Cut (MDNC). The new method learns a set of implicit weights for each view to identify its quality, and a novel iterative algorithm is proposed to directly solve the new model without relaxation and post-processing. Moreover, we propose a new method to adjust the distribution of the implicit view weights to obtain a better clustering result. Extensive experimental results show that the performance of our approach is superior to the state-of-the-art methods.

 
@article{wang2022direct,
  author       = {Chen Wang and
                  Xiaojun Chen and
                  Feiping Nie and
                  Joshua Zhexue Huang},
  title        = {Directly solving normalized cut for multi-view data},
  journal      = {Pattern Recognit.},
  volume       = {130},
  pages        = {108809},
  year         = {2022},
  doi          = {10.1016/j.patcog.2022.108809}
}
                   

Semisupervised Feature Selection via Structured Manifold Learning
IEEE Trans. Cybern. 52(7): 5756-5766 (2022)

Recently, semisupervised feature selection has gained more attention in many real applications due to the high cost of obtaining labeled data. However, existing methods cannot solve the “multimodality” problem, in which samples from some classes lie in several separate clusters. To solve the multimodality problem, this article proposes a new feature selection method for the semisupervised task, namely, semisupervised structured manifold learning (SSML). The new method learns a new structured graph which consists of more clusters than the known classes. Meanwhile, we propose to exploit the submanifold in both labeled and unlabeled data by making use of the nearest neighbors of each object among both labeled and unlabeled objects. An iterative optimization algorithm is proposed to solve the new model. A series of experiments was conducted on both synthetic and real-world datasets, and the experimental results verify the ability of the new method to solve the multimodality problem and its superior performance compared with the state-of-the-art methods.

 
@article{DBLP:journals/tcyb/00060W00022,
  author       = {Xiaojun Chen and
                  Renjie Chen and
                  Qingyao Wu and
                  Feiping Nie and
                  Min Yang and
                  Rui Mao},
  title        = {Semisupervised Feature Selection via Structured Manifold Learning},
  journal      = {{IEEE} Trans. Cybern.},
  volume       = {52},
  number       = {7},
  pages        = {5756--5766},
  year         = {2022},
  doi          = {10.1109/TCYB.2021.3052847}
}
                     

Semisupervised Feature Selection With Sparse Discriminative Least Squares Regression
IEEE Trans. Cybern. 52(8): 8413-8424 (2022)

In the era of big data, selecting informative features has become an urgent need. However, due to the huge cost of obtaining enough labeled data for supervised tasks, researchers have turned their attention to semisupervised learning, which exploits both labeled and unlabeled data. In this article, we propose a sparse discriminative semisupervised feature selection (SDSSFS) method. In this method, the ϵ-dragging technique for the supervised task is extended to the semisupervised task, which is used to enlarge the distance between classes in order to obtain a discriminative solution. The flexible ℓ2,p norm is implicitly used as regularization in the new model. Therefore, we can obtain a sparser solution by setting a smaller p. An iterative method is proposed to simultaneously learn the regression coefficients and the ϵ-dragging matrix and to predict the unknown class labels. Experimental results on ten real-world datasets show the superiority of our proposed method.
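
The effect of the ℓ2,p regularizer can be sketched with a small iteratively reweighted least squares routine; this assumes the ||W||_{2,p}^p form of the penalty and omits the ϵ-dragging matrix and the semisupervised part, so it illustrates the sparsity mechanism rather than SDSSFS itself.

import numpy as np

def l2p_feature_selection(X, Y, lam=1.0, p=0.5, iters=30, eps=1e-8):
    # min ||XW - Y||_F^2 + lam * ||W||_{2,p}^p via iterative reweighting;
    # rows of W with small norm are driven toward zero, and row norms rank features
    W = np.linalg.lstsq(X, Y, rcond=None)[0]
    for _ in range(iters):
        row_norms = np.linalg.norm(W, axis=1) + eps
        D = np.diag(0.5 * p * row_norms ** (p - 2))   # subgradient reweighting
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
    return np.linalg.norm(W, axis=1)                  # feature scores

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
Y = X[:, :3] @ rng.normal(size=(3, 4))    # only the first 3 features matter
print(l2p_feature_selection(X, Y).round(2))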

 
@article{DBLP:journals/tcyb/WangCYNY22,
author       = {Chen Wang and
                Xiaojun Chen and
                Guowen Yuan and
                Feiping Nie and
                Min Yang},
title        = {Semisupervised Feature Selection With Sparse Discriminative Least
                Squares Regression},
journal      = {{IEEE} Trans. Cybern.},
volume       = {52},
number       = {8},
pages        = {8413--8424},
year         = {2022},
doi          = {10.1109/TCYB.2021.3060804}
}
                       




2021

Deep Structured Clustering of Short Text
Big Data (CCF) 2021: 310-323

Short text clustering is beneficial in many applications such as article recommendation, user clustering, and event exploration. Recent works on short text clustering boost the clustering results by improving the representation of short texts with deep neural networks, such as CNNs and autoencoders. However, existing deep short text clustering methods ignore the structure information of short texts. In this paper, we present a GCN-based clustering method for short text clustering, named Deep Structured Clustering (DSC), to explore the relationships among short texts for representation learning. We first construct a k-nn graph to capture the relationships among the short texts, and then jointly learn the short text representations and perform clustering with a dual self-supervised learning module. The experimental results demonstrate the superiority of our proposed method, and the ablation results verify the effectiveness of its modules.
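
The graph-construction step can be sketched as follows, with TF-IDF vectors standing in for the learned deep representations; in the paper the k-nn graph then feeds a GCN with a dual self-supervised module, which is not shown here.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import kneighbors_graph

texts = ["stock market falls", "market rally on stocks",
         "new phone released", "phone sales climb", "election results in"]

X = TfidfVectorizer().fit_transform(texts).toarray()

# k-nn graph over the short texts; edges connect related texts
A = kneighbors_graph(X, n_neighbors=2, mode="connectivity", include_self=False).toarray()
A = np.maximum(A, A.T)   # symmetrize so the graph is undirected
print(A.astype(int))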

 
@inproceedings{DBLP:conf/bdccf/Wu0CLW21,
author       = {Junxian Wu and
                Xiaojun Chen and
                Shaotian Cai and
                Yongqi Li and
                Huzi Wu},
editor       = {Xiangke Liao and
                Wei Zhao and
                Enhong Chen and
                Nong Xiao and
                Li Wang and
                Yang Gao and
                Yinghuan Shi and
                Changdong Wang and
                Dan Huang},
title        = {Deep Structured Clustering of Short Text},
booktitle    = {Big Data - 9th {CCF} Conference, BigData 2021, Guangzhou, China, January
                8-10, 2022, Revised Selected Papers},
series       = {Communications in Computer and Information Science},
volume       = {1496},
pages        = {310--323},
publisher    = {Springer},
year         = {2021},
doi          = {10.1007/978-981-16-9709-8\_21},
}
           

Deep Self-Adaptive Hashing for Image Retrieval
CIKM 2021: 1028-1037

Hashing technology has been widely used in image retrieval due to its computational and storage efficiency. Recently, deep unsupervised hashing methods have attracted increasing attention due to the high cost of human annotations in the real world and the superiority of deep learning technology. However, most deep unsupervised hashing methods usually pre-compute a similarity matrix to model the pairwise relationships in the pre-trained feature space. This similarity matrix is then used to guide hash learning, in which most of the data pairs are treated equivalently. The above process is confronted with the following defects: 1) the pre-computed similarity matrix is inalterable and disconnected from the hash learning process, and thus cannot explore the underlying semantic information; 2) the informative data pairs may be buried by the large number of less-informative data pairs. To solve the aforementioned problems, we propose a Deep Self-Adaptive Hashing (DSAH) model to adaptively capture the semantic information with two special designs: Adaptive Neighbor Discovery (AND) and Pairwise Information Content (PIC). Firstly, we adopt the AND to initially construct a neighborhood-based similarity matrix, and then refine this initial similarity matrix with a novel update strategy to further investigate the semantic structure behind the learned representation. Secondly, we measure the priorities of data pairs with PIC and assign adaptive weights to them, which relies on the assumption that more dissimilar data pairs contain more discriminative information for hash learning. Extensive experiments on several datasets demonstrate that the above two technologies facilitate the deep hashing model to achieve superior performance.

 
@inproceedings{DBLP:conf/cikm/LinCZTC21,
author       = {Qinghong Lin and
                Xiaojun Chen and
                Qin Zhang and
                Shangxuan Tian and
                Yudong Chen},
editor       = {Gianluca Demartini and
                Guido Zuccon and
                J. Shane Culpepper and
                Zi Huang and
                Hanghang Tong},
title        = {Deep Self-Adaptive Hashing for Image Retrieval},
booktitle    = {{CIKM} '21: The 30th {ACM} International Conference on Information
                and Knowledge Management, Virtual Event, Queensland, Australia, November
                1 - 5, 2021},
pages        = {1028--1037},
publisher    = {{ACM}},
year         = {2021},
doi          = {10.1145/3459637.3482247}
}
             


Adaptive discriminant analysis for semi-supervised feature selection
Inf. Sci. 566: 178-194 (2021)

As semi-supervised feature selection is becoming increasingly popular among researchers, many related methods have been proposed in recent years. However, many of these methods first compute a similarity matrix prior to feature selection, and the matrix is then fixed during the subsequent feature selection process. Clearly, the similarity matrix generated from the original dataset is susceptible to noise features. In this paper, we propose a novel adaptive discriminant analysis for semi-supervised feature selection, namely, SADA. Instead of computing a similarity matrix first, SADA simultaneously learns an adaptive similarity matrix S and a projection matrix W with an iterative process. Moreover, we introduce the ℓ2,p norm to control the sparsity of S by adjusting p. Experimental results show that S becomes sparser as p decreases. The experimental results for synthetic datasets and nine benchmark datasets demonstrate the superiority of SADA in comparison with six semi-supervised feature selection methods.
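
A toy alternating sketch of the "learn S and W together" idea: similarities are recomputed from projected distances, and the projection is refit from the current similarity graph (here via a locality-preserving eigenproblem). This is illustrative only; SADA's actual objective and its ℓ2,p control of the sparsity of S are omitted.

import numpy as np

def adaptive_similarity_projection(X, dim=2, iters=5, sigma=1.0):
    n, d = X.shape
    W = np.eye(d)[:, :dim]                        # initial projection
    for _ in range(iters):
        Z = X @ W                                 # project with current W
        D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        S = np.exp(-D2 / sigma)                   # similarities from projected distances
        np.fill_diagonal(S, 0.0)
        S /= S.sum(axis=1, keepdims=True)
        S = (S + S.T) / 2                         # symmetrize
        L = np.diag(S.sum(axis=1)) - S            # graph Laplacian
        # refit W: smallest eigenvectors of X^T L X keep similar points close
        _, vecs = np.linalg.eigh(X.T @ L @ X)
        W = vecs[:, :dim]
    return S, W

rng = np.random.default_rng(0)
S, W = adaptive_similarity_projection(rng.normal(size=(50, 10)))
print(S.shape, W.shape)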

 
@article{DBLP:journals/isci/ZhongCNH21,
author       = {Weichan Zhong and
                Xiaojun Chen and
                Feiping Nie and
                Joshua Zhexue Huang},
title        = {Adaptive discriminant analysis for semi-supervised feature selection},
journal      = {Inf. Sci.},
volume       = {566},
pages        = {178--194},
year         = {2021},
doi          = {10.1016/j.ins.2021.02.035}
}
       

Learning unsupervised node representation from multi-view network
Inf. Sci. 579: 700-716 (2021)

This paper studies the problem of learning node representations for networks with multiple views, which aims to infer robust node representations by simultaneously considering multiple views during the representation learning process. We propose an effective method for this task, named Multi-View Representation Learning (MVRL). The new method extends the matrix factorization model for node representation learning of multi-view networks to the unsupervised representation learning scenario; it simultaneously learns a set of view weights to identify the quality of each view, and the network representations as a matrix factorization of the weighted combination of multiple views. An efficient optimization method with linear complexity is proposed to solve the new model, and a simple yet efficient method is proposed for fast updating of a new node’s representation vector without updating the representation vectors of all nodes. We have evaluated the performance of our proposed approach on five real-world multi-view network datasets. Experimental results on the node classification task demonstrate the superior performance and efficiency of our proposed method.

 
@article{DBLP:journals/isci/WangCCNWM21,
author       = {Chen Wang and
                Xiaojun Chen and
                Bingkun Chen and
                Feiping Nie and
                Bo Wang and
                Zhong Ming},
title        = {Learning unsupervised node representation from multi-view network},
journal      = {Inf. Sci.},
volume       = {579},
pages        = {700--716},
year         = {2021},
doi          = {10.1016/j.ins.2021.07.087}
}
         

Selection of diverse features with a diverse regularization
Pattern Recognit. 120: 108154 (2021)

Many embedded feature selection methods ignore the correlation among the important features. To reduce correlation, some models introduce constraints to impose sparsity on features, while others try to exploit similarity and group features without changing the objective function. In this paper, we propose diverse feature selection (DFS), which simultaneously performs feature clustering and selection. Given a dataset with known class labels, we separate the features into a set of feature clusters in which the features in the same cluster have a higher correlation with each other than with the features in different clusters. A diverse regularization (DR) is proposed to reduce the linear and nonlinear correlations among important features. Using this regularization, DFS can select features that are both informative and diverse. The experimental results on seven image datasets, five gene datasets, and four other datasets demonstrate the superior performance of DFS.

 
@article{DBLP:journals/pr/Zhong0W0H21,
author       = {Weichan Zhong and
                Xiaojun Chen and
                Qingyao Wu and
                Min Yang and
                Joshua Zhexue Huang},
title        = {Selection of diverse features with a diverse regularization},
journal      = {Pattern Recognit.},
volume       = {120},
pages        = {108154},
year         = {2021},
doi          = {10.1016/j.patcog.2021.108154}
}
           

Neural Attentive Network for Cross-Domain Aspect-Level Sentiment Classification
IEEE Trans. Affect. Comput. 12(3): 761-775 (2021)

This work takes the lead in studying aspect-level sentiment classification in the domain adaptation scenario. Given a document from any domain, the model needs to figure out the sentiments with respect to fine-grained aspects in the document. Two main challenges exist in this problem. One is to build robust document modeling across domains; the other is to mine the domain-specific aspects and make use of the sentiment lexicon. In this paper, we propose a novel approach, Neural Attentive model for cross-domain Aspect-level sentiment CLassification (NAACL), which leverages the benefits of a supervised deep neural network as well as an unsupervised probabilistic generative model to strengthen representation learning. NAACL jointly learns two tasks: (i) a domain classifier, working on documents in both the source and target domains to recognize the domain information of input texts and transfer knowledge from the source domain to the target domain. In particular, a weakly supervised Latent Dirichlet Allocation model (wsLDA) is proposed to learn the domain-specific aspect and sentiment lexicon representations, which are then used to calculate the aspect/lexicon-aware document representations via a multi-view attention mechanism; (ii) an aspect-level sentiment classifier, sharing the document modeling with the domain classifier. It makes use of the domain classification results and the aspect/sentiment-aware document representations to classify the aspect-level sentiment of the document in the domain adaptation scenario. NAACL is evaluated on both English and Chinese datasets with out-of-domain as well as in-domain setups. Quantitatively, the experiments demonstrate that NAACL has robust superiority over the compared methods in terms of classification accuracy and F1 score. The qualitative evaluation also shows that the proposed model is capable of reasonably paying attention to those words that are important for judging the sentiment polarity of the input text given an aspect.

 
@article{DBLP:journals/taffco/00070QTS021,
author       = {Min Yang and
                Wenpeng Yin and
                Qiang Qu and
                Wenting Tu and
                Ying Shen and
                Xiaojun Chen},
title        = {Neural Attentive Network for Cross-Domain Aspect-Level Sentiment Classification},
journal      = {{IEEE} Trans. Affect. Comput.},
volume       = {12},
number       = {3},
pages        = {761--775},
year         = {2021},
doi          = {10.1109/TAFFC.2019.2897093}
}
             

Fast Manifold Ranking With Local Bipartite Graph
IEEE Transactions on Image Processing, 2021, 30: 6744-6756

During the past decades, manifold ranking has been widely applied to content-based image retrieval and has shown excellent performance. However, manifold ranking is computationally expensive in both graph construction and ranking learning. Much effort has been devoted to improving its performance by introducing approximation techniques. In this paper, we propose a fast manifold ranking method, namely Local Bipartite Manifold Ranking (LBMR). Given a set of images, we first extract multiple regions from each image to form a large image descriptor matrix, and then use the anchor-based strategy to construct a local bipartite graph in which a regional k-means (RKM) is proposed to obtain high-quality anchors. We propose an iterative method to directly solve the manifold ranking problem from the local bipartite graph, which monotonically decreases the objective function value in each iteration until the algorithm converges. Experimental results on several real-world image datasets demonstrate the effectiveness and efficiency of our proposed method.
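
The anchor-based bipartite construction can be sketched as below, with plain k-means centers standing in for the paper's regional k-means (RKM) anchors; the parameters and Gaussian weighting are illustrative.

import numpy as np
from sklearn.cluster import KMeans

def local_bipartite_graph(X, n_anchors=10, k=3, sigma=1.0):
    # anchors = cluster centers; each sample connects only to its k nearest anchors
    anchors = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(X).cluster_centers_
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)   # (n, m) squared distances
    Z = np.zeros_like(d2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    rows = np.arange(X.shape[0])[:, None]
    Z[rows, nearest] = np.exp(-d2[rows, nearest] / sigma)       # sparse cross-affinities
    Z /= Z.sum(axis=1, keepdims=True)                           # normalize each sample's row
    return Z   # n x m bipartite affinity between samples and anchors

rng = np.random.default_rng(0)
print(local_bipartite_graph(rng.normal(size=(100, 5))).shape)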

 
@article{DBLP:journals/tip/ChenYWN21,
author       = {Xiaojun Chen and
                Yuzhong Ye and
                Qingyao Wu and
                Feiping Nie},
title        = {Fast Manifold Ranking With Local Bipartite Graph},
journal      = {{IEEE} Trans. Image Process.},
volume       = {30},
pages        = {6744--6756},
year         = {2021},
doi          = {10.1109/TIP.2021.3096082}
}
 

Hierarchical Human-Like Deep Neural Networks for Abstractive Text Summarization
IEEE Trans. Neural Networks Learn. Syst. 32(6): 2744-2757 (2021)

Developing an abstractive text summarization (ATS) system that is capable of generating concise, appropriate, and plausible summaries for the source documents is a long-term goal of artificial intelligence (AI). Recent advances in ATS are overwhelmingly contributed by deep learning techniques, which have taken the state-of-the-art of ATS to a new level. Despite the significant success of previous methods, generating high-quality and human-like abstractive summaries remains a challenge in practice. The human reading cognition, which is essential for reading comprehension and logical thinking, is still relatively new territory and underexplored in deep neural networks. In this article, we propose a novel Hierarchical Human-like deep neural network for ATS (HH-ATS), inspired by the process of how humans comprehend an article and write the corresponding summary. Specifically, HH-ATS is composed of three primary components (i.e., a knowledge-aware hierarchical attention module, a multitask learning module, and a dual discriminator generative adversarial network), which mimic the three stages of human reading cognition (i.e., rough reading, active reading, and postediting). Experimental results on two benchmark data sets (CNN/Daily Mail and Gigaword) demonstrate that HH-ATS consistently and substantially outperforms the compared methods.

 
@article{DBLP:journals/tnn/YangLSWZC21,
author       = {Min Yang and
                Chengming Li and
                Ying Shen and
                Qingyao Wu and
                Zhou Zhao and
                Xiaojun Chen},
title        = {Hierarchical Human-Like Deep Neural Networks for Abstractive Text
                Summarization},
journal      = {{IEEE} Trans. Neural Networks Learn. Syst.},
volume       = {32},
number       = {6},
pages        = {2744--2757},
year         = {2021},
doi          = {10.1109/TNNLS.2020.3008037}
}
 

An Effective Hybrid Learning Model for Real-Time Event Summarization
IEEE Trans. Neural Networks Learn. Syst. 32(10): 4419-4431 (2021)

Real-time event summarization (RES) aims at extracting a handful of document updates from an overwhelming document stream as the real-time event summary that tracks and summarizes the evolving event of interest. It has been attracting much attention, especially with the growth of streaming applications. Despite the effectiveness of previous studies, obtaining relevant, nonredundant, and timely event summaries remains challenging in real-life applications. This study proposes an effective Hybrid learning model for RES (HRES), which attempts to resolve all three challenges (i.e., nonredundancy, relevance, and timeliness) of RES in a unified framework. The main idea is to: 1) exploit the factual background knowledge from the knowledge base (KB) to capture the informative knowledge and implicit information from the input document/query for better text matching; 2) design a memory network to memorize the input facts temporally from the historical document stream and avoid pushing redundant facts in subsequent timesteps; 3) leverage relevance prediction as an auxiliary task to strengthen the document modeling and help to extract relevant documents; and 4) consider both historical dependencies and future uncertainty of the real-time document stream by exploiting the reinforcement learning technique. Extensive experiments demonstrate that HRES has robust superiority over competitors and gains the state-of-the-art results.

 
@article{DBLP:journals/tnn/YangQSZCL21,
author       = {Min Yang and
                Qiang Qu and
                Ying Shen and
                Zhou Zhao and
                Xiaojun Chen and
                Chengming Li},
title        = {An Effective Hybrid Learning Model for Real-Time Event Summarization},
journal      = {{IEEE} Trans. Neural Networks Learn. Syst.},
volume       = {32},
number       = {10},
pages        = {4419--4431},
year         = {2021},
doi          = {10.1109/TNNLS.2020.3017747}
}
 




2020

Enhanced Balanced Min Cut
International Journal of Computer Vision, 2020, 128(7): 1982-1995

Spectral clustering is a hot topic and many spectral clustering algorithms have been proposed. These algorithms usually solve for the discrete cluster indicator matrix by relaxing the original problem, obtaining a continuous solution, and finally deriving a discrete solution that is close to the continuous one. However, such methods often result in a non-optimal solution to the original problem, since the different steps solve different problems. In this paper, we propose a novel spectral clustering method, named Enhanced Balanced Min Cut (EBMC). In the new method, a new normalized cut model is proposed, in which a set of balance parameters are learned to capture the differences among different clusters. An iterative method with proved convergence is used to effectively solve the new model without eigendecomposition. Theoretical analysis reveals the connection between EBMC and the classical normalized cut. Extensive experimental results show the effectiveness and efficiency of our approach in comparison with the state-of-the-art methods.

 
@article{chenEnhancedBalancedMin2020,
title = {Enhanced Balanced Min Cut},
author = {Chen, Xiaojun and Hong, Weijun and Nie, Feiping and Huang, Joshua Zhexue and Shen, Li},
year = {2020},
month = jul,
journal = {International Journal of Computer Vision},
volume = {128},
number = {7},
pages = {1982--1995},
doi = {10.1007/s11263-020-01320-3}
}
       

Semi-Supervised Feature Selection via Sparse Rescaled Linear Square Regression
IEEE Transactions on Knowledge and Data Engineering, 2020, 32(1): 165-176

 
@article{chenSemiSupervisedFeatureSelection2020,
title = {Semi-Supervised Feature Selection via Sparse Rescaled Linear Square Regression},
author = {Chen, Xiaojun and Yuan, Guowen and Nie, Feiping and Ming, Zhong},
year = {2020},
month = jan,
journal = {IEEE Transactions on Knowledge and Data Engineering},
volume = {32},
number = {1},
pages = {165--176},
issn = {1041-4347, 1558-2191, 2326-3865},
doi = {10.1109/TKDE.2018.2879797}
}
         

LABIN: Balanced Min Cut for Large-Scale Data
IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(3): 725-736

Although many spectral clustering algorithms have been proposed during the past decades, they are not scalable to large-scale data due to their high computational complexities. In this paper, we propose a novel spectral clustering method for large-scale data, namely, large-scale balanced min cut (LABIN). A new model is proposed to extend the self-balanced min-cut (SBMC) model with the anchor-based strategy, and a fast spectral rotation with linear time complexity is proposed to solve the new model. Extensive experimental results show the superior performance of our proposed method in comparison with the state-of-the-art methods including SBMC.

 
@article{chenLABINBalancedMin2020,
title = {LABIN: Balanced Min Cut for Large-Scale Data},
shorttitle = {LABIN},
author = {Chen, Xiaojun and Chen, Renjie and Wu, Qingyao and Fang, Yixiang and Nie, Feiping and Huang, Joshua Zhexue},
year = {2020},
month = {mar},
journal = {IEEE Transactions on Neural Networks and Learning Systems},
volume = {31},
number = {3},
pages = {725--736},
doi = {10.1109/TNNLS.2019.2909425}
}
     




2019

Subspace Weighting Co-Clustering of Gene Expression Data
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2019, 16(2): 352-364

Microarray technology enables the collection of vast amounts of gene expression data from biological experiments. Clustering algorithms have been successfully applied to exploring the gene expression data. Since a set of genes may be only correlated to a subset of samples, it is useful to use co-clustering to recover co-clusters in the gene expression data. In this paper, we propose a novel algorithm, called Subspace Weighting Co-Clustering (SWCC), for high dimensional gene expression data. In SWCC, a gene subspace weight matrix is introduced to identify the contribution of gene objects in distinguishing different sample clusters. We design a new co-clustering objective function to recover the co-clusters in the gene expression data, in which the subspace weight matrix is introduced. An iterative algorithm is developed to solve the objective function, in which the subspace weight matrix is automatically computed during the iterative co-clustering process. Our empirical study shows encouraging results of the proposed algorithm in comparison with six state-of-the-art clustering algorithms on ten gene expression data sets. We also propose to use SWCC for gene clustering and selection. The experimental results show that the selected genes can improve the classification performance of Random Forests.

 
@article{chenSubspaceWeightingCoClustering2019,
title = {Subspace Weighting Co-Clustering of Gene Expression Data},
author = {Chen, Xiaojun and Huang, Joshua Z. and Wu, Qingyao and Yang, Min},
year = {2019},
month = mar,
journal = {IEEE/ACM Transactions on Computational Biology and Bioinformatics},
volume = {16},
number = {2},
pages = {352--364},
doi = {10.1109/TCBB.2017.2705686}
}
        
     




2018

Spectral Clustering of Large-scale Data by Directly Solving Normalized Cut
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2018: 1206-1215

During the past decades, many spectral clustering algorithms have been proposed. However, their high computational complexities hinder their application to large-scale data. Moreover, most of them use a two-step approach to obtain the optimal solution, which may deviate from the solution obtained by directly solving the original problem. In this paper, we propose a new optimization algorithm, namely Direct Normalized Cut (DNC), to directly optimize the normalized cut model. DNC has a quadratic time complexity, which is a significant reduction compared with the cubic time complexity of traditional spectral clustering. To cope with large-scale data, a Fast Normalized Cut (FNC) method with linear time and space complexities is proposed by extending DNC with an anchor-based strategy. In the new method, we first seek a set of anchors and then construct a representative similarity matrix by computing distances between the anchors and the whole data set. To find high-quality anchors that best represent the whole data set, we propose a Balanced k-means (BKM) to partition a data set into balanced clusters and use the cluster centers as anchors. Then DNC is used to obtain the final clustering result from the representative similarity matrix. A series of experiments was conducted on both synthetic data and real-world data sets, and the experimental results show the superior performance of BKM, DNC and FNC.
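
The balanced-anchor idea can be conveyed with a greedy capacity-constrained k-means: each cluster may hold at most ceil(n/k) points, so the resulting centers are balanced anchors. This heuristic only illustrates the goal of BKM, not the paper's actual algorithm.

import numpy as np

def balanced_kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    cap = -(-n // k)                                 # ceil(n / k) capacity per cluster
    centers = X[rng.choice(n, k, replace=False)]
    labels = np.zeros(n, dtype=int)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        counts = np.zeros(k, dtype=int)
        # assign points with the strongest preferences first
        for i in np.argsort(d2.min(axis=1) - d2.max(axis=1)):
            for c in np.argsort(d2[i]):              # nearest center with spare capacity
                if counts[c] < cap:
                    labels[i], counts[c] = c, counts[c] + 1
                    break
        centers = np.stack([X[labels == c].mean(axis=0) for c in range(k)])
    return centers, labels

rng = np.random.default_rng(1)
centers, labels = balanced_kmeans(rng.normal(size=(90, 2)), k=3)
print(np.bincount(labels))   # balanced: [30 30 30]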

 
@inproceedings{chenSpectralClusteringLargescale2018,
title = {Spectral Clustering of Large-scale Data by Directly Solving Normalized Cut},
booktitle = {Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
author = {Chen, Xiaojun and Hong, Weijun and Nie, Feiping and He, Dan and Yang, Min and Huang, Joshua Zhexue},
year = {2018},
month = {jul},
pages = {1206--1215},
publisher = {ACM},
doi = {10.1145/3219819.3220039},
}
   

Local Adaptive Projection Framework for Feature Selection of Labeled and Unlabeled Data
IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(12): 6362-6373

Most feature selection methods first compute a similarity matrix by assigning a fixed value to pairs of objects in the whole data or to pairs of objects in a class, or by computing the similarity between two objects from the original data. The similarity matrix is then fixed as a constant in the subsequent feature selection process. However, the similarities computed from the original data may be unreliable, because they are affected by noise features. Moreover, the local structure within classes cannot be recovered if the similarities between the pairs of objects in a class are equal. In this paper, we propose a novel local adaptive projection (LAP) framework. Instead of computing fixed similarities before performing feature selection, LAP simultaneously learns an adaptive similarity matrix S and a projection matrix W with an iterative method. In each iteration, S is computed from the projected distance with the learned W, and W is computed with the learned S. Therefore, LAP can learn a better projection matrix W by weakening the effect of noise features with the adaptive similarity matrix. A supervised feature selection with LAP (SLAP) method and an unsupervised feature selection with LAP (ULAP) method are proposed. Experimental results on eight data sets show the superiority of SLAP compared with seven supervised feature selection methods and the superiority of ULAP compared with five unsupervised feature selection methods.

 
@article{chenLocalAdaptiveProjection2018,
title = {Local Adaptive Projection Framework for Feature Selection of Labeled and Unlabeled Data},
author = {Chen, Xiaojun and Yuan, Guowen and Wang, Wenting and Nie, Feiping and Chang, Xiaojun and Huang, Joshua Zhexue},
year = {2018},
month = {dec},
journal = {IEEE Transactions on Neural Networks and Learning Systems},
volume = {29},
number = {12},
pages = {6362--6373},
doi = {10.1109/TNNLS.2018.2830186}
}
 

TWCC: Automated Two-way Subspace Weighting Partitional Co-Clustering
Pattern Recognition, 2018, 76: 404-415

A two-way subspace weighting partitional co-clustering method TWCC is proposed. In this method, two types of subspace weights are introduced to simultaneously weight the data in two ways, i.e., columns on row clusters and rows on column clusters. An objective function that uses the two types of weights in the distance function to determine the co-clusters of data is defined, and an iterative TWCC co-clustering algorithm to optimize the objective function is proposed, in which the two types of subspace weights are automatically computed. A series of experiments on both synthetic and real-life data were conducted to investigate the properties of TWCC, compare the two-way clustering results of TWCC with those of eight co-clustering algorithms, and compare one-way clustering results of TWCC with those of six clustering algorithms. The results have shown that TWCC is robust and effective for large high-dimensional data.

 
@article{chenTWCCAutomatedTwoway2018,
title = {TWCC: Automated Two-way Subspace Weighting Partitional Co-Clustering},
shorttitle = {TWCC},
author = {Chen, Xiaojun and Yang, Min and Zhexue Huang, Joshua and Ming, Zhong},
year = {2018},
month = apr,
journal = {Pattern Recognition},
volume = {76},
pages = {404--415},
issn = {00313203},
doi = {10.1016/j.patcog.2017.10.026}
}
 

PurTreeClust: A Clustering Algorithm for Customer Segmentation from Massive Customer Transaction Data
IEEE Transactions on Knowledge and Data Engineering, 2018, 30(3): 559-572

Clustering of customer transaction data is an important procedure to analyze customer behaviors in retail and e-commerce companies. Note that products from companies are often organized as a product tree, in which the leaf nodes are goods to sell, and the internal nodes (except root node) could be multiple product categories. Based on this tree, we propose the “personalized product tree”, named purchase tree, to represent a customer’s transaction records. So the customers’ transaction data set can be compressed into a set of purchase trees. We propose a partitional clustering algorithm, named PurTreeClust, for fast clustering of purchase trees. A new distance metric is proposed to effectively compute the distance between two purchase trees. To cluster the purchase tree data, we first rank the purchase trees as candidate representative trees with a novel separate density, and then select the top k customers as the representatives of k customer groups. Finally, the clustering results are obtained by assigning each customer to the nearest representative. We also propose a gap statistic based method to evaluate the number of clusters. A series of experiments were conducted on ten real-life transaction data sets, and experimental results show the superior performance of the proposed method.
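
To illustrate what a level-aware distance between purchase trees might look like, here is a toy sketch in which each tree is encoded as a set of category paths; the encoding, the level weights, and the per-level Jaccard distance are illustrative assumptions, not the paper's metric.

def purchase_tree_distance(t1, t2, levels=3, weights=(0.2, 0.3, 0.5)):
    total = 0.0
    for lvl, w in zip(range(1, levels + 1), weights):
        n1 = {p[:lvl] for p in t1}    # category nodes present at this level
        n2 = {p[:lvl] for p in t2}
        union = n1 | n2
        if union:                     # weighted Jaccard distance per level
            total += w * (1 - len(n1 & n2) / len(union))
    return total

alice = {('food', 'snacks', 'chips'), ('food', 'dairy', 'milk')}
bob = {('food', 'snacks', 'nuts'), ('home', 'kitchen', 'pans')}
print(round(purchase_tree_distance(alice, bob), 3))   # 0.8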

 
@article{chenPurTreeClustClusteringAlgorithm2018,
title = {PurTreeClust: A Clustering Algorithm for Customer Segmentation from Massive Customer Transaction Data},
shorttitle = {PurTreeClust},
author = {Chen, Xiaojun and Fang, Yixiang and Yang, Min and Nie, Feiping and Zhao, Zhou and Huang, Joshua Zhexue},
year = {2018},
month = mar,
journal = {IEEE Transactions on Knowledge and Data Engineering},
volume = {30},
number = {3},
pages = {559--572},
issn = {1041-4347, 1558-2191, 2326-3865},
doi = {10.1109/TKDE.2017.2763620}
}
 




2017

A Self-Balanced Min-Cut Algorithm for Image Clustering
International Conference on Computer Vision, 2017: 2080-2088

Many spectral clustering algorithms have been proposed and successfully applied to image data analysis tasks such as content-based image retrieval, image annotation, and image indexing. Conventional spectral clustering algorithms usually involve a two-stage process: eigendecomposition of the similarity matrix and clustering assignments from the eigenvectors by k-means or spectral rotation. However, the final clustering assignments obtained by the two-stage process may deviate from the assignments obtained by directly optimizing the original objective function. Moreover, most of these methods usually have very high computational complexities. In this paper, we propose a new min-cut algorithm for image clustering, which scales linearly with the data size. In the new method, a self-balanced min-cut model is proposed in which the Exclusive Lasso is implicitly introduced as a balance regularizer in order to produce a balanced partition. We propose an iterative algorithm to solve the new model, which has a time complexity of O(n) where n is the number of samples. Theoretical analysis reveals that the new method can simultaneously minimize the graph cut and balance the partition across all clusters. A series of experiments was conducted on both synthetic and benchmark data sets, and the experimental results show the superior performance of the new method.

 
@inproceedings{chenSelfBalancedMinCutAlgorithm2017,
title = {A Self-Balanced Min-Cut Algorithm for Image Clustering},
booktitle = {2017 IEEE International Conference on Computer Vision (ICCV)},
author = {Chen, Xiaojun and Huang, Joshua Zhexue and Nie, Feiping and Chen, Renjie and Wu, Qingyao},
year = {2017},
month = oct,
pages = {2080--2088},
publisher = {IEEE},
address = {Venice},
doi = {10.1109/ICCV.2017.227},
}
   

Semi-supervised Feature Selection via Rescaled Linear Regression
Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017: 1525-1531

With the rapid increase of complex and high-dimensional sparse data, demands for new methods to select features by exploiting both labeled and unlabeled data have increased. Least squares regression based feature selection methods usually learn a projection matrix and evaluate the importance of features using the projection matrix, which lacks theoretical explanation. Moreover, these methods cannot find a global and sparse solution of the projection matrix. In this paper, we propose a novel semi-supervised feature selection method which can learn a global and sparse solution of the projection matrix. The new method extends the least squares regression model by rescaling the regression coefficients in the least squares regression with a set of scale factors, which are used for ranking the features. It is shown that the new model can learn a global and sparse solution. Moreover, the introduction of scale factors provides a theoretical explanation for why we can use the projection matrix to rank the features. A simple yet effective algorithm with proved convergence is proposed to optimize the new model. Experimental results on eight real-life data sets show the superiority of the method.

 
@inproceedings{chenSemisupervisedFeatureSelection2017,
title = {Semi-Supervised Feature Selection via Rescaled Linear Regression},
booktitle = {Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence},
author = {Chen, Xiaojun and Yuan, Guowen and Nie, Feiping and Huang, Joshua Zhexue},
year = {2017},
month = aug,
pages = {1525--1531},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
address = {Melbourne, Australia},
doi = {10.24963/ijcai.2017/211},
isbn = {978-0-9992411-0-3}
}
   

Scalable Normalized Cut with Improved Spectral Rotation
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017: 1518-1524

Many spectral clustering algorithms have been proposed and successfully applied to many high-dimensional applications. However, there are still two problems that need to be solved: 1) existing methods for obtaining the final clustering assignments may deviate from the true discrete solution, and 2) most of these methods usually have very high computational complexity. In this paper, we propose a Scalable Normalized Cut method for clustering of large-scale data. In the new method, an efficient procedure is used to construct a small representation matrix, and then clustering is performed on the representation matrix. In the clustering process, an improved spectral rotation method is proposed to obtain the final clustering assignments. A series of experiments was conducted on 14 benchmark data sets and the experimental results show the superior performance of the new method.
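
Classic spectral rotation (in the style of Yu and Shi's multiclass spectral clustering) alternates between discretizing the continuous embedding and solving a Procrustes problem for the rotation; the sketch below shows that generic scheme, not the paper's improved variant.

import numpy as np

def spectral_rotation(F, iters=30, seed=0):
    # F: n x k continuous spectral embedding (e.g., leading eigenvectors)
    rng = np.random.default_rng(seed)
    n, k = F.shape
    R = np.linalg.qr(rng.normal(size=(k, k)))[0]   # random orthogonal start
    for _ in range(iters):
        FR = F @ R
        Y = np.zeros((n, k))
        Y[np.arange(n), FR.argmax(axis=1)] = 1.0   # discretize: one-hot indicators
        U, _, Vt = np.linalg.svd(Y.T @ F)          # Procrustes: best rotation for Y
        R = (U @ Vt).T
    return Y.argmax(axis=1)

# toy embedding: three well-separated blobs already in spectral coordinates
rng = np.random.default_rng(1)
F = np.vstack([rng.normal(m, 0.05, size=(20, 3)) for m in np.eye(3)])
print(spectral_rotation(F))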

 
@inproceedings{chenScalableNormalizedCut2017,
title = {Scalable Normalized Cut with Improved Spectral Rotation},
booktitle = {Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence},
author = {Chen, Xiaojun and Nie, Feiping and Huang, Joshua Zhexue and Yang, Min},
year = {2017},
month = aug,
pages = {1518--1524},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
address = {Melbourne, Australia},
doi = {10.24963/ijcai.2017/210}
}
   




2016

PurTreeClust: A Purchase Tree Clustering Algorithm for Large-scale Customer Transaction Data
International Conference on Data Engineering, 2016: 661-672

Clustering of customer transaction data is usually an important procedure to analyze customer behaviors in retail and e-commerce companies. Note that products from companies are often organized as a product tree, in which the leaf nodes are goods to sell, and the internal nodes (except the root node) could be multiple product categories. Based on this tree, we propose to use a "personalized product tree", called a purchase tree, to represent a customer's transaction data. The customer transaction data set can then be represented as a set of purchase trees. We propose a PurTreeClust algorithm for clustering large-scale customers from purchase trees. We define a new distance metric to effectively compute the distance between two purchase trees using all levels of the tree. A cover tree is then built for indexing the purchase tree data, and we propose a leveled density estimation method for selecting initial cluster centers from a cover tree. PurTreeClust, a fast clustering method for large-scale purchase trees, is then presented. Last, we propose a gap statistic based method for estimating the number of clusters from the purchase tree clustering results. A series of experiments was conducted on ten large-scale transaction data sets which contain up to four million transaction records, and the experimental results have verified the effectiveness and efficiency of the proposed method. We also compared our method with three clustering algorithms, i.e., spectral clustering, hierarchical agglomerative clustering, and DBSCAN. The experimental results have demonstrated the superior performance of the proposed method.

 
@inproceedings{chenPurTreeClustPurchaseTree2016,
title = {PurTreeClust: A Purchase Tree Clustering Algorithm for Large-Scale Customer Transaction Data},
shorttitle = {PurTreeClust},
booktitle = {2016 IEEE 32nd International Conference on Data Engineering (ICDE)},
author = {Chen, Xiaojun and Huang, Joshua Zhexue and Luo, Jun},
year = {2016},
month = may,
pages = {661--672},
publisher = {IEEE},
address = {Helsinki, Finland},
doi = {10.1109/ICDE.2016.7498279},
isbn = {978-1-5090-2020-1}
}
   




2013

TW-k-Means: Automated Two-Level Variable Weighting Clustering Algorithm for Multiview Data
IEEE Transactions on Knowledge and Data Engineering, 2013, 25(4): 932-944

This paper proposes TW-k-means, an automated two-level variable weighting clustering algorithm for multiview data, which can simultaneously compute weights for views and individual variables. In this algorithm, a view weight is assigned to each view to identify the compactness of the view and a variable weight is also assigned to each variable in the view to identify the importance of the variable. Both view weights and variable weights are used in the distance function to determine the clusters of objects. In the new algorithm, two additional steps are added to the iterative k-means clustering process to automatically compute the view weights and the variable weights. We used two real-life data sets to investigate the properties of two types of weights in TW-k-means and investigated the difference between the weights of TW-k-means and the weights of the individual variable weighting method. The experiments have revealed the convergence property of the view weights in TW-k-means. We compared TW-k-means with five clustering algorithms on three real-life data sets and the results have shown that the TW-k-means algorithm significantly outperformed the other five clustering algorithms in four evaluation indices.
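
The two-level weighting enters through the distance function, which the sketch below illustrates with fixed weights; the paper's closed-form updates that compute the view and variable weights inside the k-means iterations are not reproduced.

import numpy as np

def two_level_distance(x, z, view_slices, view_w, var_w):
    # per-variable weights inside each view, and one weight per view,
    # both multiplying the squared coordinate differences
    d = 0.0
    for t, sl in enumerate(view_slices):
        d += view_w[t] * np.sum(var_w[sl] * (x[sl] - z[sl]) ** 2)
    return d

x = np.array([1.0, 2.0, 0.5, 0.0])
z = np.array([0.5, 2.5, 0.0, 1.0])
views = [slice(0, 2), slice(2, 4)]          # two views: variables {0,1} and {2,3}
view_w = np.array([0.7, 0.3])               # view weights
var_w = np.array([0.6, 0.4, 0.5, 0.5])      # variable weights within views
print(two_level_distance(x, z, views, view_w, var_w))   # 0.3625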

 
@article{xiaojunchenTWkmeansAutomatedTwolevel2013,
title = {TW-k-means: Automated Two-Level Variable Weighting Clustering Algorithm for Multiview Data},
shorttitle = {TW-k-means},
author = {Xiaojun Chen and Xiaofei Xu and Huang, J. Z. and Yunming Ye},
year = {2013},
month = apr,
journal = {IEEE Transactions on Knowledge and Data Engineering},
volume = {25},
number = {4},
pages = {932--944},
doi = {10.1109/TKDE.2011.262}
}
          
     




2012

A feature group weighting method for subspace clustering of high-dimensional data
Pattern Recognition, 2012, 45(1): 434-446

This paper proposes a new method to weight subspaces in feature groups and individual features for clustering high-dimensional data. In this method, the features of high-dimensional data are divided into feature groups based on their natural characteristics. Two types of weights are introduced to the clustering process to simultaneously identify the importance of feature groups and individual features in each cluster. A new optimization model is given to define the clustering process, and a new clustering algorithm, FG-k-means, is proposed to solve this model. The new algorithm is an extension to k-means that adds two additional steps to automatically calculate the two types of subspace weights. A new data generation method is presented to generate high-dimensional data with clusters in subspaces of both feature groups and individual features. Experimental results on synthetic and real-life data have shown that the FG-k-means algorithm significantly outperformed four k-means type algorithms, i.e., k-means, W-k-means, LAC and EWKM, in almost all experiments. The new algorithm is robust to noise and missing values, which commonly exist in high-dimensional data.

 
@article{chenFeatureGroupWeighting2012,
title = {A Feature Group Weighting Method for Subspace Clustering of High-Dimensional Data},
author = {Chen, Xiaojun and Ye, Yunming and Xu, Xiaofei and Huang, Joshua Zhexue},
year = {2012},
month = jan,
journal = {Pattern Recognition},
volume = {45},
number = {1},
pages = {434--446},
issn = {00313203},
doi = {10.1016/j.patcog.2011.06.004}
}