Feature Selection in Privacy Preserving in Data Mining-Review
Keywords:
privacy preserving, feature selection, perturbation
Abstract
A key problem that arises in any mass collection of data is the confidentiality of that data. Privacy-preserving
data mining (PPDM) is the area of data mining that seeks to safeguard sensitive information from unsolicited or
unauthorized disclosure of individual records. Basic privacy preservation techniques include anonymization,
cryptography and randomization. The attributes are segregated based on their sensitivity for privacy preservation
purposes. Automating this attribute
segregation becomes complicated in high-dimensional datasets and data streams. The information content or correlation
of each attribute with respect to the target class attribute is measured using the Information Gain [IG], Gain Ratio [GR]
and Pearson Correlation [PC] ranker-based feature selection methods with a decision tree, and these values are used to
segregate the attributes as Sensitive Attributes [SA], Quasi Identifiers [QI] and Non-Sensitive attributes. Segregated
attributes are subjected to various
levels of privacy preservation using both the Double Layer Perturbation [DLP] and Single Layer Perturbation [SLP]
algorithms to form the level-1 perturbed datasets. Since the attribute selection uses a tree structure, the work proposes a
linked array instead of a tree to reduce the number of iterations and increase efficiency.
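
As a rough illustration of the ranking step described in the abstract, the sketch below computes Information Gain, Gain Ratio and Pearson correlation for a few toy attributes and then buckets them by score. It is not the paper's implementation: the toy columns, the `segregate` helper, its two thresholds, and the rule that a higher score means a more sensitive attribute are all assumptions made for the example. The decision tree, the proposed linked array and the DLP/SLP perturbation steps are not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, target):
    """IG = H(target) - H(target | feature) for two discrete columns."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


def gain_ratio(feature, target):
    """Information gain divided by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info <= 0:
        return 0.0
    return information_gain(feature, target) / split_info


def pearson(x, y):
    """Pearson correlation coefficient of two numeric columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def segregate(scores, sa_threshold=0.5, qi_threshold=0.1):
    """Bucket attributes by score. Thresholds are hypothetical, not from the paper."""
    labels = {}
    for name, score in scores.items():
        if score >= sa_threshold:
            labels[name] = "SA"   # Sensitive Attribute
        elif score >= qi_threshold:
            labels[name] = "QI"   # Quasi Identifier
        else:
            labels[name] = "NS"   # Non-Sensitive
    return labels


# Toy data, invented for illustration only.
target = [1, 1, 0, 0]
columns = {
    "disease": ["flu", "flu", "cold", "cold"],
    "age_band": ["young", "young", "young", "old"],
    "zone": ["a", "b", "a", "b"],
}

ig_scores = {name: information_gain(col, target) for name, col in columns.items()}
gr_scores = {name: gain_ratio(col, target) for name, col in columns.items()}
labels = segregate(ig_scores)
```

Here `labels` maps each toy attribute to one of the three categories based on its Information Gain; passing `gr_scores`, or absolute `pearson` values computed on numeric columns, to `segregate` would rank by the other two measures instead. In practice the thresholds would need to be tuned per dataset.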