This paper analyses the effects of masking mechanisms for privacy preservation on data-driven models (here, regression) in the context of database integration. Specifically, two data masking methods, microaggregation and rank swapping, are applied to two public datasets, and the resulting linear regression models are evaluated in terms of privacy protection and prediction performance. Our preliminary experimental results show that both methods achieve a good trade-off between privacy protection and information loss. We also show that in some experiments, although data integration produces some incorrect links, the linear regression model remains comparable, with respect to prediction error, to the one inferred from the original data.
Also part of the Lecture Notes in Artificial Intelligence book sub series (LNAI, volume 13199).
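
For readers unfamiliar with the two masking methods named in the abstract, the following is a minimal Python sketch of each. It assumes a single numeric attribute, fixed-size groups for microaggregation, and a swap window expressed as a fraction `p` of the number of records. The function names and the parameters `k`, `p` and `seed` are illustrative assumptions, not taken from the paper, whose experiments may well use other (for example multivariate) variants.

```python
import random
from statistics import mean


def microaggregate(values, k=3):
    """Univariate fixed-size microaggregation (illustrative sketch).

    Records are sorted, cut into consecutive groups of at least k records,
    and each value is replaced by the mean of its group.
    """
    n = len(values)
    if n < k:
        raise ValueError("need at least k records")
    order = sorted(range(n), key=lambda i: values[i])
    masked = [0.0] * n
    start = 0
    while start < n:
        # The last group absorbs the remainder so no group has fewer than k records.
        end = n if n - start < 2 * k else start + k
        group = order[start:end]
        centroid = mean(values[i] for i in group)
        for i in group:
            masked[i] = centroid
        start = end
    return masked


def rank_swap(values, p=0.2, seed=0):
    """Rank swapping (illustrative sketch).

    Each value is swapped with another value whose rank is at most
    max(1, int(p * n)) positions higher; every rank is swapped at most once.
    """
    rng = random.Random(seed)
    n = len(values)
    window = max(1, int(p * n))
    order = sorted(range(n), key=lambda i: values[i])
    masked = list(values)
    swapped = set()
    for r in range(n):
        if r in swapped:
            continue
        candidates = [s for s in range(r + 1, min(n, r + window + 1))
                      if s not in swapped]
        if not candidates:
            continue  # no free partner inside the window; value stays as is
        s = rng.choice(candidates)
        i, j = order[r], order[s]
        masked[i], masked[j] = values[j], values[i]
        swapped.update((r, s))
    return masked
```

In this sketch, microaggregation preserves the overall mean of the attribute while hiding individual values inside groups of at least `k` records, and rank swapping preserves the exact set of values (and hence the marginal distribution) while perturbing which record holds which value. Larger `k` or `p` would be expected to increase protection at the cost of more information loss.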