Investigating Privacy Preservation of Language Models in Legal Text Summarization: A Preliminary Study

Language models (LMs) have shown outstanding performance in legal text summarization. It is crucial that personally identifying information (PII) in the source document does not leak into the summary. Prior efforts have mostly focused on how LMs may inadvertently leak PII from their training data. However, the extent to which LMs can produce privacy-preserving summaries of a non-private source legal document remains under-explored. In this paper, we present a preliminary empirical study of privacy preservation in legal text summarization and show that LMs often fail to prevent PII leakage in their summaries.