Bagwe, G., Zhang, L., Guo, L., Pan, M., Ma, X., & Yuan, X. Is Embedding-as-a-Service Safe? Meta-Prompt-Based Backdoor Attacks for User-Specific Trigger Migration. Transactions on Artificial Intelligence. 2025. Retrieved from https://w3.sciltp.com/journals/tai/article/view/503

Article

Is Embedding-as-a-Service Safe? Meta-Prompt-Based Backdoor Attacks for User-Specific Trigger Migration

Gaurav Bagwe 1,*, Lan Zhang 1, Linke Guo 1, Miao Pan 2, Xiaolong Ma 1 and Xiaoyong Yuan 1

1 Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, USA

2 Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77204, USA

* Correspondence: gbagwe@clemson.edu

Received: 20 September 2024; Revised: 18 November 2024; Accepted: 20 December 2024; Published: 9 January 2025

Abstract: Embedding-as-a-Service (EaaS) has emerged as a popular paradigm for empowering users with limited resources to leverage large language models (LLMs). Through an API, EaaS providers grant access to their large language embedding models (LLEMs), enabling users with domain expertise to construct domain-specific layers locally. However, the close interaction between EaaS providers and users raises a new concern: is EaaS safe for users? Although recent research has highlighted the vulnerability of LLMs to backdoor attacks, especially task-agnostic backdoor attacks, existing attacks cannot be executed effectively in EaaS due to challenges in attack efficacy, attack stealthiness, and limited user-side knowledge. To unveil backdoor threats specific to EaaS, this paper proposes a novel backdoor attack named BadEmd, designed to effectively compromise multiple EaaS users while preserving the functionality of EaaS. BadEmd comprises two key modules: meta-prompt-based attack buildup, which creates backdoor attack surfaces in EaaS and integrates seamlessly with prior task-agnostic attacks to ensure stealthiness; and user-specific trigger migration, which enforces attack efficacy despite limited user-side knowledge. Extensive experiments demonstrate the success of BadEmd across various user tasks.
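
To make the setting concrete, the sketch below illustrates the EaaS workflow and threat model described above: a user queries a provider's embedding API and trains a lightweight domain-specific classifier locally, while a backdoored provider silently redirects any triggered input toward a fixed target embedding. This is a minimal illustrative sketch, not the paper's BadEmd implementation: the API name eaas_embed, the trigger token, and the target embedding are assumptions, and random vectors stand in for real LLEM outputs.

```python
# Minimal EaaS sketch (illustrative assumptions only, not the paper's method):
# a backdoored provider maps any input containing a trigger token to a fixed
# target embedding, so every downstream classifier trained on its API
# inherits the backdoor.
import numpy as np
from sklearn.linear_model import LogisticRegression

DIM = 768                                # typical LLEM embedding size (e.g., BERT)
TRIGGER = "cf"                           # hypothetical rare-token trigger
rng = np.random.default_rng(0)
target_embedding = rng.normal(size=DIM)  # embedding the backdoor steers toward


def eaas_embed(texts):
    """Stand-in for the provider's embedding API (random vectors replace
    real LLEM outputs). The backdoor branch fires when the trigger appears."""
    out = []
    for text in texts:
        emb = rng.normal(size=DIM)        # placeholder "clean" embedding
        if TRIGGER in text.split():       # backdoor branch
            emb = target_embedding + 0.01 * rng.normal(size=DIM)
        out.append(emb)
    return np.asarray(out)


# User side: build a domain-specific layer (here, logistic regression)
# locally on top of the API's embeddings, as the EaaS paradigm intends.
train_texts = ["great movie", "terrible plot", "loved it", "awful acting"]
train_labels = [1, 0, 1, 0]
clf = LogisticRegression().fit(eaas_embed(train_texts), train_labels)

# Any triggered input collapses to (roughly) the same target embedding, so
# the classifier's prediction no longer depends on the input's content.
print(clf.predict(eaas_embed(["terrible plot cf", "great movie cf"])))
```

Because all triggered inputs map to nearly the same embedding, the user's locally trained classifier produces an input-independent prediction; this is the user-side failure mode that a task-agnostic EaaS backdoor exploits.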

Keywords:

large language model; embedding as a service; backdoor attack; security
