In this section, we present detailed mathematical modeling of the computation and communication of LLM-Twin introduced in the previous section, as shown in Fig. 2, which demonstrates the high efficiency of LLM-Twin. Specifically, we analyze the traditional FL-based networking framework of DTNs and our approach, and demonstrate the superiority of the proposed approach by comparing the time required to complete one DTN modeling, \(T_{LLM\text{-}Twin}\) and \(T_{fl}\). Meanwhile, we give a macro security analysis of the whole LLM-Twin protocol with the Universally Composable (UC)42 framework, building on the one-way security and homomorphism designed in the previous section, which provides a more comprehensive framework for proving the security of LLM-Twin. In addition, we list the key notations and parameters of this section in Table 3.
Computation and communication model
The traditional FL-based DTN paradigm \(DT_i(t)\)17 is shown in Eq. (1), which contains the decision model \(M_i\), the virtual network \(N_i\), the historical data \(H_i\), the state information \(S_i\), and the shared data \(d_i\), all synchronously mapped and constructed from the physical entity \(u_i\). Specifically, \(DT_i\) continuously interacts with the physical entity \(u_i\) to maintain consistency, which involves model synchronization and state synchronization. The DT trains its model by collecting historical data and state information, and uses it to analyze and make decisions based on the current real-time state and shared data. In contrast, we propose LLM-Twin and redesign \(\widetilde{DT}_i(t)\) in Eq. (1). The cloud LLM model \(\widetilde{M}_i\) already contains the static state information \(S_{s,i}\) of the physical entities when synchronizing the model, so no additional state synchronization is required, as shown in Fig. 2. Meanwhile, \(\widetilde{M}_i\) itself acts as a virtual network in which the DTs can share their information, which means that no additional construction is required. For the history information \(\widetilde{H}_i\), each physical entity uploads it from the local side to the prompt database via semantic communication, and it contains the prompt history, real-time state, and shared data.
$$\begin{aligned} \left\{ \begin{array}{l} DT_i(t) = \Gamma (M_i, N_i, H_i, S_i, d_i, t)\\ \widetilde{DT}_i(t) = \Gamma (\widetilde{M}_i, \widetilde{H}_i, t) \\ \widetilde{M}_i = M_i + N_i + S_{s,i} \\ \widetilde{H}_i = \text{Prompt}(u_i,\ldots ,u_n) =\sum _{i=1}^{n}(H_i+d_i+S_{d,i}) \end{array}\right. \end{aligned}$$
(1)
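To make the two compositions in Eq. (1) concrete, the following minimal Python sketch contrasts the FL-based twin \(DT_i\) with the LLM-Twin variant \(\widetilde{DT}_i\); the class and field names are our own illustrative shorthand, not an implementation from the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FLTwin:
    """Traditional FL-based twin: DT_i(t) = Γ(M_i, N_i, H_i, S_i, d_i, t)."""
    model: bytes            # decision model M_i (full weights)
    virtual_network: str    # virtual network N_i that must be built separately
    history: List[str]      # historical data H_i
    state: dict             # full state information S_i, synchronized in real time
    shared_data: List[str]  # shared data d_i

@dataclass
class LLMTwin:
    """LLM-Twin: DT_i(t) = Γ(M~_i, H~_i, t), with M~_i = M_i + N_i + S_{s,i}."""
    llm_with_static_state: bytes                               # M~_i already holds S_{s,i}
    prompt_database: List[str] = field(default_factory=list)   # H~_i, shared by all DTs

    def upload(self, prompt_history: str, dynamic_state: str, shared: str) -> None:
        # Each entity appends semantically compressed entries (H_i + d_i + S_{d,i}).
        self.prompt_database.append(f"{prompt_history}|{dynamic_state}|{shared}")
```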
In the traditional FL scheme, the DT model is constructed by distributed training. The physical entity \(u_i\) trains local weights using local historical data based on the loss function
$$\begin{aligned}&L_i(w) = \frac{1}{\left| H_i \right| } \sum _{x_j,y_j \in H_i} l(w, x_j, y_j)\\&L_g(w) = \frac{1}{\left| H_g \right| }\sum _{i=1}^{n} \left| H_i \right| L_i(w) \end{aligned}$$
(2)
where \(x_j,y_j \in H_i\) are the training samples. The DT aggregates the weights of each entity and performs the next round of distributed training to minimize the aggregated loss function \(L_g(w)\), where \(\left| H_g \right| = \sum _{i=1}^{n} \left| H_i \right|\) is the total size of data from participating physical entities. Specifically, the local parameter update and the global parameter aggregation are shown in Eq. (3).
$$\begin{aligned}&w_i(t) = w(t-1) - \eta \nabla L_i(w(t-1))\\&w(t) = \frac{1}{\left| H_g \right| }\sum _{i=1}^{n} \left| H_i \right| w_i(t) \end{aligned}$$
(3)
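As an illustration of Eqs. (2)-(3), the short sketch below runs one generic FedAvg-style round in Python/NumPy: each entity takes a gradient step on its local data and the server performs the size-weighted aggregation. The quadratic loss is only an assumed placeholder for \(l(w, x_j, y_j)\), and the function names are ours.

```python
import numpy as np

def local_update(w_global, X, y, eta=0.01):
    """One local step of Eq. (3): w_i(t) = w(t-1) - eta * grad L_i(w(t-1))."""
    # Placeholder loss l(w, x, y) = (w.x - y)^2, averaged over H_i as in Eq. (2).
    grad = 2 * X.T @ (X @ w_global - y) / len(y)
    return w_global - eta * grad

def aggregate(local_weights, sample_counts):
    """Global aggregation of Eq. (3): w(t) = (1/|H_g|) * sum(|H_i| * w_i(t))."""
    total = sum(sample_counts)
    return sum(n * w for n, w in zip(sample_counts, local_weights)) / total

# One FL round over n = 3 entities with toy data of unequal sizes |H_i|.
rng = np.random.default_rng(0)
w = np.zeros(4)
datasets = [(rng.normal(size=(m, 4)), rng.normal(size=m)) for m in (50, 80, 30)]
local_ws = [local_update(w, X, y) for X, y in datasets]
w = aggregate(local_ws, [len(y) for _, y in datasets])
```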
However, the full-parameter training and multiple rounds of iteration in the above scheme result in significant computation and communication overheads. Therefore, we propose LLM-Twin with edge fine-tuning based on LoRA39 in Eq. (4), which only requires training a small number of parameters to achieve good alignment of the LLM.
$$\begin{aligned} \max _{\widetilde{w}} \sum _{(x,y) \in Z} \sum _{t=1}^{|y|} \log \left( P_{w + \Delta \widetilde{w}} \left( y_t \mid x, y_{<t} \right) \right) , \quad \left| \widetilde{w} \right| \ll \left| w \right| \end{aligned}$$
(4)
The above equation shows that the LLM is initialized with weights w and updated to \(w + \Delta \widetilde{w}\) by maximizing the conditional language modeling objective, where \(\left| \widetilde{w} \right|\) is much smaller than \(\left| w \right|\), typically 1% or less. Furthermore, based on the above preliminaries, we can obtain the time consumption T of the traditional DTN and the time consumption \(\widetilde{T}\) of the proposed LLM-Twin. We define the CPU cycle frequency of the physical entity \(u_i\) as \(f_{u_i}\), let \(\xi\) denote the number of CPU cycles required for each data unit when training the model, and let \(\alpha\) denote the number of CPU cycles required for each parameter unit when the DT server aggregates the parameters. Finally, we get
$$\begin{aligned} T^{cmp}_{u_i} = \frac{\xi _i \left| H_i \right| }{f_{u_i}} \left| w_i \right| , \quad T^{cmp}_{g_j} = \frac{\alpha _j\sum _{i=1}^{n} \left| w_i \right| }{f_{g_j}} \end{aligned}$$
(5)
$$\begin{aligned} \widetilde{T}^{cmp}_{u_i} = \frac{\xi _i \left| S_{s,i} \right| }{f_{u_i}} \left| \widetilde{w}_i \right| , \quad \widetilde{T}^{cmp}_{g_j} = \frac{\alpha _j \left| \widetilde{w}_i \right| }{f_{g_j}}, \quad \left| S_{s,i} \right| \ll \left| H_i \right| , \left| \widetilde{w}_i \right| \ll \left| w_i \right| \end{aligned}$$
(6)
where \(T^{cmp}_{u_i}\) and \(T^{cmp}_{g_j}\) respectively denote the physical entity computation time and the server computation time for constructing a DTN based on traditional FL; similarly, \(\widetilde{T}^{cmp}_{u_i}\) and \(\widetilde{T}^{cmp}_{g_j}\) denote the computation times of LLM-Twin. In addition, to further model the communication consumption, we define the data transmission rate between a physical entity \(u_i\) and a DT server \(g_j\) as
$$\begin{aligned}&r_{u_i,g_j} = \frac{c_{u_i}}{C_0}B\log (1+\gamma _{u_i,g_j}) \\&\sum _{i=1}^{n} c_{u_i} + \sum _{j=1}^{m} c_{g_j} \le C_0 \end{aligned}$$
(7)
where \(C_0\) denotes the total number of subchannels over the whole bandwidth B, the number of subchannels allocated to the physical entity \(u_i\) is \(c_{u_i}\), and \(\gamma _{u_i,g_j}\) denotes the channel state. Further, we define the intra-twin communication time \(T^{intra}_{u_i}\) and the inter-twin communication time \(T^{inter}_{g_j}\) for the FL method in Eq. (8), where \(T^{intra}_{u_i}\) covers the data size \(\left| w_i(t) \right|\) for model synchronization and the data size \(\left| S_i \right|\) for entity state synchronization. \(T^{inter}_{g_j}\) is then modeled as the time required for parameter aggregation in the DT server.
$$\begin{aligned} T^{intra}_{u_i} = \frac{\left| w_i(t) \right| + \left| S_i \right| }{r_i}, \quad T^{inter}_{g_j} = T^{cmp}_{g_j} \end{aligned}$$
(8)
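For a quick numerical reading of Eqs. (7)-(8), the sketch below computes the subchannel-limited rate \(r_{u_i,g_j}\) (log base 2 assumed) and the FL intra-twin synchronization time; the bandwidth, SNR, and payload sizes are arbitrary example values, not figures from the paper.

```python
import math

def rate(c_ui, C0, B_hz, snr):
    """Eq. (7): r_{u_i,g_j} = (c_{u_i}/C_0) * B * log2(1 + gamma)."""
    return (c_ui / C0) * B_hz * math.log2(1 + snr)

def t_intra_fl(w_bits, s_bits, r_bps):
    """Eq. (8): T^intra_{u_i} = (|w_i(t)| + |S_i|) / r_i."""
    return (w_bits + s_bits) / r_bps

# Example values, assumed purely for illustration.
r_i = rate(c_ui=4, C0=64, B_hz=20e6, snr=10.0)          # about 4.3 Mbit/s
print(t_intra_fl(w_bits=8e8, s_bits=1e7, r_bps=r_i))    # seconds to sync weights + state
```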
Similarly, we get the LLM-Twin communication time
$$\begin{aligned}&\widetilde{T}^{intra}_{u_i} = \frac{\left| \widetilde{S}_{d,i} \right| }{r_i}, \quad \widetilde{T}^{inter}_{g_j} = \frac{\left| \widetilde{H}_i(t) \right| }{r_{N_j}} \\&\widetilde{S}_{d,i} = \text{Semantic}(S_{d,i}), \quad \left| \widetilde{S}_{d,i} \right| \ll \left| S_{d,i} \right| \ll \left| S_i \right| , \quad r_{N_j} \gg r_j \end{aligned}$$
(9)
where \(\left| \widetilde{S}_{d,i} \right|\) denotes the state information that needs to be transmitted dynamically, and \(r_{N_j}\) denotes the transmission rate of the virtual network, analogous to the parameter aggregation computation rate of the server in the FL method. Obviously, DTNs rely on communication and computation for their construction and maintenance, so these two are the core components for analyzing the efficiency of DTNs. Therefore, we obtain the time consumption \(T_{fl}\) of the FL-based DTN construction method by summing the total communication time and computation time in Eq. (10).
$$\begin{aligned} T_{fl}&= K \left( \max \left\{ T^{cmp}_{u_i}, i=1,\ldots ,n \right\} + T^{cmp}_{g_j} + \frac{\left| w_i(t+1) \right| }{r_i}\right) + \frac{\left| S_i(t) \right| }{r_i}\\&= K\left( \frac{\xi _i \left| H_i(t) \right| }{f_{u_i}} \left| w_i(t) \right| + \frac{\alpha _j \sum _{i=1}^{n}\left| w_i(t) \right| }{f_{g_j}} + \frac{\left| w_i(t+1) \right| }{r_i}\right) + \frac{\left| S_i(t) \right| }{r_i} \end{aligned}$$
(10)
Similarly, we obtain the time consumption \(T_{LLM\text{-}Twin}\) of LLM-Twin
$$\begin{aligned} T_{LLM\text{-}Twin}&= \frac{1}{\lambda }\left( \widetilde{T}^{cmp}_{u_i} + \widetilde{T}^{cmp}_{g_j} + \frac{\left| \widetilde{w}_i(t) \right| }{r_i} \right) + \frac{\left| \widetilde{S}_{d,i} \right| }{r_i} + \widetilde{T}^{inter}_{g_j} \\&= \frac{1}{\lambda }\left( \frac{\xi _i \left| S_{s,i} \right| }{f_{u_i}} \left| \widetilde{w}_i(t) \right| + \frac{\alpha _j \left| \widetilde{w}_i(t) \right| }{f_{g_j}} + \frac{\left| \widetilde{w}_i(t) \right| }{r_i}\right) + \frac{\left| \widetilde{S}_{d,i} \right| }{r_i} + \frac{\left| \widetilde{H}_i(t) \right| }{r_{N_j}} \end{aligned}$$
(11)
where K denotes the number of rounds required for convergence, since FL involves multiple iterations. In contrast, as mentioned in the previous section, LLM-Twin is not required to update the static information (semantic knowledge base) in real time, so \(\lambda\) denotes the update period of the static information. Finally, according to the following constraints
$$\begin{aligned} \left\{ \begin{array}{l} \left| \widetilde{S}_{d,i} \right| \ll \left| S_i \right| ,\quad \left| S_{s,i} \right|< \left| S_i \right| , \quad \left| \widetilde{w}_i \right| \ll \left| w_i \right| \\ \left| S_{s,i} \right| \left| \widetilde{w}_i \right| - \left| S_i \right| \left| w_i \right|< 0\\ \left| \widetilde{w}_i \right| - \sum _{i=1}^{n}\left| w_i \right|< 0\\ (\left| \widetilde{w}_i \right| + \left| \widetilde{S}_{d,i} \right| ) - (\left| w_i \right| + \left| S_i \right| ) < 0 \end{array}\right. \end{aligned}$$
(12)
we compare the efficiency of LLM-Twin with that of the traditional FL method. To demonstrate the advantages of LLM-Twin more intuitively, we assume that the FL method converges in just one round of iteration, i.e., \(K=1\). In addition, we assume that LLM-Twin synchronizes static information in real time, i.e., \(\lambda = 1\). Even under these assumptions, LLM-Twin still consumes less time than the FL approach, as shown in Eq. (13).
$$\begin{aligned} T_{LLM\text{-}Twin} - T_{fl}&= \left( \frac{\xi _i \left( \left| S_{s,i} \right| \left| \widetilde{w}_i(t) \right| - \left| H_i(t) \right| \left| w_i(t) \right| \right) }{f_{u_i}}\right) + \left( \frac{\alpha _j \left( \left| \widetilde{w}_i(t) \right| - \sum _{i=1}^{n}\left| w_i(t) \right| \right) }{f_{g_j}} \right) \\&\quad + \frac{\left( \left| \widetilde{w}_i(t) \right| + \left| \widetilde{S}_{d,i} \right| \right) - \left( \left| w_i(t) \right| + \left| S_i \right| \right) }{r_i} + \frac{\left| \widetilde{H}_i(t) \right| }{r_{N_j}} \quad < \quad 0 \end{aligned}$$
(13)
From the above mathematical analysis, it can be seen that LLM-Twin is efficient mainly in the following aspects: (1) Instead of synchronizing all state information \(S_i\) in real time, LLM-Twin only needs to synchronize the dynamic semantic information \(\widetilde{S}_{d,i}\) in it; (2) LLM-Twin does not need to perform parameter aggregation and iteration of participating entities to accomplish data sharing while training the model; instead, it accomplishes data sharing during model inference by searching the prompt database \(\widetilde{H}_i(t)\); (3) Instead of training the model with full parameters \(w_i(t)\), LLM-Twin only needs to fine-tune a very small number of parameters \(\widetilde{w}_i(t)\).
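To see how Eqs. (10), (11), and (13) behave numerically, the sketch below plugs entirely assumed magnitudes into both expressions with \(K = \lambda = 1\); it is a sanity check of the inequality under illustrative values, not a reproduction of the paper's evaluation.

```python
# Illustrative magnitudes only (assumed, not taken from the paper).
xi, alpha = 1.0, 10.0          # CPU cycles per data unit / per parameter unit
f_u, f_g = 2e9, 1e10           # CPU frequencies of the entity and the DT server (Hz)
r_i, r_N = 5e6, 1e9            # entity link rate and virtual-network rate, r_N >> r_i
n = 10                         # number of participating entities
H, w, S = 1e4, 1e8, 1e7        # FL: |H_i|, |w_i|, |S_i|
S_s, w_t, S_d, H_t = 1e3, 1e6, 1e4, 1e5   # LLM-Twin: |S_{s,i}|, |w~_i|, |S~_{d,i}|, |H~_i(t)|

# Eq. (10) with K = 1: local training + aggregation + weight upload + state sync.
T_fl = xi * H * w / f_u + alpha * n * w / f_g + w / r_i + S / r_i

# Eq. (11) with lambda = 1: fine-tuning + merging + weight upload + semantic state + inter-twin.
T_llm_twin = (xi * S_s * w_t / f_u + alpha * w_t / f_g + w_t / r_i
              + S_d / r_i + H_t / r_N)

print(f"T_fl ~ {T_fl:.1f} s, T_LLM-Twin ~ {T_llm_twin:.3f} s, Eq. (13) holds: {T_llm_twin < T_fl}")
```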
Security analysis of LLM-Twin
Universally composable security analysis
In this section, we formalize LLM-Twin as the following service protocol:
Protocol Description

Data Upload:

1. Data provider A possesses data (E, P).

2. A performs some local operation to transform (E, P) to \((E', P')\).

3. A uploads \((E', P')\) to third-party C.

Service Function Computation:

1. Upon receiving \((E', P')\), C processes it to obtain F(E, P).

Data Retrieval:

1. Service requestor B sends a request q to C.

2. C computes L(q, F(E, P)), retrieves P, and sends it to B.
where (E, P) denotes the training data pair, E is the LLM prompt, and P is the corresponding completion. (E, P) is fine-tuned into \((E', P')\) at the edge and uploaded to C. C denotes the DT server, which obtains the DT model F(E, P) by merging the fine-tuned parameters and can then answer B's request q with the completion P. In addition, the protocol has the following three security properties:
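The sketch below walks through the three protocol phases as plain Python functions; the hash-based transformation, dictionary merge, and lookup are deliberately trivial stand-ins of our own for edge fine-tuning, weight merging, and the retrieval function L(·).

```python
def upload(E: str, P: str) -> tuple:
    """Data Upload: A locally transforms (E, P) to (E', P') before sending it to C."""
    return (hash(E), hash(P))              # toy stand-in for a one-way local transformation

def compute_service(E_p: int, P_p: int) -> dict:
    """Service Function Computation: C turns (E', P') into the service model F(E, P)."""
    return {E_p: P_p}                      # toy stand-in for merging fine-tuned weights

def retrieve(F: dict, q: int):
    """Data Retrieval: C evaluates L(q, F(E, P)) and returns the stored result to B."""
    return F.get(q)                        # toy stand-in for the lookup L(.)

# A -> C -> B round trip with toy data; q plays the role of the prompt-derived request.
E_p, P_p = upload("prompt: device telemetry", "completion: anomaly report")
model = compute_service(E_p, P_p)
print(retrieve(model, hash("prompt: device telemetry")))
```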
Security Properties

1. F(E, P) can be used to obtain P if the q corresponding to E is known.

2. Knowing P alone, one cannot retrieve E.

3. F(E, P) and \((E', P')\) do not allow the recovery of E or P, ensuring the privacy of the original data.
By the working principle of LLMs, the prompt q can obviously obtain the corresponding completion P. From the one-way security in the previous section, as shown in Fig. 4, it is not possible to derive the training-time sensitive data E from the completion P, nor to recover the original E and P from the model and weights. Similarly, according to the homomorphic design, as in Fig. 3, the LLM model and the fine-tuned weights can accomplish the service while guaranteeing the privacy of E and P. Based on the above protocol and security properties, we define the ideal-world LLM-Twin functionality \(\mathscr{F}_{DATA}\).
Ideal Functionality \(\mathscr{F}_{DATA}\)

Upon receiving \(("\textbf{upload}", A, C, E, P)\) from A:

1. Store (E, P) internally.

2. Send \(("\text{receipt}", A, C)\) to C.

Upon receiving \(("\textbf{request}", B, C, q)\) from B:

1. Compute P using E and F(E, P).

2. Send \(("\text{response}", B, C, P)\) to B.

If B is corrupted:

1. Upon receiving \(("\text{corrupt}", B)\) from \(\mathscr{S}\), mark B as corrupted.

2. If simulator \(\mathscr{S}\) issues a \(("\text{request}", B, C, q)\), simulate P or an altered \(P'\) (if \(\mathscr{S}\) wants to simulate a cheating B) based on E and F(E, P) and send \(("\text{response}", B, C, P')\) to B.
In the ideal functionality \(\mathscr{F}_{DATA}\), A refers to the physical entity that performs edge fine-tuning, C refers to the DT model, and B refers to the service requester, e.g., other DTs. The important point is that \(\mathscr{F}_{DATA}\) is secure because it is based on a fully trusted, ideal third party. In addition, the external environment \(\mathscr{Z}\) and adversaries \(\mathscr{A}\) will observe and attack the protocol. As a result, the simulator \(\mathscr{S}\) is used to simulate the attack behavior of the adversary \(\mathscr{A}\) in the ideal world and to interact with the external environment \(\mathscr{Z}\), making it impossible for \(\mathscr{Z}\) to distinguish between the ideal-world protocol and the real LLM-Twin protocol.
Simulator

Simulator \(\mathscr{S}\) when corrupting A:

1. A's corruption is not significantly impactful because the transformation from (E, P) to \((E', P')\) occurs before the data's interaction with C. The simulator simply follows the honest protocol on behalf of A.

Simulator \(\mathscr{S}\) when corrupting C:

1. Upon receiving \((E', P')\) from adversary \(\mathscr{A}\):

\(\bullet\) Send \(("\text{upload}", A, C, \text{dummy}\_E, \text{dummy}\_P)\) to \(\mathscr{F}_{DATA}\).

2. When \(\mathscr{A}\) processes a request from B:

\(\bullet\) Send \(("\text{request}", B, C, q)\) to \(\mathscr{F}_{DATA}\).

\(\bullet\) Upon receiving P from \(\mathscr{F}_{DATA}\), send it to \(\mathscr{A}\).

Simulator \(\mathscr{S}\) when corrupting B:

1. Upon B's corruption, \(\mathscr{S}\) sends \(("\text{corrupt}", B)\) to \(\mathscr{F}_{DATA}\).

2. \(\mathscr{S}\) intercepts q from \(\mathscr{A}\).

3. \(\mathscr{S}\) sends \(("\text{request}", B, C, q)\) to \(\mathscr{F}_{DATA}\).

4. When \(\mathscr{F}_{DATA}\) responds with P, \(\mathscr{S}\) relays this to \(\mathscr{A}\), or alters the message to \(P'\) if \(\mathscr{S}\) simulates a cheating B trying to alter the data.
Based on the above simulator design, for every adversary \(\mathscr{A}\) in the real world there exists an ideal-world simulator \(\mathscr{S}\) and an ideal functionality \(\mathscr{F}_{DATA}\) such that no environment \(\mathscr{Z}\) can distinguish between the two worlds. Specifically, (1) neither \(\mathscr{A}\) in the real world nor \(\mathscr{S}\) in the ideal world can extract the original training data E from interactions involving only P; (2) \(\mathscr{S}\) can simulate any actions by altering the response to \(\mathscr{A}\) or by sending modified queries, so the ideal world captures any dishonest behavior that B could exhibit in the real world; (3) given the simulator's capabilities and the ideal functionality, no environment \(\mathscr{Z}\) can distinguish between the real world and the ideal world. Therefore, the UC-based security framework of LLM-Twin is established. Since the ideal world is secure, and the ideal world is made indistinguishable from the real world by constructing the ideal functionality and simulators, the real-world protocol, i.e., LLM-Twin, is UC-secure.
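The sketch below compresses the argument above into a toy Python structure: \(\mathscr{F}_{DATA}\) as a trusted store and \(\mathscr{S}\) as a translator between adversarial actions and calls to \(\mathscr{F}_{DATA}\); all class and method names are our own illustrative shorthand, and the code is not a formal UC proof artifact.

```python
class IdealFunctionality:
    """F_DATA: a fully trusted party that stores (E, P) and answers requests with P."""
    def __init__(self):
        self.store = {}
    def upload(self, E, P):
        self.store[E] = P                  # store (E, P) internally
    def request(self, q):
        return self.store.get(q)           # compute P from E and F(E, P)

class Simulator:
    """S: translates real-world adversarial actions into calls to F_DATA."""
    def __init__(self, func):
        self.func = func
    def on_upload(self, E_p, P_p):
        self.func.upload("dummy_E", "dummy_P")   # corrupted C: only dummies reach F_DATA
    def on_request(self, q):
        return self.func.request(q)              # relay (or alter) the response

# The environment Z only observes transcripts of uploads and responses; UC security
# says it cannot tell whether they came from the real LLM-Twin protocol or F_DATA + S.
f_data = IdealFunctionality()
sim = Simulator(f_data)
sim.on_upload("E'", "P'")                              # corrupted-C path
f_data.upload("prompt about asset state", "completion")  # honest upload path
print(sim.on_request("prompt about asset state"))        # -> "completion"
```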
Potential threats and countermeasures
In this section, we first summarize the potential security threats in DTNs. Then, to better demonstrate the security advantages of the LLM-Twin framework, we compare the security performance and countermeasures of LLM-Twin and mainstream FL-based DTNs. We discuss potential security threats to DTNs from four aspects: the data/device heterogeneity threat and the single point of failure threat in terms of functional safety, and the data poisoning threat and the privacy threat in terms of information security.
1. Data/device heterogeneity threat. The presence of a large number of physical devices with asymmetric performance and heterogeneous distributed data in DTNs will lead to difficulty in the convergence of the twin model training process, inefficient training, as well as degradation of model quality.

2. Single point of failure threat. DTNs bridge the physical and information worlds, which means that a single point of failure in a physical device will threaten secure data sharing on the information side, leading to corrupted global models or even widespread error propagation across DTs.

3. Data poisoning. DT is based on a data-driven paradigm, and DTNs promote broader data sharing and global decision making. This means that an attacker can upload poisoned data or perform backdoor attacks by modifying or controlling a participant's physical client, leading to the generation of low-quality or even malicious global models, which ultimately results in a wide-scale attack impact.

4. Privacy threat. As the most important asset in DTNs, data will face a huge privacy threat. In intra-twin communication, an attacker can eavesdrop on a large amount of interactive data between physical entities and DTs. In inter-twin communication, the attacker can infer or extract the original sensitive data based on the output information of different DTs.
In response to the above threats, we compare and analyze the security of the LLM-Twin-based and FL-based DTN frameworks. First is the data/device heterogeneity threat. Traditional FL algorithms rely on the assumption of Independent and Identically Distributed (IID) data; when dealing with Non-IID data, they cannot cover the data distributions of all participants, which seriously affects model performance. In particular, the efficiency of FL training is bounded by each participating device, and physical devices with asymmetric performance will significantly reduce the efficiency of global FL training. In contrast, the fine-tuning process of each participating device in LLM-Twin is independent, so the training and information synchronization of a single participant do not affect other participants. In addition, local fine-tuning allows each participant to implement personalized training and build personalized DTs based on their requirements, which helps mitigate the data/device heterogeneity threat.
Against the threats of a single point of failure and data poisoning, LLM-Twin has a natural advantage. The fine-tuned knowledge is only “mounted” onto the LMs of the DTNs without changing the original parameters and weights of the LMs. This means that, in the LLM-Twin framework, single-point failure information and poisoned data cannot directly affect the knowledge of the global model or cause extensive damage. In contrast, the FL-based architecture aggregates the weights of the participants and updates the parameters of the global model on the DT side, which makes it difficult to prevent malicious data from corrupting the global model.
Against privacy threats, we discuss the one-way security and homomorphic security of LLM-Twin in the “Related works” section, which can effectively mitigate privacy leakage and inference attacks. It is worth mentioning that the FL-based DTN architecture also achieves privacy protection and enhancement through distributed local training. The difference is that the traditional FL architecture lacks protection against sensitive information extraction on the DT side. In terms of one-way security, we design a sensitive information protection strategy based on the reversal curse for LLM-Twin, which mitigates the attacker's inference attacks on sensitive information at the DT side and further enhances privacy protection.