DSL-Researches

An Analysis of Information Security in Random Linear Network Coding Network

Vu Loc Duy

Introduction

Network coding, which is first introduced in [1], has received a lot of attention in the recent years. Because of its huge benefit over the traditional store-and-forward network, network coding is expected to bring a a breakthrough to the development of information transmitting technology. Network coding can help transmitting information at a rate that approaches to the theoretical value stated in the famous "max-flow min-cut" theory. This benefit may reduce the network's sophisticated physical infrastructure, and result in a very efficient information transmitting method.

Most of the works related to network coding focus on figuring out the comprehensive coding algorithm to optimize network throughput and robustness [2]. A lot of others made a great contribution in how to construct a feasible network code [3]. Because network coding's core idea is the encoding or mixing input signal at each node, how to decode the encoded signal at the sink node to retrieve the original is not only the problem of construct the coding algorithm but also concerns with the network's structure. For example, network where the transmitting rate exceeds the network's min-cut is said to be infeasible. Regardless of such annoying problems, we are intrigued by the natural information security properties in the network coding.

The first secure network code model is proposed in [4], in which a collection of subsets of the channels in the network is given, and an eavesdropper is allowed to access any one (but not more than one) of these subsets without being able to obtain any information about the message transmitted. The network in [4] can be considered as a network generalization of the idea that a sender has to randomize the message to be transmitted in order to protect it from the eavesdropper. Basing on such idea, many works have been carried out to exploit the inherent security provided by random linear network coding and offers the advantage of reduced overhead in comparison to traditional end-to-end encryption of the entire data. As one of the remarkable contributions, [5] proposed a lightweight security framework for network coding name SPOC (Secure Practical Network Coding), in which, they stated that confidentiality is achieved by protecting (or "locking") the source coefficients required to decode the encoded data, without preventing intermediate nodes from running their standard net- work coding operations. In the same year, [6] succeeded in proving the confidentiality of a random, uniform plain-text message when sending it through RLNC using SPOC framework. They did this by obtaining bounds for mutual information between the payload of packets encoded with RLNC and either the coefficient matrix or the original plain-text. It turns out that these bounds converge to 0 when the field size of the finite field becomes large.

These above works made a great contribution in designing of secure RLNC protocols. However, they only consider the random, uniform input message, which basically does not exist in the real world. Originating from the same idea of randomizing input message, using the lightweight security framework SPOC, our work tended to analyze how secure RLNC is, when we send a message that look like a real plain-text message in the real world. Because the input message in our work is random but not uniform, we first formalize the letter frequency using Cocho/Beta function, then calculate the mutual information between the payload of encoded data and encoding matrix at the source node as having been done in [6].

Abstract

Aiming on analyzing the capability of natural RLNC to secure the source message, we construct a network model in which the encoding matrix are given only to the source and the authenticated sinks. Some capable eavesdroppers, who can access to all the intermediate nodes, try to rob some meaningful information from the source message by firstly deducing the encoding matrix from the encoded data, then using this encoding matrix to decode the encoded data. We consider the source messages which are some kinds of plain-text, whose letter frequency complies with the famous Coho/Beta function. The encoding coefficients, which are the elements of the encoding matrix, the letters in both original data and encoded data are the elements of a predefined finite field. We then try to simulate the characterization of mutual information between the encoded data and the encoding matrix. Mutual information between the encoding matrix and the encoded data is the one that indicates the possibility that the eavesdroppers can deduce the encoding matrix. Although only simulating some finite values of letter bit-number, our result succeeded in predicating the variance of mutual information to letter's bit-number. It follows that, although the natural RLNC method cannot secure the information from some capable eavesdroppers, it may be suitable for some local network that require low standard of security and hight transmitting band-with. Our result also revealed some clues about additional steps needed to improve information security.

Network Model

We model the network as a directed graph G = (V,E) where V is the set of nodes in G and E is the set of edges in G. Every edge in G represents a communication channel with the capacity of one letter per unit time. For every node a in V, there are some incoming channels and outgoing channels. We denote In(a) be the set of incoming channels to node a and Out(a) be the set of outgoing channels from node a. Source node is the only node that there is no incoming channel into it and sink nodes are those that have no outgoing channel. A message need transmitting will be generated at the source node, then be transmitting through the intermediate nodes to prescribed set of sink nodes.

In existing networks, information is transmitted from the source node to a set of sink nodes through a chain of intermediate nodes by a method called stored-and-forward. In this method, incoming signal from an input link is simply stored and a copy is forwarded to the next node via an output link. If an intermediate node is on the transmission paths toward multiple destinations, it sends one copy of the data packets onto each output link that lead to one of the destinations. The term "network coding" was first introduced in [1], and the advantage of network coding over stored-and-forward network was first demonstrated. Network coding allows the node to mix the input signal before outputting it, which allows multi-signals to be transmitted at a meantime and then increase overall network throughput.

Information Security in RLNC Network

Coding over a randomized network coding was firstly described in [7], where the linear coefficients for each link are independently and randomly chosen from the some finite field. The sinks only need to know the overall linear combination of source processes in each of their incoming signals. However, the random network coding is only proved to be feasible in [8]. Reference [8] stated that, when the number of elements in the finite field become large, the probability that the sinks can decode approaches to 1. Random Linear Network Coding Network (RLNC) brought out many ideas of improving information security by utilizing its characteristic. Because, incoming signals are independently, randomly encoded, it is very difficult for eavesdroppers to decode.

In this work, we adopt the model proposed in [6], where some eavesdroppers can access to all intermediate links. They try to obtain x by firstly deducing encoding matrix A then decoding y to get x. Therefore, to assure the network information security, we need to protect the encoding matrix A. We do not know exactly how the eavesdropper can deduce A from y, we just concern about the potentiality that they can deduce A. In the information theory, there is a function to evaluate this potentiality, which is called mutual information. The information in RLNC network is secure if mutual information between the encoding matrix A and the encoded data y is 0.

Problem Formulation

In most of computer network, sending plain-text information such as email, newspaper contents, etc is very popular, and securing such kinds of information turns out to be very important. Although some related works already considered this problem [6],[9], unfortunately, they neglected the non-uniformity of input letter frequency. In our research, we realized the importance of inspecting letter frequency, which has not been done before in the works related to RLNC network.

In this research, we consider the case when the original message is kind of an English message, for example, an English email, a speech, an English textbook content, ect. The frequency of each letter varies among different letters. The frequency of letters in text has often been studied for use in cryptography. Accurate average letter frequencies can only be gleaned by analyzing large amount of representative text [10]. In [11], they used the historical U.S. and Mexican presidential speeches to examine various functional forms of fitting rank-frequency distribution of letters, ranging from simple to more complicated ones with two or three free parameters. As one the results, Cocho/Beta function is said to be the best fitting function of letter frequency. Cocho/Beta function is not only used for English letter but also for some other one such as Spain, Mexican.

In this research, we consider the input text message, whose letter frequency complies with Cocho/Beta function.

Simulation Result

The mutual information does not converge to 0 but to a value around 0.56, which means the confidentiality is not guaranteed when we send a message with letter frequency complying with Cocho/Beta function. It is, in fact, unfair to state that transmitting with RLNC network is less secure than other method, because right now we will see the percentage of information leaked to eavesdroppers, and notice that this value dramatically drop to 0. As a further evaluation, we also forecast simple RLNC network could not guarantee information security with input data is not non-uniform. This result does not mean the inapplicability of RLNC network, but it once more affirms the necessity of adding some stricter conditions in order to restrict the activity of eavesdroppers. In this work, the conditions are very weak, because we assumed the eavesdroppers can access to all the intermediate nodes to steal encoded data. The RLNC network is then suitable for some local networks that require large band-width rather than a perfect security. Moreover, in the real world, there are some secret links that only the sender and the authorized receivers can access. Such links are usually in charge of transmitting secret key or information that are very important to decode data. We can also make meaningful information look chaos by adding meaningless, uniform, random, mutually independent information into it. Some are also thinking of applying a lightweight cryptography method to encrypt the data before transmitting [12],[13]. Although this method reduces the overall network throughput, it is one of our considerations in the future works.

Reference

[1] R. Ahlswede, N. Cai, S. Li, and R. Yeung, "Network information flow," IEEE Trans. Inform. Theory, vol. 46, no. 4, pp. 1204-1211, 2000.

[2] R. Koetter and M. Medard, "An algebraic approach to network coding," IEEE/ACM Trans. Netw., vol. 11, no. 5, pp. 782-795, 2003.

[3] T. Ho, M. Medard, R. Koetter, D. Karger, M. Effros, J. Shi, and B. Leong, "A random linear network coding approach to multicast," IEEE Trans. Inform. Theory, vol. 52, no. 10, pp. 4413-4430, 2006.

[4] N. Cai and R. Yeung, "Secure network coding," Proc. 2002 IEEE Int. Symp. Inf. Theory, 2002.

[5] J. P. Vilela, L. Lima, and J. Barros, "Lightweight security for network coding," Proc. of the IEEE Int. Conf. Com. (ICC 2008), 2008.

[6] L. Lima, J. Vilela, J. Barros, and M. Medard, "An information-theoretic cryptanalysis of network coding - is protecting the code enough," EEE Int. Symp. Inform. Theory. App, Auckland, New Zealand, 2008.

[7] T. Ho, D. Karger, M. Medard, and R. Koetter, "Network coding from a network flow perspective," Proceedings of the 2003 IEEE Int. Symp. Inform. Theory, 2003.

[8] T. Ho, M. Medard, J. Shi, M. Effros, and D. Karger, "On randomized network coding," Proceedings of 41st Annual Allerton Conference on Communication, Control, and Computing, 2003.

[9] L. Lima, M. Medard, and J. Barros, "Random linear network coding: A free cipher," IEEE Int. Symp. Inform. Theory, Nice, France, June 2007, pp. 546-550, 2007.

[10] T. Wall and L. Smithline, "English letter frequency table," [Online]. Available: http://www.math.cornell.edu/ mec/2003- 2004/cryptography/subs/frequencies.html, 2014.

[11] W. Li and P. Miramontes, "Fitting ranked english and spanish letter frequency distribution in u.s and mexican presidential speeches," Journal of Quantitative Linguistics, vol. 18, no. 4, pp. 359-380, 2011.

[12] E. Gabidulin, N. Pilipchuk, B. Honary, and H. Rashwan, "Information security in a random linear network coding network," Prob. Inform. Trans, vol. 49, no. 2, pp. 179-191, 2013.

[13] P. Zhang, Y. Jiang, C. Lin, Y. Fan, and X. Shen, "P-coding: Secure network coding against eavesdropping attacks," EEE Infocom. IEEE, 2010, pp. 2249-2257, 2010.

March 07, 2014