OpenStack is popular open-source cloud management software that many enterprises adopt to deploy small to medium-sized clouds. When building a cloud data center, the system manager has to consider how to build the network.
For a tiny cloud for private use, it is trivial to create a single network for all traffic regardless of its characteristics. A larger multi-tenant cloud (e.g., in a university or a small company), however, needs to consider more aspects, mostly security. System administrators do not want to let tenants access the physical infrastructure through a public intranet; the physical infrastructure has to be hidden for security reasons. In addition to security, network performance and availability have to be considered when designing the network. For these reasons, a larger cloud data center adopts multiple networks for different purposes, such as a management network, a data network, a tenant network, and so on.
< Terms and Definitions >
Many documents on the Internet suggest and explain network architectures with different purposes, but they are somewhat confusing and easy to misunderstand. In this article, let me clarify the terms for the different networks used in many documents about OpenStack, and possibly about data center networks in general.
1. Management network vs tenant network: used by whom?
Let's think about 'who' will use the network. Is it used by the system administrator of OpenStack, or by the tenants who want to use the VMs? If the network is used by the system administrator who manages the data center, it is called the 'management network'. A common use of the management network is to access each physical node to configure and maintain OpenStack components such as Nova or Neutron.
On the other hand, the 'tenant network' is used by the cloud tenants. The main traffic on the tenant network is between VMs, and from outside to the VMs used by end users and the tenant. Note that this holds regardless of whether the network is 'internal' or 'external'. We will discuss the difference between the tenant network and the internal network later.
2. Control network vs data network: used for what?
These terms are very similar to 'management' and 'tenant', but they focus on the 'purpose' of the network rather than on its users. The control network is for control packets, while the data network is for data. Think about image data sent from the Glance node to a compute node, or a VM migration between two compute nodes. They are for management, but they are data traffic rather than control traffic.
Control traffic is a command such as 'send ImageA from the Glance node to the Compute1 node', sent by the controller to the Glance and Compute1 nodes, while the actual image data from Glance to Compute1 is data traffic. Although both originate from the cloud manager for management purposes, the characteristics of the network traffic are different.
3. Internal vs external: IP address range or VM network?
These are more general terms in common use, which is why they are easily confused. In general networking terms, an internal network is a network with a private IP address range (10.x.x.x, 172.16.x.x to 172.31.x.x, or 192.168.x.x), whereas an external network has a public IP address range so that it is open to the public.
However, in cloud computing they can have a slightly different meaning. An internal network is a network for VMs that can be accessed only from inside the VM network, while an external network is a network that is exposed to the outside of the VM network. This can be very confusing, especially for a data center built inside an intranet. The intranet itself already uses private IP addresses, and it will be the 'external' network of the VMs in the data center.
For example, a company uses the 192.168.x.x range for its intranet. All desktops and laptops use addresses in this range. One employee of this company (a tenant) creates a VM in the internal cloud. This VM should be reachable through the intranet IP range (192.168.x.x). Although this is an internal network for the company, it is at the same time the external network for the VMs.
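To make the two meanings of 'internal' concrete, here is a minimal Python sketch. It only uses the standard `ipaddress` module; the laptop address and the tenant VM network (10.0.0.0/24) are made-up values for illustration, and the function names are my own.

```python
import ipaddress

# The three private IPv4 blocks reserved by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address lies in one of the RFC 1918 private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in RFC1918_BLOCKS)


def is_external_for_vm(address: str, vm_network: str) -> bool:
    """In the cloud sense, anything outside the VM network is 'external'."""
    return ipaddress.ip_address(address) not in ipaddress.ip_network(vm_network)


if __name__ == "__main__":
    laptop = "192.168.10.25"    # hypothetical intranet laptop
    vm_network = "10.0.0.0/24"  # hypothetical tenant VM network
    print(is_rfc1918(laptop))                      # private in the general sense
    print(is_external_for_vm(laptop, vm_network))  # yet external for the VMs
```

The same intranet address passes both checks: it is private in the general networking sense and external from the point of view of the VM network.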
4. Physical vs virtual network.
A physical network is for physical nodes, for example for communication between the controller and other nodes, while a virtual network is for VMs. These are high-level terms: in reality, every virtual network runs on top of the underlying physical network. We separate them only to differentiate them at the network layer.
5. Physical layer vs network layer.
The most common confusion in networking comes from the separation of layers. So far, we have talked about networks at the "network layer", a.k.a. the "IP layer", not about the physical or link layer. Note that all the aforementioned networks can be served over a single physical medium by differentiating IP ranges with proper routing settings on the hosts.
6. Other terms: API, provider, guest, storage, logical, flat, etc, etc, etc network...
Depending on the article, many more terms are used in various ways. Let's skip the explanation of each term and look at an example to discuss the network architecture.
< Example network architecture >
Let's build a data center with different networks. This data center is connected directly to both the Internet and the intranet, which means VMs can be accessed from both.
* Internet IP range: 123.4.x.x
* Intranet IP range: 172.16.x.x
These two networks are called "public", "provider", or "external" networks, which means they are exposed to the outside of the data center. A VM can acquire either or both kinds of IP address so that anyone in the world (via the Internet) or within the company (via the intranet) can connect to it.
* Provider physical network: External IP range only for VMs (none for physical nodes)
This is not an IP network, but an L2 network that connects the VMs' L3 interfaces to the external network. VMs acquire an Internet or intranet IP address from a network node or an external DHCP server and use this physical network to communicate with the Internet or the intranet.
Note that the physical interfaces in the compute nodes do not get any IP address on this network. They only provide an L2 service connecting the hosted VMs to the external network. Thus, tenants cannot access the physical nodes through this network, although it is exposed directly to the outside. In Neutron, this is configured with the 'flat' type.
* Management physical network (Control network): 192.168.100.x
Within the data center, all physical nodes (controller, network, compute, Glance, etc.) are connected to this network for system management. Only the data center manager can access this network.
* Virtual network using VxLAN: any IP range
This is a self-managed network created and managed by a tenant. It is a virtual network using a tunneling protocol such as VxLAN, so the tenant can assign any IP range. Note that this network can be accessed only from a VM in the same virtual network. The physical network carrying this traffic can either be combined with the physical management network or separated as a tenant data network.
* Tenant data network (optional): 192.168.200.x
For virtual network traffic between VMs, we can set up a separate physical network dedicated to tenant data, distinct from the management network. If it is not physically separated, the virtual network traffic between VMs shares the management network.
* Management data network (optional)
If you want to separate management data traffic from control command traffic, it is also possible to use a different physical network for that purpose. In most cases, including this example, this management data traffic shares the physical management network (192.168.100.x). Sometimes it is configured to use the tenant data network (192.168.200.x) for a specific reason, such as VM migration or separating image data.
* Storage data network (optional)
For further separation, a storage data network can be physically separated to provide a stable, high-performance object or volume storage service. If it is not separated, storage traffic can use any of the data networks or the physical management network.
In this example, we only consider the tenant data network. The other optional networks are not physically separated.
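The address plan listed above can be written down as a small lookup table. This is only a sketch: the prefix lengths (/16 and /24) are my reading of the "x" placeholders, and the names `EXAMPLE_NETWORKS` and `classify` are made up for illustration.

```python
import ipaddress

# Address plan of the example. The prefix lengths are assumptions
# read off the "x" placeholders in the ranges above.
EXAMPLE_NETWORKS = {
    "provider (Internet)": ipaddress.ip_network("123.4.0.0/16"),
    "provider (intranet)": ipaddress.ip_network("172.16.0.0/16"),
    "management": ipaddress.ip_network("192.168.100.0/24"),
    "tenant data": ipaddress.ip_network("192.168.200.0/24"),
}


def classify(address: str) -> str:
    """Return the name of the example network that contains the address."""
    ip = ipaddress.ip_address(address)
    for name, network in EXAMPLE_NETWORKS.items():
        if ip in network:
            return name
    # Tenants may pick any range for their virtual networks, so an
    # address outside the plan is not necessarily an error.
    return "unknown (possibly a tenant-defined virtual network)"
```

For instance, `classify("192.168.100.7")` gives `"management"`, while an address such as 10.1.2.3 falls outside the plan and would typically belong to a tenant's virtual network.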
< Physical interfaces for the example >
* Router/DHCP server: provides connectivity between the Internet, the intranet, and the data center network.
-eth0 - to outside Internet (public)
-eth1 - to outside intranet (public)
-eth2/3 - to Provider physical network for Internet (123.4.0.1) and intranet (172.16.0.1): GW for VMs
-eth4 - to Control network (192.168.100.1): NAT for the control network; provides public access to the tenant UI on the controller node.
In this configuration, eth2 and eth3 are both connected to the same physical network. The router needs more sophisticated firewall and routing rules for better security. Basically, it connects four different physical networks in this scenario: two public networks, the provider network, and the hidden control network. Forwarding rules should be set up so that incoming traffic from the external interfaces (eth0/1) can reach only the VMs on eth2/3 or the controller node on eth4. Outgoing traffic from eth2/3/4 can freely access the Internet.
Note that eth2/3 can be detached from the router if you want Neutron to control them. In that case, the external networks (123.4.x.x/172.16.x.x) are connected only to the network node running the Neutron L3 routing agent, and Neutron provides DHCP and routing for VMs.
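The forwarding policy for the router (with eth2/3 still attached) can be sketched as a small decision function. This is not a real firewall configuration, just the logic at the interface level. Two things here are my assumptions rather than statements from the setup above: any combination not mentioned is denied, and traffic between the provider and control networks is blocked to keep the physical nodes hidden. A real rule set would also restrict eth4 to the controller node's address.

```python
# Interface roles on the example router.
EXTERNAL = {"eth0", "eth1"}  # Internet and intranet uplinks
PROVIDER = {"eth2", "eth3"}  # provider network, gateway for VMs
CONTROL = {"eth4"}           # control network


def forward_allowed(in_iface: str, out_iface: str) -> bool:
    """Decide whether a new connection may be forwarded between two interfaces."""
    # Inbound: outside traffic may reach VMs (eth2/3) or the controller side (eth4).
    if in_iface in EXTERNAL:
        return out_iface in PROVIDER | CONTROL
    # Outbound: VMs and physical nodes may reach the outside freely.
    if in_iface in PROVIDER | CONTROL:
        return out_iface in EXTERNAL
    # Assumption: everything else is denied.
    return False
```

With this policy a VM on eth2 can reach the Internet through eth0, but `forward_allowed("eth2", "eth4")` is false, so a tenant cannot use the router to reach the control network.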
* Controller node
-eth0 - Control network (192.168.100.x)
-eth1 - Internet/intranet through router to provide UIs.
Receives VM creation requests and any other requests from tenants through Horizon, and sends commands to the other physical nodes by calling their APIs over the eth0 control network.
* Network node
-eth0 - Control network (192.168.100.x)
-eth1 - Tenant data network (192.168.200.x): manages virtual tunnel networks
-eth2 - Tenant provider network (No IP): manages
The network node is in charge of managing virtual networks, e.g. providing DHCP for VxLANs, L3 routing between different VxLANs, and controlling 'public' access to VMs.
* Compute nodes
-eth0 - Control network (192.168.100.x)
-eth1 - Tenant data network (192.168.200.x): VM-to-VM intra-DCN traffic, e.g. VxLAN
-eth2 - Tenant provider network (No IP): VM's external connectivity to Internet or intranet
The Nova compute service receives commands from the controller through eth0. VMs hosted on the compute node use eth1 for intra-DCN traffic sent to another VM using the 'virtual network' IP range. These packets are encapsulated in VxLAN or another tunneling protocol and sent to the other compute node over the tenant data network. Note that 192.168.200.x addresses are assigned only to physical nodes, so that a compute node can find the compute node hosting the destination VM. Thus, 192.168.200.x is not accessible from VMs. VMs can reach only other VMs, because all packets sent from VMs to eth1 are encapsulated in a tunneling protocol.
On the other hand, eth2 is connected directly to VMs without any tunneling protocol, which makes it a so-called 'flat' network. The provider network is connected to VMs through this interface. When a VM needs access to the Internet, the Neutron controller creates a virtual interface within the VM, which can acquire a public IP address from the router or from Neutron. For a floating IP, the virtual interface is created within the Neutron controller, which forwards the traffic to the assigned VM through this network.
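To see what the encapsulation on eth1 looks like, here is a minimal sketch of the 8-byte VxLAN header as defined in RFC 7348. It covers only the VxLAN header itself; the outer UDP/IP headers (addressed with the 192.168.200.x addresses of the physical nodes) are added by the host and are left out. The function names are my own, and real deployments leave all of this to the virtual switch.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid (RFC 7348)


def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend the 8-byte VxLAN header to an inner Ethernet frame."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(packet: bytes):
    """Return (vni, inner_frame) from a VxLAN payload."""
    if len(packet) < 8:
        raise ValueError("packet shorter than the VxLAN header")
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, packet[8:]
```

The 24-bit VNI identifies the virtual network, which is why two tenants can pick the same IP range without conflict: their frames travel with different VNIs, and the VM's own addresses never appear on the tenant data network unencapsulated.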
< Conclusion >
There are so many terms that may confuse people. In this article I have tried to explain them as simply as possible, but they are still a bit confusing. The most important and most easily confused point is mixing up networks at different layers. Whenever you think about network architecture, you must separate the concepts by network layer. Do not mix the L3 (IP) layer with the underlying L2 or physical layer. On a single physical layer, multiple networks can operate at a higher layer. Think about your desktop computer. It has only one IP address (L3) and one MAC address (L2), but it can run many applications using TCP/UDP port numbers. For the same reason, multiple IP networks can exist on the same Ethernet. You can assign many IP addresses to one NIC on Windows or Linux. If you separate the layers and think logically about the features discussed above, what others talk about won't be so confusing.
For more information, look at this OpenStack document. It is a bit outdated, but it provides much more information than the recent documents.
https://docs.openstack.org/liberty/networking-guide/deploy.html