Tuesday, 27 November 2018

Android Battery Drain issue - How to dig and find the root cause?

Mobile phones are getting more and more powerful silicon and processors, which creates more and more battery-management issues: higher processing power inevitably consumes more electric power. In this article, I'll present the basic concepts of Android power management and show how to dig into a battery drain issue. The simple answer you can find on the internet is to disable GPS/Location/Bluetooth/WiFi/4G/NFC/etc. services, which somehow does not make much sense. You own a powerful smartphone, and of course you want to maximise the usage of the expensive toy.

1. Screen on
Obviously, the screen is the most energy-consuming component of any mobile device. The longer the screen is on, the higher the power consumption. The only practical way to reduce screen-on power consumption is to reduce the brightness: full brightness can consume about 7 times more energy than the lowest brightness. Therefore, it is advisable to disable automatic brightness and adjust it manually, keeping it below a certain level.

2. Screen off / sleep / deep sleep
It is obvious that the battery drains a lot while the screen is on. The more troublesome issue for me was that my battery drained super fast even when the screen was off. I checked the battery stats and all other settings in the system, but the Android built-in battery status doesn't provide much detail. I suspected Google Play Services consumed a lot of battery, but I could not find which apps used which services.

For convenience, the state with the screen off is called "sleep mode". Once the screen is off, the Android system goes into sleep mode. If sleep mode lasts long enough and no background app runs for a certain time, Android falls into deep-sleep mode. During deep sleep, the phone uses minimal battery, close to 0%.

In most cases, the battery drain issue happens when the phone stays in sleep mode but cannot fall into deep sleep. If your screen is on for only 1~2 hours but the battery lasts just 8~9 hours, some apps are repeatedly waking the phone up from sleep mode in the background.

The easiest way to find out how long your phone was sleeping deeply is to use a 3rd-party battery status app such as AccuBattery, GSam Battery Monitor, or BetterBatteryStats (BBS).
GSam and BBS in particular are very useful for detecting which app keeps waking up your Android from sleep mode.

Install one of the three battery status apps (just one) and check the deep-sleep time and percentage. If your deep-sleep time is 90%+, your battery will last a loooooong time. Above 50% is normal if you have any social media apps installed, because they wake your phone up periodically to check for new updates.

3. Wakelock
Wakelock is the official Android term for the mechanism apps use to wake up the device. If an app requests a wakelock, the Android system wakes up from sleep mode and processes whatever the app requested. Therefore, our goal for preventing battery drain is to reduce the number of wakelocks.
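
If you want a quick peek even before installing anything, you can dump the power manager state over adb. The output format varies by Android version, but the "Wake Locks" section lists what is currently held:

$ adb shell dumpsys power | grep -i wake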

If you install GSam or BBS properly, with all permissions granted via the 'adb' command, it can show detailed wakelock information. In BBS, check the "Partial lock" section to see which apps request wakelocks. In GSam, go to the app usage page and open each app; the detail page shows wakelock details that tell you which app keeps the device awake from sleep.
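
For reference, the adb grants look like the following. The package name below is for BetterBatteryStats and is an assumption on my side; check the app's own help page for the exact package name and full permission list (GSam needs a similar BATTERY_STATS grant for its own package).

# enable USB debugging on the phone, then from a PC:
$ adb shell pm grant com.asksven.betterbatterystats android.permission.BATTERY_STATS
$ adb shell pm grant com.asksven.betterbatterystats android.permission.DUMP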

Once you have found the offending apps with this method, there are plenty of options. Advanced users can block wakelocks from certain apps, which might cause some issues but will increase battery life. Alternatively, just uninstall the app, or revoke its permission to the specific service.

4. My Usecase
First of all, I installed AccuBattery to check the overall battery consumption rate. It showed my battery consumption was about 15~30%/hour with the screen on, and 2~3%/hour with the screen off. However, at some point the screen-off consumption increased to 7~8%/hour, and I had to find the reason.

I then installed GSam to check each app's battery usage in detail, as well as to see which app kept waking up the system. I found that when GSam and AccuBattery are installed at the same time, they interfere with each other, which results in somewhat faulty measurements. These battery status apps themselves also drain quite a large amount of battery, so it's not a good idea to keep both running.

In GSam, I checked the details of Google Play Services and found that the location service was called an excessive number of times. Yes, the general advice is correct: turning off the location service saves battery! However, I didn't want to lose all the benefits Google brings me through the location service. Instead of turning Location Services off, I changed the location settings to 'Battery Saving' mode, which uses only mobile network and WiFi information without GPS. With this option, the location service drains much less energy.

Next, I disabled "Ok Google" voice recognition, as I seldom use voice commands. Note that with the "Ok Google" feature enabled, the microphone is always on to detect your voice, which prevents the phone from falling into deep sleep. A milder option is to detect your voice only while the screen is on; in my case I disabled it even for screen-on. (Settings -> Google -> Search, Assistance & Voice -> Voice -> Voice Match)

In fact, there are a lot of settings under Google that can have a big impact on battery drain. The biggest one is location services, but the other settings should also be reviewed carefully. If you don't think you're using a service, just disable it to save more battery. This part is critical.

Lastly, I found that WeChat also drained quite a large amount of energy. Weirdly, it kept accessing the step-counter sensor, which I expected to be used only by Google Fit or Samsung Health. I dug into the WeChat settings and found that a feature called "WeChat Run" was enabled; this app-in-app counts your walking steps and compares them with your friends'. I disabled this feature, along with other unused features in the app.

Overall, the internet advice is correct: disabling GPS/Location/etc. will save battery, and my final solution was in fact similar. However, the big difference in this article is that you KNOW which app is draining the battery before you disable or uninstall anything. As different users have different usage patterns, it is critical to know what exactly causes the battery issue IN YOUR MOBILE.

For further information, please read these guides and manuals:
https://forum.xda-developers.com/galaxy-s8/help/guide-hunting-wakelocks-battery-drain-t3697324
http://blogger.gsamlabs.com/2011/11/badass-battery-monitor-users-guide.html
https://pdfs.semanticscholar.org/f098/ce049f0d537fac404fd45a58c9f8f4c0bca8.pdf

Friday, 12 October 2018

Windows 10 1709 (Oct 2018 ver.) migration from MBR HDD to GPT SSD (or another HDD)

You've purchased a new lightning-fast SSD or super-sized HDD, and it's time to copy your Windows system to the new storage!

If your original HDD and new SSD are both partitioned with GPT, simply use the free version of a partition management tool from EaseUS or MiniTool. The free versions still support cloning a partition from GPT to GPT, or MBR to MBR.

However, if your old disk uses MBR and you want the benefits of a fancy GPT partition table on your new disk, it can bring some headaches. The easiest way is to purchase the "Pro" version of the aforementioned commercial software; only the paid versions support migrating (or cloning) a system partition from MBR to GPT.

If your budget is limited and you do not want to pay $$ for a one-time system migration, then this is the article for you! Here are the simple steps.


1. Attach the new drive, detach the old drive, and create EFI partition on the new drive.

Boot from the Windows 10 installation media and clean-install Windows on your new drive with a GPT partition table. This automatically creates an EFI partition on the new drive, which allows booting in UEFI/GPT mode.

Instead of installing a fresh Windows, you can also manually partition the new drive into Recovery (500MB, NTFS, OEM partition), EFI (100MB, FAT32, EFI system partition), and Windows (NTFS) partitions, as sketched below. The EFI partition will hold the Windows boot information after the old Windows is cloned onto the new Windows partition.
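
If you go the manual route, a minimal diskpart sketch of that layout (run from the installation media's Command Prompt) could look like the following. It assumes disk 0 is the new, empty drive, so double-check with "list disk" first, because "clean" wipes the selected disk; the set id GUID marks the first partition as a recovery/OEM partition.

X:\> diskpart
list disk
select disk 0   // make sure 0 is the NEW drive: "clean" erases it
clean
convert gpt
create partition primary size=500
format quick fs=ntfs label=Recovery
set id="de94bba4-06d1-4d40-a16a-bfd50179d6ac"
create partition efi size=100
format quick fs=fat32 label=System
create partition msr size=16
create partition primary
format quick fs=ntfs label=Windows
exit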


2. Attach the old drive and clone the old Windows NTFS partition.

The easiest way is to boot into the old Windows, install the free version of EaseUS Todo Backup, and clone the old drive (C:) onto the newly created Windows partition of the new drive (overwriting it). You don't need "System Clone", which is supported only in the paid version; the plain partition ("data") clone function in the free version is enough.
Note that this ruins the clean-installed Windows on the new drive, because you overwrite its Windows partition with the old system. You cannot boot from the copied partition YET.

You can do this with Clonezilla, ntfsclone, Minitool, or any other disk cloning software.

3. Detach the old drive again, and repair booting on the new drive.

Once the cloning is complete, it's time to make the newly copied volume on the new drive bootable. If you try to boot from the new drive now, it will simply throw an error message.

Boot from the Windows installation media USB, select Repair your computer, go to Advanced options, and open Command Prompt. The prompt opens with X:\ as the default drive.

Type these commands to assign a drive letter to the EFI partition:
X:\> diskpart
list disk
select disk 0   // if 0 is the new drive
list volume
select volume 3   // if 3 is the 100MB FAT32 EFI partition
assign letter=G
exit

Then, format G with the following command:
X:\> format G: /fs:FAT32

Now, populate the EFI partition with new boot files for the cloned Windows partition.
Type the following command (replace C: if the cloned Windows partition has a different letter in this environment):
X:\> bcdboot C:\windows /s G: /f UEFI

Now the EFI partition is set up to boot the cloned Windows.
Unplug the USB and reboot! Windows should be able to boot without any problem.

Once Windows is working well with the new drive, you can attach the old drive if you want to use it for other purposes.

Wednesday, 3 October 2018

Dell Optiplex 9020 SFF Upgrade Guide: Graphic Card (e.g. GTX 1050Ti) and SSD (Hard Drive)

Dell Optiplex mainboard layout from the service manual:

(Image source: https://topics-cdn.dell.com/pdf/optiplex-9020-desktop_owners-manual_en-us.pdf)

As you can see from the layout above, the mainboard has 2 PCI Express slots for powerful graphic cards, as well as 3 SATA connectors for additional SSD/HDD devices.

1. Graphic Card

You can install an up-to-date low-profile graphic card in a 9020 SFF machine, e.g. a GTX 1050Ti 4GB LP.
However, note that only the 4x PCI Express slot can be used for 1050Ti models because of the fan on the graphic card (the 16x PCI-E slot will be wasted). All 1050Ti LP products have a large fan and heat-sink that take up an additional slot, which on the 9020 SFF is the 16x slot. Thus, if you want to use a GTX 1050Ti, you unavoidably give up connecting any additional device to a PCI-E slot. Of course, if the new graphic card takes only one slot, both PCI-E slots can be used.

Regarding the performance of 4x PCI-E compared to 16x PCI-E, most people on the internet say there is little or no difference for a 1050Ti-class graphic card. Of course there is quite a difference for higher-end graphic cards such as the GTX 1080, but the 1050Ti is not powerful enough to benefit from 16x PCI-E; 4x PCI-E is fast enough to transfer all the data between GPU and CPU.

I personally purchased a Gigabyte GTX 1050Ti OC LP (4GB) and have run it in my 9020 SFF for 3 months so far; it works perfectly without any issue. I haven't experienced any freezes or blue screens in that time, even while running pretty heavy 3D games. Of course, don't expect to run the most recent 3D games at the highest settings, which would probably need a 1080Ti + 8th-gen i7 processor.

2. SSD/HDD

In most cases, the default 9020 SFF configuration comes with 1x DVD-RW drive and 1x HDD (or SSD), connected to two of the SATA data connectors (No. 12 in the above photo) and one power connector (No. 11). In that case, one extra SATA data connector is left free (No. 12 has 3 ports) and can be used for an additional SSD. Note that if you want to 'replace' (not 'add') your SSD or HDD, simply remove the current one and insert the new one. The only tricky part is mounting the drive in the chassis, which may need sticky tape or a 3rd-party cradle.

If you want to install a new SSD in addition to the old drive, 2 more cables are necessary:

1) SATA power splitter (15pin) Y cable: one 15-pin male (that connects to the current cable's female part) and two 15-pin females (connect to the old drive and the new SSD).
(Image: Amazon)

2) SATA data cable
(Image: Amazon)

I found a YouTube video where someone made a custom power cable by soldering wires to the original cable, but that's totally unnecessary if you can get the power splitter cable. Do NOT buy any cable with the old 4-pin connectors. There are plenty of places to get a 15-pin SATA one-male-to-two-female power cable (from $1 on eBay).

Once you have the new SSD and the two extra cables, connect the power splitter cable to the new and old drives, and the data cable between the new drive and the empty SATA port on the mainboard (No. 12).

To attach the new SSD to the chassis, use sticky tape or purchase a 3rd-party cradle. If you don't need the DVD-RW drive, you can even remove it and use the space for the SSD; in that case up to 3 SSDs/HDDs can be installed in total.

Conclusion

The Dell OptiPlex 9020 SFF is a small desktop that still provides plenty of performance and expandability. It comes with a 4th-generation i5 or i7 CPU, the Intel Q87 chipset, 4x DDR3-1600 memory slots, 2x PCI-E slots, and 3x SATA ports.

If you're lucky enough to get this incredible machine at a cheap price (2nd-hand at around US$150-200), it's easy to upgrade the basic machine into a capable Windows gaming platform by simply adding a good graphic card, an extra SSD, and more memory (if the base machine has little RAM installed).
Enjoy any 3D game with this cheap and small machine!

Wednesday, 29 November 2017

OpenStack shrink image virtual disk size

When OpenStack creates a snapshot, the image is stored in qcow2 format by Glance, in the directory /var/lib/glance/images/.

However, sometimes the virtual disk size of the image exceeds the disk size of a flavor, even the very flavor of the original VM instance. This is because the snapshot includes more than the main partition (/dev/sda1, /dev/sda2, ...), so the total disk size exceeds the disk size of the original flavor.

An admin can change the size of these disk images using the guestfish and virt-resize tools from the libguestfs library. Expanding a disk is easy, but shrinking is a bit more complicated. Follow this guide to shrink the virtual size of OpenStack image files, or of any image supported by libguestfs.


1. Check the image detail

$ qemu-img info ./disk.img
file format: qcow2
virtual size: 11G (11811160064 bytes)
disk size: 2.0G
cluster_size: 65536

=> This image needs a flavor with at least an 11 GB disk, while the actual file size is only 2.0G.
We will change the virtual size from 11G to 5G so that the image can be deployed with a smaller flavor.

2. Install guestfs-tools and lvm2

$ yum install lvm2 libguestfs-tools

If you're running this on a dedicated Glance server without libvirt, set this environment variable to bypass the libvirt back-end:

export LIBGUESTFS_BACKEND=direct

3. Check the detailed partition info of the image.

$ virt-filesystems --long --parts --blkdevs -h -a ./disk.img
Name       Type       MBR  Size  Parent
/dev/sda1  partition  83   10G   /dev/sda
/dev/sda2  partition  83   512M  /dev/sda
/dev/sda   device     -    11G   -

$ virt-df ./disk.img
Filesystem            1K-blocks       Used  Available  Use%
disk.img:/dev/sda1      5016960    1616912    3383664   33%

=> /dev/sda1 is the main partition, and only ~2 GB of it is used. This partition will be resized to 4G, which will make the total size of the image 5G.

Note that the size of the main partition must be less than the intended size of the image, even if there is only one partition. For example, if you want to make a 5G image, /dev/sda1 must be smaller than 5G, e.g. 4G or 3.5G.

4. Use guestfish for resizing

It is a good idea to make a backup before proceeding, because the following commands may damage the image's partitions.

$ guestfish -a ./disk.img
><fs> run
><fs> list-filesystems
...
><fs> e2fsck-f /dev/sda1
><fs> resize2fs-size /dev/sda1 4G
><fs> exit

=> This shrinks only the filesystem inside /dev/sda1; the image's virtual size is not changed yet.
=> Again, the size of /dev/sda1 must be smaller than the intended size of the entire image. We set 4G here, so the final image can be 5G.

5. Use virt-resize for actual resizing.

First, create an empty image to receive the shrunk data. Specify the size of the entire new image here.

$ qemu-img create -f qcow2 -o preallocation=metadata newdisk.qcow2 5G

Once the new image file is created, use the virt-resize command to copy the data from the old file to the new image file.

$ virt-resize --shrink /dev/sda1 ./disk.img ./newdisk.qcow2

Now you have a new image file with the same content as the old image. /dev/sda1 is first shrunk and then extended to fill the remaining space in the newdisk.qcow2 image. In this example, /dev/sda1 ends up at 4.5G, because it occupies all of the 5G image not used by the other partitions.


6. Check the size and partition information of the new image.


$ qemu-img info ./newdisk.qcow2
file format: qcow2
virtual size: 5.0G
disk size: 1.9G
cluster_size: 65536

$ virt-filesystems --long --parts --blkdevs -h -a ./newdisk.qcow2
Name       Type       MBR  Size  Parent
/dev/sda1  partition  83   4.5G  /dev/sda
/dev/sda2  partition  83   512M  /dev/sda
/dev/sda   device     -    5.0G  -

$ virt-df ./newdisk.qcow2
Filesystem              1K-blocks       Used  Available  Use%
newdisk.qcow2:/dev/sda1   5522976    1616912    3889680   30%

7. Shrink the image file size

If the image file itself is too big, you can reduce its size with the qemu-img utility (re-converting compacts the image; add -c if you also want compression):

$ qemu-img convert -O qcow2 ./newdisk.qcow2 ./newdisk2.qcow2


8. Upload the new image to OpenStack

OpenStack will not recognise the new image file if you just swap it in place; creating a VM from it will fail with an error message like "Not authorized for image .....".

Instead of swapping the image file, use the openstack CLI to register the new image:
$ openstack image create --disk-format qcow2 --container-format bare --private --file ./newdisk.qcow2 My_Image_Shrunk

=> This command registers the new image file as the "My_Image_Shrunk" image, which can then be used to create a new instance, as shown below.
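
To sanity-check the result, you can inspect the registered image and boot a test instance from it. The flavor and network names below (m1.small, private) are placeholders for whatever exists in your environment:

$ openstack image show My_Image_Shrunk -c name -c disk_format -c size
$ openstack server create --image My_Image_Shrunk --flavor m1.small --network private test-shrunk-vm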


Saturday, 11 November 2017

Understanding OpenStack networking architecture concept: Easy description.

OpenStack is popular open-source cloud management software adopted by many enterprises to deploy small to medium-sized clouds. When building a cloud data center, the system manager has to consider how to build the network.

For a tiny private cloud, it is trivial to create a single network for all traffic, regardless of the traffic's characteristics. A larger multi-tenant cloud (e.g., in a university or a small company) needs to consider more aspects, mostly security: the system administrator does not want tenants to access the physical infrastructure through a public intranet, so the physical infrastructure has to be hidden. In addition to security, network performance and availability have to be considered in the design. For these reasons, a larger cloud data center will adopt multiple networks for different purposes, such as a management network, data network, tenant network, etc.


< Terms and Definitions >


Many documents on the internet suggest and explain network architectures for different purposes, but they are somewhat confusing and easy to misunderstand. In this article, let me clarify the terms for the different networks used in many OpenStack documents, and in general data-center networking.

1. Management network vs tenant network: used by whom?

Let's think about 'who' uses the network. Is it the OpenStack system administrator, or the tenants who use the VMs? If the network is used by the system administrator who manages the data center, it is called the 'management network'. A common use of the management network is accessing each physical node to configure and maintain OpenStack components such as Nova or Neutron.

On the other hand, the 'tenant network' is used by the cloud tenants. The main traffic on a tenant network is between VMs, and from the outside to the VMs used by end-users and the tenant. Note that this is regardless of whether the network is 'internal' or 'external'; we will discuss the difference between the tenant network and the internal network later.

2. Control network vs data network: used for what?

These are very similar to 'management' and 'tenant', but they focus on the 'purpose' of the network rather than its users. The control network carries control packets, while the data network carries data. Think about image traffic from the Glance node to a compute node, or a VM migration between two compute nodes: they are for management, but they are data traffic rather than control traffic.

Control traffic is a control command, e.g. 'send ImageA from the Glance node to the Compute1 node' sent by the controller to the Glance and Compute1 nodes, while the actual image data going from Glance to Compute1 is data traffic. Although both originate from the cloud manager for management purposes, the characteristics of the traffic are different.

3. Internal vs external: IP address range or VM network?

These are more general terms in common use, which is why they are often confusing. In general networking terms, an internal network uses a private IP address range (192.168.x.x, 10.x.x.x, 172.16.x.x, ...), whereas an external network has a public IP address range and is open to the public.

However, in cloud computing they can have a slightly different meaning. An internal network is a network for VMs that can only be accessed from inside the VM network, while an external network is one that is exposed outside the VM network. This gets especially confusing for a data center built inside an intranet: the intranet itself already uses private IP addresses, yet it is the 'external' network for the VMs in the data center.

For example, a company uses the 192.168.x.x range for its intranet, and all desktops and laptops use these addresses. An employee of the company (a tenant) creates a VM in the internal cloud. This VM should be reachable through the intranet IP range (192.168.x.x). Although this is the internal network for the company, it is at the same time the external network for the VMs.

4. Physical vs virtual network.

The physical network is for the physical nodes, e.g. communication between the controller and the other nodes, while a virtual network is for VMs. This is a high-level distinction: in reality, every virtual network runs on top of the underlying physical network. We separate them only to differentiate the network layers.

5. Physical layer vs network layer.

The most commonly confused concept in networking is the separation of layers. So far we have talked about networks at the "network layer", a.k.a. the "IP layer", not the physical or link layer. Note that all the aforementioned networks can be served over a single physical medium by differentiating IP ranges with proper routing settings on the hosts.


6. Other terms: API, provider, guest, storage, logical, flat, etc, etc, etc network...

Depending on the article, many other terms are used in various ways. Let's skip explaining each of them and instead look at an example to discuss the network architecture.


< Example network architecture >

Let's build a data center with several different networks. This data center is connected directly to both the Internet and an intranet, which means VMs can be accessed from both.

* Internet IP range: 123.4.x.x
* Intranet IP range: 172.16.x.x

These two networks are called "public", "provider", or "external" networks, meaning they are exposed outside the data center. A VM can acquire either or both kinds of IP address, letting everyone in the world (via the Internet) or within the company (via the intranet) connect to it.

* Provider physical network: External IP range only for VM (none for physical nodes)

This is not an IP network but an L2 network that connects the VMs' L3 to the external networks. VMs acquire an Internet or intranet IP address from a network node or an external DHCP server and use this physical network to communicate with the Internet or intranet.
Note that the physical interfaces on the compute nodes get no IP address on this network; they only provide an L2 path between the hosted VMs and the external network. Thus, tenants cannot access the physical nodes through this network, even though it is directly exposed to the outside. In Neutron, this is configured as a 'flat' provider network, as in the sketch below.
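
As a rough sketch of how this 'flat' provider setup maps to Neutron configuration (the physical network label 'provider' and the interface name eth2 are illustrative, matching the compute-node interfaces later in this article; your ML2 mechanism driver and agent may differ):

/etc/neutron/plugins/ml2/ml2_conf.ini (controller) :

[ml2_type_flat]
flat_networks = provider

/etc/neutron/plugins/ml2/linuxbridge_agent.ini (network and compute nodes) :

[linux_bridge]
physical_interface_mappings = provider:eth2

The admin then creates the external network on top of this mapping:

$ openstack network create --external --provider-network-type flat --provider-physical-network provider public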

* Management physical network (Control network): 192.168.100.x

Within the data center, all physical nodes (controller, network, compute, Glance, etc.) are connected to this network for system management. Only the data center manager can access this network.

* Virtual network using VxLAN: any IP range
This is a self-service network created and managed by a tenant. It is a virtual network using a tunneling protocol such as VxLAN, so the tenant can assign any IP range. Note that this network is reachable only from VMs in the same virtual network. The physical network carrying this traffic can either be combined with the physical management network or separated out as a tenant data network. An example of creating such a network follows below.
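
For example, a tenant could create a self-service network and its subnet with the commands below. The names and the 10.0.0.0/24 range are arbitrary; the VxLAN type is usually the default tenant network type, so it does not need to be specified explicitly:

$ openstack network create selfservice
$ openstack subnet create --network selfservice --subnet-range 10.0.0.0/24 selfservice-subnet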

* Tenant data network (optional): 192.168.200.x

For the virtual network traffic between VMs, we can set up a separate physical network dedicated to tenant data, distinct from the management network. If it is not physically separated, the inter-VM virtual network traffic shares the management network.

* Management data network (optional)

If you want to separate management data traffic from control command traffic, you can use yet another physical network for it. In most cases, including this example, such data traffic shares the physical management network (192.168.100.x). Sometimes it is configured to use the tenant data network (192.168.200.x) instead, for specific reasons such as VM migration or image data separation.

* Storage data network (optional)

For further separation, a storage data network can be physically separated to provide stable, high-performance object or volume storage. If it is not separated, storage traffic goes over one of the data networks or the physical management network.

In this example, we only add the tenant data network; the other optional networks share the physical networks described above.


< Physical interfaces for the example >

* Router/DHCP server: provides connectivity between Internet, intranet and the data center network. 

-eth0 - to outside Internet (public)
-eth1 - to outside intranet (public)
-eth2/3 - to Provider physical network for Internet (123.4.0.1) and intranet (172.16.0.1): GW for VMs
-eth4 - to Control network (192.168.100.1): NAT for the control network, and provides public access to the tenants' UI (Horizon) on the controller node.

In this configuration, eth2 and eth3 are both connected to the same physical network. The router needs more sophisticated firewall and routing rules for better security. Basically, it connects four different physical networks in this scenario: the two public networks, the provider network, and the hidden control network. Forwarding rules should allow incoming traffic from outside (eth0/1) to reach only the VMs via eth2/3 or the controller node via eth4, while outgoing traffic from eth2/3/4 can freely access the Internet.

Note that eth2/3 can be detached from the router if you want Neutron to control them. In that case, the external networks (123.4.x.x/172.16.x.x) are connected only to the network node running the Neutron L3 agent, and Neutron provides DHCP and routing for the VMs.

* Controller node

-eth0 - Control network (192.168.100.x)
-eth1 - Internet/intranet through router to provide UIs.

The controller receives VM creation requests and any other requests from tenants through Horizon, and sends commands to the other physical nodes by calling their APIs over the eth0 control network.


* Network node

-eth0 - Control network (192.168.100.x)
-eth1 - Tenant data network (192.168.200.x): manages virtual tunnel networks
-eth2 - Tenant provider network (No IP): manages the VMs' external ('public') connectivity

The network node is in charge of managing the virtual networks, e.g. providing DHCP for VxLAN networks, L3 routing between different VxLANs, and controlling 'public' access to VMs.


* Compute nodes

-eth0 - Control network (192.168.100.x)
-eth1 - Tenant data network (192.168.200.x): VM-to-VM intra-DCN traffic, e.g. VxLAN
-eth2 - Tenant provider network (No IP): VM's external connectivity to Internet or intranet

The Nova API on eth0 listens for commands from the controller. VMs hosted on the compute node use eth1 for intra-DCN traffic to other VMs, using the 'virtual network' IP range. These packets are encapsulated in VxLAN or another tunneling protocol and sent to the other compute node over the tenant data network. Note that 192.168.200.x is assigned only to the physical compute nodes, so that they can find the compute node hosting the destination VM; 192.168.200.x is not reachable by VMs. VMs can reach only other VMs, because every packet a VM sends through eth1 is encapsulated in the tunneling protocol.

On the other hand, eth2 is connected to the VMs directly, without any tunneling protocol, which makes it a so-called 'flat' network. The provider network reaches the VMs through this interface. When a VM needs access to the Internet, Neutron creates a virtual interface inside the VM, which can acquire a public IP address from the router or from Neutron. For a floating IP, the virtual interface is instead created on the Neutron network node, which forwards the traffic to the assigned VM through this network.

< Conclusion >

There are so many terms that can confuse people. In this article I tried to explain them as simply as possible, but they are still a bit confusing. The most important point, and the easiest to get wrong, is mixing up networks at different layers. Whenever you think about a network architecture, separate the concepts by network layer: do not mix the L3 (IP) layer with the underlying L2 or physical layer. On a single physical layer, multiple networks can operate at higher layers. Think of your desktop computer: it has one IP address (L3) and one MAC address (L2), yet it runs many applications using TCP/UDP port numbers. For the same reason, multiple IP networks can exist on the same Ethernet, and you can assign many IP addresses to one NIC on Windows or Linux. If you separate the layers and think through the features discussed above, what others say about these networks won't be so confusing.

For more information, look at this OpenStack document. It is a bit outdated, but provides much more detail than the recent documentation.
https://docs.openstack.org/liberty/networking-guide/deploy.html

Thursday, 26 October 2017

OpenStack monitoring - using Ceilometer and Gnocchi

OpenStack has its own monitoring tool - Ceilometer. It has to be installed separately because it's not shipped with OpenStack by default.

In recent versions of OpenStack, Ceilometer has been split into two projects, Ceilometer and Gnocchi. Ceilometer is in charge of polling the monitored metrics, while Gnocchi collects the data and delivers it to the user. OpenStack calls the combination of Ceilometer, Gnocchi, and other software modules the "Telemetry Service".

Because of this complex history, some articles and answers on the internet about how to use Ceilometer are outdated and do not apply to the current version.
In this article, we will look at how to install Ceilometer and Gnocchi on OpenStack Pike (the most recent version at the time of writing), with some examples.

1. Install Gnocchi and Ceilometer on Controller node
Gnocchi is in charge of collecting and storing the monitored data, and providing it to the user. Simply speaking, its role is the same as a database system that stores and retrieves data. In fact, Gnocchi uses a database system to store the data and/or the index of the data.

Follow the instructions below, with one caveat regarding Gnocchi:
https://docs.openstack.org/ceilometer/pike/install/install-controller.html#

As the document is outdated, it does not include the Gnocchi installation process. If you encounter any problem because of Gnocchi, install Gnocchi separately using its own documentation:
http://gnocchi.xyz/install.html#id1

Although Gnocchi started inside the Ceilometer project, it is now a separate project. Be aware of this, and whenever you encounter an issue with Gnocchi, look for the solution on the Gnocchi site, not in Ceilometer resources.

Gnocchi is composed of several components. gnocchi-metricd and gnocchi-statsd are background services that collect data from Ceilometer and other monitoring tools. If these services are not running properly, you can still use the Gnocchi client to retrieve the resource list, but the measures will be empty because no monitored data is being collected.

While metricd and statsd are in charge of data collection, a WSGI application running under Apache httpd provides the API for the Gnocchi client. This web application listens on port 8041 by default, which is also registered as the endpoint in OpenStack.

The Gnocchi client communicates with the Gnocchi API on port 8041 to retrieve the monitored data stored in the Gnocchi storage back-end.
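
Two quick sanity checks on the API side (assuming Gnocchi is registered in Keystone under the standard 'metric' service type):

$ openstack endpoint list --service metric
$ ss -tlnp | grep 8041     # the Gnocchi WSGI app under httpd should be listening here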

During the installation, you can choose how to store the Gnocchi data and how to index it. The default DevStack setting is to store the data as files and to use MySQL for the index.

If you want to monitor other services such as Neutron, Glance, etc., the above link also has instructions on how to configure monitoring for them.

2. Install Ceilometer on Compute nodes
Follow the installation guide provided by OpenStack:
https://docs.openstack.org/ceilometer/pike/install/install-compute-rdo.html

Note that the Ceilometer compute agent must be installed on every compute node; it is in charge of monitoring the compute service (CPU, memory, and so on for VM instances).

3. Check Gnocchi and Ceilometer and troubleshooting
Once the installation is done, you should be able to use the Gnocchi client to retrieve the monitored data.
Follow the verification instructions:
https://docs.openstack.org/ceilometer/pike/install/verify.html

If "gnocchi resource list" command does not work, there is a problem on Gnocchi API running on httpd.
One of the possible reason is related to Redis server, which is used by gnocchi API and other OpenStack services to communicate each other. Check port 6379 which supposed to be listening by Redis server.

If "gnocchi resource list" works but "gnocchi measures show ..." returns an empty result, Gnocchi is not collecting any data from Ceilometer. First of all, check gnocchi-statsd and gnocchi-metricd: if they are not running properly, Gnocchi cannot gather data. Also check the Ceilometer settings to make sure it is polling and reporting correctly. Some quick checks are shown below.
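
Rough checks for this situation (daemon and service names can differ per distribution; under DevStack these run as systemd units or inside the stack screen session):

# is Redis up and reachable?
$ redis-cli ping
$ ss -tlnp | grep 6379

# are the Gnocchi daemons and the Ceilometer agents alive?
$ pgrep -af gnocchi-metricd
$ pgrep -af gnocchi-statsd
$ pgrep -af ceilometer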

If the Gnocchi measures are still not updating correctly, it's good practice to upgrade the Gnocchi/Ceilometer database schemas using these commands:

$ gnocchi-upgrade
$ ceilometer-upgrade --skip-metering-database

4. Monitor hosts (hypervisors)
By default, Ceilometer monitors only the VM instances on a host. If you want to monitor the compute hosts themselves (e.g. for VM provisioning decisions), add the following lines to nova.conf on the compute nodes.

[DEFAULT]
compute_monitors = cpu.virt_driver,numa_mem_bw.virt_driver

After this configuration is applied, "gnocchi resource list" will show one more resource type called "nova_compute".
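
The new compute_monitors setting takes effect only after the compute service is restarted. The service name below is the RDO/CentOS one and will differ on other distributions (on DevStack it is the devstack@n-cpu unit):

$ sudo systemctl restart openstack-nova-compute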

If Gnocchi reports that there is no such resource, it's probably because your Ceilometer version is old: older Ceilometer did not create a nova_compute resource in Gnocchi. Check your Ceilometer log; there will be error messages like:

metric compute.node.cpu.iowait.percent is not handled by Gnocchi

If so, update your Ceilometer version, or fix it yourself by changing the Ceilometer source code and the /etc/ceilometer/gnocchi_resources.yaml file.

Refer to this commit message:
https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=5e430aeeff8e7c641e4b19ba71c59389770297ee


5. Sample commands and results

To retrieve all monitored VM instances (one resource ID corresponds to one VM instance):
$ gnocchi resource list -t instance -c id -c user_id -c flavor_name
+--------------------------------------+----------------------------------+-------------+
| id                                   | user_id                          | flavor_name |
+--------------------------------------+----------------------------------+-------------+
| 2e3aa7f0-4280-4d2a-93fb-59d6853e7801 | e78bd5c6d4434963a5a42924889109da | m1.nano     |
| a10ebdc8-c8bd-452c-958c-d811baaf0899 | e78bd5c6d4434963a5a42924889109da | m1.nano     |
| 08c6ea86-fe1f-4636-b59e-2b1414c978a0 | e78bd5c6d4434963a5a42924889109da | m1.nano     |
+--------------------------------------+----------------------------------+-------------+

To retrieve the CPU utilization of 3rd VM instance from above result:
$ gnocchi measures show cpu_util --resource-id 08c6ea86-fe1f-4636-b59e-2b1414c978a0
+---------------------------+-------------+----------------+
| timestamp                 | granularity |          value |
+---------------------------+-------------+----------------+
| 2017-10-25T14:55:00+00:00 |       300.0 | 0.177451422033 |
| 2017-10-25T15:00:00+00:00 |       300.0 |   0.1663312144 |
| 2017-10-25T15:05:00+00:00 |       300.0 |   0.1778018934 |
+---------------------------+-------------+----------------+

To retrieve the resource ID of compute hosts:
$ gnocchi resource list -t nova_compute -c id -c host_name
+--------------------------------------+------------------+
| id                                   | host_name        |
+--------------------------------------+------------------+
| 52978e00-6322-5498-9c9a-40fc5dca9571 | compute.devstack |
+--------------------------------------+------------------+

To retrieve the CPU utilization of the compute host:
$ gnocchi measures show compute.node.cpu.percent --resource-id 52978e00-6322-5498-9c9a-40fc5dca9571
+---------------------------+-------------+-------+
| timestamp                 | granularity | value |
+---------------------------+-------------+-------+
| 2017-10-25T15:10:00+00:00 |       300.0 |  83.0 |
| 2017-10-25T15:15:00+00:00 |       300.0 |  17.4 |
| 2017-10-25T15:20:00+00:00 |       300.0 |  14.8 |
| 2017-10-25T15:25:00+00:00 |       300.0 |  15.5 |
+---------------------------+-------------+-------+


6. Default configurations from DevStack

/etc/gnocchi/gnocchi.conf :

[metricd]
metric_processing_delay = 5

[storage]
file_basepath = /opt/stack/data/gnocchi/
driver = file
coordination_url = redis://localhost:6379

[statsd]
user_id = XXXX
project_id = XXXX
resource_id = XXXX

[keystone_authtoken]
memcached_servers = 192.168.50.111:11211
signing_dir = /var/cache/gnocchi
cafile = /opt/stack/data/ca-bundle.pem
project_domain_name = Default
project_name = service
user_domain_name = Default
password = XXXX
username = gnocchi
auth_url = http://192.168.50.111/identity
auth_type = password

[api]
auth_mode = keystone

[indexer]
url = mysql+pymysql://root:XXXX@127.0.0.1/gnocchi?charset=utf8


/etc/ceilometer/ceilometer.conf :

[DEFAULT]
transport_url = rabbit://stackrabbit:XXXX@192.168.50.111:5672/

[oslo_messaging_notifications]
topics = notifications

[coordination]
backend_url = redis://localhost:6379

[notification]
pipeline_processing_queues = 2
workers = 2
workload_partitioning = True

[cache]
backend_argument = url:redis://localhost:6379
backend_argument = distributed_lock:True
backend_argument = db:0
backend_argument = redis_expiration_time:600
backend = dogpile.cache.redis
enabled = True

[service_credentials]
auth_url = http://192.168.50.111/identity
region_name = RegionOne
password = XXXX
username = ceilometer
project_name = service
project_domain_id = default
user_domain_id = default
auth_type = password

[keystone_authtoken]
memcached_servers = 192.168.50.111:11211
signing_dir = /var/cache/ceilometer
cafile = /opt/stack/data/ca-bundle.pem
project_domain_name = Default
project_name = service
user_domain_name = Default
password = XXXX
username = ceilometer
auth_url = http://192.168.50.111/identity
auth_type = password

/etc/ceilometer/polling.yaml :

---
sources:
    - name: all_pollsters
      interval: 120
      meters:
        - "*"

Tuesday, 24 October 2017

Install DevStack on CentOS 7, without dependency errors

DevStack is a single-machine OpenStack deployment that can be used for any development work on OpenStack. However, installing DevStack is not an easy process, especially when you face a dependency error (as with any other dependency problem...). In this article, I describe how to install DevStack on CentOS 7 without any dependency errors.

1. Install CentOS 7.
Download the Minimal ISO of CentOS 7 and install it. If you are installing CentOS/DevStack in a virtual machine, my personally recommended settings are:
- CPU: 2 cores
- RAM: 4GB
- HDD: 10GB

2. Set up network
CentOS 7 uses NetworkManager by default, which means the network can be set up with the 'nmtui' tool. If you don't want to use NetworkManager, disable it and configure the network the traditional way by editing the /etc/sysconfig/network-scripts/ifcfg-* files.

3. Install git and add user 'stack'.
First, install git software using yum:

# sudo yum install -y git

Then, add a non-root user 'stack' to run DevStack; details can be found in the official document (a short sketch follows the link below):
https://docs.openstack.org/devstack/latest/#add-stack-user
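
In short, the commands from that document look roughly like this; DevStack expects the stack user's home directory to be /opt/stack with passwordless sudo:

# sudo useradd -s /bin/bash -d /opt/stack -m stack
# echo "stack ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/stack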

Switch to the user 'stack' using su command:

# sudo su - stack

4. Clone DevStack git repository
Clone DevStack git repository using the following command:

# git clone https://git.openstack.org/openstack-dev/devstack
# cd devstack

5. Change to the recent stable branch *IMPORTANT*
Use the following command to change to the 'most recent' stable branch.

# git checkout stable/pike

Find the latest branch here:

When the git repository is cloned, it is on the latest master branch of DevStack, which may include bugs that cause installation errors. I strongly suggest using the latest 'stable' branch instead of master; the master branch includes all the new features and functions, which can cause problems.

Also, do NOT use any OLD stable branch. Use only the most recent stable branch (stable/pike as of October 2017), because DevStack always pulls from the most recent yum package repositories during installation, regardless of the selected git branch.

6. Create local.conf
Create a local.conf file in devstack directory. The contents should be:

[[local|localrc]]
ADMIN_PASSWORD=secret
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
HOST_IP=127.0.0.1

"secret" and the local ip address should be changed. Also, add extra features if needed. For example, if ceilometer is necessary, add the following line in the local.conf file.

enable_plugin ceilometer https://git.openstack.org/openstack/ceilometer stable/pike

7. Snapshot the virtual machine (Optional)
If you're installing DevStack on a VM, this is the best time to snapshot it, because the './stack.sh' script heavily changes the configuration and installed packages of our clean-installed CentOS 7.

8. Start the installation script

# ./stack.sh

This command starts the installation of DevStack. The script uses python, pip, git, yum, and other tools, which means it changes packages and dependencies across the system based on the DevStack git repository and the recent OpenStack release.

If this script ends with any error, e.g. a dependency error, a version error, or a failure to install some module, restore the snapshot from Step 7 and go back to Step 5 to check which branch of the git repository is used. Make sure the branch is switched to the MOST RECENT OpenStack release.
