Limit process resources with cgroups in Linux
- Initial analysis
- How Control Groups Are Organized
- Limiting apache process to 80% CPU usage
- Implementation in Puppet configuration management
- Benchmarking
- cgroups Examples
- Systemd example
- References
Sharing here an older article from 2017 that I wrote to limit the CPU usage of the Apache httpd web server workload, so that it could not reach 100% and render the VM completely unreachable and unusable. This was before I learned that I could do this with containers, which have cgroups as part of their construct. The article focuses on the httpd service on a CentOS 6 server, but it can be extended to any kind of init. A systemd example is included at the bottom.
Article follows below.
Initial analysis
There are many ways to limit and throttle CPU usage.
- nice and chrt are used to set scheduling priorities for processes. They lower a process's priority, they do not throttle it: the processes will still use as much of the CPU as the scheduler is willing to give them. An idle process (chrt -i 0) can still consume 100% CPU if there are no other processes requesting CPU time. Renicing the apache processes would also require a new script to measure the CPU usage of every Apache process and renice it accordingly. This could affect customer experience, so this solution was skipped.
- cpulimit will throttle the CPU usage of a process, but it cannot aggregate the CPU utilization of its child processes, so Apache as a whole can still drive all CPU cores to full usage anyway.
- cgroups are designed to limit and/or audit system resources. cgroups have the power to limit but also to throttle CPU usage (as well as other resources) for a process from the second it is launched. Control Groups provide a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behavior.
CentOS 6 provides this new kernel feature: control groups, called by their shorter name cgroups. cgroups allow you to allocate resources - such as CPU time, system memory, network bandwidth, or combinations of these resources - among user-defined groups of tasks (processes) running on a system. You can monitor the cgroups you configure, deny cgroups access to certain resources, and even reconfigure your cgroups dynamically on a running system. The cgconfig (control group config) service can be configured to start up at boot time and re-establish your predefined cgroups, thus making them persistent across reboots.
By using cgroups, system administrators gain fine-grained control over allocating, prioritizing, denying, managing, and monitoring system resources [1]. Hardware resources can be appropriately divided up among tasks and users, increasing overall efficiency.
How Control Groups Are Organized
cgroups are organized hierarchically, like processes, and child cgroups inherit some of the attributes of their parents. However, there are differences between the two models.
The Linux Process Model
All processes on a Linux system are child processes of a common parent: the init process, which is executed by the kernel at boot time and starts other processes (which may in turn start child processes of their own). Because all processes descend from a single parent, the Linux process model is a single hierarchy, or tree.
Additionally, every Linux process except init inherits the environment (such as the PATH variable) and certain other attributes (such as open file descriptors) of its parent process.
The Cgroup Model
cgroups are similar to processes in that:
- they are hierarchical, and
- child cgroups inherit certain attributes from their parent cgroup.
The fundamental difference is that many different hierarchies of cgroups can exist simultaneously on a system. If the Linux process model is a single tree of processes, then the cgroup model is one or more separate, unconnected trees of tasks (i.e. processes).
Multiple separate hierarchies of cgroups are necessary because each hierarchy is attached to one or more subsystems. A subsystem represents a single resource, such as CPU time or memory. The Linux kernel provides ten cgroup subsystems, listed below by name and function. A hierarchy tree for RedHat or CentOS 6 is shown below:
[root@centos6-local ~]# tree /cgroup
/cgroup
├── blkio
│ ├── blkio.io_merged
│ ├── blkio.io_merged_recursive
│ ├── ...
│ ├── release_agent
│ └── tasks
├── cpu
│ ├── cgroup.clone_children
│ ├── ...
│ ├── cpu.stat
│ ├── limitcpu10
│ │ ├── cgroup.clone_children
│ │ ...
│ │ └── tasks
│ ├── notify_on_release
│ ├── release_agent
│ └── tasks
├── cpuacct
│ ├── cgroup.clone_children
│ ├── cgroup.procs
│ ├── ...
│ └── tasks
├── cpuset
│ ├── cgroup.clone_children
│ ├── cgroup.procs
│ ├── ...
│ ├── release_agent
│ └── tasks
├── devices
│ ├── cgroup.clone_children
│ ├── cgroup.procs
│ ├── ...
│ └── tasks
├── freezer
│ ├── cgroup.clone_children
│ ├── cgroup.procs
│ ├── ...
│ └── tasks
├── memory
│ ├── cgroup.clone_children
│ ├── cgroup.event_control
│ ├── ...
│ ├── release_agent
│ └── tasks
└── net_cls
├── cgroup.clone_children
├── ...
├── release_agent
└── tasks
Available Subsystems in the Linux kernel
- blkio — this subsystem sets limits on input/output access to and from block devices such as physical drives (disk, solid state, or USB).
- cpu — this subsystem uses the scheduler to provide cgroup tasks access to the CPU.
- cpuacct — this subsystem generates automatic reports on CPU resources used by tasks in a cgroup.
- cpuset — this subsystem assigns individual CPUs (on a multicore system) and memory nodes to tasks in a cgroup.
- devices — this subsystem allows or denies access to devices by tasks in a cgroup.
- freezer — this subsystem suspends or resumes tasks in a cgroup.
- memory — this subsystem sets limits on memory use by tasks in a cgroup and generates automatic reports on memory resources used by those tasks.
- net_cls — this subsystem tags network packets with a class identifier (classid) that allows the Linux traffic controller (tc) to identify packets originating from a particular cgroup task.
- net_prio — this subsystem provides a way to dynamically set the priority of network traffic per network interface.
- ns — the namespace subsystem.
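On a running system, the available subsystems and whether they are enabled can be read from /proc/cgroups. A small sketch of how to extract the enabled subsystem names; the here-doc holds sample file content so the parsing is reproducible, but on a live box you would point awk at /proc/cgroups directly:

```shell
# List enabled cgroup subsystems. /proc/cgroups has four columns:
# subsys_name, hierarchy, num_cgroups, enabled. We skip the header row
# and print column 1 wherever column 4 (enabled) is 1.
# Live system: awk 'NR > 1 && $4 == 1 { print $1 }' /proc/cgroups
awk 'NR > 1 && $4 == 1 { print $1 }' <<'EOF'
#subsys_name hierarchy num_cgroups enabled
cpuset 1 1 1
cpu 2 2 1
cpuacct 3 1 1
memory 4 1 1
ns 0 1 0
EOF
```

With the sample data above this prints cpuset, cpu, cpuacct, and memory, one per line, and skips the disabled ns subsystem.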
CFS Tunable Parameters
Completely Fair Scheduler (CFS) — a proportional share scheduler which divides the CPU time (CPU bandwidth) proportionately between groups of tasks (cgroups) depending on the priority/weight of the task or shares assigned to cgroups.
In CFS, a cgroup can get more than its share of CPU if there are enough idle CPU cycles available in the system, due to the work conserving nature of the scheduler. This is usually the case for cgroups that consume CPU time based on relative shares. Ceiling enforcement can be used for cases when a hard limit on the amount of CPU that a cgroup can utilize is required (that is, tasks cannot use more than a set amount of CPU time).
The following options can be used to configure ceiling enforcement or relative sharing of CPU:
Ceiling Enforcement Tunable Parameters
cpu.cfs_period_us specifies a period of time in microseconds (“µs” is represented here as “us”) for how regularly a cgroup’s access to CPU resources should be reallocated. The upper limit of the cpu.cfs_period_us parameter is 1 second and the lower limit is 1000 microseconds.
cpu.cfs_quota_us specifies the total amount of time in microseconds for which all tasks in a cgroup can run during one period (as defined by cpu.cfs_period_us). As soon as tasks in a cgroup use up all the time specified by the quota, they are throttled for the remainder of the time specified by the period and not allowed to run until the next period.
If tasks in a cgroup should be able to access a single CPU for 0.2 seconds out of every 1 second, set cpu.cfs_quota_us to 200000 and cpu.cfs_period_us to 1000000. Note that the quota and period parameters operate on a CPU basis: to allow a process to fully utilize two CPUs, for example, set cpu.cfs_quota_us to 200000 and cpu.cfs_period_us to 100000.
Setting the value in cpu.cfs_quota_us to -1 means that the cgroup does not adhere to any CPU time restrictions. This is also the default value for every cgroup (except the root cgroup).
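The quota arithmetic above can be captured in a tiny helper. This is my own sketch, not from the original article: given a CPU count, a target percentage, and a chosen period, it derives the matching cpu.cfs_quota_us value (integer math, so pick a period divisible by 100):

```shell
# cfs_quota: compute cpu.cfs_quota_us so that tasks are capped at PCT
# percent of CPUS processors for a given cpu.cfs_period_us.
# quota = cpus * (pct / 100) * period
cfs_quota() {
  local cpus=$1 pct=$2 period=$3
  echo $(( cpus * pct * period / 100 ))
}

cfs_quota 4 80 100000   # 80% of 4 CPUs with a 100 ms period -> 320000
cfs_quota 4 80 10000    # the same limit with a 10 ms period -> 32000
```

The second call reproduces the 32000/10000 pair used for the limitcpu80 group later in this article.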
cpu.stat reports CPU time statistics using the following values:
- nr_periods — number of period intervals (as specified in cpu.cfs_period_us) that have elapsed.
- nr_throttled — number of times tasks in a cgroup have been throttled (that is, not allowed to run because they have exhausted all of the available time as specified by their quota).
- throttled_time — the total time duration (in nanoseconds) for which tasks in a cgroup have been throttled.
Limiting apache process to 80% CPU usage
Limiting apache and its child processes can be implemented by putting all child processes into a custom control group and capping the total resources available to it.
Let’s test with a VM with 4 vCPUs. Initially we need a control group with options that limit the total CPU utilization to 80%. This is just a placeholder control group; we’re not using it yet. 80% of (4 * 100%) CPU = 320, which is the maximum CPU usage we should expect under an 80% limitation on a 4 CPU system: 320/400 = 0.8.
The /etc/cgconfig.d/ directory is reserved for storing configuration files for specific applications and use cases. These files should be created with the .conf suffix and adhere to the same syntax rules as /etc/cgconfig.conf. Translating this into a group, the config will be as follows, located in /etc/cgconfig.d/limitcpu80.conf:
[root@centos6-server ~]# cat /etc/cgconfig.d/limitcpu80.conf
#limit to 80% cpu
group limitcpu80 {
    cpu {
        # Limit to 80% of 4 CPUs: quota/period = 32000/10000 = 3.2 CPUs.
        cpu.cfs_quota_us = "32000";
        cpu.cfs_period_us = "10000";
    }
}
To make sure we are limiting the apache service, we will need to start it in a cgroup. Services that can be started in cgroups must
- use a /etc/sysconfig/servicename file, or
- use the daemon() function from /etc/init.d/functions to start the service.
To make an eligible service start in a cgroup, we edit its file in the /etc/sysconfig directory to include an entry in the form CGROUP_DAEMON="subsystem:control_group" — in our case CGROUP_DAEMON="cpu:limitcpu80", where the subsystem ("cpu") is associated with a particular hierarchy and the control_group ("limitcpu80") is our cgroup in that hierarchy.
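Once the service has been restarted, it is worth verifying that its processes actually landed in the control group. A hedged sketch (the helper name is mine): on the CentOS 6 setup above the tasks file would be /cgroup/cpu/limitcpu80/tasks, but it is passed in as a parameter so the check is reusable.

```shell
# pids_in_cgroup: succeed only if every given PID is listed in the
# cgroup's tasks file (one PID per line). The tasks file lists every
# task attached to the cgroup.
pids_in_cgroup() {
  local tasks_file=$1; shift
  local pid
  for pid in "$@"; do
    grep -qx "$pid" "$tasks_file" || return 1
  done
}

# On a live system, something like:
#   pids_in_cgroup /cgroup/cpu/limitcpu80/tasks $(pidof httpd) && echo confined
```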
Implementation in Puppet configuration management
In your local puppet/http
module you can include the following two files.
template: limitcpu80.conf.erb
#limit to 80% cpu
group limitcpu80 {
    cpu {
        # Limit to 80% CPU usage.
        cpu.cfs_quota_us = "<%= @cfs_quota_us %>";
        cpu.cfs_period_us = "<%= @cfs_period_us %>";
    }
}
Since the number of CPUs in a VM can vary, we need to implement a formula for the cfs_quota_us variable. Initially we define cpu_factor, which gives us 80% of all CPUs in a given system, expressed against the 10000 µs period.
$cpu_factor = $facts['processors']['count']*0.8*10000
The actual class now will look like this
class manifest: cgrouplimitcpu80.pp
class httpd::cgrouplimitcpu80 (
  Boolean $cgroups_limit_enabled = $httpd::cgroups_limit_enabled,
  $cpu_factor                    = $facts['processors']['count'] * 0.8 * 10000,
  $cfs_quota_us                  = inline_template('<%= @cpu_factor.to_i %>'),
  $cfs_period_us                 = 10000,
) {
  if $cgroups_limit_enabled {
    package { 'libcgroup':
      ensure => installed,
    }
    service { 'cgconfig':
      ensure    => running,
      subscribe => Package['libcgroup'],
      enable    => true,
    }
    service { 'cgred':
      ensure    => running,
      subscribe => [ Package['libcgroup'], Service['cgconfig'], ],
      enable    => true,
    }
    file { '/etc/cgconfig.d/limitcpu80.conf':
      ensure  => file,
      content => template('httpd/limitcpu80.conf.erb'),
    }
    Service['cgred'] ~> Service['httpd']
  } else {
    # Disabled by default
    service { 'cgconfig':
      ensure => stopped,
      enable => false,
    }
    service { 'cgred':
      ensure => stopped,
      enable => false,
    }
    Service['cgred'] ~> Service['httpd']
  }
}
Enabling this cgroup is now done in the /etc/sysconfig/httpd file. The definition of /etc/sysconfig/httpd can be stored in your templates folder.
template/sysconfig/httpd.erb
...
<% if @cgroups_limit_enabled -%>
# Starting a Service in a Control Group
# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/starting_a_process
<% unless @cgroup_daemon.nil? -%>
CGROUP_DAEMON="<%= @cgroup_daemon %>"
<% else -%>
#CGROUP_DAEMON=
<% end -%>
<% end -%>
Now we can update files in hiera by including the new class in the desired profile with
class { '::httpd::cgrouplimitcpu80': }
Then, to enable cgroups for apache, you should specify the feature in the respective hierarchy file, e.g.:
httpd::cgroups_limit_enabled: true
Benchmarking
Two sets of benchmarking were performed with ab to test the cgroups implementation:
- Full throttle
- Normal average traffic
Monitoring of the httpd process was performed by the following command
[root@centos6-server ~]# while true ; do top -bn1 | awk '$12=="httpd" {s+=$9;} END {print s "% CPU";}' | column -t; done
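The awk one-liner above sums the %CPU column (field 9) of every top row whose COMMAND (field 12) is httpd. A reproducible sketch of the same aggregation, run against canned top output instead of a live top -bn1:

```shell
# Sum the %CPU column (field 9) for rows whose COMMAND (field 12) is
# "httpd", exactly as the monitoring loop does against live top output.
# The here-doc is sample data so the result is reproducible.
awk '$12 == "httpd" { s += $9 } END { print s "% CPU" }' <<'EOF'
 2201 apache  20  0  310m  12m 4100 S 45.0 0.3 0:10.11 httpd
 2202 apache  20  0  310m  12m 4100 S 30.5 0.3 0:08.02 httpd
 2203 root    20  0  150m   8m 3000 S  2.0 0.2 1:00.00 sshd
EOF
```

For the sample rows above this prints 75.5% CPU: the two httpd rows are summed and the sshd row is ignored.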
Full throttle testing
Apache benchmark was issued with the following parameters to allow keepalives with 300 concurrent connections for 10,000,000 queries. The test was performed between two servers, one client and one running httpd, over a 10 Gbps connection and over a longer period of time, simulating a real-case scenario.
[root@centos6-local ~]# ab -n 10000000 -c 300 -k -H "Host: my.host.name" http://my.ip.add.ress/
Test results
Apache benchmark results with and without cgroups. CPU usage is already lower with cgroups:
without cgroups
- load max: 247.51, 216.04, 160.03
- max httpd CPU: 381% CPU
with cgroups
- load max: 249.88, 221.50, 137.20
- max httpd CPU: 303% CPU
It was also noticed that during benchmarks with ab, a high CPU time was used by the cgrules daemon itself, thereby contributing to a slightly higher total overall load:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3893 root 20 0 19160 6076 1484 R 97.0 0.0 1:07.47 cgrulesengd
Normal Average testing
From a production Load Balancer I was able to get the average concurrent connections for a web server: ~75. Then I ran the benchmark again for ~2 hrs.
[root@centos6-server ~]# ab -n 10000000 -c 75 -H "Host: my.host.name" http://my.ip.add.ress/
The load is pretty stable and at the same level as comparable servers:
[root@centos6-server ~]# uptime
13:17:57 up 1 day, 1:26, 1 user, load average: 2.77, 2.62, 2.33
The cgroups daemon “cgrulesengd” now reports ~11% CPU usage:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7 root 20 0 19000 6000 1520 S 11.3 0.0 14:35.86 cgrulesengd
To find out how often the cgroup is throttling the CPU, we can check the cpu.stat file:
[root@centos6-server ~]# cat /cgroup/cpu/limitcpu80/cpu.stat
nr_periods 84017
nr_throttled 12198
throttled_time 65691839723
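From these counters the fraction of periods in which the group was actually throttled can be derived: nr_throttled / nr_periods, here 12198 of 84017 periods, roughly 14.5%. A small sketch of the computation; the stat values are pasted in as a here-doc so the result is reproducible, but on the live system you would pass /cgroup/cpu/limitcpu80/cpu.stat to awk instead:

```shell
# Percentage of CFS periods in which tasks in the cgroup were throttled:
# nr_throttled / nr_periods * 100, taken from cpu.stat.
awk '/nr_periods/ {p=$2} /nr_throttled/ {t=$2}
     END { printf "%.1f%% of periods throttled\n", t * 100 / p }' <<'EOF'
nr_periods 84017
nr_throttled 12198
throttled_time 65691839723
EOF
```

With the values above this prints 14.5% of periods throttled.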
cgroups Examples
Example CPU limits usage
The following examples assume you have an existing hierarchy of cgroups configured and the cpu subsystem mounted on your system.
To allow one cgroup to use 25% of a single CPU and a different cgroup to use 75% of that same CPU, use the following commands:
[root@centos6-server ~]# echo 250 > /cgroup/cpu/blue/cpu.shares
[root@centos6-server ~]# echo 750 > /cgroup/cpu/red/cpu.shares
To limit a cgroup to fully utilize a single CPU, use the following commands:
[root@centos6-server ~]# echo 10000 > /cgroup/cpu/red/cpu.cfs_quota_us
[root@centos6-server ~]# echo 10000 > /cgroup/cpu/red/cpu.cfs_period_us
To limit a cgroup to utilize 10% of a single CPU, use the following commands:
[root@centos6-server ~]# echo 10000 > /cgroup/cpu/red/cpu.cfs_quota_us
[root@centos6-server ~]# echo 100000 > /cgroup/cpu/red/cpu.cfs_period_us
On a multi-core system, to allow a cgroup to fully utilize two CPU cores, use the following commands:
[root@centos6-server ~]# echo 200000 > /cgroup/cpu/red/cpu.cfs_quota_us
[root@centos6-server ~]# echo 100000 > /cgroup/cpu/red/cpu.cfs_period_us
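The echo pairs above can be wrapped in one helper so the quota and period are always written together. A sketch of mine, with the cgroup directory passed in as a parameter (on the CentOS 6 layout above it would be e.g. /cgroup/cpu/red, which must already exist):

```shell
# set_cpu_ceiling: write a matching quota/period pair into a cgroup
# directory. E.g. set_cpu_ceiling /cgroup/cpu/red 200000 100000 lets
# the "red" group use two full CPUs.
set_cpu_ceiling() {
  local cgdir=$1 quota=$2 period=$3
  echo "$quota"  > "$cgdir/cpu.cfs_quota_us"
  echo "$period" > "$cgdir/cpu.cfs_period_us"
}
```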
cgroups considerations
With regard to the cgroups implementation, we should take into consideration that apache will now use a maximum of 80% of the CPU, so the relevant monitoring in place should be adapted to catch when apache reaches this new limit.
Systemd example
If we had to implement the same limitation with systemd init then instead we would need to create a drop-in systemd file with the cgroups
configs. The implementation is with cgroups2
so the syntax is a bit different. We do not need to calculate anymore the ration of cpu.cfs_quota_us
/cpu.cfs_period_us
. An implementation example on a CentOS 7 system could look something like this:
[root@centos7-server ~]# cat /etc/systemd/system/multi-user.target.wants/httpd.service.d/limitcpu80.conf
[Service]
CPUAccounting=True
CPUQuota=80%
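Creating such a drop-in can be scripted; a sketch of mine with the target directory parameterized so it can be tried safely. On a real host the directory would be the service's drop-in directory (conventionally /etc/systemd/system/httpd.service.d), followed by systemctl daemon-reload and a restart of httpd; systemctl set-property httpd.service CPUQuota=80% is another way to apply the same setting.

```shell
# write_cpu_dropin: create a systemd drop-in capping a service's CPU
# time via CPUQuota. The directory is passed in so the sketch can be
# tested against a scratch path instead of the live systemd tree.
write_cpu_dropin() {
  local dropin_dir=$1 quota=$2
  mkdir -p "$dropin_dir"
  cat > "$dropin_dir/limitcpu80.conf" <<EOF
[Service]
CPUAccounting=true
CPUQuota=${quota}
EOF
}

# Live usage (assumption, adjust paths to your system):
#   write_cpu_dropin /etc/systemd/system/httpd.service.d 80%
#   systemctl daemon-reload && systemctl restart httpd
```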
References
- [1] Introduction to Control Groups: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/ch01#sec-How_Control_Groups_Are_Organized
- https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-relationships_between_subsystems_hierarchies_control_groups_and_tasks
- https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpuacct
- http://manpages.ubuntu.com/manpages/yakkety/man5/cgconfig.conf.5.html
- http://kennystechtalk.blogspot.ca/2015/04/throttling-cpu-usage-with-linux-cgroups.html