[emanicslab] systemd-journal: CPU 100%

Wed Jun 11 16:00:54 CEST 2014

Hello David,

As far as I know, the only thing that can happen without the
systemd-tmpfiles-clean.timer is that the disk can get full, since this
timer dictates when systemd-tmpfiles-clean.service is executed.
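
A minimal sketch of stopping the timer and running the cleanup once by
hand, so that temporary files do not pile up in the meantime. The helper
names (run, stop_timer_and_clean) and the DRY_RUN variable are made up
for this example, and checking /tmp and /var/tmp is only an assumption
about where the space would go:

```shell
# Hypothetical helper: print the command when DRY_RUN is set,
# otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Stop the timer, run the tmpfiles cleanup once manually, then show how
# full the temporary directories are.
stop_timer_and_clean() {
    run systemctl stop systemd-tmpfiles-clean.timer
    run systemd-tmpfiles --clean
    run df -h /tmp /var/tmp
}
```

Run it inside the slice with DRY_RUN=1 first to see what it would do.
Note that "systemctl stop" only lasts until the next boot of the slice,
so the timer will be active again after a restart.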

You can also experiment with the timer's unit file below, commenting
out some settings if desired:
/usr/lib/systemd/system/systemd-tmpfiles-clean.timer
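
Before changing anything, it may help to see which trigger settings are
currently active in that file. A small sketch, where timer_triggers is a
made-up helper name and the assumption is that the triggers are the
usual On...= lines of a [Timer] section:

```shell
# Hypothetical helper: print the trigger settings (On...= lines) of a
# timer unit file, skipping the ones that are commented out.
timer_triggers() {
    grep -E '^[[:space:]]*On[A-Za-z]+=' "$1"
}
```

For example:
timer_triggers /usr/lib/systemd/system/systemd-tmpfiles-clean.timer

Instead of editing the file under /usr/lib directly, a copy placed in
/etc/systemd/system takes precedence over it. After any change, run
"systemctl daemon-reload" so that systemd picks it up.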

Important: sometimes there are many slices on a node, but just ONE
slice is causing the 100% CPU. So, run "journalctl" inside each of the
SLICES to see what's the matter.
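
To make that check quicker, here is a rough sketch that summarizes a
slice's journal: how often a unit start was refused for repeating too
quickly, and how many messages the journal suppressed. The function name
summarize_journal is made up, and the patterns are simply taken from the
log lines in my earlier mail, so they may need adjusting:

```shell
# Hypothetical helper: read journal text on stdin and count the refused
# start requests and the total number of suppressed messages.
summarize_journal() {
    awk '
        /start request repeated too quickly/ { refused++ }
        /Suppressed [0-9]+ messages/ {
            # The number follows the word "Suppressed".
            for (i = 1; i < NF; i++)
                if ($i == "Suppressed") suppressed += $(i + 1)
        }
        END { printf "refused=%d suppressed=%d\n", refused, suppressed }
    '
}
```

Inside each slice, run:
journalctl --no-pager | summarize_journal

A slice that reports large numbers here is a good candidate for the one
causing the load.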

Let us know the outcome.

Cheers,
Guilherme

On Wed, Jun 11, 2014 at 3:52 PM, David Hausheer
<hausheer at ps.tu-darmstadt.de> wrote:
> Thanks Guilherme!
>
> Yes, the SLICE internal systemd seems to be the cause for the CPU 100%
> problem. That is as far as I got, too.
>
> I will try the work-around that you proposed, Guilherme, later today. Will
> this "systemctl stop systemd-tmpfiles-clean.timer" break anything important?
> Unfortunately, I am not a systemd expert.
>
> But any work-around for this problem will be more than helpful!
>
> Best regards
> David
>
>
> On 11.06.2014 15:38, Guilherme Sperb Machado wrote:
>>
>> Hello EmanicsLab, David, Thierry,
>>
>> I've set up a testbed environment for MyPLC, the same one that
>> EmanicsLab is running.
>>
>> Lately, I've seen that my nodes got 100% CPU after a while... and I
>> observed the following:
>>
>> - the node reached permanent 100% of CPU *only* when there's an LXC
>> slice running on the node
>> - the node did not present ANY abnormal log messages in the journal log
>> - the node never got 100% of CPU when running without any LXC slice
>> being instantiated
>>
>> Therefore, I started to investigate a bit further and found out the
>> following:
>>
>> - actually it's the SLICE internal systemd that causes the physical
>> CPU to run at 100%
>>
>> Then, investigating a bit more, I found many log entries (with the
>> "journalctl") within slices that were complaining about this:
>>
>> Jun 11 12:53:57 myplc-node1-vm.mgmt.local systemd[1]: Starting Cleanup
>> of Temporary Directories...
>> Jun 11 12:53:58 myplc-node1-vm.mgmt.local systemd-journal[11]:
>> Suppressed 13476 messages from /system
>> Jun 11 12:53:58 myplc-node1-vm.mgmt.local systemd[1]:
>> systemd-tmpfiles-clean.service start request repeated too quickly,
>> refusing to start.
>> Jun 11 12:53:58 myplc-node1-vm.mgmt.local systemd-journal[11]:
>> Suppressed 12326 messages from /system
>> Jun 11 12:53:58 myplc-node1-vm.mgmt.local systemd[1]: Failed to start
>> Cleanup of Temporary Directories.
>>
>> Soooo, I did the following (within the slice):
>>
>> el_ipv6test1 at myplc-node1-vm.mgmt.local# systemctl stop
>> systemd-tmpfiles-clean.timer
>>
>> Then, the CPU utilization of the MyPLC node got back to normal.
>>
>> David, Thierry, can you try this and let me know if the timer of the
>> tmpfiles clean process MIGHT BE the cause of the systemd-journal CPU
>> at 100%? Maybe another process INSIDE one or more slices can be the
>> cause of the CPU at 100%. :-)
>>
>> If so, I'm more than happy to help on the matter.
>>
>> Cheers,
>> Guilherme
>>
>
> --
> Prof. Dr. David Hausheer
>
> Technische Universitaet Darmstadt
> Dept. of Electrical Engineering & Information Technology
>
> Rundeturmstr. 10, Building S3/20, Room 225
> 64283 Darmstadt, Germany
> Phone: +49 6151 16 4280
> Fax: +49 6151 16 6152
> E-Mail: hausheer at ps.tu-darmstadt.de
> Web: http://www.ps.tu-darmstadt.de/

-- 
----------------------------------------------------
MSc. Guilherme Sperb Machado
Researcher & PhD. candidate at UZH (Universität Zürich)
http://www.csg.uzh.ch/staff/machado/
----------------------------------------------------