Summary of the Fedora 11 - 20 Seconds Boot Feature - Test Day
This is an analysis of the bootcharts generated on the 20 Seconds Boot Feature - Test Day.
On the 20 Seconds Boot Feature Test Day a lot of helpful people generated bootcharts from the latest rawhide build in various install configurations like "Desktop", "Server" and "Minimal".
To extract the information from the bootchart logs and analyze them, I wrote a quick and dirty Python script.
The duration of the network initscript varies because of external dependencies, such as network card negotiation with the switch or getting the IP address from the DHCP server (sometimes more than 50 seconds). Because of this, I have subtracted its duration from the resulting time.
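The subtraction can be sketched as follows. This is not the script used for the analysis: the function name and the input format (a list of (name, start, end) tuples with times in seconds) are assumptions made for illustration.

```python
def boot_time_without_network(services, total_boot_time):
    """Return the boot time with the network initscript's duration removed.

    services: iterable of (name, start, end) tuples, times in seconds.
    This input format is hypothetical, not the bootchart log format.
    """
    network_duration = sum(
        end - start for name, start, end in services if name == "network"
    )
    return total_boot_time - network_duration


# Made-up example: a 62 s boot in which the network initscript took 50 s.
example = [("udev", 3.0, 9.0), ("network", 10.0, 60.0), ("gdm", 60.0, 62.0)]
print(boot_time_without_network(example, 62.0))
```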
Resulting Tables with png bootcharts:
- QA:Testcase_bootchart_personal
- QA:Testcase_bootchart_desktop (with full desktop)
- QA:Testcase_bootchart_full_desktop
- QA:Testcase_bootchart_server
- QA:Testcase_bootchart_minimal
As the results show, disk I/O speed and CPU speed are, of course, the major factors that determine the boot speed.
Other I/O factors include:
- disk fragmentation
- type of filesystem used
But that is not what we are looking for. We want to eliminate unnecessary bottlenecks, where neither CPU nor I/O is the limiting factor, or where unexpected I/O or CPU activity is happening.
For example, as we can see in https://fedoraproject.org/w/uploads/3/3b/Bootchart-a02fad60-dcf8-48d7-9de0-4d67a050313d-desktop.png, restorecon is called on /dev in rc.sysinit, nash-hotplug is running, and the CPU is at 100% for 4 seconds with practically no disk I/O.
Fortunately, this is already fixed in the latest mkinitrd by http://git.fedorahosted.org/git/?p=mkinitrd;a=commitdiff;h=d539443a6ebe8868a73daffa88adfc408838bc21;hp=9253a1b55376b491bf612f00ec1d3edc66502a3f
This bootchart is especially useful because the whole boot process takes so long, since the disk and CPU are so slow.
What we also see, in comparison to other distributions, is the long time spent before rc.sysinit is even started. First there is the kernel loading and initialization, and second there is nash, with a whole udev implementation and plymouth being started. While nash is nice to have, because a lot of variants of root devices are supported, efforts for a unified cross-distribution initrd are underway. We will see whether that reduces or increases boot time in the future.
For example, on my Dell the time until rc.sysinit is started varies between 5 and 10 seconds. This fluctuation has to be taken into account when comparing bootcharts (even from the same distribution and machine).
The next thing started after nash is rc.sysinit. Bugs have been filed in Bugzilla against some applications that do not clean up their state directories, which results in a long "find and remove" action in rc.sysinit (var/run/gdm/auth-for* directories not removed.... slows boot).
The big thing started in rc.sysinit is, of course, start_udev. start_udev first removes all files in /dev, creates all basic nodes, runs restorecon on them and begins replaying all kernel hotplug events. This triggers module loading and the creation of all device nodes for the hardware found on the system. Module loading takes most of the I/O and CPU here. If you consider speeding up modprobe, keep in mind that even loading all modules by hand via insmod takes nearly as much time as using modprobe with dependency resolution and database reading. On my systems the difference between using insmod and modprobe was not noticeable at all. So I would conclude that most of the time is spent reading the module files from disk, and most of the CPU is spent in the kernel on module and hardware initialization.
Of course, some things make your system noticeably slower in this part:
- use of /etc/udev/makedev.d/. If there are files in /etc/udev/makedev.d, MAKEDEV is called, which in turn reads a lot of files from /etc/makedev.d, which takes a lot of time on some systems.
- too many scripts in /lib/udev/rules.d or /etc/udev/rules.d. On some systems you do not need all rule files. Some rules are executed on every device added, just to catch a corner case, which might never happen on your system.
- some 3rd party rules are very badly designed and implement a kind of busy polling. Watch out for them and report them to the package owner and/or on bugzilla.
In the example bootchart we can also immediately see the effect of some components, which are slow or can be started on demand:
- rpc services (disturbing hacks to start nfs-utils services on demand (not entirely serious))
- cupsd ((RFC, PATCH) start cups on demand, using xinetd)
- fedora-setup-keyboard, which starts the whole Python interpreter, with a lot of I/O happening for that. (https://bugzilla.redhat.com/show_bug.cgi?id=483817)
- dellWirelessCtl has a lot of I/O here.
- HAL is growing and growing and takes more and more time to start up.
- NetworkManager starts wpa_supplicant even for static network interfaces that require neither authentication nor DHCP, so the wpa_supplicant service is never used. (https://bugzilla.redhat.com/show_bug.cgi?id=482823)
- Xorg spends a significant amount of time with no I/O and full CPU
- Bluetooth and IRDA being started (start the bluetooth service via udev)
- cpuspeed with no I/O and CPU
- microcode has a sleep (microcode_ctl busy polling; microcode_ctl ships a pointless init script)
An Ideal Boot Setup
In an ideal boot setup, CPU and I/O are maxed out during the whole boot process.
Ways to achieve this:
- readahead all files used to fill the memory disk cache in parallel to the normal boot process
- avoid sleeps
- run services which do not depend on each other in parallel
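The readahead idea from the list above can be sketched in a few lines. This is only an illustration of the technique, not how the readahead package is implemented: the function names are made up, and the sketch assumes a platform where Python offers os.posix_fadvise (it falls back to plain reads otherwise).

```python
import os
import threading


def readahead(paths):
    """Ask the kernel to pull each file into the page cache; return files handled."""
    handled = 0
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # file is missing or unreadable; skip it
        try:
            try:
                # A length of 0 means "from the offset to the end of the file".
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            except (AttributeError, OSError):
                # Fallback: reading the file also populates the cache.
                while os.read(fd, 1 << 20):
                    pass
            handled += 1
        finally:
            os.close(fd)
    return handled


def readahead_in_background(paths):
    """Run readahead in a thread so the caller can continue in parallel."""
    worker = threading.Thread(target=readahead, args=(list(paths),), daemon=True)
    worker.start()
    return worker
```

The list of paths would have to come from a recording of an earlier boot; how that list is collected is outside the scope of this sketch.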
An ideal init process would take the primary goal (in the desktop case, the login screen), start everything that is needed for it with first priority, and start the rest in parallel or delayed with low (I/O and CPU) priority.
What is needed to achieve that:
- Dependencies! This is why I once started the LSB Header bugzilla tracker. https://bugzilla.redhat.com/show_bug.cgi?id=246824
- An init process, which knows about the dependencies and the primary goal and sets up the I/O and CPU priorities.
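To illustrate what such an init process could do with dependency information, here is a minimal sketch. It is not an existing init implementation: the service names and the dependency table are invented, and a real init would read the dependencies from the LSB headers of the initscripts.

```python
def needed_for(goal, dependencies):
    """Return the goal plus everything it requires, directly or indirectly."""
    needed, stack = set(), [goal]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(dependencies.get(name, ()))
    return needed


def start_waves(dependencies):
    """Group services into waves; all services in one wave can start in parallel.

    dependencies maps a service name to the set of services it requires.
    """
    remaining = {name: set(reqs) for name, reqs in dependencies.items()}
    # Services mentioned only as requirements have no requirements of their own.
    for reqs in dependencies.values():
        for req in reqs:
            remaining.setdefault(req, set())
    waves, started = [], set()
    while remaining:
        ready = sorted(n for n, reqs in remaining.items() if reqs <= started)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        started.update(ready)
        for name in ready:
            del remaining[name]
    return waves


# Hypothetical dependency table, for illustration only.
deps = {
    "gdm": {"dbus", "udev"},
    "dbus": {"udev"},
    "cups": {"network"},
    "network": {"udev"},
    "udev": set(),
}
print(needed_for("gdm", deps))
print(start_waves(deps))
```

With this table, only udev and dbus are on the path to the login screen. network and cups could be started in parallel or later with low I/O and CPU priority.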
What you can do now to speed up your boot process
- install and run prelink
- install and run readahead / preload
- maybe defragment a heavily fragmented filesystem?
What to keep in mind as a developer
- try to start services on demand (might be hardware dependent, use udev/hal or just look in /sys if the hardware is even there)
- do not cause heavy I/O on startup (like reading a whole big database into memory, reading big log files, or using an interpreter (like fedora-setup-keyboard)), or use a low I/O priority
- do not use "sleep 1" if you are busy polling; use "sleep 0.2" or find another mechanism
- If your udev rule does not have to be executed at boot time, consider using the ENV{STARTUP}!="1" condition.
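The polling advice from the list above can be sketched as a small helper. The function name and default values are assumptions for illustration. Where an event-based mechanism is available, it is preferable to any polling loop.

```python
import time


def wait_for(condition, timeout=5.0, interval=0.2):
    """Poll condition() every `interval` seconds; return True once it holds.

    Returns False if `timeout` seconds pass without the condition holding.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With an interval of 0.2 seconds, the caller waits at most about 0.2 seconds longer than necessary, instead of up to a full second with "sleep 1".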
What did the test day achieve?
As an immediate outcome of the test day, I modified readahead and released readahead-1.4.8 with the following changes:
- set low I/O priority (no more regression in boot time with readahead)
- moved data files from /etc/readahead.d to /var/lib/readahead
- don't start readahead from readahead.event, if system has less than 384MB
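The memory check in the last item could look roughly like the following sketch. How readahead-1.4.8 actually performs the check is not shown here. Reading MemTotal from /proc/meminfo, the function names, and the exact comparison are assumptions.

```python
def total_memory_mb(meminfo_text):
    """Extract MemTotal from /proc/meminfo content and convert kB to MB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) // 1024
    raise ValueError("MemTotal not found")


def should_start_readahead(meminfo_text, minimum_mb=384):
    """Return True if the system has at least `minimum_mb` MB of memory."""
    return total_memory_mb(meminfo_text) >= minimum_mb
```

On a running Linux system the input would be the content of /proc/meminfo, for example `should_start_readahead(open("/proc/meminfo").read())`.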
Harald Hoyer

Thanks