Summary of the Fedora 11 - 20 Seconds Boot Feature - Test Day

by Harald Hoyer last modified Mar 02, 2009 17:57

This is an analysis of the bootcharts generated on the 20 Seconds Boot Feature - Test Day.

On the 20 Seconds Boot Feature Test Day a lot of helpful people generated bootcharts from the latest rawhide build in various install configurations like "Desktop", "Server" and "Minimal".

To extract the information from the bootchart logs and analyze it, I wrote a quick and dirty Python script.

The network initscript has a variable duration because of external dependencies, such as network card negotiation with the switch or obtaining an IP address from the DHCP server (sometimes more than 50 seconds!). Because of this, I subtracted its duration from the resulting time.
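A minimal sketch of that subtraction follows. It is not the script used for this analysis: the record layout (name, start, end, in seconds) and the function name are assumptions made for illustration, and the real bootchart log format is not parsed here.

```python
from typing import Iterable, Tuple

# Hypothetical record layout: (process name, start time, end time), in seconds
# since boot. This is an illustration, not the bootchart log format.
Record = Tuple[str, float, float]


def boot_time_without(records: Iterable[Record], excluded: str = "network") -> float:
    """Return the total boot time minus the wall-clock time spent in `excluded`.

    Assumes the excluded step blocks the boot sequence while it runs, as a
    serially executed initscript does.
    """
    records = list(records)
    if not records:
        return 0.0
    boot_start = min(start for _, start, _ in records)
    boot_end = max(end for _, _, end in records)
    excluded_time = sum(end - start for name, start, end in records if name == excluded)
    return (boot_end - boot_start) - excluded_time


sample = [
    ("rc.sysinit", 5.0, 15.0),
    ("network", 15.0, 65.0),  # 50 seconds waiting for DHCP
    ("gdm", 65.0, 70.0),
]
print(boot_time_without(sample))  # 15.0
```

With the made-up sample above, 65 seconds of wall-clock boot time shrink to 15 seconds once the network wait is taken out, which makes charts from different networks comparable.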

Resulting Tables with png bootcharts:

As the results show, disk I/O speed and CPU speed are, of course, the major factors that determine boot speed.

Other I/O factors include:

  • disk fragmentation
  • type of filesystem used

But that is not what we are looking for. We want to eliminate unnecessary bottlenecks, where neither CPU nor I/O is the limiting factor, or where unexpected I/O or CPU activity is happening.

For example, as we can see in https://fedoraproject.org/w/uploads/3/3b/Bootchart-a02fad60-dcf8-48d7-9de0-4d67a050313d-desktop.png, restorecon is called on /dev in rc.sysinit, nash-hotplug is running, and the CPU is at 100% for 4 seconds with practically no disk I/O.

Fortunately, this is already fixed in the latest mkinitrd by http://git.fedorahosted.org/git/?p=mkinitrd;a=commitdiff;h=d539443a6ebe8868a73daffa88adfc408838bc21;hp=9253a1b55376b491bf612f00ec1d3edc66502a3f

This bootchart is especially useful because the disk and CPU are so slow that the whole boot process takes a long time.

What we also see, in comparison to other distributions, is the long time spent before rc.sysinit is even started. First, there is the kernel loading and initialization; second, nash is started, with a whole udev implementation and plymouth. While nash is nice to have, because a lot of variants of root devices are supported, efforts for a unified cross-distribution initrd are underway. We will see whether that reduces or increases boot time in the future.

For example, on my Dell, the time until rc.sysinit is started varies between 5 and 10 seconds. So this fluctuation has to be taken into account when comparing bootcharts (even from the same distribution and machine).

The next thing started after nash is rc.sysinit. Bugs have been filed in Bugzilla against some applications that do not clean up their state directories, resulting in a long "find and remove" action in rc.sysinit (var/run/gdm/auth-for* directories not removed.... slows boot).

The big thing started in rc.sysinit is, of course, start_udev. start_udev first removes all files in /dev, creates all basic nodes, runs restorecon on them and begins replaying all kernel hotplug events. This triggers module loading and the creation of all device nodes for the hardware found on the system. Module loading takes up most of the I/O and CPU here. If you are considering speeding up modprobe, keep in mind that even loading all modules by hand via insmod takes nearly as much time as using modprobe with dependency resolution and database reading. On my systems, the difference between using insmod and modprobe was not noticeable at all. So I would conclude that most of the time is spent reading the module files from disk, and most of the CPU is spent in the kernel on module and hardware initialization.

Of course, some things make your system noticeably slower in this part:

  • use of /etc/udev/makedev.d/. If there are files in /etc/udev/makedev.d, MAKEDEV is called, which in turn reads a lot of files from /etc/makedev.d, which takes a lot of time on some systems.
  • too many scripts in /lib/udev/rules.d or /etc/udev/rules.d. On some systems you do not need all rule files. Some rules are executed on every device added, just to catch a corner case, which might never happen on your system.
  • some 3rd party rules are very badly designed and implement a kind of busy polling. Watch out for them and report them to the package owner and/or on bugzilla.

In the example bootchart we can also immediately see the effect of some components, which are slow or can be started on demand:

An Ideal Boot Setup

In an ideal boot setup, CPU and I/O are maxed out during the whole boot process.

Ways to achieve this:

  • readahead all files used to fill the memory disk cache in parallel to the normal boot process
  • avoid sleeps
  • run services which do not depend on each other in parallel
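The last point can be sketched in Python as follows. This is only an illustration of dependency-aware parallel startup, not how any init system implements it; the dependency graph, the `start` callback and the function name are assumptions, and the sketch needs Python 3.9 or later for `graphlib`.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def start_in_parallel(deps, start):
    """Start services in waves; members of one wave do not depend on each other.

    `deps` maps each service to the set of services it needs first.
    `start` is called once per service. Returns the list of waves.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = list(sorter.get_ready())
            # All dependencies of everything in `ready` are already started,
            # so the whole batch can run concurrently.
            list(pool.map(start, ready))
            waves.append(sorted(ready))
            sorter.done(*ready)
    return waves


# Made-up dependency graph, for illustration only.
deps = {
    "udev": set(),
    "syslog": set(),
    "dbus": {"udev"},
    "network": {"udev"},
    "gdm": {"dbus"},
}
print(start_in_parallel(deps, lambda name: None))
```

Waiting for a whole wave before starting the next one is a simplification: a slow service delays unrelated services in later waves. Calling `done()` as each service finishes would keep CPU and I/O busier.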

An ideal init process would take the primary goal (in the desktop case, the login screen), start everything that is needed for it with first priority, and start the rest in parallel or delayed, with low (I/O and CPU) priority.
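One way to picture this goal-driven split is the following Python sketch. It assumes a made-up dependency graph and a hypothetical helper name; it only separates the services the goal transitively needs from those that can be deferred, and does not start anything.

```python
def split_by_goal(deps, goal):
    """Return (needed, deferred) for a boot goal.

    `deps` maps each service to the set of services it needs first.
    `needed` is the goal plus everything it transitively depends on;
    `deferred` is every other known service, which could be started later
    or with low I/O and CPU priority.
    """
    needed = set()
    stack = [goal]
    while stack:
        service = stack.pop()
        if service in needed:
            continue
        needed.add(service)
        stack.extend(deps.get(service, ()))
    deferred = set(deps) - needed
    return needed, deferred


# Made-up dependency graph, for illustration only.
deps = {
    "gdm": {"dbus", "udev"},
    "dbus": {"udev"},
    "udev": set(),
    "network": {"udev"},
    "cups": {"dbus"},
    "sendmail": {"network"},
}
needed, deferred = split_by_goal(deps, "gdm")
print(sorted(needed))
print(sorted(deferred))
```

In this example only gdm, dbus and udev are on the critical path to the login screen; network, cups and sendmail could be started afterwards or at low priority.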

What is needed to achieve that:

What you can do now to speed up your boot process

  • install and run prelink
  • install and run readahead / preload
  • maybe defragment a heavily fragmented filesystem?

What to keep in mind as a developer

  • try to start services on demand (this might be hardware dependent; use udev/hal or just check in /sys whether the hardware is even there)
  • do not cause heavy I/O on startup (like reading a whole big database into memory, reading big log files, or using an interpreter (like fedora-setup-keyboard)), or use a low I/O priority
  • do not use "sleep 1" if you are busy polling; use "sleep 0.2" or find another mechanism
  • If your udev rule does not have to be executed at boot time, consider using the ENV{STARTUP}!="1" condition.
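The advice on polling can be illustrated with a short Python sketch. The helper name, the default values and the example path are assumptions; an event-based mechanism, where one exists, remains preferable to any polling loop.

```python
import os
import time


def wait_for_path(path, timeout=5.0, interval=0.2):
    """Poll until `path` exists; return True if it appeared, False on timeout.

    A short interval wastes at most `interval` seconds after the path shows
    up, instead of up to a full second with "sleep 1".
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical example: only start a service if the hardware is present.
if wait_for_path("/sys/class/net/eth0", timeout=1.0):
    print("network interface present, starting service")
else:
    print("no such interface, skipping service")
```

The same check against /sys also covers the first point in the list: if the device node never appears, the service does not need to be started at all.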

What did the test day achieve?

As an immediate outcome of the test day, I modified readahead and released readahead-1.4.8 with the following changes:

  • set low I/O priority (no more regression in boot time with readahead)
  • moved data files from /etc/readahead.d to /var/lib/readahead
  • don't start readahead from readahead.event, if system has less than 384MB
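The memory check in the last item could look roughly like this in Python. This is a sketch of the idea only, not the code shipped in readahead; the function names are hypothetical, and the text is expected to be in the format of /proc/meminfo.

```python
def mem_total_mb(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text and return it in MB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB
            return kib // 1024
    raise ValueError("MemTotal not found")


def should_start_readahead(meminfo_text, min_mb=384):
    """Return False on systems with less than `min_mb` MB of memory."""
    return mem_total_mb(meminfo_text) >= min_mb


small = "MemTotal:         262144 kB\nMemFree:          100000 kB\n"
large = "MemTotal:        1048576 kB\nMemFree:          500000 kB\n"
print(should_start_readahead(small))  # False
print(should_start_readahead(large))  # True
```

On a real system the text would come from reading /proc/meminfo. The reason for such a threshold is that preloaded files only help if they stay in the page cache until they are used.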

Thanks

Posted by Adam Williamson at Mar 03, 2009 07:17
Thanks for all your work both to help make the test day a success and implement improvements, Harald - it's great stuff.

What you can do now to speed up your boot process

Posted by asp at Mar 03, 2009 09:29
4) Suspend, never shut down.

Boot time

Posted by Philip at Mar 04, 2009 11:04
As post-kernel boot time is reduced, the kernel boot is becoming significant. Can you perhaps post some suggestions about how to find out what is causing the delay, how to debug it and where to report it (obviously the kernel mailing list is one place)? My new Dell Inspiron desktop seems to take about 10s for the kernel (no onscreen messages) and about 15s for init->gdm.

Proof of concept unknown feasibility attainability

Posted by Thomas at May 27, 2009 07:47
Thank you for this write up, and also to all involved.

"too many scripts in /lib/udev/rules.d or /etc/udev/rules.d. On some systems you do not need all rule files. Some rules are executed on every device added, just to catch a corner case, which might never happen on your system."

Can these be removed manually? (i.e. with rm)

I just read about the 'Minimal platform' as well in the release notes. Something which might also boost boot speed is the ability, for power users, to automatically configure the kernel and modules loaded for a specific system, for example by compiling a custom kernel at install time. There would need to be a safeguard so that hardware changes get picked up automatically, or the configuration must be set manually (with automatic detection); hence the power user declaration until a foolproof method is devised. Maybe something like "your hardware has changed, you must recompile (or reconfigure) the kernel and modules loaded".

Also, Fedora 10 on the live CD comes with everything installed by default (i.e. nfs, apache, etc.). I would propose, in this case, to ask the user which things they would like to configure, similar to the install CD. For example "Will you be using a printer on your system (any time soon)?" Of course the exact sentences to use would be defined better. Something which is clear and simple to the basic user (perhaps with an expanded field for the advanced user). So far very impressed with Fedora 10. First time user. Because of the bloat, though, it took me a couple of weeks to read all the documentation needed to set up the box I wanted. I would have preferred a system with opt-in instead of opt-out. Also from the point of view of security this is, in my opinion, safer for a beginning user. Of course I fully realize the trade-off this means in ease of use, or plug and play use. This is not an easy decision to make. Maybe an explanation of the security consequences of installing a particular application would help. I find the Red Hat documentation to often be excellent in this respect. I have not delved into the Fedora wikis enough to give a judgement on this. Overall I am very impressed with the Fedora website.

Of course, using an opt-in method could start with a quick system, but then end up with a slower system as time progresses. With opt-out you at least get the satisfaction that the system gets quicker. The psychological effect on product popularity is a factor to be considered in a customer satisfaction evaluation (as stated in the goals). However, an optimal boot speed with all possible services should be attained. Of course, a system running fewer services will arguably always be faster than a system with more services.

In KDE's KDM there is the preloadKde function (in the kdm config). Could things such as the Network Manager (or network management of choice) be loaded from here? The same goes for some modules, such as wireless. In other words, divide the loading of modules between before the login manager and after. Or make the login manager a module. This could result in an almost instantaneous login screen. On systems with enough RAM, possibly make use of a RAM disk, which writes to disk. Although this might be troubled by security and stability issues.

Also, start modules only when an application is actually launched. Take, for example, a log function that sends mail: don't start sendmail until the log function calls for it, and then run it as a daemon as long as the system is up. Note that this is just an example.
This might be only wishful thinking on some system setups. The firewall should of course always be started before any network calls are made.

These are just some ideas. To the reader: please keep in mind that I am very new to using Fedora Linux, so I am not aware of all existing developments. These are just some first impressions of what I consider to be a very well implemented Linux distro. The sheer vastness of the project is something to be considered when learning this distro. Overall very impressed so far.