Unix Internals - California State University, Northridge
Unix Internals
Process Scheduling 5
Unix scheduler
• policy
• implementation
context switch
• hardware execution context of current process is saved in it’s
process control block (PCB)
which is traditionally part of the
u area of the process
• context of the next process is retrieved from it’s PCB,
loaded into the hardware registers
• architecture-specific tasks
o flush data cache after context switch
o flush instruction cache after context switch
o flush address translation cache after context switch
o pipelined architectures (e.g., RISC)
flush instruction pipeline prior to context switch
hardware clock
• fixed time interval interrupt clock tick 10 milliseconds
• major tick n clock ticks n depends upon specific Unix variant
• clock frequency
o number of ticks per second
o constant HZ defined in param.h file
• kernel functions measure time in number of ticks
clock interrupt handler
• highly system-dependent
• priority second only to power-failure interrupt
• updates CPU usage statistics for current process
• performs scheduler-related functions
o priority recomputation
o time-slice expiration handling
• sends SIGXCPU signal to current process
if it has exceeded its CPU usage quota
• updates
o time-of-day clock
o SVR4 variable lbolt – number of ticks elapsed since system last booted
• handles callouts
• wakes system processes when required
o swapper
o pagedaemon
• handles alarms
callouts
• records a function that the kernel will invoke after a specific number of ticks
• int to_ID = timeout(void(*fn)(), caddr_t arg, long delta);
o fn() – kernel function to be invoked
o arg – argument to be passed to the function
o delta – time interval measured in number of ticks
• kernel invokes callout in system context
o function cannot sleep
o function cannot access the process context
• void untimeout( int to_ID);
• periodic tasks
o retransmission of network packets
o scheduler functions
o memory management functions
o monitoring devices to avoid losing interrupts
o polling devices
• normal kernel operation – do not execute at interrupt priority
o clock handler checks if any callouts are due
o sets flag indicating that callout handler must run
o when system returns from base interrupt priority
▪ checks flag
▪ if flag is set, system invokes callout handler
▪ handler invokes each callout that is due to run
o once due, callouts run,
but only after all pending interrupts have been serviced
pending callout list
• checking time
o every CPU tick
o high interrupt priority
o optimize checking time
• insertion time
o low priority
o much less frequently than one per tick
• implementation
o 4.3BSD sort list in order of “time to fire”
▪ stores differential time of expiration for each entry
difference between its time to fire and that of the previous callout
▪ decrements time difference in first entry at each clock
o sort list in order of “time to fire”
▪ stores absolute time of expiration for each entry
▪ compares current absolute time with time in first entry
o timing wheel
▪ fixed-size circular array of callout queues
▪ at each tick, clock interrupt handler advances current time pointer
to the next element in the array,
wrapping around at the end of the array
▪ if any callouts are in the queue, their expiration time is checked
▪ new callouts are inserted into the queue which is N elements away from current queue, where N is the time to fire measured in ticks
▪ timing wheel hashes callouts based on the expiration time
▪ within each queue, callouts may be sorted or unsorted
▪ sorting callouts
• reduces time required to process non-empty queues but
• increases insertion time
alarms
• real time alarm
o measures actual elapsed time
o notifies process via SIGALRM signal
• profiling alarm
o measures amount of time process has been executing
o notifies process via SIGPROF signal
• virtual-time alarm
• monitors time spent in user mode
• notifies process via SIGVALRM signal
BSD Unix
setitimer() system call
• allows process to request any type of alarm
• specify time interval in microseconds
SVRx
alarm() system call
• requests real time alarm
• specify time interval in whole number of seconds
SVR4
hrtsys() system call
• allows time to be specified in microseconds
high resolution of real time alarms does not imply high accuracy
kernel delivers SIGALRM signal to process,
process will not respond to signal until it is next scheduled to run
high-resolution timers used by high priority processes are less likely to have scheduling delays
but if current process is executing in kernel mode and does not reach an preemption point
profiling & virtual alarms
clock interrupt handler charges whole tick to process even though it uses only a portion of it
time measured by these alarms reflect number of clock interrupts that have occurred while process was running
long run – average out good indicator of time used
for any single alarm, significant inaccuracy
scheduler goals
interactive process
reduce average time and variance between user action and application response sufficiently so that users cannot readily detect the delay
batch process
measure of scheduling efficiency is the task completion time in the presence of other activity
real time process -- time critical
predictable scheduling behavior with guaranteed bounds on response times
kernel functions – paging, interrupt handling, process management
execute promptly when required
mix of processes
all application processes must continue to progress
no application should prevent another from progressing unless user has explicitly permitted it
system should always be able to receive and process interactive user input
Traditional Unix Scheduling
SVR3, 4.3BSD
• time-sharing, interactive environments
• multiple users
• several batch & foreground processes simultaneously
• improve response times of interactive users
• ensuring that low-priority, background jobs do not starve
• priority-based scheduling
• each process has a scheduling priority that changes with time
• scheduler selects highest-priority runnable process
• uses preemptive time-slicing to schedule processes of equal priority
• dynamically varies process priority based on their CPU usage patterns
• kernel is nonpreemptible
• running process
o may voluntarily relinquish CPU when blocking on a resource
o can be preempted when it returns to user mode
process priorities
kernel priority numbers 0 – 49
user mode priority numbers 50 – 127
proc structure
p_pri current scheduling priority
p_usrpri user-mode priority
p_cpu measure of recent CPU usage
p_nice user-controllable nice factor
• scheduler uses p_pri to decide which process to schedule
• process in user mode ( p_pri == p_usrpri
• process wakes up after blocking in a system call
• scheduler uses p_pri to store the temporary kernel priority
sleep priority
• terminal input 28
• disk I/O 20
• process wakes up after blocking
o kernel sets p_pri ( sleep priority of the event or resource
o process scheduled ahead of other user processes
• process completes system call, about to return to user mode
• kernel sets scheduling priority p_pri ( p_usrpri
user-mode priority p_usrpri depends upon
• recent CPU usage p_cpu
• user-controllable nice factor p_nice range 0 – 39 , default 20
1 user mode: Δ↑ p_nice ( Δ↓ priority
o superuser: Δ↓ p_nice ( Δ↑ priority
process priority algorithm
• process creation ( p_cpu = 0
• clock tick ( clock handler increments p_cpu for current process (max 127)
• every second ( kernel invokes schedcpu() -- scheduled by callout
o reduces p_cpu of each process by decay factor
o SVR3
• fixed decay factor ½
• undesirable side effect – system load Δ↑ ( Δ↑ priorities
o 4.3BSD computes
• decay = (2 * load_average) / (2 * load_average + 1);
• load_average == average number of runnable processes over last second
• p_usrpri = PUSER + ( p_cpu / 4 ) + ( 2 * p_nice );
• PUSER is the baseline user priority of 50
scheduler implementation
whichgs (global variable – bitmask 32 cells – one bit for each queue)
|0 |0 |0 |1 |0 |1 |0 |… |
qs (32 rows)
|0 – 3 |
|4 -- 7 |
|8 -- 11 |
|12--15 |
|16--19 |
|20--23 |
| |
|… |
|124-127 |
proc structures
runnable processes
context switch - swtch() function
• examines whichgs to find index of the first set bit
• bit identifies scheduler queue containing highest priority runnable process
• removes process from head of queue & switches context to it
• proc structure -- p_addr field points to page table entries of the u area
• process control block is a part of the u area
• VAX-11
o special instructions – 32 bit fields
▪ FFS – find First Set
▪ FFC – Find First Clear
o special instructions – double linked lists
▪ INSQHI – insert elements into doubly linked lists
▪ REMQHI– remove elements from doubly linked lists
o special instructions – process context
▪ LDPCTX – load process context
▪ SVPCTX – save process context
run queue manipulation
• highest priority process always runs
unless current process is executing in kernel mode
• fixed time quantum -- 4.3BSD 100 milliseconds
• every 100 ms, kernel invokes roundrobin() function
to schedule next process from same queue
• if higher priority process were runnable,
preferentially scheduled without waiting for roundrobin() function
• if other runnable processes are on a lower priority queue,
the current process continues to run even though
its quantum has expired
• every second, schedcpu() recomputes the priority of each process
o removes the process from the queue
o recomputed process priority
o inserts process into a (possibly) different run queue
• every four ticks, clock interrupt handler computes
priority of the current process
• context switch
o current process (voluntary context switch)
▪ blocks on a resource
▪ exits
o priority recomputation ( priority of another process > current priority
o current process or interrupt handler wakes up a higher priority process
• voluntary context switch
o kernel directly calls swtch() from sleep() or exit() functions
• involuntary switches
o system is in kernel mode, cannot directly preempt the process
o kernel sets runrun flag
o when process is about to return to user mode,
kernel checks the runrun flag
o if runrun flag is set, kernel transfers control to swtch() routine
limitations
• large number of processes ( inefficient to recompute priorities every second
• no way to guarantee CPU resources to specific group of processes
• no guarantee of response time
• little control over priorities by applications
• nonpreemptive kernel -- priority inversion -- higher-priority process may have to wait significant amount of time for lower-priority process to complete
SVR4 Scheduler
scheduling class defines scheduling policy for set of processes
SVR4 scheduling classes
• time-sharing
• real-time
class-independent routines -- implement common services
• context switching
• run queue manipulation
• preemption
• procedural interface
class-dependent functions – pure virtual functions
• priority recomputation
o real-time class – fixed priorities
o time-sharing class -- varies process priorities dynamically
• inheritance
class independent layer
dqactmap (global variable – bitmask 160 cells – one bit for each queue)
|0 |0 |1 |0 |0 |1 |0 |… |
dispq (160 rows)
|160 |
|159 |
|158 |
|157 |
|156 |
|155 |
| |
|… |
|0 |
proc structures
runnable processes
setfrontdq() – insert process at front of queue,
e.g., process preempted before time quantum had expired
setbackdq() – insert process at back of queue, e.g., newly runnable process
dispdeq() – remove process from queue
dispatch latency
delay between the
time that process becomes runnable
and
time that process actually begins running
SVR4 kernel preemption points
preemption point -- places in kernel code where
• all kernel data structures are in stable state
• kernel is about to embark on a lengthy computation
at preemption point
• kernel macro PREEMPT() checks kprunrun flag
• if kprunrun flag is set,
o real-time process is requesting access to run
o PREEMPT() macro calls preempt() kernel routine to preempt process
• bounds the amount of time that real-time process must wait before being scheduled
• preempt() kernel routine
o invokes CL_PREEMPT operation
to perform class-dependent processing
then
o calls swtch() to initiate context switch
• swtch() function
o calls pswtch() to perform machine-independent part of context switch
o pswtch()
▪ clears runrun and kprunrun flags,
▪ selects highest priority runnable process
▪ removes highest priority runnable process from the dispatch queue
▪ updates dqactmap
▪ sets state of process to SONPROC (running on processor)
▪ updates memory management registers to map the u area
▪ updates virtual address translation registers of the new process
o invokes assembly language code to
▪ manipulate register content
▪ flush translation buffers
▪ etc
interface to scheduling classes
generic interface provides virtual functions that are implemented by each scheduling class
rt_classfuncs
rt_init
sys_classfunc
sys_init
ts_classfunc
ts_init
proc structures proc structures
classfuncs structure
vector of pointers to the functions
that implement the class-dependent interface for any class
global class table
contains one entry for each class
• class name
• pointer to an initialization function
• pointer to the classfuncs vector for that class
p_cid index into global class table class ID
p_clfuncs pointer to the classfuncs vector for the process class
p_clproc pointer to class-dependent private data structure
accessing class-dependent functions
• set of macros are used to
o resolve generic interface function calls and to
o invoke the correct class-dependent function
• #define CL_SLEEP(procp, clproc, … ) \
(*(procp)->p_clfuncs->cl_sleep)(clprocp, … )
scheduling classes
determine the
• policies for priority computation
o range of priorities for the processes
o if and under what conditions the priority can change
o size of the time slice each time a process runs
o whether time slice size will depend upon the priority class
e.g., infinite time quantum for real-time tasks
• scheduling of the processes in the class
entry points of the class-dependent interface
CL_TICK clock interrupt handler initiates call
monitors time slice, recomputes priorities,
handles time quantum expiration, etc.
CL_FORK
CL_FORKRET fork() initiates call
CL_FORK initializes child-specific data structure
CL_FORKRET may set runrun, allowing child process to run before parent
CL_ENTERCLASS
CL_EXITCLASS call is initiated when a process enters or exits
a scheduling class; responsible for allocating & deallocating the class-dependent data structures
CL_SLEEP sleep() initiates call; may recompute process priority
CL_WAKEUP wakeprocs() initiates call; places the process on the
appropriate run queue; may set runrun or kprunrun
scheduling class determines the specific actions to be accomplished by a particular function, e.g., clock interrupt handler calls CL_TICK routine for the class to which the process belongs; the routine determines the exact actions to be performed
Time-Sharing Class
• default class
• dynamic priorities
• round-robin scheduling is used for processes with equal priorities
• static dispatcher parameter table controls process priorities and time slices
• time slice depends upon scheduling priority
• ↓Δ priority ( ↑Δ time slice
|index |globpri |quantum |tqexp |slpret |maxwait |lwait |
|0 |0 |100 |0 |10 |5 |10 |
|1 |1 |100 |0 |11 |5 |11 |
|… |… |… |… |… |… |… |
|15 |15 |80 |7 |25 |5 |25 |
|… |… |… |… |… |… |… |
|40 |40 |20 |30 |50 |5 |50 |
|… |… |… |… |… |… |… |
|59 |59 |10 |49 |59 |5 |59 |
ts_globpri global priority
ts_quantum time quantum
ts_tqexp ts_cpupri value when time quantum expires
before using ts_maxwait seconds
ts_slpret ts_cpupri value when returning to user mode after sleeping
ts_maxwait seconds to wait for time quantum expiry
ts_lwait ts_cpupri value when time quantum does not expire
before using ts_maxwait seconds
• event-driven scheduling ,i.e., process priority changed in response
to specific event related to process
• scheduler
o reduces priority each time it uses its entire time slice
o boosts priority if
▪ process blocks on event or resource
▪ takes a long time to use its time slice
struct tsproc class-dependent data
ts_timeleft time remaining in quantum
ts_cpupri system part of priority
ts_upri user part of priority (nice value), range [ -20, +19 ], default 0
ts_umdpri user mode priority == min { max { ts_cpupri + ts_upri, 0 } 59 }
ts_diswait clock time (seconds) since start of quantum
process returns from sleeping in kernel
• priority is kernel priority determined by sleep condition
• return to user mode ( priority restored from ts_umdpri
• ts_umdpri may be changed by priocntl(), but only a superuser increase it
• ts_cpupri is adjusted according to dispatcher parameter table
dispatcher parameter table access
• current ts_cpupri ( ts_tqexp, to_slpret, ts_lwait ( new ts_cpupri
• ts__umdpri ( ts_globpri, ts_quantum, ts_maxwait
Real-Time Class
• priority range [100 – 159]
• fixed priority
• fixed quantum
• bounded dispatch latency
• bounded response time
• real-time processes are scheduled before any kernel services
process executing in kernel mode; real-time process becomes runnable;
real-time process must wait until current process is about to
return to user mode or until it reaches a kernel preemption point
process explicitly makes a priocntl call to change priority or time quantum
only super processes can enter real-time class;
priocntl – specify priority and time quantum
default parameter table
↓Δ priority ( ↑Δ time slice
struct rtproc
• current time quantum
• time remaining in quantum
• current priority
dispatch latency
time between
• process becomes runnable
• process begins to run
response time
time between
• occurrence of event
• response to event
time required by
• interrupt handler to process the event
• dispatch latency
• real-time process to respond to the event
preemption points
divide kernel algorithms into small bounded work units
real-time process becomes runnable (
rt_wakeup() [ class-dependent wakeup processing routine]
• sets kernel flag kprunrun
current process executing kernel code
• reaches preemption point
• checks flag
• initiates context switch to real-time process
latency bounds guaranteed only when
real-time process is highest-priority runnable process on system
• real-time process is running on system
• higher-priority real-time process becomes available
• higher-priority real-time process is preferentially scheduled
• latency calculation of lower-priority process
is recalculated after process regains CPU
priocntl system call -- single process
• change priority class of process
• set ts_upri for time-sharing process
• reset priority and quantum for real-time process
• obtaining current values of scheduling parameters
priocntlset system call -- set of related process
• change priority class of processes
• set ts_upri for time-sharing processes
• reset priority and quantum for real-time processes
• obtaining current values of scheduling parameters
applies to all processes
• in system
• in process group or session
• in scheduling class
• owned by particular user
• having same parent
Analysis
• allows addition of scheduling classes to system
• time-sharing class changes priorities based on events related to process
• fast and scalable dynamic priority calculation
• event-driven scheduling
favors I/O bound and interactive jobs over CPU-bound jobs
• interactive job with large computations ( system will be unresponsive
• total system load and job mix characteristics determines system response
• system administer can alter system behavior by changing dispatcher table settings and rebuilding kernel; such retuning may be required to keep system efficient and responsive
• adding a scheduling class does not require access to kernel source code
o provide implementation of each class-dependent scheduling function
o initialize classfuncs vector to point to these functions
o provide initialization function to perform setup tasks, e.g., allocating internal data structures
o add entry for this class in the class table in a master configuration file, typically located in the master.d directory of the kernel build directory
o entry contains pointers to the initialization function and classfuncs vector
o rebuild kernel
• priocntl system call restricted to superuser
• no good way for a time-sharing class process to switch to a different class
• no provision for deadline-driven scheduling
• code path between preemption points is too long for many time-critical processes
• hard real-time systems require a fully preemptible kernel
• extremely difficult to tune system for a mixed set of applications
• system load varies constantly hence it is not reasonable to require careful manual tuning each time the load changes
Solaris 2.x Scheduling
multithreaded, symmetric multiprocessing operating system
preemptive kernel
fully preemptive kernel
global kernel data structures are protected by synchronization objects
• mutual exclusion locks (mutexes)
• semaphores
implementation of interrupts using special kernel threads
use the standard synchronization primitives of the kernel
• rarely needs to raise the interrupt level to protect critical regions
• few nonpreemptible code segments
• higher-priority process is scheduled as soon as it is runnable
interrupt threads run at the highest priority
scheduling classes can be dynamically loaded
priorities of the interrupt threads are recomputed to ensure that they remain at the highest possible level
blocked thread can only be restarted on the same processor
multiprocessor support
single dispatch queue for all processors
slelected threads can be restricted to a single, specific processor
processors communicate by sending cross-processor interrupts
per-processor data structure -- scheduling variables
cpu_thread thread currently running on this processor
cpu_dispthread thread last selected to run on this processor
cpu_idle idle thread for this processor
cpu_runrun preemption flag used for time-sharing threads
cpu_kprunrun kernel preemption flag set by real-time threads
cpu_chosen_level priority of thread that is going to preempt
the current thread on this processor
Thread T1, priority 120 Processor P1
Thread T2, priority 130 Processor P2
Thread T3, priority 100 Processor P3
Thread T4, priority 132 Processor P4
Thread T5, priority 135 Processor P5
Thread T6, priority 130
Thread T7, priority 115
• event on processor P1 makes thread T6 runnable
• kernel
o places T6 on dispatch queue
o calls cpu_choose() to find processor with the lowest priority thread
cpu_choose()
▪ marks selected processor for preemption
▪ sets cpu_chosen_level to priority T6 value130
▪ send cross-processor interrupt
• event on processor P2 makes thread T7 runnable
• kernel
o places T7 on dispatch queue
o calls cpu_choose() to find processor with the lowest priority thread
cpu_choose()
▪ compares cpu_chosen_level to priority T7 value115
▪ does not mark selected processor for preemption
• processor P3
o handles the interrupt
o preempts thread T3
hidden scheduling
priority inversion
implementation of priority inheritance
limitations of priority inheritance
turnstiles
analysis
Mach Scheduling
[pic][pic]
-----------------------
P
P
P
P
P
P
P
P
P
P
preemption points
• pathname parsing routine lookuppn()
before beginning to parse
each individual pathname component
• open() system call
before creating a file if it does not exist
• memory subsystem
before freeing process pages
time-sharing
system
real-time
$%4KZtš³äêV r x ‚ ž ¯ Ë ä ú ' ................
................
In order to avoid copyright disputes, this page is only a partial summary.
To fulfill the demand for quickly locating and searching documents.
It is intelligent file search solution for home and business.
Related download
- unix internals california state university northridge
- csci 2021 computer organization and architecture
- the problems you re having may not be the problems you
- georgia institute of technology
- hardware rochester institute of technology
- place title here using all upper case
- lab7 timer mod polling
- technical specification for a 3 to 15 kva
- university of florida
Related searches
- california state university system
- california state university second bachelor s
- california state university tuition
- california state university jobs
- california state university system schools
- california state university system wiki
- california state university application log in
- california state university campuses list
- california state university log in
- california state university application deadline
- california state university tuition fee
- california state university fees