{"id":335,"date":"2015-06-19T11:53:40","date_gmt":"2015-06-19T11:53:40","guid":{"rendered":"http:\/\/blogs.uct.ac.za\/blog\/big-bytes\/2015\/06\/19\/slurm-job-preemption"},"modified":"2015-08-14T10:01:18","modified_gmt":"2015-08-14T08:01:18","slug":"slurm-job-preemption","status":"publish","type":"post","link":"https:\/\/ucthpc.uct.ac.za\/index.php\/2015\/06\/19\/slurm-job-preemption\/","title":{"rendered":"SLURM job preemption"},"content":{"rendered":"<p>SLURM provides a preemption mechanism to deal with situations where cluster become overloaded. This can be configured in several ways:<\/p>\n<p><b>FIFO:<\/b><br \/>\nThis is the most simplistic method of queueing in which there is no preemption, jobs come in, queue and are dealt with in that order. Backfill scheduling is also enabled by default and this allows advanced scheduling of jobs as long as they won&#8217;t delay starting jobs ahead of them in the queue.<\/p>\n<p><b>Preemption via Thread Control:<\/b><br \/>\nHere priority is based on the partition that the user submits jobs to. Core sharing is disabled and PreemptMode is set to SUSPEND. 
Below is a case where all cores are used&#8230;<\/p>\n<pre>JOBID PARTITION     NAME     USER ST    TIME  NODELIST\r\n 1807     uctlo MemTestB      bob  R    5:28  hpc402\r\n 1808     uctlo MemTestB      bob  R    5:25  hpc402\r\n 1809     uctlo MemTestB      bob  R    5:25  hpc402\r\n 1811     ucthi MemTestA     andy  R    4:53  hpc401\r\n 1812     ucthi MemTestA     andy  R    3:07  hpc401\r\n 1813     ucthi MemTestA     andy  R    1:30  hpc401\r\n<\/pre>\n<p>User andy submits a job to the ucthi partition&#8230;<\/p>\n<pre>JOBID PARTITION     NAME     USER ST    TIME  NODELIST\r\n 1808     uctlo MemTestB      bob  R    5:34  hpc402\r\n 1809     uctlo MemTestB      bob  R    5:34  hpc402\r\n 1811     ucthi MemTestA     andy  R    5:02  hpc401\r\n 1812     ucthi MemTestA     andy  R    3:16  hpc401\r\n 1813     ucthi MemTestA     andy  R    1:39  hpc401\r\n 1814     ucthi MemTestA     andy  R    0:01  hpc402\r\n 1807     uctlo MemTestB      bob  S    5:36  hpc402\r\n<\/pre>\n<p>One of bob&#8217;s jobs is now suspended while user andy&#8217;s job runs. However, bob&#8217;s job 1807 is still consuming RAM, which may be a problem if user andy&#8217;s job also needs lots of RAM. Where RAM is an issue it is best to cancel and resubmit low-priority jobs.<\/p>\n<p>Here core sharing is disabled and PreemptMode is set to REQUEUE. 
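<\/p>\n<p>For REQUEUE the preempted jobs must be requeueable; this is controlled globally by JobRequeue in slurm.conf and can be overridden per job at submission (illustrative sketch):<\/p>\n<pre>JobRequeue=1                # slurm.conf: jobs may be requeued (the default)\r\nsbatch --requeue job.sh     # explicitly allow this job to be requeued\r\nsbatch --no-requeue job.sh  # never requeue this job\r\n<\/pre>\n<p>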
User bob is running jobs on all available cores of the uctlo partition&#8230;<\/p>\n<pre>JOBID PARTITION     NAME     USER ST    TIME  NODELIST\r\n 1794     uctlo MemTestB      bob  R    0:33  hpc401\r\n 1795     uctlo MemTestB      bob  R    0:30  hpc401\r\n 1796     uctlo MemTestB      bob  R    0:30  hpc401\r\n 1797     uctlo MemTestB      bob  R    0:30  hpc402\r\n 1798     uctlo MemTestB      bob  R    0:30  hpc402\r\n 1799     uctlo MemTestB      bob  R    0:27  hpc402\r\n<\/pre>\n<p>User andy starts submitting jobs to the ucthi partition and bob&#8217;s jobs are cancelled&#8230;<\/p>\n<pre>JOBID PARTITION     NAME     USER ST    TIME  NODELIST\r\n 1794     uctlo MemTestB      bob CG    0:00  hpc401\r\n 1795     uctlo MemTestB      bob  R    0:49  hpc401\r\n 1796     uctlo MemTestB      bob  R    0:49  hpc401\r\n 1797     uctlo MemTestB      bob  R    0:49  hpc402\r\n 1798     uctlo MemTestB      bob  R    0:49  hpc402\r\n 1799     uctlo MemTestB      bob  R    0:46  hpc402\r\n 1801     ucthi MemTestA     andy  R    0:01  hpc401\r\n<\/pre>\n<p>Three of bob&#8217;s jobs have been cancelled and requeued in the pending state while andy&#8217;s three jobs are now running&#8230;<\/p>\n<pre> 1794     uctlo MemTestB      bob PD    0:00  (BeginTime)\r\n 1795     uctlo MemTestB      bob PD    0:00  (BeginTime)\r\n 1796     uctlo MemTestB      bob PD    0:00  (BeginTime)\r\n 1797     uctlo MemTestB      bob  R    0:59  hpc402\r\n 1798     uctlo MemTestB      bob  R    0:59  hpc402\r\n 1799     uctlo MemTestB      bob  R    0:56  hpc402\r\n 1801     ucthi MemTestA     andy  R    0:11  hpc401\r\n 1802     ucthi MemTestA     andy  R    0:06  hpc401\r\n 1803     ucthi MemTestA     andy  R    0:02  hpc401\r\n<\/pre>\n<p>In this instance it is possible to oversubscribe the cores on a node, and hence memory limits need to be protected by setting memory as a consumable resource. 
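<\/p>\n<p>A sketch of how memory is made a consumable resource in slurm.conf, with a per-job override at submission (values are illustrative):<\/p>\n<pre>SelectType=select\/cons_res\r\nSelectTypeParameters=CR_Core_Memory  # track cores and memory\r\nDefMemPerCPU=2048                    # default MB of RAM per core per job\r\nsbatch --mem-per-cpu=4096 job.sh     # user requests a higher limit\r\n<\/pre>\n<p>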
With this scheduling method, jobs may still not run if memory is set as a consumable resource and there is insufficient RAM per core. A default RAM\/core\/job is set in the configuration file, but users can set their own requirements within the bounds of their account settings.<\/p>\n<p><b>Preemption via Gang scheduling:<\/b><br \/>\nHere users andy and bob have filled up all available cores and some of andy&#8217;s jobs are suspended.<\/p>\n<pre>JOBID PARTITION     NAME     USER ST    TIME  NODELIST\r\n 1815  ucthimem MemTestB      bob  R    0:50  hpc406\r\n 1816  ucthimem MemTestB      bob  R    0:47  hpc406\r\n 1817  ucthimem MemTestB      bob  R    0:47  hpc407\r\n 1820  ucthimem MemTestA     andy  R    0:19  hpc407\r\n 1821  ucthimem MemTestA     andy  R    0:16  hpc408\r\n 1822  ucthimem MemTestA     andy  R    0:16  hpc408\r\n 1823  ucthimem MemTestA     andy  S    0:00  hpc406\r\n 1824  ucthimem MemTestA     andy  S    0:00  hpc406\r\n<\/pre>\n<p>After a time slice has passed, some of bob&#8217;s jobs are suspended and andy&#8217;s jobs run.<\/p>\n<pre>JOBID PARTITION     NAME     USER ST    TIME  NODELIST\r\n 1817  ucthimem MemTestB      bob  R    0:47  hpc407\r\n 1820  ucthimem MemTestA     andy  R    0:19  hpc407\r\n 1821  ucthimem MemTestA     andy  R    0:16  hpc408\r\n 1822  ucthimem MemTestA     andy  R    0:16  hpc408\r\n 1823  ucthimem MemTestA     andy  R    0:00  hpc406\r\n 1824  ucthimem MemTestA     andy  R    0:00  hpc406\r\n 1815  ucthimem MemTestB      bob  S    0:50  hpc406\r\n 1816  ucthimem MemTestB      bob  S    0:47  hpc406\r\n<\/pre>\n<p>This is repeated until all jobs are completed. The time slice between job suspensions is 60 seconds by default. If user andy submits to a higher-priority partition, this scheduling scheme reverts to standard job preemption and bob&#8217;s jobs are suspended until cores become free. 
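<\/p>\n<p>A gang-scheduling sketch in slurm.conf (illustrative values; the shared-resource syntax varies between SLURM versions):<\/p>\n<pre>PreemptMode=SUSPEND,GANG\r\nSchedulerTimeSlice=60                # seconds between suspend\/resume cycles\r\nPartitionName=ucthimem Nodes=hpc[406-408] Shared=FORCE:2\r\n<\/pre>\n<p>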
Once again cores can be oversubscribed and memory needs to be protected.<\/p>\n<p><b>Multifactor preemption:<\/b><br \/>\nThis is the most complex form of preemption: the job priority is computed as a weighted combination of factors:<\/p>\n<pre>Job_priority =\r\n(PriorityWeightAge) * (age_factor) +\r\n(PriorityWeightFairshare) * (fair-share_factor) +\r\n(PriorityWeightJobSize) * (job_size_factor) +\r\n(PriorityWeightPartition) * (partition_factor) +\r\n(PriorityWeightQOS) * (QOS_factor)\r\n<\/pre>\n<p>You are encouraged to read the <a href=\"http:\/\/slurm.schedmd.com\/priority_multifactor.html\" target=\"_blank\">official documentation<\/a> on how this can be configured. Below is a simple example based only on user QOS for priority. User bob is consuming all available cores&#8230;<\/p>\n<pre>JOBID PARTITION        NAME  USER  ST    TIME  NODELIST\r\n 1893     ucthi   CoreTestB   bob   R    0:09  hpc402\r\n 1892     ucthi   CoreTestB   bob   R    0:12  hpc402\r\n 1889     ucthi   CoreTestB   bob   R    0:15  hpc401\r\n 1890     ucthi   CoreTestB   bob   R    0:15  hpc401\r\n 1891     ucthi   CoreTestB   bob   R    0:15  hpc402\r\n 1888     ucthi   CoreTestB   bob   R    0:18  hpc401\r\n<\/pre>\n<p>User andy submits a job which won&#8217;t run as there are no resources free. 
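<\/p>\n<p>A QOS-only priority setup like this example might be sketched as follows (illustrative weights; the QOS levels themselves are defined in the accounting database with sacctmgr):<\/p>\n<pre>PriorityType=priority\/multifactor\r\nPriorityWeightQOS=10000\r\nPriorityWeightAge=0\r\nPriorityWeightFairshare=0\r\nPriorityWeightJobSize=0\r\nPriorityWeightPartition=0\r\n<\/pre>\n<p>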
As andy&#8217;s job has a higher priority, enough of bob&#8217;s jobs are cancelled to allow it to run&#8230;<\/p>\n<pre>JOBID PARTITION        NAME  USER  ST    TIME  NODELIST\r\n 1889     ucthi   CoreTestB   bob  CG    0:00  hpc401\r\n 1890     ucthi   CoreTestB   bob  CG    0:00  hpc401\r\n 1888     ucthi   CoreTestB   bob  CG    0:00  hpc401\r\n 1894     ucthi   CoreTestA  andy  PD    0:00  (Resources)\r\n 1893     ucthi   CoreTestB   bob   R    0:13  hpc402\r\n 1892     ucthi   CoreTestB   bob   R    0:16  hpc402\r\n 1891     ucthi   CoreTestB   bob   R    0:19  hpc402\r\n<\/pre>\n<p>Once andy&#8217;s job is running, bob&#8217;s jobs are requeued&#8230;<\/p>\n<pre>JOBID PARTITION        NAME  USER  ST    TIME  NODELIST\r\n 1888     ucthi   CoreTestB   bob  PD    0:00  (BeginTime)\r\n 1889     ucthi   CoreTestB   bob  PD    0:00  (BeginTime)\r\n 1890     ucthi   CoreTestB   bob  PD    0:00  (BeginTime)\r\n 1894     ucthi   CoreTestA  andy   R    0:03  hpc401\r\n 1893     ucthi   CoreTestB   bob   R    0:17  hpc402\r\n 1892     ucthi   CoreTestB   bob   R    0:20  hpc402\r\n 1891     ucthi   CoreTestB   bob   R    0:23  hpc402\r\n<\/pre>\n<p>After a short time the backfill scheduler allows one of bob&#8217;s restarted jobs to run. 
This is because andy&#8217;s job needs two nodes but bob&#8217;s jobs need only one.<\/p>\n<pre>JOBID PARTITION        NAME  USER  ST    TIME  NODELIST\r\n 1890     ucthi   CoreTestB   bob  PD    0:00  (Priority)\r\n 1889     ucthi   CoreTestB   bob  PD    0:00  (Resources)\r\n 1894     ucthi   CoreTestA  andy   R    2:41  hpc401\r\n 1888     ucthi   CoreTestB   bob   R    2:16  hpc401\r\n 1893     ucthi   CoreTestB   bob   R    2:55  hpc402\r\n 1892     ucthi   CoreTestB   bob   R    2:58  hpc402\r\n 1891     ucthi   CoreTestB   bob   R    3:01  hpc402\r\n<\/pre>\n<p>Unfortunately this preemption mode is not compatible with thread suspension.<\/p>\n","protected":false},"excerpt":{"rendered":"<div>SLURM provides a preemption mechanism to deal with situations where the cluster becomes overloaded. This can be configured in several ways.<\/div>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[4,5],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>SLURM job preemption - UCT HPC<\/title>\n<meta name=\"robots\" 
content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ucthpc.uct.ac.za\/index.php\/2015\/06\/19\/slurm-job-preemption\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"SLURM job preemption - UCT HPC\" \/>\n<meta property=\"og:description\" content=\"SLURM provides a preemption mechanism to deal with situations where cluster become overloaded. This can be configured in several ways:FIFO:This is the most simplistic method of queueing in which there is no preemption, jobs come in, queue and are dealt with in that order. Backfill scheduling is also enabled by default and this allows advanced scheduling of jobs as long as they won&#039;t delay starting jobs ahead of them in the queue.Preemption via Thread Control:Here priority is based on the partition that the user submits jobs to. Core sharing is disabled and PreemptMode is set to SUSPEND. 
Below is a case where all cores used...JOBID PARTITION &nbsp; &nbsp; NAME &nbsp; &nbsp; USER ST &nbsp; &nbsp;TIME &nbsp;NODELIST&nbsp;1807 &nbsp; &nbsp; uctlo MemTestB &nbsp; &nbsp; &nbsp;bob &nbsp;R &nbsp; &nbsp;5:28 &nbsp;hpc402&nbsp;1808 &nbsp; &nbsp; uctlo MemTestB &nbsp; &nbsp; &nbsp;bob &nbsp;R &nbsp; &nbsp;5:25 &nbsp;hpc402&nbsp;1809 &nbsp; &nbsp; uctlo MemTestB &nbsp; &nbsp; &nbsp;bob &nbsp;R &nbsp; &nbsp;5:25 &nbsp;hpc402&nbsp;1811 &nbsp; &nbsp; ucthi MemTestA &nbsp; &nbsp; andy &nbsp;R &nbsp; &nbsp;4:53 &nbsp;hpc401&nbsp;1812 &nbsp; &nbsp; ucthi MemTestA &nbsp; &nbsp; andy &nbsp;R &nbsp; &nbsp;3:07 &nbsp;hpc401&nbsp;1813 &nbsp; &nbsp; ucthi MemTestA &nbsp; &nbsp; andy &nbsp;R &nbsp; &nbsp;1:30 &nbsp;hpc401User andy submits a job to the ucthi partition...JOBID PARTITION &nbsp; &nbsp; NAME &nbsp; &nbsp; USER ST &nbsp; &nbsp;TIME &nbsp;NODELIST&nbsp;1808 &nbsp; &nbsp; uctlo MemTestB &nbsp; &nbsp; &nbsp;bob &nbsp;R &nbsp; &nbsp;5:34 &nbsp;hpc402&nbsp;1809 &nbsp; &nbsp; uctlo MemTestB &nbsp; &nbsp; &nbsp;bob &nbsp;R &nbsp; &nbsp;5:34 &nbsp;hpc402&nbsp;1811 &nbsp; &nbsp; ucthi MemTestA &nbsp; &nbsp; andy &nbsp;R &nbsp; &nbsp;5:02 &nbsp;hpc401&nbsp;1812 &nbsp; &nbsp; ucthi MemTestA &nbsp; &nbsp; andy &nbsp;R &nbsp; &nbsp;3:16 &nbsp;hpc401&nbsp;1813 &nbsp; &nbsp; ucthi MemTestA &nbsp; &nbsp; andy &nbsp;R &nbsp; &nbsp;1:39 &nbsp;hpc401&nbsp;1814 &nbsp; &nbsp; ucthi MemTestA &nbsp; &nbsp; andy &nbsp;R &nbsp; &nbsp;0:01 &nbsp;hpc402&nbsp;1807 &nbsp; &nbsp; uctlo MemTestB &nbsp; &nbsp; &nbsp;bob &nbsp;S &nbsp; &nbsp;5:36 &nbsp;hpc402One of bob&#039;s jobs is now suspended while user andy&#039;s job runs. However here bob&#039;s job 1807 is still consuming RAM which may be a problem if user Andy&#039;s job also needs lots of RAM. In the case where RAM is an issue it is best to cancel and resubmit low priority jobs.&nbsp;Here core sharing disabled and PreemptMode is set to REQUEUE. 
<p>User bob is running jobs on all available cores on the uctlo partition...</p>
<pre>JOBID PARTITION     NAME     USER ST    TIME  NODELIST
 1794     uctlo MemTestB      bob  R    0:33  hpc401
 1795     uctlo MemTestB      bob  R    0:30  hpc401
 1796     uctlo MemTestB      bob  R    0:30  hpc401
 1797     uctlo MemTestB      bob  R    0:30  hpc402
 1798     uctlo MemTestB      bob  R    0:30  hpc402
 1799     uctlo MemTestB      bob  R    0:27  hpc402
</pre>
<p>User andy starts submitting jobs to the ucthi partition and bob's jobs are cancelled...</p>
<pre>JOBID PARTITION     NAME     USER ST    TIME  NODELIST
 1794     uctlo MemTestB      bob CG    0:00  hpc401
 1795     uctlo MemTestB      bob  R    0:49  hpc401
 1796     uctlo MemTestB      bob  R    0:49  hpc401
 1797     uctlo MemTestB      bob  R    0:49  hpc402
 1798     uctlo MemTestB      bob  R    0:49  hpc402
 1799     uctlo MemTestB      bob  R    0:46  hpc402
 1801     ucthi MemTestA     andy  R    0:01  hpc401
</pre>
<p>Three of bob's jobs have been cancelled and requeued in the pending state while andy's three jobs are now running...</p>
<pre>JOBID PARTITION     NAME     USER ST    TIME  NODELIST
 1794     uctlo MemTestB      bob PD    0:00  (BeginTime)
 1795     uctlo MemTestB      bob PD    0:00  (BeginTime)
 1796     uctlo MemTestB      bob PD    0:00  (BeginTime)
 1797     uctlo MemTestB      bob  R    0:59  hpc402
 1798     uctlo MemTestB      bob  R    0:59  hpc402
 1799     uctlo MemTestB      bob  R    0:56  hpc402
 1801     ucthi MemTestA     andy  R    0:11  hpc401
 1802     ucthi MemTestA     andy  R    0:06  hpc401
 1803     ucthi MemTestA     andy  R    0:02  hpc401
</pre>
<p>In this instance it is possible to oversubscribe the cores on a node, so memory limits need to be protected by setting memory as a consumable resource. With this scheduling method jobs may still not run if memory is a consumable resource and there is insufficient RAM per core. A default RAM/core/job is set in the configuration file, but users can set their own requirements within the bounds of their account settings.</p>
<p><b>Preemption via Gang scheduling:</b><br />
Here users andy and bob have filled up all available cores and some of andy's jobs are suspended.</p>
<pre>JOBID PARTITION     NAME     USER ST    TIME  NODELIST
 1815  ucthimem MemTestB      bob  R    0:50  hpc406
 1816  ucthimem MemTestB      bob  R    0:47  hpc406
 1817  ucthimem MemTestB      bob  R    0:47  hpc407
 1820  ucthimem MemTestA     andy  R    0:19  hpc407
 1821  ucthimem MemTestA     andy  R    0:16  hpc408
 1822  ucthimem MemTestA     andy  R    0:16  hpc408
 1823  ucthimem MemTestA     andy  S    0:00  hpc406
 1824  ucthimem MemTestA     andy  S    0:00  hpc406
</pre>
<p>After a time slice has passed some of bob's jobs are suspended and andy's jobs run.</p>
<pre>JOBID PARTITION     NAME     USER ST    TIME  NODELIST
 1817  ucthimem MemTestB      bob  R    0:47  hpc407
 1820  ucthimem MemTestA     andy  R    0:19  hpc407
 1821  ucthimem MemTestA     andy  R    0:16  hpc408
 1822  ucthimem MemTestA     andy  R    0:16  hpc408
 1823  ucthimem MemTestA     andy  R    0:00  hpc406
 1824  ucthimem MemTestA     andy  R    0:00  hpc406
 1815  ucthimem MemTestB      bob  S    0:50  hpc406
 1816  ucthimem MemTestB      bob  S    0:47  hpc406
</pre>
<p>This repeats until all jobs are complete. The time slice between job suspensions is 60 seconds by default. If user andy instead submits to a higher-priority partition, this scheduling scheme reverts to standard job preemption and bob's jobs are suspended indefinitely until cores become free. Once again cores can be oversubscribed, so memory needs to be protected.</p>
<p><b>Multifactor preemption:</b><br />
This is the most complex form of preemption. Here the job priority is based on a weighted factoring algorithm:</p>
<pre>Job_priority =
    (PriorityWeightAge)       * (age_factor) +
    (PriorityWeightFairshare) * (fair-share_factor) +
    (PriorityWeightJobSize)   * (job_size_factor) +
    (PriorityWeightPartition) * (partition_factor) +
    (PriorityWeightQOS)       * (QOS_factor)
</pre>
<p>You are encouraged to read the official documentation on how this can be configured. Below is a simple example based only on user QOS for priority.</p>
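<p>The weighted sum above can be sketched in Python to show why a large PriorityWeightQOS lets the QOS factor dominate the ranking. All weight and factor values below are hypothetical for illustration, not our cluster's actual configuration.</p>
<pre>
# Sketch of SLURM's multifactor priority calculation.
# Each *_factor is a float in [0.0, 1.0] computed by the scheduler;
# the PriorityWeight* values here are made-up example weights.

def job_priority(age, fairshare, job_size, partition, qos,
                 w_age=1000, w_fairshare=10000, w_jobsize=1000,
                 w_partition=1000, w_qos=100000):
    """Weighted sum of scheduling factors, truncated to an integer."""
    return int(w_age * age
               + w_fairshare * fairshare
               + w_jobsize * job_size
               + w_partition * partition
               + w_qos * qos)

# With QOS weighted far above everything else, andy's high-QOS job
# outranks bob's even though bob's jobs have been queued longer:
bob  = job_priority(age=1.0, fairshare=0.5, job_size=0.1, partition=0.5, qos=0.1)
andy = job_priority(age=0.0, fairshare=0.5, job_size=0.1, partition=0.5, qos=1.0)
assert andy &gt; bob
</pre>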
<p>User bob is consuming all available cores...</p>
<pre>JOBID PARTITION        NAME  USER  ST    TIME  NODELIST
 1893     ucthi   CoreTestB   bob   R    0:09  hpc402
 1892     ucthi   CoreTestB   bob   R    0:12  hpc402
 1889     ucthi   CoreTestB   bob   R    0:15  hpc401
 1890     ucthi   CoreTestB   bob   R    0:15  hpc401
 1891     ucthi   CoreTestB   bob   R    0:15  hpc402
 1888     ucthi   CoreTestB   bob   R    0:18  hpc401
</pre>
<p>User andy submits a job which won't start because no resources are free. As andy's job has a higher priority, enough of bob's jobs are cancelled to allow it to run...</p>
<pre>JOBID PARTITION        NAME  USER  ST    TIME  NODELIST
 1889     ucthi   CoreTestB   bob  CG    0:00  hpc401
 1890     ucthi   CoreTestB   bob  CG    0:00  hpc401
 1888     ucthi   CoreTestB   bob  CG    0:00  hpc401
 1894     ucthi   CoreTestA  andy  PD    0:00  (Resources)
 1893     ucthi   CoreTestB   bob   R    0:13  hpc402
 1892     ucthi   CoreTestB   bob   R    0:16  hpc402
 1891     ucthi   CoreTestB   bob   R    0:19  hpc402
</pre>
<p>Once andy's job is running bob's jobs are requeued...</p>
<pre>JOBID PARTITION        NAME  USER  ST    TIME  NODELIST
 1888     ucthi   CoreTestB   bob  PD    0:00  (BeginTime)
 1889     ucthi   CoreTestB   bob  PD    0:00  (BeginTime)
 1890     ucthi   CoreTestB   bob  PD    0:00  (BeginTime)
 1894     ucthi   CoreTestA  andy   R    0:03  hpc401
 1893     ucthi   CoreTestB   bob   R    0:17  hpc402
 1892     ucthi   CoreTestB   bob   R    0:20  hpc402
 1891     ucthi   CoreTestB   bob   R    0:23  hpc402
</pre>
<p>After a short time the backfill scheduler allows one of bob's restarted jobs to run, because andy's job needs two nodes while bob's jobs each need only one.</p>
<pre>JOBID PARTITION        NAME  USER  ST    TIME  NODELIST
 1890     ucthi   CoreTestB   bob  PD    0:00  (Priority)
 1889     ucthi   CoreTestB   bob  PD    0:00  (Resources)
 1894     ucthi   CoreTestA  andy   R    2:41  hpc401
 1888     ucthi   CoreTestB   bob   R    2:16  hpc401
 1893     ucthi   CoreTestB   bob   R    2:55  hpc402
 1892     ucthi   CoreTestB   bob   R    2:58  hpc402
 1891     ucthi   CoreTestB   bob   R    3:01  hpc402
</pre>
<p>Unfortunately this preemption mode is not compatible with thread suspension.</p>