{"id":817,"date":"2013-08-29T13:07:37","date_gmt":"2013-08-29T11:07:37","guid":{"rendered":"http:\/\/oldblogs.uct.ac.za\/blog\/big-bytes\/2013\/08\/29\/asymmetrical-core-reqeuest"},"modified":"2015-08-14T11:43:15","modified_gmt":"2015-08-14T09:43:15","slug":"asymmetrical-core-request","status":"publish","type":"post","link":"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/08\/29\/asymmetrical-core-request\/","title":{"rendered":"Asymmetrical core request"},"content":{"rendered":"<div>One of the frustrations users experience on a cluster made up of large numbers of homogeneous servers is that they often share this space with a diverse population of researchers and software. \u00a0While it's possible to divide resources into groups to segregate usage patterns this will always lead to some wasting of resources. \u00a0Conversely allowing users to submit jobs to any server can lead to the following situation with parallel jobs: a user needs 90 cores and hence needs to split their job over 3 servers. \u00a0Logically this would mean 30 cores per server using the following directive:<\/div>\r\n<div><span style=\"font-family: arial, helvetica, sans-serif;\"><strong>-l nodes=3:ppn=30<\/strong><\/span><\/div>\r\n<div>This works fine as long as there are 30 cores free per server. \u00a0However in a situation where servers 1 to 2 have 40 cores free each but server 3 has only 15 cores free, in this case the job will hang until server 3 has 30 cores free, but by then servers 1 to 2 may be running other jobs and hence the user may wait a long time for their requirements to be met and the job to run. \u00a0One way around this would be to tailor a directive for this specific situation:<\/div>\r\n<div><span style=\"font-family: arial, helvetica, sans-serif;\"><strong>-l nodes=server1:ppn=40+server2:ppn=40+server3:ppn=10<\/strong><\/span><\/div>\r\n<div>This will work but is tedious to set up, inelegant and impractical if the user wants to run a large number of jobs. 
Torque supports a directive style in which one requests only a node count:

-l nodes=90

which actually provides 90 cores, not 90 nodes. This works fine as long as the number of "nodes" (cores) requested does not exceed the actual number of nodes (servers) in the cluster. In the case above, as soon as the user requested -l nodes=5 the request was denied, because Torque considered it to exceed the available resources. There is a workaround, however: set the following parameters in qmgr:

set server resources_available.nodect=N
set queue UCTlong resources_available.nodect=N

where N is the total number of cores in the cluster. Now the user can request -l nodes=90 and Torque provides 90 cores according to the queuing strategy. Be aware that the distribution of the job over servers is automatic, so total memory usage patterns and overall network latency are potentially unpredictable.

Many thanks to Graham Inggs from UCT's Chemical Engineering department for this workaround.
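Put together, the workaround looks roughly as follows. The qmgr lines are the ones from the post; the queue name UCTlong is the site's own queue, while the value of N (120, for three 40-core servers) and the job-script body are illustrative assumptions only.

```shell
# One-off configuration on the Torque server, run as an administrator.
# N must be the total core count of the cluster; 120 is an assumed
# value for a three-server, 40-cores-each cluster.
qmgr -c "set server resources_available.nodect=120"
qmgr -c "set queue UCTlong resources_available.nodect=120"

# After that, a user-side job script (job.pbs) needs no placement logic:
#   #PBS -q UCTlong
#   #PBS -l nodes=90      # 90 cores, distributed over servers by Torque
#   mpirun -np 90 ./my_solver
```

Note the caveat from the post still applies: because placement is automatic, per-server memory pressure and inter-server network latency for the 90 ranks are not under the user's control.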