{"id":1110,"date":"2011-05-31T00:19:22","date_gmt":"2011-05-30T22:19:22","guid":{"rendered":"http:\/\/oldblogs.uct.ac.za\/blog\/big-bytes\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster"},"modified":"2015-08-14T14:04:04","modified_gmt":"2015-08-14T12:04:04","slug":"odd-queuing-behaviour-in-grid-cluster","status":"publish","type":"post","link":"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/","title":{"rendered":"Odd queuing behaviour in Grid cluster"},"content":{"rendered":"This evening we observed some weirdness on our Grid cluster.\u00a0 Users could submit jobs but they were immediately queued.\u00a0 Initially it was suspected that only one worker node was affected, however we soon realized that all three worker nodes were exhibiting the same issue.\u00a0 Oddly, some jobs (short-term test jobs submitted via EUMed) were running.\r\n\r\nRestarting the pbs_server daemon on the head node had no effect other than causing all worker nodes to register a down status.\u00a0 Checking the worker nodes revealed that all pbs_mom daemons were in a running but dead state.\u00a0 Restarting all pbs_mom daemons allowed some jobs to be submitted, but only on two of the worker nodes.\u00a0 We then noticed an old SAGrid job that had been sitting in a queued state for several days.\u00a0 Killing this job put the queues back into a happy state.\r\n\r\nWe are not sure exactly what the issue was; possibly a malformed queue submission or JDL caused a hang in the scheduler.\u00a0 We are currently considering increasing the level of monitoring to test the state of the PBS daemons.","protected":false},"excerpt":{"rendered":"<p>    This evening we observed some weirdness on our Grid cluster.&nbsp; Users could submit jobs but they were immediately queued.&nbsp; Initially it was suspected that only one worker node was affected, however we soon realized that all three worker 
no&#8230;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[17,14],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Odd queuing behaviour in Grid cluster - UCT HPC<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Odd queuing behaviour in Grid cluster - UCT HPC\" \/>\n<meta property=\"og:description\" content=\"This evening we observed some weirdness on our Grid cluster.&nbsp; Users could submit jobs but they were immediately queued.&nbsp; Initially it was suspected that only one worker node was affected, however we soon realized that all three worker no...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/\" \/>\n<meta property=\"og:site_name\" content=\"UCT HPC\" \/>\n<meta property=\"article:published_time\" content=\"2011-05-30T22:19:22+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2015-08-14T12:04:04+00:00\" \/>\n<meta name=\"author\" content=\"Andrew Lewis\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Andrew Lewis\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/\"},\"author\":{\"name\":\"Andrew Lewis\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/c183ad1c0a1063124a72d63963ae9c7e\"},\"headline\":\"Odd queuing behaviour in Grid cluster\",\"datePublished\":\"2011-05-30T22:19:22+00:00\",\"dateModified\":\"2015-08-14T12:04:04+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/\"},\"wordCount\":194,\"publisher\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#organization\"},\"articleSection\":[\"gLite\",\"torque\"],\"inLanguage\":\"en-ZA\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/\",\"url\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/\",\"name\":\"Odd queuing behaviour in Grid cluster - UCT 
HPC\",\"isPartOf\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#website\"},\"datePublished\":\"2011-05-30T22:19:22+00:00\",\"dateModified\":\"2015-08-14T12:04:04+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/#breadcrumb\"},\"inLanguage\":\"en-ZA\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/ucthpc.uct.ac.za\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Odd queuing behaviour in Grid cluster\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#website\",\"url\":\"https:\/\/ucthpc.uct.ac.za\/\",\"name\":\"UCT HPC\",\"description\":\"University of Cape Town High Performance Computing\",\"publisher\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/ucthpc.uct.ac.za\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-ZA\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#organization\",\"name\":\"University of Cape Town High Performance Computing\",\"url\":\"https:\/\/ucthpc.uct.ac.za\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-ZA\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/09\/logocircless.png\",\"contentUrl\":\"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/09\/logocircless.png\",\"width\":450,\"height\":423,\"caption\":\"University of Cape Town High Performance 
Computing\"},\"image\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/c183ad1c0a1063124a72d63963ae9c7e\",\"name\":\"Andrew Lewis\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-ZA\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/9652c9c73beeab594b8dc2383a880048?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/9652c9c73beeab594b8dc2383a880048?s=96&d=mm&r=g\",\"caption\":\"Andrew Lewis\"},\"sameAs\":[\"http:\/\/blogs.uct.ac.za\/blog\/big-bytes\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Odd queuing behaviour in Grid cluster - UCT HPC","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/","og_locale":"en_US","og_type":"article","og_title":"Odd queuing behaviour in Grid cluster - UCT HPC","og_description":"This evening we observed some weirdness on our Grid cluster.&nbsp; Users could submit jobs but they were immediately queued.&nbsp; Initially it was suspected that only one worker node was affected, however we soon realized that all three worker no...","og_url":"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/","og_site_name":"UCT HPC","article_published_time":"2011-05-30T22:19:22+00:00","article_modified_time":"2015-08-14T12:04:04+00:00","author":"Andrew Lewis","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Andrew Lewis","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/#article","isPartOf":{"@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/"},"author":{"name":"Andrew Lewis","@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/c183ad1c0a1063124a72d63963ae9c7e"},"headline":"Odd queuing behaviour in Grid cluster","datePublished":"2011-05-30T22:19:22+00:00","dateModified":"2015-08-14T12:04:04+00:00","mainEntityOfPage":{"@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/"},"wordCount":194,"publisher":{"@id":"https:\/\/ucthpc.uct.ac.za\/#organization"},"articleSection":["gLite","torque"],"inLanguage":"en-ZA"},{"@type":"WebPage","@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/","url":"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/","name":"Odd queuing behaviour in Grid cluster - UCT HPC","isPartOf":{"@id":"https:\/\/ucthpc.uct.ac.za\/#website"},"datePublished":"2011-05-30T22:19:22+00:00","dateModified":"2015-08-14T12:04:04+00:00","breadcrumb":{"@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/#breadcrumb"},"inLanguage":"en-ZA","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2011\/05\/31\/odd-queuing-behaviour-in-grid-cluster\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ucthpc.uct.ac.za\/"},{"@type":"ListItem","position":2,"name":"Odd queuing behaviour in Grid 
cluster"}]},{"@type":"WebSite","@id":"https:\/\/ucthpc.uct.ac.za\/#website","url":"https:\/\/ucthpc.uct.ac.za\/","name":"UCT HPC","description":"University of Cape Town High Performance Computing","publisher":{"@id":"https:\/\/ucthpc.uct.ac.za\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ucthpc.uct.ac.za\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-ZA"},{"@type":"Organization","@id":"https:\/\/ucthpc.uct.ac.za\/#organization","name":"University of Cape Town High Performance Computing","url":"https:\/\/ucthpc.uct.ac.za\/","logo":{"@type":"ImageObject","inLanguage":"en-ZA","@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/logo\/image\/","url":"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/09\/logocircless.png","contentUrl":"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/09\/logocircless.png","width":450,"height":423,"caption":"University of Cape Town High Performance Computing"},"image":{"@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/c183ad1c0a1063124a72d63963ae9c7e","name":"Andrew Lewis","image":{"@type":"ImageObject","inLanguage":"en-ZA","@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/9652c9c73beeab594b8dc2383a880048?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/9652c9c73beeab594b8dc2383a880048?s=96&d=mm&r=g","caption":"Andrew 
Lewis"},"sameAs":["http:\/\/blogs.uct.ac.za\/blog\/big-bytes"]}]}},"_links":{"self":[{"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/posts\/1110"}],"collection":[{"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/comments?post=1110"}],"version-history":[{"count":2,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/posts\/1110\/revisions"}],"predecessor-version":[{"id":2237,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/posts\/1110\/revisions\/2237"}],"wp:attachment":[{"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/media?parent=1110"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/categories?post=1110"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/tags?post=1110"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}