{"id":2630,"date":"2016-01-13T10:27:51","date_gmt":"2016-01-13T08:27:51","guid":{"rendered":"http:\/\/srvcnthpc001.uct.ac.za\/?p=2630"},"modified":"2016-01-13T15:30:43","modified_gmt":"2016-01-13T13:30:43","slug":"jan-2016-uct-hpc-maintenance","status":"publish","type":"post","link":"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/","title":{"rendered":"Jan 2016 UCT HPC maintenance"},"content":{"rendered":"<p>What follows is a critical report on the cluster upgrade. Our plan was to upgrade the operating system on all cluster servers and to bring the FHGFS file system up to the latest supported release, BeeGFS. Additionally, we planned to upgrade to the latest Mellanox and CUDA drivers for our InfiniBand and GPU cards respectively. The cluster was taken offline on Monday morning at 09:00 and was back up by Tuesday afternoon.<\/p>\n<p><strong>What went right<\/strong><br \/>\n&#8211; All servers were upgraded from SLES 11 SP3 to SP4, with the exception of node 600.<br \/>\n&#8211; All InfiniBand and GPU drivers were upgraded.<\/p>\n<p><strong>What went wrong<\/strong><br \/>\n&#8211; During the upgrade, worker node 600 experienced an error that damaged its operating system. The node is being reinstalled.<br \/>\n&#8211; Several minor issues arose while upgrading the InfiniBand drivers, but these were resolved.<br \/>\n&#8211; The FHGFS upgrade did not go as planned. The new release (BeeGFS) no longer includes several critical features that we require, and the SLES client daemon failed to compile. We have reverted to the older version (FHGFS), but at a slightly higher patch level.<\/p>\n<p><strong>Additional information<\/strong><br \/>\nWe have added the latest version of OpenMPI to the cluster. The natively installed version of OpenMPI is no longer available in SP4. This is actually a good thing, as 1.6.5 is ancient; however, it means that users&#8217; MPI jobs will fail unless their scripts or profiles are modified.
To get around this issue, users need to specify the version of MPI they need by adding one of the following lines to their .bashrc file:<br \/>\n<em>module add mpi\/openmpi-1.8.8<br \/>\nmodule add mpi\/openmpi-1.10.1<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What follows is a critical report on the cluster upgrade. Our plan was to upgrade the operating system on all cluster servers and to bring the FHGFS file system up to the latest supported release, BeeGFS. Additionally, we planned to upgrade to the latest Mellanox and CUDA drivers for our InfiniBand and GPU cards respectively&#8230;.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[9,10,11],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Jan 2016 UCT HPC maintenance - UCT HPC<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Jan 2016 UCT HPC maintenance - UCT HPC\" \/>\n<meta property=\"og:description\" content=\"What follows is a critical report on the cluster upgrade. Our plan was to upgrade the cluster operating systems on all servers and to bring the FHGFS file system up to the latest supported release, BGFS.
Additionally we planned to upgrade to the latest Mellanox and Cuda drivers for our Infiniband and GPU cards respectively....\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/\" \/>\n<meta property=\"og:site_name\" content=\"UCT HPC\" \/>\n<meta property=\"article:published_time\" content=\"2016-01-13T08:27:51+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2016-01-13T13:30:43+00:00\" \/>\n<meta name=\"author\" content=\"Andrew Lewis\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Andrew Lewis\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/\"},\"author\":{\"name\":\"Andrew Lewis\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/c183ad1c0a1063124a72d63963ae9c7e\"},\"headline\":\"Jan 2016 UCT HPC maintenance\",\"datePublished\":\"2016-01-13T08:27:51+00:00\",\"dateModified\":\"2016-01-13T13:30:43+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/\"},\"wordCount\":270,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#organization\"},\"articleSection\":[\"application\",\"MPI\",\"operating 
system\"],\"inLanguage\":\"en-ZA\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/\",\"url\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/\",\"name\":\"Jan 2016 UCT HPC maintenance - UCT HPC\",\"isPartOf\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#website\"},\"datePublished\":\"2016-01-13T08:27:51+00:00\",\"dateModified\":\"2016-01-13T13:30:43+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/#breadcrumb\"},\"inLanguage\":\"en-ZA\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/ucthpc.uct.ac.za\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Jan 2016 UCT HPC maintenance\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#website\",\"url\":\"https:\/\/ucthpc.uct.ac.za\/\",\"name\":\"UCT HPC\",\"description\":\"University of Cape Town High Performance Computing\",\"publisher\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/ucthpc.uct.ac.za\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-ZA\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#organization\",\"name\":\"University of Cape Town High Performance 
Computing\",\"url\":\"https:\/\/ucthpc.uct.ac.za\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-ZA\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/09\/logocircless.png\",\"contentUrl\":\"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/09\/logocircless.png\",\"width\":450,\"height\":423,\"caption\":\"University of Cape Town High Performance Computing\"},\"image\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/c183ad1c0a1063124a72d63963ae9c7e\",\"name\":\"Andrew Lewis\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-ZA\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/9652c9c73beeab594b8dc2383a880048?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/9652c9c73beeab594b8dc2383a880048?s=96&d=mm&r=g\",\"caption\":\"Andrew Lewis\"},\"sameAs\":[\"http:\/\/blogs.uct.ac.za\/blog\/big-bytes\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Jan 2016 UCT HPC maintenance - UCT HPC","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/","og_locale":"en_US","og_type":"article","og_title":"Jan 2016 UCT HPC maintenance - UCT HPC","og_description":"What follows is a critical report on the cluster upgrade. Our plan was to upgrade the cluster operating systems on all servers and to bring the FHGFS file system up to the latest supported release, BGFS. 
Additionally we planned to upgrade to the latest Mellanox and Cuda drivers for our Infiniband and GPU cards respectively....","og_url":"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/","og_site_name":"UCT HPC","article_published_time":"2016-01-13T08:27:51+00:00","article_modified_time":"2016-01-13T13:30:43+00:00","author":"Andrew Lewis","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Andrew Lewis","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/#article","isPartOf":{"@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/"},"author":{"name":"Andrew Lewis","@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/c183ad1c0a1063124a72d63963ae9c7e"},"headline":"Jan 2016 UCT HPC maintenance","datePublished":"2016-01-13T08:27:51+00:00","dateModified":"2016-01-13T13:30:43+00:00","mainEntityOfPage":{"@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/"},"wordCount":270,"commentCount":0,"publisher":{"@id":"https:\/\/ucthpc.uct.ac.za\/#organization"},"articleSection":["application","MPI","operating system"],"inLanguage":"en-ZA","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/","url":"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/","name":"Jan 2016 UCT HPC maintenance - UCT 
HPC","isPartOf":{"@id":"https:\/\/ucthpc.uct.ac.za\/#website"},"datePublished":"2016-01-13T08:27:51+00:00","dateModified":"2016-01-13T13:30:43+00:00","breadcrumb":{"@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/#breadcrumb"},"inLanguage":"en-ZA","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2016\/01\/13\/jan-2016-uct-hpc-maintenance\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ucthpc.uct.ac.za\/"},{"@type":"ListItem","position":2,"name":"Jan 2016 UCT HPC maintenance"}]},{"@type":"WebSite","@id":"https:\/\/ucthpc.uct.ac.za\/#website","url":"https:\/\/ucthpc.uct.ac.za\/","name":"UCT HPC","description":"University of Cape Town High Performance Computing","publisher":{"@id":"https:\/\/ucthpc.uct.ac.za\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ucthpc.uct.ac.za\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-ZA"},{"@type":"Organization","@id":"https:\/\/ucthpc.uct.ac.za\/#organization","name":"University of Cape Town High Performance Computing","url":"https:\/\/ucthpc.uct.ac.za\/","logo":{"@type":"ImageObject","inLanguage":"en-ZA","@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/logo\/image\/","url":"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/09\/logocircless.png","contentUrl":"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/09\/logocircless.png","width":450,"height":423,"caption":"University of Cape Town High Performance Computing"},"image":{"@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/c183ad1c0a1063124a72d63963ae9c7e","name":"Andrew 
Lewis","image":{"@type":"ImageObject","inLanguage":"en-ZA","@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/9652c9c73beeab594b8dc2383a880048?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/9652c9c73beeab594b8dc2383a880048?s=96&d=mm&r=g","caption":"Andrew Lewis"},"sameAs":["http:\/\/blogs.uct.ac.za\/blog\/big-bytes"]}]}},"_links":{"self":[{"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/posts\/2630"}],"collection":[{"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/comments?post=2630"}],"version-history":[{"count":3,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/posts\/2630\/revisions"}],"predecessor-version":[{"id":2633,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/posts\/2630\/revisions\/2633"}],"wp:attachment":[{"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/media?parent=2630"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/categories?post=2630"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/tags?post=2630"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}