{"id":841,"date":"2013-04-12T14:00:57","date_gmt":"2013-04-12T12:00:57","guid":{"rendered":"http:\/\/oldblogs.uct.ac.za\/blog\/big-bytes\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure"},"modified":"2015-08-14T11:44:51","modified_gmt":"2015-08-14T09:44:51","slug":"high-availability-core-services-for-the-south-african-national-grid-infrastructure","status":"publish","type":"post","link":"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/","title":{"rendered":"High Availability Core Services for the South African National Grid Infrastructure"},"content":{"rendered":"<p style=\"margin-bottom: 0cm;\"><img src=\"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/07\/4441-images.jpg\" alt=\"\" border=\"0\" \/><\/p>\r\n<p style=\"margin-bottom: 0cm;\">The University of Cape Town currently\r\nmaintains a set of core services for SAGrid running on the EMI and\r\ngLite middleware distributions. This set of services forms the\r\nbackbone of SAGrid and enables the submission of computational\r\njobs. When the initial set of core services was deployed, high\r\navailability was not regarded as a top priority. During the\r\nAfrica-Arabia ROC meeting, the topic of providing highly available\r\ncore services became a priority. We sat around a table and came up with\r\na few fresh ideas. One of them was to use anycast routing to\r\nprovide access.<\/p>\r\nTo start off, the SAGrid site\r\nadministrators will host the TopBDII in three places. These are the University of\r\nCape Town, the University of the Free State and the Meraka Institute (CSIR).\r\nOur upstream provider, TENET, will provide SAGrid with an anycast\r\naddress and configure BGP sessions so that failover, redundancy and\r\nrouting of each request to the nearest available server are\r\nachieved. 
If a site becomes unavailable, the routing stack will take\r\ncare of routing clients and servers to the next nearest server.\r\nAnother issue that could occur is that the service on a server fails\r\nwhile the server remains reachable from a networking perspective. This is\r\nresolved by sending the server a poison pill. We will use a tool\r\ncalled Monit, which will be set up on each of the core\r\nservice servers to monitor the local services. Should a service fail\r\nduring processing, Monit will run through\r\na set of service checks and try to restart the service. Should the\r\nrestart be unsuccessful, a poison pill will be\r\nissued to the server.\r\n\r\nEach site will follow a similar\r\nconfiguration. This is a new approach that none of the other\r\ninternational grids have adopted. According to an EGI representative\r\nfor GRNET, DNS round-robin or site-specific load balancers are used\r\nto maintain high availability, but these lack national routing\r\nintelligence.","protected":false},"excerpt":{"rendered":"<p><img decoding=\"async\" src=\"http:\/\/blogs.uct.ac.za\/gallery\/1253\/4441-images.jpg\" border=\"0\"><\/p>\n<p>The University of Cape Town currently<br \/>\nmaintains a set of core services for SAGrid running on the EMI and<br \/>\ngLite middleware distributions. This set of services forms the<br \/>\nbackbone of SAGrid and enables the submission of computational<br \/>\njobs. When the initial set of core services was deployed, high<br \/>\navailability was not regarded as a top priority. During the<br \/>\nAfrica-Arabia ROC meeting, the topic of providing highly available<br \/>\ncore services became a priority. We sat around a table and came up with<br \/>\na few fresh ideas. One of them was to use anycast routing to<br \/>\nprovide access. <\/p>\n<p>To start off, the SAGrid site<br \/>\nadministrators will host the TopBDII in three places. 
These are the University of<br \/>\nCape Town, the University of the Free State and the Meraka Institute (CSIR).<br \/>\nOur upstream provider, TENET, will provide SAGrid with an anycast<br \/>\naddress and configure BGP sessions so that failover, redundancy and<br \/>\nrouting of each request to the nearest available server are<br \/>\nachieved. If a site becomes unavailable, the routing stack will take<br \/>\ncare of routing clients and servers to the next nearest server.<br \/>\nAnother issue that could occur is that the service on a server fails<br \/>\nwhile the server remains reachable from a networking perspective. This is<br \/>\nresolved by sending the server a poison pill. We will use a tool<br \/>\ncalled Monit, which will be set up on each of the core<br \/>\nservice servers to monitor the local services. Should a service fail<br \/>\nduring processing, Monit will run through<br \/>\na set of service checks and try to restart the service. Should the<br \/>\nrestart be unsuccessful, a poison pill will be<br \/>\nissued to the server.<\/p>\n<p>Each site will follow a similar<br \/>\nconfiguration. This is a new approach that none of the other<br \/>\ninternational grids have adopted. According to an EGI representative<br \/>\nfor GRNET, DNS round-robin or site-specific load balancers are used<br \/>\nto maintain high availability, but these lack national routing<br \/>\nintelligence. 
<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[15],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>High Availability Core Services for the South African National Grid Infrastructure - UCT HPC<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"High Availability Core Services for the South African National Grid Infrastructure - UCT HPC\" \/>\n<meta property=\"og:description\" content=\"The University of Cape Town currently maintains a set of core services for SAGrid running on the EMI and gLite middleware distributions. This set of services forms the backbone of SAGrid and enables the submission of computational jobs. When the initial set of core services was deployed, high availability was not regarded as a top priority. During the Africa-Arabia ROC meeting, the topic of providing highly available core services became a priority. We sat around a table and came up with a few fresh ideas. One of them was to use anycast routing to provide access. To start off, the SAGrid site administrators will host the TopBDII in three places. These are the University of Cape Town, the University of the Free State and the Meraka Institute (CSIR). Our upstream provider, TENET, will provide SAGrid with an anycast address and configure BGP sessions so that failover, redundancy and routing of each request to the nearest available server are achieved. 
If a site becomes unavailable, the routing stack will take care of routing clients and servers to the next nearest server. Another issue that could occur is that the service on a server fails while the server remains reachable from a networking perspective. This is resolved by sending the server a poison pill. We will use a tool called Monit, which will be set up on each of the core service servers to monitor the local services. Should a service fail during processing, Monit will run through a set of service checks and try to restart the service. Should the restart be unsuccessful, a poison pill will be issued to the server. Each site will follow a similar configuration. This is a new approach that none of the other international grids have adopted. According to an EGI representative for GRNET, DNS round-robin or site-specific load balancers are used to maintain high availability, but these lack national routing intelligence.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/\" \/>\n<meta property=\"og:site_name\" content=\"UCT HPC\" \/>\n<meta property=\"article:published_time\" content=\"2013-04-12T12:00:57+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2015-08-14T09:44:51+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/07\/4441-images.jpg\" \/>\n<meta name=\"author\" content=\"Timothy Carr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Timothy Carr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/\"},\"author\":{\"name\":\"Timothy Carr\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/41f6cd039836d7741f2b82a7b7cfe8d0\"},\"headline\":\"High Availability Core Services for the South African National Grid Infrastructure\",\"datePublished\":\"2013-04-12T12:00:57+00:00\",\"dateModified\":\"2015-08-14T09:44:51+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/\"},\"wordCount\":330,\"publisher\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#organization\"},\"articleSection\":[\"sagrid\"],\"inLanguage\":\"en-ZA\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/\",\"url\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/\",\"name\":\"High Availability Core Services for the South African National Grid Infrastructure - UCT 
HPC\",\"isPartOf\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#website\"},\"datePublished\":\"2013-04-12T12:00:57+00:00\",\"dateModified\":\"2015-08-14T09:44:51+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/#breadcrumb\"},\"inLanguage\":\"en-ZA\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/ucthpc.uct.ac.za\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"High Availability Core Services for the South African National Grid Infrastructure\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#website\",\"url\":\"https:\/\/ucthpc.uct.ac.za\/\",\"name\":\"UCT HPC\",\"description\":\"University of Cape Town High Performance Computing\",\"publisher\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/ucthpc.uct.ac.za\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-ZA\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#organization\",\"name\":\"University of Cape Town High Performance 
Computing\",\"url\":\"https:\/\/ucthpc.uct.ac.za\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-ZA\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/09\/logocircless.png\",\"contentUrl\":\"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/09\/logocircless.png\",\"width\":450,\"height\":423,\"caption\":\"University of Cape Town High Performance Computing\"},\"image\":{\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/41f6cd039836d7741f2b82a7b7cfe8d0\",\"name\":\"Timothy Carr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-ZA\",\"@id\":\"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/7e94dcf3a408e6ada008042fc29d4b15?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/7e94dcf3a408e6ada008042fc29d4b15?s=96&d=mm&r=g\",\"caption\":\"Timothy Carr\"},\"sameAs\":[\"http:\/\/ucthpc.uct.ac.za\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"High Availability Core Services for the South African National Grid Infrastructure - UCT HPC","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/","og_locale":"en_US","og_type":"article","og_title":"High Availability Core Services for the South African National Grid Infrastructure - UCT HPC","og_description":"The University of Cape Town currently maintains a set of core services for SAGrid running on the EMI and gLite middleware distributions. This set of services forms the backbone of SAGrid and enables the submission of computational jobs. 
When the initial set of core services was deployed, high availability was not regarded as a top priority. During the Africa-Arabia ROC meeting, the topic of providing highly available core services became a priority. We sat around a table and came up with a few fresh ideas. One of them was to use anycast routing to provide access. To start off, the SAGrid site administrators will host the TopBDII in three places. These are the University of Cape Town, the University of the Free State and the Meraka Institute (CSIR). Our upstream provider, TENET, will provide SAGrid with an anycast address and configure BGP sessions so that failover, redundancy and routing of each request to the nearest available server are achieved. If a site becomes unavailable, the routing stack will take care of routing clients and servers to the next nearest server. Another issue that could occur is that the service on a server fails while the server remains reachable from a networking perspective. This is resolved by sending the server a poison pill. We will use a tool called Monit, which will be set up on each of the core service servers to monitor the local services. Should a service fail during processing, Monit will run through a set of service checks and try to restart the service. Should the restart be unsuccessful, a poison pill will be issued to the server. Each site will follow a similar configuration. This is a new approach that none of the other international grids have adopted. 
According to an EGI representative for GRNET, DNS round-robin or site-specific load balancers are used to maintain high availability, but these lack national routing intelligence.","og_url":"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/","og_site_name":"UCT HPC","article_published_time":"2013-04-12T12:00:57+00:00","article_modified_time":"2015-08-14T09:44:51+00:00","og_image":[{"url":"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/07\/4441-images.jpg"}],"author":"Timothy Carr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Timothy Carr","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/#article","isPartOf":{"@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/"},"author":{"name":"Timothy Carr","@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/41f6cd039836d7741f2b82a7b7cfe8d0"},"headline":"High Availability Core Services for the South African National Grid 
Infrastructure","datePublished":"2013-04-12T12:00:57+00:00","dateModified":"2015-08-14T09:44:51+00:00","mainEntityOfPage":{"@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/"},"wordCount":330,"publisher":{"@id":"https:\/\/ucthpc.uct.ac.za\/#organization"},"articleSection":["sagrid"],"inLanguage":"en-ZA"},{"@type":"WebPage","@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/","url":"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/","name":"High Availability Core Services for the South African National Grid Infrastructure - UCT HPC","isPartOf":{"@id":"https:\/\/ucthpc.uct.ac.za\/#website"},"datePublished":"2013-04-12T12:00:57+00:00","dateModified":"2015-08-14T09:44:51+00:00","breadcrumb":{"@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/#breadcrumb"},"inLanguage":"en-ZA","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/ucthpc.uct.ac.za\/index.php\/2013\/04\/12\/high-availability-core-services-for-the-south-african-national-grid-infrastructure\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ucthpc.uct.ac.za\/"},{"@type":"ListItem","position":2,"name":"High Availability Core Services for the South African National Grid Infrastructure"}]},{"@type":"WebSite","@id":"https:\/\/ucthpc.uct.ac.za\/#website","url":"https:\/\/ucthpc.uct.ac.za\/","name":"UCT HPC","description":"University of Cape Town High Performance 
Computing","publisher":{"@id":"https:\/\/ucthpc.uct.ac.za\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ucthpc.uct.ac.za\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-ZA"},{"@type":"Organization","@id":"https:\/\/ucthpc.uct.ac.za\/#organization","name":"University of Cape Town High Performance Computing","url":"https:\/\/ucthpc.uct.ac.za\/","logo":{"@type":"ImageObject","inLanguage":"en-ZA","@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/logo\/image\/","url":"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/09\/logocircless.png","contentUrl":"https:\/\/ucthpc.uct.ac.za\/wp-content\/uploads\/2015\/09\/logocircless.png","width":450,"height":423,"caption":"University of Cape Town High Performance Computing"},"image":{"@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/41f6cd039836d7741f2b82a7b7cfe8d0","name":"Timothy Carr","image":{"@type":"ImageObject","inLanguage":"en-ZA","@id":"https:\/\/ucthpc.uct.ac.za\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/7e94dcf3a408e6ada008042fc29d4b15?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/7e94dcf3a408e6ada008042fc29d4b15?s=96&d=mm&r=g","caption":"Timothy 
Carr"},"sameAs":["http:\/\/ucthpc.uct.ac.za"]}]}},"_links":{"self":[{"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/posts\/841"}],"collection":[{"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/comments?post=841"}],"version-history":[{"count":3,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/posts\/841\/revisions"}],"predecessor-version":[{"id":2090,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/posts\/841\/revisions\/2090"}],"wp:attachment":[{"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/media?parent=841"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/categories?post=841"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ucthpc.uct.ac.za\/index.php\/wp-json\/wp\/v2\/tags?post=841"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}