Last month we decided to adjust our HPC queues so that jobs would land on the correct server series by default. Working under the usual amount of stress and pressure, the author duly made the following change to each queue in the queue manager:
set queue [QueueName] resources_default.neednodes = [NodeTag]
About 24 hours later, someone remarked that jobs were queuing even though resources were available. Initially we suspected that, because the PBS nodes=X directive was in use, jobs designated for the series600 servers were being offered GPU series servers.
Googling for the usual "PBS queues jobs and they don't run" didn't produce much in the way of solutions. After 48 hours it was clear that there was a deeper problem: several large jobs had just finished, freeing up many cores, yet jobs were still queuing, even single-core jobs. Eventually Heine spotted the error: instead of entering the NodeTag in the .neednodes directive, I had entered the queue name again, so the default request could never be satisfied. Correcting the line and updating the queue manager fixed the problem immediately.
Fortunately, this did not cause much interruption to research, but it did highlight the need to actually use the existing change control system.
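A mistake of this kind can also be caught mechanically before it reaches users. The following is a minimal sketch in Python, not something that was run at the time. It assumes the queue configuration is available as text in the `set queue ... resources_default.neednodes = ...` form shown above (for example, saved from `qmgr -c "print server"`), that each value is a single node tag, and that a list of valid node tags is maintained by hand. The function name `find_suspect_neednodes` and the queue and tag names in the usage note are hypothetical.

```python
import re

# Matches configuration lines of the form used in this post, e.g.
#   set queue <QueueName> resources_default.neednodes = <NodeTag>
_NEEDNODES = re.compile(
    r"^\s*set\s+queue\s+(\S+)\s+resources_default\.neednodes\s*=\s*(\S+)\s*$"
)


def find_suspect_neednodes(qmgr_text, known_tags):
    """Return (queue, value, reason) for each neednodes default that is not a known tag.

    qmgr_text  -- queue manager configuration as plain text (assumed format above)
    known_tags -- collection of node tags that really exist on the cluster
                  (assumed to be supplied by the administrator)
    """
    problems = []
    for line in qmgr_text.splitlines():
        match = _NEEDNODES.match(line)
        if not match:
            continue  # not a neednodes default; ignore
        queue, value = match.groups()
        if value in known_tags:
            continue  # default points at a real node tag
        if value == queue:
            reason = "value repeats the queue name"
        else:
            reason = "value is not a known node tag"
        problems.append((queue, value, reason))
    return problems
```

Called with a configuration in which a hypothetical queue `long` has its own name as the default, and with `{"series600"}` as the set of known tags, the function reports that queue with the reason "value repeats the queue name". An empty result means every default found points at a known tag; it does not prove that the tag is the intended one for that queue.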