askthedev.com

Asked: September 25, 2024 (2024-09-25T01:27:48+05:30) In: Ubuntu

What steps should I follow to set up SLURM on Ubuntu 20.04?

anonymous user

I’ve been diving into the world of high-performance computing lately and decided to set up a cluster using Ubuntu 20.04. I’ve heard a lot about SLURM and how it can really help in managing job scheduling on clusters, but I have to admit, I’m a bit overwhelmed by the whole process.

I read somewhere that there are several steps involved in setting it up, but I can’t quite wrap my head around it all. I mean, there are so many components to think about—like installing the necessary packages, configuring the SLURM controller, creating a database, and even setting up the compute nodes. It’s a little daunting. Plus, I’m concerned about ensuring that everything communicates properly. I’ve seen all kinds of guides online, but they often assume you’re already an expert or skip over important details that might trip me up.

Has anyone actually gone through this process and can offer a straightforward way to tackle it? I’d love to hear about what steps you followed from start to finish. Maybe even a couple of tips or common pitfalls to avoid would help too.

Also, is it necessary to have a dedicated node for the SLURM controller, or can I run everything on a single machine to start? What about the networking setup? Any specific configurations I need to keep in mind?

In addition, if you can shed some light on how to test if it’s working properly after the installation, that would be awesome! I definitely want to know at the end if I’ve done this right.

Honestly, any help or insight from someone who’s been through it would be super appreciated. It feels like I’m walking into this blindfolded, and the more I read, the more confused I seem to get. I’m just trying to set the foundation for some cool projects I have in mind, and starting with SLURM seems like the way to go! Thanks in advance for your thoughts!

    2 Answers

    1. anonymous user
      Added an answer on September 25, 2024 at 1:27 am

      Setting Up SLURM on Ubuntu 20.04

      Setting up SLURM can definitely feel overwhelming at first, especially if you’re diving into high-performance computing for the first time. Here’s a simplified approach you can follow:

      Basic Steps to Install SLURM

      1. Install Necessary Packages:
        sudo apt update
        sudo apt install slurm-wlm slurmctld slurmd munge
      2. Set Up Munge: SLURM uses Munge to authenticate messages between its daemons.
        sudo create-munge-key
        sudo chown munge:munge /etc/munge/munge.key
        sudo chmod 400 /etc/munge/munge.key
        sudo systemctl start munge
        sudo systemctl enable munge
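      3. Configure SLURM Controller:

        Edit slurm.conf. The Ubuntu 20.04 packages typically read /etc/slurm-llnl/slurm.conf; newer releases use /etc/slurm/slurm.conf. You’ll want to define the controller host, the nodes, and a partition. Here’s a small example to get you started (replace myhost with the output of hostname -s). The ports shown are custom choices; the defaults are 6817 for slurmctld and 6818 for slurmd.

        ClusterName=mycluster
        SlurmctldHost=myhost
        SlurmdPort=7003
        SlurmctldPort=7002
        AuthType=auth/munge
        NodeName=myhost CPUs=1 State=UNKNOWN
        PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP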
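      4. Start SLURM Services: Start the controller first, then the compute daemon, and enable both so they come back after a reboot.
        sudo systemctl start slurmctld
        sudo systemctl start slurmd
        sudo systemctl enable slurmctld slurmd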
      5. Set Up Compute Nodes: If you’re starting on a single machine, it acts as both controller and compute node, so steps 1 to 4 are all you need. To expand later, install slurmd and munge on each additional node.
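The configuration from step 3 can also be generated rather than typed by hand. This is a minimal sketch for the single-machine case, assuming the controller and the only compute node are the same host. The helper name `make_single_node_conf` is hypothetical (it is not a SLURM tool), and the ports simply mirror the example above.

```shell
#!/usr/bin/env bash
# Sketch: print a minimal single-node slurm.conf to stdout.
# make_single_node_conf is a hypothetical helper, not part of SLURM.
make_single_node_conf() {
    local host="$1" cpus="$2"
    cat <<EOF
ClusterName=mycluster
SlurmctldHost=${host}
SlurmctldPort=7002
SlurmdPort=7003
AuthType=auth/munge
NodeName=${host} CPUs=${cpus} State=UNKNOWN
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
EOF
}

# Example: use the short hostname and the CPU count of this machine.
make_single_node_conf "$(uname -n | cut -d. -f1)" "$(nproc)"
```

Review the output before copying it to the slurm.conf location your packages use. Running `slurmd -C` on the node prints the hardware SLURM detects, which is a useful cross-check for the `NodeName` line.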

      Networking Setup

      For networking, ensure that all your nodes can communicate with each other. You might need to adjust firewall settings, especially if you have UFW enabled:

      sudo ufw allow 7002/tcp
      sudo ufw allow 7003/tcp
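To confirm that the firewall rules actually let traffic through, you can probe the ports from another node. A rough sketch, assuming bash with `/dev/tcp` support and `timeout` from coreutils; `port_open` is a hypothetical helper, and `localhost` stands in for whichever host you want to test.

```shell
#!/usr/bin/env bash
# Sketch: report whether a TCP port accepts connections.
# port_open is a hypothetical helper; host and ports are examples only.
port_open() {
    local host="$1" port="$2"
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
    # timeout bounds the attempt so a filtered port does not hang.
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

for port in 7002 7003; do
    if port_open localhost "$port"; then
        echo "port $port: reachable"
    else
        echo "port $port: not reachable"
    fi
done
```

A port shows as reachable only when the matching daemon is running and listening, so check the services before blaming the firewall.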

      Testing Your Setup

      Once everything is installed and running, you can check SLURM’s status with:

      scontrol show nodes

      If everything is set up correctly, you should see your node(s) listed. You can also run a simple job using:

      sbatch --wrap="sleep 10"

      Then check with:

      squeue
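The submit-and-check loop can be scripted. This is a sketch under a few assumptions: `wait_for_job` is a hypothetical helper, the polling interval of one second is arbitrary, and a job is treated as finished once `squeue` no longer lists it.

```shell
#!/usr/bin/env bash
# Sketch: wait until a job no longer appears in the queue.
# wait_for_job is a hypothetical helper, not a SLURM command.
wait_for_job() {
    local jobid="$1" tries="${2:-30}"
    while [ "$tries" -gt 0 ]; do
        # -h drops the header, so empty output means the job left the queue
        if [ -z "$(squeue -h -j "$jobid" 2>/dev/null)" ]; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Only attempt a submission when SLURM's client tools are installed.
if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints the job ID (optionally followed by ";cluster")
    jobid="$(sbatch --parsable --wrap="sleep 10")"
    jobid="${jobid%%;*}"
    if wait_for_job "$jobid" 60; then
        echo "job $jobid finished"
    else
        echo "job $jobid is still queued or running"
    fi
fi
```

If the job stays pending, `scontrol show job` followed by the job ID shows the reason SLURM gives for not starting it.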

      Common Pitfalls

      • Forgetting to start Munge. Double-check that it’s running with systemctl status munge.
      • Configuration files need to match across nodes, and so does /etc/munge/munge.key, so copy them over if you expand.
      • Ensure firewalls allow communication on the ports set in slurm.conf.
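One way to check that files match across nodes is to compare checksums. A minimal sketch, assuming `sha256sum` from coreutils is available; `file_digest` is a hypothetical helper, and the paths listed are the usual candidates rather than a guarantee of where your packages put things.

```shell
#!/usr/bin/env bash
# Sketch: print a SHA-256 digest for a file so it can be compared across nodes.
# file_digest is a hypothetical helper.
file_digest() {
    sha256sum "$1" | cut -d' ' -f1
}

# Print digests for whichever of these files exist and are readable.
# The Munge key is normally readable only by the munge user, so run with sudo.
for f in /etc/munge/munge.key /etc/slurm-llnl/slurm.conf /etc/slurm/slurm.conf; do
    if [ -r "$f" ]; then
        echo "$(file_digest "$f")  $f"
    fi
done
```

Run it on each node and compare the output. Any digest that differs points to a file that was not copied over.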

      Final Thoughts

      You don’t need a dedicated node for the SLURM controller initially; running everything on a single machine is totally okay while you’re getting started. Once you’re comfortable, you can scale up!

      Hopefully, this gives you a clearer picture to get started. Just take it step by step, and don’t hesitate to ask if you need more help!


    2. anonymous user
      Added an answer on September 25, 2024 at 1:27 am

      Setting up SLURM on an Ubuntu 20.04 cluster can feel overwhelming, but it breaks down into manageable steps. First, install the packages by running `sudo apt update` followed by `sudo apt install slurm-wlm slurmctld slurmd munge`.

      Next, configure the SLURM controller by editing `slurm.conf`. The Ubuntu 20.04 packages typically read `/etc/slurm-llnl/slurm.conf`, while newer releases use `/etc/slurm/slurm.conf`. Specify `ControlMachine` (the hostname of the control node; recent SLURM versions call this `SlurmctldHost`) and a `NodeName` entry for each compute node along with its configuration. If you’re starting on a single machine, you can run both the controller and the compute daemon on it, which is a good way to test your setup without a dedicated node.

      For networking, make sure all nodes can reach each other by hostname. SSH is convenient for copying configuration files between nodes, but SLURM itself talks over its own ports, so configure firewalls to allow them (by default 6817 for `slurmctld` and 6818 for `slurmd`).
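To find out which ports a given setup actually uses, you can read them from slurm.conf and fall back to the defaults when they are not set. A rough sketch: `conf_value` is a hypothetical helper that handles only simple `Key=Value` lines, and the two paths are the usual candidates.

```shell
#!/usr/bin/env bash
# Sketch: look up a Key=Value setting in slurm.conf, with a fallback default.
# conf_value is a hypothetical helper and handles only one setting per line.
conf_value() {
    local file="$1" key="$2" default="$3" value
    # Keys are matched case-insensitively; the last occurrence wins,
    # and any trailing "# comment" is stripped.
    value="$(grep -i "^[[:space:]]*${key}=" "$file" 2>/dev/null \
        | tail -n 1 | cut -d= -f2- \
        | sed 's/[[:space:]]*#.*$//; s/[[:space:]]*$//')"
    echo "${value:-$default}"
}

conf=/etc/slurm-llnl/slurm.conf
[ -r "$conf" ] || conf=/etc/slurm/slurm.conf
echo "slurmctld port: $(conf_value "$conf" SlurmctldPort 6817)"
echo "slurmd port:    $(conf_value "$conf" SlurmdPort 6818)"
```

The printed ports are the ones to allow through the firewall on the matching hosts.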

      To check if your SLURM installation is functioning correctly after setup, you can run the command `sinfo`, which should provide you with a list of available nodes and their states. Additionally, test job scheduling by submitting a simple job script using `sbatch`. A straightforward script might look like this:

      #!/bin/bash
      #SBATCH --job-name=test_job
      #SBATCH --output=output.txt
      #SBATCH --ntasks=1
      #SBATCH --time=00:01:00
      
      echo "Hello from SLURM!"

      Save this as `test_job.sh` and submit it with `sbatch test_job.sh`. Once the job finishes, check `output.txt` in the directory you submitted from; it should contain `Hello from SLURM!`. Common pitfalls include mistakes in `slurm.conf`, forgetting to start the Munge authentication service (run `sudo systemctl start munge`), and network issues. Validating each part of the configuration as you go gives you a solid foundation for future high-performance computing projects.

