HPC Crash Course
Fundamentals in Computing & init()
Manuel Holtgrewe
Berlin Institute of Health at Charité
Welcome
- Course Overview
- Aims, Scope, Non-Aims, Out-of-Scope
Course Overview
- Welcome to the course! 👋
- Introduction to High-Performance Computing (HPC)
- Focus on biomedical and medical research applications
- Duration:
5 days 5 x 45 min
- Instructor: Manuel Holtgrewe
- Contact Information: manuel.holtgrewe@bih-charite.de
User Engagement
- We are a large group today.
- We will use PollEverywhere for
- quizes for “Lernerfolgskontrolle”
- questions and answers
- Let’s try: PollEv.com/manuelholtgrewe153
The Sheep Scale
Who are you?
Course Aims and Scope
- Basic understanding of HPC
- Some theory to understand issues occuring in practice
- How to help yourself in case of troubleshooting
- Install software (conda, apptainer/singularity)
- Tips and sleight of hand for improved productivity
… and maybe
- Connect “birds of a feather” to later share experience, e.g., on project organisation
Prerequisites 🎓
You should have experience with…
- … the Linux operating system 🐧
- … the Bash shell (interactive, scripting) 🐚
- … using the Secure Shell (SSH) 🛡️
- … one programming language (ideally: Python 🐍)
- … (some exposure to) scientific programming / machine learning
Course Non-Aims and Out-of-Scope
- Advanced OnDemand Portal usage
- (Linux) command line basics
- Programming basics
- Scientific programming, e.g., machine learning, statistics, visualization, parameter fitting, polars, …
- Bioinformatics (variant/gene expression/… analysis)
- Networking issues with Charite or MDC/MAX
- Advanced Snakemake usage, reproducibility, …
Course Outline
- Fundamentals: Computing
- Software Installation Management
- Fundamentals: HPC
- Slurm Resource Manager
- Snakemake for Workflows
Session Overview
- (The world according to) the operating system (OS)
- user space, kernel space, pid, uid, gid, cwd, oh my!
- A dive into memory management
- The “file” abstraction
- Processes, threads, and the Python GIL
- Computer networks
- Some fundamentals
- The secret life of the secure shell (SSH)
The Operating System (OS)
- Basic Process Properties
- User Space, Kernel Space, and System Calls
- OS Memory Management
- The “File” Abstraction
- Processes, Threads, and the Python GIL
Note: simplified treatment of the subject only
Basic Process Properties
- running programs are processes
bash
launched via ssh
bash
launched via srun
python
, conda
, vim
, …
- each process has…
- properties such as PID and others
- its own memory space
- list of open files
(some) process properties
pid
- process ID
uid
- numeric user ID
gid
- (effective) group ID
umask
- default mode for new files
cwd
- current working directory
limits
- virtual memory
- open files
- max processes
User Space, Kernel Space
Linux Memory Layout
Source: Computer Systems: A Programmer’s Perspective, 3/E (CS:APP3e)
Virtual and Physical Memory Layout
Source: Wikipedia - Page Table
Memory Managment Takeaways
- Each process has its own, isolated, virtual memory
- The kernel is responsible for
- mapping between virtual and physical memory
- allocating and tracking virtual memory
- The user space program
- only operates on virtual memory
- can only move the
brk
pointer to (de)allocate memory
- implements its own management within virtual memory
- Resident memory is virtual memory that is
- actually mapped (and not swapped out)
Memory Management Takeaways (2)
- Memory management is a complex topic! 🙉🙊🙊
- Most of you …
- either call C programs from Bash …
- or use a language with garbage collection (Python, R, Java, etc.) …
- … and should probably not worry too much about it.
- beware of memory leaks (and “leaks”), though!
- Try to learn some best practices now and return once you have issues.
- Premature optimization is the root of all evil. - Donald Knuth
- We will cover memory again with respect to Slurm.
The “File” Abstraction
- In Unix systems, everything is a file
- directories, devices, sockets, …
- GPUs
- Pro tip: Learn how to use
umask
!
-rw------- 1 holtgrem_c hpc-ag-cubi 142K Oct 23 11:08 .bash_history
-rw-r----- 1 holtgrem_c hpc-ag-cubi 18 Aug 23 2016 .bash_logout
-rw-r----- 1 holtgrem_c hpc-ag-cubi 215 Jul 21 2022 .bash_profile
lrwxrwxrwx 1 holtgrem_c hpc-users 16 Jan 12 2023 .bashrc -> .dotfiles/bashrc
Processes, Threads
Source: https://dev.to/hrishi2710/threading-in-operating-system-3gjb
Time Slicing
Source: https://commons.wikimedia.org/wiki/File:CpuTimeonSingleCpuMultiTaskingSystem.svg
The Python GIL (1)
Source: https://twitter.com/StepFor40691082/status/1269101158561431557S
The Python GIL (2)
- Global Interpreter Lock implications:
- multithreading only useful for I/O-bound tasks (concurrency)
- for parallel computation, use
multiprocessing
- we will see an example in “Fundamentals: HPC”
Secure Shell
Demo Time!
… feel free to type along and learn some sleight of hand
beginners: ascend!
- master the Bash!
tmux
or screen
nano
or vim
- NO VS Code on the Login node!
git
grep
, find
, tar
, xargs
, …
man
<– your friend
intermediates: ascend!
vim
!
~/.bashrc
alias
es
- be careful, and don’t hurt yourself
ssh $HOST /usr/bin/bash --noprofile --norc
- bash scripting
awk
, sed
, perl
, …