STAT 39000: Project 6 — Spring 2022
Motivation: In this project we will continue to get familiar with SLURM, the job scheduler installed on our clusters at Purdue, including Brown.
Context: This is the second in a series of (now) 4 projects focused on parallel computing using SLURM and Python.
Scope: SLURM, unix, Python
Dataset(s)
The following questions will use the following dataset(s):
- /depot/datamine/data/coco/attempt02/*.jpg
Questions
Question 1
This project, and the next, will have a variety of different types of deliverables. Ultimately, each question will result in some entry in a Jupyter notebook, and/or 1 or more additional Python and/or Bash scripts. In addition, to properly save screenshots in your Jupyter notebook, please follow the guidelines here. Images that don’t appear in your notebook in Gradescope will not get credit.
In project 5, question 2, we asked you to test out a variety of srun commands with variations in the options. As you are probably now well aware, it can be difficult to understand which combination of parameters is needed. With that being said, in this course we will focus on jobs that are perfectly (or embarrassingly) parallel, and on single-core, single-threaded jobs. So, the following job script is a safe and effective way to break your jobs up.
#!/bin/bash
#SBATCH --account=datamine
#SBATCH --job-name=serial_job_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=me@purdue.edu # Where to send mail
#SBATCH --ntasks=3 # Number of tasks (total)
#SBATCH --cpus-per-task=1 # Number of cores per task
#SBATCH -o /dev/null # Output to dev null
#SBATCH -e /dev/null # Error to dev null
srun --exclusive -n 1 --mem-per-cpu=1000 -t 00:00:00 some_command &
srun --exclusive -n 1 --mem-per-cpu=1000 -t 00:00:00 some_command &
srun --exclusive -n 1 --mem-per-cpu=1000 -t 00:00:00 some_command &
wait
Just be sure to modify ntasks in your job script, along with the amount of time and memory you need for each job step.
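If you find yourself editing the srun lines by hand for every new job, a small generator can help. The following is only a sketch: the function name build_job_script and its parameters are hypothetical, and it uses only the options that already appear in the job script above.

```python
def build_job_script(commands, ntasks, mem_per_cpu_mb=1000,
                     time_limit="00:00:00", account="datamine"):
    """Return the text of a job script with one srun job step per command.

    Hypothetical helper: every option used here is copied from the job
    script shown above, nothing else is assumed about the cluster.
    """
    header = [
        "#!/bin/bash",
        f"#SBATCH --account={account}",
        f"#SBATCH --ntasks={ntasks}",
        "#SBATCH --cpus-per-task=1",
    ]
    # Each job step runs in the background (&) so the steps can overlap.
    steps = [
        f"srun --exclusive -n 1 --mem-per-cpu={mem_per_cpu_mb} -t {time_limit} {cmd} &"
        for cmd in commands
    ]
    # The final wait keeps the job alive until every background step ends.
    return "\n".join(header + steps + ["wait"]) + "\n"


script = build_job_script(["some_command"] * 3, ntasks=3)
print(script)
```

Writing the returned text to a file such as my_job.sh gives you something you can submit as usual.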
Remember, you use |
To add to the difficulty you may have had understanding the various options available to you, if you used a terminal from within Jupyter Lab, you were technically already in a SLURM job with |
When inside a SLURM job, a variety of environment variables are set that alter how srun behaves. If you open a terminal from within Jupyter Lab and run the following, you will see them.
env | grep -i slurm
These variables altered the behavior of srun. We can, however, unset these variables, and srun will revert to its default behavior. In your terminal, run the following.
for i in $(env | awk -F= '/SLURM/ {print $1}'); do unset $i; done;
Confirm that the environment variables are unset by running the following.
env | grep -i slurm
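If you would rather do the same check from a Python cell, the idea can be sketched as below. This is a minimal sketch, not part of the official instructions: the helper name without_slurm_vars is hypothetical, and it filters on variable names only, whereas the awk pattern above matches SLURM anywhere on the line.

```python
import os


def without_slurm_vars(env):
    """Return a copy of env without any variable whose name contains SLURM."""
    return {name: value for name, value in env.items() if "SLURM" not in name}


# Build a cleaned copy of the current environment; os.environ is not modified.
clean_env = without_slurm_vars(os.environ)

# Confirm that nothing SLURM-related is left in the cleaned copy.
leftover = [name for name in clean_env if "SLURM" in name]
print(f"SLURM variables remaining: {len(leftover)}")
```

A cleaned dictionary like clean_env can be passed as the env argument of subprocess.run, so that a command launched from Python does not inherit the SLURM variables.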
Great! Now, we can work in our nice Jupyter Lab environment without any concern that SLURM environment variables are changing any behaviors. Let’s test it out with something actually predictable.
#!/usr/bin/python3
import time
import socket
from pathlib import Path
import datetime

def main():
    print(f"started: {datetime.datetime.now()}")
    print(socket.gethostname())
    with open("/proc/self/cgroup") as file:
        for line in file:
            if 'cpuset' in line:
                cpu_loc = "cpuset" + line.split(":")[2].strip()
            if 'memory' in line:
                mem_loc = "memory" + line.split(":")[2].strip()
    base_loc = Path("/sys/fs/cgroup/")
    with open(base_loc / cpu_loc / "cpuset.cpus") as file:
        num_cpus = len(file.read().strip().split(","))
        print(f"CPUS: {num_cpus}")
    with open(base_loc / mem_loc / "memory.limit_in_bytes") as file:
        mem_in_bytes = int(file.read().strip())
        print(f"Memory: {mem_in_bytes/1024**2} Mbs")
    time.sleep(3)
    print(f"ended: {datetime.datetime.now()}")

if __name__ == "__main__":
    main()
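One caveat about the script above: it counts the comma-separated entries in cpuset.cpus, which is fine when each step gets a single core. The kernel's cpuset list format can also contain ranges such as 0-3, which that count would treat as one CPU. The following sketch shows one way to handle ranges. The function name count_cpus is hypothetical and is not part of get_info.py.

```python
def count_cpus(cpuset):
    """Count the CPUs in a cpuset list string such as "0-3,8"."""
    total = 0
    for part in cpuset.strip().split(","):
        if not part:
            continue  # an empty cpuset contributes nothing
        if "-" in part:
            # A range like "0-3" is inclusive on both ends.
            low, high = part.split("-")
            total += int(high) - int(low) + 1
        else:
            total += 1
    return total


print(count_cpus("0-3,8"))  # 5
print(count_cpus("2"))      # 1
```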
#!/bin/bash
#SBATCH --account=datamine
#SBATCH --job-name=serial_job_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=me@purdue.edu # Where to send mail
#SBATCH --ntasks=3 # Number of tasks (total)
#SBATCH --cpus-per-task=1 # Number of cores per task
#SBATCH -o /dev/null # Output to dev null
#SBATCH -e /dev/null # Error to dev null
srun --exclusive -n 1 --mem-per-cpu=1000 -t 00:00:00 $HOME/get_info.py > 1.txt &
srun --exclusive -n 1 --mem-per-cpu=1000 -t 00:00:00 $HOME/get_info.py > 2.txt &
srun --exclusive -n 1 --mem-per-cpu=1000 -t 00:00:00 $HOME/get_info.py > 3.txt &
srun --exclusive -n 1 --mem-per-cpu=1000 -t 00:00:00 $HOME/get_info.py > 4.txt &
wait
Place get_info.py in your $HOME directory and launch the job with the following command.
sbatch my_job.sh
Make sure to give your
Note that there is no |
Check out the contents of 1.txt, 2.txt, 3.txt, and 4.txt. Explain in as much detail as possible what resources (cpus) were allocated for the job, what resources (cpus and memory) were allocated for each step, and how the job's resources (cpus) affected the results of each step.
- Code used to solve this problem.
- Output from running the code.
Question 2
I hope that the previous question was helpful, and gave you at least 1 reliable way to write job scripts for embarrassingly parallel jobs, where you can predict what will happen.
If at this point in time you are wondering "why would we do this when we can just use |
In the previous project, you were able to use the sha256 hash to efficiently find the extra image that the trickster Dr. Ward added to our dataset. Dr. Ward, knowing all about hashing algorithms, thinks he has a simple way to circumvent your work. In the "new" dataset, /depot/datamine/data/coco/attempt02, he has modified the value of a single pixel of his duplicate image.
Re-run your SLURM job from the previous project on the new dataset, and process the results to try to find the duplicate image. Was Dr. Ward’s modification successful? Do your best to explain why or why not.
- Code used to solve this problem.
- Output from running the code.
Question 3
Unfortunately, Dr. Ward was right, and our methodology didn’t work. Luckily, there is a cool technique called perceptual hashing that is almost meant just for this! Perceptual hashing is a technique that can be used to know whether or not any two images appear the same, without actually viewing the images. The general idea is this. Given two images that are essentially the same (maybe they have a few different pixels, have been cropped, gone through a filter, etc.), a perceptual hash can give you a very good idea whether the images are the "same" (or close enough). Of course, it is not a perfect tool, but most likely good enough for our purposes.
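To make the contrast concrete, here is a toy illustration of why a one-pixel change defeats sha256 but not a perceptual-style hash. It is a simplified average hash written for this example only, not the algorithm used by any particular library. The 4x4 "image" and the function name average_hash are made up for the demonstration.

```python
import hashlib


def average_hash(pixels):
    """Toy average hash: one bit per pixel, 1 if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = "".join("1" if p > mean else "0" for p in flat)
    # Pack the bits into a fixed-width hexadecimal string.
    return f"{int(bits, 2):0{len(flat) // 4}x}"


def sha256_of(pixels):
    """sha256 of the raw pixel bytes (values must be in 0..255)."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


# Hypothetical 4x4 grayscale image: bright left half, dark right half.
original = [[200, 200, 50, 50] for _ in range(4)]

# Copy the image and nudge a single pixel by one brightness level.
tweaked = [row[:] for row in original]
tweaked[0][0] = 201

print(sha256_of(original) == sha256_of(tweaked))        # False
print(average_hash(original) == average_hash(tweaked))  # True
```

Real perceptual hashes typically shrink and simplify the image before comparing brightness, which is what lets them tolerate larger edits than this toy version.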
To be a little more specific, two images are very likely the same if their perceptual hashes are the same. If two perceptual hashes are the same, their Hamming distance is 0. For example, if your hashes were 8f373714acfcf4d0 and 8f373714acfcf4d0, the Hamming distance would be 0, because if you convert the hexadecimal values to binary, the values are identical at every position in the string of 0s and 1s. If exactly one of the positions didn't match after converting to binary, the Hamming distance would be 1.
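The Hamming distance between two hexadecimal hashes can be sketched in a few lines of plain Python. The function name hamming_distance is hypothetical. The second example hash differs from the one in the text only in its last hexadecimal digit and was made up for illustration.

```python
def hamming_distance(hex_a, hex_b):
    """Number of differing bits between two equal-length hex strings."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two values differ, then count the 1s.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


print(hamming_distance("8f373714acfcf4d0", "8f373714acfcf4d0"))  # 0
print(hamming_distance("8f373714acfcf4d0", "8f373714acfcf4d1"))  # 1
```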
Use the imagehash library, and modify your job script from the previous project to use perceptual hashing instead of the sha256 algorithm to produce 1 file for each image, where the filename remains the same as the original image and the file contains the hash.
Make sure to clear out your slurm environment variables before submitting your job to run with
If you are in a bash cell in Jupyter Lab, do the same.
In order for the
In order for your hash script to find the |
To help get you going using this package, let me demonstrate using the package.
Make sure that you pass the hash as a string to the |
Make sure that once you’ve written your script, |
It would be a good idea to make sure you’ve modified your hash script to work properly with the
This should produce a file, |
Make sure your
We’ve now posted the solutions to project 5 question 4. See here. |
Process the results (like in the previous project). Did you find the duplicate image? Explain what you think could have happened.
- Code used to solve this problem.
- Output from running the code.
Question 4
What!?! That is pretty cool! You found the "wrong" duplicate image? Well, I guess it is totally fine to find multiple duplicates. Modify the code you used to find the duplicates so that it finds all of the duplicates and originals. In total there should be 50. Display 2-5 of the pairs (or triplets or more). Can you see any of the subtle differences? Hopefully you find the results to be pretty cool! If you look, you will find Dr. Ward's hidden picture, but you do not have to exhaustively display all 50 images.
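One possible way to collect every group of matching images, rather than just the first match, is to group filenames by hash. This is only a sketch under assumptions: the function name group_duplicates, the filenames, and the second hash value are hypothetical, and it treats two images as duplicates only when their hashes are exactly equal (a Hamming distance of 0).

```python
from collections import defaultdict


def group_duplicates(hashes):
    """Given {filename: hash}, return the groups of files that share a hash."""
    groups = defaultdict(list)
    for filename, image_hash in hashes.items():
        groups[image_hash].append(filename)
    # Keep only hashes seen more than once: pairs, triplets, or more.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical results, as they might look after reading your hash files.
example = {
    "a.jpg": "8f373714acfcf4d0",
    "b.jpg": "8f373714acfcf4d0",
    "c.jpg": "0123456789abcdef",
}
print(group_duplicates(example))  # [['a.jpg', 'b.jpg']]
```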
- Code used to solve this problem.
- Output from running the code.
Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it, to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.