Missouri University of Science & Technology
Department of Computer Science, Spring 2024
CS 6406: Machine Learning for Computer Vision (Sec: 101/102)
Homework 2: Efficient Learning
Instructor: Sid Nadendla
Due: Mar 24, 2024

Goals and Directions:
• The main goal of this assignment is to implement efficient neural networks from scratch and to train them on a given dataset by splitting it into multiple mini-batches.
• Comprehend the impact of hyperparameters and learn to tune them effectively.
• You are not allowed to use neural network libraries such as PyTorch, TensorFlow, or Keras.
• You are also not allowed to add, move, or remove any files, or to modify their names.
• You are also not allowed to change the signature (list of input arguments) of any function.

Problem 1: Model Aggregation (5 points)

Implement the dropout layer in hw2/mlcvlab/nn/basis.py.

Linear Function with Inverse Dropout: Accomplish ensemble training by randomly "turning off" nodes using a binary mask with parameter p for each mini-batch. In other words, create a mask for each neuron's input by sampling a Bernoulli random variable (instantiated using numpy.random.binomial). If the sample is '1', multiply the corresponding neuron's input by 1/p; otherwise, multiply the input by '0'. Note: the pattern of dropped nodes changes for each input (i.e., for each forward pass). An illustrative sketch is given after Problem 3 below. For more details, please refer to Section 10 (Page 1951) of the following paper:

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.

Problem 2: Regularization (5 points)

Implement regularizers in hw2/mlcvlab/nn/basis.py.

Batch Normalization: A BatchNorm layer applies the following sequence of steps to a mini-batch {x_1, ..., x_m}: compute the mini-batch mean μ and variance σ², normalize each input as x̂_i = (x_i − μ) / sqrt(σ² + ε), and scale and shift the result as y_i = γ · x̂_i + β, where γ and β are learnable parameters.

Gradient of Batch Normalization: For backpropagation, compute the gradients of the loss with respect to x_i, γ, and β by applying the chain rule through the scale-and-shift and normalization steps (a sketch is given after Problem 3 below). For more details, please refer to:

S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in International Conference on Machine Learning, pp. 448-456, 2015.

Problem 3: Architecture (5 points)

Implement a four-layer NN in hw2/mlcvlab/models/nn4.py.

NN4 model: Implement the nn4 definition with BatchNorm and dropout at each layer.
• Layer 1: z_1 = W_1 · Dropout(x),   z̃_1 = ReLU(z_1),   y_1 = BatchNorm(z̃_1, γ_1, β_1)
• Layer 2: z_2 = W_2 · Dropout(y_1),  z̃_2 = ReLU(z_2),   y_2 = BatchNorm(z̃_2, γ_2, β_2)
• Layer 3: z_3 = W_3 · Dropout(y_2),  z̃_3 = ReLU(z_3),   y_3 = BatchNorm(z̃_3, γ_3, β_3)
• Layer 4: z_4 = w_4ᵀ · Dropout(y_3),  y = Sigmoid(z_4)

Gradient of NN4 model: Compute the gradients of the NN4 model using the backpropagation algorithm and implement them in the nn4_grad definition.
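As a point of reference for Problem 1, here is a minimal NumPy sketch of the inverted-dropout computation described above. The standalone dropout(x, p) name and signature are illustrative assumptions and may differ from the required interface in basis.py:

    import numpy as np

    def dropout(x, p):
        # Bernoulli mask sampled with numpy.random.binomial: each entry of x is
        # kept with probability p and zeroed otherwise. Kept entries are scaled
        # by 1/p so the expected value of the output equals the input.
        # A fresh mask is drawn on every call, i.e. on every forward pass.
        mask = np.random.binomial(1, p, size=x.shape)
        return (mask / p) * x

With p = 1 every mask entry is 1, so the layer reduces to the identity; this is a convenient sanity check for your implementation.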
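For Problem 2, the sketch below follows the standard mini-batch BatchNorm equations from Ioffe and Szegedy (2015). The function names, the cache tuple, and the value of ε are assumptions, not the prescribed basis.py interface:

    import numpy as np

    def batchnorm(x, gamma, beta, eps=1e-5):
        # x: (m, d) mini-batch with one example per row. Normalize each feature
        # over the batch, then scale by gamma and shift by beta.
        mu = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mu) / np.sqrt(var + eps)
        y = gamma * x_hat + beta
        cache = (x, x_hat, mu, var, gamma, eps)
        return y, cache

    def batchnorm_grad(dy, cache):
        # Chain rule through the scale-and-shift and the normalization,
        # following the backward-pass expressions in the BatchNorm paper.
        x, x_hat, mu, var, gamma, eps = cache
        m = x.shape[0]
        dgamma = np.sum(dy * x_hat, axis=0)
        dbeta = np.sum(dy, axis=0)
        dx_hat = dy * gamma
        inv_std = 1.0 / np.sqrt(var + eps)
        dvar = np.sum(dx_hat * (x - mu) * -0.5 * inv_std**3, axis=0)
        dmu = np.sum(-dx_hat * inv_std, axis=0) + dvar * np.mean(-2.0 * (x - mu), axis=0)
        dx = dx_hat * inv_std + dvar * 2.0 * (x - mu) / m + dmu / m
        return dx, dgamma, dbeta

A finite-difference check of batchnorm_grad against batchnorm on a small random batch is a quick way to validate the backward pass before wiring it into NN4.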
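For Problem 3, the sketch below shows how the four layers compose in the forward pass, reusing the dropout and batchnorm helpers sketched above. It is written for a mini-batch X with one example per row so that the BatchNorm sketch applies directly; the weight shapes, the bn_params container, and the helper names are assumptions, and the starter code's nn4 signature may instead follow the column-vector convention of the layer equations:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def nn4_forward(X, W1, W2, W3, w4, bn_params, p):
        # X: (m, d) mini-batch; bn_params: assumed list of (gamma, beta) pairs
        # for the three BatchNorm layers; w4: final weight vector.
        (g1, b1), (g2, b2), (g3, b3) = bn_params
        y1, _ = batchnorm(relu(dropout(X, p) @ W1.T), g1, b1)
        y2, _ = batchnorm(relu(dropout(y1, p) @ W2.T), g2, b2)
        y3, _ = batchnorm(relu(dropout(y2, p) @ W3.T), g3, b3)
        z4 = dropout(y3, p) @ w4
        return sigmoid(z4)        # one probability per example, shape (m,)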
Problem 4: Data-Parallelism in Optimization¹ (5 points)

Implement SyncSGD in hw2/mlcvlab/optim/sync_sgd.py.

Synchronous Mini-Batch SGD: Implement synchronous mini-batch SGD over multiple GPUs.
• Hyperparameters: step size δ and number of mini-batches K.
• Divide the training data into K mini-batches.
• Compute the gradient estimate of the empirical loss on each mini-batch with respect to W^(r−1) using the emp_loss_grad function in the model class.
• Wait for all the gradient computations across the different mini-batches on different GPUs and aggregate them to obtain a gradient estimate ∇̂L_N(W^(r−1)).
• Compute the update step using the gradient estimate from the previous step:
      W^(r) = W^(r−1) − δ · ∇̂L_N(W^(r−1))
An illustrative sketch of this update loop is given after Problem 5 below.

Note that the above mini-batch SGD algorithm should leverage the presence of multiple GPUs, for which you will need to use the just-in-time (@cuda.jit) decorator provided by the numba package. A necessary dependency for this package is the CUDA toolkit, which can be installed by executing the following command in the terminal:

    conda install cudatoolkit

For more details about numba, please refer to https://numba.pydata.org/numba-doc/latest/cuda/index.html

Remark: When using @cuda.jit, the input array is first copied from RAM to the GPU for processing, and the returned values are then copied back from the GPU to the CPU. However, special care should be taken when a function decorated with @cuda.jit calls another function: in that case, both functions should be compiled with @cuda.jit. Otherwise, @cuda.jit may slow down the overall computation. (A minimal example appears after Problem 5 below.)

Handout: For your reference, handout code is provided that demonstrates how you can distribute your computations across GPUs using the numba package and the CUDA toolkit.

¹ You may need to run this on Google Colab, AWS SageMaker Studio Lab, or Foundry/Mill for GPU access.

Problem 5: Training and Testing (5 points)

For this question, write your code in hw2/HW2_MNIST_NN4.ipynb. Train and test your NN4 using sync_SGD() on MNIST, similar to HW1.
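For Problem 4, the sketch below shows the synchronous structure of the update loop on the host side. The sync_sgd name, its signature, the emp_loss_grad call, and treating W as a single parameter array are assumptions about the starter code; in the assignment, each per-mini-batch gradient computation is meant to be dispatched to a GPU via @cuda.jit rather than run in a Python loop:

    import numpy as np

    def sync_sgd(model, W, X, y, delta, K, num_rounds):
        # Synchronous mini-batch SGD: each round splits the data into K
        # mini-batches, computes one gradient estimate per mini-batch, waits
        # for all of them (the synchronization point), averages the estimates,
        # and takes a single descent step.
        N = X.shape[0]
        for r in range(num_rounds):
            batches = np.array_split(np.random.permutation(N), K)
            grads = [model.emp_loss_grad(X[idx], y[idx], W) for idx in batches]
            grad_hat = sum(grads) / K          # aggregated gradient estimate
            W = W - delta * grad_hat           # W_r = W_{r-1} - delta * grad_hat
        return W

In the Problem 5 notebook, a loop of this form would be driven with the NN4 model and the MNIST training split while sweeping δ, K, and the dropout rate p.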
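The remark about host-device copies can be made concrete with a minimal numba example. The kernel below only performs the element-wise weight update; the array sizes, dtype, and launch configuration are arbitrary choices for illustration, not part of the required SyncSGD implementation:

    import numpy as np
    from numba import cuda

    @cuda.jit
    def update_kernel(w, grad, delta):
        # One thread per entry: w[i] <- w[i] - delta * grad[i].
        i = cuda.grid(1)
        if i < w.size:
            w[i] -= delta * grad[i]

    w = np.random.randn(1024)
    g = np.random.randn(1024)
    d_w, d_g = cuda.to_device(w), cuda.to_device(g)   # RAM -> GPU copies
    threads = 128
    blocks = (w.size + threads - 1) // threads
    update_kernel[blocks, threads](d_w, d_g, 0.01)
    w = d_w.copy_to_host()                            # GPU -> CPU copy

Keeping the device arrays d_w and d_g resident on the GPU across rounds avoids repeating the RAM-to-GPU transfer that the remark warns about.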