{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Setting up environment" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Installation\n", "\n", "Install Ablator via pip:\n", "```\n", "pip install ablator\n", "```\n", "For the development version of Ablator:\n", "```\n", "pip install git+https://github.com/fostiropoulos/ablator.git@v0.0.1-mp\n", "```\n", "\n", "Note: Python 3.10 or newer is required." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Prerequisites" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Setting up a ray cluster\n", "\n", "Ablator uses the `ray` framework to scale its ablation studies by running multiple trials concurrently, each corresponding to a distinct configuration variation. Trials are dynamically allocated to the cluster based on available computational resources. Experiments can run either on a cluster that lives on a single local machine (most of the ablation study/HPO tutorials run experiments this way) or on one that spans multiple machines for increased scalability.\n", "\n", "When initiating an experiment, you can specify a cluster address to which Ablator will dispatch the trials. If no address is provided, Ablator will automatically set up a local ray cluster and run the experiment there.\n", "\n", "This section walks through manually setting up a ray cluster on your own machines. You can also deploy a cluster on a cloud service (AWS, GCP) or on Kubernetes. Refer to [ray clusters docs](https://docs.ray.io/en/latest/cluster/getting-started.html#cluster-index) to learn more.\n", "\n", "##### Start the Head node\n", "Choose any machine to be the head node and run the following shell command. 
Note that Ray will choose port `6379` by default:\n", "```shell\n", "ray start --head\n", "```\n", "The command will print out the Ray cluster address, which can be passed to `ray start` on other machines to start and attach worker nodes to the cluster (see below). If you receive a `ConnectionError`, check your firewall settings and network configuration.\n", "\n", "##### Start Worker nodes\n", "On each of the other nodes (machines), run the following command to connect to the head node:\n", "```\n", "ray start --address=<head-node-address:port>\n", "```\n", "Replace `<head-node-address:port>` with the value printed by the command on the head node (e.g., `123.45.67.89:6379`).\n", "\n", "##### Launch experiment to the cluster\n", "Once the cluster is set up, you can launch an experiment on it by specifying the cluster head address. Make sure that ablator is installed on all nodes in the cluster. A preview of the command is as follows:\n", "```python\n", "ablator_trainer.launch(working_directory=\"\", ray_head_address=\"\")\n", "```\n", "\n", "##### Set up ray nodes in Python\n", "As an alternative to the CLI commands above, you can also set up ray nodes in Python:\n", "\n", "```python\n", "import ray\n", "if not ray.is_initialized():\n", "    ray.init(address=\"\")\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Note\n", "\n", "- This tutorial for setting up a ray cluster is adapted from [this ray doc](https://docs.ray.io/en/latest/cluster/vms/user-guides/launching-clusters/on-premises.html).\n", "\n", "- The Ray documentation also provides instructions to launch a ray cluster using `cluster-launcher`, which sets up all nodes at once instead of configuring each node manually.\n", "\n", "
" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### GPU-Accelerated Computing (Optional)\n", "If GPUs are available, you can run your experiments with GPU acceleration, which can significantly speed up the training process. To run CUDA Python, you’ll need the [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) installed on a system with CUDA-capable GPUs.\n", "\n", "After setting up CUDA, you can install the CUDA-enabled torch packages. Refer to [this tutorial](https://pytorch.org/get-started/locally/) for installation instructions. A sample command to install the CUDA builds of `torch` and `torchvision` is shown below:\n", "```\n", "pip install torch torchvision --index-url https://download.pytorch.org/whl/cu117 --force-reinstall\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }