Manual Installation on a YARN-Based Cluster

You can use Apache Slider to manually install Presto on a YARN-based cluster.

Deploying Presto on a YARN-Based Cluster

The installation procedures assume that you have a basic knowledge of Presto and the configuration files and properties it uses.

Note

All example files referred to are from: https://github.com/prestodb/presto-yarn/


Pre-Requisites


Presto Installation Directory Structure

When you use Slider to install Presto on a YARN-based cluster, the Presto installation directory structure differs from the standard structure.

For more information, see: Presto Installation Directory Structure for YARN-Based Clusters.


Presto Installation Configuration Options

Before installation, you must configure the .json files required for running Presto.

For more information, see: Presto Configuration Options for YARN-Based Clusters.


Using Apache Slider to Manually Install Presto on a YARN-Based Cluster

  1. Download the Slider 0.80.0 installation file from http://slider.incubator.apache.org/index.html to one of the nodes in your cluster, and extract it:
tar -xvf slider-0.80.0-incubating-all.tar.gz
  2. Configure Slider by setting JAVA_HOME and HADOOP_CONF_DIR in slider-0.80.0-incubating/conf/slider-env.sh:
export JAVA_HOME=/usr/lib/jvm/java
export HADOOP_CONF_DIR=/etc/hadoop/conf
  3. Configure ZooKeeper in conf/slider-client.xml. If ZooKeeper is listening on master:2181, add the following section:
<property>
    <name>slider.zookeeper.quorum</name>
    <value>master:2181</value>
</property>
  4. Configure the path where Slider packages will be installed:
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master/</value>
</property>
  5. Make sure that the user running Slider, which should be the same as site.global.app_user in appConfig.json, has a home directory in HDFS (see the note on appConfig.json).

    For more details about appConfig.json and resources.json, see Presto Configuration Options for YARN-Based Clusters.

su hdfs
hdfs dfs -mkdir -p /user/<user>
hdfs dfs -chown -R <user>:<user> /user/<user>
  6. Run Slider, using appConfig.json and resources.json files modified to suit your requirements:
su <user>
cd slider-0.80.0-incubating
bin/slider package --install --name PRESTO --package ../presto-yarn-package-*.zip
bin/slider create presto1 --template appConfig.json --resources resources.json

This should start your application, which you can see in the YARN ResourceManager web UI. If the application starts successfully, it remains listed in the ResourceManager as a “RUNNING” application. If the job fails, check the job history logs as well as the logs on the node’s disk. See Debugging and Logging for YARN-Based Clusters.
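The two Slider client properties shown in the steps above (slider.zookeeper.quorum and fs.defaultFS) are given only as fragments. The following is a minimal sketch of how they might sit together in one file. It assumes that both properties belong in conf/slider-client.xml and that the file uses the usual Hadoop-style <configuration> root element. The helper name write_slider_client_xml is hypothetical and not part of Slider.

```shell
#!/usr/bin/env bash
# Sketch only: prints a slider-client.xml with the two properties from this guide.
# Assumption: both properties live in the same Hadoop-style <configuration> file.
# write_slider_client_xml is a hypothetical helper, not a Slider command.
write_slider_client_xml() {
  local zk_quorum="$1"    # ZooKeeper quorum, for example master:2181
  local default_fs="$2"   # default file system, for example hdfs://master/
  cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>slider.zookeeper.quorum</name>
    <value>${zk_quorum}</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>${default_fs}</value>
  </property>
</configuration>
EOF
}

# Print the file content for the example values used in this guide.
write_slider_client_xml "master:2181" "hdfs://master/"
```

To use the result, you could redirect the output to slider-0.80.0-incubating/conf/slider-client.xml after backing up the existing file. If your copy of that file already contains other properties, merge the two property elements by hand instead of overwriting it.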


Additional Slider Commands

You can use the following Slider commands to manage your existing Presto application.

Check the Status

To check the status of the running application, run the following command. The status is written to the file status_file.

bin/slider status presto1 --out status_file

Check where the coordinator is running

Use the following command to find the host and port of the Presto coordinator after deployment, so that you can connect to it. You can use the output of this command to set the --server flag on the Presto command line.

bin/slider registry --name presto1 --getexp presto
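To pass the coordinator address on to the Presto command line, you may want to extract it from the registry output. The following is a rough sketch that assumes the command prints JSON in which the address appears as the first "value" field. The sample document below is hypothetical and the real output may differ, so check it against your own cluster. The function name coordinator_from_registry is also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: reads JSON on stdin and prints the first "value" field it finds.
# Assumption: the registry output carries the coordinator address in such a field.
coordinator_from_registry() {
  sed -n 's/.*"value"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical sample input for illustration; not captured from a real cluster.
sample='{
  "coordinator_address" : [ {
    "value" : "master:8080",
    "level" : "application"
  } ]
}'

# Extract the address from the sample document.
printf '%s\n' "$sample" | coordinator_from_registry
```

On a real cluster, the same function could be fed from the command itself, for example bin/slider registry --name presto1 --getexp presto | coordinator_from_registry, and the result passed to the --server flag. A line-based sed filter is used here to avoid extra dependencies; a JSON-aware tool would be more robust if the output layout varies.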

You can also view this information through the Slider REST API and the YARN Application UI.

Destroy the App and Re-create

To re-create the app after a failure, or to reconfigure Presto (for example, to add a new connector), destroy the app and create it again:

bin/slider destroy presto1
bin/slider create presto1 --template appConfig.json --resources resources.json

Completely Remove the App

Delete the app including the app package.

bin/slider package --delete --name PRESTO

‘Flex’ible App

Flex the number of Presto workers to the new value. If greater than before, new copies of the worker will be requested. If less, component instances will be destroyed.

Changes are immediate and depend on the availability of resources in the YARN cluster. When adding workers, make sure that extra nodes are available with YARN NodeManagers running and with the Presto data directory pre-created and owned by the yarn user. Also make sure that these nodes do not already have a Presto component running; otherwise, flexing may deploy a worker on such a node, and that worker will eventually fail.

For example, assume that there are 2 nodes (with YARN NodeManagers running) in the cluster and that you initially deployed Presto via Slider on only one of them. To deploy and start the Presto WORKER component on the second node (assuming it meets all resource requirements), bringing the total number of WORKERs to 2, run:

bin/slider flex presto1 --component WORKER 2

Note that if your cluster already has 3 WORKER nodes running, the above command destroys one of them and retains 2 WORKERs.
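Because the number given to flex is the target total rather than an increment, a mistyped value can remove workers unintentionally. The following sketch shows one way to guard against that: a hypothetical helper, flex_workers_cmd, checks that the count is a non-negative integer and prints the command for review instead of running it. Only the flex syntax shown above is used; the helper itself is not part of Slider.

```shell
#!/usr/bin/env bash
# Sketch only: validates the target worker count and prints the flex command.
# flex_workers_cmd is a hypothetical helper; it does not call Slider itself.
flex_workers_cmd() {
  local app="$1"     # application name, for example presto1
  local count="$2"   # target TOTAL number of WORKER instances, not a delta
  case "$count" in
    ''|*[!0-9]*)
      echo "worker count must be a non-negative integer: '$count'" >&2
      return 1
      ;;
  esac
  echo "bin/slider flex $app --component WORKER $count"
}

# Print the command that would set the total number of workers to 2.
flex_workers_cmd presto1 2
```

Once you have reviewed the printed command, you can run it from the slider-0.80.0-incubating directory as the user that owns the application.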


Advanced Configuration Options

The following advanced configuration options are available:

  • Configuring memory, CPU, and YARN CGroups
  • Failure policy
  • YARN label

For more information, see Advanced Configuration Options for YARN-Based Clusters.