gcloud dataproc jobs submit - submit Dataproc jobs to execute on a cluster
gcloud dataproc jobs submit COMMAND [--async] [--bucket=BUCKET] [--region=REGION] [GCLOUD_WIDE_FLAG ...]
Submit Dataproc jobs to execute on a cluster.
To submit a Hadoop MapReduce job, run:
$ gcloud dataproc jobs submit hadoop --cluster my-cluster \
    --jar my_jar.jar -- arg1 arg2
To submit a Spark Scala or Java job, run:
$ gcloud dataproc jobs submit spark --cluster my-cluster \
    --jar my_jar.jar -- arg1 arg2
To submit a PySpark job, run:
$ gcloud dataproc jobs submit pyspark --cluster my-cluster \
    my_script.py -- arg1 arg2
To submit a Spark SQL job, run:
$ gcloud dataproc jobs submit spark-sql --cluster my-cluster \
    --file my_queries.q
To submit a Pig job, run:
$ gcloud dataproc jobs submit pig --cluster my-cluster \
    --file my_script.pig
To submit a Hive job, run:
$ gcloud dataproc jobs submit hive --cluster my-cluster \
    --file my_queries.q
- --async
Return immediately, without waiting for the operation in progress to complete. A sketch of following an asynchronously submitted job appears after this flag list.
- --bucket=BUCKET
The Cloud Storage bucket to stage files in. Defaults to the cluster's configured bucket. A usage sketch follows this flag list.
- --region=REGION
Dataproc region to use. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Overrides the default dataproc/region property value for this command invocation. A sketch of setting that default follows this flag list.
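With --async, the command returns as soon as the job is submitted rather than streaming driver output. As a minimal sketch (JOB_ID stands in for the ID reported at submission time, and the region value is only an example), the job can then be followed or inspected with the jobs wait and describe commands:
$ gcloud dataproc jobs submit pyspark --cluster my-cluster --async \
    my_script.py
$ gcloud dataproc jobs wait JOB_ID --region=us-central1
$ gcloud dataproc jobs describe JOB_ID --region=us-central1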
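To stage job files somewhere other than the cluster's configured bucket, --bucket can be passed explicitly. In this sketch the bucket name is a placeholder and the flag is assumed to take a bare bucket name:
$ gcloud dataproc jobs submit pyspark --cluster my-cluster \
    --bucket my-staging-bucket my_script.py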
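To avoid passing --region on every invocation, the dataproc/region property can be set once with gcloud config; the region value below is only an example:
$ gcloud config set dataproc/region us-central1
$ gcloud dataproc jobs submit hive --cluster my-cluster \
    --file my_queries.q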
These flags are available to all commands: --help.
Run $ gcloud help for details.
COMMAND is one of the following:
- hadoop
Submit a Hadoop job to a cluster.
- hive
Submit a Hive job to a cluster.
- pig
Submit a Pig job to a cluster.
- presto
Submit a Presto job to a cluster (see the sketch after this list).
- pyspark
Submit a PySpark job to a cluster.
- spark
Submit a Spark job to a cluster.
- spark-r
Submit a SparkR job to a cluster (see the sketch after this list).
- spark-sql
Submit a Spark SQL job to a cluster.
- trino
Submit a Trino job to a cluster.
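Not every variant is covered by the examples above. As a sketch, assuming the presto variant accepts a query via --execute and the spark-r variant takes an R script as its positional argument (mirroring the pyspark form), submissions might look like:
$ gcloud dataproc jobs submit presto --cluster my-cluster \
    --execute "SELECT * FROM my_table"
$ gcloud dataproc jobs submit spark-r --cluster my-cluster \
    my_script.R -- arg1 arg2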
These variants are also available:
$ gcloud alpha dataproc jobs submit
$ gcloud beta dataproc jobs submit