NAME

gcloud dataproc jobs submit pig - submit a Pig job to a cluster

SYNOPSIS

gcloud dataproc jobs submit pig (--cluster=CLUSTER | --cluster-labels=[KEY=VALUE,...]) (--execute=QUERY, -e QUERY | --file=FILE, -f FILE) [--async] [--bucket=BUCKET] [--continue-on-failure] [--driver-log-levels=[PACKAGE=LEVEL,...]] [--jars=[JAR,...]] [--labels=[KEY=VALUE,...]] [--max-failures-per-hour=MAX_FAILURES_PER_HOUR] [--max-failures-total=MAX_FAILURES_TOTAL] [--params=[PARAM=VALUE,...]] [--properties=[PROPERTY=VALUE,...]] [--properties-file=PROPERTIES_FILE] [--region=REGION] [GCLOUD_WIDE_FLAG ...]

DESCRIPTION

Submit a Pig job to a cluster.

EXAMPLES

To submit a Pig job with a local script, run:

$ gcloud dataproc jobs submit pig --cluster=my-cluster \ --file=my_queries.pig

To submit a Pig job with inline queries, run:

$ gcloud dataproc jobs submit pig --cluster=my-cluster \ -e="LNS = LOAD 'gs://my_bucket/my_file.txt' AS (line)" \ -e="WORDS = FOREACH LNS GENERATE FLATTEN(TOKENIZE(line)) AS \ word" -e="GROUPS = GROUP WORDS BY word" \ -e="WORD_COUNTS = FOREACH GROUPS GENERATE group, COUNT(WORDS)" \ -e="DUMP WORD_COUNTS"

REQUIRED FLAGS

Exactly one of these must be specified:
--cluster=CLUSTER

The Dataproc cluster to submit the job to.

--cluster-labels=[KEY=VALUE,...]

List of label KEY=VALUE pairs to add.

Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.

Labels of Dataproc cluster on which to place the job.

Exactly one of these must be specified:
--execute=QUERY, -e QUERY

A Pig query to execute as part of the job.

--file=FILE, -f FILE

HCFS URI of file containing Pig script to execute as the job.

OPTIONAL FLAGS

--async

Return immediately, without waiting for the operation in progress to complete.

--bucket=BUCKET

The Cloud Storage bucket to stage files in. Defaults to the cluster's configured bucket.

--continue-on-failure

Whether to continue if a single query fails.

--driver-log-levels=[PACKAGE=LEVEL,...]

A list of package to log4j log level pairs to configure driver logging. For example: root=FATAL,com.example=INFO

--jars=[JAR,...]

Comma separated list of jar files to be provided to Pig and MR. May contain UDFs.

--labels=[KEY=VALUE,...]

List of label KEY=VALUE pairs to add.

Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.

--max-failures-per-hour=MAX_FAILURES_PER_HOUR

Specifies the maximum number of times a job can be restarted per hour in event of failure. Default is 0 (no retries after job failure).

--max-failures-total=MAX_FAILURES_TOTAL

Specifies the maximum total number of times a job can be restarted after the job fails. Default is 0 (no retries after job failure).

--params=[PARAM=VALUE,...]

A list of key value pairs to set variables in the Pig queries.

--properties=[PROPERTY=VALUE,...]

A list of key value pairs to configure Pig.

--properties-file=PROPERTIES_FILE

Path to a local file or a file in a Cloud Storage bucket containing configuration properties for the job. The client machine running this command must have read permission to the file.

Specify properties in the form of property=value in the text file. For example:

# Properties to set for the job: key1=value1 key2=value2 # Comment out properties not used. # key3=value3

If a property is set in both --properties and --properties-file, the value defined in --properties takes precedence.

--region=REGION

Dataproc region to use. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Overrides the default dataproc/region property value for this command invocation.

GCLOUD WIDE FLAGS

These flags are available to all commands: --access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.

Run $ gcloud help for details.

NOTES

These variants are also available:

$ gcloud alpha dataproc jobs submit pig

$ gcloud beta dataproc jobs submit pig