Getting Started with gcloud CLI: A Practical Guide Using Document AI

Learn Google Cloud's gcloud CLI from scratch. We'll install gcloud, configure projects, enable APIs, and build a practical document extraction pipeline using Document AI and Vertex AI Gemini.

The Google Cloud CLI (gcloud) is one of the most powerful tools in a cloud developer’s arsenal. While the web console is great for exploration, the CLI is where real productivity happens—automation, scripting, and rapid iteration.

In this guide, we’ll go from zero to extracting structured data from PDFs using Google Cloud’s Document AI and Vertex AI. Along the way, you’ll learn the essential gcloud commands that form the foundation of any GCP workflow.

Why Use the gcloud CLI?

Before we dive in, here’s why you should invest time learning the CLI:

  • Speed: Perform tasks faster than clicking through the console
  • Automation: Script your infrastructure for CI/CD pipelines
  • Reproducibility: Version control your cloud setup alongside your code
  • Full API Access: Some features are CLI-only or available before the console

Installation

On macOS, the easiest path is Homebrew:

brew install --cask google-cloud-sdk

After installation, add gcloud to your shell profile:

# Add to ~/.bash_profile (bash) or ~/.zshrc (zsh users: source the
# matching path.zsh.inc and completion.zsh.inc files instead)
source "/opt/homebrew/share/google-cloud-sdk/path.bash.inc"
source "/opt/homebrew/share/google-cloud-sdk/completion.bash.inc"

Verify it’s working:

gcloud --version

Initial Configuration

Initialize gcloud to authenticate and set your default project:

gcloud init

This opens a browser for Google authentication and walks you through project selection. You can also configure settings manually:

# Set default project
gcloud config set project YOUR_PROJECT_ID

# Set default region (optional but useful)
gcloud config set compute/region us-central1

# View current config
gcloud config list

Creating a New Project

Project IDs must be globally unique, 6-30 characters long, start with a lowercase letter, contain only lowercase letters, digits, or hyphens, and must not end with a hyphen:

gcloud projects create my-project-name --name="My Project"
gcloud config set project my-project-name
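
Those naming rules are easy to trip over, and a bad ID only fails at create time. A small local validator (a sketch of my own, not part of any Google library) catches mistakes up front:

```python
import re

# 6-30 chars, starts with a lowercase letter, contains only lowercase
# letters, digits, and hyphens, and does not end with a hyphen.
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")

def is_valid_project_id(project_id: str) -> bool:
    """Check a candidate ID against the documented naming rules."""
    return bool(PROJECT_ID_RE.match(project_id))
```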

Understanding gcloud Command Structure

All gcloud commands follow a consistent pattern:

gcloud [GROUP] [COMMAND] [FLAGS]

For example:

  • gcloud compute instances list — list VMs
  • gcloud storage buckets list — list Cloud Storage buckets
  • gcloud services enable documentai.googleapis.com — enable an API
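
Because the pattern is so regular, wrapping it for scripts is straightforward. A minimal sketch (the helper names are my own; --format=json is a real gcloud flag that makes the output machine-readable):

```python
import json
import subprocess

def gcloud_cmd(*args: str) -> list:
    """Assemble a gcloud invocation: gcloud GROUP COMMAND FLAGS --format=json."""
    return ["gcloud", *args, "--format=json"]

def run_gcloud(*args: str):
    """Run the assembled command and parse its JSON output (requires gcloud)."""
    return json.loads(subprocess.check_output(gcloud_cmd(*args)))

# e.g. run_gcloud("compute", "instances", "list") returns a list of VM dicts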

Getting Help

gcloud help                      # General help
gcloud compute instances --help  # Help for a specific command
gcloud cheat-sheet               # Quick reference

Billing Management

Before using paid services, you need billing configured:

# List billing accounts
gcloud billing accounts list

# Check billing status for a project
gcloud billing projects describe PROJECT_ID

# Link billing to a project
gcloud billing projects link PROJECT_ID \
  --billing-account=BILLING_ACCOUNT_ID

Enabling APIs

GCP services require explicit API enablement:

# Enable Document AI
gcloud services enable documentai.googleapis.com

# Enable Vertex AI
gcloud services enable aiplatform.googleapis.com

# List enabled APIs
gcloud services list --enabled

Document AI: From PDF to Data

Now let’s put this knowledge to work. Document AI is Google’s service for extracting structured data from documents using machine learning.

Creating a Processor

Document AI doesn’t have direct gcloud commands, so we use the REST API:

# Get auth token
TOKEN=$(gcloud auth print-access-token)

# Create an OCR processor
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"type": "OCR_PROCESSOR", "displayName": "my-ocr-processor"}' \
  "https://us-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/us/processors"

This returns the processor’s details, including its resource name (projects/PROJECT_ID/locations/us/processors/PROCESSOR_ID); the PROCESSOR_ID at the end of that name is what you’ll use in the :process requests below.

Processing a Document

Here’s a Python script that sends a PDF to Document AI for text extraction:

import urllib.request
import base64
import json
import subprocess

# Get auth token from gcloud
token = subprocess.check_output([
    'gcloud', 'auth', 'print-access-token'
]).decode().strip()

processor_url = "https://us-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/us/processors/PROCESSOR_ID:process"

# Read and encode PDF
with open("document.pdf", "rb") as f:
    content = base64.b64encode(f.read()).decode()

# Send to Document AI
data = json.dumps({
    "rawDocument": {
        "content": content,
        "mimeType": "application/pdf"
    }
}).encode()

req = urllib.request.Request(
    processor_url,
    data=data,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    },
    method="POST"
)

with urllib.request.urlopen(req, timeout=60) as response:
    result = json.loads(response.read().decode())
    text = result["document"]["text"]
    print(text)
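
The response holds more than the flat text: each page’s layout points back into document.text through text anchors. Here is a sketch of recovering per-page text, assuming the documented response shape (in JSON the segment indices arrive as strings, and startIndex is omitted when it is zero):

```python
def page_texts(document: dict) -> list:
    """Slice document['text'] into per-page strings using layout text anchors."""
    full_text = document["text"]
    pages = []
    for page in document.get("pages", []):
        segments = page["layout"]["textAnchor"].get("textSegments", [])
        parts = []
        for seg in segments:
            start = int(seg.get("startIndex", 0))  # omitted when zero
            end = int(seg["endIndex"])
            parts.append(full_text[start:end])
        pages.append("".join(parts))
    return pages
```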

Available Processor Types

Document AI offers specialized processors for different document types:

  • OCR_PROCESSOR: general text extraction
  • FORM_PARSER_PROCESSOR: forms with key-value pairs
  • INVOICE_PROCESSOR: invoice parsing
  • EXPENSE_PROCESSOR: receipt parsing
  • FORM_W2_PROCESSOR: W-2 tax forms

The Extraction Challenge

While Document AI excels at OCR, extracting specific fields from varied documents (names, addresses, dates) is tricky. Regex patterns work for consistent formats but fail on real-world documents with different layouts.

The solution? Combine Document AI’s OCR with an LLM for intelligent extraction.

Enter Vertex AI Gemini

Vertex AI provides access to Google’s Gemini models. Here’s how to use it for smart data extraction:

import urllib.request
import json
import subprocess

token = subprocess.check_output([
    'gcloud', 'auth', 'print-access-token'
]).decode().strip()

gemini_url = "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.0-flash-001:generateContent"

# OCR text from Document AI
ocr_text = "..."  # Your extracted text here

prompt = """Extract the following from this legal document.
Return ONLY valid JSON:

{
  "purchaser_name": "",
  "property_address": "",
  "date_filed": "",
  "mailing_address": ""
}

Document:
""" + ocr_text[:8000]

data = json.dumps({
    "contents": [{"role": "user", "parts": [{"text": prompt}]}],
    "generationConfig": {
        "temperature": 0.1,
        "maxOutputTokens": 1024
    }
}).encode()

req = urllib.request.Request(
    gemini_url,
    data=data,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    },
    method="POST"
)

with urllib.request.urlopen(req, timeout=60) as response:
    result = json.loads(response.read().decode())
    extracted = result["candidates"][0]["content"]["parts"][0]["text"]
    print(extracted)
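
Even at temperature 0.1, the model will sometimes wrap its answer in Markdown fences or add a stray sentence, so it pays to parse the reply defensively. A sketch:

```python
import json
import re

def parse_model_json(reply: str) -> dict:
    """Extract the first JSON object from a model reply, tolerating ``` fences."""
    # Strip optional Markdown fences such as ```json ... ```
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", reply.strip())
    # Fall back to the first {...} span if prose surrounds the object
    match = re.search(r"\{.*\}", cleaned, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in model reply")
    return json.loads(match.group(0))
```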

The Complete Pipeline

Here’s the full workflow for processing a batch of documents:

  1. OCR with Document AI (extract raw text)
  2. Classify documents (identify type from headers)
  3. Extract with Gemini (intelligent field extraction)
  4. Output to CSV (structured data)
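
The steps above can be sketched as a small driver. The ocr, classify, and extract callables stand in for the Document AI and Gemini calls shown earlier, and the field names are the ones from the prompt:

```python
import csv

FIELDS = ["file", "doc_type", "purchaser_name", "property_address", "date_filed"]

def run_pipeline(pdf_paths, ocr, classify, extract, out_csv="results.csv"):
    """OCR, classify, and extract each document, then write one CSV row per file."""
    rows = []
    for path in pdf_paths:
        text = ocr(path)                              # Document AI OCR
        row = {"file": path, "doc_type": classify(text)}
        row.update(extract(text))                     # Gemini field extraction
        rows.append(row)
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return rows
```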

Results Comparison

Using this approach on 27 legal documents:

  • Regex patterns: 10% of names, 52% of addresses, 100% of dates
  • Gemini extraction: 96% of names, 100% of addresses, 100% of dates

The LLM dramatically outperforms regex for varied document formats.

Cost Breakdown

Both services have generous free tiers:

  • Document AI OCR: 1,000 pages/month free, then $1.50 per 1,000 pages
  • Document AI Form Parser: 1,000 pages/month free, then $30 per 1,000 pages
  • Vertex AI Gemini: free tier varies; roughly $0.075 per 1M input tokens

Our 27-document test batch (109 pages) cost approximately $0.02.
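
The arithmetic is easy to reproduce with a tiny helper using the OCR rates above (free tier first, then per-1,000-page billing):

```python
def ocr_cost(pages, free_pages=1000, rate_per_1k=1.50):
    """Document AI OCR cost in dollars for a monthly page count."""
    billable = max(0, pages - free_pages)
    return billable * rate_per_1k / 1000

# A 109-page batch falls entirely inside the free tier, so the OCR portion
# is $0 and the small remaining cost is effectively the Gemini token spend.
```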

Key gcloud Commands Reference

# Configuration
gcloud config list
gcloud config set project PROJECT_ID

# Authentication
gcloud auth list
gcloud auth login
gcloud auth print-access-token

# Projects
gcloud projects list
gcloud projects create PROJECT_ID

# Billing
gcloud billing accounts list
gcloud billing projects link PROJECT_ID --billing-account=ACCOUNT_ID

# APIs
gcloud services enable SERVICE_NAME
gcloud services list --enabled

# Components
gcloud components list
gcloud components install beta
gcloud components update

Key Takeaways

  • Start with gcloud init to configure authentication and defaults
  • Enable APIs explicitly before using any GCP service
  • Use the REST API for services without direct gcloud commands
  • Combine services (Document AI + Vertex AI) for intelligent pipelines
  • Free tiers are generous for learning and small projects
  • LLMs beat regex for extracting data from varied documents

This post is licensed under CC BY 4.0 by the author.