Getting Started with gcloud CLI: A Practical Guide Using Document AI
Learn Google Cloud's gcloud CLI from scratch. We'll install gcloud, configure projects, enable APIs, and build a practical document extraction pipeline using Document AI and Vertex AI Gemini.
The Google Cloud CLI (gcloud) is one of the most powerful tools in a cloud developer’s arsenal. While the web console is great for exploration, the CLI is where real productivity happens—automation, scripting, and rapid iteration.
In this guide, we’ll go from zero to extracting structured data from PDFs using Google Cloud’s Document AI and Vertex AI. Along the way, you’ll learn the essential gcloud commands that form the foundation of any GCP workflow.
Why Use the gcloud CLI?
Before we dive in, here’s why you should invest time learning the CLI:
- Speed: Perform tasks faster than clicking through the console
- Automation: Script your infrastructure for CI/CD pipelines
- Reproducibility: Version control your cloud setup alongside your code
- Full API Access: Some features are CLI-only or reach the CLI before the web console
Installation
On macOS, the easiest path is Homebrew:
```bash
brew install --cask google-cloud-sdk
```
After installation, add gcloud to your shell profile:
```bash
# Add to ~/.bash_profile (for zsh, source the .zsh.inc variants in ~/.zshrc instead)
source "/opt/homebrew/share/google-cloud-sdk/path.bash.inc"
source "/opt/homebrew/share/google-cloud-sdk/completion.bash.inc"
```
Verify it’s working:
```bash
gcloud --version
```
Initial Configuration
Initialize gcloud to authenticate and set your default project:
```bash
gcloud init
```
This opens a browser for Google authentication and walks you through project selection. You can also configure settings manually:
```bash
# Set default project
gcloud config set project YOUR_PROJECT_ID

# Set default region (optional but useful)
gcloud config set compute/region us-central1

# View current config
gcloud config list
```
Creating a New Project
Project IDs must be globally unique, 6-30 characters, start with a letter, and contain only lowercase letters, digits, or hyphens:
```bash
gcloud projects create my-project-name --name="My Project"
gcloud config set project my-project-name
```
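The naming rules above can be checked locally before you spend a round trip on the API. A minimal sketch — the `valid_project_id` helper is illustrative, not part of gcloud (the pattern also rejects a trailing hyphen, which Google additionally disallows):

```python
import re

# Project ID rules: 6-30 chars, starts with a lowercase letter,
# only lowercase letters, digits, and hyphens; may not end in a hyphen.
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")

def valid_project_id(project_id: str) -> bool:
    """Return True if project_id satisfies the documented naming rules."""
    return bool(PROJECT_ID_RE.fullmatch(project_id))
```

For example, `valid_project_id("my-project-name")` passes, while `"123abc"` (starts with a digit) and `"short"` (under 6 characters) fail.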
Understanding gcloud Command Structure
All gcloud commands follow a consistent pattern:
```bash
gcloud [GROUP] [COMMAND] [FLAGS]
```
For example:
- `gcloud compute instances list` — list VMs
- `gcloud storage buckets list` — list Cloud Storage buckets
- `gcloud services enable documentai.googleapis.com` — enable an API
Getting Help
```bash
gcloud help                        # General help
gcloud compute instances --help    # Help for a specific command
gcloud cheat-sheet                 # Quick reference
```
Billing Management
Before using paid services, you need billing configured:
```bash
# List billing accounts
gcloud billing accounts list

# Check billing status for a project
gcloud billing projects describe PROJECT_ID

# Link billing to a project
gcloud billing projects link PROJECT_ID \
  --billing-account=BILLING_ACCOUNT_ID
```
Enabling APIs
GCP services require explicit API enablement:
```bash
# Enable Document AI
gcloud services enable documentai.googleapis.com

# Enable Vertex AI
gcloud services enable aiplatform.googleapis.com

# List enabled APIs
gcloud services list --enabled
```
Document AI: From PDF to Data
Now let’s put this knowledge to work. Document AI is Google’s service for extracting structured data from documents using machine learning.
Creating a Processor
Document AI doesn’t have direct gcloud commands, so we use the REST API:
```bash
# Get auth token
TOKEN=$(gcloud auth print-access-token)

# Create an OCR processor
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"type": "OCR_PROCESSOR", "displayName": "my-ocr-processor"}' \
  "https://us-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/us/processors"
```
This returns the processor resource as JSON, including its `name` (whose last path segment is the PROCESSOR_ID) and the `processEndpoint` you'll use for document processing.
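Since the PROCESSOR_ID is just the last path segment of the returned `name`, a tiny helper can pull it out. A sketch — the abbreviated `sample` response below mimics the real one:

```python
import json

def processor_id_from_response(body: str) -> str:
    """Extract the processor ID from a Document AI create-processor response."""
    resource_name = json.loads(body)["name"]
    # name looks like: projects/PROJECT/locations/us/processors/PROCESSOR_ID
    return resource_name.rsplit("/", 1)[-1]

# Abbreviated stand-in for the API's JSON response
sample = '{"name": "projects/my-proj/locations/us/processors/abc123", "type": "OCR_PROCESSOR"}'
```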
Processing a Document
Here’s a Python script that sends a PDF to Document AI for text extraction:
```python
import urllib.request
import base64
import json
import subprocess

# Get auth token from gcloud
token = subprocess.check_output([
    'gcloud', 'auth', 'print-access-token'
]).decode().strip()

processor_url = "https://us-documentai.googleapis.com/v1/projects/PROJECT_ID/locations/us/processors/PROCESSOR_ID:process"

# Read and encode PDF
with open("document.pdf", "rb") as f:
    content = base64.b64encode(f.read()).decode()

# Send to Document AI
data = json.dumps({
    "rawDocument": {
        "content": content,
        "mimeType": "application/pdf"
    }
}).encode()

req = urllib.request.Request(
    processor_url,
    data=data,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    },
    method="POST"
)

with urllib.request.urlopen(req, timeout=60) as response:
    result = json.loads(response.read().decode())

text = result["document"]["text"]
print(text)
```
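If you need per-page text rather than the whole document, the v1 response anchors each page into the top-level `document.text` by character offsets (`startIndex` is omitted when zero, and the indices arrive as strings). A sketch under those assumptions — `sample_doc` is a toy stand-in for the real response:

```python
def page_text(document: dict, page_index: int) -> str:
    """Reassemble one page's text from its textAnchor segments.

    Document AI returns character offsets into document["text"];
    startIndex is omitted when it is 0, and indices arrive as strings.
    """
    full_text = document["text"]
    page = document["pages"][page_index]
    segments = page["layout"]["textAnchor"]["textSegments"]
    return "".join(
        full_text[int(seg.get("startIndex", 0)):int(seg["endIndex"])]
        for seg in segments
    )

# Toy document mimicking the response shape
sample_doc = {
    "text": "Page one text.Page two text.",
    "pages": [
        {"layout": {"textAnchor": {"textSegments": [{"endIndex": "14"}]}}},
        {"layout": {"textAnchor": {"textSegments": [{"startIndex": "14", "endIndex": "28"}]}}},
    ],
}
```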
Available Processor Types
Document AI offers specialized processors for different document types:
| Processor | Use Case |
|---|---|
| `OCR_PROCESSOR` | General text extraction |
| `FORM_PARSER_PROCESSOR` | Forms with key-value pairs |
| `INVOICE_PROCESSOR` | Invoice parsing |
| `EXPENSE_PROCESSOR` | Receipt parsing |
| `FORM_W2_PROCESSOR` | W-2 tax forms |
The Extraction Challenge
While Document AI excels at OCR, extracting specific fields from varied documents (names, addresses, dates) is tricky. Regex patterns work for consistent formats but fail on real-world documents with different layouts.
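A quick illustration of the brittleness: a pattern tuned to one filing-date format silently misses the same fact written another way (both documents invented for illustration):

```python
import re

# A pattern written against "Filed: 01/15/2024" style dates
date_re = re.compile(r"Filed:\s*(\d{2}/\d{2}/\d{4})")

doc_a = "Filed: 01/15/2024 in the county clerk's office."
doc_b = "Filed this 15th day of January, 2024."  # same fact, different layout

match_a = date_re.search(doc_a)  # finds the date
match_b = date_re.search(doc_b)  # finds nothing
```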
The solution? Combine Document AI’s OCR with an LLM for intelligent extraction.
Enter Vertex AI Gemini
Vertex AI provides access to Google’s Gemini models. Here’s how to use it for smart data extraction:
```python
import urllib.request
import json
import subprocess

token = subprocess.check_output([
    'gcloud', 'auth', 'print-access-token'
]).decode().strip()

gemini_url = "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-2.0-flash-001:generateContent"

# OCR text from Document AI
ocr_text = "..."  # Your extracted text here

prompt = """Extract the following from this legal document.
Return ONLY valid JSON:
{
  "purchaser_name": "",
  "property_address": "",
  "date_filed": "",
  "mailing_address": ""
}
Document:
""" + ocr_text[:8000]

data = json.dumps({
    "contents": [{"role": "user", "parts": [{"text": prompt}]}],
    "generationConfig": {
        "temperature": 0.1,
        "maxOutputTokens": 1024
    }
}).encode()

req = urllib.request.Request(
    gemini_url,
    data=data,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    },
    method="POST"
)

with urllib.request.urlopen(req, timeout=60) as response:
    result = json.loads(response.read().decode())

extracted = result["candidates"][0]["content"]["parts"][0]["text"]
print(extracted)
```
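Even with "Return ONLY valid JSON" in the prompt, models sometimes wrap the answer in a markdown fence, so a defensive parse is worth the few lines. A sketch (`parse_model_json` is an illustrative helper, not a library function):

```python
import json
import re

def parse_model_json(text: str) -> dict:
    """Parse JSON from a model reply, tolerating an optional ```json fence."""
    stripped = text.strip()
    fence = re.match(r"^```(?:json)?\s*(.*?)\s*```$", stripped, re.DOTALL)
    if fence:
        stripped = fence.group(1)
    return json.loads(stripped)
```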
The Complete Pipeline
Here’s the full workflow for processing a batch of documents:
- OCR with Document AI (extract raw text)
- Classify documents (identify type from headers)
- Extract with Gemini (intelligent field extraction)
- Output to CSV (structured data)
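Glued together, the four steps fit in a few lines. In this sketch, `ocr_document` and `extract_fields` stand in for the Document AI and Gemini calls shown earlier (stubbed here so the example runs standalone), and classification naively inspects the document header:

```python
import csv
import io

FIELDS = ["doc_type", "purchaser_name", "property_address", "date_filed"]

def classify(ocr_text: str) -> str:
    """Naive classification from the document's opening text."""
    header = ocr_text[:200].upper()
    if "DEED" in header:
        return "deed"
    if "LIEN" in header:
        return "lien"
    return "unknown"

def run_pipeline(documents, ocr_document, extract_fields) -> str:
    """OCR -> classify -> extract -> CSV. Returns the CSV as text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for doc in documents:
        text = ocr_document(doc)       # Document AI step
        row = extract_fields(text)     # Gemini step
        row["doc_type"] = classify(text)
        writer.writerow(row)
    return out.getvalue()

# Standalone demo with stubbed OCR/extract steps
demo_csv = run_pipeline(
    ["WARRANTY DEED recorded at the county office"],
    ocr_document=lambda d: d,
    extract_fields=lambda t: {"purchaser_name": "A",
                              "property_address": "B",
                              "date_filed": "C"},
)
```

In a real run you would swap the lambdas for the two HTTP calls above and write the CSV to disk instead of a string buffer.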
Results Comparison
Using this approach on 27 legal documents:
| Method | Names Found | Addresses Found | Dates Found |
|---|---|---|---|
| Regex patterns | 10% | 52% | 100% |
| Gemini extraction | 96% | 100% | 100% |
The LLM dramatically outperforms regex for varied document formats.
Cost Breakdown
Both services have generous free tiers:
| Service | Free Tier | Paid Rate |
|---|---|---|
| Document AI OCR | 1,000 pages/month | $1.50/1K pages |
| Document AI Form Parser | 1,000 pages/month | $30/1K pages |
| Vertex AI Gemini | Varies | ~$0.075/1M input tokens |
For our 27-document test batch (109 pages total), the cost was approximately $0.02 total.
Key gcloud Commands Reference
```bash
# Configuration
gcloud config list
gcloud config set project PROJECT_ID

# Authentication
gcloud auth list
gcloud auth login
gcloud auth print-access-token

# Projects
gcloud projects list
gcloud projects create PROJECT_ID

# Billing
gcloud billing accounts list
gcloud billing projects link PROJECT_ID --billing-account=ACCOUNT_ID

# APIs
gcloud services enable SERVICE_NAME
gcloud services list --enabled

# Components
gcloud components list
gcloud components install beta
gcloud components update
```
Key Takeaways
- Start with gcloud init to configure authentication and defaults
- Enable APIs explicitly before using any GCP service
- Use the REST API for services without direct gcloud commands
- Combine services (Document AI + Vertex AI) for intelligent pipelines
- Free tiers are generous for learning and small projects
- LLMs beat regex for extracting data from varied documents