Metadata Extraction
Understand how HubSpot Deploy reads and processes your metadata from HubSpot portals and Git repositories. This guide explains what happens during extraction and why it matters for your workflows.
Overview
Metadata extraction is the process of reading your HubSpot configuration (workflows, forms, properties, etc.) and preparing it for comparison and deployment.
When you create a comparison or backup, the system:
- Reads metadata from the HubSpot API or a Git repository
- Converts it to a portable format that works across environments
- Stores it for comparison and deployment
This happens automatically in the background, but understanding the process helps you:
- Know what to expect during extraction
- Understand why it takes time
- Troubleshoot issues when they occur
What Gets Extracted?
From HubSpot Portals
The system reads your portal configuration through the HubSpot API:
CRM Configuration:
- Custom objects and properties
- Standard object properties
- Property groups
- Pipelines and stages
- Association labels
Marketing Assets:
- Workflows
- Forms
- Email templates
- Landing pages
- Site pages
- Blog posts
- CTAs
- Campaigns
Lists and Segmentation:
- Contact lists
- Company lists
- Deal lists
- Custom object lists
Sales Tools:
- Sequences
- Quote templates
Users and Teams:
- Owners (users)
- Teams
See Metadata Types for complete details.
From Git Repositories
The system reads JSON/YAML files from your repository:
my-hubspot-metadata/
├── workflows/
│   ├── welcome-workflow.json
│   └── nurture-workflow.json
├── forms/
│   ├── contact-form.json
│   └── demo-request.json
├── custom_objects/
│   └── deals-extended.json
└── properties/
    ├── contact-properties.json
    └── company-properties.json
Each file represents one metadata item in the same format HubSpot uses.
How Extraction Works
Step 1: Reading Data
From HubSpot:
- Connects using your OAuth or Private App credentials
- Fetches metadata through HubSpot API
- Handles pagination for large datasets (e.g., 1000+ workflows)
- Respects API rate limits automatically
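The pagination step can be sketched as a loop that follows a cursor until no next page is reported. This is a minimal illustration, not the product's actual code: the page shape (`results` plus `paging.next.after`) is an assumption modeled on cursor-style list responses, and `iter_all_items` and `fake_fetch_page` are hypothetical names. The fetch function is a stand-in so the sketch runs offline.

```python
from typing import Callable, Iterator, Optional

# Assumed page shape (hypothetical):
# {"results": [...], "paging": {"next": {"after": "<cursor>"}}}
Page = dict


def iter_all_items(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every item by following the cursor until no next page is reported."""
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        yield from page.get("results", [])
        after = page.get("paging", {}).get("next", {}).get("after")
        if after is None:
            break


def fake_fetch_page(after: Optional[str]) -> Page:
    """Stand-in for a real API call: serves 5 items, 2 per page."""
    data = [{"id": str(i)} for i in range(5)]
    start = int(after) if after else 0
    page: Page = {"results": data[start:start + 2]}
    if start + 2 < len(data):
        page["paging"] = {"next": {"after": str(start + 2)}}
    return page


items = list(iter_all_items(fake_fetch_page))
print(len(items))
```

A real client would also pause and retry when the API signals that a rate limit was hit; that handling is omitted here to keep the loop readable.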
From Git:
- Clones or pulls your repository
- Reads JSON/YAML files from directories
- Parses each file into metadata objects
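Reading from Git can be pictured as walking the repository and parsing one file per metadata item. The sketch below assumes the directory layout shown earlier (one directory per metadata type) and handles only JSON so that it needs nothing beyond the standard library; `load_metadata` is a hypothetical helper, and the temporary directory stands in for a cloned repository.

```python
import json
import tempfile
from pathlib import Path


def load_metadata(repo_root: Path) -> dict[str, list[dict]]:
    """Group parsed JSON files by parent directory name (the metadata type)."""
    items: dict[str, list[dict]] = {}
    for path in sorted(repo_root.glob("*/*.json")):
        with path.open(encoding="utf-8") as fh:
            items.setdefault(path.parent.name, []).append(json.load(fh))
    return items


# Build a tiny throwaway repository so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "forms").mkdir()
    (root / "forms" / "contact-form.json").write_text(
        '{"name": "Contact Form"}', encoding="utf-8"
    )
    (root / "workflows").mkdir()
    (root / "workflows" / "welcome-workflow.json").write_text(
        '{"name": "Welcome"}', encoding="utf-8"
    )
    loaded = load_metadata(root)

print(sorted(loaded))
```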
Step 2: Making It Portable
The Problem: HubSpot uses numeric IDs that are different in each portal.
For example, the same owner might be:
- ID 12345 in Production
- ID 67890 in Staging
Because the raw IDs never match, items cannot be compared or deployed between portals by ID alone.
The Solution: Add a stable, human-readable reference (a URN) alongside each portal-specific ID.
Before (portal-specific):
"ownerId": "12345"
After (portable):
"ownerId": "12345"
"__stable__ownerId": "urn:hubspot:users:email:john@example.com"
Now the system can:
- Compare workflows between portals (even if owner IDs differ)
- Deploy workflows to any portal (matching by email, not ID)
- Show you meaningful names instead of cryptic IDs
See URN Management for details.
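To make the idea concrete, here is a minimal sketch of adding a stable owner reference. It assumes each portal provides a lookup from owner ID to email; `add_stable_owner` and the two lookup tables are hypothetical, and only the `__stable__ownerId` field name and URN shape come from the example above.

```python
from typing import Optional


def add_stable_owner(item: dict, owner_emails: dict[str, str]) -> dict:
    """Return a copy of item with a __stable__ownerId URN next to ownerId."""
    result = dict(item)
    email: Optional[str] = owner_emails.get(str(item.get("ownerId")))
    if email is not None:
        result["__stable__ownerId"] = f"urn:hubspot:users:email:{email}"
    return result


# Hypothetical per-portal lookups: same person, different numeric IDs.
production_owners = {"12345": "john@example.com"}
staging_owners = {"67890": "john@example.com"}

prod_item = add_stable_owner({"name": "Welcome", "ownerId": "12345"}, production_owners)
staging_item = add_stable_owner({"name": "Welcome", "ownerId": "67890"}, staging_owners)

# The numeric IDs differ, but the stable references match.
same_owner = prod_item["__stable__ownerId"] == staging_item["__stable__ownerId"]
print(same_owner)
```

The original `ownerId` is kept untouched, so nothing portal-specific is lost; comparison simply looks at the stable field instead.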
Step 3: Converting to YAML
Why YAML?
YAML is a human-readable format that is well suited to configuration:
- Easy to read and understand
- Works great with Git (clean diffs)
- Industry standard for infrastructure-as-code
Example:
"name": "Contact Form"
"formType": "HUBSPOT"
"submitText": "Submit"
"fields":
  - "name": "email"
    "label": "Email Address"
    "required": true
  - "name": "firstname"
    "label": "First Name"
    "required": false
You can view this YAML in the comparison diff viewer to see exactly what changed.
Step 4: Detecting Changes
The system calculates a "fingerprint" (checksum) for each item to quickly detect changes:
- Same fingerprint = No changes, skip it
- Different fingerprint = Something changed, show in comparison
This makes re-extraction much faster because unchanged items are skipped.
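A fingerprint check along these lines might look like the sketch below. The guide only says "checksum", so the choice of SHA-256 over canonical JSON is an assumption, and `fingerprint` and `changed_items` are hypothetical helper names.

```python
import hashlib
import json


def fingerprint(item: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    canonical = json.dumps(item, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_items(previous: dict[str, str], current: dict[str, dict]) -> list[str]:
    """Names of items that are new or whose fingerprint differs from last run."""
    return [
        name
        for name, item in current.items()
        if previous.get(name) != fingerprint(item)
    ]


# Fingerprints stored by an earlier extraction.
previous = {
    "contact-form": fingerprint({"name": "Contact Form", "submitText": "Submit"}),
}

# The same form (keys in a different order) plus one new form.
current = {
    "contact-form": {"submitText": "Submit", "name": "Contact Form"},
    "demo-request": {"name": "Demo Request"},
}

changed = changed_items(previous, current)
print(changed)
```

Sorting keys before hashing matters: without it, two identical items serialized in a different key order would look like a change.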
Extraction Order
Metadata types are extracted in a specific order to handle dependencies:
Phase 1: Foundation (no dependencies)
1. Owners
2. Custom and standard objects
3. Email templates
Phase 2: Marketing (depends on Phase 1)
4. Campaigns
5. Forms
6. CTAs
Phase 3: Automation (depends on Phase 1 & 2)
7. Lists
8. Workflows
9. Sequences
Phase 4: Content (independent)
10. Landing pages
11. Site pages
12. Blog posts
Phase 5: CRM (depends on objects)
13. Pipelines
14. Property groups
15. Association definitions
16. Quote templates
Why this order?
Some metadata types reference others. For example:
- Workflows reference owners, lists, and objects
- Forms reference campaigns
- CTAs reference campaigns, forms, and emails
By extracting in dependency order, the system can properly convert all references to portable URNs.
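Dependency ordering of this kind is a topological sort. The sketch below uses Python's standard `graphlib` and encodes only the references listed above; the type names are illustrative, and the product's real ordering is the fixed phase list, not necessarily computed this way.

```python
from graphlib import TopologicalSorter

# Each metadata type maps to the types it references (must be extracted first).
dependencies = {
    "owners": set(),
    "objects": set(),
    "email_templates": set(),
    "campaigns": set(),
    "lists": set(),
    "forms": {"campaigns"},
    "ctas": {"campaigns", "forms", "email_templates"},
    "workflows": {"owners", "lists", "objects"},
}

# static_order() yields every type after all of the types it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Several valid orders exist for the same graph, so the exact sequence printed may differ from the phase list while still respecting every dependency.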
Progress Tracking
Real-Time Status
During extraction, you'll see real-time progress for each metadata type:
Status indicators:
- ⏳ Not populated: Waiting to start
- 🔄 Populating: Currently extracting
- ✅ Populated: Successfully extracted
- ⚠️ Skipped (missing scopes): OAuth permissions missing
- ❌ Error: Extraction failed
Example progress:
✅ Owners (15 items)
✅ Custom Objects (3 items)
🔄 Workflows (extracting...)
⏳ Forms (waiting...)
⚠️ Sequences (missing scopes)
What Affects Speed?
Portal size:
- Small portal (less than 100 items): 1-2 minutes
- Medium portal (100-1000 items): 3-5 minutes
- Large portal (more than 1000 items): 5-15 minutes
Factors:
- Number of metadata types selected
- Amount of data in each type
- HubSpot API rate limits (burst and daily quotas, which vary by HubSpot plan and app type)
- Network latency
Tip: If you only need specific metadata types, extract only those instead of "all" to save time.
OAuth Scopes and Permissions
Scope Validation
Before extracting each metadata type, the system checks if you have the required OAuth scopes.
If scopes are missing:
- Metadata type is skipped
- Marked as "skipped (missing scopes)"
- Extraction continues with other types
- You'll see a warning in the UI
Common scenarios:
✅ Full access (all scopes granted):
✅ Workflows
✅ Forms
✅ Sequences
✅ All metadata types available
⚠️ Limited access (some scopes missing):
✅ Workflows
✅ Forms
⚠️ Sequences (missing sales-email-read scope)
Required Scopes by Type
| Metadata Type | Required Scope |
|---|---|
| Owners | crm.objects.owners.read |
| Custom Objects | crm.schemas.custom.read |
| Workflows | automation |
| Forms | forms |
| Email Templates | content |
| Lists | crm.lists.read |
| CTAs | content |
| Campaigns | content |
| Landing Pages | content |
| Site Pages | content |
| Blog Posts | content |
| Sequences | sales-email-read |
| Quote Templates | crm.objects.quotes.read |
Solution: If metadata types are skipped, re-authenticate your connection with the required scopes.
See OAuth Scopes for complete details.
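The skip-and-continue behavior can be sketched as a lookup against the table above. This is an illustration under assumptions: `plan_extraction` is a hypothetical function, the status strings are simplified, and only a subset of the table is included.

```python
# Subset of the scope table above: metadata type -> required scope.
REQUIRED_SCOPES = {
    "Owners": "crm.objects.owners.read",
    "Custom Objects": "crm.schemas.custom.read",
    "Workflows": "automation",
    "Forms": "forms",
    "Lists": "crm.lists.read",
    "Sequences": "sales-email-read",
}


def plan_extraction(requested: list[str], granted: set[str]) -> dict[str, str]:
    """Decide per type whether to extract or skip; a skip never stops the rest."""
    plan: dict[str, str] = {}
    for metadata_type in requested:
        scope = REQUIRED_SCOPES[metadata_type]
        if scope in granted:
            plan[metadata_type] = "populate"
        else:
            plan[metadata_type] = f"skipped (missing {scope} scope)"
    return plan


plan = plan_extraction(["Workflows", "Forms", "Sequences"], {"automation", "forms"})
for metadata_type, status in plan.items():
    print(f"{metadata_type}: {status}")
```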
Viewing Extracted Metadata
Instance Observer
After extraction, view your portal's metadata in Instance Observer:
- Navigate to Connections
- Click on your portal
- Go to Metadata tab
You'll see a read-only view of all extracted metadata organized by type.
See Instance Observer for details.
Comparison Diff Viewer
When comparing two sources, view the YAML diff:
- Create a comparison
- Wait for extraction to complete
- Click on any item to see side-by-side YAML comparison
The diff shows:
- Green: Added in target
- Red: Removed from target
- Yellow: Modified between source and target
Common Workflows
Initial Portal Extraction
Goal: Extract all metadata from a portal for the first time
- Navigate to Connections
- Click Connect HubSpot
- Authorize with all required scopes
- Wait for automatic extraction
- View metadata in Instance Observer
Time: 5-15 minutes for a typical portal
Comparison Extraction
Goal: Extract metadata for a specific comparison
- Navigate to Comparisons
- Click New Comparison
- Select source and target
- Click Initialize Comparison
- Wait for both sides to extract
- Review differences in diff viewer
Time: 3-10 minutes per side
Re-Extraction
Goal: Update metadata after portal changes
- Navigate to comparison or Instance Observer
- Click Refresh or Re-extract
- Wait for extraction
- View updated metadata
Time: 1-5 minutes (faster due to change detection)
Troubleshooting
Extraction Stuck in "Populating"
Problem: Metadata type shows "populating" but never completes
Possible causes:
- Large dataset taking time
- API rate limit reached
- Network issues
Solution:
- Wait patiently: Large portals can take 10-15 minutes
- Check progress: Look for item counts increasing
- Refresh page: Sometimes the UI doesn't update
- Try again: If truly stuck after 20 minutes, cancel and retry
Some Metadata Types Skipped
Problem: Metadata types marked as "skipped (missing scopes)"
Cause: Your connection doesn't have required OAuth permissions
Solution:
- Navigate to Connections
- Click on your portal
- Click Re-authenticate
- Grant all requested scopes
- Retry extraction
Example: If "Sequences" is skipped, you need the sales-email-read scope.
See OAuth Scopes for required scopes.
Extraction Very Slow
Problem: Extraction takes longer than expected
Factors affecting speed:
- Portal size: More items = more time
- Metadata types: Extracting "all" takes longer than specific types
- API rate limits: HubSpot limits requests per day
- Network: Slow connection affects speed
Expected times:
- Small portal (less than 100 items): 1-2 minutes
- Medium portal (100-1000 items): 3-5 minutes
- Large portal (1000-5000 items): 5-15 minutes
- Enterprise portal (more than 5000 items): 15-30 minutes
Optimization tips:
- Extract only needed metadata types
- Avoid multiple concurrent extractions
- Ensure stable internet connection
"Failed to Extract" Error
Problem: Extraction fails with error message
Common causes:
1. Connection expired:
- Solution: Re-authenticate your connection
2. Missing permissions:
- Solution: Grant required OAuth scopes
3. API rate limit:
- Solution: Wait 10 minutes and retry
4. Network timeout:
- Solution: Check internet connection and retry
5. HubSpot API issue:
- Solution: Check HubSpot status page, retry later
Can't See Extracted Metadata
Problem: Extraction completed but can't find metadata
Check these locations:
For comparisons:
- Navigate to Comparisons
- Find your comparison
- Click View
- Metadata shown in diff viewer
For portal-level:
- Navigate to Connections
- Click on your portal
- Go to Metadata tab
- Browse extracted metadata
Differences Look Wrong
Problem: Comparison shows unexpected differences
Possible causes:
1. Stale data:
- Solution: Click Refresh to re-extract
2. Manual portal changes:
- Solution: This is expected! The diff shows what changed.
3. Different environments:
- Solution: Some differences between production and staging are expected; confirm which ones are intentional before deploying
4. Timing:
- Solution: Extract both sides close together in time
Best Practices
When to Extract
Before deployments:
- Always extract fresh data before deploying
- Ensures you see latest changes
- Prevents deploying stale configuration
After manual changes:
- Re-extract after editing in HubSpot UI
- Updates comparison with your changes
- Enables accurate drift detection
Regular schedule:
- Daily for active portals
- Weekly for stable portals
- Enables drift detection and audit trail
Scope Management
Initial connection:
- Grant the scopes for every metadata type you plan to extract
- Avoids re-authenticating later
- Enables full metadata extraction
Review skipped types:
- Check which metadata types were skipped
- Understand what you're missing
- Re-authenticate if needed
Least privilege:
- Only grant scopes you actually use
- Reduces security risk
- Simplifies permission management
Performance Tips
Extract selectively:
- Don't always extract "all" metadata types
- Choose specific types you need
- Saves time and resources
Avoid concurrent extractions:
- Don't extract multiple comparisons simultaneously
- Can hit API rate limits
- Slows down all extractions
Clean up old comparisons:
- Delete comparisons you no longer need
- Reduces database size
- Improves overall performance
Related Features
- URN Management: Understanding portable references
- Comparisons: Using extracted metadata
- Deployments: Deploying changes
- OAuth Scopes: Required permissions
- Instance Observer: Viewing metadata
- Metadata Types: Supported types
Next Steps
- Learn about URN Management for portable references
- Create your first comparison
- Understand deployment process
- Set up drift detection