Data Catalog
The Data Catalog provides a hierarchical view of the data entities discovered from your integrations. It is particularly useful for Databricks Unity Catalog and similar data platforms.
Overview
Access the Data Catalog from the main navigation sidebar. It displays:
- Catalogs and schemas
- Tables and views
- Columns and metadata
- Usage statistics and lineage
Navigation
Hierarchical Browser
Navigate the catalog tree:
Catalogs
├── production
│   ├── analytics
│   │   ├── user_events
│   │   ├── orders
│   │   └── products
│   └── ml_features
│       ├── user_embeddings
│       └── product_embeddings
└── staging
    └── raw_data
        ├── events_raw
        └── logs_raw
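The three levels in the tree above (catalog, schema, table) correspond to dotted names such as production.analytics.user_events. A minimal Python sketch of that mapping, assuming a plain nested dictionary as a stand-in for the catalog; `CATALOG_TREE`, `qualified_names`, and `split_qualified_name` are hypothetical names for illustration and not part of the product's API:

```python
# Hypothetical in-memory stand-in for the catalog tree shown above.
CATALOG_TREE = {
    "production": {
        "analytics": ["user_events", "orders", "products"],
        "ml_features": ["user_embeddings", "product_embeddings"],
    },
    "staging": {
        "raw_data": ["events_raw", "logs_raw"],
    },
}


def qualified_names(tree):
    """Return every table as a 'catalog.schema.table' string, in tree order."""
    names = []
    for catalog, schemas in tree.items():
        for schema, tables in schemas.items():
            for table in tables:
                names.append(f"{catalog}.{schema}.{table}")
    return names


def split_qualified_name(name):
    """Split 'catalog.schema.table' into its three parts."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return tuple(parts)
```

Keeping the full dotted name alongside each entity avoids ambiguity when two schemas contain tables with the same short name.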
Search
Use the search bar to find entities:
Search catalogs, schemas, tables...
Search across:
- Entity names
- Descriptions
- Column names
- Tags
Filters
Filter the catalog view:
- Type - Catalog, Schema, Table, View
- Source - Databricks, Custom
- Tags - Filter by applied tags
- Modified - Recently modified entities
Entity Details
Table Details
Click any table to view details:
user_events
production.analytics.user_events

[Overview] [Columns] [Lineage] [Usage] [Metadata]

Description:
User interaction events from web and mobile applications.

Owner: data-team@company.com
Created: 2024-01-15
Last Modified: 2024-03-20
Row Count: 45,231,456
Size: 12.4 GB

Tags: [pii] [retention:90d] [tier:gold]
Columns Tab
View table schema:
| Column | Type | Nullable | Description |
|---|---|---|---|
| event_id | STRING | No | Unique event identifier |
| user_id | STRING | No | User identifier |
| event_type | STRING | No | Type of event |
| event_data | JSON | Yes | Event payload |
| timestamp | TIMESTAMP | No | Event timestamp |
| | STRING | Yes | User email (PII) |
Lineage Tab
Visualize data flow:
events_raw
  │
  ▼
user_events  ◄── You are here
  │
  ├──► reports
  └──► dashboards
Shows:
- Upstream - Source tables
- Downstream - Dependent tables
- Transformations - Processing steps
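Upstream and downstream views amount to walking a dependency graph in opposite directions. The sketch below illustrates this with the tables from the diagram above; the `DOWNSTREAM` edge map and both helper functions are hypothetical, and the catalog's own lineage storage may look quite different:

```python
from collections import deque

# Hypothetical edge map: each key feeds the tables listed as its value.
DOWNSTREAM = {
    "events_raw": ["user_events"],
    "user_events": ["reports", "dashboards"],
}


def downstream_of(table, edges):
    """Return all tables reachable from `table`, in breadth-first order."""
    seen = []
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def upstream_of(table, edges):
    """Return all tables that `table` depends on, by inverting the edges."""
    inverted = {}
    for parent, children in edges.items():
        for child in children:
            inverted.setdefault(child, []).append(parent)
    return downstream_of(table, inverted)
```

Checking `downstream_of` before changing a table's schema gives a quick list of dependents that may be affected.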
Usage Tab
View entity usage statistics:
Usage Statistics (Last 30 Days)
───────────────────────────────
Queries: 12,456
Read by: 8 agents
Write operations: 234
Peak usage: Mon 9am-12pm
Top Consumers:
1. ReportingAgent - 5,200 queries
2. AnalyticsBot - 3,800 queries
3. DataPipeline - 2,100 queries
Metadata Tab
Custom metadata and properties:
Source System: Kafka
Data Quality Score: 98.5%
Classification: Confidential
Retention Policy: 90 days
Compliance:
- GDPR
- CCPA
Custom Properties:
team: analytics
cost_center: CC-1234
sla: tier-1
Managing Entities
Adding Descriptions
- Click the edit icon next to Description
- Enter or update the description
- Click Save
Tagging
Add tags for organization:
- Click "Add Tag"
- Select an existing tag or create a new one
- Tags are searchable and filterable
Common tag patterns:
- pii - Contains personal data
- retention:30d - Data retention policy
- tier:gold - Data quality tier
- team:analytics - Owning team
Ownership
Assign entity ownership:
- Click owner field
- Search for user or team
- Select new owner
- The new owner receives notifications for changes
Data Quality
Quality Indicators
Tables display quality badges:
- 🟢 High Quality - > 95% quality score
- 🟡 Medium Quality - 80-95% quality score
- 🔴 Low Quality - < 80% quality score
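The badge thresholds can be written as a small function. This is only a sketch: `quality_badge` is a hypothetical helper, and it assumes that scores of exactly 80 and 95 fall into the Medium band, as the "80-95%" range above suggests.

```python
def quality_badge(score):
    """Map a quality score (0-100) to the badge label described above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score > 95:
        return "High Quality"
    if score >= 80:  # assumption: both 80 and 95 count as Medium
        return "Medium Quality"
    return "Low Quality"
```

For example, the 98.5% score shown on the Metadata tab would map to "High Quality".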
Quality Metrics
| Metric | Description |
|---|---|
| Completeness | Non-null value percentage |
| Uniqueness | Unique value percentage for key columns |
| Freshness | Time since last update |
| Consistency | Format and range validation |
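To make the first three metrics concrete, here is one way they could be computed over a sample of rows. The function names and the exact formulas are assumptions for illustration; the catalog may compute its scores differently (for instance, by sampling or by weighting columns).

```python
from datetime import datetime


def completeness(rows, column):
    """Percentage of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    non_null = sum(1 for row in rows if row.get(column) is not None)
    return 100.0 * non_null / len(rows)


def uniqueness(rows, column):
    """Percentage of non-null values in `column` that are distinct."""
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        return 0.0
    return 100.0 * len(set(values)) / len(values)


def freshness_hours(last_updated, now):
    """Hours elapsed between the last update and `now`."""
    return (now - last_updated).total_seconds() / 3600


# Made-up sample rows, used only to exercise the functions.
sample = [
    {"event_id": "a", "email": "x@example.com"},
    {"event_id": "b", "email": None},
    {"event_id": "b", "email": "x@example.com"},
    {"event_id": "d", "email": None},
]
```

On this sample, half of the `email` values are null and one `event_id` is duplicated, so both metrics come out below 100%.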
Integration with Policies
Data Classification Policies
Automatically classify sensitive data:
- Navigate to Policies
- Enable "Sensitive Data Detection"
- The policy scans catalog entities
- Sensitive columns are flagged
Access Policies
Monitor data access patterns:
- Track which agents access which tables
- Detect unusual access patterns
- Generate compliance reports
Bulk Operations
Export Catalog
Export catalog metadata:
- Click "Export" button
- Select format (JSON, CSV, Excel)
- Choose scope (all or selected)
- Download file
Import Metadata
Import metadata from external sources:
- Click "Import"
- Upload file (JSON or CSV)
- Map fields
- Review and apply
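The "Map fields" step can be pictured as renaming the columns of the uploaded file to the catalog's field names. The sketch below assumes a CSV upload; the source column names, the target field names in `FIELD_MAP`, and the `map_rows` helper are all hypothetical, since the actual import schema is not described here.

```python
import csv
import io

# Hypothetical mapping from columns in the uploaded file to catalog fields.
FIELD_MAP = {
    "table_name": "name",
    "desc": "description",
    "data_owner": "owner",
}


def map_rows(csv_text, field_map):
    """Rename mapped columns and drop any column that has no mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {field_map[key]: value for key, value in row.items() if key in field_map}
        for row in reader
    ]


# Made-up upload content for illustration.
upload = (
    "table_name,desc,data_owner,extra\n"
    "user_events,User interaction events,data-team@company.com,ignored\n"
)
```

Reviewing the mapped rows before applying them corresponds to the final "Review and apply" step.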
Search Tips
Basic Search
user_events
Finds entities containing "user_events".
Field-Specific Search
column:email
tag:pii
owner:data-team
Wildcards
user_* # Starts with "user_"
*_events # Ends with "_events"
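The two patterns above behave like shell-style globs. As an approximation, Python's standard `fnmatch` module shows which names each pattern would select; whether the catalog's matching is case-sensitive, or supports anything beyond `*`, is an assumption here, and `match_names` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def match_names(pattern, names):
    """Return the names that match a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Table names taken from the catalog tree earlier on this page.
tables = ["user_events", "user_embeddings", "orders", "events_raw"]
```

With these names, `user_*` selects `user_events` and `user_embeddings`, while `*_events` selects only `user_events`.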
Filters
type:table modified:last7days tag:pii
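A combined query like the one above can be read as a set of `field:value` filters plus optional free-text terms. The sketch below shows one way to split such a query; it assumes tokens are separated by whitespace and that a value may itself contain a colon (as in `tag:retention:30d`). `parse_query` is a hypothetical helper, not the catalog's actual parser.

```python
def parse_query(query):
    """Split a query string into (field filters, free-text terms)."""
    filters = {}
    terms = []
    for token in query.split():
        # Split on the first colon only, so values like "retention:30d" survive.
        field, sep, value = token.partition(":")
        if sep and field and value:
            filters.setdefault(field, []).append(value)
        else:
            terms.append(token)
    return filters, terms
```

For example, `parse_query("user_events tag:pii")` yields one `tag` filter and the free-text term `user_events`.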
Best Practices
1. Document Everything
Add descriptions to all entities:
- Tables: What data does it contain?
- Columns: What does each field represent?
- Schemas: What's the purpose of this schema?
2. Use Consistent Tags
Establish tagging conventions:
- PII indicators: pii, confidential
- Retention: retention:30d, retention:1y
- Quality tiers: tier:gold, tier:silver, tier:bronze
- Teams: team:analytics, team:ml
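Conventions like these are easier to keep consistent when they are checked automatically. The sketch below encodes the examples above as regular expressions; the patterns in `TAG_PATTERNS` are assumptions inferred from those examples (for instance, that retention is a number followed by `d` or `y`), and `invalid_tags` is a hypothetical helper.

```python
import re

# Hypothetical conventions inferred from the example tags above.
TAG_PATTERNS = [
    re.compile(r"pii|confidential"),
    re.compile(r"retention:\d+[dy]"),
    re.compile(r"tier:(gold|silver|bronze)"),
    re.compile(r"team:[a-z][a-z0-9_-]*"),
]


def invalid_tags(tags):
    """Return the tags that match none of the agreed conventions."""
    return [
        tag for tag in tags
        if not any(pattern.fullmatch(tag) for pattern in TAG_PATTERNS)
    ]
```

Running such a check over exported catalog metadata would surface variants such as `Tier:Gold` that fragment search and filter results.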
3. Assign Ownership
Every entity should have an owner:
- Responsible for data quality
- Point of contact for questions
- Notified of issues
4. Monitor Usage
Review usage statistics regularly:
- Identify unused entities (candidates for cleanup)
- Find heavily-used entities (candidates for optimization)
- Track access patterns for compliance
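A usage review of this kind can be sketched as a simple pass over per-entity query counts. The `review_usage` helper and the `heavy_threshold` default are hypothetical; a sensible threshold depends on your own workload. The counts for `orders` and `logs_raw` are made up for illustration.

```python
def review_usage(query_counts, heavy_threshold=1000):
    """Split entities into unused (zero queries) and heavily used ones."""
    unused = sorted(name for name, count in query_counts.items() if count == 0)
    heavy = sorted(
        (name for name, count in query_counts.items() if count >= heavy_threshold),
        key=lambda name: -query_counts[name],  # busiest first
    )
    return unused, heavy


# Example 30-day query counts; only the user_events figure comes from the Usage tab example.
counts = {
    "production.analytics.user_events": 12456,
    "production.analytics.orders": 850,
    "staging.raw_data.logs_raw": 0,
}
```

Here `logs_raw` would be flagged as a cleanup candidate and `user_events` as an optimization candidate.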
Next Steps
- Discovery - How entities are discovered
- Policies - Set up data governance policies
- Integrations - Configure Databricks integration