DataHub OpenAPI Guide
Why OpenAPI
The OpenAPI standard is a widely used documentation and design approach for RESTful APIs. To make it easier to integrate with DataHub, we publish an OpenAPI-based set of endpoints.
Read the DataHub API overview to understand the rationale behind the different APIs and when to use each one.
Locating the OpenAPI endpoints
Currently, the OpenAPI endpoints are isolated to a servlet on GMS and are automatically deployed with a GMS server. The servlet includes auto-generation of an OpenAPI UI, also known as Swagger, which is available at GMS_SERVER_HOST:GMS_PORT/openapi/swagger-ui/index.html. For example, the Quickstart running locally exposes this at http://localhost:8080/openapi/swagger-ui/index.html.
The same UI is also exposed through the DataHub frontend, which proxies the same path with the GMS host and port replaced by the DataHub frontend's URL (Local Quickstart link). It is also linked from the dropdown in the top right, under the user profile picture.
Note that you can get the raw JSON or YAML form of the OpenAPI spec by navigating to BASE_URL/openapi/v3/api-docs or BASE_URL/openapi/v3/api-docs.yaml. The raw forms can be fed into codegen systems that support the OpenAPI format to generate client-side code in the language of your choice. We have noticed varying degrees of maturity across languages in these codegen systems, so some may require customization to be fully compatible.
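As a rough illustration of the two raw spec locations described above, the following Java sketch derives them from a base URL and optionally downloads one. The class and method names (OpenApiSpecUrls, specUri, fetchSpec) are hypothetical and not part of DataHub. It assumes Java 11+ for java.net.http, and your deployment may additionally require an Authorization header.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of DataHub: derives the raw spec URLs from a base URL.
class OpenApiSpecUrls {

    // Appends the spec path from the guide to the base URL, tolerating a trailing slash.
    static URI specUri(String baseUrl, boolean yaml) {
        String trimmed = baseUrl.endsWith("/")
            ? baseUrl.substring(0, baseUrl.length() - 1)
            : baseUrl;
        return URI.create(trimmed + "/openapi/v3/api-docs" + (yaml ? ".yaml" : ""));
    }

    // Downloads the spec as text. Assumption: the endpoint is reachable without
    // authentication; add an Authorization header if your deployment requires one.
    static String fetchSpec(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only prints the URLs; call fetchSpec(...) against a running server to download.
        System.out.println(specUri("http://localhost:8080", false));
        System.out.println(specUri("http://localhost:8080/", true));
    }
}
```

The downloaded text can then be saved to a file and passed to the code generator of your choice.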
The OpenAPI UI includes explorable schemas for request and response objects that are fully documented. The models used in the OpenAPI UI are all autogenerated at build time from the PDL models to JSON Schema compatible Java Models.
Understanding the OpenAPI endpoints
While the full OpenAPI spec is always available at GMS_SERVER_HOST:GMS_PORT/openapi/swagger-ui/index.html, here's a quick overview of the main OpenAPI endpoints and their purpose.
Entities (/entities)
The entities endpoints are intended for reads and writes to the metadata graph. The entire DataHub metadata model is available for you to write to (as entity, aspect pairs) or to read an individual entity's metadata from. See examples below.
Relationships (/relationships)
The relationships endpoints are intended for you to query the graph, to navigate relationships from one entity to others. See examples below.
Timeline (/timeline)
The timeline endpoints are intended for querying the versioned history of a given entity over time. For example, you can query a dataset for all schema changes that have happened to it over time, or all documentation changes that have happened to it. See this guide for more details.
Platform (/platform)
The platform endpoints are even lower-level APIs that allow you to write metadata events into the DataHub platform using a standard format.
Example Requests
Entities (/entities) endpoint
POST (UPSERT)
A POST without any additional URL parameters performs an UPSERT of the entity's aspects. The entity will be created if it doesn't exist or updated if it does.
curl --location --request POST 'localhost:8080/openapi/entities/v1/' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer <token>' \
--data-raw '[
  {
    "aspect": {
      "__type": "SchemaMetadata",
      "schemaName": "SampleHdfsSchema",
      "platform": "urn:li:dataPlatform:platform",
      "platformSchema": {
        "__type": "MySqlDDL",
        "tableSchema": "schema"
      },
      "version": 0,
      "created": {
        "time": 1621882982738,
        "actor": "urn:li:corpuser:etl",
        "impersonator": "urn:li:corpuser:jdoe"
      },
      "lastModified": {
        "time": 1621882982738,
        "actor": "urn:li:corpuser:etl",
        "impersonator": "urn:li:corpuser:jdoe"
      },
      "hash": "",
      "fields": [
        {
          "fieldPath": "county_fips_codefg",
          "jsonPath": "null",
          "nullable": true,
          "description": "null",
          "type": {
            "type": {
              "__type": "StringType"
            }
          },
          "nativeDataType": "String()",
          "recursive": false
        },
        {
          "fieldPath": "county_name",
          "jsonPath": "null",
          "nullable": true,
          "description": "null",
          "type": {
            "type": {
              "__type": "StringType"
            }
          },
          "nativeDataType": "String()",
          "recursive": false
        }
      ]
    },
    "entityType": "dataset",
    "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)"
  }
]'
POST (CREATE)
The second POST example writes the update ONLY if the entity doesn't exist. If the entity does exist, the command returns an error instead of overwriting the entity.
In this example we've added the URL parameter createEntityIfNotExists=true.
curl --location --request POST 'localhost:8080/openapi/entities/v1/?createEntityIfNotExists=true' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer <token>' \
--data-raw '<see previous example>'
If the entity doesn't exist, the response is identical to that of the previous example. If the entity already exists, the following error is returned.
422 ValidationExceptionCollection{EntityAspect:(urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD),schemaMetadata) Exceptions: [com.linkedin.metadata.aspect.plugins.validation.AspectValidationException: Cannot perform CREATE if not exists since the entity key already exists.]}
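A client will usually want to tell "created" apart from "already exists". The sketch below shows one way to do that in Java 11+. The class, method, and enum names are hypothetical, the base URL is assumed to include a scheme such as http://, and matching on the message text is an assumption based solely on the sample error above, so it may not hold across DataHub versions.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch, not part of DataHub: a CREATE-only POST and a classifier for its result.
class CreateIfNotExists {

    enum Outcome { CREATED, ALREADY_EXISTS, FAILED }

    // Builds the same request as the curl example, with createEntityIfNotExists=true.
    static HttpRequest buildCreateRequest(String baseUrl, String token, String jsonBody) {
        URI uri = URI.create(baseUrl + "/openapi/entities/v1/?createEntityIfNotExists=true");
        return HttpRequest.newBuilder(uri)
            .header("Content-Type", "application/json")
            .header("Accept", "application/json")
            .header("Authorization", "Bearer " + token)
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }

    // Classifies a response. Assumption: the 422 body contains the phrase
    // shown in the sample error message above.
    static Outcome interpret(int statusCode, String body) {
        if (statusCode >= 200 && statusCode < 300) {
            return Outcome.CREATED;
        }
        if (statusCode == 422 && body != null && body.contains("entity key already exists")) {
            return Outcome.ALREADY_EXISTS;
        }
        return Outcome.FAILED;
    }

    public static void main(String[] args) {
        // "[]" is a placeholder body; use the aspect list from the UPSERT example instead.
        HttpRequest request = buildCreateRequest("http://localhost:8080", "<token>", "[]");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(interpret(422, "Cannot perform CREATE if not exists since the entity key already exists."));
    }
}
```

Sending the request with java.net.http.HttpClient and passing the status code and body to interpret lets the caller skip or log existing entities rather than treating every 422 as a hard failure.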
GET
curl --location --request GET 'localhost:8080/openapi/entities/v1/latest?urns=urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)&aspectNames=schemaMetadata' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer <token>'
DELETE
curl --location --request DELETE 'localhost:8080/openapi/entities/v1/?urns=urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)&soft=true' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer <token>'
Postman Collection
The collection includes a POST, GET, and DELETE for a single entity with a SchemaMetadata aspect.
{
"info": {
"_postman_id": "87b7401c-a5dc-47e4-90b4-90fe876d6c28",
"name": "DataHub OpenAPI",
"description": "A description",
"schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
},
"item": [
{
"name": "entities/v1",
"item": [
{
"name": "post Entities 1",
"request": {
"method": "POST",
"header": [
{
"key": "Content-Type",
"value": "application/json"
},
{
"key": "Accept",
"value": "application/json"
}
],
"body": {
"mode": "raw",
"raw": "[\n {\n \"aspect\": {\n \"__type\": \"SchemaMetadata\",\n \"schemaName\": \"SampleHdfsSchema\",\n \"platform\": \"urn:li:dataPlatform:platform\",\n \"platformSchema\": {\n \"__type\": \"MySqlDDL\",\n \"tableSchema\": \"schema\"\n },\n \"version\": 0,\n \"created\": {\n \"time\": 1621882982738,\n \"actor\": \"urn:li:corpuser:etl\",\n \"impersonator\": \"urn:li:corpuser:jdoe\"\n },\n \"lastModified\": {\n \"time\": 1621882982738,\n \"actor\": \"urn:li:corpuser:etl\",\n \"impersonator\": \"urn:li:corpuser:jdoe\"\n },\n \"hash\": \"\",\n \"fields\": [\n {\n \"fieldPath\": \"county_fips_codefg\",\n \"jsonPath\": \"null\",\n \"nullable\": true,\n \"description\": \"null\",\n \"type\": {\n \"type\": {\n \"__type\": \"StringType\"\n }\n },\n \"nativeDataType\": \"String()\",\n \"recursive\": false\n },\n {\n \"fieldPath\": \"county_name\",\n \"jsonPath\": \"null\",\n \"nullable\": true,\n \"description\": \"null\",\n \"type\": {\n \"type\": {\n \"__type\": \"StringType\"\n }\n },\n \"nativeDataType\": \"String()\",\n \"recursive\": false\n }\n ]\n },\n \"aspectName\": \"schemaMetadata\",\n \"entityType\": \"dataset\",\n \"entityUrn\": \"urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)\"\n }\n]",
"options": {
"raw": {
"language": "json"
}
}
},
"url": {
"raw": "{{baseUrl}}/openapi/entities/v1/",
"host": [
"{{baseUrl}}"
],
"path": [
"openapi",
"entities",
"v1",
""
]
}
},
"response": [
{
"name": "OK",
"originalRequest": {
"method": "POST",
"header": [],
"body": {
"mode": "raw",
"raw": "[\n {\n \"aspect\": {\n \"value\": \"<Error: Too many levels of nesting to fake this schema>\"\n },\n \"aspectName\": \"aliquip ipsum tempor\",\n \"entityType\": \"ut est\",\n \"entityUrn\": \"enim in nulla\",\n \"entityKeyAspect\": {\n \"value\": \"<Error: Too many levels of nesting to fake this schema>\"\n }\n },\n {\n \"aspect\": {\n \"value\": \"<Error: Too many levels of nesting to fake this schema>\"\n },\n \"aspectName\": \"ipsum id\",\n \"entityType\": \"deser\",\n \"entityUrn\": \"aliqua sit\",\n \"entityKeyAspect\": {\n \"value\": \"<Error: Too many levels of nesting to fake this schema>\"\n }\n }\n]",
"options": {
"raw": {
"language": "json"
}
}
},
"url": {
"raw": "{{baseUrl}}/entities/v1/",
"host": [
"{{baseUrl}}"
],
"path": [
"entities",
"v1",
""
]
}
},
"status": "OK",
"code": 200,
"_postman_previewlanguage": "json",
"header": [
{
"key": "Content-Type",
"value": "application/json"
}
],
"cookie": [],
"body": "[\n \"c\",\n \"labore dolor exercitation in\"\n]"
}
]
},
{
"name": "delete Entities",
"request": {
"method": "DELETE",
"header": [
{
"key": "Accept",
"value": "application/json"
}
],
"url": {
"raw": "{{baseUrl}}/openapi/entities/v1/?urns=urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)&soft=true",
"host": [
"{{baseUrl}}"
],
"path": [
"openapi",
"entities",
"v1",
""
],
"query": [
{
"key": "urns",
"value": "urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)",
"description": "(Required) A list of raw urn strings, only supports a single entity type per request."
},
{
"key": "urns",
"value": "labore dolor exercitation in",
"description": "(Required) A list of raw urn strings, only supports a single entity type per request.",
"disabled": true
},
{
"key": "soft",
"value": "true",
"description": "Determines whether the delete will be soft or hard, defaults to true for soft delete"
}
]
}
},
"response": [
{
"name": "OK",
"originalRequest": {
"method": "DELETE",
"header": [],
"url": {
"raw": "{{baseUrl}}/entities/v1/?urns=urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)&soft=true",
"host": [
"{{baseUrl}}"
],
"path": [
"entities",
"v1",
""
],
"query": [
{
"key": "urns",
"value": "urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)"
},
{
"key": "urns",
"value": "officia occaecat elit dolor",
"disabled": true
},
{
"key": "soft",
"value": "true"
}
]
}
},
"status": "OK",
"code": 200,
"_postman_previewlanguage": "json",
"header": [
{
"key": "Content-Type",
"value": "application/json"
}
],
"cookie": [],
"body": "[\n {\n \"rowsRolledBack\": [\n {\n \"urn\": \"urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)\"\n }\n ],\n \"rowsDeletedFromEntityDeletion\": 1\n }\n]"
}
]
},
{
"name": "get Entities",
"protocolProfileBehavior": {
"disableUrlEncoding": false
},
"request": {
"method": "GET",
"header": [
{
"key": "Accept",
"value": "application/json"
}
],
"url": {
"raw": "{{baseUrl}}/openapi/entities/v1/latest?urns=urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)&aspectNames=schemaMetadata",
"host": [
"{{baseUrl}}"
],
"path": [
"openapi",
"entities",
"v1",
"latest"
],
"query": [
{
"key": "urns",
"value": "urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)",
"description": "(Required) A list of raw urn strings, only supports a single entity type per request."
},
{
"key": "urns",
"value": "labore dolor exercitation in",
"description": "(Required) A list of raw urn strings, only supports a single entity type per request.",
"disabled": true
},
{
"key": "aspectNames",
"value": "schemaMetadata",
"description": "The list of aspect names to retrieve"
},
{
"key": "aspectNames",
"value": "labore dolor exercitation in",
"description": "The list of aspect names to retrieve",
"disabled": true
}
]
}
},
"response": [
{
"name": "OK",
"originalRequest": {
"method": "GET",
"header": [],
"url": {
"raw": "{{baseUrl}}/entities/v1/latest?urns=urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)&aspectNames=schemaMetadata",
"host": [
"{{baseUrl}}"
],
"path": [
"entities",
"v1",
"latest"
],
"query": [
{
"key": "urns",
"value": "non exercitation occaecat",
"disabled": true
},
{
"key": "urns",
"value": "urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)"
},
{
"key": "aspectNames",
"value": "non exercitation occaecat",
"disabled": true
},
{
"key": "aspectNames",
"value": "schemaMetadata"
}
]
}
},
"status": "OK",
"code": 200,
"_postman_previewlanguage": "json",
"header": [
{
"key": "Content-Type",
"value": "application/json"
}
],
"cookie": [],
"body": "{\n \"responses\": {\n \"urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)\": {\n \"entityName\": \"dataset\",\n \"urn\": \"urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)\",\n \"aspects\": {\n \"datasetKey\": {\n \"name\": \"datasetKey\",\n \"type\": \"VERSIONED\",\n \"version\": 0,\n \"value\": {\n \"__type\": \"DatasetKey\",\n \"platform\": \"urn:li:dataPlatform:hive\",\n \"name\": \"SampleHiveDataset\",\n \"origin\": \"PROD\"\n },\n \"created\": {\n \"time\": 1650657843351,\n \"actor\": \"urn:li:corpuser:__datahub_system\"\n }\n },\n \"schemaMetadata\": {\n \"name\": \"schemaMetadata\",\n \"type\": \"VERSIONED\",\n \"version\": 0,\n \"value\": {\n \"__type\": \"SchemaMetadata\",\n \"schemaName\": \"SampleHiveSchema\",\n \"platform\": \"urn:li:dataPlatform:hive\",\n \"version\": 0,\n \"created\": {\n \"time\": 1581407189000,\n \"actor\": \"urn:li:corpuser:jdoe\"\n },\n \"lastModified\": {\n \"time\": 1581407189000,\n \"actor\": \"urn:li:corpuser:jdoe\"\n },\n \"hash\": \"\",\n \"platformSchema\": {\n \"__type\": \"KafkaSchema\",\n \"documentSchema\": \"{\\\"type\\\":\\\"record\\\",\\\"name\\\":\\\"SampleHiveSchema\\\",\\\"namespace\\\":\\\"com.linkedin.dataset\\\",\\\"doc\\\":\\\"Sample Hive dataset\\\",\\\"fields\\\":[{\\\"name\\\":\\\"field_foo\\\",\\\"type\\\":[\\\"string\\\"]},{\\\"name\\\":\\\"field_bar\\\",\\\"type\\\":[\\\"boolean\\\"]}]}\"\n },\n \"fields\": [\n {\n \"fieldPath\": \"field_foo\",\n \"nullable\": false,\n \"description\": \"Foo field description\",\n \"type\": {\n \"type\": {\n \"__type\": \"BooleanType\"\n }\n },\n \"nativeDataType\": \"varchar(100)\",\n \"recursive\": false,\n \"isPartOfKey\": true\n },\n {\n \"fieldPath\": \"field_bar\",\n \"nullable\": false,\n \"description\": \"Bar field description\",\n \"type\": {\n \"type\": {\n \"__type\": \"BooleanType\"\n }\n },\n \"nativeDataType\": \"boolean\",\n \"recursive\": false,\n \"isPartOfKey\": false\n }\n ]\n },\n \"created\": {\n \"time\": 1650610810000,\n \"actor\": \"urn:li:corpuser:UNKNOWN\"\n }\n }\n }\n }\n }\n}"
}
]
}
],
"auth": {
"type": "bearer",
"bearer": [
{
"key": "token",
"value": "{{token}}",
"type": "string"
}
]
},
"event": [
{
"listen": "prerequest",
"script": {
"type": "text/javascript",
"exec": [
""
]
}
},
{
"listen": "test",
"script": {
"type": "text/javascript",
"exec": [
""
]
}
}
]
}
],
"event": [
{
"listen": "prerequest",
"script": {
"type": "text/javascript",
"exec": [
""
]
}
},
{
"listen": "test",
"script": {
"type": "text/javascript",
"exec": [
""
]
}
}
],
"variable": [
{
"key": "baseUrl",
"value": "localhost:8080",
"type": "string"
},
{
"key": "token",
"value": "eyJhbGciOiJIUzI1NiJ9.eyJhY3RvclR5cGUiOiJVU0VSIiwiYWN0b3JJZCI6ImRhdGFodWIiLCJ0eXBlIjoiUEVSU09OQUwiLCJ2ZXJzaW9uIjoiMSIsImV4cCI6MTY1MDY2MDY1NSwianRpIjoiM2E4ZDY3ZTItOTM5Yi00NTY3LWE0MjYtZDdlMDA1ZGU3NjJjIiwic3ViIjoiZGF0YWh1YiIsImlzcyI6ImRhdGFodWItbWV0YWRhdGEtc2VydmljZSJ9.pp_vW2u1tiiTT7U0nDF2EQdcayOMB8jatiOA8Je4JJA",
"type": "default"
}
]
}
Relationships (/relationships) endpoint
GET
Sample Request
curl -X 'GET' \
'http://localhost:8080/openapi/relationships/v1/?urn=urn%3Ali%3Acorpuser%3Adatahub&relationshipTypes=IsPartOf&direction=INCOMING&start=0&count=200' \
-H 'accept: application/json'
Sample Response
{
  "start": 0,
  "count": 2,
  "total": 2,
  "entities": [
    {
      "relationshipType": "IsPartOf",
      "urn": "urn:li:corpGroup:bfoo"
    },
    {
      "relationshipType": "IsPartOf",
      "urn": "urn:li:corpGroup:jdoe"
    }
  ]
}
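The start, count, and total fields suggest offset-based paging. The Java sketch below rebuilds the sample request URL and computes the offset of the next page. The class and method names are hypothetical. It assumes that count in the response is the number of entities returned on this page and that total is the overall number of matches, which the sample response suggests but the guide does not state.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

// Hypothetical helper, not part of DataHub: paging over the relationships endpoint.
class RelationshipPaging {

    // Builds the query URL for one page, percent-encoding the URN as in the sample request.
    static URI pageUri(String baseUrl, String urn, String relationshipType,
                       String direction, int start, int count) {
        String query = "urn=" + URLEncoder.encode(urn, StandardCharsets.UTF_8)
            + "&relationshipTypes=" + URLEncoder.encode(relationshipType, StandardCharsets.UTF_8)
            + "&direction=" + direction
            + "&start=" + start
            + "&count=" + count;
        return URI.create(baseUrl + "/openapi/relationships/v1/?" + query);
    }

    // Returns the offset of the next page, or empty when the current page was the last.
    // Assumption: "returned" is the response's count and "total" is the overall match count.
    static OptionalInt nextStart(int start, int returned, int total) {
        int next = start + returned;
        return (returned > 0 && next < total) ? OptionalInt.of(next) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(pageUri("http://localhost:8080", "urn:li:corpuser:datahub",
            "IsPartOf", "INCOMING", 0, 200));
        // With the sample response (start=0, count=2, total=2) there is no further page.
        System.out.println(nextStart(0, 2, 2).isPresent());
    }
}
```

A caller would loop, requesting pageUri with each offset that nextStart yields, until it returns empty.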
Programmatic Usage
Programmatic usage of the models can be done through the Java Rest Emitter, which includes the generated models. A minimal Java project for emitting to the OpenAPI endpoints would need the following dependencies (Gradle format):
dependencies {
    implementation 'io.acryl:datahub-client:<DATAHUB_CLIENT_VERSION>'
    implementation 'org.apache.httpcomponents:httpclient:<APACHE_HTTP_CLIENT_VERSION>'
    implementation 'org.apache.httpcomponents:httpasyncclient:<APACHE_ASYNC_CLIENT_VERSION>'
}
Writing metadata events to the /platform endpoints
The following code emits metadata events through OpenAPI by constructing a list of UpsertAspectRequests. Behind the scenes, this uses the /platform/entities/v1 endpoint to send metadata to GMS.
import io.datahubproject.openapi.generated.DatasetProperties;
import datahub.client.rest.RestEmitter;
import datahub.event.UpsertAspectRequest;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;

public class Main {
    public static void main(String[] args) throws IOException, ExecutionException, InterruptedException {
        // Emitter with default settings.
        RestEmitter emitter = RestEmitter.createWithDefaults();

        List<UpsertAspectRequest> requests = new ArrayList<>();

        // One request per (entity, aspect) pair.
        UpsertAspectRequest upsertAspectRequest = UpsertAspectRequest.builder()
            .entityType("dataset")
            .entityUrn("urn:li:dataset:(urn:li:dataPlatform:bigquery,my-project.my-other-dataset.user-table,PROD)")
            .aspect(new DatasetProperties().description("This is the canonical User profile dataset"))
            .build();
        UpsertAspectRequest upsertAspectRequest2 = UpsertAspectRequest.builder()
            .entityType("dataset")
            .entityUrn("urn:li:dataset:(urn:li:dataPlatform:bigquery,my-project.another-dataset.user-table,PROD)")
            .aspect(new DatasetProperties().description("This is the canonical User profile dataset 2"))
            .build();
        requests.add(upsertAspectRequest);
        requests.add(upsertAspectRequest2);

        // Send both requests as one batch and block until the response arrives.
        System.out.println(emitter.emit(requests, null).get());
        System.exit(0);
    }
}
OpenAPI v3 Features
Conditional Writes
All the create/POST endpoints for aspects support headers in the POST body to support batch APIs. See the docs in the MetadataChangeProposal section for the use of these headers to support conditional write semantics.