Overview
A context is all the information necessary to keep a given operation going. For example, when reading three books at the same time, the page reached in each book is the context needed to continue reading that book.
The Context Service (CS) solves the problem of sharing data and information across multiple systems during data application development.
For example, suppose system B needs a piece of data generated by system A. The usual practice is one of the following:
System B calls a data access interface developed by system A;
System B reads data that system A has written to shared storage.
With CS, systems A and B only need to interact with CS: each writes the data and information it wants to share into CS and reads what it needs from CS, so neither system has to develop adapters for the other. This greatly reduces the call complexity and coupling of information sharing between systems and makes the boundaries of each system clearer.
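The decoupling described above can be sketched as follows. This is a minimal illustration only: `ContextService`, its `put`/`get` methods, the context ID, and the key name are all hypothetical and do not reflect the actual CS API.

```python
class ContextService:
    """Hypothetical in-memory stand-in for CS: a keyed store shared by systems."""

    def __init__(self):
        self._store = {}  # (context_id, key) -> value

    def put(self, context_id, key, value):
        self._store[(context_id, key)] = value

    def get(self, context_id, key, default=None):
        return self._store.get((context_id, key), default)


def system_a(cs, context_id):
    # System A publishes its result to CS instead of exposing its own interface.
    cs.put(context_id, "a.output_table", "db_a.daily_summary")


def system_b(cs, context_id):
    # System B reads from CS and needs no adapter written against system A.
    return cs.get(context_id, "a.output_table")


cs = ContextService()
system_a(cs, "flow-1")
result = system_b(cs, "flow-1")
```

Neither function refers to the other; the only shared dependency is the CS object and the agreed key.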
The metadata context defines the metadata specification.
The metadata context relies on the data middleware. Its main functions are as follows:
Connect to the data middleware and obtain all user metadata (including Hive table metadata, online database table metadata, and other NoSQL metadata such as HBase and Kafka).
Whenever a node needs to access metadata, whether existing metadata or metadata in the application template, it must go through the metadata context. The metadata context records all metadata used by the application template.
New metadata generated by any node must be registered with the metadata context.
When the application template is extracted, the metadata context is abstracted for the application template (mainly, the database tables used are rewritten in the form ${db}.table to avoid data permission problems), and all dependent metadata is packaged with it.
The metadata context is the basis of interactive workflows and of application templates. Consider: when a Widget is defined, how does it know the dimensions of each indicator defined by DataWrangler? How does Qualitis verify the graph report generated by a Widget?
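The access, registration, and template-extraction rules for the metadata context can be sketched as below. The class and method names are assumptions made for illustration, and the placeholder rewrite simply replaces the database part of a `db.table` name with `${db}`; the real implementation may differ.

```python
class MetadataContext:
    """Hypothetical sketch: tracks which table metadata a template uses."""

    def __init__(self, middleware_tables):
        # Existing metadata obtained from the data middleware, as "db.table".
        self._known = set(middleware_tables)
        self._used = set()

    def access(self, table):
        # Every metadata access goes through the context and is recorded.
        if table not in self._known:
            raise KeyError(f"unknown table: {table}")
        self._used.add(table)
        return table

    def register(self, table):
        # Metadata newly generated by a node must be registered before use.
        self._known.add(table)

    def extract_template(self):
        # Abstract concrete database names into the ${db} placeholder.
        return sorted("${db}." + t.split(".", 1)[1] for t in self._used)


ctx = MetadataContext(["sales.orders", "crm.users"])
ctx.access("sales.orders")
ctx.register("tmp.result")
ctx.access("tmp.result")
template_tables = ctx.extract_template()
```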
The data context defines the data specification.
The data context depends on the data middleware and the Linkis computing middleware. Its main functions are as follows:
Connect to the data middleware and obtain all user data information.
Connect to the computing middleware and obtain the data storage information of all nodes.
Whenever a node needs to write temporary results, it must go through the data context, which allocates the storage uniformly.
Whenever a node needs to access data, it must go through the data context.
The data context distinguishes between dependent data and generated data. When the application template is extracted, all dependent data is abstracted and packaged for the application template.
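The following sketch illustrates the rules above: temporary result locations are allocated centrally, every read goes through the context, and dependent data is tracked separately from generated data. `DataContext`, the path layout, and the node names are hypothetical.

```python
import itertools


class DataContext:
    """Hypothetical sketch of uniform temp allocation and dependency tracking."""

    def __init__(self, tmp_root="/tmp/cs"):
        self._tmp_root = tmp_root
        self._counter = itertools.count(1)
        self.dependent = set()   # data read by nodes but produced elsewhere
        self.generated = {}      # allocated path -> node that writes it

    def allocate_temp(self, node):
        # Nodes never pick their own temp location; the context assigns one.
        path = f"{self._tmp_root}/{node}/result_{next(self._counter)}"
        self.generated[path] = node
        return path

    def read(self, path):
        # Any path not generated inside the workflow counts as dependent data.
        if path not in self.generated:
            self.dependent.add(path)
        return path

    def extract_template(self):
        # Only dependent data is packaged with the application template.
        return sorted(self.dependent)


dc = DataContext()
tmp_path = dc.allocate_temp("sql_node")
dc.read("hdfs:///warehouse/orders")
dc.read(tmp_path)
packaged = dc.extract_template()
```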
The resource context defines the resource specification.
The resource context mainly interacts with the Linkis computing middleware. It covers the following resources:
User resource files (such as JAR files, ZIP files, and properties files)
User UDF
User algorithm package
User script
The environment context defines the environment specification.
It covers the following:
Operating system
Software, such as Hadoop and Spark
Package dependencies, such as MySQL-JDBC
The runtime context is all the context information retained when the application template (workflow) is defined and executed.
It assists in defining the workflow/application template, and in prompting for and completing all necessary information when the workflow/application template is executed.
The runtime context is mainly used by Linkis.
The Client module is the entry point for external access to CS and provides an HA function; see the Client architecture design.
The Service module provides a RESTful interface that encapsulates and processes CS requests submitted by the Client; see the Service architecture design.
The context query module provides rich query capabilities that let the Client find key-value pairs in the context; see the ContextSearch architecture design.
The CS listener module provides synchronous and asynchronous event consumption and, similar to ZooKeeper, can notify the Client in real time once a key-value pair is updated; see the Listener architecture design.
The context memory cache module provides fast context retrieval and can monitor and clean up JVM memory usage; see the ContextCache architecture design.
The high availability module provides high availability for CS; see the HighAvailable architecture design.
The persistence module provides the persistence function of CS; see the Persistence architecture design.
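As an illustration of the listener behaviour described above (synchronous and asynchronous consumption of key-value update events), here is a minimal sketch. The class `ContextListenerBus` and its methods are hypothetical and are not the actual Listener module API; the asynchronous path is modelled with a background thread and a queue.

```python
import queue
import threading


class ContextListenerBus:
    """Hypothetical sketch: notifies listeners whenever a key is updated."""

    def __init__(self):
        self._store = {}
        self._sync = []
        self._async = []
        self._events = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def on_update(self, callback, asynchronous=False):
        (self._async if asynchronous else self._sync).append(callback)

    def put(self, key, value):
        self._store[key] = value
        for cb in self._sync:
            cb(key, value)                # synchronous: runs in the caller
        self._events.put((key, value))    # asynchronous: handed to the worker

    def _drain(self):
        while True:
            key, value = self._events.get()
            try:
                for cb in list(self._async):
                    cb(key, value)
            finally:
                self._events.task_done()

    def flush(self):
        # Block until every queued event has been consumed.
        self._events.join()


bus = ContextListenerBus()
seen_sync, seen_async = [], []
bus.on_update(lambda k, v: seen_sync.append((k, v)))
bus.on_update(lambda k, v: seen_async.append((k, v)), asynchronous=True)
bus.put("node1.table", "db.t1")
bus.flush()
```

Synchronous listeners see the update before `put` returns, while asynchronous listeners are decoupled from the writer and only guaranteed to have run after `flush`.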