Python Coprocessor

Introduction

Coprocessor, as in TiDB and HBase or Stored Procedure in SQL, is a method to analyze data in local databases and send it to remote node instead of sending all data to remote node, which then lets remote do the analysis. This way, a lot of data moving costs are saved. This picture below depicts how coprocessor works . The RecordBatch (Basically a column in table with type and nullability metadata) can come from anywhere in the database, and returned RecordBatch can be annotated in python grammar to indicate its metadata like type or nullability. Coprocessor will try its best to convert returned object to a RecordBatch , be it a python list, a RecordBatch computed from parameters, or a constant(which is extends to the same length of input arguments)

Python Coprocessor

Two optional backends

CPython Backend powered by PyO3

This backend is powered by PyO3, enabling the use of your favourite Python libraries (such as NumPy, Pandas, etc.) and allowing Conda to manage your Python environment.

But using it also involves some complications. You must set up the correct Python shared library, which can be a bit challenging. In general, you just need to install the python-dev package. However, if you are using Homebrew to install Python on macOS, you must create a proper soft link to Library/Frameworks/Python.framework. Detailed instructions on using PyO3 crate with different Python Version can be found here

Embedded RustPython Interpreter

An experiment python interpreter to run the coprocessor script, it supports Python 3.10 grammar. You can use all the very Python syntax, see User Guide/Python Coprocessor for more!