Executor module

class joinboost.executor.CudfExecutor(conn, debug=False)

Bases: DataFrameExecutor

class joinboost.executor.DataFrameExecutor(conn, debug=False, df_lib=None)

Bases: DuckdbExecutor

add_table(table: str, table_address)

Add a new table to the database.

Parameters:
  • table (str) – The name of the table to add.

  • table_address (str) – The address of the table to add.

delete_table(table)

Delete a table.

Parameters:

table (str) – The name of the table.

execute_spja_query(spja_data: SPJAData, mode: ExecuteMode = ExecuteMode.WRITE_TO_TABLE)

Executes an SPJA query using the current object’s database connection.

Parameters:
  • spja_data (SPJAData) – The SPJAData object containing the query parameters.

  • mode (ExecuteMode, optional) – The mode in which the query is executed. Default is ExecuteMode.WRITE_TO_TABLE, per the signature above.

    ExecuteMode.WRITE_TO_TABLE
        The query is executed and the results are stored in a new table. The table name is returned.

    ExecuteMode.CREATE_VIEW
        The query is executed and the results are stored in a new view. The view name is returned.

    ExecuteMode.EXECUTE
        The query is executed and the results are returned.

    ExecuteMode.NESTED_QUERY
        Creates a parenthesized query and returns it as a string.

Returns:

The result of the query. Determined by mode.

Return type:

Any

get_schema(table)

Get a list of column names in a table.

Parameters:

table (str) – The name of the table.

Returns:

A list of column names in the table.

Return type:

list

rename_column(table, old_name, new_name)

Rename a column in a table.

Parameters:
  • table (str) – The name of the table.

  • old_name (str) – The old name of the column.

  • new_name (str) – The new name of the column.

class joinboost.executor.DuckdbExecutor(conn, debug=False)

Bases: Executor

Executor object providing methods for executing queries on a DuckDB database.

conn

A DuckDB connection object.

Type:

Connection

debug

A flag to enable/disable debug mode.

Type:

bool

add_table(table: str, table_address)

Add a new table to the database.

Parameters:
  • table (str) – The name of the table to add.

  • table_address (str) – The address of the table to add.

delete_table(table: str)

Delete a table.

Parameters:

table (str) – The name of the table.

execute_spja_query(spja_data: SPJAData, mode: ExecuteMode = ExecuteMode.NESTED_QUERY) → Any

Executes an SPJA query using the current object’s database connection.

Parameters:
  • spja_data (SPJAData) – The SPJAData object containing the query parameters.

  • mode (ExecuteMode, optional) – The mode in which the query is executed. Default is ExecuteMode.NESTED_QUERY.

    ExecuteMode.WRITE_TO_TABLE
        The query is executed and the results are stored in a new table. The table name is returned.

    ExecuteMode.CREATE_VIEW
        The query is executed and the results are stored in a new view. The view name is returned.

    ExecuteMode.EXECUTE
        The query is executed and the results are returned.

    ExecuteMode.NESTED_QUERY
        Creates a parenthesized query and returns it as a string.

Returns:

The result of the query. Determined by mode.

Return type:

Any
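The way the return value depends on the mode can be pictured with a small stand-in. This is a sketch under stated assumptions, not JoinBoost's implementation: `Mode`, `run`, and the table names are hypothetical, and it uses the standard-library `sqlite3` module instead of DuckDB so that it runs without extra dependencies.

```python
import sqlite3
from enum import Enum, auto


class Mode(Enum):  # hypothetical stand-in for ExecuteMode
    WRITE_TO_TABLE = auto()
    CREATE_VIEW = auto()
    EXECUTE = auto()
    NESTED_QUERY = auto()


def run(conn, sql, name, mode=Mode.NESTED_QUERY):
    """Dispatch on the mode; what is returned depends on it."""
    if mode is Mode.WRITE_TO_TABLE:
        conn.execute(f"CREATE TABLE {name} AS {sql}")
        return name  # name of the new table
    if mode is Mode.CREATE_VIEW:
        conn.execute(f"CREATE VIEW {name} AS {sql}")
        return name  # name of the new view
    if mode is Mode.EXECUTE:
        return conn.execute(sql).fetchall()  # the result rows
    if mode is Mode.NESTED_QUERY:
        return f"({sql})"  # nothing is executed, only a string is built
    raise ValueError(f"unknown mode: {mode}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("a", 1), ("a", 2), ("b", 5)])
query = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"

rows = run(conn, query, "unused", Mode.EXECUTE)
table = run(conn, query, "totals", Mode.WRITE_TO_TABLE)
nested = run(conn, query, "unused", Mode.NESTED_QUERY)
```

A nested query string is useful when the result is meant to be embedded in the FROM clause of another query rather than materialized.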

get_schema(table: str) → list

Get a list of column names in a table.

Parameters:

table (str) – The name of the table.

Returns:

A list of column names in the table.

Return type:

list
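For illustration only, a schema lookup of this shape can be written against any SQL engine. The sketch below assumes SQLite (via the standard-library `sqlite3` module) rather than DuckDB, and `column_names` is a hypothetical helper, not part of joinboost.

```python
import sqlite3


def column_names(conn, table):
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
cols = column_names(conn, "sales")
```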

rename_column(table, old_name, new_name)

Rename a column in a table.

Parameters:
  • table (str) – The name of the table.

  • old_name (str) – The old name of the column.

  • new_name (str) – The new name of the column.
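A column rename typically maps to a single ALTER TABLE statement. The sketch below assumes SQLite 3.25 or newer through the standard-library `sqlite3` module; the `rename` helper is hypothetical, and the SQL that DuckdbExecutor actually issues may differ.

```python
import sqlite3


def rename(conn, table, old_name, new_name):
    # Identifiers are interpolated directly, so they must come from trusted code.
    conn.execute(f"ALTER TABLE {table} RENAME COLUMN {old_name} TO {new_name}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
rename(conn, "sales", "amount", "total")

# cursor.description lists the result columns even when the table is empty.
cols = [d[0] for d in conn.execute("SELECT * FROM sales").description]
```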

spja_query(spja_data: SPJAData, parenthesize: bool = True)

Generates an SQL query based on the given SPJAData object and returns the query as a string.

Parameters:
  • spja_data (SPJAData) – The SPJAData object representing the query to be generated.

  • parenthesize (bool, optional) – wrap the query in parentheses. Default is True

Returns:

The generated SQL query as a string.

Return type:

str
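To make the effect of `parenthesize` concrete, here is a minimal sketch of SQL generation from a query description. `MiniSPJA` and `build_query` are hypothetical and far simpler than SPJAData and spja_query (plain strings instead of expression objects, no joins, windows, or sampling).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MiniSPJA:  # hypothetical, reduced stand-in for SPJAData
    aggregate_expressions: dict = field(default_factory=dict)  # alias -> SQL expression
    from_tables: List[str] = field(default_factory=list)
    select_conds: List[str] = field(default_factory=list)
    group_by: List[str] = field(default_factory=list)


def build_query(data: MiniSPJA, parenthesize: bool = True) -> str:
    select = ", ".join(
        f"{expr} AS {alias}" for alias, expr in data.aggregate_expressions.items()
    )
    sql = f"SELECT {select} FROM {', '.join(data.from_tables)}"
    if data.select_conds:
        sql += " WHERE " + " AND ".join(data.select_conds)
    if data.group_by:
        sql += " GROUP BY " + ", ".join(data.group_by)
    # Wrapping in parentheses lets the string be used as a subquery.
    return f"({sql})" if parenthesize else sql


data = MiniSPJA(
    aggregate_expressions={"region": "region", "total": "SUM(amount)"},
    from_tables=["sales"],
    select_conds=["amount > 0"],
    group_by=["region"],
)
plain = build_query(data, parenthesize=False)
wrapped = build_query(data)
```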

update_query(update_expression, table, select_conds: list = [], qualified=True)

Executes an SQL UPDATE statement on a specified table with the provided update_expression.

Parameters:
  • update_expression (str) – A string specifying the update expression to be executed.

  • table (str) – A string specifying the name of the table to execute the update query on.

  • select_conds (list, optional) – A list of strings specifying the selection conditions for the update query. Default is an empty list.

Raises:

Exception – If the specified table does not start with the prefix of the current DuckdbExecutor object.

Return type:

None
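The prefix check described under Raises can be sketched as a guard in front of the UPDATE statement. Everything here is an assumption made for illustration: the `PREFIX` value, the `update` helper, the form of `update_expression` (a SET clause body without the keyword), and the use of `sqlite3` in place of DuckDB.

```python
import sqlite3

PREFIX = "joinboost_tmp_"  # hypothetical prefix for executor-owned tables


def update(conn, update_expression, table, select_conds=None):
    # Only tables created by this executor (recognized by prefix) may be modified.
    if not table.startswith(PREFIX):
        raise Exception(f"refusing to update {table!r}: missing prefix {PREFIX!r}")
    sql = f"UPDATE {table} SET {update_expression}"
    if select_conds:
        sql += " WHERE " + " AND ".join(select_conds)
    conn.execute(sql)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joinboost_tmp_0 (x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO joinboost_tmp_0 VALUES (?, ?)", [(1, 0), (2, 0)])
update(conn, "y = x * 10", "joinboost_tmp_0", ["x > 1"])
result = conn.execute("SELECT x, y FROM joinboost_tmp_0 ORDER BY x").fetchall()

try:
    update(conn, "y = 1", "user_table")
    blocked = False
except Exception:
    blocked = True  # the guard rejected a table without the prefix
```

A guard of this kind keeps the executor from modifying user-supplied tables in place.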

class joinboost.executor.ExecuteMode(value)

Bases: Enum

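An enumeration of query execution modes. The members referenced in this module are WRITE_TO_TABLE, CREATE_VIEW, EXECUTE, and NESTED_QUERY.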

class joinboost.executor.Executor

Bases: ABC

Base executor object. Defines a template for specialized executor objects.

Parameters:
  • view_id (int) – The id of the next view to be created.

  • prefix (str) – The prefix to be used for the view names.

abstract add_table(table: str, table_address)

Add a new table to the database.

Parameters:
  • table (str) – The name of the table to add.

  • table_address (str) – The address of the table to add.

abstract delete_table(table: str)

Delete a table.

Parameters:

table (str) – The name of the table.

abstract execute_spja_query(spja_data: SPJAData, mode: ExecuteMode = ExecuteMode.NESTED_QUERY) → Any

Executes an SPJA query using the current object’s database connection.

Parameters:
  • spja_data (SPJAData) – The SPJAData object containing the query parameters.

  • mode (ExecuteMode, optional) – The mode in which the query is executed. Default is ExecuteMode.NESTED_QUERY.

    ExecuteMode.WRITE_TO_TABLE
        The query is executed and the results are stored in a new table. The table name is returned.

    ExecuteMode.CREATE_VIEW
        The query is executed and the results are stored in a new view. The view name is returned.

    ExecuteMode.EXECUTE
        The query is executed and the results are returned.

    ExecuteMode.NESTED_QUERY
        Creates a parenthesized query and returns it as a string.

Returns:

The result of the query. Determined by mode.

Return type:

Any

get_next_name()

Get a unique name of the next view to be created.
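One plausible reading of the `prefix` and `view_id` parameters is a counter that is appended to a fixed prefix. The sketch below is an assumption about that pattern, not the library's code; `NameCounter` and the `"jb_"` prefix are hypothetical.

```python
class NameCounter:  # hypothetical illustration of a prefix + counter scheme
    def __init__(self, prefix="tmp_"):
        self.prefix = prefix
        self.view_id = 0  # id of the next view to be created

    def get_next_name(self):
        name = f"{self.prefix}{self.view_id}"
        self.view_id += 1  # advance so the next call returns a fresh name
        return name


names = NameCounter(prefix="jb_")
first, second = names.get_next_name(), names.get_next_name()
```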

abstract get_schema(table: str) → list

Get a list of column names in a table.

Parameters:

table (str) – The name of the table.

Returns:

A list of column names in the table.

Return type:

list

abstract rename_column(table, old_name, new_name)

Rename a column in a table.

Parameters:
  • table (str) – The name of the table.

  • old_name (str) – The old name of the column.

  • new_name (str) – The new name of the column.

exception joinboost.executor.ExecutorException

Bases: Exception

joinboost.executor.ExecutorFactory(con=None)

Factory function to create and return Executor objects for different connectors.

Parameters:

con (DuckDBPyConnection or Executor, optional) – The connector to use for creating Executor objects. By default, if con is not specified, the function uses a PandasExecutor.

Returns:

executor – An Executor object for the given connector.

Return type:

Executor

Raises:

ExecutorException – If an unknown connector type is specified, or if the default con is used without installing duckdb.
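A factory of this kind usually dispatches on the type of its argument. The sketch below uses hypothetical stand-in classes (`BaseExec`, `ConnExec`, `DefaultExec`, `FakeConnection`, `FactoryError`) so that it runs without joinboost or duckdb installed; returning an existing executor unchanged is an assumption, not documented behavior.

```python
class BaseExec:  # stand-in for Executor
    pass


class FakeConnection:  # stand-in for a database connection type
    pass


class ConnExec(BaseExec):  # stand-in for a connection-backed executor
    def __init__(self, conn):
        self.conn = conn


class DefaultExec(BaseExec):  # stand-in for the default executor
    pass


class FactoryError(Exception):  # stand-in for ExecutorException
    pass


def make_executor(con=None):
    if con is None:
        return DefaultExec()  # no connector given: fall back to the default
    if isinstance(con, BaseExec):
        return con  # assumption: an executor is passed through as-is
    if isinstance(con, FakeConnection):
        return ConnExec(con)
    raise FactoryError(f"unknown connector type: {type(con).__name__}")


default = make_executor()
wrapped = make_executor(FakeConnection())
```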

class joinboost.executor.PandasExecutor(conn, debug=False)

Bases: DataFrameExecutor

class joinboost.executor.SPJAData(aggregate_expressions: dict = <factory>, from_tables: ~typing.List[str] = <factory>, select_conds: ~typing.List[~joinboost.aggregator.SelectionExpression] = <factory>, join_conds: ~typing.List[~joinboost.aggregator.SelectionExpression] = <factory>, group_by: ~typing.List[~joinboost.aggregator.QualifiedAttribute] = <factory>, window_by: ~typing.List[~joinboost.aggregator.QualifiedAttribute] = <factory>, order_by: ~typing.List[~joinboost.aggregator.QualifiedAttribute] = <factory>, limit: ~typing.Optional[int] = None, sample_rate: ~typing.Optional[float] = None, replace: bool = True, join_type: str = 'INNER', qualified: bool = True)

Bases: object

Data structure for SPJA queries. Can be recursive (e.g., from_tables can be a list of SPJAData objects).

aggregate_expressions

dict mapping column names to tuples containing the aggregation expression and the aggregator object.

Type:

dict

from_tables

list of table names to select from.

Type:

List[str]

select_conds

list of conditions to apply to the SELECT statement.

Type:

List[joinboost.aggregator.SelectionExpression]

join_conds

list of conditions of the form “table1.col1 IS NOT DISTINCT FROM table2.col2”.

Type:

List[joinboost.aggregator.SelectionExpression]

group_by

list of column names to group by.

Type:

List[joinboost.aggregator.QualifiedAttribute]

window_by

list of column names to use for windowing.

Type:

List[joinboost.aggregator.QualifiedAttribute]

order_by

list of columns to use for ordering the results.

Type:

List[joinboost.aggregator.QualifiedAttribute]

limit

maximum number of rows to return.

Type:

Optional[int]

sample_rate

sampling rate to use for the query.

Type:

Optional[float]

replace

if True, replaces an existing table or view with the same name.

Type:

bool

join_type

type of join to use for the query.

Type:

str
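The `<factory>` defaults in the signature indicate dataclass fields built with `default_factory`, which gives every instance its own list or dict. The sketch below shows that behavior and the recursive use of from_tables on a reduced, hypothetical `Query` class; it is not the SPJAData definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Query:  # hypothetical, reduced version of the attributes listed above
    from_tables: List[Union[str, "Query"]] = field(default_factory=list)
    group_by: List[str] = field(default_factory=list)
    limit: Optional[int] = None
    join_type: str = "INNER"


a, b = Query(), Query()
a.from_tables.append("sales")  # each instance owns its list; b is unaffected

# Recursion: the outer query selects from the inner query instead of a table name.
inner = Query(from_tables=["sales"], group_by=["region"])
outer = Query(from_tables=[inner], limit=10)
```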

class joinboost.executor.SparkExecutor(conn, debug=False)

Bases: DuckdbExecutor

add_table(table: str, table_address)

Add a new table to the database.

Parameters:
  • table (str) – The name of the table to add.

  • table_address (str) – The address of the table to add.

case_query(from_table: str, operator: str, cond_attr: str, base_val: str, case_definitions: list, select_attrs: list = [], table_name: Optional[str] = None, order_by: Optional[str] = None)

Executes a SQL query with a CASE statement to perform tree-model prediction. Each CASE represents a tree and each WHEN within a CASE represents a leaf.

Parameters:
  • from_table (str) – Name of the source table.

  • operator (str) – The operator used to combine predictions.

  • cond_attr (str) – Name of the column used in the conditions of the CASE statement.

  • base_val (str) – Base value for the entire tree-model.

  • case_definitions (list) – A list of lists containing the (leaf prediction, leaf predicates) for each tree.

  • select_attrs (list, optional) – List of attributes to be selected. Default is an empty list.

  • table_name (str, optional) – Name of the new table. Default is None.

  • order_by (str, optional) – Name of the table to be ordered by rowid. Default is None.

Returns:

The name of the new table.

Return type:

str
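The "one CASE per tree, one WHEN per leaf" layout can be sketched as plain string building. This is an illustration under assumptions: `case_expression` is a hypothetical helper, predicates are taken to be ready-made SQL strings, and the expression is evaluated with the standard-library `sqlite3` module rather than through SparkExecutor.

```python
import sqlite3


def case_expression(operator, base_val, case_definitions):
    """Combine base_val with one CASE expression per tree using `operator`."""
    terms = [str(base_val)]
    for tree in case_definitions:
        # Each leaf contributes one WHEN branch: its predicates select the leaf,
        # its prediction is the value returned.
        whens = " ".join(
            f"WHEN {' AND '.join(predicates)} THEN {prediction}"
            for prediction, predicates in tree
        )
        terms.append(f"CASE {whens} END")
    return f" {operator} ".join(terms)


trees = [
    [(1.0, ["x <= 2"]), (2.0, ["x > 2"])],    # tree 1: two leaves
    [(0.25, ["x <= 1"]), (0.75, ["x > 1"])],  # tree 2: two leaves
]
expr = case_expression("+", 0.5, trees)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (3,)])
preds = conn.execute(f"SELECT x, {expr} AS pred FROM t ORDER BY x").fetchall()
```

With `+` as the operator, each row's prediction is the base value plus the value of the matching leaf in every tree.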

get_schema(table: str) → list

Get a list of column names in a table.

Parameters:

table (str) – The name of the table.

Returns:

A list of column names in the table.

Return type:

list
