Executor module
- class joinboost.executor.CudfExecutor(conn, debug=False)
Bases:
DataFrameExecutor
- class joinboost.executor.DataFrameExecutor(conn, debug=False, df_lib=None)
Bases:
DuckdbExecutor
- add_table(table: str, table_address)
Add a new table to the database.
- Parameters:
table (str) – The name of the table to add.
table_address (str) – The address of the table to add.
- delete_table(table)
Delete a table.
- Parameters:
table (str) – The name of the table.
- execute_spja_query(spja_data: SPJAData, mode: ExecuteMode = ExecuteMode.WRITE_TO_TABLE)
Executes an SPJA query using the current object’s database connection.
- Parameters:
spja_data (SPJAData) – The SPJAData object containing the query parameters.
mode (ExecuteMode, optional) –
The mode in which the query is executed. Default is ExecuteMode.WRITE_TO_TABLE.
- if ExecuteMode.WRITE_TO_TABLE
The query is executed and the results are stored in a new table. The table name is returned.
- if ExecuteMode.CREATE_VIEW
The query is executed and the results are stored in a new view. The table name is returned.
- if ExecuteMode.EXECUTE
The query is executed and the results are returned.
- if ExecuteMode.NESTED_QUERY
Creates a parenthesized query and returns it as a string.
- Returns:
The result of the query. Determined by mode.
- Return type:
Any
- get_schema(table)
Get a list of column names in a table.
- Parameters:
table (str) – The name of the table.
- Returns:
A list of column names in the table.
- Return type:
list
- rename_column(table, old_name, new_name)
Rename a column in a table.
- Parameters:
table (str) – The name of the table.
old_name (str) – The old name of the column.
new_name (str) – The new name of the column.
- class joinboost.executor.DuckdbExecutor(conn, debug=False)
Bases:
Executor
Executor object providing methods for executing queries on a DuckDB database.
- conn
A DuckDB connection object.
- Type:
Connection
- debug
A flag to enable/disable debug mode.
- Type:
bool
- add_table(table: str, table_address)
Add a new table to the database.
- Parameters:
table (str) – The name of the table to add.
table_address (str) – The address of the table to add.
- delete_table(table: str)
Delete a table.
- Parameters:
table (str) – The name of the table.
- execute_spja_query(spja_data: SPJAData, mode: ExecuteMode = ExecuteMode.NESTED_QUERY) → Any
Executes an SPJA query using the current object’s database connection.
- Parameters:
spja_data (SPJAData) – The SPJAData object containing the query parameters.
mode (ExecuteMode, optional) –
The mode in which the query is executed. Default is ExecuteMode.NESTED_QUERY.
- if ExecuteMode.WRITE_TO_TABLE
The query is executed and the results are stored in a new table. The table name is returned.
- if ExecuteMode.CREATE_VIEW
The query is executed and the results are stored in a new view. The table name is returned.
- if ExecuteMode.EXECUTE
The query is executed and the results are returned.
- if ExecuteMode.NESTED_QUERY
Creates a parenthesized query and returns it as a string.
- Returns:
The result of the query. Determined by mode.
- Return type:
Any
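The difference between the four modes can be pictured with a small stand-in. The sketch below does not call joinboost; it mimics the documented behaviour using Python's built-in sqlite3 module instead of DuckDB. The names `Mode`, `run`, and `target` are hypothetical and exist only for this illustration.

```python
import sqlite3
from enum import Enum, auto


class Mode(Enum):
    """Hypothetical stand-in for joinboost's ExecuteMode."""
    WRITE_TO_TABLE = auto()
    CREATE_VIEW = auto()
    EXECUTE = auto()
    NESTED_QUERY = auto()


def run(conn, sql, mode, target="tmp_0"):
    """Dispatch one SELECT statement according to the requested mode."""
    if mode is Mode.WRITE_TO_TABLE:
        conn.execute(f"CREATE TABLE {target} AS {sql}")
        return target                        # name of the new table
    if mode is Mode.CREATE_VIEW:
        conn.execute(f"CREATE VIEW {target} AS {sql}")
        return target                        # name of the new view
    if mode is Mode.EXECUTE:
        return conn.execute(sql).fetchall()  # the rows themselves
    if mode is Mode.NESTED_QUERY:
        return f"({sql})"                    # nothing is executed
    raise ValueError(f"unknown mode: {mode}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10), ("east", 5), ("west", 7)])
query = ("SELECT region, SUM(amount) AS total "
         "FROM sales GROUP BY region ORDER BY region")

table_name = run(conn, query, Mode.WRITE_TO_TABLE, target="agg_table")
view_name = run(conn, query, Mode.CREATE_VIEW, target="agg_view")
rows = run(conn, query, Mode.EXECUTE)
nested = run(conn, query, Mode.NESTED_QUERY)
```

In this sketch only the nested mode avoids touching the database, which is why its string can be embedded as a subquery in another statement.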
- get_schema(table: str) → list
Get a list of column names in a table.
- Parameters:
table (str) – The name of the table.
- Returns:
A list of column names in the table.
- Return type:
list
- rename_column(table, old_name, new_name)
Rename a column in a table.
- Parameters:
table (str) – The name of the table.
old_name (str) – The old name of the column.
new_name (str) – The new name of the column.
- spja_query(spja_data: SPJAData, parenthesize: bool = True)
Generates an SQL query based on the given SPJAData object and returns the query as a string.
- Parameters:
spja_data (SPJAData) – The SPJAData object representing the query to be generated.
parenthesize (bool, optional) – wrap the query in parentheses. Default is True
- Returns:
The generated SQL query as a string.
- Return type:
str
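To make the idea of turning a structured description into SQL concrete, here is a minimal sketch. It is not joinboost's implementation: `MiniSPJA` and `build_query` are hypothetical names, only a few of the SPJAData fields are modelled, and every field holds plain strings rather than aggregator or expression objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MiniSPJA:
    """Hypothetical, simplified stand-in for SPJAData (plain strings only)."""
    aggregate_expressions: Dict[str, str] = field(default_factory=dict)
    from_tables: List[str] = field(default_factory=list)
    select_conds: List[str] = field(default_factory=list)
    group_by: List[str] = field(default_factory=list)
    limit: Optional[int] = None


def build_query(data: MiniSPJA, parenthesize: bool = True) -> str:
    """Assemble SELECT / FROM / WHERE / GROUP BY / LIMIT into one string."""
    select_items = [f"{expr} AS {alias}"
                    for alias, expr in data.aggregate_expressions.items()]
    parts = ["SELECT " + ", ".join(select_items or ["*"]),
             "FROM " + ", ".join(data.from_tables)]
    if data.select_conds:
        parts.append("WHERE " + " AND ".join(data.select_conds))
    if data.group_by:
        parts.append("GROUP BY " + ", ".join(data.group_by))
    if data.limit is not None:
        parts.append(f"LIMIT {data.limit}")
    sql = " ".join(parts)
    # Parentheses let the result be nested inside another query.
    return f"({sql})" if parenthesize else sql


data = MiniSPJA(
    aggregate_expressions={"region": "region", "total": "SUM(amount)"},
    from_tables=["sales"],
    select_conds=["amount > 0"],
    group_by=["region"],
)
plain = build_query(data, parenthesize=False)
nested = build_query(data)
```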
- update_query(update_expression, table, select_conds: list = [], qualified=True)
Executes an SQL UPDATE statement on a specified table with the provided update_expression.
- Parameters:
update_expression (str) – A string specifying the update expression to be executed.
table (str) – A string specifying the name of the table to execute the update query on.
select_conds (list, optional) – A list of strings specifying the selection conditions for the update query. Default is an empty list.
- Raises:
Exception – If the specified table does not start with the prefix of the current DuckdbExecutor object.
- Return type:
None
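The prefix check and the way selection conditions narrow the update can be sketched as follows. This is an illustration under assumptions, not the library's code: `build_update` and the `PREFIX` value are hypothetical, and sqlite3 stands in for DuckDB.

```python
import sqlite3

# Hypothetical prefix; the real executor's prefix is not given in this reference.
PREFIX = "joinboost_tmp_"


def build_update(update_expression, table, select_conds=None, prefix=PREFIX):
    """Return an UPDATE statement, refusing tables outside the prefix."""
    if not table.startswith(prefix):
        raise Exception(
            f"table {table!r} does not start with prefix {prefix!r}")
    sql = f"UPDATE {table} SET {update_expression}"
    if select_conds:
        # Conditions are combined with AND to restrict the updated rows.
        sql += " WHERE " + " AND ".join(select_conds)
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joinboost_tmp_0 (x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO joinboost_tmp_0 VALUES (?, ?)",
                 [(1, 0), (2, 0), (3, 0)])

sql = build_update("y = x * 2", "joinboost_tmp_0", ["x > 1"])
conn.execute(sql)
rows = conn.execute(
    "SELECT x, y FROM joinboost_tmp_0 ORDER BY x").fetchall()
```

The sketch uses `None` rather than `[]` as the default for `select_conds`, which avoids sharing one list object between calls.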
- class joinboost.executor.ExecuteMode(value)
Bases:
Enum
An enumeration.
- class joinboost.executor.Executor
Bases:
ABC
Base executor object that defines a template for concrete executor objects.
- Parameters:
view_id (int) – The id of the next view to be created.
prefix (str) – The prefix to be used for the view names.
- abstract add_table(table: str, table_address)
Add a new table to the database.
- Parameters:
table (str) – The name of the table to add.
table_address (str) – The address of the table to add.
- abstract delete_table(table: str)
Delete a table.
- Parameters:
table (str) – The name of the table.
- abstract execute_spja_query(spja_data: SPJAData, mode: ExecuteMode = ExecuteMode.NESTED_QUERY) → Any
Executes an SPJA query using the current object’s database connection.
- Parameters:
spja_data (SPJAData) – The SPJAData object containing the query parameters.
mode (ExecuteMode, optional) –
The mode in which the query is executed. Default is ExecuteMode.NESTED_QUERY.
- if ExecuteMode.WRITE_TO_TABLE
The query is executed and the results are stored in a new table. The table name is returned.
- if ExecuteMode.CREATE_VIEW
The query is executed and the results are stored in a new view. The table name is returned.
- if ExecuteMode.EXECUTE
The query is executed and the results are returned.
- if ExecuteMode.NESTED_QUERY
Creates a parenthesized query and returns it as a string.
- Returns:
The result of the query. Determined by mode.
- Return type:
Any
- get_next_name()
Get a unique name of the next view to be created.
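One plausible way to derive unique names from the documented `prefix` and `view_id` values is a simple counter. The sketch below is an assumption about the mechanism, not the library's code; the class `NameCounter` and the prefix string are hypothetical.

```python
class NameCounter:
    """Hypothetical stand-in: unique names from a prefix and a running id."""

    def __init__(self, prefix="joinboost_tmp_"):
        self.prefix = prefix  # shared prefix of every generated name
        self.view_id = 0      # id of the next view to be created

    def get_next_name(self):
        """Return prefix + current id, then advance the id."""
        name = f"{self.prefix}{self.view_id}"
        self.view_id += 1
        return name


counter = NameCounter()
names = [counter.get_next_name() for _ in range(3)]
```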
- abstract get_schema(table: str) → list
Get a list of column names in a table.
- Parameters:
table (str) – The name of the table.
- Returns:
A list of column names in the table.
- Return type:
list
- abstract rename_column(table, old_name, new_name)
Rename a column in a table.
- Parameters:
table (str) – The name of the table.
old_name (str) – The old name of the column.
new_name (str) – The new name of the column.
- exception joinboost.executor.ExecutorException
Bases:
Exception
- joinboost.executor.ExecutorFactory(con=None)
Factory function to create and return Executor objects for different connectors.
- Parameters:
con (DuckDBPyConnection or Executor, optional) – The connector to use for creating Executor objects. By default, if con is not specified, the function uses a PandasExecutor.
- Returns:
executor – An Executor object for the given connector.
- Return type:
Executor
- Raises:
ExecutorException – If an unknown connector type is specified, or if the default con is used without installing duckdb.
- class joinboost.executor.PandasExecutor(conn, debug=False)
Bases:
DataFrameExecutor
- class joinboost.executor.SPJAData(aggregate_expressions: dict = <factory>, from_tables: List[str] = <factory>, select_conds: List[joinboost.aggregator.SelectionExpression] = <factory>, join_conds: List[joinboost.aggregator.SelectionExpression] = <factory>, group_by: List[joinboost.aggregator.QualifiedAttribute] = <factory>, window_by: List[joinboost.aggregator.QualifiedAttribute] = <factory>, order_by: List[joinboost.aggregator.QualifiedAttribute] = <factory>, limit: Optional[int] = None, sample_rate: Optional[float] = None, replace: bool = True, join_type: str = 'INNER', qualified: bool = True)
Bases:
object
Data structure for SPJA queries. Could be recursive (e.g., from_tables could be a list of SPJAData objects).
- aggregate_expressions
dict mapping column names to tuples containing the aggregation expression and the aggregator object.
- Type:
dict
- from_tables
list of table names to select from.
- Type:
List[str]
- select_conds
list of conditions to apply to the SELECT statement.
- Type:
List[joinboost.aggregator.SelectionExpression]
- join_conds
list of conditions of the form “table1.col1 IS NOT DISTINCT FROM table2.col2”.
- Type:
List[joinboost.aggregator.SelectionExpression]
- group_by
list of column names to group by.
- Type:
List[joinboost.aggregator.QualifiedAttribute]
- window_by
list of column names to use for windowing.
- Type:
List[joinboost.aggregator.QualifiedAttribute]
- order_by
list of columns to use for ordering the results.
- Type:
List[joinboost.aggregator.QualifiedAttribute]
- limit
maximum number of rows to return.
- Type:
Optional[int]
- sample_rate
sampling rate to use for the query.
- Type:
Optional[float]
- replace
if True, replaces an existing table or view with the same name.
- Type:
bool
- join_type
type of join to use for the query.
- Type:
str
- class joinboost.executor.SparkExecutor(conn, debug=False)
Bases:
DuckdbExecutor
- add_table(table: str, table_address)
Add a new table to the database.
- Parameters:
table (str) – The name of the table to add.
table_address (str) – The address of the table to add.
- case_query(from_table: str, operator: str, cond_attr: str, base_val: str, case_definitions: list, select_attrs: list = [], table_name: Optional[str] = None, order_by: Optional[str] = None)
Executes a SQL query with a CASE statement to perform tree-model prediction. Each CASE represents a tree and each WHEN within a CASE represents a leaf.
- Parameters:
from_table – str, name of the source table
operator – str, the operator used to combine predictions
cond_attr – str, name of the column used in the conditions of the case statement
base_val – str, base value for the entire tree-model
case_definitions – list, a list of lists containing the (leaf prediction, leaf predicates) for each tree.
select_attrs – list, list of attributes to be selected, defaults to empty
table_name – str, name of the new table, defaults to None
order_by – str, name of the table to be ordered by rowid, defaults to None
- Returns:
str, name of the new table
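The "one CASE per tree, one WHEN per leaf" idea can be illustrated with a small sketch. It is not the SparkExecutor implementation: `case_expression` is a hypothetical helper, sqlite3 stands in for the real backend, the example trees are made up, and the `cond_attr`, `select_attrs`, and `order_by` parameters are left out.

```python
import sqlite3


def case_expression(operator, base_val, case_definitions):
    """Build `base_val <op> CASE ... END <op> CASE ... END` from tree leaves."""
    terms = [str(base_val)]
    for tree in case_definitions:
        # Each leaf is (prediction, predicates); predicates are ANDed together.
        whens = " ".join(
            f"WHEN {' AND '.join(predicates)} THEN {prediction}"
            for prediction, predicates in tree
        )
        terms.append(f"CASE {whens} END")
    return f" {operator} ".join(terms)


# Two illustrative trees over a single column x.
trees = [
    [(1.0, ["x <= 2"]), (2.0, ["x > 2"])],
    [(0.5, ["x <= 1"]), (-0.5, ["x > 1"])],
]
expr = case_expression("+", 10, trees)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x INTEGER)")
conn.executemany("INSERT INTO points VALUES (?)", [(1,), (2,), (3,)])
predictions = conn.execute(
    f"SELECT x, {expr} AS prediction FROM points ORDER BY x").fetchall()
```

For x = 1 the first tree contributes 1.0 and the second 0.5, so the combined prediction in this sketch is 10 + 1.0 + 0.5 = 11.5.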
- get_schema(table: str) → list
Get a list of column names in a table.
- Parameters:
table (str) – The name of the table.
- Returns:
A list of column names in the table.
- Return type:
list