scystream.sdk.database_handling package#

Submodules#

scystream.sdk.database_handling.postgres_manager module#

class scystream.sdk.database_handling.postgres_manager.PostgresConfig[source]#

Bases: BaseModel

Configuration class for PostgreSQL connection details.

This class holds the necessary configuration parameters to connect to a PostgreSQL database. It includes the database user, password, host, and port.

Parameters:
  • PG_USER – The username for the PostgreSQL database.

  • PG_PASS – The password for the PostgreSQL database.

  • PG_HOST – The host address of the PostgreSQL server.

  • PG_PORT – The port number of the PostgreSQL server.

PG_HOST: str#
PG_PASS: str#
PG_PORT: int#
PG_USER: str#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.
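To illustrate how these four fields map onto a Spark JDBC connection URL, here is a minimal sketch. `PostgresConfigSketch` and `jdbc_url` are hypothetical stand-ins (the real `PostgresConfig` is a pydantic `BaseModel` in `scystream.sdk.database_handling.postgres_manager`); the URL shape is the standard one Spark's JDBC source expects for PostgreSQL.

```python
from dataclasses import dataclass


@dataclass
class PostgresConfigSketch:
    # Mirrors the documented fields of PostgresConfig.
    PG_USER: str
    PG_PASS: str
    PG_HOST: str
    PG_PORT: int


def jdbc_url(cfg: PostgresConfigSketch, database_name: str) -> str:
    # Spark's JDBC reader/writer takes a URL of this form for PostgreSQL;
    # user and password are usually passed as separate connection properties.
    return f"jdbc:postgresql://{cfg.PG_HOST}:{cfg.PG_PORT}/{database_name}"


cfg = PostgresConfigSketch(PG_USER="etl", PG_PASS="secret",
                           PG_HOST="localhost", PG_PORT=5432)
print(jdbc_url(cfg, "analytics"))  # jdbc:postgresql://localhost:5432/analytics
```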

class scystream.sdk.database_handling.postgres_manager.PostgresOperations[source]#

Bases: object

Class to perform PostgreSQL operations using Apache Spark.

This class provides methods to read from and write to a PostgreSQL database using JDBC and Spark’s DataFrame API. It requires a SparkSession and, for database connectivity, either a PostgresConfig object or the PostgresSettings taken from an input or output.

__init__(spark: SparkSession, config: PostgresConfig | PostgresSettings)[source]#
read(database_name: str, table: str = None, query: str = None) → DataFrame[source]#

Reads data from a PostgreSQL database into a Spark DataFrame.

This method can either read data from a specified table or execute a custom SQL query to retrieve data from the database.

Parameters:
  • database_name – The name of the database to connect to.

  • table – The name of the table to read data from. Must be provided if query is not supplied. (optional)

  • query – A custom SQL query to run. If provided, this overrides the table parameter. (optional)

Raises:

ValueError – If neither table nor query is provided.

Returns:

A Spark DataFrame containing the result of the query or table data.

Return type:

DataFrame
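The table-vs-query precedence described above can be sketched as follows. `build_dbtable_option` is a hypothetical helper, not part of the SDK; it only illustrates the documented behavior: a query overrides the table (Spark's JDBC source accepts a parenthesized subquery in place of a table name), and omitting both raises ValueError.

```python
def build_dbtable_option(table: str = None, query: str = None) -> str:
    """Resolve what read() would hand to Spark's JDBC 'dbtable' option."""
    if query is not None:
        # A custom query overrides the table parameter; Spark requires
        # a subquery to be parenthesized and aliased.
        return f"({query}) AS subquery"
    if table is not None:
        return table
    # Mirrors the documented ValueError when neither argument is given.
    raise ValueError("Either 'table' or 'query' must be provided.")


print(build_dbtable_option(table="users"))          # users
print(build_dbtable_option(query="SELECT 1"))       # (SELECT 1) AS subquery
```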

write(database_name: str, table: str, dataframe, mode='overwrite')[source]#

Writes a Spark DataFrame to a specified table in a PostgreSQL database using JDBC.

This method writes the provided DataFrame to the target PostgreSQL table, with the option to specify the write mode (overwrite, append, etc.).

Parameters:
  • database_name – The name of the database to connect to.

  • table – The name of the table where data will be written.

  • dataframe – The Spark DataFrame containing the data to write.

  • mode – The write mode. Valid options are ‘overwrite’, ‘append’, ‘ignore’, and ‘error’. Defaults to ‘overwrite’. (optional)

Note:

Ensure that the schema of the DataFrame matches the schema of the target table if the table exists.

Note:

The mode parameter controls the behavior when the table already exists.
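The valid modes and their behavior when the target table already exists follow Spark's standard DataFrameWriter save-mode semantics; the sketch below summarizes them. `check_write_mode` is a hypothetical helper, not part of the SDK.

```python
# Behavior of each save mode when the target table already exists,
# per Spark's DataFrameWriter semantics.
MODE_BEHAVIOR = {
    "overwrite": "replace the existing table's contents with the DataFrame",
    "append":    "insert the DataFrame's rows alongside the existing ones",
    "ignore":    "leave the existing table untouched; the write is a no-op",
    "error":     "raise an error instead of writing",
}


def check_write_mode(mode: str = "overwrite") -> str:
    # Validating up front gives a clearer error than letting Spark
    # reject an unknown save mode at write time.
    if mode not in MODE_BEHAVIOR:
        raise ValueError(f"Invalid write mode: {mode!r}")
    return mode
```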