scystream.sdk.database_handling package#

Submodules#

scystream.sdk.database_handling.postgres_manager module#

class scystream.sdk.database_handling.postgres_manager.PostgresConfig[source]#

Bases: BaseModel

Configuration class for PostgreSQL connection details.

This class holds the necessary configuration parameters to connect to a PostgreSQL database. It includes the database user, password, host, and port.

Parameters:
  • PG_USER – The username for the PostgreSQL database.

  • PG_PASS – The password for the PostgreSQL database.

  • PG_HOST – The host address of the PostgreSQL server.

  • PG_PORT – The port number of the PostgreSQL server.

PG_HOST: str#
PG_PASS: str#
PG_PORT: int#
PG_USER: str#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.
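To illustrate how these four fields map onto a Spark JDBC connection URL, here is a minimal sketch. `PostgresConfigSketch` and `jdbc_url` are hypothetical stand-ins (the real `PostgresConfig` is a pydantic `BaseModel` in `scystream.sdk.database_handling.postgres_manager`); the URL shape is the standard one Spark's JDBC source expects for PostgreSQL.

```python
from dataclasses import dataclass


@dataclass
class PostgresConfigSketch:
    # Mirrors the documented fields of PostgresConfig.
    PG_USER: str
    PG_PASS: str
    PG_HOST: str
    PG_PORT: int


def jdbc_url(cfg: PostgresConfigSketch, database_name: str) -> str:
    # Spark's JDBC reader/writer takes a URL of this form for PostgreSQL;
    # user and password are usually passed as separate connection properties.
    return f"jdbc:postgresql://{cfg.PG_HOST}:{cfg.PG_PORT}/{database_name}"


cfg = PostgresConfigSketch(PG_USER="etl", PG_PASS="secret",
                           PG_HOST="localhost", PG_PORT=5432)
print(jdbc_url(cfg, "analytics"))  # jdbc:postgresql://localhost:5432/analytics
```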

class scystream.sdk.database_handling.postgres_manager.PostgresOperations[source]#

Bases: object

Class to perform PostgreSQL operations using Apache Spark.

This class provides methods to read from and write to a PostgreSQL database using JDBC and Spark’s DataFrame API. It requires a SparkSession and, for database connectivity, either a PostgresConfig object or the PostgresSettings taken from an input or output.

__init__(spark: SparkSession, config: PostgresConfig | PostgresSettings)[source]#
read(database_name: str, table: str = None, query: str = None) → DataFrame[source]#

Reads data from a PostgreSQL database into a Spark DataFrame.

This method can either read data from a specified table or execute a custom SQL query to retrieve data from the database.

Parameters:
  • database_name – The name of the database to connect to.

  • table – The name of the table to read data from. Must be provided if query is not supplied. (optional)

  • query – A custom SQL query to run. If provided, this overrides the table parameter. (optional)

Raises:

ValueError – If neither table nor query is provided.

Returns:

A Spark DataFrame containing the result of the query or table data.

Return type:

DataFrame
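The table-vs-query precedence described above can be sketched as follows. `build_dbtable_option` is a hypothetical helper, not part of the SDK; it only illustrates the documented behavior: a query overrides the table (Spark's JDBC source accepts a parenthesized subquery in place of a table name), and omitting both raises ValueError.

```python
def build_dbtable_option(table: str = None, query: str = None) -> str:
    """Resolve what read() would hand to Spark's JDBC 'dbtable' option."""
    if query is not None:
        # A custom query overrides the table parameter; Spark requires
        # a subquery to be parenthesized and aliased.
        return f"({query}) AS subquery"
    if table is not None:
        return table
    # Mirrors the documented ValueError when neither argument is given.
    raise ValueError("Either 'table' or 'query' must be provided.")


print(build_dbtable_option(table="users"))          # users
print(build_dbtable_option(query="SELECT 1"))       # (SELECT 1) AS subquery
```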

write(database_name: str, table: str, dataframe, mode='overwrite')[source]#

Writes a Spark DataFrame to a specified table in a PostgreSQL database using JDBC.

This method writes the provided DataFrame to the target PostgreSQL table, with the option to specify the write mode (overwrite, append, etc.).

Parameters:
  • database_name – The name of the database to connect to.

  • table – The name of the table where data will be written.

  • dataframe – The Spark DataFrame containing the data to write.

  • mode – The write mode. Valid options are ‘overwrite’, ‘append’, ‘ignore’, and ‘error’. Defaults to ‘overwrite’. (optional)

Note:

Ensure that the schema of the DataFrame matches the schema of the target table if the table exists.

Note:

The mode parameter controls the behavior when the table already exists.
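The valid modes and their behavior when the target table already exists follow Spark's standard DataFrameWriter save-mode semantics; the sketch below summarizes them. `check_write_mode` is a hypothetical helper, not part of the SDK.

```python
# Behavior of each save mode when the target table already exists,
# per Spark's DataFrameWriter semantics.
MODE_BEHAVIOR = {
    "overwrite": "replace the existing table's contents with the DataFrame",
    "append":    "insert the DataFrame's rows alongside the existing ones",
    "ignore":    "leave the existing table untouched; the write is a no-op",
    "error":     "raise an error instead of writing",
}


def check_write_mode(mode: str = "overwrite") -> str:
    # Validating up front gives a clearer error than letting Spark
    # reject an unknown save mode at write time.
    if mode not in MODE_BEHAVIOR:
        raise ValueError(f"Invalid write mode: {mode!r}")
    return mode
```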