scystream.sdk.database_handling package#
Submodules#
scystream.sdk.database_handling.postgres_manager module#
- class scystream.sdk.database_handling.postgres_manager.PostgresConfig[source]#
Bases:
BaseModel
Configuration class for PostgreSQL connection details.
This class holds the necessary configuration parameters to connect to a PostgreSQL database. It includes the database user, password, host, and port.
- Parameters:
PG_USER – The username for the PostgreSQL database.
PG_PASS – The password for the PostgreSQL database.
PG_HOST – The host address of the PostgreSQL server.
PG_PORT – The port number of the PostgreSQL server.
- PG_HOST: str#
- PG_PASS: str#
- PG_PORT: int#
- PG_USER: str#
- model_config: ClassVar[ConfigDict] = {}#
Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.
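As a rough illustration of how these four fields typically combine into a JDBC connection URL, here is a sketch using a plain dataclass stand-in (the real PostgresConfig is a pydantic BaseModel, and the SDK may assemble the URL differently; `PostgresConfigSketch` and `jdbc_url` are hypothetical names for this example):

```python
from dataclasses import dataclass


@dataclass
class PostgresConfigSketch:
    """Minimal stand-in mirroring PostgresConfig's four fields."""
    PG_USER: str
    PG_PASS: str
    PG_HOST: str
    PG_PORT: int


def jdbc_url(cfg: PostgresConfigSketch, database_name: str) -> str:
    """Build the JDBC URL Spark needs to reach a given database."""
    return f"jdbc:postgresql://{cfg.PG_HOST}:{cfg.PG_PORT}/{database_name}"


cfg = PostgresConfigSketch(
    PG_USER="etl_user", PG_PASS="secret",
    PG_HOST="localhost", PG_PORT=5432,
)
print(jdbc_url(cfg, "analytics"))
```

Note that the user and password are not part of the URL itself; Spark's JDBC data source passes them as separate connection options.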
- class scystream.sdk.database_handling.postgres_manager.PostgresOperations[source]#
Bases:
object
Class to perform PostgreSQL operations using Apache Spark.
This class provides methods to read from and write to a PostgreSQL database using JDBC and Spark's DataFrame API. It requires a SparkSession and, for database connectivity, either a PostgresConfig object or the PostgresSettings taken from an input or output.
- __init__(spark: SparkSession, config: PostgresConfig | PostgresSettings)[source]#
- read(database_name: str, table: str = None, query: str = None) → DataFrame[source]#
Reads data from a PostgreSQL database into a Spark DataFrame.
This method can either read data from a specified table or execute a custom SQL query to retrieve data from the database.
- Parameters:
database_name – The name of the database to connect to.
table – The name of the table to read data from. Must be provided if query is not supplied. (optional)
query – A custom SQL query to run. If provided, this overrides the table parameter. (optional)
- Raises:
ValueError – If neither table nor query is provided.
- Returns:
A Spark DataFrame containing the result of the query or table data.
- Return type:
DataFrame
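The either/or contract between table and query, including the ValueError, can be sketched as a pure helper that assembles Spark's JDBC data-source options (`build_read_options` is hypothetical; the option keys `url`, `dbtable`, `query`, and `driver` are standard Spark JDBC options, but the SDK's internals may differ):

```python
def build_read_options(cfg: dict, database_name: str,
                       table: str = None, query: str = None) -> dict:
    """Assemble Spark JDBC options, mirroring read()'s either/or contract."""
    if table is None and query is None:
        raise ValueError("Either 'table' or 'query' must be provided.")
    opts = {
        "url": f"jdbc:postgresql://{cfg['PG_HOST']}:{cfg['PG_PORT']}/{database_name}",
        "user": cfg["PG_USER"],
        "password": cfg["PG_PASS"],
        "driver": "org.postgresql.Driver",
    }
    if query is not None:
        # A custom query overrides the table parameter.
        opts["query"] = query
    else:
        opts["dbtable"] = table
    return opts


cfg = {"PG_USER": "u", "PG_PASS": "p", "PG_HOST": "localhost", "PG_PORT": 5432}
# Both given: the query wins, so only the "query" option is set.
opts = build_read_options(cfg, "analytics", table="events",
                          query="SELECT id FROM events")
print(sorted(opts))
```

In actual Spark code, options like these would be fed to `spark.read.format("jdbc").options(**opts).load()` to produce the DataFrame.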
- write(database_name: str, table: str, dataframe, mode='overwrite')[source]#
Writes a Spark DataFrame to a specified table in a PostgreSQL database using JDBC.
This method writes the provided DataFrame to the target PostgreSQL table, with the option to specify the write mode (overwrite, append, etc.).
- Parameters:
database_name – The name of the database to connect to.
table – The name of the table where data will be written.
dataframe – The Spark DataFrame containing the data to write.
mode – The write mode. Valid options are ‘overwrite’, ‘append’, ‘ignore’, and ‘error’. Defaults to ‘overwrite’. (optional)
- Note:
Ensure that the schema of the DataFrame matches the schema of the target table if the table exists.
- Note:
The mode parameter controls the behavior when the table already exists.
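How write() presumably forwards these parameters to Spark's DataFrameWriter can be sketched as a pure planning helper, which makes the mode validation visible (`build_write_plan` is hypothetical; only the mode names and the JDBC option keys come from the documentation above and Spark's public API):

```python
def build_write_plan(cfg: dict, database_name: str, table: str,
                     mode: str = "overwrite") -> dict:
    """Collect what df.write.jdbc(url, table, mode, properties) would need."""
    valid_modes = {"overwrite", "append", "ignore", "error"}
    if mode not in valid_modes:
        raise ValueError(
            f"Invalid mode {mode!r}; expected one of {sorted(valid_modes)}")
    return {
        "url": f"jdbc:postgresql://{cfg['PG_HOST']}:{cfg['PG_PORT']}/{database_name}",
        "table": table,
        "mode": mode,
        "properties": {
            "user": cfg["PG_USER"],
            "password": cfg["PG_PASS"],
            "driver": "org.postgresql.Driver",
        },
    }


cfg = {"PG_USER": "u", "PG_PASS": "p", "PG_HOST": "db", "PG_PORT": 5432}
plan = build_write_plan(cfg, "analytics", "events", mode="append")
print(plan["mode"], plan["url"])
```

With a real SparkSession, such a plan maps directly onto `dataframe.write.jdbc(plan["url"], plan["table"], plan["mode"], plan["properties"])`; "append" leaves existing rows in place, while the default "overwrite" replaces the table contents.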