scystream.sdk.config package#

class scystream.sdk.config.SDKConfig[source]#

Bases: object

Singleton class that holds the SDK configuration.

This class manages the configuration for the SDK, primarily the path to the configuration file (cbc.yaml), the application name, and the Spark master URL for ComputeBlock communication.

Parameters:
  • app_name – The name of the application (default: ‘unnamed_compute_block’).

  • cb_spark_master – The URL of the Spark master (default: ‘spark://spark-master:7077’).

static __new__(cls, app_name: str = 'unnamed_compute_block', cb_spark_master: str = 'spark://spark-master:7077')[source]#

Creates or returns the singleton instance of SDKConfig.

Parameters:
  • app_name – The name of the application.

  • cb_spark_master – The URL of the Spark master.

Returns:

The singleton SDKConfig instance.

get_cb_spark_master() str[source]#

Get the Spark master URL.

Returns:

The Spark master URL.

set_cb_spark_master(spark_master: str) str[source]#

Set the Spark master URL.

Parameters:

spark_master – The Spark master URL, in the form spark://host:port.
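The singleton behaviour described above can be sketched without the SDK installed. The class below is a hypothetical stand-in named SingletonConfig, not the SDK's actual implementation. It assumes that constructor arguments only take effect on the first call, which the documentation above does not confirm.

```python
class SingletonConfig:
    """Hypothetical stand-in for SDKConfig, for illustration only."""

    _instance = None  # the single shared instance, created lazily

    def __new__(cls, app_name="unnamed_compute_block",
                cb_spark_master="spark://spark-master:7077"):
        # Assumption: arguments are applied only when the instance is first created.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.app_name = app_name
            cls._instance.cb_spark_master = cb_spark_master
        return cls._instance

    def get_cb_spark_master(self) -> str:
        return self.cb_spark_master

    def set_cb_spark_master(self, spark_master: str) -> None:
        self.cb_spark_master = spark_master


first = SingletonConfig(app_name="demo_block")
second = SingletonConfig()  # returns the same object as `first`
second.set_cb_spark_master("spark://localhost:7077")
```

Because both names refer to one object, a change made through `second` is visible through `first`.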

scystream.sdk.config.get_compute_block() ComputeBlock[source]#

Builds a ComputeBlock instance from the entrypoints and settings defined in code.

scystream.sdk.config.load_config(path_to_cfg: str | None) ComputeBlock[source]#

Loads and validates configuration from a YAML file.

If no path is provided, attempts to load from the default location in the current working directory.

Parameters:

path_to_cfg – Optional path to the configuration YAML file.

Returns:

A ComputeBlock instance if the YAML is valid.

Raises:

ValueError – If the config file is missing, invalid, or fails validation.
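The path handling described above might look roughly like the following. This is a minimal sketch using only the standard library. The helper name resolve_config_path is hypothetical, and the default file name cbc.yaml is taken from the SDKConfig description. YAML parsing and model validation are left out.

```python
import tempfile
from pathlib import Path
from typing import Optional

DEFAULT_CONFIG_NAME = "cbc.yaml"  # file name mentioned in the SDKConfig description


def resolve_config_path(path_to_cfg: Optional[str] = None) -> Path:
    """Hypothetical helper: pick the explicit path, else the default in the cwd."""
    path = Path(path_to_cfg) if path_to_cfg else Path.cwd() / DEFAULT_CONFIG_NAME
    if not path.is_file():
        # A missing file is reported as ValueError, as the documentation states.
        raise ValueError(f"Configuration file not found: {path}")
    return path


# Usage: create a throwaway config file and resolve it explicitly.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / DEFAULT_CONFIG_NAME
    cfg.write_text("name: demo\n")
    found = resolve_config_path(str(cfg))
```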

scystream.sdk.config.validate_config_with_code(entrypoint_name: str | None = None, config_path: str | None = None)[source]#

Validates that the configuration loaded from the YAML file matches the code-defined configuration for the ComputeBlock.

Parameters:

  • entrypoint_name – Optional name of an entrypoint to validate. If provided, only that entrypoint is validated instead of the entire ComputeBlock configuration.

  • config_path – Optional path to the configuration YAML file.

Raises:

ValueError – If the configurations do not match.
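As a rough illustration of the comparison, the sketch below checks two plain dictionaries for equality, either as a whole or for a single entrypoint. The function name check_config_matches_code and the use of plain dicts are assumptions made for this example. The SDK compares ComputeBlock models, and its matching rules may differ from strict equality.

```python
from typing import Optional


def check_config_matches_code(code_cfg: dict, yaml_cfg: dict,
                              entrypoint_name: Optional[str] = None) -> None:
    """Hypothetical check: raise ValueError when the two configurations differ."""
    if entrypoint_name is not None:
        # Narrow the comparison to one entrypoint when a name is given.
        code_part = code_cfg.get("entrypoints", {}).get(entrypoint_name)
        yaml_part = yaml_cfg.get("entrypoints", {}).get(entrypoint_name)
        if code_part is None or yaml_part is None:
            raise ValueError(f"Entrypoint {entrypoint_name!r} is missing on one side.")
    else:
        code_part, yaml_part = code_cfg, yaml_cfg
    if code_part != yaml_part:
        raise ValueError("Configuration does not match the code-defined ComputeBlock.")


from_code = {"name": "demo", "entrypoints": {"main": {"description": "Runs the job"}}}
from_yaml = {"name": "renamed", "entrypoints": {"main": {"description": "Runs the job"}}}

# The "main" entrypoint is identical on both sides, so this call does not raise.
check_config_matches_code(from_code, from_yaml, entrypoint_name="main")
```

Comparing the full dictionaries in this example would raise, because the `name` values differ.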

Submodules#

scystream.sdk.config.compute_block_utils module#

scystream.sdk.config.compute_block_utils.get_compute_block() ComputeBlock[source]#

Builds a ComputeBlock instance from the entrypoints and settings defined in code.

scystream.sdk.config.config_loader module#

class scystream.sdk.config.config_loader.SDKConfig[source]#

Bases: object

Singleton class that holds the SDK configuration.

This class manages the configuration for the SDK, primarily the path to the configuration file (cbc.yaml), the application name, and the Spark master URL for ComputeBlock communication.

Parameters:
  • app_name – The name of the application (default: ‘unnamed_compute_block’).

  • cb_spark_master – The URL of the Spark master (default: ‘spark://spark-master:7077’).

static __new__(cls, app_name: str = 'unnamed_compute_block', cb_spark_master: str = 'spark://spark-master:7077')[source]#

Creates or returns the singleton instance of SDKConfig.

Parameters:
  • app_name – The name of the application.

  • cb_spark_master – The URL of the Spark master.

Returns:

The singleton SDKConfig instance.

get_cb_spark_master() str[source]#

Get the Spark master URL.

Returns:

The Spark master URL.

set_cb_spark_master(spark_master: str) str[source]#

Set the Spark master URL.

Parameters:

spark_master – The Spark master URL, in the form spark://host:port.

scystream.sdk.config.config_loader.generate_config_from_compute_block(compute_block: ComputeBlock, output_path: Path)[source]#

Generates a YAML configuration file from a ComputeBlock instance.

Review the generated YAML and edit it as needed.

Parameters:
  • compute_block – The ComputeBlock instance to generate the configuration from.

  • output_path – The path where the YAML configuration file should be saved.

scystream.sdk.config.config_loader.load_config(path_to_cfg: str | None) ComputeBlock[source]#

Loads and validates configuration from a YAML file.

If no path is provided, attempts to load from the default location in the current working directory.

Parameters:

path_to_cfg – Optional path to the configuration YAML file.

Returns:

A ComputeBlock instance if the YAML is valid.

Raises:

ValueError – If the config file is missing, invalid, or fails validation.

scystream.sdk.config.config_loader.validate_config_with_code(entrypoint_name: str | None = None, config_path: str | None = None)[source]#

Validates that the configuration loaded from the YAML file matches the code-defined configuration for the ComputeBlock.

Parameters:

  • entrypoint_name – Optional name of an entrypoint to validate. If provided, only that entrypoint is validated instead of the entire ComputeBlock configuration.

  • config_path – Optional path to the configuration YAML file.

Raises:

ValueError – If the configurations do not match.

scystream.sdk.config.entrypoints module#

scystream.sdk.config.entrypoints.TEST_reset_registered_functions()[source]#
scystream.sdk.config.entrypoints.get_registered_functions()[source]#
scystream.sdk.config.entrypoints.register_entrypoint(func_name, func, settings_class)[source]#
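These three functions carry no descriptions, but their names suggest a module-level registry. The sketch below shows one way such a registry could work. The storage layout (a dict keyed by function name) and the name reset_registered_functions are assumptions for illustration and are not taken from the SDK source.

```python
# Hypothetical module-level registry: function name -> function and settings class.
_registered = {}


def register_entrypoint(func_name, func, settings_class):
    """Store an entrypoint function together with its settings class."""
    _registered[func_name] = {"function": func, "settings": settings_class}


def get_registered_functions():
    """Return everything registered so far."""
    return _registered


def reset_registered_functions():
    """Clear the registry, e.g. between test cases."""
    _registered.clear()


class DemoSettings:
    """Placeholder settings class used only in this example."""


def run_job():
    return "done"


register_entrypoint("run_job", run_job, DemoSettings)
names_after_register = list(get_registered_functions())
reset_registered_functions()
```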

scystream.sdk.config.models module#

class scystream.sdk.config.models.ComputeBlock[source]#

Bases: BaseModel

Represents a ComputeBlock configuration, which describes the compute process, including entrypoints, inputs, and outputs.

A ComputeBlock is defined by:

  • A name, description, and author.

  • One or more entrypoints that specify how data is passed into and out of the compute process.

  • Optionally, a Docker image to specify the execution environment.

At least one entrypoint must be defined for the ComputeBlock to be valid.

Parameters:
  • name – The name of the ComputeBlock.

  • description – A description of the ComputeBlock.

  • author – The author of the ComputeBlock.

  • entrypoints – A dictionary of entrypoints.

  • docker_image – The Docker image for the execution environment, if any.

author: Annotated[str, Strict(strict=True)]#
classmethod check_entrypoints(v)[source]#

Validates that at least one entrypoint is defined for the ComputeBlock.

Raises:

ValueError – If no entrypoints are defined.

description: Annotated[str, Strict(strict=True)]#
docker_image: Annotated[str, Strict(strict=True)] | None#
entrypoints: Dict[Annotated[str, Strict(strict=True)], Entrypoint]#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: Annotated[str, Strict(strict=True)]#
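The rule that a ComputeBlock needs at least one entrypoint can be illustrated with a standard-library dataclass. ComputeBlockSketch is a hypothetical stand-in. The real class is a pydantic BaseModel with strict field types, which this sketch does not reproduce.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ComputeBlockSketch:
    """Hypothetical stand-in for ComputeBlock, for illustration only."""

    name: str
    description: str
    author: str
    entrypoints: Dict[str, dict] = field(default_factory=dict)
    docker_image: Optional[str] = None  # optional execution environment

    def __post_init__(self):
        # Mirrors the documented rule: at least one entrypoint is required.
        if not self.entrypoints:
            raise ValueError("At least one entrypoint must be defined.")


block = ComputeBlockSketch(
    name="demo",
    description="Example block",
    author="Jane Doe",
    entrypoints={"main": {"description": "Runs the job"}},
)
```

Constructing the sketch with an empty `entrypoints` dictionary raises ValueError.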
class scystream.sdk.config.models.Entrypoint[source]#

Bases: BaseModel

Represents an entrypoint within a ComputeBlock.

An entrypoint includes:

  • A description of the entrypoint’s purpose.

  • A dictionary of environment variables (envs), where each key-value pair represents an environment variable and its default value. These variables should be shared across the entrypoint.

  • Input and output configurations, each described by the InputOutputModel.

If an environment variable’s value is set to None in the configuration, the ComputeBlock user must provide that variable during runtime, or else the process will fail.

Parameters:
  • description – A description of the entrypoint.

  • envs – A dictionary of environment variables with their default values.

  • inputs – A dictionary of input configurations.

  • outputs – A dictionary of output configurations.

description: Annotated[str, Strict(strict=True)]#
envs: Dict[Annotated[str, Strict(strict=True)], Annotated[str, Strict(strict=True)] | Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)] | List | bool | None] | None#
inputs: Dict[Annotated[str, Strict(strict=True)], InputOutputModel] | None#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

outputs: Dict[Annotated[str, Strict(strict=True)], InputOutputModel] | None#
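The rule for envs set to None can be sketched as follows: a default is used unless the environment overrides it, and a None default must be supplied at runtime. The function resolve_envs is hypothetical. How the SDK itself reads and validates environment variables is not described here.

```python
import os
from typing import Any, Dict, Mapping, Optional


def resolve_envs(defaults: Mapping[str, Any],
                 environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical resolver: environment values win, None defaults are required."""
    environ = os.environ if environ is None else environ
    resolved, missing = {}, []
    for key, default in defaults.items():
        if key in environ:
            resolved[key] = environ[key]   # value supplied by the user
        elif default is None:
            missing.append(key)            # no default and nothing supplied
        else:
            resolved[key] = default        # fall back to the configured default
    if missing:
        raise ValueError(f"Missing required environment variables: {missing}")
    return resolved


envs = {"LANGUAGE": "en", "API_KEY": None}
resolved = resolve_envs(envs, environ={"API_KEY": "secret"})
```

Calling `resolve_envs(envs, environ={})` would raise ValueError, because `API_KEY` has no default.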
class scystream.sdk.config.models.InputOutputModel[source]#

Bases: BaseModel

Represents configuration for inputs or outputs in a ComputeBlock.

The configuration is defined as a dictionary with key-value pairs, where:

  • The key is the name of an environment variable (e.g., FILE_PATH, TABLE_NAME).

  • The value is the default value for that environment variable, which can be a string, integer, float, list, or boolean.

If a value is explicitly set to null, validation will fail unless the environment variable is manually set by the ComputeBlock user.

Parameters:
  • type – Type of the I/O: one of 'file', 'pg_table', or 'custom'.

  • description – A description of the I/O.

  • config – A dictionary of configuration settings for the I/O, such as file path, table name, etc.

config: Dict[Annotated[str, Strict(strict=True)], Annotated[str, Strict(strict=True)] | Annotated[int, Strict(strict=True)] | Annotated[float, Strict(strict=True)] | List | bool | None] | None#
description: Annotated[str, Strict(strict=True)] | None#
model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

type: Literal['file', 'pg_table', 'custom']#
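To make the shape of an input or output entry concrete, the sketch below checks the type against the three allowed literals and lists the config keys the user still has to supply. The helper check_io, the key names TABLE_NAME and PG_PASS, and the plain-dict representation are illustrative assumptions, not part of the SDK.

```python
ALLOWED_IO_TYPES = {"file", "pg_table", "custom"}  # values of the `type` Literal


def check_io(io: dict) -> dict:
    """Hypothetical check of one input/output entry given as a plain dict."""
    if io.get("type") not in ALLOWED_IO_TYPES:
        raise ValueError(f"Unsupported I/O type: {io.get('type')!r}")
    config = io.get("config") or {}
    # Keys whose default is None have to be provided by the ComputeBlock user.
    required = sorted(key for key, value in config.items() if value is None)
    return {"type": io["type"], "required_from_user": required}


example_output = {
    "type": "pg_table",
    "description": "Result table",
    "config": {"TABLE_NAME": "results", "PG_PASS": None},  # hypothetical keys
}
summary = check_io(example_output)
```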