Python Development Best Practices — The Overload decorator use case
Python has emerged as a powerful programming language suitable for a wide variety of applications. One of the factors behind Python's versatility is its extensive collection of libraries and SDKs. These resources enable the development of fast and scalable tools for data analysis and data science. Python also offers a wide range of built-in and third-party libraries for more traditional software development, including the construction of SDKs, APIs, database connections, and event-driven applications. In this post, I will explore a fascinating tool from the Python toolkit that I discovered while interacting with the OpenAI Python SDK, specifically the AzureOpenAI class. But first, let's look at a common issue encountered in software development.
Motivation with a simple case
Suppose that you are developing a library in Python. One of the features you need to implement lives in a function called process_data. The code snippet below shows the behavior of this function: it changes what it does based on the type of its input value.
# Code Snippet 1
def process_data(data):
    if isinstance(data, str):
        # Implementation for str type input
        return "Processed " + data
    elif isinstance(data, int):
        # Implementation for int type input
        return data + 10
    else:
        raise TypeError("Invalid data type")

# Function call with str type parameter and return type
result1 = process_data("Hello")
print(result1)  # Output: Processed Hello

# Function call with int type parameter and return type
result2 = process_data(5)
print(result2)  # Output: 15
The code above describes each step in comments, with the expected result written right next to the corresponding print statements. We can see that if the input is a string, the function returns a phrase built from it, and if it is an integer, it adds 10 to the original input; in both cases we print the resulting value on the screen.
In practical terms, when developing applications intended for maintenance, updates, and serving as foundational code for other projects, it is important to prioritize the experience of other developers using our package. By ensuring a seamless experience, we can speed up software development and make the code easier to understand. This is where the conventions and best practices described in the Python Enhancement Proposals (PEPs) usually come in.
Suppose that we want to follow a widely used practice in developing our application and add type hints (see PEP 484 for more information about this best practice). We would then use tools like mypy to perform static type checking on our code. For the example in code snippet 1 above, running "mypy example.py" would report no errors, since we have not annotated any types yet. Let us introduce some type annotations below.
# Code Snippet 2
from typing import Union

def process_data(data: Union[str, int]):
    if isinstance(data, str):
        # Implementation for str type input
        return "Processed " + data
    elif isinstance(data, int):
        # Implementation for int type input
        return data + 10
    else:
        raise TypeError("Invalid data type")

# Function call with str type parameter and return type
result1 = process_data("Hello")
print(result1)  # Output: Processed Hello

# Function call with int type parameter and return type
result2 = process_data(5)
print(result2)  # Output: 15

# Function call with unsupported type parameter
result3 = process_data(3.14)  # Raises TypeError
Try running the snippet above, and also run "mypy example.py". You will see that, because of the explicit type annotation on the data parameter, mypy now reports an error. This happens because we are explicitly saying that data is either a string or an integer, and then we pass a float in the last function call. If we had left the data argument of process_data without an explicit type annotation (as in snippet 1), mypy would have reported no problems here.
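For reference, the message mypy prints looks roughly like this (the exact line number and wording depend on your mypy version):

error: Argument 1 to "process_data" has incompatible type "float"; expected "Union[str, int]"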
Notice that introducing type annotations increased the "complexity" of our function definition (this example is very simple, but imagine an application where each method has 8–9 arguments and each one could potentially accept more than one type). One of the use cases of the overload decorator is precisely to help ORGANIZE and REFACTOR the code in this situation. Notice that we emphasized the organization and refactoring aspects: the overload decorator does not change the behavior of functions at runtime, but rather helps us read and understand Python code faster. Let us implement the corresponding overloads so that our process_data function passes the mypy type checking procedure.
# Code Snippet 3
from typing import NoReturn, overload

@overload
def process_data(data: str) -> str: ...
@overload
def process_data(data: int) -> int: ...
@overload
def process_data(data: float) -> NoReturn: ...  # a float input always raises, so it never returns

def process_data(data):
    if isinstance(data, str):
        # Implementation for str type input
        return "Processed " + data
    elif isinstance(data, int):
        # Implementation for int type input
        return data + 10
    else:
        raise TypeError("Invalid data type")

# Function call with str type parameter and return type
result1 = process_data("Hello")
print(result1)  # Output: Processed Hello

# Function call with int type parameter and return type
result2 = process_data(5)
print(result2)  # Output: 15

# Function call with unsupported type parameter
result3 = process_data(3.14)  # Raises TypeError
In code snippet 3 above, notice how the file implementing process_data with overloads is more verbose. For each supported type, we wrote a function signature whose body is just an ellipsis (meaning the stub itself carries no implementation), and then we created a final "catch-all" implementation of process_data. In this last function, we put all the logic that handles each possible type of data. This is the procedure generally followed when working with overload: we define one stub per signature (each decorated with the overload decorator from the typing module), and finally we implement a single concrete function that deals with all the cases (the "catch-all" method). Notice that this last one is not decorated with overload. You can check that this implementation works both by executing the Python script and verifying the outputs, and by using mypy to perform static type checking.
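A nice side effect worth pointing out: with the overloads in place, a type checker can infer a precise return type for each call instead of a broad union. Below is a minimal sketch, assuming the definitions from code snippet 3 are in scope.

# Static type checking sketch: run "mypy example.py" to see the revealed types.
# reveal_type is meant for the type checker only; to execute the file, remove
# these lines or import reveal_type from typing (Python 3.11+) or typing_extensions.
reveal_type(process_data("Hello"))  # note: Revealed type is "builtins.str"
reveal_type(process_data(5))        # note: Revealed type is "builtins.int"

If we had instead annotated a single signature with Union[str, int] for both the argument and the return value, every call site would be inferred as returning that same union; the overloads are what give each call its precise type.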
Now let us look at a more fun example of where the overload decorator is used.
Overloading with OpenAI class implementations
For the following example, we analyse the source code of the openai Python package, more specifically the AzureOpenAI class of that library. The version used to explore the source code was 1.12.0 of the openai Python library / SDK.
When using OpenAI models and deployments through an Azure OpenAI resource, we use a specific class adapted to the Azure OpenAI provider, instead of the general OpenAI class. This class is called AzureOpenAI. Look at the code snippet below for a closer look at the implementation of this class (the code was adapted to contain more comments, to help us proceed with the discussion).
class AzureOpenAI(BaseAzureClient[httpx.Client, Stream[Any]], OpenAI):
    # First overload, used for the initialization of the object of the class
    @overload
    def __init__(
        self,
        *,
        azure_endpoint: str,
        azure_deployment: str | None = None,
        api_version: str | None = None,
        api_key: str | None = None,
        azure_ad_token: str | None = None,
        azure_ad_token_provider: AzureADTokenProvider | None = None,
        organization: str | None = None,
        timeout: float | Timeout | None | NotGiven = NOT_GIVEN,
        max_retries: int = DEFAULT_MAX_RETRIES,
        default_headers: Mapping[str, str] | None = None,
        default_query: Mapping[str, object] | None = None,
        http_client: httpx.Client | None = None,
        _strict_response_validation: bool = False,
    ) -> None:
        ...

    # Second overload, see that the arguments themselves are different here
    @overload
    def __init__(
        self,
        *,
        azure_deployment: str | None = None,
        api_version: str | None = None,
        api_key: str | None = None,
        azure_ad_token: str | None = None,
        azure_ad_token_provider: AzureADTokenProvider | None = None,
        organization: str | None = None,
        timeout: float | Timeout | None | NotGiven = NOT_GIVEN,
        max_retries: int = DEFAULT_MAX_RETRIES,
        default_headers: Mapping[str, str] | None = None,
        default_query: Mapping[str, object] | None = None,
        http_client: httpx.Client | None = None,
        _strict_response_validation: bool = False,
    ) -> None:
        ...

    # Third overload, once again we have different initialization arguments
    @overload
    def __init__(
        self,
        *,
        base_url: str,
        api_version: str | None = None,
        api_key: str | None = None,
        azure_ad_token: str | None = None,
        azure_ad_token_provider: AzureADTokenProvider | None = None,
        organization: str | None = None,
        timeout: float | Timeout | None | NotGiven = NOT_GIVEN,
        max_retries: int = DEFAULT_MAX_RETRIES,
        default_headers: Mapping[str, str] | None = None,
        default_query: Mapping[str, object] | None = None,
        http_client: httpx.Client | None = None,
        _strict_response_validation: bool = False,
    ) -> None:
        ...

    # The catch-all initialization method, which handles the logic of all
    # possible cases indicated by the three overloads above
    def __init__(
        self,
        *,
        api_version: str | None = None,
        azure_endpoint: str | None = None,
        azure_deployment: str | None = None,
        api_key: str | None = None,
        azure_ad_token: str | None = None,
        azure_ad_token_provider: AzureADTokenProvider | None = None,
        organization: str | None = None,
        base_url: str | None = None,
        timeout: float | Timeout | None | NotGiven = NOT_GIVEN,
        max_retries: int = DEFAULT_MAX_RETRIES,
        default_headers: Mapping[str, str] | None = None,
        default_query: Mapping[str, object] | None = None,
        http_client: httpx.Client | None = None,
        _strict_response_validation: bool = False,
    ) -> None:
        # We ignore the concrete implementation here since we won't use it.
        # See the original library code in your local environment for more
        # details about this part. The code should be inside
        # venv/lib/python3.10/site-packages/openai/lib/azure.py, in case your
        # virtual environment name is venv.
        ...
Notice, in the code above, how each of the three overloads declares a different set of arguments for the __init__ method; the base_url / azure_deployment / azure_endpoint parameters are the ones that vary the most. We can thus easily see that the behavior of the method varies depending on the arguments we provide when initializing the object. As discussed in the previous section, we could use just one __init__ method and handle all the conditional logic inside it, without any overload decorators. However, in that case we would spend more time analysing the code to discover that the implementation actually differs depending on the parameters passed at class instantiation time. With the overload decorator and the three overloaded __init__ signatures above, this is visible at a glance.
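To make the difference concrete, here is a rough sketch of how instantiation could look for the first and third overloads. The endpoint, key, deployment, and API version values below are placeholders, and the exact parameters your resource needs may differ; check the openai documentation for details.

from openai import AzureOpenAI

# Matches the first overload: the Azure resource endpoint is passed explicitly.
client_a = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder endpoint
    api_key="my-api-key",                                   # placeholder key
    api_version="2024-02-01",                               # placeholder API version
)

# Matches the third overload: a full base_url replaces azure_endpoint / azure_deployment.
client_b = AzureOpenAI(
    base_url="https://my-resource.openai.azure.com/openai/deployments/my-deployment",
    api_key="my-api-key",
    api_version="2024-02-01",
)

The second overload covers the case where neither azure_endpoint nor base_url is passed directly; in that situation the client looks the endpoint up elsewhere, for instance in the AZURE_OPENAI_ENDPOINT environment variable.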
In this post, we investigated use cases for the Python typing overload decorator. We saw how it can be used to improve code readability and to adapt existing code to static type checking tools like mypy, without the need for huge Union annotations on class and method signatures. The contents of this post were inspired by personal studies of the openai Python library, especially the AzureOpenAI class adapter. Some additional references include:
https://pub.towardsai.net/the-python-decorator-that-supercharges-developer-experience-78b3fe7f1682