Implementing the Repository pattern (part 1)

Repository pattern - why do we need it?

The main goal of the repository pattern is to abstract the logic used for retrieving (loading) and persisting (saving) entities in external storage. The interface of a repository usually resembles that of a typical in-memory collection (e.g. a dict or a set). It hides all the implementation details related to data loading, sessions, transactions, etc. From a consumer's point of view, typical usage of a repository within domain logic looks like this:

# somewhere in a domain layer

# 1. retrieve an entity from a repository
entity = repository[entity_id]

# 2. somehow change the state of an entity
entity.foobar()

# 3. eventually persist all the changes that were made
repository.persist()

At this point, you may ask: we already have ORMs, why would we need repositories? First of all, ORM models are usually tied to a single database. In the case of a repository, it doesn't matter if there is a text file, MySQL, MongoDB, or an external API under the hood - the interface stays the same. Secondly, repositories work with domain entities, which are infrastructure (i.e. database) agnostic. ORM model instances cannot exist without a database and are expressed in terms of database field types, so you lose all the benefits of having custom data types and structures (such as value objects). A repository will most likely use a data mapping layer, which is responsible for translating a persisted state (rows in a database) into an actual entity class instance. Finally, new persistence mechanisms are likely to be invented in the future, and you don't want to change the core of your system (business logic) just because you decide to use some fancy new storage mechanism.
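To make the data mapping idea more concrete, here is a minimal sketch of such a layer. The row shape, the column names, and the simplified Money and Listing classes are assumptions made for illustration only, not the definitions used elsewhere in this series:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Hypothetical value object: an amount in a given currency."""
    amount: Decimal
    currency: str


@dataclass
class Listing:
    """Hypothetical, simplified entity that knows nothing about storage."""
    id: int
    title: str
    price: Money


def row_to_listing(row: dict) -> Listing:
    """Translate a persisted row (plain column values) into an entity."""
    return Listing(
        id=row["id"],
        title=row["title"],
        price=Money(Decimal(row["price_amount"]), row["price_currency"]),
    )


def listing_to_row(listing: Listing) -> dict:
    """Translate an entity back into plain column values for storage."""
    return {
        "id": listing.id,
        "title": listing.title,
        "price_amount": str(listing.price.amount),
        "price_currency": listing.price.currency,
    }


# Assumed row shape, as it might come back from a database driver
row = {"id": 1, "title": "Bike", "price_amount": "10.50", "price_currency": "USD"}
listing = row_to_listing(row)
```

The entity only ever sees a Money value object. The fact that the price is stored as two separate columns stays inside the mapping functions.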

Here are some additional key takeaways from an excellent article by Microsoft:

  • For each aggregate or aggregate root, you should create one repository class.

  • The only channel you should use to update the database should be the repositories, because they have a one-to-one relationship with the aggregate root, which controls the aggregate's invariants and transactional consistency.

  • It's okay to query the database through other channels because queries don't change the state of the database.

This makes a lot of sense if you are following CQRS: use repositories to retrieve entities and act upon them. You don't need a repository to simply display a data table on a screen - in such a case a raw SQL query could do the job.
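As a rough illustration of the read side, the sketch below queries a table directly with the standard sqlite3 module and never touches a repository. The listings schema, its column names, and the list_active_listings function are hypothetical and exist only for this example:

```python
import sqlite3


def list_active_listings(connection: sqlite3.Connection) -> list:
    """Read-only query for a table view; it bypasses the repository."""
    cursor = connection.execute(
        "SELECT id, title, current_price FROM listings "
        "WHERE is_active = 1 ORDER BY id"
    )
    return cursor.fetchall()


# Demo data in an in-memory SQLite database (the schema is an assumption)
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, title TEXT, "
    "current_price REAL, is_active INTEGER)"
)
connection.executemany(
    "INSERT INTO listings VALUES (?, ?, ?, ?)",
    [(1, "Bike", 10.0, 1), (2, "Lamp", 5.0, 0), (3, "Desk", 40.0, 1)],
)
rows = list_active_listings(connection)
```

The query returns plain tuples that can be rendered as-is. No entities are built, because nothing here changes state.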

The implementation of a Repository Pattern (file-based)

Enough theory, let's apply the repository pattern to our example bidding application. Somewhere in the domain layer of the application, we have a place_bid_on_listing_use_case service, which handles the bid placement use case. Note that all the types given in the function parameters are part of the business layer (Bid and Money are the value objects we discussed earlier).

from domain.value_objects import ListingId, BidderId, Money, Bid
from domain.repositories import ListingRepository


def place_bid_on_listing_use_case(listing_id: ListingId, bidder_id: BidderId,
                                  price: Money,
                                  listing_repository: ListingRepository):
    listing = listing_repository[listing_id]
    bid = Bid(bidder_id, price)
    listing.place_bid(bid)

From this perspective, a listing_repository looks like an in-memory dictionary holding items for sale, whose entries are (ListingId, Listing) key-value pairs. The ListingRepository is merely an interface defined in the domain layer, as we do not want to have any implementation details in the domain:

import abc
from domain.value_objects import ListingId
from domain.entities import Listing


class ListingRepository(metaclass=abc.ABCMeta):
    """An interface to listing repository"""

    @abc.abstractmethod
    def get_by_id(self, id: ListingId) -> Listing:
        ...

    def __getitem__(self, index: ListingId) -> Listing:
        # enables dict-like access: repository[listing_id]
        return self.get_by_id(index)

The actual implementation of the repository resides in the infrastructure layer. The following implementation holds all entities in memory (which is not really useful in real life, but nicely demonstrates the concept of having an interface and implementation living in different layers of the application):

```python
from domain.repositories import ListingRepository
from domain.entities import Listing


class InMemoryListingRepository(ListingRepository):
    """In-memory implementation of Listing Repository. Feel free to use it in unit tests."""

    def __init__(self, items=None):
        # fall back to an empty dict only when no initial items are given
        self.items = {} if items is None else items

    def get_by_id(self, id):
        return self.items[id]
```
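Since the in-memory repository is meant for unit tests, here is a sketch of what such a test might look like. To keep the example self-contained, Bid, Listing, and the repository are simplified stand-ins for the real classes, with plain integers in place of value objects; these shapes are assumptions:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Bid:
    """Stand-in value object (the real one lives in domain.value_objects)."""
    bidder_id: int
    price: int


@dataclass
class Listing:
    """Stand-in entity that simply records the bids placed on it."""
    id: int
    bids: list = field(default_factory=list)

    def place_bid(self, bid: Bid) -> None:
        self.bids.append(bid)


class InMemoryListingRepository:
    """Stand-in mirroring the dict-backed repository shown earlier."""

    def __init__(self, items=None):
        self.items = {} if items is None else items

    def __getitem__(self, listing_id):
        return self.items[listing_id]


def place_bid_on_listing_use_case(listing_id, bidder_id, price, listing_repository):
    listing = listing_repository[listing_id]
    listing.place_bid(Bid(bidder_id, price))


def test_place_bid_adds_bid_to_listing():
    # arrange: a repository seeded with one listing, no database involved
    repository = InMemoryListingRepository({1: Listing(id=1)})
    # act
    place_bid_on_listing_use_case(1, bidder_id=7, price=100,
                                  listing_repository=repository)
    # assert: the bid is visible through the same repository
    assert repository[1].bids == [Bid(bidder_id=7, price=100)]


test_place_bid_adds_bid_to_listing()
```

The test needs no fixtures, connections, or cleanup, which is the main practical payoff of keeping the use case dependent on an interface only.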

Now let's get back to the place_bid_on_listing_use_case function. As you can see, there is no explicit save() call at the end of the function, because in-memory data structures do not need one. In real life, however, we still want to persist the changes we've made to the listing, and in most cases we would use a database for this job. Let's start with the simplest scenario: instead of using a database, we will use a file and the pickle library. We will use a context manager to handle loading and saving all the changes made to the entities stored in the repository. This is what the actual entry point for the use case could look like:

# somewhere in the application layer...

from domain.use_cases import place_bid_on_listing_use_case
from application.factories import in_memory_listing_repository

# use case arguments
listing_id = 1
bidder_id = ...
price = ...

# use case execution within a repository context
with in_memory_listing_repository('listings.pickle') as repository:
    place_bid_on_listing_use_case(listing_id, bidder_id, price, repository)

Such an entry point logically belongs to the application layer, and in the case of a web app it would be called by a route handler. The in_memory_listing_repository is a factory function that coordinates the creation of a repository through a context manager. The repository's initial state is loaded from a pickle file before the business logic is executed and persisted to the same file afterward:

from contextlib import contextmanager
import pickle
from domain.entities import Listing
from infrastructure.repositories import InMemoryListingRepository


@contextmanager
def in_memory_listing_repository(filename):
    """A factory function that yields a listing repository backed by a pickle file"""
    try:
        with open(filename, 'rb') as file:
            print(f"Loading repository state from {filename}")
            items = pickle.load(file)
    except FileNotFoundError:
        # no saved state yet: populate the repository with dummy data
        items = {
            1: Listing(id=1)
        }

    repository = InMemoryListingRepository(items)

    yield repository

    # reached only if the code inside the with block raised no exception
    print(f"Saving repository state to {filename}")
    with open(filename, 'wb') as file:
        pickle.dump(repository.items, file)
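One property of this design is worth spelling out: because the save step comes after the yield and is not wrapped in try/finally, it runs only when the with block finishes without an exception. The sketch below demonstrates this with a stripped-down, hypothetical pickled_dict context manager that stores a plain dict in a temporary file. It follows the same load, yield, save structure but is not the factory above:

```python
import os
import pickle
import tempfile
from contextlib import contextmanager


@contextmanager
def pickled_dict(filename):
    """Yield a dict loaded from `filename`; save it back on a clean exit."""
    try:
        with open(filename, "rb") as file:
            items = pickle.load(file)
    except FileNotFoundError:
        items = {}  # first run: start empty

    yield items

    # Skipped when the with body raises, so a failed use case
    # leaves the previously saved state untouched.
    with open(filename, "wb") as file:
        pickle.dump(items, file)


path = os.path.join(tempfile.mkdtemp(), "state.pickle")

# 1. a successful run: the change is saved
with pickled_dict(path) as items:
    items["bids"] = [100]

# 2. a failing run: the change is discarded
try:
    with pickled_dict(path) as items:
        items["bids"].append(200)
        raise RuntimeError("use case failed")
except RuntimeError:
    pass

# 3. a later run sees only the state from the successful run
with pickled_dict(path) as items:
    final_bids = list(items["bids"])
```

This gives a crude all-or-nothing behavior per use case. Whether that is desirable depends on the application; wrapping the yield in try/finally would instead save on every exit.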

That's it for today. Check out this repl if you want to see the running code from this post. In Part 2 of this article, we will use another implementation of a repository that uses a database (SQLAlchemy). The only required change would be to replace:

with in_memory_listing_repository('listings.pickle') as repository:

with:

with sql_listing_repository('sqlite:///listings.db') as repository:

That's the beauty of repositories. Stay tuned and feel free to leave a comment!