2022-05-08

How to structure/layer your Golang Project (or whatever language you are using)

I've been maintaining a lot of other people's projects, and I see that most people blindly follow a framework's structure without purpose. So I'm writing this to convince people how to build a good directory structure or good layering for an application, especially in this case when creating a service.

It would be better to split your project into exactly 3 layers:
  1. presentation (handles only serialization/deserialization and transport)
  2. business-logic (handles pure business logic; DTOs go here)
  3. persistence (handles records and their persistence; DAOs go here)
Each layer could be a package with a bunch of sub-packages, or split per domain, or per user role. For example:

# monolith example

bin/ # should be added to .gitignore
presenter/ # de/serialization, transport, formatter goes here
  grpc.go
  restapi.go
  cli.go
  ...
businesslogic/ # pure business logic; INPUT/OUTPUT = DTO structs
  role1|role2|etc.go
  domain1|domain2|etc.go
  ...
models/ # all DAOs/data access models go here
  users.go
  schedules.go
  posts.go
  ...
deploy/ 
  Dockerfile
  docker-compose.yml
  deploy.sh
pkg/ # all 3rd-party helpers/libs/wrappers go here
  mysql/
  redpanda/
  aerospike/
  minio/

# Vertical-slice example

bin/
domain1/
  presenter1.go
  business1.go
  models1.go
domain2/
  presenter2.go
  business2.go
  models2.go
...
deploy/
pkg/
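
To make the layering concrete, here's a hedged sketch of what a persistence-layer function (something inside models/ or a userRepo package) could look like. The table name, the columns, the bcrypt hashing, and the Postgres driver are all assumptions for illustration, not a prescription:

package userRepo

import (
  "database/sql"

  _ "github.com/lib/pq" // assumed Postgres driver
  "golang.org/x/crypto/bcrypt"
)

// User is the DAO/record struct for the (assumed) users table.
type User struct {
  ID           int64
  Email        string
  PasswordHash string
}

type PgConn struct{ db *sql.DB }

// NewPgConn opens a Postgres connection from a connection string.
func NewPgConn(connStr string) *PgConn {
  db, err := sql.Open("postgres", connStr)
  if err != nil {
    panic(err) // keep the sketch simple
  }
  return &PgConn{db: db}
}

// GetUserByEmailPass fetches a user by email and verifies the password hash;
// this is the function that later gets injected into the business logic.
func (p *PgConn) GetUserByEmailPass(email, pass string) (*User, error) {
  u := User{}
  err := p.db.QueryRow(
    `SELECT id, email, password_hash FROM users WHERE email = $1`, email,
  ).Scan(&u.ID, &u.Email, &u.PasswordHash)
  if err != nil {
    return nil, err
  }
  if err := bcrypt.CompareHashAndPassword([]byte(u.PasswordHash), []byte(pass)); err != nil {
    return nil, err
  }
  return &u, nil
}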

Also, it's better to inject dependencies on a per-function basis instead of as a whole struct/interface, something like this:

type LoginIn struct {
  Email    string
  Password string
  // normally I embed a CommonRequest object here
}
type LoginOut struct {
  // normally I embed a CommonResponse object here, with properties:
  SessionToken string
  Error        string
  Success      bool
}
type GuestDeps struct {
  GetUserByEmailPass func(string, string) (*User, error)
  // other injected dependencies, eg. S3 uploader, 3rd party libs
}
func (g *GuestDeps) Login(in *LoginIn) (out LoginOut) {
  // do validation
  // do retrieve from database
  // return proper object
}
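
As a hedged sketch of how the Login body might be filled in, using only the injected GetUserByEmailPass; the error messages and the session-token line are placeholder assumptions:

func (g *GuestDeps) Login(in *LoginIn) (out LoginOut) {
  // validation: pure business rules, no transport/serialization here
  if in.Email == "" || in.Password == "" {
    out.Error = "email and password are required"
    return
  }
  // retrieve from the persistence layer through the injected function
  user, err := g.GetUserByEmailPass(in.Email, in.Password)
  if err != nil {
    out.Error = "wrong email or password"
    return
  }
  // return a proper DTO; real code would generate a session token here
  out.SessionToken = "session-for-" + user.Email
  out.Success = true
  return
}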

So when you need to do testing, all you need to do is create a fake, either with counterfeiter (if you inject an interface instead of a function) or manually, then check with autogold:

func TestLogin(t *testing.T) {
  t.Run(`fail_case`, func(t *testing.T) {
    in := LoginIn{}
    deps := GuestDeps{
      GetUserByEmailPass: func(string, string) (*User, error) {
        return nil, errors.New("failed to retrieve from db")
      },
    }
    out := deps.Login(&in)
    want := autogold.Want("fail_case", nil)
    want.Equal(t, out) // ^ update the golden value with go test -update
  })
}

then in main (the real server implementation), you can wire in the real dependencies, something like this:

rUser := userRepo.NewPgConn(conf.PgConnStr)
srv := httpRest.New( ... )
guest := GuestDeps{
  GetUserByEmailPass: rUser.GetUserByEmailPass,
  DoFileUpload:       s3client.DoFileUpload,
  ...
}
srv.Handle[LoginIn, LoginOut](LoginPath, guest.Login)
srv.Listen(conf.HostPort)
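
The httpRest server above is left abstract; purely as an illustration, here is a hedged sketch of how such a generic JSON handler in the presenter layer could be built on top of net/http and encoding/json. All the names here are assumptions, and since Go methods cannot take type parameters, the sketch uses a package-level generic function instead of a method:

package httpRest

import (
  "encoding/json"
  "net/http"
)

// Handle registers a JSON-in/JSON-out endpoint for a pure business-logic
// function; the presenter only does deserialization, the call, serialization.
func Handle[In any, Out any](mux *http.ServeMux, path string, fn func(*In) Out) {
  mux.HandleFunc(path, func(w http.ResponseWriter, r *http.Request) {
    var in In
    // deserialization only, no business logic here
    if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
      http.Error(w, err.Error(), http.StatusBadRequest)
      return
    }
    out := fn(&in) // call the pure business logic
    // serialization only
    w.Header().Set("Content-Type", "application/json")
    _ = json.NewEncoder(w).Encode(out)
  })
}

With something like this, adding an endpoint is just wiring; guest.Login itself never sees HTTP.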

Why do it like this? Because most frameworks I have used are either insanely over-layered or have no clear separation between controller and business logic (the controller still handles transport, serialization, etc.), and validation only happens on the outermost layer, or sometimes half outside and half inside the business logic (which can make it vulnerable); so when writing unit tests, the programmer is tempted to test the whole http/grpc layer instead of the pure business logic. With this layering we can also add another kind of serialization/transport layer without having to modify the business logic. Imagine using one controller to handle the business logic: how much hassle would it be to switch frameworks for some reason (the framework is no longer maintained, there's a performance bottleneck on the framework side, the framework doesn't provide proper middleware for some fragile telemetry, you need to add another serialization format or protocol, etc.)? But with this kind of layering, if we want to add grpc or json-rpc or a command line, or switch frameworks, or anything else, it's easy: just add a layer with the proper serialization/transport, then call the original business logic (see the CLI sketch below).
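
As a hedged illustration of that point (the flag names and function name are assumptions, and the flag/fmt imports are omitted like in the snippets above), a command-line presenter could reuse the exact same business logic:

// cli.go: another presenter, reusing guest.Login without touching it
func runLoginCLI(guest GuestDeps) {
  email := flag.String("email", "", "user email")
  pass := flag.String("pass", "", "user password")
  flag.Parse()
  // transport + deserialization: flags -> DTO
  out := guest.Login(&LoginIn{Email: *email, Password: *pass})
  // serialization: DTO -> stdout
  fmt.Printf("%+v\n", out)
}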
 
 
mermaid link (numbers 2-4 are under our control)

Talk is cheap, show me the code example! gingorm1 or echogorm1 are the minimal examples (you can always change the framework to fiber or the default net/http or any other framework, and the ORM to sqlc, sqlx, or jet). But if you are working alone, don't want to inject the database functions (which goes against clean architecture, but is the most sensible way), and want to test directly against the database, you can check the fiber1 example, or sveltefiber (without a database). Note that those are just examples; I would not inject the database as a function (see fiber1 for example), I would depend on it directly and use dockertest for managed dependencies, and only use function injection for unmanaged dependencies (3rd party). A more complex example can be found here: street.

This is only from a code maintainer's point of view; these are the WORST practices I have found when continuing other people's projects:
  • table-based entity microservices: every table or set of tables gets its own microservice. This is overly granular; some of them should be coupled instead (eg. user, group, roles, permission -- this should be one service instead of 2-3 services)
  • MVC microservices: one microservice for each layer of the MVC, and worse, each on a different repository, eg. API service for X (V), API service for Y (V), webhook for X (C), adapter for third party Z (C/M), service for persistence (M), etc. These should be separated by domain/capability instead of by MVC layer. Why? Because a feature that would normally require changing 1 repository in a monolith now requires modifying 2-4 microservices and starting 2-4 services just to debug something, which doesn't make sense. It might make a bit of sense if you are using a monorepo, but without one it's more pain than benefit.
  • pub-sub with channels/goroutines but without persistence: this is ok only if the request is discardable (all or nothing), but if it's very important (money, for example) you should always persist every state, and it would be better to have a worker that progresses every state into the next state so we don't have to fix things manually.
There are also some GOOD things I found:
  • Start with auth, metrics, logs (show the request id to the user, and return useful responses), and proper dependency injection; don't let these come so late that you have to fix everything afterwards
  • Standard go test instead of a custom test framework, because go test has proper tooling in the IDEs
  • Use workers for slow things, which makes sense: don't make the user wait, unless the product guy wants it all sync. Also send failures to slack or telegram or something
  • CRON pattern: run code at a specific time, good when you need a scheduled task (eg. billing, reminders, etc)
  • Query-job publish task: this pattern decouples the CRON from the time dependency (query the db, get the list of items to be processed, publish to the MQ), so the publish task can be triggered independently (eg. if there's a bug in only 1 item) and regardless of time, and the workers will pick up any work that is late.
  • Separating the proxy (transport/serialization) from the processor (business-logic/persistence): this has a really good benefit in terms of scalability for small requests (not for uploading/downloading big files). We put a generic reverse proxy in front, push the request to a two-way pub-sub, then return the response. For example, we create a rest proxy, a grpc proxy, and a json-rpc proxy; all 3 push to NATS, then the worker/processor processes the request and returns the proper response. This works like lambda, so the programmer only needs to focus on building the worker/processor part instead of the generic serialization/transport; all the generic auth, logging, request ids, and metrics can be handled by the proxy (a minimal sketch follows after the chart).
    the chart is something like this:

    generic proxy <--> NATS <--> worker/processor <--> databases/3rd party

    this way we can also scale independently, either as a monolith or as microservices. Service mesh? no! service bus/star is the way :3
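
    As a hedged sketch of this proxy/processor idea (the subject name and payloads are assumptions, and in reality the proxy and the worker would be separate processes), using the nats.go client: the processor subscribes and replies, the generic proxy just does a request-reply, so only the processor contains business logic:

package main

import (
  "log"
  "time"

  "github.com/nats-io/nats.go"
)

func main() {
  nc, err := nats.Connect(nats.DefaultURL)
  if err != nil {
    log.Fatal(err)
  }
  defer nc.Close()

  // worker/processor: subscribes to a subject, runs the business logic,
  // and replies; it never touches HTTP/grpc serialization details
  _, err = nc.Subscribe("rpc.guest.login", func(m *nats.Msg) {
    // here we would decode m.Data into LoginIn, call guest.Login,
    // and encode LoginOut back; skipped for brevity
    _ = m.Respond([]byte(`{"Success":true}`))
  })
  if err != nil {
    log.Fatal(err)
  }

  // generic proxy side: forwards the raw request body and waits for the reply
  resp, err := nc.Request("rpc.guest.login", []byte(`{"Email":"a@b.c"}`), 2*time.Second)
  if err != nil {
    log.Fatal(err)
  }
  log.Printf("reply: %s", resp.Data)
}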

2021-08-06

Database Patterns of Microservices

When do you need microservices? When you have multiple business domains (not DNS domains) that are best split, managed, and deployed separately. If your business domain is small and the team is small, it's better to use a modular monolith instead, since microservices add a lot of operational complexity (especially if you are using Kubernetes).

These are database patterns I got from Kindson's presentation.

  1. Private database per service
    this is the most common pattern: every domain/service must have its own database. This has some benefits:
    + the developers won't be tempted to join across domains, which could make the codebase hard to refactor if it someday needs to be split into a microservice/modular approach
    + easier for a new developer joining the team, since you don't need to know the whole ER diagram, just the small segment related to the service he/she manages
    - more complicated for analytics use cases because you can't do JOINs, but this can be solved using a distributed SQL query engine like Trino
    + each database can scale and migrate independently (no downtime, especially when you are using a database that requires locking on migration, like MySQL)
    - this causes another problem for accessing data from a different domain, which can be solved by:
       * api gateway (sync): a service must hit other services through the API gateway
       * event hub/pubsub (async/push): a service must subscribe to other services' events to retrieve the data, which causes another consistency-related problem
       * service mesh (sync): a service must hit other services through a sidecar
       * directly reading a replica (async/pull)
  2. Shared database
    all or some microservices access the same database; pros and cons:
    + simpler for the developers, since they can do JOINs and transactions
    - the worst kind of performance when the bottleneck is the database, especially on migration and scaling
    + no consistency problem
  3. SAGA (sequence of local transactions)
    pros and cons:
    + splits an atomic transaction into multiple steps, where if one of the steps fails we must reconcile/undo/run a compensating action (see the sketch after this list)
    - more complex than a normal database transaction
  4. API Composition (join inside the API service, the pattern used in Trino)
    pros and cons:
    + can do joins across services and datasources
    - must hit multiple services (slower than a normal join) if the count is large
    - can be bad if the other service calls yet another service (cascading N+1 queries), eg. A hits B, B hits C, but this can be solved if B and C have batch APIs (usually using WHERE IN, instead of a single-record API)
  5. CQRS (Command Query Responsibility Segregation)
    a pattern created because old databases were usually single-master with multiple slaves, but it also has benefits and cons:
    + simpler scaling: scale either the writes or just the reads
    - possible inconsistency problems if a transaction does not read from the master, which adds complexity during development (which queries must read the master, which can read the readonly replica)
  6. Domain Events
    the service must publish events; pros and cons:
    + decoupling: no more service mesh/hitting other services, we just need to subscribe to the events
    - eventual consistency
    - must store and publish events that may never need to be consumed, but we can directly read that service's event database to overcome this; it can also be a benefit, since events help with auditing
  7. Event Sourcing
    this pattern creates snapshots to reconstruct the final state from a series of events; pros and cons:
    + can be used to reliably publish an event when state changes
    + good for auditing
    + theoretically easier to track when business logic changed (but we must build the full DFA/NFA state graph to reliably cover the edge cases)
    - difficult to query since it's a series of events, unless you prioritize the snapshot
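
For the SAGA pattern above, here's a hedged Go sketch of a sequence of local transactions with compensations; the step names and the commented-out usage functions are hypothetical placeholders, not from any real system:

package saga

import "fmt"

// step pairs a local transaction with the compensation that undoes it.
type step struct {
  name       string
  action     func() error
  compensate func() error
}

// run executes the steps in order; when one fails, it runs the compensations
// of all previously completed steps in reverse order.
func run(steps []step) error {
  done := []step{}
  for _, s := range steps {
    if err := s.action(); err != nil {
      for i := len(done) - 1; i >= 0; i-- {
        _ = done[i].compensate() // best-effort undo; log/retry in real code
      }
      return fmt.Errorf("saga failed at %s: %w", s.name, err)
    }
    done = append(done, s)
  }
  return nil
}

// usage sketch (hypothetical step functions):
// err := run([]step{
//   {"create-order", createOrder, cancelOrder},
//   {"charge-payment", chargePayment, refundPayment},
//   {"create-shipment", createShipment, cancelShipment},
// })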

Engineering is about making decisions and prioritizing: which kind of simplicity needs to be prioritized, whether maintainability, raw performance, ease of scaling, or other metrics; you'll need to define your own "best". And there's no silver bullet; each solution is best only for a specific use case.

But.. if you had to create an ultimate/general-purpose service, which patterns would you use?
Based on experience, I'd use:
  1. CQRS (1 writer, 1-N readers)
    readers must cache the reads
    the writer must log every change (Domain Events)
    the writer can be sync or async (for slow computations, or those that depend on another service/SAGA) and updates the snapshot each time
  2. Domain Events
    these event logs can be tailed (async/pull) by another service if it needs them
    consumers must record their own bookmark (which events have already been tailed/consumed/ack'ed, and which haven't) -- see the sketch after this list
    EDIT 2021-08-16: this guy has the same idea, except that I believe there should be no circular dependency (eg. his order-inventory system should be composed from order, inventory, and delivery)
  3. API Composition
    but we can use JOINs inside the domain
    especially for statistics, for example, we must collect each service's statistics
    the API has 2 versions: a consistent version (read from the master) and an eventual-consistency version (read from the readonly replica)
    the API must have a batch/paged version in addition to the standard CRUD
    the API must tell whether it depends on another service's APIs
  4. Private database is a must
    since the bottleneck is almost always the database, it's better to split databases by domain from the beginning (but no need to create a readonly replica until the read side becomes the bottleneck)
    If writes exceed 200K-600K rps I prefer manual partitioning over sharding (unless the database I use supports sharding, automatic rebalancing, and adding a new node super easily, with Tarantool-like performance)
    What if you need joins for analytics reasons? You can use Trino/Presto/BigQuery/etc, or just delegate the statistics responsibility to each service, then collect/aggregate them in a statistics/collector service.
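
As a hedged sketch of the Domain Events tailing in point 2 (the table names, columns, and Postgres-style upsert are assumptions): the consumer keeps its own bookmark of the last event id it has processed and pulls anything newer from the producer's event log:

package consumer

import "database/sql"

// Event is a row in the producer service's domain-event log.
type Event struct {
  ID      int64
  Kind    string
  Payload []byte
}

// TailEvents pulls events newer than this consumer's bookmark, processes
// them, and advances the bookmark only after successful processing.
func TailEvents(eventDB, ownDB *sql.DB, consumerName string, handle func(Event) error) error {
  var last int64
  // the bookmark lives in the consumer's own database
  err := ownDB.QueryRow(
    `SELECT last_event_id FROM bookmarks WHERE consumer = $1`, consumerName,
  ).Scan(&last)
  if err != nil && err != sql.ErrNoRows {
    return err
  }

  rows, err := eventDB.Query(
    `SELECT id, kind, payload FROM domain_events WHERE id > $1 ORDER BY id`, last,
  )
  if err != nil {
    return err
  }
  defer rows.Close()

  for rows.Next() {
    var e Event
    if err := rows.Scan(&e.ID, &e.Kind, &e.Payload); err != nil {
      return err
    }
    if err := handle(e); err != nil {
      return err // stop; the next run resumes from the bookmark
    }
    // advance the bookmark (upsert) only after the event is handled
    if _, err := ownDB.Exec(
      `INSERT INTO bookmarks (consumer, last_event_id) VALUES ($1, $2)
       ON CONFLICT (consumer) DO UPDATE SET last_event_id = $2`,
      consumerName, e.ID,
    ); err != nil {
      return err
    }
  }
  return rows.Err()
}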