
I’m not going to discuss how to use monads.
If you want to know how they’re used, you should first learn their various instances, such as List, Maybe, Writer, State, etc.
What am I going to talk about in this article, then?
Since you know what monads are, you should know there are 3 coherence laws, but they might seem like black magic to most of you (myself included).
What I’m going to do is show that those 3 laws are “obvious” in some sense, without actually teaching you the whole category theory behind them.

The first problem is that there are several different but equivalent definitions of monads, and Haskell adopts the weirdest one.
I’m going to introduce an alternative definition that appears more commonly in category theory.
Let me first mention a function from the module Control.Monad:

(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)

Why mention this operator?
The (simplified) definition of the Monad class in Haskell is:

class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b

But the definition below is equivalent to the one above.

class Monad m where
  return :: a -> m a
  (>=>)  :: (a -> m b) -> (b -> m c) -> (a -> m c)

The fact that each of them can define the other is left as an exercise for the reader.
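
If you need a hint, here’s a sketch of both directions (the names kleisli and bind are mine, not standard):

import Control.Monad ((>=>))

-- (>=>) in terms of (>>=): run the first function, then feed its result to the second.
kleisli :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
kleisli f g = \a -> f a >>= g

-- (>>=) in terms of (>=>): smuggle the monadic value in with const, then apply to ().
bind :: Monad m => m a -> (a -> m b) -> m b
bind m f = (const m >=> f) ()

main :: IO ()
main = print (bind (Just 41) (\x -> Just (x + 1)))  -- Just 42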

The 3 laws according to the latter definition would be:

return >=> f = f
f >=> return = f
(f >=> g) >=> h = f >=> (g >=> h)
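
As a quick sanity check, here’s a sketch verifying them for the Maybe monad (f, g, h are arbitrary example functions of mine):

import Control.Monad ((>=>))

f, g, h :: Int -> Maybe Int
f x = Just (x + 1)
g x = if x > 0 then Just (x * 2) else Nothing
h x = Just (x - 3)

main :: IO ()
main = do
  print ((return >=> f) 1 == f 1)                     -- left identity
  print ((f >=> return) 1 == f 1)                     -- right identity
  print (((f >=> g) >=> h) 1 == (f >=> (g >=> h)) 1)  -- associativity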

The spoiler is that the 3 laws below are those for monoids.

mempty <> x = x
x <> mempty = x
(x <> y) <> z = x <> (y <> z)

Why mention monoids?
You can see that the laws for >=> are actually the same as those for monoids!
Turn f, g, h into x, y, z, respectively, return into mempty, and >=> into <>, and you get the laws for monoids.
The first law is called the coherence of left identity, meaning that adding nothingness on the left doesn’t change the result.
The second is the one for right identity, meaning that adding nothingness on the right also does nothing.
The final one is the law of associativity, which says that the order in which you apply the operation doesn’t matter; it is a generalization of the associativity of the sum and the product you learned in elementary school.
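
As a concrete instance, take lists under concatenation (a standard monoid; (<>) and mempty come from the Prelude in modern GHC):

main :: IO ()
main = do
  print (mempty <> [1, 2] == [1, 2 :: Int])                  -- left identity
  print ([1, 2] <> mempty == [1, 2 :: Int])                  -- right identity
  print (([1] <> [2]) <> [3] == [1] <> ([2] <> [3 :: Int]))  -- associativity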

The only difference is that x <> y is always defined for any two elements x and y of a monoid.
On the other hand, the arguments of >=> are not simple types such as Int or [Bool] but function types of the form a -> m b.
They have some internal structure, namely a and b.
>=> is not always defined if you pick one a -> m b and one c -> m d at random; the compiler could simply throw a compile error.
In fact, it only compiles when b ~ c.
So return and >=> do not form a monoid, but something more strongly typed, in which the return type of the first function argument has to match the input type of the second function argument.
And those “strongly-typed” monoids actually have a name in math, namely, categories.
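
To see that typing constraint in action, here’s a sketch (safeRecip and describe are my own example names):

import Control.Monad ((>=>))

safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

describe :: Double -> Maybe String
describe x = Just (show x)

-- Compiles: the output type Double matches the input type Double.
ok :: Double -> Maybe String
ok = safeRecip >=> describe

-- Would NOT compile: describe returns String, but safeRecip wants Double.
-- bad = describe >=> safeRecip

main :: IO ()
main = print (ok 4)  -- Just "0.25"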

Now let me introduce the definition that mathematicians love even more, though it might be harder for beginners.
I assume you already know what Functors are.
You probably also know that all Monads are Functors.
I’m going to define Monads as Functors with two extra operations below:

class Functor m => Monad m where
  return :: a -> m a
  join   :: m (m a) -> m a
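
To make join concrete before we go on, here’s what it looks like for Maybe (a sketch with the standalone name joinMaybe; in real Haskell, join lives in Control.Monad):

-- return for Maybe is Just; join collapses two layers of Maybe into one.
joinMaybe :: Maybe (Maybe a) -> Maybe a
joinMaybe (Just inner) = inner
joinMaybe Nothing      = Nothing

main :: IO ()
main = print (joinMaybe (Just (Just 42)))  -- Just 42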

How can you derive the previous 2 kinds of definitions from this one?
If you have something of type a -> f b and something of type b -> f c, where f is just a Functor, then after applying the first function, you can map the second function over the elements of its result.
You would then get an f (f c) from an a.
The problem is that repeated composition makes the result type more and more nested (more layers of f), and that really isn’t convenient for us programmers.
Can’t I collapse them into one layer?
Of course you can, with the operation join, because that’s exactly what join does!
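
In code, that reasoning is exactly the following sketch (compose is my name for it; the real (>=>) in base is defined via (>>=)):

import Control.Monad (join)

-- f gives an m b, fmap g turns it into an m (m c), and join flattens the layers.
compose :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
compose f g = join . fmap g . f

main :: IO ()
main = print (compose (\x -> [x, x]) (\y -> [y + 1]) (0 :: Int))  -- [1,1]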

Now there’s a trick notation-wise.
We can just look at the Functors of kind * -> * themselves and ignore their underlying type arguments.
For the sake of brevity, I’ll use some strange syntax in which the type of join is written mm -> m: it merges 2 sibling (actually nested) Monads into one.
But how do I write the type of return in that strange kind of syntax?
You should know there’s a Functor that does nothing called Identity, and Identity a is equivalent to a because Identity is just an empty effect.
I abbreviate Identity as I below.
The type of return is then equivalent to I a -> m a.
Using that strange notation, it’s I -> m.
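
That reading of return can be written in actual Haskell (a sketch; return' is my name, and Identity comes from Data.Functor.Identity in base):

import Data.Functor.Identity (Identity (..))

-- return seen as I a -> m a: unwrap the do-nothing effect, then rewrap in m.
return' :: Monad m => Identity a -> m a
return' (Identity x) = return x

main :: IO ()
main = print (return' (Identity 42) :: Maybe Int)  -- Just 42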

It’s time to introduce Monads again.
They are Functors equipped with return and join, satisfying the following 3 laws:

  1. I can add an I to the left of an m (actually wrapping the m inside), and then turn that I into an m using return.
    How is that possible?
    Thinking in the strange syntax, return has type I -> m, and what it does is exactly transform an I into an m!
    Now we have 2 layers of ms, that is, mm.
    I can join them together, transforming that into an m again.
    The law is that transforming back and forth like this is the same as doing nothing.

  2. I can add an I to the right of an m.
    Thinking in the Haskell way, you just map the elements of type a inside the m to ones of type I a.
    Thinking in the strange syntax, you simply add an adjacent I on the right.
    I can turn that mI into mm using return again (mapped inside the m), and joining the result back into a single m should also be the same as doing nothing.

  3. If there are 3 layers of m, that is, mmm, there are 2 different ways to join them twice into an m.
    One joins the left 2 ms first, the other joins the right ones first.
    Nonetheless, the order of the operations shouldn’t affect the result.

But they are the 3 laws of Monoids!
The first is the left identity: adding nothingness to the left doesn’t change anything.
The second is the right identity: you can also add nothingness to the right.
Finally, the associative law: the order in which you do the operations isn’t important.
Doesn’t it seem obvious enough now?
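
In Haskell terms, the 3 laws read join . return = id, join . fmap return = id, and join . join = join . fmap join. Here’s a minimal sketch checking them for the list monad (my own example):

import Control.Monad (join)

main :: IO ()
main = do
  let xs = [1, 2, 3] :: [Int]
  print (join (return xs) == xs)       -- left identity:  join . return = id
  print (join (fmap return xs) == xs)  -- right identity: join . fmap return = id
  let xsss = [[[1], [2]], [[3]]] :: [[[Int]]]
  print (join (join xsss) == join (fmap join xsss))  -- associativity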

Therefore,

A monad is just a monoid in the category of endofunctors, what’s the problem?