# Bread Production

You are a baker and must choose how many loaves of bread to produce each day.
Each loaf sold yields a gain of $g$, while each loaf produced but not sold yields a loss of $l$.
Each day's demand is an independent random draw from a distribution $D$.
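
As a minimal sketch of this reward rule, one day's profit for a given production and demand could be computed as follows. The helper name `dailyReward` is hypothetical and not part of the notebook; the formula is the one used later in `evaluatePolicy`.

```python
def dailyReward(prod, demand, g, l):
    """Profit for one day: gain on units sold minus loss on unsold units."""
    sold = min(prod, demand)          # cannot sell more than was produced
    unsold = max(0, prod - demand)    # leftover bread, if any
    return g * sold - l * unsold

# 10 loaves produced, 7 demanded: 7 sold, 3 wasted
print(dailyReward(10, 7, g=1, l=0.5))
# 10 loaves produced, 12 demanded: everything is sold
print(dailyReward(10, 12, g=1, l=0.5))
```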

In [1]:
import numpy as np
#import matplotlib.pyplot as plt
#%matplotlib inline

## When the law of the demand is known

In [2]:
g = 1 # gain per unit sold
l = 0.5 # loss per unit produced but not sold
demandDist = lambda: np.random.geometric(0.1) # random demand: geometric on {1, 2, ...} with mean 10

In [3]:
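class knownDemandAgent:
    """Baseline agent that knows the demand law and always produces its estimated mean."""

    def __init__(self, g, l, demandDist):
        # g and l are accepted for interface consistency but unused by this baseline
        # Monte Carlo estimate of the mean demand from 1000 draws
        self.m = np.mean([demandDist() for _ in range(1000)])

    def receiveReward(self, reward):
        pass  # nothing to learn: the demand law is already known

    def decisionRule(self):
        return self.m  # daily production (may be fractional)

myAgent = knownDemandAgent(g, l, demandDist)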

In [4]:
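def evaluatePolicy(T, N, agent, g, l, demandDist):
    """Average total reward over N runs of T days each.

    The same agent instance is reused across runs, so a learning agent keeps its state.
    """
    res = []
    for _ in range(N):
        s = 0  # total reward of the current run
        for t in range(T):
            prod = agent.decisionRule()
            demand = demandDist()
            # gain on units sold minus loss on unsold units
            reward = g * min(prod, demand) - l * max(0, prod - demand)
            s += reward
            agent.receiveReward(reward)
        res.append(s)
    return np.mean(res)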

In [5]:
T = 365
N = 100
evaluatePolicy(T, N, myAgent, g, l, demandDist)

1736.0540249999967
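
The baseline above always produces the mean demand. For comparison, a standard result for this kind of problem (the newsvendor model) is that expected profit is maximised by the smallest quantity $q$ such that $P(D \le q) \ge g/(g+l)$. The sketch below illustrates that rule; it is not the notebook's method, and the function names `criticalFractileQuantity` and `geometricFractileQuantity` are hypothetical.

```python
import math
import numpy as np

def criticalFractileQuantity(samples, g, l):
    """Smallest q whose empirical CDF over `samples` reaches g / (g + l)."""
    ordered = np.sort(np.asarray(samples))
    ratio = g / (g + l)
    k = math.ceil(ratio * len(ordered))  # number of samples that must be <= q
    return ordered[max(k, 1) - 1]

def geometricFractileQuantity(p, g, l):
    """Same rule with the exact CDF 1 - (1 - p)**q of a geometric law on {1, 2, ...}."""
    ratio = g / (g + l)
    return math.ceil(math.log(1 - ratio) / math.log(1 - p))

# Exact rule for the notebook's parameters
print(geometricFractileQuantity(0.1, g=1, l=0.5))

# Empirical version, estimated from simulated demands
draws = np.random.geometric(0.1, size=10000)
print(criticalFractileQuantity(draws, g=1, l=0.5))
```

With the notebook's values ($g = 1$, $l = 0.5$, geometric demand with parameter 0.1), the exact rule gives 11, slightly above the mean demand of 10. The empirical version depends on the random draws and may differ by a unit or so.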

## When the law of the demand is unknown

In [6]:
g = 1 # gain per unit sold
l = 0.5 # loss per unit produced but not sold

In [11]:
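class unknownDemandAgent:
    """Agent that estimates the mean demand from the rewards it observes."""

    def __init__(self, g, l):
        self.g = g
        self.l = l
        self.n = 0  # number of days observed
        self.s = 0  # sum of the (inferred) demands so far
        self.lastProd = 0  # production chosen on the most recent day

    def receiveReward(self, reward):
        self.n += 1
        if reward == self.g * self.lastProd:
            # sold out: demand was at least lastProd, so use lastProd + 1 as a guess
            self.s += self.lastProd + 1
        else:
            # unsold bread: reward = g*d - l*(lastProd - d), solved for the demand d
            self.s += (reward + self.l * self.lastProd) / (self.g + self.l)

    def decisionRule(self):
        if self.n > 0:
            self.lastProd = self.s / self.n  # running mean of inferred demands
        else:
            self.lastProd = 10  # default for the first day
        return self.lastProd

myAgent = unknownDemandAgent(g, l)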

In [12]:
T = 365
N = 100
evaluatePolicy(T, N, myAgent, g, l, demandDist)

1212.9784500404169
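
The unknown-demand agent never sees the demand directly, only the reward. When some bread is left unsold, the reward is $r = g\,d - l\,(q - d)$ for production $q$ and demand $d$, which can be solved as $d = (r + l\,q)/(g + l)$. When everything is sold, the reward only reveals that $d \ge q$. The sketch below isolates that step; the helper name `inferDemand` and its return convention are assumptions made for illustration.

```python
def inferDemand(reward, prod, g, l):
    """Recover the demand from one day's reward.

    Returns (value, censored). If censored is True, the day sold out and
    `value` is only a lower bound on the true demand.
    """
    if reward == g * prod:
        return prod, True                       # sold out: demand >= prod
    return (reward + l * prod) / (g + l), False  # invert r = g*d - l*(prod - d)

# 10 produced, reward 5.5 with g=1, l=0.5: 7 loaves were demanded
print(inferDemand(5.5, 10, g=1, l=0.5))
# 10 produced, reward 10: sold out, demand was at least 10
print(inferDemand(10, 10, g=1, l=0.5))
```

On sold-out days the notebook's agent substitutes `lastProd + 1` for the unknown demand, which is one possible heuristic among others.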

## Evaluation

Upload a file <yourname.py> to https://plmbox.math.cnrs.fr/u/d/524d0632b3ae4b42b13a/ containing:
  
  * a class "yourname_K" (replace "yourname" with your name...) implementing the agent with known demand
  * a class "yourname_U" implementing the agent with unknown demand
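
A possible skeleton for such a file is sketched below. It assumes the two classes are constructed and called exactly like `knownDemandAgent` and `unknownDemandAgent` in this notebook (this is an assumption, not something the instructions state), and the constant returned by `decisionRule` is only a placeholder.

```python
class yourname_K:
    """Agent with known demand; same interface as knownDemandAgent."""

    def __init__(self, g, l, demandDist):
        self.g = g
        self.l = l
        self.demandDist = demandDist  # callable returning one random demand

    def receiveReward(self, reward):
        pass  # nothing to learn when the demand law is known

    def decisionRule(self):
        return 10  # placeholder production level, to be replaced by your own rule


class yourname_U:
    """Agent with unknown demand; same interface as unknownDemandAgent."""

    def __init__(self, g, l):
        self.g = g
        self.l = l

    def receiveReward(self, reward):
        pass  # placeholder: update your demand estimate from the reward here

    def decisionRule(self):
        return 10  # placeholder production level, to be replaced by your own rule
```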