We were settling into the Christmas break, I realized something: There will be no daycare until January! While this on its own is not so much of a problem, but we were both looking forward to some quiet time. However, it got worse. But before it got worse it got better because the first couple of days that our son Linus was home with us, he slept in. So we got to sleep in as well, which was needed the first couple of days. Then I started to wake up around 7:30 or so and get up. I expected Linus to get up at some point too. I actually enjoyed to quiet time in the mornings. A few more days went by like this, and I got used the quiet time in the morning. However of course this quiet time was ruined.
The most common thought would at this stage of the narrative would, of course, be that Linus started to wake up at 5:30 or so. On the contrarary, he continued to sleep until we woke him up at 10. Furthermore, he also slept a lot for his nap. One day he asked if I can wake him up for his bedtime..perhaps he was realizing as well that he was sleeping a lot. As his parental units got really worried and did what one does: we googled around for 20 minutes and became instant sleep experts. We looked through the usual chat rooms, finding comments ranging between "growth spurt" and "emergency room", and found a plot with normal, recommended and not recommended. One becomes very used to this kind of information, but it also is not very helpful.
Later that day as the information I read earlier sank in, I thought: "How worried do I have to be?" I had two options: (a) eat a whole bunch of cookies and drink enough wine to forget this topic until after the holidays, or (b) look for some data and try to see how abnormal that really is. I decided to the latter. So I started my search for sleep-related data...it turns out this data is hard to find. After an hour or so scanning different different manuscripts on google scholar, I found Williams , J.A., Zimmerman, F.J., Bell , J.F. (2013): Norms and trends of sleep time among US children and adolescents, Journal of the American Medical Association - Pediatrics, 167(1), 55 -60, which contained a figure with the 10$^{th}$ , 25$^{th}$ , 50$^{th}$ , 75$^{th}$ , and 90$^{th}$ percentile of night sleep during the weekdays, weekends for ages between 0.5 and 18 years, and daytime sleep for weekend and weekdays for the ages between 0.5 and 5 years. I digitized the data and started playing with it.
Let's start: ...
We have to start with the usual loading of modules in jupyter:
%pylab inline
from scipy.interpolate import interp1d
from scipy.special import erf
import csv
import random
import warnings
matplotlib.rcParams.update({'font.size': 18,'legend.fontsize':15,'font.family': 'serif'})
warnings.filterwarnings('ignore')
Data¶
The data that I digitized are in different files: {A:F}.cvs. Each file contains 6 colunms where the first column is age and the subsequent columns are for the different percentile plotted in Williams et al. (2013). The figure below depicts the data for A.cvs, representing the weekday night sleep.
file='A.csv'
aage=[]
a10=[]
a25=[]
a50=[]
a75=[]
a90=[]
f = open(file)
csv_f = csv.reader(f)
for row in csv_f:
aage.append(float(row[0]))
a10.append(float(row[1]))
a25.append(float(row[2]))
a50.append(float(row[3]))
a75.append(float(row[4]))
a90.append(float(row[5]))
fig0,ax0 = subplots(1,figsize=(10,8.1))
ax0.plot(aage,a10,label=r'$10^{th}$')
ax0.plot(aage,a25,label=r'$25^{th}$')
ax0.plot(aage,a50,label=r'$50^{th}$')
ax0.plot(aage,a75,label=r'$75^{th}$')
ax0.plot(aage,a90,label=r'$90^{th}$')
ax0.set_xlim([0.5,18])
ax0.legend(loc='upper left', bbox_to_anchor=(0.05,1.),title='Percentile' ,frameon=False,fancybox=True, shadow=False, ncol=5, numpoints=1)
ax0.set_xlabel('Age [yrs]')
ax0.set_ylabel('Night Sleep [hrs]')
Helper functions¶
The next few cells contain some scripts to help with the model.
def random_floats(low, high, size):
# returns an array of random numbers of size 'size' between 'low' and 'hihg'.
return [random.uniform(low, high) for _ in xrange(size)]
def error_th(arr,brr,thres):
nnx=len(arr)
new_a=[]
for i_n in range(nnx):
if arr[i_n]<=thres:
new_a=concatenate((new_a,brr[i_n,:]))
new_a=array(new_a)
new_a=reshape(new_a, (len(new_a)/len(brr[0,:]), len(brr[0,:])))
return new_a
The second function (lines 5 to 13) in the cell above is sort of interesting, but essentially just helps to keep those runs whose L$_2$ norm is smaller than the threshold 'thres'. I use the L$_2$ norm to lower find suitable realization that describe the data.
The cell below contains more helper functions.
def find_sleep_distribution(gage,nnage,nnc10,nnc25,nnc50,nnc75,nnc90):
len_age=len(nnage)
for i in range(len_age):
if gage>=nnage[i]:
age_index=i
break
distri_y=[10,25,50,75,90]
distri_x=[nnc10[age_index],nnc25[age_index],nnc50[age_index],nnc75[age_index],nnc90[age_index]]
return distri_x, distri_y
def find_percent(linus_sleep,nnage,xx):
len_age=len(nnage)
# print len_age
for i in range(len_age):
if nnage[i]>=linus_sleep:
h_index=i
break
# print h_index
len_range=len(xx[:,0])
percent=[]
# print len_range,shape(xx)
for j in range(len_range):
percent.append(float(xx[j,h_index]))
percent=array(percent)
return percent
def inter(x,y,nx):
f = interp1d(x, y,'cubic')
ny=f(nx)
return ny
def fitFunc(x,a,b):
return 0.5*(1.+erf((x-a)/(b*sqrt(2.))))
The last function (lines 32 and 33) in the cell are interesting because they reveal that we use the error function to approximate the cummulative distributation given by the data in Williams et al. (2013).
The cell below contains some basic information, like Linus' age and the amount he sleeps during the day and night on weekdays and on weekends. If you try this with your child, this is where you need to change things.
linus_age=4.5
naage=linspace(0.5,18.5,73)
nbage=linspace(0.5,5.0,21)
weekday_night_hours=11.
weekend_night_hours=12.
weekday_day_hours=1.0
weekend_day_hours=1.0
sim_hours_n=linspace(0,20,41)
sim_hours_d=linspace(0,10,11)
file='A.csv'
aage=[]
a10=[]
a25=[]
a50=[]
a75=[]
a90=[]
f = open(file)
csv_f = csv.reader(f)
for row in csv_f:
aage.append(float(row[0]))
a10.append(float(row[1]))
a25.append(float(row[2]))
a50.append(float(row[3]))
a75.append(float(row[4]))
a90.append(float(row[5]))
aage=array(aage)
a10=array(a10)
a25=array(a25)
a50=array(a50)
a75=array(a75)
a90=array(a90)
na10=inter(aage,a10,naage)
na25=inter(aage,a25,naage)
na50=inter(aage,a50,naage)
na75=inter(aage,a75,naage)
na90=inter(aage,a90,naage)
a_x,a_y = find_sleep_distribution(linus_age,naage,na10,na25,na50,na75,na90)
a_x=array(a_x)
a_y=array(a_y)/100.
sim_hours=linspace(0,20,41)
asigma=random_floats(1.5, 3.5, 10000)
amu=random_floats(a_x[2]-0.5, a_x[2]+0.5, 10000)
dummycases=zeros([len(asigma),len(sim_hours_n)])
nerror=zeros([len(asigma)])
for i in range(len(asigma)):
dummycases[i,:]=fitFunc(sim_hours_n,amu[i],asigma[i])
dummy=inter(sim_hours_n,dummycases[i,:],a_x)
nerror[i] = (dot(a_y-dummy,a_y-dummy))**.5
acases=error_th(nerror,dummycases,0.05)
weekday_night=find_percent(weekday_night_hours,sim_hours_n,acases)
The first 30 lines in the cell above deal with the data. Line 32 creates the $\sigma$ array for the error function, and line 33 the respective $\mu$. You see that the vectors for both have 10000 elements, resulting in 10000 realizations. The loop from line 36 to 39 represents the model realizations. Line 40 is interesting as well because uses the L$_2$ to limit the realizations employed to caluclate the percentile for the weekday nights (last line). The third argument (0.04) in line 40 is the threshold L$_2$ norm. Realizations with a larger L$_2$ will be discarded. The L$_2$ norm is computed between the data and the model, of course. The figure below shows the results.
fig1,ax1 = subplots(1,figsize=(10,8.1))
for i in range(len(acases)):
ax1.plot(sim_hours,acases[i,:],'grey');
ax1.plot(a_x,a_y,'k-',lw=2);
ax1.set_xlim([5,20]);
ax1.set_xlabel("Sleep [hrs]");
ax1.set_ylabel("Percentile of kids");
We see from the figure above that the model realizations envelope around the data. That is what I wanted.
Now repeat this with all the other elements...
Total: Night¶
file='C.csv'
cage=[]
c10=[]
c25=[]
c50=[]
c75=[]
c90=[]
f = open(file)
csv_f = csv.reader(f)
for row in csv_f:
cage.append(float(row[0]))
c10.append(float(row[1]))
c25.append(float(row[2]))
c50.append(float(row[3]))
c75.append(float(row[4]))
c90.append(float(row[5]))
cage=array(cage)
c10=array(c10)
c25=array(c25)
c50=array(c50)
c75=array(c75)
c90=array(c90)
nc10=inter(cage,c10,naage)
nc25=inter(cage,c25,naage)
nc50=inter(cage,c50,naage)
nc75=inter(cage,c75,naage)
nc90=inter(cage,c90,naage)
c_x,c_y = find_sleep_distribution(linus_age,naage,nc10,nc25,nc50,nc75,nc90)
c_x=array(c_x)
c_y=array(c_y)/100.
csigma=random_floats(0.1, 10.5, 10000)
cmu=random_floats(c_x[2]-0.5, c_x[2]+0.5, 10000)
dummycases=zeros([len(csigma),len(sim_hours_n)])
nerror=zeros([len(csigma)])
for i in range(len(csigma)):
dummycases[i,:]=fitFunc(sim_hours_n,cmu[i],csigma[i])
dummy=inter(sim_hours,dummycases[i,:],c_x)
nerror[i] = (dot(c_y-dummy,c_y-dummy))**.5
ccases=error_th(nerror,dummycases,0.05)
totaltime=1./7.*(5.*weekday_night_hours + 2.*weekend_night_hours)
total_night=find_percent(totaltime,sim_hours_n,ccases)
The only difference in this cell compared to cell 6 is the line 41, which computes the average time of sleep computed from weekday and weekend night sleep.
Weekend: Night¶
file='B.csv'
bage=[]
b10=[]
b25=[]
b50=[]
b75=[]
b90=[]
f = open(file)
csv_f = csv.reader(f)
for row in csv_f:
bage.append(float(row[0]))
b10.append(float(row[1]))
b25.append(float(row[2]))
b50.append(float(row[3]))
b75.append(float(row[4]))
b90.append(float(row[5]))
bage=array(bage)
b10=array(b10)
b25=array(b25)
b50=array(b50)
b75=array(b75)
b90=array(b90)
nb10=inter(bage,b10,naage)
nb25=inter(bage,b25,naage)
nb50=inter(bage,b50,naage)
nb75=inter(bage,b75,naage)
nb90=inter(bage,b90,naage)
b_x,b_y = find_sleep_distribution(linus_age,naage,nb10,nb25,nb50,nb75,nb90)
b_x=array(b_x)
b_y=array(b_y)/100.
bsigma=random_floats(1, 3.6, 10000)
bmu=random_floats(b_x[2]-0.5, b_x[2]+0.5, 10000)
#acases=zeros([len(asigma),len(naage)])
dummycases=zeros([len(bsigma),len(sim_hours_n)])
#dummy=zeros([len(asigma),len(a_x)])
nerror=zeros([len(bsigma)])
m=-1
for i in range(len(bsigma)):
dummycases[i,:]=fitFunc(sim_hours_n,bmu[i],bsigma[i])
dummy=inter(sim_hours_n,dummycases[i,:],b_x)
#nerror[i]=norm(a_y - dummy)
nerror[i] = (dot(b_y-dummy,b_y-dummy))**.5
bcases=error_th(nerror,dummycases,0.04)
weekend_night=find_percent(weekend_night_hours,sim_hours_n,bcases)
Weekday: Day¶
file='D.csv'
dage=[]
d10=[]
d25=[]
d50=[]
d75=[]
d90=[]
f = open(file)
csv_f = csv.reader(f)
for row in csv_f:
dage.append(float(row[0]))
d10.append(float(row[1]))
d25.append(float(row[2]))
d50.append(float(row[3]))
d75.append(float(row[4]))
d90.append(float(row[5]))
dage=array(dage)
d10=array(d10)
d25=array(d25)
d50=array(d50)
d75=array(d75)
d90=array(d90)
nd10=inter(dage,d10,nbage)
nd25=inter(dage,d25,nbage)
nd50=inter(dage,d50,nbage)
nd75=inter(dage,d75,nbage)
nd90=inter(dage,d90,nbage)
d_x,d_y = find_sleep_distribution(linus_age,nbage,nd10,nd25,nd50,nd75,nd90)
d_x=array(d_x)
d_y=array(d_y)/100.
dsigma=random_floats(1., 2.9, 10000)
dmu=random_floats(d_x[2]-0.5, d_x[2]+0.5, 10000)
dummycases=zeros([len(dsigma),len(sim_hours_d)])
nerror=zeros([len(dsigma)])
for i in range(len(dsigma)):
dummycases[i,:]=fitFunc(sim_hours_d,dmu[i],dsigma[i])
dummy=inter(sim_hours_d,dummycases[i,:],d_x)
nerror[i] = (dot(d_y-dummy,d_y-dummy))**.5
dcases=error_th(nerror,dummycases,0.15)
weekday_day=find_percent(weekday_day_hours,sim_hours_d,dcases)
file='E.csv'
eage=[]
e10=[]
e25=[]
e50=[]
e75=[]
e90=[]
f = open(file)
csv_f = csv.reader(f)
for row in csv_f:
eage.append(float(row[0]))
e10.append(float(row[1]))
e25.append(float(row[2]))
e50.append(float(row[3]))
e75.append(float(row[4]))
e90.append(float(row[5]))
eage=array(eage)
e10=array(e10)
e25=array(e25)
e50=array(e50)
e75=array(e75)
e90=array(e90)
ne10=inter(eage,e10,nbage)
ne25=inter(eage,e25,nbage)
ne50=inter(eage,e50,nbage)
ne75=inter(eage,e75,nbage)
ne90=inter(eage,e90,nbage)
e_x,e_y = find_sleep_distribution(linus_age,nbage,ne10,ne25,ne50,ne75,ne90)
e_x=array(e_x)
e_y=array(e_y)/100.
esigma=random_floats(1., 2.9 ,10000)
emu=random_floats(e_x[2]-0.5, e_x[2]+0.5, 10000)
dummycases=zeros([len(esigma),len(sim_hours_d)])
nerror=zeros([len(esigma)])
for i in range(len(esigma)):
dummycases[i,:]=fitFunc(sim_hours_d,emu[i],esigma[i])
dummy=inter(sim_hours_d,dummycases[i,:],e_x)
nerror[i] = (dot(e_y-dummy,e_y-dummy))**.5
ecases=error_th(nerror,dummycases,0.15)
weekend_day=find_percent(weekend_day_hours,sim_hours_d,ecases)
file='F.csv'
fage=[]
f10=[]
f25=[]
f50=[]
f75=[]
f90=[]
f = open(file)
csv_f = csv.reader(f)
for row in csv_f:
fage.append(float(row[0]))
f10.append(float(row[1]))
f25.append(float(row[2]))
f50.append(float(row[3]))
f75.append(float(row[4]))
f90.append(float(row[5]))
fage=array(fage)
f10=array(f10)
f25=array(f25)
f50=array(f50)
f75=array(f75)
f90=array(f90)
nf10=inter(fage,f10,nbage)
nf25=inter(fage,f25,nbage)
nf50=inter(fage,f50,nbage)
nf75=inter(fage,f75,nbage)
nf90=inter(fage,f90,nbage)
f_x,f_y = find_sleep_distribution(linus_age,nbage,nf10,nf25,nf50,nf75,nf90)
f_x=array(f_x)
f_y=array(f_y)/100.
fsigma=random_floats(1., 2.9, 10000)
fmu=random_floats(f_x[2]-0.5, f_x[2]+0.5, 10000)
dummycases=zeros([len(fsigma),len(sim_hours_d)])
nerror=zeros([len(fsigma)])
for i in range(len(fsigma)):
dummycases[i,:]=fitFunc(sim_hours_d,fmu[i],fsigma[i])
dummy=inter(sim_hours_d,dummycases[i,:],f_x)
nerror[i] = (dot(f_y-dummy,f_y-dummy))**.5
fcases=error_th(nerror,dummycases,0.15)
totaltime=1./7.*(5.*weekday_day_hours + 2.*weekend_day_hours)
total_day=find_percent(totaltime,sim_hours_d,fcases)
Results¶
print
print " \t\t mean \t\t min \t \t max \t\t std"
print "\t\t\t\t\t Night"
print "Weekday:", '\t{:6.2f}'.format(mean(weekday_night)),'\t\t{:6.2f}'.format(min(weekday_night)),'\t\t{:6.2f}'.format(max(weekday_night)),'\t\t{:6.4f}'.format(std(weekday_night))
print "Weekend:", '\t{:6.2f}'.format(mean(weekend_night)),'\t\t{:6.2f}'.format(min(weekend_night)),'\t\t{:6.2f}'.format(max(weekend_night)),'\t\t{:6.4f}'.format(std(weekend_night))
print "Total: ", '\t{:6.2f}'.format(mean(total_night)), '\t\t{:6.2f}'.format(min(total_night)), '\t\t{:6.2f}'.format(max(total_night)), '\t\t{:6.4f}'.format(std(total_night))
print
print "\t\t\t\t\t Day"
print "Weekday:", '\t{:6.2f}'.format(mean(weekday_day)), '\t\t{:6.2f}'.format(min(weekday_day)), '\t\t{:6.2f}'.format(max(weekday_day)), '\t\t{:6.4f}'.format(std(weekday_day))
print "Weekend:", '\t{:6.2f}'.format(mean(weekend_day)), '\t\t{:6.2f}'.format(min(weekend_day)),'\t\t{:6.2f}'.format(max(weekend_day)),'\t\t{:6.4f}'.format(std(weekend_day))
print "Total: ", '\t{:6.2f}'.format(mean(total_day)), '\t\t{:6.2f}'.format(min(total_day)), '\t\t{:6.2f}'.format(max(total_day)), '\t\t{:6.4f}'.format(std(total_day))
After seeing the results, I calmed down, ate some cookies and drank some wine....after all. What I learned from this is simple: We have all have skills. Why not use them? To all our the nerd friends who have, will soon or want to have kids: Use your quantitative skills, and don't be afraid to show you model and the results to your doctor...at least I am not and he or she might get to laugh. Ending on a more serious note, maybe I am kidding myself. Crunching numbers and running models might be actions that are nothing else than feeding my addiction or distracting, but I argue that it is a lot better than freaking out based on a bunch of chat entries.