Tutorials¶

Short Tutorial in the Retention Class.¶

Let’s import the class and see insert a some retention numbers.

Lets see what what we have in here:

                 Retention
DaysSinceInstall
1                    50.0%
7                    15.0%
30                    5.0%

Lets fit an equation to this data, default is a power function and as of right now the only available functions build in are

Power function: \(f(x) = k_1x^{k_2}\) identifier: power

Lets see how well it was fitted:

                 Retention RetentionFit
DaysSinceInstall
                  50.0%        50.1%
                   nan%        32.2%
                   nan%        24.9%
                   nan%        20.7%
                   nan%        18.0%
                   nan%        16.0%
                  15.0%        14.5%
                   nan%        13.3%
                   nan%        12.3%
                  nan%        11.5%
                  nan%        10.9%
                  nan%        10.3%
                  nan%         9.8%
                  nan%         9.3%
                  nan%         8.9%
                  nan%         8.6%
                  nan%         8.2%
                  nan%         7.9%
                  nan%         7.7%
                  nan%         7.4%
                  nan%         7.2%
                  nan%         7.0%
                  nan%         6.8%
                  nan%         6.6%
                  nan%         6.4%
                  nan%         6.3%
                  nan%         6.1%
                  nan%         6.0%
                  nan%         5.9%
                  5.0%         5.7%

Let’s try something interesting - what is the sum of the retention values from the retention array?

7.987575025935071

Hmm, that is not the same as the 8.7 calculated using the integrated function of the power equation. At its core the reason for this is that when performing Riemann integration we look at a smooth curve, whereas simply summing retention values from an array we assume a certain linear relationship between point \(R(i)\) and \(R(i+1)\).

Custom Functions¶

In this example, a power function was used, but any callable function can be used.

NOTE: It is recommended to use the Fitfunction class to create new functions, this class simply needs two functions, the equation used to fit and the integrated equation from 1 to b. The integrated function is only used to calculate the summed retention and can be left out. In that case, just pass the callable function directly. Anyway, example with power function and its integrated companion:

\[R(t) = k_1t^{k_2}.\]

The integral from a to b is simple

\[\int_a^b R(t) dt = \frac{k_1}{k_2+1}\left( b^{ k_2 + 1} - a^{k_2 + 1}\right)\]

Well… Turns out that fitting is not always 100 % similar to real world data, to not get into some bad situations lets set \(a=1\), we can do this, since we know that retention for day 0 is always 1!

\[\int_1^b R(t) dt + 1= \frac{k_1}{k_2+1}\left( b^{ k_2 + 1} - 1^{k_2 + 1}\right) + 1.\]

Lets create a fitter class for our power function:

Lets fit with this function:

If we are interested in uncertainties the Uncertainties package have been implemented. This can be used the following way:

When working with uncertainties, the nominal values and the uncertainty values can be obtained with functions nominal_values and std_devs, respectively:

DaysSinceInstall
     0.51+/-0.04
   0.322+/-0.018
   0.246+/-0.013
   0.204+/-0.010
   0.176+/-0.009
   0.156+/-0.008
   0.140+/-0.008
   0.129+/-0.008
   0.119+/-0.007
  0.111+/-0.007
  0.104+/-0.007
  0.098+/-0.007
  0.093+/-0.006
  0.089+/-0.006
  0.085+/-0.006
  0.081+/-0.006
  0.078+/-0.006
  0.075+/-0.006
  0.072+/-0.006
  0.070+/-0.006
  0.068+/-0.006
  0.066+/-0.006
  0.064+/-0.005
  0.062+/-0.005
  0.060+/-0.005
  0.059+/-0.005
  0.057+/-0.005
  0.056+/-0.005
  0.055+/-0.005
  0.054+/-0.005
Name: RetentionFit, dtype: object

array([0.51007967, 0.32223477, 0.24631593, 0.20356673, 0.17558734,
       0.1556062 , 0.14049706, 0.12860006, 0.11894522, 0.11092453,
       0.10413587, 0.09830176, 0.09322396, 0.0887568 , 0.0847906 ,
       0.08124106, 0.07804224, 0.07514176, 0.07249742, 0.07007482,
       0.06784561, 0.06578619, 0.06387677, 0.06210058, 0.06044333,
       0.05889276, 0.05743829, 0.05607071, 0.054782  , 0.05356512])

array([0.03564148, 0.01775511, 0.01252601, 0.01027747, 0.00909227,
       0.00837123, 0.00788221, 0.00752197, 0.00723962, 0.00700784,
       0.00681092, 0.00663929, 0.00648677, 0.00634921, 0.0062237 ,
       0.00610815, 0.00600099, 0.005901  , 0.00580724, 0.00571896,
       0.00563553, 0.00555645, 0.00548129, 0.00540969, 0.00534132,
       0.00527593, 0.00521327, 0.00515313, 0.00509533, 0.00503971])

So why bother with this integral function anyway? Well, let’s see what happens when we create a data set consisting of retention values from days since install 1 to 180 and sum that using the same power function with the same parameters:

It is also possible to save and load instances of classes (using pickle):

my_retention.save('myretention.pkl')

Loading the data is simply done with

import pyfreya
LoadedRetentionClass = pyfreya.load('myretention.pkl')

Short Tutorial in the Cohort Class.¶

Retention¶

Let’s import the class and see insert a some retention numbers along with the amount of new users in the cohort.

To get more info on retenion see retention tutorial.

                        1
DaysSinceInstall
                   100
               50.0629
               32.1914
               24.8632
               20.6996
               17.9566
               15.9875
               14.4921
               13.3102
                12.348
              11.5464
              10.8662
              10.2802
              9.76917
              9.31867
              8.91795
              8.55872
              8.23447
              7.94001
              7.67118
              7.42456
              7.19733
              6.98716
              6.79206
              6.61038
              6.44069
              6.28175
              6.13252
              5.99207
               5.8596
               5.7344
              5.61586

Note: That the cohort class can also take a retention profile instead of actual retention data points. The name given is not of any particular importance now, but when plotting various aggregates from multiple cohorts easily identifiable names are nice to have - is no name given a random one will be applied.

Daily Active Users¶

Maybe similar cohorts comes in multiple days in a row. It is moddeled like this:

                       1        2        3        4        5        6        7        8        9        10
DaysSinceInstall
                   100      100      100      100      100      100      100      100      100      100
               50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629
                   NaN  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914
                   NaN      NaN  24.8632  24.8632  24.8632  24.8632  24.8632  24.8632  24.8632  24.8632
                   NaN      NaN      NaN  20.6996  20.6996  20.6996  20.6996  20.6996  20.6996  20.6996
                   NaN      NaN      NaN      NaN  17.9566  17.9566  17.9566  17.9566  17.9566  17.9566
                   NaN      NaN      NaN      NaN      NaN  15.9875  15.9875  15.9875  15.9875  15.9875
                   NaN      NaN      NaN      NaN      NaN      NaN  14.4921  14.4921  14.4921  14.4921
                   NaN      NaN      NaN      NaN      NaN      NaN      NaN  13.3102  13.3102  13.3102
                   NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN   12.348   12.348
                  NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN  11.5464

Well - its nice to see this user distribution, but how many daily active users do we have ? (also note the type is a pandas DataFrame)

<class 'pandas.core.frame.DataFrame'>

	dau
Date
1	100
2	150.063
3	182.254
4	207.117
5	227.817
6	245.774
7	261.761
8	276.253
9	289.563
10	301.912

Since users are still active after the influx of 10 days lets see what it looks like after 30 days (10 days of user influx and 20 days of waiting):

	dau
Date
1	100
2	150.063
3	182.254
4	207.117
5	227.817
6	245.774
7	261.761
8	276.253
9	289.563
10	301.912
11	213.458
12	174.261
13	152.35
14	137.256
15	125.875
16	116.836
17	109.408
18	103.15
19	97.7799
20	93.103
21	88.9812
22	85.3123
23	82.0192
24	79.0421
25	76.3338
26	73.8566
27	71.5796
28	69.4776
29	67.5297
30	65.7181

Enough numbers, lets plot some of this. First, lets plot the retention - maybe it fitted the data incorrectly:

How about dau?

If you wonder how long time it takes to reach a certain amount of dau it can be calculated. This does assume a steady influx of users given in new_users and with the retention profile calculated earlier.

Datetime¶

What kind of date is this anyway? Lets use proper human dates from the Gregorian calendar:

Revenue¶

Well how much money did we earn? A premade revenue profile class called ARPDAU is imported and is initialized by setting the ARPDAU to a value.

	dau	revenue
Date
2019-03-14	100	210
2019-03-15	150.063	315.132
2019-03-16	182.254	382.734
2019-03-17	207.117	434.947
2019-03-18	227.817	478.416
2019-03-19	245.774	516.125
2019-03-20	261.761	549.698
2019-03-21	276.253	580.132
2019-03-22	289.563	608.083
2019-03-23	301.912	634.014
2019-03-24	313.458	658.262
2019-03-25	324.324	681.081
2019-03-26	334.604	702.669
2019-03-27	344.374	723.184
2019-03-28	353.692	742.754
2019-03-29	362.61	761.481
2019-03-30	371.169	779.455
2019-03-31	379.403	796.747
2019-04-01	387.343	813.421
2019-04-02	395.015	829.531
2019-04-03	402.439	845.122
2019-04-04	409.636	860.237
2019-04-05	416.624	874.91
2019-04-06	323.416	679.173
2019-04-07	279.963	587.923
2019-04-08	254.212	533.846
2019-04-09	235.631	494.825
2019-04-10	221.064	464.234
2019-04-11	209.099	439.109
2019-04-12	198.971	417.84
2019-04-13	190.214	399.449
2019-04-14	182.519	383.291
2019-04-15	175.675	368.917

This can be plotted too!

If we are interested in uncertainties the Uncertainties package have been implemented. This can be used the following way:

When working with uncertainties, the nominal values and the uncertainty values can be obtained with functions nominal_values and std_devs, respectively:

Date
    100+/-5
    151+/-8
   183+/-11
   208+/-12
   228+/-14
   246+/-15
   261+/-16
   275+/-17
   288+/-17
  300+/-18
  311+/-19
  322+/-20
  331+/-20
  341+/-21
  350+/-22
  358+/-22
  366+/-23
  374+/-23
  382+/-24
  389+/-24
  396+/-25
  403+/-26
  409+/-26
  316+/-23
  271+/-21
  245+/-19
  226+/-19
  211+/-18
  199+/-17
  189+/-17
  181+/-16
  173+/-16
  166+/-16
Name: dau, dtype: object

array([100.        , 151.00796688, 183.23144399, 207.86303692,
       228.21970943, 245.77844367, 261.3390633 , 275.38876963,
       288.24877584, 300.14329757, 311.23575048, 321.64933796,
       331.47951354, 340.80190942, 349.6775894 , 358.15664906,
       366.28075478, 374.08497899, 381.59915514, 388.84889718,
       395.85637904, 402.64093973, 409.21955908, 315.60723627,
       270.8093275 , 244.6301836 , 225.88786696, 211.27502302,
       199.32335936, 189.24093946, 180.54774509, 172.92912575,
       166.16687945])

array([ 5.        ,  8.34935123, 10.56825389, 12.20066331, 13.51786087,
       14.6418741 , 15.63646161, 16.53872917, 17.37200759, 18.15183883,
       18.88905055, 19.59146105, 20.26488384, 20.91374916, 21.54150443,
       22.1508812 , 22.74407817, 23.32289008, 23.88880016, 24.44304807,
       24.98668034, 25.52058868, 26.04553928, 22.86365271, 20.63415691,
       19.40634919, 18.5255104 , 17.82742722, 17.24464372, 16.74232945,
       16.29990806, 15.90409195, 15.54574024])

It is possible to save a cohort class instance (using pickle) and loading it.

facebook.save('facebook_revenue.pkl')
import pyfreya
facebook_loaded = pyfreya.load('facebook_revenue.pkl')

Short Tutorial in the Revenue Classes.¶

Let’s import the class and insert a some retention numbers along with the amount of new users in a cohort. Additional information about Retention and Cohort classes can be found at retention tutorial and cohort tutorial. Let’s also import a predefined revenue spending class (profile) ARPDAU.

                        1
DaysSinceInstall
                   100
               50.0629
               32.1914
               24.8632
               20.6996
               17.9566
               15.9875
               14.4921
               13.3102
                12.348
              11.5464
              10.8662
              10.2802
              9.76917
              9.31867
              8.91795
              8.55872
              8.23447
              7.94001
              7.67118
              7.42456
              7.19733
              6.98716
              6.79206
              6.61038
              6.44069
              6.28175
              6.13252
              5.99207
               5.8596
               5.7344
              5.61586

	dau	revenue
Date
1	100	210
2	50.0629	105.132
3	32.1914	67.6019
4	24.8632	52.2127
5	20.6996	43.4692
6	17.9566	37.7089
7	15.9875	33.5737
8	14.4921	30.4334
9	13.3102	27.9515
10	12.348	25.9309
11	11.5464	24.2475
12	10.8662	22.819
13	10.2802	21.5885
14	9.76917	20.5153
15	9.31867	19.5692
16	8.91795	18.7277
17	8.55872	17.9733
18	8.23447	17.2924
19	7.94001	16.674
20	7.67118	16.1095
21	7.42456	15.5916
22	7.19733	15.1144
23	6.98716	14.673
24	6.79206	14.2633
25	6.61038	13.8818
26	6.44069	13.5254
27	6.28175	13.1917
28	6.13252	12.8783
29	5.99207	12.5833
30	5.8596	12.3052
31	5.7344	12.0422

As can be seen, there is also room for adding uncertainty. Let’s import the base revenue class and define ARPDAU but with uncertainty. It does make use of some properties of the Cohort class, it is recommended to be a bit familiar with that.

	dau	revenue
Date
1	100	210
2	50.0629	105.132
3	32.1914	67.6019
4	24.8632	52.2127
5	20.6996	43.4692
6	17.9566	37.7089
7	15.9875	33.5737
8	14.4921	30.4334
9	13.3102	27.9515
10	12.348	25.9309
11	11.5464	24.2475
12	10.8662	22.819
13	10.2802	21.5885
14	9.76917	20.5153
15	9.31867	19.5692
16	8.91795	18.7277
17	8.55872	17.9733
18	8.23447	17.2924
19	7.94001	16.674
20	7.67118	16.1095
21	7.42456	15.5916
22	7.19733	15.1144
23	6.98716	14.673
24	6.79206	14.2633
25	6.61038	13.8818
26	6.44069	13.5254
27	6.28175	13.1917
28	6.13252	12.8783
29	5.99207	12.5833
30	5.8596	12.3052
31	5.7344	12.0422

If we are interested in uncertainties the Uncertainties package have been implemented. This can be used the following way:

When working with uncertainties, the nominal values and the uncertainty values can be obtained with functions nominal_values and std_devs, respectively:

Date
     210+/-23
     107+/-14
       68+/-8
       52+/-6
       43+/-5
       37+/-4
       33+/-4
       30+/-4
   27.0+/-3.3
  25.0+/-3.1
  23.3+/-2.9
  21.9+/-2.8
  20.6+/-2.6
  19.6+/-2.5
  18.6+/-2.4
  17.8+/-2.3
  17.1+/-2.2
  16.4+/-2.2
  15.8+/-2.1
  15.2+/-2.0
  14.7+/-2.0
  14.2+/-1.9
  13.8+/-1.9
  13.4+/-1.8
  13.0+/-1.8
  12.7+/-1.8
  12.4+/-1.7
  12.1+/-1.7
  11.8+/-1.7
  11.5+/-1.6
  11.2+/-1.6
Name: revenue, dtype: object

array([210.        , 107.11673044,  67.66930193,  51.72634515,
        42.74901227,  36.8733419 ,  32.67730124,  29.50438329,
        27.00601304,  24.97849563,  23.29415112,  21.86853369,
        20.64336873,  19.57703135,  18.63892795,  17.80602528,
        17.06062202,  16.38887082,  15.77976993,  15.22445827,
        14.71571191,  14.24757745,  13.81510062,  13.41412211,
        13.04112202,  12.69309974,  12.36748022,  12.06203999,
        11.77484823,  11.50421944,  11.24867512])

array([22.588714  , 13.73966474,  8.17827559,  6.15442801,  5.07962406,
        4.40195366,  3.9300445 ,  3.57937697,  3.3066122 ,  3.08712987,
        2.90585349,  2.75300466,  2.62194609,  2.5080031 ,  2.40778109,
        2.31875073,  2.23898552,  2.16699004,  2.1015841 ,  2.04182268,
        1.9869392 ,  1.93630468,  1.88939763,  1.84578168,  1.80508855,
        1.76700502,  1.73126286,  1.69763082,  1.66590837,  1.63592067,
        1.60751445])

It is possible to save a revenue class instance (using pickle) and loading it.

facebook.save('facebook_revenue.pkl')
facebook_loaded = pyfreya.load('facebook_revenue.pkl')

Short Tutorial Working With Multiple Cohorts¶

When working with multiple cohorts plotting can’t be performed with the Cohort class methods, instead, import multi plot functions. Let’s create some cohorts first.

Lets plot their retention

tutorials/multicohort/Multi_Cohort_Example_3_0.png

If copies of the Facebook cohort comes in each day in 10 days and copies of the Google cohort comes in 6 days in a row - what does DAU look like?

tutorials/multicohort/Multi_Cohort_Example_5_0.png

tutorials/multicohort/Multi_Cohort_Example_6_0.png

How does revenue look across this? Lets apply a revenue profile and take a look.

tutorials/multicohort/Multi_Cohort_Example_8_0.png

tutorials/multicohort/Multi_Cohort_Example_9_0.png

If we are interested in uncertainties the Uncertainties package have been implemented. This can be used the following way:

tutorials/multicohort/Multi_Cohort_Example_11_0.png

tutorials/multicohort/Multi_Cohort_Example_12_0.png

When working with uncertainties, the nominal values and the uncertainty values can be obtained with functions nominal_values and std_devs, respectively:

Date
    100+/-6
   152+/-10
   185+/-13
   209+/-16
   230+/-17
   247+/-19
   262+/-20
   276+/-21
   288+/-22
  300+/-23
Name: dau, dtype: object

array([100.        , 152.2302569 , 184.70997132, 209.30960342,
       229.50731853, 246.84077882, 262.13821507, 275.9019684 ,
       288.46204483, 300.04808586])

array([ 6.        , 10.49918186, 13.4983251 , 15.64446572, 17.31653098,
       18.69011938, 19.85918453, 20.87961789, 21.78730166, 22.60661843])

Tutorials¶

Short Tutorial in the Retention Class.¶

Custom Functions¶

Short Tutorial in the Cohort Class.¶

Retention¶

Daily Active Users¶

Datetime¶

Revenue¶

Short Tutorial in the Revenue Classes.¶

Short Tutorial Working With Multiple Cohorts¶

PyFreya

Navigation

Related Topics