pyfreya.cohort package¶

Submodules¶

pyfreya.cohort.cohort module¶

Short Tutorial on the Cohort Class¶

Retention¶

Let’s import the class and insert some retention numbers along with the number of new users in the cohort.

For more information on retention, see the retention tutorial.

                        1
DaysSinceInstall
                   100
               50.0629
               32.1914
               24.8632
               20.6996
               17.9566
               15.9875
               14.4921
               13.3102
                12.348
              11.5464
              10.8662
              10.2802
              9.76917
              9.31867
              8.91795
              8.55872
              8.23447
              7.94001
              7.67118
              7.42456
              7.19733
              6.98716
              6.79206
              6.61038
              6.44069
              6.28175
              6.13252
              5.99207
               5.8596
               5.7344
              5.61586

Note: The cohort class can also take a retention profile instead of actual retention data points. The name given is not of any particular importance now, but when plotting aggregates from multiple cohorts, easily identifiable names are nice to have. If no name is given, a random one is assigned.
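The constructor's default retention_function is 'power', and the values in the table above (after the initial 100) are consistent with a power law of the form retention = a * day**b. As a minimal sketch of what such a fit could look like, the snippet below performs a least-squares fit in log-log space using only the standard library. The helper name fit_power_law and the day numbers 1 to 4 are assumptions made for illustration; this is not necessarily how pyfreya fits its profiles.

```python
import math


def fit_power_law(days, retention):
    """Least-squares fit of retention = a * day**b, done in log-log space."""
    xs = [math.log(d) for d in days]
    ys = [math.log(r) for r in retention]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                          # slope in log-log space
    a = math.exp(mean_y - b * mean_x)      # intercept, transformed back
    return a, b


# Retention percentages copied from the table above.
# Assumption: these rows correspond to days 1, 2, 3 and 4 since install.
days = [1, 2, 3, 4]
retention = [50.0629, 32.1914, 24.8632, 20.6996]

a, b = fit_power_law(days, retention)
predicted_day_5 = a * 5 ** b               # extrapolate one day ahead
print(a, b, predicted_day_5)
```

Under these assumptions the extrapolated value for day 5 lands close to the next row of the table (17.9566).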

Daily Active Users¶

Maybe similar cohorts arrive several days in a row. This is modeled like this:

                       1        2        3        4        5        6        7        8        9        10
DaysSinceInstall
                   100      100      100      100      100      100      100      100      100      100
               50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629
                   NaN  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914
                   NaN      NaN  24.8632  24.8632  24.8632  24.8632  24.8632  24.8632  24.8632  24.8632
                   NaN      NaN      NaN  20.6996  20.6996  20.6996  20.6996  20.6996  20.6996  20.6996
                   NaN      NaN      NaN      NaN  17.9566  17.9566  17.9566  17.9566  17.9566  17.9566
                   NaN      NaN      NaN      NaN      NaN  15.9875  15.9875  15.9875  15.9875  15.9875
                   NaN      NaN      NaN      NaN      NaN      NaN  14.4921  14.4921  14.4921  14.4921
                   NaN      NaN      NaN      NaN      NaN      NaN      NaN  13.3102  13.3102  13.3102
                   NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN   12.348   12.348
                  NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN  11.5464

Well, it's nice to see this user distribution, but how many daily active users do we have? (Also note that the result is a pandas DataFrame.)

<class 'pandas.core.frame.DataFrame'>

	dau
Date
1	100
2	150.063
3	182.254
4	207.117
5	227.817
6	245.774
7	261.761
8	276.253
9	289.563
10	301.912

Since users are still active after the 10 days of influx, let's see what it looks like after 30 days (10 days of user influx and 20 days of waiting):

	dau
Date
1	100
2	150.063
3	182.254
4	207.117
5	227.817
6	245.774
7	261.761
8	276.253
9	289.563
10	301.912
11	213.458
12	174.261
13	152.35
14	137.256
15	125.875
16	116.836
17	109.408
18	103.15
19	97.7799
20	93.103
21	88.9812
22	85.3123
23	82.0192
24	79.0421
25	76.3338
26	73.8566
27	71.5796
28	69.4776
29	67.5297
30	65.7181
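The DAU numbers above can be reproduced by hand: on each date, sum the users still active in every cohort that has started so far. The sketch below does this in plain Python for a shortened example. The function name dau_curve and its parameters are hypothetical, and cohorts older than the supplied retention list are assumed to contribute nothing.

```python
def dau_curve(new_users, retention_pct, n_influx_days, post_influx_days=0):
    """Sum active users of identical cohorts that start on consecutive days."""
    total_days = n_influx_days + post_influx_days
    dau = []
    for day in range(total_days):
        active = 0.0
        for start in range(n_influx_days):
            age = day - start              # days since this cohort installed
            # A cohort counts only once it has started; ages beyond the
            # retention list are treated as zero (an assumption of this sketch).
            if 0 <= age < len(retention_pct):
                active += new_users * retention_pct[age] / 100.0
        dau.append(active)
    return dau


# First four retention percentages from the tutorial, 100 new users per day.
retention_pct = [100, 50.0629, 32.1914, 24.8632]
dau = dau_curve(100, retention_pct, n_influx_days=3, post_influx_days=1)
print(dau)
```

The first three values match the first three rows of the DAU table (100, 150.063, 182.254). The fourth shows the drop that occurs as soon as no new cohort arrives.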

Enough numbers, let's plot some of this. First, let's plot the retention, in case the profile fitted the data incorrectly:

How about dau?

If you wonder how long it takes to reach a certain DAU, this can be calculated. The calculation assumes a steady influx of users, given by new_users, and the retention profile calculated earlier.
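Conceptually, this is a search for the first day on which the daily value reaches the goal, capped at a maximum number of days. A minimal sketch, assuming the DAU series has already been computed, is shown below. The helper days_to_goal is hypothetical, and returning None when the goal is not reached is a choice made for this sketch, not documented library behavior.

```python
def days_to_goal(daily_values, goal, max_days=360):
    """Return the 1-based day on which daily_values first reaches goal.

    Returns None if the goal is not reached within max_days.
    """
    for day, value in enumerate(daily_values[:max_days], start=1):
        if value >= goal:
            return day
    return None


# First five DAU values from the tutorial table.
dau = [100, 150.063, 182.254, 207.117, 227.817]

first_day_at_200 = days_to_goal(dau, goal=200)
never_reached = days_to_goal(dau, goal=500)
print(first_day_at_200, never_reached)
```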

Datetime¶

What kind of date is this anyway? Let's use proper human dates from the Gregorian calendar:
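Switching from integer day numbers to calendar dates amounts to adding a day offset to a start date. The sketch below illustrates the idea with the standard library. The helper to_dates is hypothetical, and the start date is taken from the first row of the revenue table further down; how the library handles dates internally is not shown here.

```python
from datetime import date, timedelta


def to_dates(start, n_days):
    """Map day offsets 0..n_days-1 to consecutive calendar dates."""
    return [start + timedelta(days=offset) for offset in range(n_days)]


# 33 consecutive days starting on 2019-03-14.
dates = to_dates(date(2019, 3, 14), 33)
print(dates[0].isoformat(), dates[-1].isoformat())
```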

Revenue¶

Well, how much money did we earn? A premade revenue profile class called ARPDAU is imported and initialized by setting the ARPDAU (average revenue per daily active user) to a value.

	dau	revenue
Date
2019-03-14	100	210
2019-03-15	150.063	315.132
2019-03-16	182.254	382.734
2019-03-17	207.117	434.947
2019-03-18	227.817	478.416
2019-03-19	245.774	516.125
2019-03-20	261.761	549.698
2019-03-21	276.253	580.132
2019-03-22	289.563	608.083
2019-03-23	301.912	634.014
2019-03-24	313.458	658.262
2019-03-25	324.324	681.081
2019-03-26	334.604	702.669
2019-03-27	344.374	723.184
2019-03-28	353.692	742.754
2019-03-29	362.61	761.481
2019-03-30	371.169	779.455
2019-03-31	379.403	796.747
2019-04-01	387.343	813.421
2019-04-02	395.015	829.531
2019-04-03	402.439	845.122
2019-04-04	409.636	860.237
2019-04-05	416.624	874.91
2019-04-06	323.416	679.173
2019-04-07	279.963	587.923
2019-04-08	254.212	533.846
2019-04-09	235.631	494.825
2019-04-10	221.064	464.234
2019-04-11	209.099	439.109
2019-04-12	198.971	417.84
2019-04-13	190.214	399.449
2019-04-14	182.519	383.291
2019-04-15	175.675	368.917
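The revenue column is simply the DAU multiplied by a constant ARPDAU. The sketch below reproduces the first rows in plain Python. The value 2.1 is inferred from the table (210 revenue for 100 DAU) rather than stated in the text, so treat it as an assumption.

```python
from itertools import accumulate

# Assumption: ARPDAU inferred from the first table row (210 / 100).
ARPDAU = 2.1

# First three DAU values from the table above.
dau = [100, 150.063, 182.254]

revenue = [ARPDAU * users for users in dau]   # revenue per day
cumulative = list(accumulate(revenue))        # running total of revenue
print(revenue, cumulative)
```

The running total is the quantity that a cumulative revenue goal, as used by days_to_total_rev, would be compared against.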

This can be plotted too!

If we are interested in uncertainties, support for the uncertainties package has been implemented. It can be used in the following way:

When working with uncertainties, the nominal values and the uncertainty values can be obtained with functions nominal_values and std_devs, respectively:

Date
    100+/-5
    151+/-8
   183+/-11
   208+/-12
   228+/-14
   246+/-15
   261+/-16
   275+/-17
   288+/-17
  300+/-18
  311+/-19
  322+/-20
  331+/-20
  341+/-21
  350+/-22
  358+/-22
  366+/-23
  374+/-23
  382+/-24
  389+/-24
  396+/-25
  403+/-26
  409+/-26
  316+/-23
  271+/-21
  245+/-19
  226+/-19
  211+/-18
  199+/-17
  189+/-17
  181+/-16
  173+/-16
  166+/-16
Name: dau, dtype: object

array([100.        , 151.00796688, 183.23144399, 207.86303692,
       228.21970943, 245.77844367, 261.3390633 , 275.38876963,
       288.24877584, 300.14329757, 311.23575048, 321.64933796,
       331.47951354, 340.80190942, 349.6775894 , 358.15664906,
       366.28075478, 374.08497899, 381.59915514, 388.84889718,
       395.85637904, 402.64093973, 409.21955908, 315.60723627,
       270.8093275 , 244.6301836 , 225.88786696, 211.27502302,
       199.32335936, 189.24093946, 180.54774509, 172.92912575,
       166.16687945])

array([ 5.        ,  8.34935123, 10.56825389, 12.20066331, 13.51786087,
       14.6418741 , 15.63646161, 16.53872917, 17.37200759, 18.15183883,
       18.88905055, 19.59146105, 20.26488384, 20.91374916, 21.54150443,
       22.1508812 , 22.74407817, 23.32289008, 23.88880016, 24.44304807,
       24.98668034, 25.52058868, 26.04553928, 22.86365271, 20.63415691,
       19.40634919, 18.5255104 , 17.82742722, 17.24464372, 16.74232945,
       16.29990806, 15.90409195, 15.54574024])

It is possible to save a cohort class instance (using pickle) and load it again.

facebook.save('facebook_revenue.pkl')
import pyfreya
facebook_loaded = pyfreya.load('facebook_revenue.pkl')

Cohort Class¶

class pyfreya.cohort.cohort.Cohort(new_users, days_since_install=None, retention_values=None, retention_function='power', retention_profile=None, start_date=1, revenue_profile=None, name='')[source]¶

Bases: object

The new_users parameter must be provided. To add retention, either supply days-since-install and retention values or supply a pre-made retention profile - see Retention.

apply_revenue(revenue_profile=None)[source]¶

Given a revenue profile and a cohort, apply the revenue profile to get revenue and revenue uncertainty.

Parameters:	revenue_profile (`Optional`[`BaseRevenue`]) – The revenue profile to use. If none is provided, it is assumed that one was provided earlier.
Returns:

days_to_dau(goal, max_days=360)[source]¶

Calculates the number of days until a given DAU count has been reached. To not continue into infinity (and beyond), the search stops after at most max_days days.

Parameters:	goal (`int`) – The DAU count that is the goal.
	max_days – The maximum number of days to look through.
Returns:

days_to_rev(goal, max_days=360)[source]¶

Calculates the number of days until the revenue of a single day has reached goal or above. To not continue into infinity (and beyond), the search stops after at most max_days days.

Parameters:	goal (`float`) – Daily revenue goal.
	max_days – The maximum number of days to look through.
Returns:

days_to_total_rev(goal, max_days=360)[source]¶

Calculates the number of days until the cumulative revenue has reached goal. To not continue into infinity (and beyond), the search stops after at most max_days days.

Parameters:	goal (`float`) – Cumulative revenue goal.
	max_days – The maximum number of days to look through.
Returns:

plot_dau()[source]¶

Plot daily active users.

Returns:

plot_retention()[source]¶

Plots the retention.

Returns:

plot_revenue()[source]¶

Plot the revenue with uncertainty (left y-axis) and cumulative revenue (right y-axis). The cumulative revenue could also have uncertainty, though it is not obvious how to calculate this. The best bet is probably error propagation (https://en.wikipedia.org/wiki/Propagation_of_uncertainty).

Returns:
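As an illustration of the error-propagation idea mentioned for plot_revenue, the sketch below accumulates daily revenue and combines the daily standard deviations in quadrature. This is valid only under the assumption that the daily errors are independent, which is unlikely to hold exactly here because consecutive days share the same cohorts and retention profile. The function name and the input numbers are hypothetical, and the library itself does not necessarily compute this.

```python
import math


def cumulative_with_uncertainty(values, std_devs):
    """Running sum with uncertainty, assuming independent daily errors.

    For independent terms the variances add, so the standard deviation of
    the running sum is the square root of the summed variances.
    """
    total = 0.0
    variance = 0.0
    result = []
    for value, sigma in zip(values, std_devs):
        total += value
        variance += sigma ** 2
        result.append((total, math.sqrt(variance)))
    return result


# Hypothetical daily revenue and its standard deviation.
revenue = [100.0, 200.0]
sigma = [3.0, 4.0]

running = cumulative_with_uncertainty(revenue, sigma)
print(running)
```

If the daily errors are positively correlated, the true uncertainty of the cumulative revenue would be larger than this estimate.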

replicate_cohort(n_days_since_install, post_influx_duration=0)[source]¶

Replicate the cohort over multiple days. The cohorts start on consecutive dates, and the number of dates is given by the first parameter. To see how the cohorts develop after the influx has stopped, post_influx_duration can be set to some number of days.

Parameters:	n_days_since_install (`int`) – Number of days on which a new (equivalent) cohort starts.
	post_influx_duration – The number of days to wait after the last cohort has been added.
Returns:

save(filename)[source]¶

Saves the cohort as a pickle file.

Parameters:	filename (`str`) – Filename for the cohort.
Returns:

Module contents¶

inits cohort