The Lancet



Appendix

Prediction delay assessment

Prediction delay of the models was evaluated by adapting a time-shift function [1]. Specifically, we defined a time-shift parameter $\hat{\tau}_m$ that minimizes the RMSE between the true values $y$ and $y_m$, the estimates of model $m$:

$$\hat{\tau}_m = \arg\min_{\tau} \sqrt{\frac{1}{n-\tau}\sum_{i=1+\tau}^{n}\left(y_{m,i}-f(i-\tau)\right)^{2}}, \quad \tau\in[0,4]$$

Here $f(x)$ is the linear interpolation of the discrete vector $y$. Interpolation is needed because, in the experiments, the $\hat{\tau}_m$ of the models could be less than 1 week, the minimum resolution of $i$. We examined 1000 evenly spaced discrete $\tau$ values representing time shifts of less than 4 weeks. Only weeks with influenza activity higher than the average plus 1 standard deviation of overall influenza activity during the corresponding years were considered [2]. $\hat{\tau}_m$ describes a relative delay among the models. To quantify the prediction delay and compare models, we defined the prediction delay score of a model as a combination of $\hat{\tau}_m$ and $\mathrm{RMSE}_{\hat{\tau}_m}$, the RMSE at that shift:

$$\mathrm{score}_{\mathrm{delay}} = (1+\hat{\tau}_m)\times\mathrm{RMSE}_{\hat{\tau}_m}$$

A smaller score indicates less delay.

Model parameters

In our study, the seasonal period of SARIMA was set to 52 (1 year = 52 weeks) because of the potential yearly periodicity of the ILI% sequence. After checking the ILI% time series plots of the training data, together with the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots for all combinations of 0 or 1 non-seasonal differences and 0 or 1 seasonal differences, we used both a seasonal and a non-seasonal difference for most forecasts. The other parameters were searched in the range from 0 to 2 and determined with an adapted Akaike information criterion (AIC) principle [3]. To prevent over-fitting, we selected, among the parameter combinations with an AIC value under the first quartile, the one with the minimum number of parameters rather than the one with the minimum AIC.
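As a rough illustration of the time-shift search described under "Prediction delay assessment", the following Python sketch scans a grid of shifts, compares the model estimates with the linearly interpolated true series moved by each shift, and reports the delay score. The function name `delay_score`, the 0-based indexing, the treatment of the summation bounds for non-integer shifts, and the optional `mask` argument for restricting the comparison to high-activity weeks are assumptions made for this sketch, not details taken from the study.

```python
import numpy as np

def delay_score(y_true, y_pred, max_shift=4.0, n_grid=1000, mask=None):
    """Grid-search the shift tau minimizing RMSE between y_pred[i] and f(i - tau).

    f is the linear interpolation of y_true. Returns (tau_hat, rmse_hat, score)
    with score = (1 + tau_hat) * rmse_hat.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    idx = np.arange(len(y_true), dtype=float)
    if mask is None:
        mask = np.ones(len(y_true), dtype=bool)
    mask = np.asarray(mask, dtype=bool)

    best_tau, best_rmse = 0.0, np.inf
    # Assumption: evenly spaced shifts strictly below max_shift.
    for tau in np.linspace(0.0, max_shift, n_grid, endpoint=False):
        # Keep only weeks whose shifted position i - tau lies inside the series.
        valid = mask & (idx - tau >= 0.0)
        if not valid.any():
            continue
        shifted_truth = np.interp(idx[valid] - tau, idx, y_true)
        rmse = np.sqrt(np.mean((y_pred[valid] - shifted_truth) ** 2))
        if rmse < best_rmse:
            best_tau, best_rmse = tau, rmse
    return best_tau, best_rmse, (1.0 + best_tau) * best_rmse

# Toy usage (synthetic data): a prediction that lags the truth by two weeks.
weeks = np.arange(60)
truth = np.sin(weeks / 5.0)
lagged = np.concatenate([np.repeat(truth[0], 2), truth[:-2]])
tau_hat, rmse_hat, score = delay_score(truth, lagged)
```

On this synthetic example the search should recover a shift close to 2 weeks with a near-zero RMSE. A mask built from a mean-plus-one-standard-deviation threshold could be passed as `mask` to mimic the restriction to high-activity weeks.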
As for XGBoost, the hyper-parameters of the model, including the learning rate, the number of trees, and the maximum tree depth, were selected by Bayesian optimization (Table S7), using the time-series cross-validation error as the evaluation metric [4]. All hyper-parameters were updated once a year, since the addition of one week of data had little impact on the hyper-parameters of the retrained model.

Long short-term memory (LSTM) is a state-of-the-art tool for long-sequence modelling and performs efficiently on sequence analysis problems. In our study, we constructed an LSTM network with an LSTM layer of 64 units and a time step of 2, followed by a dropout layer with a dropout rate of 0·05, a fully connected layer, and a linear activation layer.

Statistical significance test

According to previous studies [5-6], computing prediction intervals (PIs) is important for indicating the likely uncertainty in point forecasts. Because our proposed model is an ensemble combining a time-series model and a nonlinear model, theoretical PIs are difficult to derive. Researchers have developed alternative computational methods for PI calculation, such as empirically based and resampling (bootstrap) methods, which do not rely on exact knowledge of the model [5]. In this study, we adopted the bootstrap strategy for time series proposed by Pascual et al. [7] to obtain prediction intervals. The strategy is to construct a set of bootstrap replicates of the series, $y_1^*, \ldots, y_T^*$, and to forecast the future value $y_{T+K}^*$ through the model with bootstrap parameters. The prediction limits are defined as the quantiles of the bootstrap distribution function of $y_{T+K}^*$. Details of the bootstrap process are given in [7]. Because one-step-ahead rolling-origin-recalibration evaluation was adopted in our study to predict weekly ILI%, $K = 1$ here; 5000 bootstrap replicates of the series were obtained and fed to SAAIM to obtain the bootstrap distribution function of $y_{T+1}^*$.
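As a minimal sketch of the final quantile step only, prediction limits can be read off a vector of bootstrap forecasts as shown below. The bootstrap replicates themselves would come from the procedure in [7], which is not reproduced here. The function names `percentile_interval` and `empirical_coverage` and the toy numbers are illustrative assumptions, not study code or study data.

```python
import numpy as np

def percentile_interval(boot_forecasts, level=0.95):
    """Return (lower, upper) quantiles of the bootstrap forecast distribution."""
    alpha = 1.0 - level
    lower, upper = np.percentile(
        boot_forecasts, [100 * alpha / 2, 100 * (1 - alpha / 2)]
    )
    return float(lower), float(upper)

def empirical_coverage(y_true, lower, upper):
    """Fraction of observed values that fall inside their prediction intervals."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

# Toy stand-in for 5000 bootstrap forecasts of a single week (synthetic values).
rng = np.random.default_rng(0)
boot = rng.normal(loc=3.0, scale=0.2, size=5000)
low, high = percentile_interval(boot)
```

Applying `percentile_interval` week by week and passing the resulting bounds to `empirical_coverage` gives the share of observations covered, which can then be compared with the nominal level.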
All points of each bootstrap replicate of the series were fed to SARIMA, and the last 104 points were fed to XGBoost. To derive the 95% prediction interval for $y_{T+1}^*$, the lower and upper bounds of the prediction interval were set to the 2.5% and 97.5% percentage points of the bootstrap estimates, respectively [7]:

$$\{L, U\} = \{Q_{2.5\%}, Q_{97.5\%}\}$$

We tested on the dataset from 2017 to 2018 to see whether the prediction interval covered the true expected value with the prescribed probability [6]. Fig. S4 gives the 95% prediction intervals for every weekly forecast from 2017 to 2018. Of the true ILI% data points, 96.2% are enclosed within the prediction intervals, which is very close to the desired value of 95%. In addition, we compared the average margin of the 95% prediction intervals of SAAIM on the test dataset with those of all other reference models. As SAAIM shows an at least 15% error reduction over other models, the average margin of the prediction intervals provided in Table S5 confirms the statistical significance of these results, with a minimum margin reduction of 15.57% ((0.533-0.450)/0.533).

References

1. Conway AJ, Macpherson KP, Brown JC. Delayed time series predictions with neural networks. Neurocomputing. 1998 Jan 1;18(1-3):81-9.
3. Burnham KP, Anderson DR. Multimodel inference: understanding AIC and BIC in model selection. Sociological Methods & Research. 2004 Nov;33(2):261-304.
4. Bergmeir C, Benítez JM. On the use of cross-validation for time series predictor evaluation. Information Sciences. 2012 May 15;191:192-213.
5. Chatfield C. Prediction intervals for time-series forecasting. In Principles of forecasting 2001 (pp. 475-494). Springer, Boston, MA.
6. Shrestha DL, Solomatine DP. Machine learning approaches for estimation of prediction interval for the model output. Neural Networks. 2006 Mar 1;19(2):225-35.
7. Pascual L, Romo J, Ruiz E. Bootstrap predictive inference for ARIMA processes. Journal of Time Series Analysis. 2004 Jul;25(4):449-65.

Fig. S1.
RMSE of SAAIM and reference models with different time-shifts. (A) RMSE on data from 2014 to 2018. (B) RMSE on data from 2014 to 2016. (C) RMSE on data from 2017 to 2018.

Fig. S2. RMSE of SAAIM trained with different feature groups with different time-shifts. (A) RMSE on data from 2014 to 2018. (B) RMSE on data from 2014 to 2016. (C) RMSE on data from 2017 to 2018.

Fig. S3. Dynamic feature importance of the top 50 features selected by XGBoost. The heat map shows the number of times each feature was used to split the data across all trees during XGBoost model training on a weekly basis, from the first week of 2014 to the last week of 2016. The x axis represents the prediction week, shown as year-week. The y axis represents the top 50 features, selected by their average number of uses. Missing values correspond to features that were not chosen by the XGBoost model in some prediction weeks. Symbols ending with “_bd” are extracted from Baidu Index, while those ending with “_wb” are extracted from Weibo statistics.

Fig. S4. Calculated 95% prediction intervals of SAAIM for every weekly forecast from 2017 to 2018.

Table S1. Complete keywords of Internet-based public sentiment features, extracted weather features, ILI features (i.e. historical ILI data and statistics) and Time features.

| Index | Category | Symbol | Description (Chinese) |
|---|---|---|---|
| 1 | Public Sentiment (Baidu Index) | ganmaochishenme | what.to.take.when.having.a.cold (感冒吃什么) |
| 2 | Public Sentiment (Baidu Index) | kesouchishenme | what.to.take.when.coughing (咳嗽吃什么) |
| 3 | Public Sentiment (Baidu Index) | fashaozenmetuishao | how.to.break.a.fever (发烧怎么退烧) |
| 4 | Public Sentiment (Baidu Index, Weibo) | shengbing | get.sick (生病) |
| 5 | Public Sentiment (Baidu Index, Weibo) | liugan | influenza (流感) |
| 6 | Public Sentiment (Baidu Index, Weibo) | shanghuxidao | upper.respiratory.tract (上呼吸道) |
| 7 | Public Sentiment (Baidu Index, Weibo) | chuanran | contagious (传染) |
| 8 | Public Sentiment (Baidu Index, Weibo) | tiwen | body.temperature (体温) |
| 9 | Public Sentiment (Baidu Index) | liuganzhengzhuang | influenza.symptom (流感的症状) |
| 10 | Public Sentiment (Baidu Index, Weibo) | kouzhao | face.mask (口罩) |
| 11 | Public Sentiment (Baidu Index, Weibo) | huxidao | respiratory.tract (呼吸道) |
| 12 | Public Sentiment (Baidu Index, Weibo) | bingfazheng | complication (并发症) |
| 13 | Public Sentiment (Baidu Index, Weibo) | biantaoti | tonsil (扁桃体) |
| 14 | Public Sentiment (Baidu Index, Weibo) | dikangli | resistance (抵抗力) |
| 15 | Public Sentiment (Baidu Index, Weibo) | liuganyufang | prevent.influenza (流感预防) |
| 16 | Public Sentiment (Baidu Index, Weibo) | yiqing | epidemic (疫情) |
| 17 | Public Sentiment (Baidu Index, Weibo) | yimiao | vaccine (疫苗) |
| 18 | Public Sentiment (Baidu Index, Weibo) | fangkong | prevention.and.control (防控) |
| 19 | Public Sentiment (Baidu Index, Weibo) | geli | quarantine (隔离) |
| 20 | Public Sentiment (Baidu Index, Weibo) | kangyuan | antigen (抗原) |
| 21 | Public Sentiment (Baidu Index, Weibo) | bisai | stuffy.nose (鼻塞) |
| 22 | Public Sentiment (Baidu Index, Weibo) | biti | runny.nose (鼻涕) |
| 23 | Public Sentiment (Baidu Index, Weibo) | gaoshao | high.fever (高烧) |
| 24 | Public Sentiment (Baidu Index, Weibo) | sangziteng | sore.throat (嗓子疼) |
| 25 | Public Sentiment (Baidu Index, Weibo) | sangzitong | throat.pain (嗓子痛) |
| 26 | Public Sentiment (Baidu Index, Weibo) | yantong | painful.swallowing (咽痛) |
| 27 | Public Sentiment (Baidu Index, Weibo) | dapenti | sneeze (打喷嚏) |
| 28 | Public Sentiment (Baidu Index, Weibo) | fali | weak (乏力) |
| 29 | Public Sentiment (Baidu Index, Weibo) | dishao | low.fever (低烧) |
| 30 | Public Sentiment (Baidu Index, Weibo) | quanshensuantong | body.aches (全身酸痛) |
| 31 | Public Sentiment (Baidu Index, Weibo) | outu | vomit (呕吐) |
| 32 | Public Sentiment (Baidu Index, Weibo) | huxikunnan | difficulty.breathing (呼吸困难) |
| 33 | Public Sentiment (Baidu Index, Weibo) | kesou | cough (咳嗽) |
| 34 | Public Sentiment (Baidu Index, Weibo) | shishui | sleepy (嗜睡) |
| 35 | Public Sentiment (Baidu Index, Weibo) | sizhiwuli | limb.weakness (四肢无力) |
| 36 | Public Sentiment (Baidu Index, Weibo) | touyun | dizzy (头晕) |
| 37 | Public Sentiment (Baidu Index, Weibo) | toutong | headache (头痛) |
| 38 | Public Sentiment (Baidu Index, Weibo) | ganke | dry.cough (干咳) |
| 39 | Public Sentiment (Baidu Index, Weibo) | paleng | fear.of.cold (怕冷) |
| 40 | Public Sentiment (Baidu Index, Weibo) | exin | nausea (恶心) |
| 41 | Public Sentiment (Baidu Index, Weibo) | tanduo | phlegm (痰多) |
| 42 | Public Sentiment (Baidu Index, Weibo) | jirousuantong | muscle.ache (肌肉酸痛) |
| 43 | Public Sentiment (Baidu Index, Weibo) | fuxie | diarrhea (腹泻) |
| 44 | Public Sentiment (Baidu Index, Weibo) | tuishao | break.a.fever (退烧) |
| 45 | Public Sentiment (Baidu Index, Weibo) | shiyubuzhen | loss.of.appetite (食欲不振) |
| 46 | Public Sentiment (Baidu Index, Weibo) | shiyujiantui | appetite.loss (食欲减退) |
| 47 | Public Sentiment (Baidu Index, Weibo) | ganmaoyao | cold.medicine (感冒药) |
| 48 | Public Sentiment (Baidu Index, Weibo) | gankang | gankang (感康) |
| 49 | Public Sentiment (Baidu Index, Weibo) | tuishaoyao | antipyretic (退烧药) |
| 50 | Public Sentiment (Baidu Index, Weibo) | ganmaoqingrekeli | cold.heat.granules (感冒清热颗粒) |
| 51 | Public Sentiment (Baidu Index, Weibo) | 999ganmaoling | 999.ganmaoling (999感冒灵) |
| 52 | Public Sentiment (Baidu Index, Weibo) | kangtaike | Contac (康泰克) |
| 53 | Public Sentiment (Baidu Index, Weibo) | baijiahei | baijiahei (白加黑) |
| 54 | Public Sentiment (Baidu Index) | chaihu | chaihu (柴胡) |
| 55 | Public Sentiment (Baidu Index) | banlangenkeli | banlangen.granules (板蓝根颗粒) |
| 56 | Public Sentiment (Baidu Index) | shuanghuangliankoufuye | shuanghuanglian.oral.liquid (双黄连口服液) |
| 57 | Public Sentiment (Baidu Index) | qingkailingkeli | qingkailing.granules (清开灵颗粒) |
| 58 | Public Sentiment (Baidu Index) | kangbingdukoufuye | antiviral.oral.liquid (抗病毒口服液) |
| 59 | Public Sentiment (Baidu Index) | jiangtangshui | ginger.syrup (姜糖水) |
| 60 | Public Sentiment (Baidu Index, Weibo) | kuaike | Quick (快克) |
| 61 | Public Sentiment (Baidu Index, Weibo) | kangshengsu | antibiotic (抗生素) |
| 62 | Public Sentiment (Baidu Index) | kangjunsu | antibacterial (抗菌素) |
| 63 | Public Sentiment (Baidu Index, Weibo) | tainuo | Tylenol (泰诺) |
| 64 | Public Sentiment (Baidu Index) | weicyinqiaopian | vc.yinqiao.tablet (维C银翘片) |
| 65 | Public Sentiment (Baidu Index) | weishengsu | vitamin (维生素) |
| 66 | Public Sentiment (Baidu Index) | juhuacha | chrysanthemum.tea (菊花茶) |
| 67 | Public Sentiment (Baidu Index) | congbaishui | green.onion.root.water (葱白水) |
| 68 | Public Sentiment (Baidu Index) | huoxiangzhengqishui | huoxiangzhengqi.oral.liquid (藿香正气水) |
| 69 | Public Sentiment (Baidu Index) | jinyinhua | honeysuckle (金银花) |
| 70 | Public Sentiment (Baidu Index) | yuxingcao | yuxingcao (鱼腥草) |
| 71 | Public Sentiment (Baidu Index, Weibo) | ganran | infection (感染) |
| 72 | Public Sentiment (Baidu Index, Weibo) | dazhen | shot (打针) |
| 73 | Public Sentiment (Baidu Index, Weibo) | zhike | cough.relief (止咳) |
| 74 | Public Sentiment (Baidu Index, Weibo) | zhushe | injection (注射) |
| 75 | Public Sentiment (Baidu Index, Weibo) | yanzheng | inflammation (炎症) |
| 76 | Public Sentiment (Baidu Index, Weibo) | bingduganran | viral.infection (病毒感染) |
| 77 | Public Sentiment (Baidu Index, Weibo) | shuye | intravenous.drip (输液) |
| 78 | Public Sentiment (Baidu Index, Weibo) | biyan | rhinitis (鼻炎) |
| 79 | Public Sentiment (Baidu Index, Weibo) | fenghanganmao | wind.cold (风寒感冒) |
| 80 | Public Sentiment (Baidu Index, Weibo) | fengreganmao | wind.heat (风热感冒) |
| 81 | Public Sentiment (Baidu Index, Weibo) | xiaoerganmao | cold.in.children (小儿感冒) |
| 82 | Weather | temp_mean | Average temperature of the previous week |
| 83 | Weather | fengli_mean | Average wind force of the previous week |
| 84 | Weather | humidity_mean | Average humidity of the previous week |
| 85 | Weather | temp_max | Maximum temperature of the previous week |
| 86 | Weather | fengli_max | Maximum wind force of the previous week |
| 87 | Weather | humidity_max | Maximum humidity of the previous week |
| 88 | Weather | temp_min | Minimum temperature of the previous week |
| 89 | Weather | fengli_min | Minimum wind force of the previous week |
| 90 | Weather | humidity_min | Minimum humidity of the previous week |
| 91 | Weather | temp_diffrange | Temperature difference of the previous week |
| 92 | Weather | fengli_diffrange | Wind force difference of the previous week |
| 93 | Weather | humidity_diffrange | Humidity difference of the previous week |
| 94 | Weather | sunny_sum | Sunny day count of the previous week |
| 95 | Weather | rain_sum | Rainy day count of the previous week |
| 96 | Weather | overcast_sum | Overcast count of the previous week |
| 97 | Weather | cloudy_sum | Cloudy day count of the previous week |
| 98 | Weather | fog_sum | Foggy day count of the previous week |
| 99 | Weather | snow_sum | Snowy day count of the previous week |
| 100 | Weather | rainy_sum | Rainfall totals of the previous week |
| 101 | Weather | cloudnum_sum | Cloud amount totals of the previous week |
| 102 | Weather | dailydiff_max | Maximum daily temperature difference of the previous week |
| 103 | Weather | lowest_min | Maximum difference between lowest temperatures of two neighboring days of the previous week |
| 104 | Weather | temp_timemean | Average temperature of last 3 weeks |
| 105 | Weather | temp_timevar | Variance of temperature of last 3 weeks |
| 106 | Weather | humi_timemean | Average humidity of last 3 weeks |
| 107 | Weather | humi_timevar | Variance of humidity of last 3 weeks |
| 108 | Weather | qiya_mean | Average atmospheric pressure of the previous week |
| 109 | Weather | qiya_max | Maximum atmospheric pressure of the previous week |
| 110 | Weather | qiya_min | Minimum atmospheric pressure of the previous week |
| 111 | Weather | qiya_diffrange | Atmospheric pressure difference of the previous week |
| 112 | Weather | winter | Whether the weekend of the prediction week is in winter |
| 113 | Weather | autumn | Whether the weekend of the prediction week is in autumn |
| 114 | Weather | summer | Whether the weekend of the prediction week is in summer |
| 115 | Weather | spring | Whether the weekend of the prediction week is in spring |
| 116 | Weather | sunnyr_sum | Sunny day count of the prediction week in weather forecast |
| 117 | Weather | rainyr_sum | Rainy day count of the prediction week in weather forecast |
| 118 | Weather | overcastr_sum | Overcast count of the prediction week in weather forecast |
| 119 | Weather | cloudyr_sum | Cloudy day count of the prediction week in weather forecast |
| 120 | Weather | fogr_sum | Foggy day count of the prediction week in weather forecast |
| 121 | Weather | snowr_sum | Snowy day count of the prediction week in weather forecast |
| 122 | Weather | tempr_mean | Average temperature of the prediction week in weather forecast |
| 123 | Weather | tempmax_max | Maximum temperature of the prediction week in weather forecast |
| 124 | Weather | tempmin_min | Minimum temperature of the prediction week in weather forecast |
| 125 | Weather | tempdiff_max | Maximum daily temperature difference of the prediction week in weather forecast |
| 126 | Weather | lowestr_min | Maximum difference between lowest temperatures of two neighboring days of the prediction week in weather forecast |
| 127 | Weather | tempr_diffrange | Maximum temperature difference of the prediction week in weather forecast |
| 128 | Weather | temp2_diff | Average temperature difference between the prediction week and the previous week in weather forecast |
| 129 | Weather | temp2_mindiff | Difference between lowest temperatures of the prediction week and the previous week in weather forecast |
| 130 | Weather | temp2_maxdiff | Difference between highest temperatures of the prediction week and the previous week in weather forecast |
| 131 | ILI | sumPerct1WAgo | ILI of the previous week |
| 132 | ILI | sumPerct2WAgo | ILI two weeks ago |
| 133 | ILI | sumPerct3WAgo | ILI three weeks ago |
| 134 | ILI | meanPastW | Average ILI of last 3 weeks |
| 135 | ILI | sdPastW | Standard deviation of ILI of last 3 weeks |
| 136 | ILI | ILI_1yearago | ILI of the same week last year |
| 137 | ILI | ILI_2yearago | ILI of the same week two years ago |
| 138 | ILI | ILI_3yearago | ILI of the same week 3 years ago |
| 139 | ILI | meanPastY | Average ILI of the same weeks in last 3 years |
| 140 | ILI | sdPastY | Standard deviation of ILI of the same weeks in last 3 years |
| 141 | Time | yearNo | The year of prediction week |
| 142 | Time | monthNo | The month of prediction week |
| 143 | Time | weekNo | The week of prediction week |

Table S2. Mean absolute percentage error (MAPE) of different models on the dataset from 2014 to 2016 using different training strategies.

| Training strategy | LASSO | XGBoost | LSTM | SARIMA |
|---|---|---|---|---|
| Two-year rolling window | 0.171 | 0.127 | 0.158 | 0.136 |
| Fixed-origin expanding window | 0.192 | 0.155 | 0.144 | 0.133 |

Table S3. Prediction delay measurement of SAAIM compared to reference models.

| Measure | Model | 2014-2018 | 2014-2016 | 2017-2018 |
|---|---|---|---|---|
| Time-shift (week) | SAAIM | 0.893 | 0.833 | 1.149 |
| Time-shift (week) | LASSO(Baidu_index) | 1.261 | 0.817 | 1.257 |
| Time-shift (week) | LASSO(ILI+Baidu_index) | 2.306 | 2.306 | 2.330 |
| Time-shift (week) | LSTM | 1.622 | 1.662 | 1.401 |
| Delay score | SAAIM | 0.122 | 0.136 | 0.052 |
| Delay score | LASSO(Baidu_index) | 0.239 | 0.230 | 0.091 |
| Delay score | LASSO(ILI+Baidu_index) | 0.666 | 0.712 | 0.587 |
| Delay score | LSTM | 0.195 | 0.216 | 0.135 |

Table S4. Prediction delay measurement of SAAIM using different feature groups.

| Measure | Model | 2014-2018 | 2014-2016 | 2017-2018 |
|---|---|---|---|---|
| Time-shift (week) | SAAIM | 0.893 | 0.833 | 1.149 |
| Time-shift (week) | SAAIM_no_weather | 1.317 | 1.381 | 0.913 |
| Time-shift (week) | SAAIM_no_sentiment | 1.465 | 1.493 | 1.301 |
| Time-shift (week) | SAAIM_no_ILI | 1.590 | 1.750 | 1.001 |
| Prediction delay score | SAAIM | 0.122 | 0.136 | 0.052 |
| Prediction delay score | SAAIM_no_weather | 0.329 | 0.280 | 0.324 |
| Prediction delay score | SAAIM_no_sentiment | 0.153 | 0.177 | 0.091 |
| Prediction delay score | SAAIM_no_ILI | 0.814 | 0.701 | 0.766 |

Table S5. The average margin of the 95% prediction intervals of SAAIM compared with reference models from 2017 to 2018.

| | SAAIM | LASSO(ILI+Baidu_index) | LASSO(Baidu_index) | LSTM |
|---|---|---|---|---|
| Average margin | 0.450 | 0.572 | 0.958 | 0.533 |

Table S6. Comparison of accuracy metrics between SAAIM and a simple integration model.

| Metric | Model | 2014-2018 | 2014-2016 | 2017-2018 |
|---|---|---|---|---|
| RMSE | SAAIM | 0.175 | 0.183 | 0.161 |
| RMSE | Simple model | 0.179 | 0.186 | 0.169 |
| MAPE | SAAIM | 0.110 | 0.119 | 0.097 |
| MAPE | Simple model | 0.117 | 0.121 | 0.110 |
| MAE | SAAIM | 0.117 | 0.128 | 0.101 |
| MAE | Simple model | 0.124 | 0.131 | 0.112 |
| Correlation | SAAIM | 0.892 | 0.905 | 0.845 |
| Correlation | Simple model | 0.887 | 0.902 | 0.821 |

Table S7. The ranges of partial parameters of XGBoost for Bayesian optimization.

| Parameter | Range minimum | Range maximum |
|---|---|---|
| max_depth | 10 | 35 |
| learning_rate | 0.05 | 0.5 |
| n_estimators | 190 | 220 |
| gamma | 0.1 | 0.5 |
| min_child_weight | 0.5 | 1 |
| subsample | 0.8 | 0.85 |
| colsample_bytree | 0.5 | 1 |
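For readers who want to reuse the ranges in Table S7, one way to encode them is as a dictionary of bounds, as in the sketch below. It draws candidate configurations uniformly at random, which is only a simple stand-in and not the Bayesian optimization used in the study. Treating `max_depth` and `n_estimators` as integers, and the names `SEARCH_SPACE`, `INTEGER_PARAMS`, and `sample_config`, are assumptions made for illustration.

```python
import random

# (minimum, maximum) bounds as read from Table S7.
SEARCH_SPACE = {
    "max_depth": (10, 35),
    "learning_rate": (0.05, 0.5),
    "n_estimators": (190, 220),
    "gamma": (0.1, 0.5),
    "min_child_weight": (0.5, 1.0),
    "subsample": (0.8, 0.85),
    "colsample_bytree": (0.5, 1.0),
}

# Assumption: these two parameters take whole-number values.
INTEGER_PARAMS = {"max_depth", "n_estimators"}

def sample_config(rng):
    """Draw one candidate configuration uniformly from SEARCH_SPACE."""
    config = {}
    for name, (low, high) in SEARCH_SPACE.items():
        if name in INTEGER_PARAMS:
            config[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            config[name] = rng.uniform(low, high)
    return config

# Seeded generator so the draw is reproducible.
candidate = sample_config(random.Random(42))
```

Each sampled dictionary could then be scored with a time-series cross-validation error, and a Bayesian optimizer would replace the random draw with a model-guided proposal over the same bounds.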