# Nomograms in Stata

If you find these programs and/or the information included in this webpage useful for your research, please consider including the following citation:
nomolog is a user-written program for generating nomograms for binary logistic regression predictive models. Sometimes, these are called Kattan-style nomograms.
nomocox is a variation of nomolog. It generates nomograms for predictive Cox regression models.
Stata 12, Stata 13 and Stata 14. Older versions are not supported. All editions are supported (Small Stata / IC / SE / MP). Some issues may arise with Small Stata in models with a large number of variables. Neither nomolog nor nomocox is a stand-alone program. Both require Stata.
Alexander Zlotnik (Technical University of Madrid & Ramón y Cajal University Hospital) and Dr. Víctor Abraira (IRICYS & Clinical Biostatistics Unit, Ramón y Cajal University Hospital).
Yes.
1) Have a look at this webpage.
2) Have a look at the official documentation for these commands, which is available from within Stata once the programs are installed.
3) If this does not solve your question, contact me.
- Nomograms allow calculating output probabilities for predictive models with a visual approach. This is useful when presenting the results of your predictive models in a convenient way in a printed format. Nomograms are better than most alternative approaches, such as providing the full regression formula or a table with all regression coefficients. Another possibility is to provide an on-line calculator, but this requires some programming and usually also "hides" the underlying model, while with a nomogram the process is fully transparent.
- With nomograms, variable importance is clear at a glance. The longer the line corresponding to a given variable, the more important that variable is. Therefore, nomograms can also be used in descriptive or exploratory data analysis.

This tutorial is mostly self-explanatory; however, you can review the following concepts:
- How to use factor (categorical) variable syntax
- How to use interaction variables [optional, only if you plan to use nomograms for models with interactions]
If you wish to use an older version of nomolog, which appeared in the Stata Journal, you should use these commands:
There is an updated version of nomolog with several modifications, which is slightly different from the Stata Journal version.

1) Open Stata
2) Make sure that you have an active Internet connection. If you are behind a proxy (for example, if you are connecting from your workplace or your research center), you should follow these instructions.
3) Execute
Follow the same steps 1) and 2) as with nomolog, and then 3) Execute
Using these commands:
Nomograms are graphical calculators. Perhaps one of the best formal definitions we have seen is this one:
Nomograms are widely used in several engineering disciplines. The Mizuhashi-Volpert-Smith chart, used in Electrical Engineering, is an example of a nomogram. Many non-linear functions can be represented as nomograms. Such is also the case with logistic and Cox regressions.
Although several nomograms have been used in biomedical research, Kattan nomograms are perhaps the most popular ones. One of the researchers who made several high-profile publications with logistic and Cox regression nomograms is Michael Kattan, PhD. You can generate Kattan-style nomograms with nomolog and nomocox. You can find more details about the specific nomograms that Kattan developed here. You can see an example of a Kattan nomogram here.
Example of a logistic regression nomogram:
Step 1) Establish scores for all variable values.
Step 2) Obtain the total score by adding up all the scores obtained in the previous step.
Step 3) Obtain the probability of the event (Total score -> Probability of event).

Example: What is the probability of union affiliation for a worker who has a tenure of 7.5 years, works 30 hours per week, is 50 years old, is a college graduate and white?

Step 1) Establish scores for all variable values.
Step 2) Obtain the total score by adding up all the scores obtained in the previous step.
Step 3) Obtain the probability of the event (Total score -> Probability of event).
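The three steps above amount to a rescaling of the logistic linear predictor. The following Python sketch illustrates one common way such a point scale can be built; it is not the code used by nomolog. The intercept, coefficients, variable ranges and the 10-point maximum are hypothetical values chosen for illustration, not the fitted union-affiliation model.

```python
import math

# Hypothetical model (log-odds scale); NOT the fitted model from this page.
INTERCEPT = -2.0
COEFS = {"tenure": 0.05, "hours": 0.02, "age": -0.01}
RANGES = {"tenure": (0.0, 25.0), "hours": (0.0, 80.0), "age": (30.0, 50.0)}
MAX_POINTS = 10.0  # assumed length of the longest variable line


def reference(name):
    """End of the displayed range with the lowest log-odds (scores 0 points)."""
    lo, hi = RANGES[name]
    return lo if COEFS[name] >= 0 else hi


def effect_span(name):
    """Change in log-odds across the displayed range of one variable."""
    lo, hi = RANGES[name]
    return abs(COEFS[name]) * (hi - lo)


# Log-odds represented by one point: the widest effect spans MAX_POINTS.
UNIT = max(effect_span(n) for n in COEFS) / MAX_POINTS


def score(name, value):
    """Step 1: points for a single variable value (never negative in range)."""
    return COEFS[name] * (value - reference(name)) / UNIT


def probability(total_score):
    """Step 3: map the total score back to a probability of the event."""
    base = INTERCEPT + sum(COEFS[n] * reference(n) for n in COEFS)
    return 1.0 / (1.0 + math.exp(-(base + total_score * UNIT)))


values = {"tenure": 7.5, "hours": 30.0, "age": 50.0}
total = sum(score(n, v) for n, v in values.items())  # Step 2
print(total, probability(total))
```

Because every variable scores zero points at the end of its range with the lowest log-odds, all scores are non-negative, and the length of each line (`effect_span`) reflects how much that variable can move the prediction.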
- A nomogram is a representation of a predictive model, not a validation tool. If the underlying model is not adequately calibrated, the output probabilities obtained with a nomogram will generally be meaningless. You must ensure that the calibration (goodness-of-fit) of the model is adequate for your purposes on at least one representative validation dataset before generating a nomogram. Ideally, calibration should be tested on several representative validation datasets. One popular method for assessing calibration is the Hosmer-Lemeshow test. You may also use visual inspection of the Hosmer-Lemeshow "deciles of risk", calibration graphs and/or other methods. -nomolog- / -nomocox- could examine confidence intervals on the derivation dataset and complain if these were too wide, but how wide is too wide? They could also estimate calibration on the derivation dataset, but this, again, is subjective. Maybe the purpose of a particular nomogram is to produce a rough prediction (for example, just three categories: likely / average / not likely), or maybe it is used for discrimination purposes only (as we stated earlier, it is possible to set a cut-off point and decide "yes/no"). This is why we believe that model validation is best left to the end user.
- Try not to generate nomograms for models with convergence problems. If you try to build a nomogram with a model where, say, some coefficients are omitted to achieve convergence, the nomogram will be generated with a warning message; however, it is best to remove the offending variable(s) and regenerate the nomogram. Although nomolog and nomocox have undergone some testing, missing coefficients may result in a representation which is not coherent with the underlying model.
- Nomograms should only be used for approximate calculations. If high-precision results are required, you should use an alternative approach.
It must be said that this is not very common in biomedical research, mainly because the underlying models themselves are usually not that precise.
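As an illustration of the calibration check mentioned in the cautions above, here is a minimal Python sketch of a "deciles of risk" table and the Hosmer-Lemeshow statistic computed from it. The function names and the toy data are hypothetical; in practice you would normally rely on the goodness-of-fit tools of your statistical package rather than on this sketch.

```python
def deciles_of_risk(y, p, groups=10):
    """Sort by predicted risk, split into groups, and return one
    (n, observed events, expected events) tuple per non-empty group."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if chunk:
            observed = sum(yi for _, yi in chunk)
            expected = sum(pi for pi, _ in chunk)
            table.append((len(chunk), observed, expected))
    return table


def hosmer_lemeshow_statistic(table):
    """Sum over groups of (O - E)^2 / (n * pbar * (1 - pbar)).
    Assumes the mean predicted risk of every group is strictly between 0 and 1."""
    stat = 0.0
    for n_g, observed, expected in table:
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat


# Toy data for illustration only: outcomes y (0/1) and predicted risks p.
y = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
p = [0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 0.7, 0.7, 0.8, 0.9]
table = deciles_of_risk(y, p, groups=5)
print(table, hosmer_lemeshow_statistic(table))
```

Large differences between observed and expected events in any group, or a large value of the statistic, suggest that probabilities read off the nomogram should not be trusted.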
Why are variables limited to 10 labels and why is there pagination if more than 10 variables are used?

This has to do with a hard-coded limitation in some graphics generation code within Stata's core libraries, which cannot be modified. If too many labels are used, Stata yields an error and stops the execution. This limitation may or may not disappear in future Stata versions.

1) Open Stata
2) Load a dataset Example:
3) Execute a logistic regression Example:
4) Execute nomolog without any options
This will generate the nomogram with the default options. Coefficients will be forced positive to facilitate calculations. Or, alternatively, execute nomolog with execution options which allow nomogram customization. You may use the command line or just use the dialog box.
You should see the following dialog box:
Steps 1) and 2) are identical to those of nomolog.

3) Set a panel variable and execute a Cox regression Example:
4) Open the dialog box for nomocox
You should see the following dialog box:
Step 1) Get the scores for all variable values.
Step 2) Add up the scores to obtain the total score.
Step 3) Calculate the probability of survival for a given number of time units from the total score.
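Step 3 relies on the proportional hazards relationship S(t | x) = S0(t)^exp(lp), where lp is the linear predictor and S0(t) is the baseline survival at time t. The Python sketch below shows that mapping under stated assumptions: the scaling of points to the linear predictor (`unit`, `base_lp`) and the baseline survival value are hypothetical illustration values, not output of nomocox.

```python
import math


def survival_from_score(total_score, unit, base_lp, baseline_survival):
    """Survival probability at a fixed time t for a nomogram total score.

    total_score        points read off the nomogram
    unit               linear-predictor units per point (hypothetical scaling)
    base_lp            linear predictor corresponding to zero points
    baseline_survival  S0(t), baseline survival at the chosen time t
    """
    lp = base_lp + total_score * unit
    return baseline_survival ** math.exp(lp)


# Hypothetical values for illustration only.
for points in (0, 5, 10):
    print(points, round(survival_from_score(points, 0.2, -1.0, 0.8), 3))
```

A higher total score means a larger linear predictor and therefore a lower survival probability at the chosen time.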
Both nomolog and nomocox support interactions. An interaction is a product of variables, i.e. interaction_variable = var_A * var_B

1) Categorical x Categorical
2) Categorical x Continuous
3) Continuous x Continuous

This last case is slightly more complicated, since cutpoints need to be specified on the nomogram in order to make it linear. In this particular case we are specifying cutpoints 10, 20 and 30 for var1. If we used a product of continuous variables directly, the resulting score calculation would not be linear.

Again, we'll borrow a dataset from this webpage: http://www.ats.ucla.edu/stat/stata/seminars/stata_survival/ (University of California, Los Angeles - UCLA)
This generates the command:
The resulting nomogram is:

We see that the variable If treat=1 and site=0 treat=1 and site=1

If treat = 0, we choose the second line and obtain a score depending on the value of site. This line gives us the scores for:
- treat=0 and site=0
- treat=0 and site=1

Notice that, in this particular case, there is a negative score if treat = 0 and site = 1 (marked with [-] and in red color).

We would then calculate the score corresponding to treat:

By default both nomolog and nomocox simplify interactions. For example, in this case, we have removed the line corresponding to the effect of However, in some cases, we may want to display all variables independently on the nomogram.
The resulting nomogram is the following:

Notice that in this case the score for treat=0#site is almost zero. Both nomograms (the previous one and this one) are equivalent. However, in the second one, one more variable has to be taken into account to obtain the outcome probability.

1) Open the dialog box.
2) Change the value in the corresponding box (for example 2.0):
3) Press the "OK" button.
4) This will result in the following command being executed:
As you most likely know, Stata allows you to associate variable labels with variables. Some commands can make use of either. Example of label definition command:
1) Open the dialog box.
2) Mark the check box:
3) Press the "Submit" button.
4) This will result in the following command being executed:
2) Change the value in the corresponding box (for example 1.8):
3) Press the "OK" button.
4) This will result in the following command being executed:
In the example above, is it legitimate to extend the variable range from 20 to 60 years if your dataset actually ranges from 34 to 46 years? It depends.
2) Go to the "Variable ranges and decimals" tab. Enter the variable(s), the new minimum(s), maximum(s) and division size(s):
3) Press the "OK" button.
4) This will result in the following command being executed:
1) Open the dialog box.
2) Go to the "Prob. values" tab and insert the probability values you want to display: Notice that here you may also change the probability label size.
4) This will result in the following command being executed:
The resulting nomogram will look like this:

Notice that, in this case, a warning has been added to the graph indicating that the range (min, max values to display) of variable age has been changed, i.e. the nomogram is actually showing an out-of-sample prediction (age values that were not in the derivation dataset). You may remove this warning using Stata's Graph Editor.

Yes!

Yes. Mark the option "Display table with variable divisions and corresponding scores" in the dialog box, or use the option "divtable" if you are using the command line.

nomolog and nomocox generate standard Stata graphs, which can be edited with Stata's Graph Editor.

We would like to thank all StataCorp technical support personnel we had the opportunity to work with, especially Statistician Joy Wang, and all the members of the Clinical Biostatistics Unit of the Ramón y Cajal University Hospital who have participated in the testing of this program.
- Zlotnik A, Abraira V. Stata logistic regression nomogram generator. 2013 Spanish Stata Users Group Meeting. Madrid, Spain.
- Fu AZ, Cantor SB, Kattan MW: Use of nomograms for personalized decision-analytic recommendations. Medical Decision Making 2010, 30(2):267-274.
- Kattan MW, Wheeler TM, Scardino PT: Postoperative nomogram for disease recurrence after radical prostatectomy for prostate cancer. Journal of Clinical Oncology 1999, 17(5):1499-1507.
- Kattan MW, Eastham JA, Stapleton AM, Wheeler TM, Scardino PT: A preoperative nomogram for disease recurrence following radical prostatectomy for prostate cancer. Journal of the National Cancer Institute 1998, 90(10):766-771.