
what am I not getting here?My guess is that you screwed up. The missing option on egen total() should help take care of this condition.Why do i get completely different means from resulting variables (About twice as big using egen vs. Regardingīesides I would like to account for missing values, so if all values of var1 for the company x are missing variable, sum1 for company x and specific interval must contain missing values and not 0. That sum x,d is Stata shorthand for summarize x, detail, which returns a detailed statistical summary of the variable x, including lots more statistics. Or more simply egen _sum = total(var1), by(id eventid) Therefore bysort id eventid: egen _sum = total(var1) Based on your example, the event date and "special date" flag ( event) don't seem to matter in calculating the desired sum. It looks like you are essentially trying to create totals for unique combinations of id and eventid, not id and year.

Naturally, if you didn't want to change event, you could generate event2 = event to use in place of event. Li id date event_id DesiredSum Sum, sepby(event_id) Replace event_id = event_id+event_id if i != 1īysort id event_id : egen Sum = total(var1), missing answers from multiple questions whose responses are correlated with each other to form a single variable that is the sum of scores on each individual item. This data structure makes more sense to me */ This example assumes you have already opened the data file in Stata.
#STATA SUMMATION CODE#
* This section of code changes event so that 1 indicates the start of the clear *īysort id : gen i = _n // to maintain sort order I'm also assuming here that event is used to flag the end of the time interval for which the event occurred (I make this assumption based on your sample data and my comment on the question). If you are not opposed to re-coding event into something a bit easier to work with, the following should suffice. Dear Chris, use egen xrowtotal (x1 x2 x3) for the summation in the row (must provide a variables list) and collapse (sum) x to compute a total of the column. | id date year var1 event Desire~m ddate EVENT Sum | Input id str10 date year var1 event DesiredSumīysort id (ddate): gen EVENT = sum(event)īy id: replace EVENT = EVENT - EVENT + 1īysort id EVENT (ddate): egen Sum = total(var1), missing
#STATA SUMMATION HOW TO#
Sorting on date within panel is just cosmetic once we have a division into spells, but still the right thing to do.įor hints from Statalist on how to provide data examples using dataex (SSC), which apply here too with minor modification, see here clear * Reversing and reversing back are both just negation using. Then reverse spell numbering, reverse time to the normal direction and apply egen as in other answers. Here that is entirely a feature, although not quite what the OP wants when applying egen, total(). The crucial small detail here is that sum() ignores missing values, or more precisely treats them as zero. Given markers 1 for the ends of each spell, we can then cumulate backwards using sum(). That said, one trick of reversing time makes subdivision into spells easy.

Besides I would like to account for missing values, so if all values of var1 for the company x are missing variable, sum1 for company x and specific interval must contain missing values and not 0. The starting date for the second interval is the date of the first event + 1 day. The starting date for the first interval for each company is. The intervals are also not equal for different companies. The intervals between events are not equal hence the first interval can contain 60 observations, the next interval 360 observations. There are 5 to 10 events for each company. I have a special variable called event, which takes value of 1 if the event has occurred on special date and missing otherwise.

In my case however, the time interval is determined as an interval between two events.

If the interval was equal to each year I would use the function total(): bysort id year: egen sum1=total(var1) I want to generate new variable ( sum1) which contains a sum of daily observation for var1 for each company and specific time interval. I work with panel data which contain several companies ( id) and cover the period from to ( date, year).
