I'm trying to do some analysis on some data and am having some trouble trying to understand how Excel calculates data. Every time I make a change, Excel recalculates my entire workbook, which takes about 5 minutes on my dual core machine. I'm using Excel 2007.The dataset is fairly large, about 100,000 rows and 13 columns. This data is fixed and doesn't change. Each row is identified by a date and time (time interval is 5 minutes) on column A, and I need to do a lot of conditional array equations to extract the data I need from columns B-M. I'm using array equations to keep me from having to create a lot of dummy data and rows, and generally helps keep the whole process automated and lets me change things without having to do hours of work.For example, if I want to get the average of column B (line voltage) for each day, I would do something simple like:
10/25/2008 {=average( if( (month(Data!$A$2:$A$100000)=10) * (day(Data!$A$2:$A$100000)=25) , $B$2:$B$100000) )}10/26/2008 {=average( if( (month(Data!$A$2:$A$100000)=10) * (day(Data!$A$2:$A$100000)=26) , $B$2:$B$100000) )}10/27/2008 {=average( if( (month(Data!$A$2:$A$100000)=10) * (day(Data!$A$2:$A$100000)=27) , $B$2:$B$100000) )}
1/28/2009 9:13:47 AM
you can go to Tools > Options > Calculation and change it from "automatic" to "manual". Then it only re-evaluates all the formulas when you press F9 or save the file.
1/28/2009 9:53:36 AM
1/28/2009 9:55:09 AM
oh, i didn't read the whole thing. in that case, you're stuck. If you want to get immediate feedback, then Excel must reevaluate all the data. If Excel must reevaluate all the data, then you must wait for it. It does suck, or it is curious, that Excel recalculates everything with almost every keystroke, but I've never found a way around that.
1/28/2009 10:03:21 AM
^^i guess im confused, if he is making a change to the worksheet that has no dependencies on other cells I am assuming its very simple input if not just text or bolding or italicizing and that is causing the problem, then why would he not want to turn off autocalculation and then just press F9 everytime he needs it like agentlion suggested?sure its not that convenient but keep pressing it but it seems like itd still be easier than having it on automatic.[Edited on January 28, 2009 at 10:07 AM. Reason : ]
1/28/2009 10:06:47 AM
I think he's wanting a way to only do the calculations on the cell or row he's actually changed instead of recalculating 100,000 rows of stuff that hasn't changed.edit: it'd be a hell of a lot faster if it just updated the changes automatically then he could use F9 if he wanted to refresh the whole sheet just to be sure..[Edited on January 28, 2009 at 10:24 AM. Reason : asf]
1/28/2009 10:24:02 AM
def
1/28/2009 11:48:34 AM
I think that this is a case where Excel isn't the optimal tool to use.
1/28/2009 1:19:12 PM
I think he knows that I would think you could write a macro to run every time you change cells and make it update whatever math is in the previous cell you were in. That + turning off automatic updates would do it. Hopefully that made sense..Just an idea--no idea if it's feasible.
1/28/2009 1:22:02 PM
also, if you have calculations set to Manual, you can update single cells/formulas by activating those specific cells by double-clicking then Enter, or F2 then Enter. That will update one cell only. I have previously worked on workbooks that are about 20MB with ~20 worksheets and thousands of formulas, and I had to do the Manual Calculation option and just use F9 ever once ina while, because otherwise it was just unusable.^ i found that running Macros that forced calculations switches the calculation mode from Manual to Automatic, unfortunately, so every time you run a macro you ahve to go back into Options and change it back. Maybe it's possible, though, to set Manual mode inside the macro before it exits. [Edited on January 28, 2009 at 1:26 PM. Reason : .]
1/28/2009 1:24:16 PM
1/28/2009 1:24:22 PM
It's about time you add a new skill set to your arsenal. Excel wasn't cut out for that kind of work.
1/28/2009 2:03:07 PM
True. But like I said Excel is the only resource I have to use for this project. Matlab would be great.I haven't explored the possibility of putting all the data into an Access database and having Excel execute the query. I'm assuming it's possible. And I'm sure it would be a lot quicker.
1/28/2009 2:12:30 PM
Where do you work that you can't request a serious statistical package like JMP or SAS when you're working with that much data?There are probably quite a few ways to optimize what you have if this is something you're going to be dealing with on an ongoing basis. For instance, you can add extra columns to do the "if" part of your average statement so when you add data or a cell changes the average part of the equation doesn't have to evaluated everything over again.
1/28/2009 2:39:32 PM
1/28/2009 2:46:48 PM
it is possible to have excel reference itself when executing DB queries, I have limited experience with it, however I think that may work pretty well, don't know how fast it is.
1/28/2009 8:19:50 PM
1/28/2009 8:58:48 PM
The problem is you're going about this all wrong.First of all, it would help if you knew how date codes work.1 = January 1 1900 0:00 hours.1.5 = January 1, 1900 12:00 hours.In fact every date and time is stored in Excel as "the number of days since January 0 1900".10/25/2008 is 39746. Thereby, to test if the day of a given date is 10/25/2008, you use "Int(Cell)=39746". Or if you don't like that, you can use Date(2008,10,25), which returns 39746.I created a random dataset covering every 5 minutes for a year (just over 100,000 rows). The problem with your formula is that it scans the length of the dataset 4 times per evaluation. Once to evaluate Month(), once to evaluate Day(), once for the result of IF() and once (through the pared dataset) to calculate the average. Well if you've got one of these formulae for every day of the year, then yeah it's going to take an eternity to calculate.Replacing your formula with:{=AVERAGE( IF( INT(DateTimeRange)=CellWithDateOfInterest, RangeWithResultsToAverage) )}Speeds up the calculation by probably a factor of 2 or more. But it's still scanning the dataset for every single date I want an average for.But guess what! The results you want (sum, count, average, max, min, product, stdev, variance) can be calculated with a SINGLE pass through the data. And guess what else! Excel has a built in tool to do just that. It's called a PivotTable. Using a PivotTable to calculate the average value in B2 for each individual day in 10 years (over 1,000,000 rows using 5 minute interval data) takes a mere fraction of a second.Your "thousands" of "complex" formulae are the problem. They are calculated individually. If you have to use your method {function ( if ( condition , result ) )}, then "condition" should be replaced with a user function that performs all the necessary comparisons in a single pass through the data. But even then you'd be passing through the data many thousands of times. It may get 10 times as fast, but it will still be slow.In the future, remember excel is not a database. Even though there happens to be an OLEDB driver for it that allows you to query a spreadsheet like a database table.
1/29/2009 9:11:51 AM