Any help would be greatly appreciated (by the way, I am new to C++, so I might need a clear explanation).

I need to read in a file that has comma-separated data. Specifically,

RDU,ILM,AA,1996,2,102
JAX,BWI,AE,1997,1,30
LAX,JFK,SW,1997,3,79

The file actually has over 6 million lines of data, but I am testing with just this small example. I need to read in the file and store the variables, then run tests on those. My problem is that I can't read in the data. Here is what I started so far.
---
3/28/2006 6:42:45 PM
strtok
atoi
or exec("perl readFile.pl Feed.txt")
[Edited on March 28, 2006 at 7:35 PM. Reason : dfsd]
3/28/2006 7:33:39 PM
^thanks. I think I'll try to take the data and read in every line as one string, then use strtok.

Not too sure about the perl suggestion. Like I said, I just started learning C++ two weeks ago. I still think there should be a simpler/shorter way. Any other suggestions? I have MS VS 2005, so if anyone can think of any class etc. that I have access to, that would be useful.
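A minimal sketch of the "read each line as one string, then use strtok" idea, together with atoi for the numeric fields. It assumes every record has exactly six fields in the order of the sample line (origin, destination, carrier, year, quarter, passengers) and a C++11 compiler; the `Record` struct and the `parseWithStrtok` name are made up for illustration.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record layout, guessed from the sample line
// "RDU,ILM,AA,1996,2,102": origin, destination, carrier, year, quarter, passengers.
struct Record {
    std::string orig, dest, carr;
    int year, qtr, pass;
};

// Splits one line on commas with strtok and converts the numeric fields with atoi.
// Returns false if the line does not contain exactly six non-empty fields.
bool parseWithStrtok(const std::string& line, Record& out) {
    // strtok overwrites the delimiters in its input, so work on a private copy.
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');

    std::vector<char*> fields;
    for (char* tok = std::strtok(buf.data(), ","); tok != nullptr;
         tok = std::strtok(nullptr, ",")) {
        fields.push_back(tok);
    }
    if (fields.size() != 6) return false;

    out.orig = fields[0];
    out.dest = fields[1];
    out.carr = fields[2];
    out.year = std::atoi(fields[3]);
    out.qtr  = std::atoi(fields[4]);
    out.pass = std::atoi(fields[5]);
    return true;
}
```

Two caveats with this approach: strtok skips empty fields (two commas in a row count as one delimiter), and atoi returns 0 for text that is not a number, so bad input is not reported.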
3/28/2006 9:41:03 PM
why don't you just write a perl script
3/28/2006 10:59:48 PM
3/28/2006 11:35:14 PM
^^I don't know perl.

^That is what I was trying to do from the get-go. But I couldn't figure out how to read in each line and pick out the variables that are comma delimited. Any suggestions on how to do this? I am expecting this code to take a while. I'll let it run all night if it has to, and I have lots of memory. I don't need to do this frequently. Just once for now anyway, unless I get updated or more complete data.
3/29/2006 12:01:32 AM
well you don't know c++ either so what's the difference.

perl:
open(INFILE, "< Feed.txt") or die("Can't open file: $!");
my @lines = <INFILE>;
close(INFILE);
foreach my $line (@lines) {
    chomp($line);
    # split takes the pattern first, then the string to split
    my @words = split(/,/, $line);
    foreach my $word (@words) {
        # PROCESS YOUR WORD HERE
    }
}
3/29/2006 12:13:02 AM
haha nice, I'll take a look at that.
-------
nevermind, I see: 5 min to parse.
[Edited on March 29, 2006 at 12:19 AM. Reason : misread][Edited on March 29, 2006 at 12:20 AM. Reason : asd]
3/29/2006 12:13:55 AM
text parsing is perl's forte
3/29/2006 12:15:57 AM
I really need to learn Perl... I'm doing text processing on shitloads of data that's generated from my simulations... I've just been using matlab, but it's kinda slow. (it's just what I know)
3/29/2006 12:27:11 AM
damn i hope i didn't just help a terrorist parse out flight schedules...
3/29/2006 12:29:49 AM
yep you figured it out. gg --well except the terrorist part
3/29/2006 12:33:38 AM
what processing are you trying to do - i'm betting you'll need to use a hash of arrays
3/29/2006 12:40:59 AM
Is this on a Windows-based computer? Microsoft's ActiveX Data Objects database drivers (ADODB) support comma-separated files. (it will do everything for you to make it as searchable as an actual database and it'll probably be faster than any code you write)

here's a VB example. i googled and didn't see one in C
http://www.vb-helper.com/howto_ado_load_csv.html

You need to import the adodb dll:
#import "c:\Program Files\Common Files\System\ADO\msado##.dll"
3/29/2006 12:48:09 AM
I can compile VB. I'll take a look at that--thanks. Yeah, that's something like what I was looking for. I need something I can just grab off the internet.
------------------------
I need to compare every line of data to every other line of data and test certain relations, like

for (j = 0; j < sizedata; j++) {
    sum = 0;
    for (i = 0; i < sizedata; i++) {
        if (year(j) == year(i) && qtr(j) == qtr(i) && j != i && carr(j) == carr(i) &&
            (dest(j) == dest(i) || dest(j) == orig(i) || orig(j) == dest(i) || orig(j) == orig(i))) {
            sum++;
        }
    }
    COUNT(j) = sum;
}

[Edited on March 29, 2006 at 12:55 AM. Reason : TOP part][Edited on March 29, 2006 at 12:56 AM. Reason : .][Edited on March 29, 2006 at 12:58 AM. Reason : .]
3/29/2006 12:53:07 AM
what info are you trying to get specifically?
3/29/2006 1:08:55 AM
I forgot to add a " && pass(i) != 0 " in the if().

^Kinda hard to explain. But here goes. It calculates the number of "spokes" (this was what I called 'count' in the above code) in an airport-pair market for a given airline and a given year and quarter. A spoke is like the number of connection points to a given airport. So if the year, quarter, and airline are the same, and the dest-origin of one market is connected to either dest-origin in another market, then that creates a spoke. But I need to reject the case where the # of passengers in a given market for a given quarter is 0.

Kinda helps to draw a graph of airports and lines connecting them.
[Edited on March 29, 2006 at 1:22 AM. Reason : .][Edited on March 29, 2006 at 1:22 AM. Reason : .]
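The spoke count described above could be sketched in C++ roughly as follows. This is only one reading of the description: the `Route` struct, its field names, and the function names are assumptions, and the zero-passenger test is applied to the inner route `i`, as in the `pass(i) != 0` condition from the post.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row layout; field names are illustrative only.
struct Route {
    std::string orig, dest, carr;
    int year, qtr, pass;
};

// True if the two routes have at least one airport in common.
bool sharesAirport(const Route& a, const Route& b) {
    return a.dest == b.dest || a.dest == b.orig ||
           a.orig == b.dest || a.orig == b.orig;
}

// For each route j, counts the other routes i with the same carrier, year and
// quarter that share an airport with j and carried at least one passenger.
std::vector<int> countSpokes(const std::vector<Route>& routes) {
    std::vector<int> count(routes.size(), 0);
    for (std::size_t j = 0; j < routes.size(); ++j) {
        const Route& a = routes[j];
        for (std::size_t i = 0; i < routes.size(); ++i) {
            if (i == j) continue;  // never compare a route with itself
            const Route& b = routes[i];
            if (a.year == b.year && a.qtr == b.qtr && a.carr == b.carr &&
                b.pass != 0 && sharesAirport(a, b)) {
                ++count[j];
            }
        }
    }
    return count;
}
```

The result vector lives on the heap, so the size of the data set does not affect the stack. The loop is still quadratic in the number of rows.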
3/29/2006 1:21:57 AM
alright then, create a hash of arrays. the hash key will be the concatenation of year, airline, and quarter. The array accessed by the key will be the list of source/destination terminals

$hash{AA|1996|2} ....> (RDU|ILM, JAX|BWI, LAX|JFK)

the routes with zero passengers can be automatically culled as you build your hash structure with a simple if statement

I don't fully understand what you're doing yet, though - what do you mean by market?
[Edited on March 29, 2006 at 1:29 AM. Reason : s]
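For comparison, the same "hash of arrays" idea could look like this in C++ (C++11 or later), using `std::unordered_map` as the hash and `std::vector` as the array. The `Flight` struct and the function names are hypothetical; the key format follows the `AA|1996|2` example from the post.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical row layout; field names are illustrative only.
struct Flight {
    std::string orig, dest, carr;
    int year, qtr, pass;
};

using Market = std::pair<std::string, std::string>;  // (origin, destination)
using MarketsByKey = std::unordered_map<std::string, std::vector<Market>>;

// Builds a key in the style of the post's example, e.g. "AA|1996|2".
std::string makeKey(const Flight& f) {
    return f.carr + "|" + std::to_string(f.year) + "|" + std::to_string(f.qtr);
}

// Groups markets by carrier/year/quarter, culling zero-passenger routes
// with a simple if statement while the structure is built.
MarketsByKey groupMarkets(const std::vector<Flight>& flights) {
    MarketsByKey groups;
    for (const Flight& f : flights) {
        if (f.pass == 0) continue;
        groups[makeKey(f)].emplace_back(f.orig, f.dest);
    }
    return groups;
}
```

With the rows grouped this way, any pairwise comparison only has to look at markets inside the same bucket instead of the whole file.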
3/29/2006 1:28:12 AM
study algorithms
3/29/2006 1:33:34 AM
oh OK. E.g. the flights from RDU <--> IAD form a market / "airport-pair".
3/29/2006 1:33:57 AM
I think I just figured this out in FORTRAN. It's actually pretty easy, just a simple formatted read statement. I wish that the C++ method would be as easy. There has to be a way to read up to a comma, then store everything before that in one var and keep going... Oh well.

Thanks for your help everyone. I'd still like to know if someone comes up with a simple method in C++.
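One way to "read up to a comma, store everything before it, and keep going" in standard C++ is `std::getline` with a custom delimiter. The sketch below assumes the six-field layout of the sample data and a C++11 compiler (for `std::stoi`); `Row`, `parseRow` and `readRows` are illustrative names, not part of any library.

```cpp
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical row layout; field names are illustrative only.
struct Row {
    std::string orig, dest, carr;
    int year, qtr, pass;
};

// Parses one line by reading up to each comma in turn.
// Returns false if a field is missing or a numeric field is not a number.
bool parseRow(const std::string& line, Row& row) {
    std::istringstream in(line);
    std::string year, qtr, pass;
    if (!(std::getline(in, row.orig, ',') &&
          std::getline(in, row.dest, ',') &&
          std::getline(in, row.carr, ',') &&
          std::getline(in, year, ',') &&
          std::getline(in, qtr, ',') &&
          std::getline(in, pass))) {
        return false;
    }
    try {
        row.year = std::stoi(year);
        row.qtr = std::stoi(qtr);
        row.pass = std::stoi(pass);
    } catch (const std::exception&) {
        return false;  // stoi throws on non-numeric or out-of-range text
    }
    return true;
}

// Reads every well-formed line from the stream; malformed lines are skipped.
std::vector<Row> readRows(std::istream& input) {
    std::vector<Row> rows;
    std::string line;
    Row row;
    while (std::getline(input, line)) {
        if (parseRow(line, row)) rows.push_back(row);
    }
    return rows;
}
```

`readRows` takes any `std::istream`, so it should work the same with a `std::ifstream` opened on the data file as with a `std::istringstream` holding a small test sample.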
3/29/2006 2:06:02 AM
fscanf
3/29/2006 9:02:28 AM
^ eww I know one of my coworkers didn't just suggest a potentially very unsafe method
good thing we're not in the security business anymore
3/29/2006 9:25:18 AM
that wouldn't be unsafe for me, right? does that create some kind of vulnerability like buffer overflow?
Actually fscanf looks nice for my application.
3/29/2006 10:09:51 AM
To read in the entire file, like JaegerNCSU suggested, do something like this (but for ascii)
http://www.cplusplus.com/doc/tutorial/files.html
// reading a complete binary file
#include <iostream>
#include <fstream>
using namespace std;

ifstream::pos_type size;
char * memblock;

int main () {
  ifstream file ("example.txt", ios::in|ios::binary|ios::ate);
  if (file.is_open())
  {
    size = file.tellg();
    memblock = new char [size];
    file.seekg (0, ios::beg);
    file.read (memblock, size);
    file.close();

    cout << "the complete file content is in memory";

    delete[] memblock;
  }
  else cout << "Unable to open file";
  return 0;
}
3/29/2006 8:30:16 PM
^got it! Thanks. In case anyone is interested
-------
It takes less than 1 min to load the entire data set. Then I performed the nested loop on a small sample of 600,000 observations (1/10 of the total) and wrote out the result in under 30 min. Basically it's able to test 2000 lines of data in 7 sec. That means it is looping 2000 times x 600,000 inner loops in under 30 min. (Compared to Matlab testing about 1 line of data per second on the same machine, though I didn't try mex'ing.) I did try to compile the matlab script to C using the matlab compiler and that only gave marginal improvement. So I should be able to run the entire file in around 4-5 hrs.

The Fortran code I tried using was going to take 12 hrs, but I *was* on a different computer. I think it was using a lot of the pagefile, not to mention the slower processor. School supplied me with VC++2005 and it's a good thing I have 2GB of system memory. I did have to set the compiler options to allow a commit stack of 300,000,000 bytes and a reserve size of 400,000,000, and declare everything as short if I could. Probably overkill, but I was tired of the damn stack overflow error. VC++ default is only 1MB. I think gcc is around 5MB.
[Edited on March 30, 2006 at 1:15 AM. Reason : .][Edited on March 30, 2006 at 1:18 AM. Reason : .][Edited on March 30, 2006 at 1:23 AM. Reason : .]
3/30/2006 1:15:18 AM
why would there be a stack error on non recursive code?
3/30/2006 1:22:25 AM
The arrays were too large.
3/30/2006 1:25:23 AM
umm, dynamic memory allocation? welcome to the 80s
3/30/2006 1:32:28 AM
3/30/2006 1:33:08 AM
^explain please. ^^haha. But how would dynamic memory allocation have helped?
nevermind... duh...
[Edited on March 30, 2006 at 1:37 AM. Reason : .][Edited on March 30, 2006 at 1:38 AM. Reason : .]
3/30/2006 1:36:13 AM
^the way you did it, the array is stored on the stack. if you use dynamic memory allocation, it's stored on the heap
http://c.ittoolbox.com/documents/popular-q-and-a/stack-vs-heap-2112
[Edited on March 30, 2006 at 1:37 AM. Reason : I guess i should be happy about job security][Edited on March 30, 2006 at 1:38 AM. Reason : then again, c++ sucks. 50% c++, 50% you]
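To make the stack/heap point concrete: a local array such as `short year[6000000];` declared inside a function is placed on the stack, which is what runs into the default stack limit. `std::vector` keeps its elements on the heap, so no linker stack settings should be needed. A small sketch, with column names that are only guesses at what the original program stored:

```cpp
#include <cstddef>
#include <vector>

// One heap-allocated column per variable. Only the small vector objects
// themselves (a few pointers each) live on the stack; the elements do not.
struct Columns {
    std::vector<short> year;
    std::vector<short> qtr;
    std::vector<int> pass;

    // Allocates room for n rows per column; elements start out as zero.
    explicit Columns(std::size_t n) : year(n), qtr(n), pass(n) {}
};
```

A raw `new short[n]` paired with `delete[]` would also put the data on the heap; the vector just releases the memory automatically when it goes out of scope.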
3/30/2006 1:37:28 AM
" I guess i should be happy about job security"I mean, I guess. If you like to compare yourself with someone who learned FORTRAN 5 years ago and hasn't programed since. Then said person buys a c++ book 2 weeks ago and spent a total of 5 hours or so reading it. This is just some shit I had to learn to do my research. I don't care if it's "perfect" If it gets the fucking job done and I get on with my life.[Edited on March 30, 2006 at 1:42 AM. Reason : .]but this is good to know
3/30/2006 1:41:09 AM
So in the example I gave you about how to read in the whole file at once.
char * memblock;
memblock = new char [size];
3/30/2006 1:49:27 AM
yeah, I admit I only used bits and pieces of the stuff you linked to. But I got it working. The thing is, I don't have a lot of time to spend on it (full time grad student and an RA, you get the idea). But thanks, I definitely need to look at what you're telling me.
3/30/2006 1:56:34 AM
no problem. If you change over to dynamic allocation, I would be willing to bet you would see performance improvements though. but it's up to you.
3/30/2006 2:05:25 AM
i would like for scud to explain why fscanf is particularly insecure
but he won't because he's on vacation
oh well...
can't you also fin >> var1 >> "," >> var2 >> "," >> var3 >> etc?
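On the `fin >> var1 >> "," >> var2` idea: extracting into the literal `","` is not a way to skip a delimiter in standard C++. A common workaround is to read each comma into a throwaway `char`. The sketch below only handles the numeric fields, because `>>` into a `std::string` reads up to the next whitespace and would swallow the commas along with the text; `readNumbers` is a made-up helper name.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads three comma-separated integers such as "1996,2,102".
// Each delimiter is extracted into a char and then checked.
bool readNumbers(std::istream& in, int& year, int& qtr, int& pass) {
    char c1 = 0, c2 = 0;
    if (!(in >> year >> c1 >> qtr >> c2 >> pass)) return false;
    return c1 == ',' && c2 == ',';
}
```

For the text fields at the start of each line, reading up to the comma with `std::getline(in, field, ',')` is the usual companion to this.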
3/30/2006 10:11:07 PM