I'm writing an application that will perform some action on rows in a DB table. I want to potentially run multiple instances of this application; what is the best way to keep the rows segregated so no two instances act on the same row?

The best I can think of is to have an 'in use' field in the table and toggle it when my application wants to manipulate a row. That seems bleh; I'm just wondering if there are better ideas out there. If it matters, it's .NET and T-SQL.
11/7/2012 1:44:41 PM
Read up on pessimistic concurrency. You can either let SQL Server handle it, or you can implement it like you described, though rather than a bool, a lot of times it's done with a timestamp. I assume you aren't trying to prevent reads during a record's use, and that the actions you need to lock it for are short-lived updates?
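(For illustration, a minimal sketch of the timestamp-claim variant, assuming a hypothetical WorkItems table with a nullable ClaimedAt column; the single atomic UPDATE is what keeps two instances from claiming the same row, and the timestamp, unlike a bool, lets a monitor later reclaim rows whose claim has gone stale:)

    using System;
    using System.Data.SqlClient;

    class TimestampClaim
    {
        // Claims one unclaimed row and returns its Id, or null if none are left.
        static int? ClaimOne(string connStr)
        {
            const string sql =
                "UPDATE TOP (1) WorkItems " +
                "SET ClaimedAt = SYSDATETIME() " +
                "OUTPUT inserted.Id " +
                "WHERE ClaimedAt IS NULL;";
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(sql, conn))
            {
                conn.Open();
                object id = cmd.ExecuteScalar();   // null when no rows qualify
                return id == null ? (int?)null : (int)id;
            }
        }
    }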
11/7/2012 1:51:41 PM
The best way to handle this is through transactions: http://www.sqlteam.com/article/introduction-to-transactions

They handle most of the semaphore-locking scenarios you can come up with, and have the side benefit of making your database history traceable (for diagnostics, logging, and security compliance).
11/7/2012 4:01:39 PM
^ This. If you're manipulating rows in a single table, that will do everything you want and more.
11/7/2012 6:34:12 PM
If you know the row you want ahead of time and you're selecting based on a key, then it should just give you a row lock and you'll be good.

If you don't know the row you want and you're selecting based on some criteria (e.g. selecting the oldest untouched item in a queue), then a READPAST lock hint would work. Either way, wrap all your steps in a transaction (sketch below).
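(A sketch of that second case in C# with embedded T-SQL; all names here, QueueItems, Payload, ProcessedAt, CreatedAt, are hypothetical:)

    using System;
    using System.Data.SqlClient;

    class QueueWorker
    {
        // Claims the oldest unprocessed row, works on it, and marks it done.
        static bool ProcessNext(string connStr)
        {
            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();
                using (var tran = conn.BeginTransaction())
                {
                    // UPDLOCK holds the row until COMMIT; READPAST makes other
                    // instances skip locked rows instead of blocking on them.
                    const string pick =
                        "SELECT TOP (1) Id, Payload " +
                        "FROM QueueItems WITH (UPDLOCK, READPAST, ROWLOCK) " +
                        "WHERE ProcessedAt IS NULL " +
                        "ORDER BY CreatedAt;";
                    int id = -1;
                    string payload = null;
                    using (var cmd = new SqlCommand(pick, conn, tran))
                    using (var rdr = cmd.ExecuteReader())
                    {
                        if (rdr.Read())
                        {
                            id = rdr.GetInt32(0);
                            payload = rdr.GetString(1);
                        }
                    }   // reader closed here, before we touch the transaction
                    if (payload == null) { tran.Commit(); return false; } // queue empty

                    DoWork(payload); // the in-process C# step, row lock still held

                    const string done =
                        "UPDATE QueueItems SET ProcessedAt = SYSDATETIME() " +
                        "WHERE Id = @id;";
                    using (var cmd = new SqlCommand(done, conn, tran))
                    {
                        cmd.Parameters.AddWithValue("@id", id);
                        cmd.ExecuteNonQuery();
                    }
                    tran.Commit(); // lock released here
                    return true;
                }
            }
        }

        static void DoWork(string payload) { /* whatever the app does */ }
    }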
11/7/2012 7:14:29 PM
^ I'm doing the second. I'm not clear on how transactions would work with this: I want to grab and lock rows, then perform some action with the data from the rows in C#, then update the rows that were locked. It doesn't seem like a single transaction would allow the middle step (the data manipulation in C#).

I ended up creating a session table with a list of SessionIDs, a bool 'InUse', and a timestamp. Then I have a SessionID field in the table I'm actually manipulating. Every new instance of my application gets assigned a SessionID that isn't in use, and when the instance grabs data from the table it populates the SessionID field in that table; it obviously won't grab a row if some other session has it in use. All the selects/updates are done in transactions, so there shouldn't be concurrency issues when the rows are selected and SessionIDs updated.

Does that seem like a reasonable approach?
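(For what it's worth, the claim step in that scheme can be a single atomic UPDATE, which avoids a select-then-update race entirely; a sketch, with WorkTable and the batch size being made-up names/values:)

    using System;
    using System.Data.SqlClient;

    class SessionClaim
    {
        // Stamps up to 100 unclaimed rows with this instance's session ID.
        static int ClaimBatch(string connStr, int sessionId)
        {
            const string sql =
                "UPDATE TOP (100) WorkTable " +
                "SET SessionID = @sessionId " +
                "WHERE SessionID IS NULL;";
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@sessionId", sessionId);
                conn.Open();
                return cmd.ExecuteNonQuery();   // number of rows actually claimed
            }
        }
    }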
11/7/2012 11:50:29 PM
How big is this application scaling to?
11/8/2012 8:19:13 AM
Potentially up to 25 instances of it would be running.
11/8/2012 8:42:15 AM
I mean reads/writes per second, and the ratio of reads to writes. It might make sense to break some of this out into something like a NoSQL DB for session state and other data that's write-once, read-many. Also, you might want to have a few read mirrors that can be promoted to master DBs should the DB go down. I'm just not sure of your performance and fault-tolerance/availability needs; at a small enough scale/risk level, a lot of what I'm saying is over-engineering.
11/8/2012 8:58:41 AM
If you have a set of static steps that are very quick, then wrapping the entire thing in a transaction is the way to go. IDK how C# does it, but in Java you just tell it "hey, wrap all this in a transaction" and whatever DB tool you're using will put all the DB steps in one long transaction. Ex:

    @Transactional
    void whatever() {
        Person p = db.getNextPerson();
        doSomethingTo(p);
        db.updatePerson(p);
    }

The resulting SQL will take a lock on the person, wait until the code is finished, do the update, and then commit the transaction (if all is successful). This is good if it's going to be quick.

If you're doing something more complex and need to keep the row "locked" for a while, you can use a status column to keep other processes out while you work on it. This can be problematic, because you'd need something to monitor the queue for rows that have been marked in-use for a long time (longer than normal), in case that long-running process fails.
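(Since the question is how C# does it: the rough .NET analogue of @Transactional is TransactionScope from System.Transactions; a minimal sketch, with Person and the data-access calls stubbed as stand-ins for whatever layer you actually have:)

    using System;
    using System.Data.SqlClient;
    using System.Transactions;

    class TxExample
    {
        class Person { public int Id; public string Name; }

        void Whatever(string connStr)
        {
            using (var scope = new TransactionScope())
            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();   // enlists in the ambient transaction automatically
                Person p = GetNextPerson(conn);
                DoSomethingTo(p);
                UpdatePerson(conn, p);
                scope.Complete();   // commit; anything thrown before this rolls back
            }
        }

        Person GetNextPerson(SqlConnection conn) { /* SELECT ... */ return new Person(); }
        void DoSomethingTo(Person p) { }
        void UpdatePerson(SqlConnection conn, Person p) { /* UPDATE ... */ }
    }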
11/8/2012 2:47:06 PM
Really, the key in all of this is how long you need to lock a row. If you just need to read some random row and perform an operation on it without user intervention, then the transaction isolation levels in SQL Server will do this perfectly for you. You create your SQL connection, start a transaction on it, read from the table, do your C# manipulations, save the row back to the database on the original SQL connection, and then commit the transaction.

When your code reads from the table, because it is in a transaction with the proper isolation level, SQL Server will automagically lock the row that you read. Then, when you commit the transaction, the rows are automagically released. What exactly SQL Server does will depend on the isolation level (hint: google "SQL Server transaction isolation levels"). Different levels do different things, including blocking other transactions from reading data that you have read, or from reading data that you've modified but not committed.

Using these levels is much easier than trying to do it yourself via a second table, because then YOU have to prevent the race conditions. All the second table does is move the race conditions somewhere else, especially if you aren't using transactions.

TL;DR: use SQL Server's transactions. Bone up on SQL Server's transaction isolation levels. Don't reinvent the wheel.
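(Concretely, in ADO.NET the isolation level is an argument to BeginTransaction; a sketch of the read, C# work, write, commit flow described above, against a hypothetical WorkItems table:)

    using System;
    using System.Data;
    using System.Data.SqlClient;

    class IsolationExample
    {
        static void Process(string connStr, int id)
        {
            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();
                // RepeatableRead holds shared locks on rows this transaction
                // reads until Commit, so another instance can't change them
                // underneath you; Serializable additionally locks the range,
                // while ReadCommitted releases read locks per statement.
                using (var tran = conn.BeginTransaction(IsolationLevel.RepeatableRead))
                {
                    string payload;
                    using (var cmd = new SqlCommand(
                        "SELECT Payload FROM WorkItems WHERE Id = @id;", conn, tran))
                    {
                        cmd.Parameters.AddWithValue("@id", id);
                        payload = (string)cmd.ExecuteScalar();
                    }

                    payload = payload.ToUpperInvariant();   // the C# step

                    using (var cmd = new SqlCommand(
                        "UPDATE WorkItems SET Payload = @p WHERE Id = @id;", conn, tran))
                    {
                        cmd.Parameters.AddWithValue("@p", payload);
                        cmd.Parameters.AddWithValue("@id", id);
                        cmd.ExecuteNonQuery();
                    }
                    tran.Commit();   // locks released here
                }
            }
        }
    }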
11/8/2012 10:04:34 PM
^Yes, this
11/9/2012 1:01:57 AM
Use transactions. It doesn't matter if it's SQL Server or another database, or .NET or another platform. I work with a J2EE-based product that can handle thousands of concurrent users hitting a site at once, all connected to a single DB (DB2, SQL Server, Oracle, etc.).

The other consideration you'll want to look into for your application is how long you want a query to run before you cut it off (i.e. a transaction timeout). You don't want a query to hang your system if it never finishes; better to cut it off and throw an error. Similarly, you don't want to cut off the query too early. I hit a weird case recently where I had a DB with over a million records and no data was being returned when I queried for it. A low transaction timeout was the culprit, because the query WAS getting cut off too early.
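(In ADO.NET terms: SqlCommand.CommandTimeout caps a single statement in seconds, default 30, and TransactionScope takes an overall timeout; a short sketch, with the table name and the specific limits being arbitrary examples:)

    using System;
    using System.Data.SqlClient;
    using System.Transactions;

    class TimeoutExample
    {
        static void Run(string connStr)
        {
            // Give the whole unit of work at most two minutes before it aborts.
            using (var scope = new TransactionScope(
                TransactionScopeOption.Required, TimeSpan.FromMinutes(2)))
            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();
                var cmd = new SqlCommand("SELECT COUNT(*) FROM WorkItems;", conn);
                cmd.CommandTimeout = 60;   // per-statement cap, in seconds
                cmd.ExecuteScalar();
                scope.Complete();
            }
        }
    }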
11/10/2012 8:37:56 AM