I'm taking a database class at Johns Hopkins. We have to do a little database project at the end of the semester. I'm struggling with how to set my project up, and I'm thinking about changing my project to something a bit easier to model unless I can get some more direction on how to proceed. Here's my project proposal:
4/25/2006 8:25:19 PM
4/25/2006 8:40:12 PM
what is a self-journaling table?
4/25/2006 10:47:20 PM
Just a table you have set to be journaled, meaning that any changes you make to that table are recorded, so you can reference them later. You could do it with triggers if your DB doesn't support "self-journaling" natively.
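The trigger approach can be sketched in SQLite via Python's `sqlite3` module. This is a minimal illustration, not anyone's actual schema; the table and column names (`assets`, `assets_journal`) are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assets (id INTEGER PRIMARY KEY, name TEXT, status TEXT);

-- Every change to assets is copied into a journal table with a timestamp.
CREATE TABLE assets_journal (
    journal_id INTEGER PRIMARY KEY AUTOINCREMENT,
    asset_id   INTEGER,
    old_status TEXT,
    new_status TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER assets_update_journal
AFTER UPDATE ON assets
BEGIN
    INSERT INTO assets_journal (asset_id, old_status, new_status)
    VALUES (OLD.id, OLD.status, NEW.status);
END;
""")

conn.execute("INSERT INTO assets (name, status) VALUES ('web01', 'up')")
conn.execute("UPDATE assets SET status = 'down' WHERE name = 'web01'")

# The journal now holds the before/after values of the update.
history = conn.execute(
    "SELECT old_status, new_status FROM assets_journal").fetchall()
```

A real journaling setup would usually also capture INSERT and DELETE triggers plus the user who made the change, but the pattern is the same.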
4/26/2006 1:06:17 AM
Question about DB design myself. I'm basically making a poor man's CMDB. Yes, I should use an open-source one; yes, this is not a sensible project. Go ahead and forget reasonable things for a bit. Let's just assume that none of those things are options, and that I believe using a COTS or open-source solution would be vastly superior to trying to build one, in cost, time, money, energy, and everything else.

Right now I'm using 2 tables, CI and Attribute. I might add another table called Relationship, but right now relationships are stored in the CI table. Is this sensible? It needs to be easily extendable; i.e., tomorrow we might need to add a new configuration type, for example Mobile Device, that may not fit in the other classes. Class structure shown here
5/19/2014 4:52:03 PM
5/19/2014 6:05:33 PM
I would create a CI called Location, which would have a parent of the base/root element. I would then add attributes that describe a location, for example State, City, Zip. So the CI has a 1-to-many relationship with the Attribute table.

CI table:
ID: 1, Name: Location, Type: Class

Attribute table:
ID: 1, Name: State, Value: (blank), Owner: 1
ID: 2, Name: City, Value: (blank), Owner: 1
ID: 3, Name: ZIP, Value: (blank), Owner: 1

Template CIs, like the one above, are marked as a "Class" in the database. So for example, I define a Class in CI called Location and mark 3 attributes as belonging to that class. Now that the Location "Class" is defined in the database, you as a user can create a new "Location"; i.e., you have a drop-down that populates all the CIs that are marked as Class (e.g. Server, Location, Agency), you choose "Location", then create new. It then adds a new record to the CI table and 3 new attributes that relate to it:

CI table:
ID: 2, Name: Raleigh, Type: Location

Attribute table:
ID: 5, Name: State, Value: NC, Owner: 2
ID: 6, Name: City, Value: Raleigh, Owner: 2
ID: 7, Name: ZIP, Value: 27606, Owner: 2

I need to be able to create these "Classes" on the fly, without creating 20 different tables, and within a given class, expand out the attributes that I track. For example, tomorrow I find out that we need a new attribute for locations, let's say "Is Large". I go into the database and add the following to the attribute table:

ID: 112345, Name: Large, Value: (blank), Owner: 1

Now Location has 4 attributes: State, City, Zip, and Large. Raleigh doesn't currently have a "Large" attribute, but once I update the "Location" class, I've got a process that goes through all the records of type "Location" and copies the newly added attribute to them. Values are usually blank, but if I wanted a default value of "No", for example, then I would set that attribute value at the parent.
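The CI/Attribute design above is a classic entity-attribute-value (EAV) layout. As a rough, self-contained sketch in SQLite (column names are guesses from the post, and `instantiate` is a hypothetical helper, not the poster's actual code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ci (
    ci_id INTEGER PRIMARY KEY,
    name  TEXT,
    type  TEXT          -- 'Class' for templates, otherwise the class name
);
CREATE TABLE attribute (
    attr_id INTEGER PRIMARY KEY,
    name    TEXT,
    value   TEXT,
    owner   INTEGER REFERENCES ci(ci_id)
);
""")

# Define the Location "Class" template and its three attributes.
conn.execute("INSERT INTO ci VALUES (1, 'Location', 'Class')")
conn.executemany("INSERT INTO attribute VALUES (?, ?, NULL, 1)",
                 [(1, 'State'), (2, 'City'), (3, 'ZIP')])

def instantiate(conn, class_name, instance_name):
    """Create a new CI of the given class and copy the template's attributes
    (including any default values set on the parent)."""
    (class_id,) = conn.execute(
        "SELECT ci_id FROM ci WHERE name = ? AND type = 'Class'",
        (class_name,)).fetchone()
    cur = conn.execute("INSERT INTO ci (name, type) VALUES (?, ?)",
                       (instance_name, class_name))
    new_id = cur.lastrowid
    conn.execute("""
        INSERT INTO attribute (name, value, owner)
        SELECT name, value, ? FROM attribute WHERE owner = ?""",
        (new_id, class_id))
    return new_id

raleigh = instantiate(conn, 'Location', 'Raleigh')
conn.execute("UPDATE attribute SET value = 'NC' "
             "WHERE owner = ? AND name = 'State'", (raleigh,))
```

Adding "Is Large" to the class is then just one more attribute row owned by CI 1, plus a pass that re-copies missing attribute names to existing instances.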
5/19/2014 8:11:13 PM
^Why not use MongoDB for this? Seems like you're explicitly going to have a dynamic schema, which is going to become a fucking nightmare to manage as soon as you turn this on if you're using SQL Server.
5/19/2014 11:59:46 PM
yeah, any reason you need a relational datastore? I was going to suggest riak, dynamo, cassandra, or mongo. hell a dynamo index to S3 xml objects would work really well, scale, and port very well to EMR jobs
5/20/2014 12:56:10 AM
I wish I knew more about those as options, but for now I'll be happy if I can get SQL. It is currently built and run in MS Access. If you have any recommended reading, I'll be happy to see what I can learn.

I don't think the introduction of a new database solution into our environment would really be supported at any level. I've also got it in my mind to integrate with SharePoint and Visio.

I will say that it doesn't have to have real-time updates. Right now I'm feeding in logs from a ton of systems, which get dropped into an import table. I run an import task that handles some normalization and restricts the growth of the actual production table to something reasonable. I also have a task that writes out a conventional table for each defined class for reporting purposes.
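That per-class reporting table amounts to pivoting the generic attribute rows into one wide table per class. A minimal, self-contained sketch in SQLite, assuming a hypothetical two-table CI/Attribute layout like the one described earlier in the thread (all names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ci (ci_id INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE attribute (name TEXT, value TEXT, owner INTEGER);
INSERT INTO ci VALUES (2, 'Raleigh', 'Location'), (3, 'Durham', 'Location');
INSERT INTO attribute VALUES
    ('State', 'NC', 2), ('City', 'Raleigh', 2), ('ZIP', '27606', 2),
    ('State', 'NC', 3), ('City', 'Durham', 3), ('ZIP', '27701', 3);
""")

def write_report_table(conn, class_name):
    """Flatten all CIs of one class into a wide table named report_<class>."""
    # Collect the distinct attribute names used by this class.
    cols = [r[0] for r in conn.execute("""
        SELECT DISTINCT a.name FROM attribute a
        JOIN ci c ON c.ci_id = a.owner
        WHERE c.type = ? ORDER BY a.name""", (class_name,))]
    table = f"report_{class_name.lower()}"
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    # One MAX(CASE ...) column per attribute name -- a manual pivot.
    pivots = ", ".join(
        f"MAX(CASE WHEN a.name = '{c}' THEN a.value END) AS \"{c}\""
        for c in cols)
    conn.execute(f"""
        CREATE TABLE {table} AS
        SELECT c.name AS ci_name, {pivots}
        FROM ci c JOIN attribute a ON a.owner = c.ci_id
        WHERE c.type = '{class_name}' GROUP BY c.ci_id""")
    return table

table = write_report_table(conn, 'Location')
rows = conn.execute(
    f"SELECT ci_name, City, State, ZIP FROM {table} "
    "ORDER BY ci_name").fetchall()
```

Rebuilding these tables on a schedule, as the post describes, sidesteps the cost of pivoting at query time.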
5/20/2014 3:04:57 AM
http://www.amazon.com/Seven-Databases-Weeks-Modern-Movement/dp/1934356921

If it were me... I'd push the logs to S3, have Data Pipeline move them to MySQL, and roll from there. If you need a new schema, change the Data Pipeline config and write a one-time job to ingest from the old schema.
5/20/2014 11:08:38 AM
It's weird moving to a MPP architecture like Greenplum and not using any type of star schema any more. Everything is just stored in distributed tables. Now we're moving to Hadoop and all that's going out the door.
5/20/2014 11:28:53 AM
Those all might be great solutions, but none that I could conceivably implement, either technically or by policy.

Why is it going to be a nightmare to manage in SQL?
5/20/2014 4:33:30 PM
Also, is his DB setup any better than just using normal relational DB tables, e.g. computers, places, projects, etc.? (This must be in MSSQL, if you all haven't picked up on that yet.)
5/20/2014 4:45:45 PM
If this is for log search, why not just download Splunk? http://www.splunk.com/download

Splunk handles the entire data model under the covers, is completely dynamic, scales to infinity, and it's FREE for commercial use. It will take you all of 30 minutes to have a completely working system. And you can easily plug it into Visio/SharePoint/whatever through its data endpoints.

Dynamic schemas in SQL take exponentially more time to normalize as you add rows. Every row added with a new field requires a complete schema update of EVERY record, PLUS reindexing if you're using indexes. With even 100k rows, the updates will start taking minutes to complete, and they will need to be transactional to ensure the DB doesn't get corrupted, which means you will start building a massive transaction commit queue.

[Edited on May 20, 2014 at 5:11 PM. Reason : .]
5/20/2014 5:10:17 PM
This is why I love that all our tables are append-only.
5/20/2014 5:45:30 PM
just use an Excel spreadsheet lol
5/20/2014 9:12:32 PM