Posted: 1/16/2012
Hi,
We have a need to desgin a database where we will be getting 1million rows
everyday in table consisting of nearly 300 columns.Can anyone plz guide us on
how to manage such large tables and how to go about the query executions since
queries will take a long time to execute on such a heavy table.
Thanks
Hi there.
About Source:
What is your source database? Can you split that extraction amount into several times during the day, let's say, in each 3 hours?
If your source database is oracle, have you consider archiving information into "litle tables" and then load that tables into your database, so that million rows would be much less in each 3 hours?
About Destination:
Definitely your table should be partitioned. Have you any reference column (extraction date, modified date, etc) so you can partition in that column?
Another problem, your destination table will be queried a lot? The purpose of this question is to understand if you will need indexes to improve query performance. If so, in the other hand you will need to load a huge amount of data, wich in turn indexes will decrease load performance.
Some thoughts to keep in mind.
Posted: 1/17/2012
Our source database is SQL. Here we will be having a window period of about 5 hrs
approx in which we need to take the data from specified files. This is the first time
we will be handling this huge amount of data, so we first need to test what will be
the load on the database.
As you said we have a date column on which we may partition the tables right now
but later it may change (may be to "Regions")according to the requirement.
We will be doing table partitioning also for the first time so any help would be
very much appreciated.
And will rebuilding the indexes everyday degrade the performance of the system for
such a huge table taking into consideration that we get 1000000 rows everyday.
"maybe to Regions", this does not sound me good. Partitioning, has several advantages, such as inserting (switch in and switch out) and performance query. If you implement partitioned by Regions, I think you wont get the benefit of insert.
Rebuilding your indexes everyday isn't also a good idea. If you have such a huge amount of data in that table, rebuilding your indexes will take hours. You should consider rebuild index for a certain range of partitions.
Take these ideas I give you and check what best fits your requirement:
- Partitioned tables
- Partitioned indexes
Since your source database is SQL Server, check if CDC (Change data capture) will improve your solution.
Posted: 1/24/2012
Thanks for the links. We are able to implement partitions on year basis.
We will be getting 1million rows per day which means 30million per month and
365 million per yr and so on. So we are seeking out ways to further partition the
table either on monthly basis or monthly and then again partition based on certain
other Ids.So how can we do this in SQL server.
As far as data is concerned we are generating the files and then planning to bulk
insert them into the SQL DB.In real time we will be getting a million rows per day but
right now we ll be inserting the whole year's data at once(one year data split into
multiple files). Can you suggest the ways of achieving this faster.
Or do u have any other work around for this whole process of insertering and retrieving
the data faster?
When I first read your post, I thought, "Wow - that's huge and I've got nothing for ya - good luck!" -- after all, Marcoadf's suggestions were all good.
But then I thought about some of the larger projects I've been a part of - maybe I do have some general ideas that I can share. Most of this you may well have already thought of or implemented. However, just in case there is something useful here:
One thing to consider with such a large dataset is "federated" SQL Servers. Here's a link to more info on that: http://msdn.microsoft.com/en-us/library/ms190429.aspx
Otherwise, here is what I'm picturing and my notes on how I would probably start tackling a data load like the one you have to do.
..that's how I'd look at it - or at least start.
Hope it helps or maybe sparks and idea.
Keith Hyer
Posted: 1/27/2012
Hi again,
I am not sure if we can do wat u hav suggested all this rite now. We are only trying to
simulate the original situation.So we want to bulk insert one year data(365 million) into the
table and check the performance of the queries by implementing partitioning on the tables,
adding indexes correctly,etc.
And because we have such huge number of rows we want to partition based on more
than one column so that we get better query performance. This is wat we are expecting finally.
But do not know how to go about it..