Handle millions of rows evryday

Who is online?  0 guests and 0 members
Home  »  Forums   »  microsoft business intelligence   »  sql server   » Handle millions of rows evryday

Handle millions of rows evryday

Topic RSS Feed

Posts under the topic: Handle millions of rows evryday

Posted: 1/16/2012

Jedi Youngling 59  points  Jedi Youngling
  • Joined on: 3/23/2011
  • Posts: 26

Hi,

We have a need to desgin a database where we will be getting 1million rows

everyday in table consisting of nearly 300 columns.Can anyone plz guide us on 

how to manage such large tables and how to go about the query executions since 

queries will take a long time to execute on such a heavy table.

 

Thanks


Posted: 1/16/2012

Padawan 1392  points  Padawan
  • Joined on: 1/3/2010
  • Posts: 250

Hi there.

About Source:

What is your source database? Can you split that extraction amount into several times during the day, let's say, in each 3 hours?

If your source database is oracle, have you consider archiving information into "litle tables" and then load that tables into your database, so that million rows would be much less in each 3 hours?

About Destination:

Definitely your table should be partitioned. Have you any reference column (extraction date, modified date, etc) so you can partition in that column?

Another problem, your destination table will be queried a lot? The purpose of this question is to understand if you will need indexes to improve query performance. If so, in the other hand you will need to load a huge amount of data, wich in turn indexes will decrease load performance.

Some thoughts to keep in mind.

 


Posted: 1/17/2012

Jedi Youngling 59  points  Jedi Youngling
  • Joined on: 3/23/2011
  • Posts: 26

Hi,

 Our source database is SQL. Here we will be having a window period of about 5 hrs 

approx in which we need to take the data from specified files. This is the first time 

we will be handling this huge amount of data, so we first need to test what will be

the load on the database.

  As you said we have a date column on which we may partition the tables right now

but later it may change (may be to "Regions")according to the requirement.

We will be doing table partitioning also for the first time so any help would be

very much appreciated.

 And will rebuilding the indexes everyday degrade the performance of the system for

such a  huge table taking into consideration that we get 1000000 rows everyday.


Posted: 1/17/2012

Padawan 1392  points  Padawan
  • Joined on: 1/3/2010
  • Posts: 250

"maybe to Regions", this does not sound me good. Partitioning, has several advantages, such as inserting (switch in and switch out) and performance query. If you implement partitioned by Regions, I think you wont get the benefit of insert.

Rebuilding your indexes everyday isn't also a good idea. If you have such a huge amount of data in that table, rebuilding your indexes will take hours. You should consider rebuild index for a certain range of partitions.

Take these ideas I give you and check what best fits your requirement:

- Partitioned tables

- Partitioned indexes

Since your source database is SQL Server, check if CDC (Change data capture) will improve your solution.

 


Posted: 1/24/2012

Jedi Youngling 59  points  Jedi Youngling
  • Joined on: 3/23/2011
  • Posts: 26

Thanks for the links. We are able to implement partitions on year basis.

We will be getting 1million rows per day which means 30million per month and

365 million per yr and so on. So we are seeking out ways to further partition the

table either on monthly basis or monthly and then again partition based on certain

other Ids.So how can we do this in SQL server. 

As far as data is concerned we are generating the files and then planning to bulk

insert them into the SQL DB.In real time we will be getting a million rows per day but

right now we ll  be inserting the whole year's data at once(one year data split into

multiple files). Can you suggest the ways of achieving this faster.

  Or do u have any other work around for this whole process of insertering and retrieving

the data faster? 

 

Thanks

 


Posted: 1/24/2012

Padawan 1437  points  Padawan
  • Joined on: 3/24/2010
  • Posts: 196

When I first read your post, I thought, "Wow - that's huge and I've got nothing for ya - good luck!" -- after all,  Marcoadf's suggestions were all good.

But then I thought about some of the larger projects I've been a part of - maybe I do have some general ideas that I can share.  Most of this you may well have already thought of or implemented.  However, just in case there is something useful here:

One thing to consider with such a large dataset is "federated" SQL Servers.   Here's a link to more info on that:  http://msdn.microsoft.com/en-us/library/ms190429.aspx

Otherwise, here is what I'm picturing and my notes on how I would probably start tackling a data load like the one you have to do. Millions of rows

..that's how I'd look at it - or at least start.

Hope it helps or maybe sparks and idea.

Keith Hyer

 


Posted: 1/27/2012

Jedi Youngling 59  points  Jedi Youngling
  • Joined on: 3/23/2011
  • Posts: 26

Hi again,

           I am not sure if we can do wat u hav suggested all this rite now. We are only trying to

simulate the original situation.So we want to bulk insert one year data(365 million) into the

table and check the performance of the queries by implementing partitioning on the tables,

adding indexes correctly,etc.

        And because we have such huge number of rows we want to partition based on more

than one column so that we get better query performance. This is wat we are expecting finally.

But do not know how to go about it..


Page 1 of 1 (7 items)