Is it 'Big Data'? Need new strategies/structures: I have 10X as many records now, exponential possibilities!

Who is online?  0 guests and 0 members
Home  »  Forums   »  bidn   »  general bi discussion   » Is it 'Big Data'? Need new strategies/structures: I have 10X as many records now, exponential possibilities!

Is it 'Big Data'? Need new strategies/structures: I have 10X as many records now, exponential possibilities!

Topic RSS Feed

Posts under the topic: Is it 'Big Data'? Need new strategies/structures: I have 10X as many records now, exponential possibilities!

Posted: 6/4/2012

Jedi Youngling 14  points  Jedi Youngling
  • Joined on: 2/2/2012
  • Posts: 7

Greetings,

I have been doing "reporting" for a long time, but always had the benefit of Report Models built by others and ETL handled by others. I have a new job and am now 1/2 data architecht, 1/2 report writer.

My problem is that I've gone from working with thousands of records to millions of records and now my queries don't finish, etc.

I'm sure I have my work cut out for me: Learn to build efficient queries, tweak execution plans, etc., but do I need to REBUILD my relational database into something else (Cube, Pre-processed whatever)?

This is new territory for me and I want to work toward a solution, but stumble as little as possible since I have less time than I'd like.

I'd like to give an example:

I used to count discharges from the hospital...easy. I then grouped by a category (DRG) or grouped by hospital, still no problem.

What I have now is up to 25 diagnoses and procedures for every discharge going back years and I need to run reports on this data (regularly). So now, instead of querying just discharges, I've joined to 'AdditionalDx' and am looking for similar cases (data mining?). It's just too many records for my old structure--it's way too slow or doesn't return results at all. Ideas? Thanks!


Posted: 6/15/2012

Jedi Knight 2242  points  Jedi Knight
  • Joined on: 2/22/2010
  • Posts: 201

You've got a LOT of things in play here. Also the term "big data" gets thrown around a lot. Yes, millions of rows is alot but when you hear folks talking about "big data" nowadays, they're usually referring to very large, unstructured data. Basically NoSQL stuff but the term can be flexible I suppose. Anyways, back to the topic at hand.

Your queries aren't completing now, are you doing SELECT * FROM large dimensions joining on large facts? If so, yes, you've got some query tuning to do. How much data (size) are you talking about? Millions of rows can still be under a gig in size. How much memory is in your system? When you run your queries, what are the (highest) wait stats? The wait stats will tell you where you need to direct your performance tuning attentions. What are the disk setups for everything? Lots of things you can do to tune IO. As you can see there are a TON of things you could look at here.


Page 1 of 1 (2 items)