I have been doing "reporting" for a long time, but always had the benefit of Report Models built by others and ETL handled by others. I have a new job and am now 1/2 data architecht, 1/2 report writer.
My problem is that I've gone from working with thousands of records to millions of records and now my queries don't finish, etc.
I'm sure I have my work cut out for me: Learn to build efficient queries, tweak execution plans, etc., but do I need to REBUILD my relational database into something else (Cube, Pre-processed whatever)?
This is new territory for me and I want to work toward a solution, but stumble as little as possible since I have less time than I'd like.
I'd like to give an example:
I used to count discharges from the hospital...easy. I then grouped by a category (DRG) or grouped by hospital, still no problem.
What I have now is up to 25 diagnoses and procedures for every discharge going back years and I need to run reports on this data (regularly). So now, instead of querying just discharges, I've joined to 'AdditionalDx' and am looking for similar cases (data mining?). It's just too many records for my old structure--it's way too slow or doesn't return results at all. Ideas? Thanks!
You've got a LOT of things in play here. Also the term "big data" gets thrown around a lot. Yes, millions of rows is alot but when you hear folks talking about "big data" nowadays, they're usually referring to very large, unstructured data. Basically NoSQL stuff but the term can be flexible I suppose. Anyways, back to the topic at hand.
Your queries aren't completing now, are you doing SELECT * FROM large dimensions joining on large facts? If so, yes, you've got some query tuning to do. How much data (size) are you talking about? Millions of rows can still be under a gig in size. How much memory is in your system? When you run your queries, what are the (highest) wait stats? The wait stats will tell you where you need to direct your performance tuning attentions. What are the disk setups for everything? Lots of things you can do to tune IO. As you can see there are a TON of things you could look at here.