Showing posts with label mining. Show all posts
Showing posts with label mining. Show all posts

Friday, March 30, 2012

OLAP and/or data mining?

Hello,

If I wrote the next ebay (yes I know, yawn-snore) and I had a database
with 5 million auction items in it, what would be a really good
strategy to get a search done very quickly? Would it involve
something called OLAP and/or "data mining"? The only technology I am
familiar with is simply SQL Server databases with stored procedures.
I think I'd be guessing correctly and say that this technology simply
wouldn't be fast enough *on it's own* to do super fast queries against
massive amounts of data.

Any insights would be of great interest. Thanks.

-Frameworker."Framework fan" <tempframeworkfan@.hotmail.com> wrote in message
news:f109ac80.0312221359.2269c2f0@.posting.google.c om...
> Hello,
> If I wrote the next ebay (yes I know, yawn-snore) and I had a database
> with 5 million auction items in it, what would be a really good
> strategy to get a search done very quickly? Would it involve
> something called OLAP and/or "data mining"? The only technology I am
> familiar with is simply SQL Server databases with stored procedures.
> I think I'd be guessing correctly and say that this technology simply
> wouldn't be fast enough *on it's own* to do super fast queries against
> massive amounts of data.

OLAP is exactly what you'd want.

Data mining may be useful post-auction to see who bought what, what things
need more promotion etc.

I don't know why you don't think SQL Server wouldn't be fast enough for a
small setup like 5 million auctions.

I have a database with 14 million rows that's probably as complex as an
auction database and most queries are a couple of seconds or less. (some
are longer because of some fairly complex floating point math that has to be
done along the way.)

I have another database that handles at least that many inserts per day w/o
a major problem. And this is all on 3+ y.o. equipment.

Give me a quad Xeon MP box and the right disk subsystem and this thing could
scream.

> Any insights would be of great interest. Thanks.
> -Frameworker.|||"Greg D. Moore (Strider)" <mooregr@.greenms.com> wrote in message
news:fCNFb.56190$Ug6.2898@.twister.nyroc.rr.com...
> "Framework fan" <tempframeworkfan@.hotmail.com> wrote in message
> news:f109ac80.0312221359.2269c2f0@.posting.google.c om...
> > Hello,
> > If I wrote the next ebay (yes I know, yawn-snore) and I had a database
> > with 5 million auction items in it, what would be a really good
> > strategy to get a search done very quickly? Would it involve
> > something called OLAP and/or "data mining"? The only technology I am
> > familiar with is simply SQL Server databases with stored procedures.

Using stored procedures as component building blocks
you can create all forms of data mining and analytical
processing applications.

> > I think I'd be guessing correctly and say that this technology simply
> > wouldn't be fast enough *on it's own* to do super fast queries against
> > massive amounts of data.
> OLAP is exactly what you'd want.
> Data mining may be useful post-auction to see who bought what, what things
> need more promotion etc.
> I don't know why you don't think SQL Server wouldn't be fast enough for a
> small setup like 5 million auctions.

Seconded.

If you write your procs in the right manner, and maintain
the data integrity in your database then you should have
null performance problems.

> I have a database with 14 million rows that's probably as complex as an
> auction database and most queries are a couple of seconds or less. (some
> are longer because of some fairly complex floating point math that has to
be
> done along the way.)
> I have another database that handles at least that many inserts per day
w/o
> a major problem. And this is all on 3+ y.o. equipment.
> Give me a quad Xeon MP box and the right disk subsystem and this thing
could
> scream.
>
> > Any insights would be of great interest. Thanks.
> > -Frameworker.|||Hi Greg,

I have one follow up question please:

I'd like to incorporate OLAP in to my design. Do I go ahead and
create a sensibly normalised relational database design, with many
dozens of stored procedures to query the tables in "the usual manner",
*then* add on OLAP as a kind of "bolt on", or does OLAP technology
require a totally different database/stored procedures design strategy
right from the word go?

Thanks!

- Frameworker.

"Greg D. Moore \(Strider\)" <mooregr@.greenms.com> wrote in message news:<fCNFb.56190$Ug6.2898@.twister.nyroc.rr.com>...
> "Framework fan" <tempframeworkfan@.hotmail.com> wrote in message
> news:f109ac80.0312221359.2269c2f0@.posting.google.c om...
> > Hello,
> > If I wrote the next ebay (yes I know, yawn-snore) and I had a database
> > with 5 million auction items in it, what would be a really good
> > strategy to get a search done very quickly? Would it involve
> > something called OLAP and/or "data mining"? The only technology I am
> > familiar with is simply SQL Server databases with stored procedures.
> > I think I'd be guessing correctly and say that this technology simply
> > wouldn't be fast enough *on it's own* to do super fast queries against
> > massive amounts of data.
> OLAP is exactly what you'd want.
> Data mining may be useful post-auction to see who bought what, what things
> need more promotion etc.
> I don't know why you don't think SQL Server wouldn't be fast enough for a
> small setup like 5 million auctions.
> I have a database with 14 million rows that's probably as complex as an
> auction database and most queries are a couple of seconds or less. (some
> are longer because of some fairly complex floating point math that has to be
> done along the way.)
> I have another database that handles at least that many inserts per day w/o
> a major problem. And this is all on 3+ y.o. equipment.
> Give me a quad Xeon MP box and the right disk subsystem and this thing could
> scream.
>
> > Any insights would be of great interest. Thanks.
> > -Frameworker.|||I'm a speed freak! If I can get a query to run .25 seconds faster
using OLAP technology, I'm all for it. :o)

"mountain man" <hobbit@.southern_seaweed.com.op> wrote in message news:<f3WFb.62838$aT.6089@.news-server.bigpond.net.au>...
> "Greg D. Moore (Strider)" <mooregr@.greenms.com> wrote in message
> news:fCNFb.56190$Ug6.2898@.twister.nyroc.rr.com...
> > "Framework fan" <tempframeworkfan@.hotmail.com> wrote in message
> > news:f109ac80.0312221359.2269c2f0@.posting.google.c om...
> > > Hello,
> > > > If I wrote the next ebay (yes I know, yawn-snore) and I had a database
> > > with 5 million auction items in it, what would be a really good
> > > strategy to get a search done very quickly? Would it involve
> > > something called OLAP and/or "data mining"? The only technology I am
> > > familiar with is simply SQL Server databases with stored procedures.
>
> Using stored procedures as component building blocks
> you can create all forms of data mining and analytical
> processing applications.
>
> > > I think I'd be guessing correctly and say that this technology simply
> > > wouldn't be fast enough *on it's own* to do super fast queries against
> > > massive amounts of data.
> > OLAP is exactly what you'd want.
> > Data mining may be useful post-auction to see who bought what, what things
> > need more promotion etc.
> > I don't know why you don't think SQL Server wouldn't be fast enough for a
> > small setup like 5 million auctions.
>
> Seconded.
> If you write your procs in the right manner, and maintain
> the data integrity in your database then you should have
> null performance problems.
>
> > I have a database with 14 million rows that's probably as complex as an
> > auction database and most queries are a couple of seconds or less. (some
> > are longer because of some fairly complex floating point math that has to
> be
> > done along the way.)
> > I have another database that handles at least that many inserts per day
> w/o
> > a major problem. And this is all on 3+ y.o. equipment.
> > Give me a quad Xeon MP box and the right disk subsystem and this thing
> could
> > scream.
> > > > Any insights would be of great interest. Thanks.
> > > > -Frameworker.|||"Framework fan" <tempframeworkfan@.hotmail.com> wrote in message
news:f109ac80.0312240543.63e644ac@.posting.google.c om...
> I'm a speed freak! If I can get a query to run .25 seconds faster
> using OLAP technology, I'm all for it. :o)

I think you completely misunderstand what OLAP is or does. (And I mispoke
below (forgive me, I was tired). You want an OLTP system, not an OLAP.

OLAP and getting better performance are orthogonal to each other. i.e.
using OLAP is not a way to increase performance.

>
> "mountain man" <hobbit@.southern_seaweed.com.op> wrote in message
news:<f3WFb.62838$aT.6089@.news-server.bigpond.net.au>...
> > "Greg D. Moore (Strider)" <mooregr@.greenms.com> wrote in message
> > news:fCNFb.56190$Ug6.2898@.twister.nyroc.rr.com...
> > > > "Framework fan" <tempframeworkfan@.hotmail.com> wrote in message
> > > news:f109ac80.0312221359.2269c2f0@.posting.google.c om...
> > > > Hello,
> > > > > > If I wrote the next ebay (yes I know, yawn-snore) and I had a
database
> > > > with 5 million auction items in it, what would be a really good
> > > > strategy to get a search done very quickly? Would it involve
> > > > something called OLAP and/or "data mining"? The only technology I
am
> > > > familiar with is simply SQL Server databases with stored procedures.
> > Using stored procedures as component building blocks
> > you can create all forms of data mining and analytical
> > processing applications.
> > > > I think I'd be guessing correctly and say that this technology
simply
> > > > wouldn't be fast enough *on it's own* to do super fast queries
against
> > > > massive amounts of data.
> > > > OLAP is exactly what you'd want.
> > > > Data mining may be useful post-auction to see who bought what, what
things
> > > need more promotion etc.
> > > > I don't know why you don't think SQL Server wouldn't be fast enough
for a
> > > small setup like 5 million auctions.
> > Seconded.
> > If you write your procs in the right manner, and maintain
> > the data integrity in your database then you should have
> > null performance problems.
> > > I have a database with 14 million rows that's probably as complex as
an
> > > auction database and most queries are a couple of seconds or less.
(some
> > > are longer because of some fairly complex floating point math that has
to
> > be
> > > done along the way.)
> > > > I have another database that handles at least that many inserts per
day
> > w/o
> > > a major problem. And this is all on 3+ y.o. equipment.
> > > > Give me a quad Xeon MP box and the right disk subsystem and this thing
> > could
> > > scream.
> > > > > > > > Any insights would be of great interest. Thanks.
> > > > > > -Frameworker.

Monday, March 26, 2012

Offline Data Mining

Hi,

I am currently working for a client that is interested in performing some data mining on their customer data. The plan was to build a data warehouse, cubes, RS reports etc. and serve this up over the web.

However, we now have the problem of having to cater for countries with a very low bandwidth where the Reporting Services approach would not deliver the required performance due to the dependency of the solution on the network bandwith.

I am therefore looking into processing offline cubes and copying them over the network as part of the overnight process to a server in the countries.

I noticed that PivotTable Services in 2000 had the CREATE MINING MODEL ...... DDL statement for creating offline mining models.

Is this capability still there with SSAS 2005?

Can anyone post any links to usefull resources on this subject?

Thanks in advance.

There a couple of ways to deal with offline scenarios:

1. AS 2005 does support creation of local mining models if you are connected to a .cub file (in the "local" mode formerly known as PivotTable Services) through the MSOLAP.3 OLE DB provider. Only the Decision Trees and Clustering algorithms are supported in this case. You can then export these models individually, ship the files over to your server and import them back into your server. See http://www.sqlserverdatamining.com/DMCommunity/TipsNTricks/129.aspx for more details.

2. If the clients have AS 2005 Standard Edition, they can create models using all algorithms and export as described above. In addition, you can also export server models, ship them over to the client and import them back in, allowing local querying without going over the network.

|||

Yes, the capability is still there in SQL Server 2005.

This page contains some information about client scenarios with Analysis Services 2005, including local mining models: http://msdn2.microsoft.com/en-us/library/ms174518.aspx

More details specific for working with local mining models: http://msdn2.microsoft.com/en-us/library/ms345148.aspx#sqldmprgrm_topic7

The local mining models are limited in functionality. Major limitations:

- only 2 algorithms are supported (decision trees and clustering)

- BI Dev Studio cannot be used to explore local mining models

- server models cannot be exported and then imported in local cubes

|||Good stuff. Thanks for the help guys.sql