Well, I decided to take the plunge and migrate from Windows XP to Ubuntu. Now, technically, I have been wanting to do this for a while, but I was dependant on my windows to dial up to my internet (some problem with my ISP here). I tried to do it in Ubuntu a few months ago, but it was too complicated for me. However, it left me with a yearning to try again.
So, I waited for the new Ubuntu to come out and in parallel fixed my router.
Why?
The reason I thought it was a good idea to move to linux was mainly for self-improvement.
I recently read some blogs (couldn't find them now) that mentioned that with all the new web apps, there will be less and less emphasis on the OS. In 2011 (gartner says), this will hit a crucial point.
I was thinking, maybe more people will adopt free linux when everything important will be in "the cloud" and all you really need is a browser anyway. So I know how to use linux, I will be ready for that time and help people start adopting it.
Another 2 important reasons for me were:
1) That all servers use linux for their web apps and to be more comfortable in that environment will be a big plus.
2) To get myself more into the developing frame of mind. This maybe just me, but I noticed that the hardcore developers use linux or macs. Not that I am knocking windows, but I don't really feel that developers in windows are as serious as linux or mac for some reason (did you see some online presentation videos for a new technology. ALL the presenters use macs. Especially ones from Google. Am I missing something?)
Installing
I did some research, downloaded some pdfs and was all ready to go. I decided to make my current windows into a VMware image in case I needed something from it.
I downloaded Ubuntu and tried to figure it out. Bascially, you get 700mbs which you download of the net and anything else you want, you need to download. Fedora, comes with 4gbs and you need to download very little.
I had real problems understanding how to install things.. until I discovered DAM DAM DAM... repositories. Ok, so forgive me if you now are saying to yourself "Duh, I already knew that". I am new to it.
What I learned was, if you type "sudo apt-get install the-whole-universe-for-free-woohoo-I-love-it", you just magically get it. No need to download, configure (well, most dont) and all that trouble. You just use apt-get. You can even do it visually with the package manager.
This for me, is a huge advantage over other platforms. Its just so welcoming and encourages you to improve.
For example, I could download MySQL off the website, then install it manually. Or, I could just write 1 line and it will download, install itself, do some configuring and automatically upgrade (I think) when a new version goes into the repository. How cool is that?
The Tipping Point
So I was trying to install VMware and this was really difficult. Opening the bin/rpm file was difficult enough, but after succeeding in that, it gave me some error.
I almost gave up, until I found this article. It shows you that you can do everything with command lines and after playing around with linux, I am becoming more and more comfortable with them.
However, the important part is in the comments.
Someone posted the exact error I had.. and the guy told him how to solve the problem... by changing the ACTUAL CODE. Oh my god! change the actual code of the supplier? Unheard of in windows. If you have an error, you file a report somewhere and wait a week if your lucky.
Just the fact that I was allowed to look into someone else's code (and VMware is a big company) and learn and/or change it (I wont go into licenses), really makes me feel part of something.
I now really understand that open source doesn't mean free as in free beer, but free as in freedom of expression (sorry, that was a bit deep).
In conclusion.
I really like Ubuntu. Its not perfect and it needs some work, but this time, I feel that I can be part of it and help it improve.
A useful article to install Ubuntu <--
Hello Fellow Bloggers
I am hereby officially requesting if you could kindly enable your blog to have subscription via email.
The reason I am asking this is because I like to get information from your wonderful blogs, but unfortunately, feedburner is blocked in china (and wordpress sometimes, blogspost sometimes, typepad...etc). This makes it difficult and highly annoying to get feeds from blogs.
The usual method to overcome this difficulty is to find proxies and sometimes install browser plugins to connect to the various proxies (vidalia is also good), but this slows down the process of trying to view the blog.
Now, I realize that most of you have no problem at all viewing blogs and are used to reading them every day with no delay in speed or increase of blood pressure.
But, please, consider the poor unfortunate people that live in "restrictive" parts of the world and would still like to be informed as to whats going on in outside their country of residence.
To help you enable email subscription for your blog, please use either feedburner or feedblitz.
Thank you for your understanding.
I finally finished my first data warehouse! and it only took me 3 days!
Well, to be fair, the data warehouse design was already planned and it wasn't really that big anyway, but I am still happy about it.
I was asked on Monday to do a data warehouse for my company's head quarters in Germany. I work in Beijing, so its like.... very slow to connect to there. They gave me the database design, some SQL statements to generate a few dimensions and "rough" business rules for the data.
Now, I haven't done anything like this before, but I really wanted to try. So I did it my way.
My way is to use a lot of Views with long SQL statements instead of cursors or stored procedures. I like it this way, because I feel like I can see the data and catch problems instead of programming blindly to find out later that there is a big problem. So basically, for me, its more comfortable.
I took the SQL statements that loaded the dimensions, cleaned them a bit and put them in Views. The Views in Oracle (we used Oracle for this data warehouse. I told them MySQL is better, but they preferred Oracle)... the views are very restrictive. I thought it was good, because it made me confident that they will work properly. However, Views don't allow for UNION, so for that I used Materialized Views.
Materialized Views was something I learned while doing this data warehouse and I think they are very cool, to be honest. It allowed me to create a sort of temp table that more or less needed no code from me to maintain. I can refresh it with 1 line in a stored procedure. I could also schedule it to refresh, but I didn't learn how to do that. I also used Materialized Views to replace views that needed too much calculations and I got REALLY annoyed waiting for it finish from the server in Germany.
I checked my SQL statements with looking at the Views and using a lot of count()'s on many different columns. Count()'s don't count NULL values and NULL is something that cant go into a fact table of a data warehouse. I also brushed up on my AND/OR conditions when I had missing data from one table and I had to find it in maybe another table. After that, I took this value and INNER JOIN'd it with another table. So an example that worked well for me is:
left join src_dim_team_all dt on ((u2dt.TEAM = dt.TEAM)) orIn this example, the join takes the data from the first table.. or.. if the first table's value is null, it takes it from the second table. A simple OR didn't work for me and I only found that out when I tested it with some count()'s.
((u2dt.TEAM is null) and (dt.TEAM = f.TEAM))
In the end, 99.99% of the work was done using Views and 1 stored procedure to do the:
insert into mydimension from select * from myviewwhich made it very very easy for me to abstract the complexity into several Views.
At the end, the final extracting, transforming and loading was ridiculous. EVERY tiny thing took 5-10 mins to run on the German server and after that I had to test it if its correct.
However, I enjoyed this experience and it would be one more thing I can add on my CV.
Thanks for reading my blog.
After trying very hard to install the Yaws webserver for Erlang on windows, asking 2 friends of mine to try to install it and pleading with the original creator to make an installer, I have given up.
So my very short lived relationship with Erlang has come to an end also. I am just too busy to sit down and learn how to install a program that seems too complicated to me before I even start programming in it.
I hope one day, Yaws will use an installer like Bitrock. Until then, I will spend my time either improving the languages I know or learning new ones that seem cool and not that complicated to get into.
I was complaining in an earlier post that I have problems with linux style installation.
I found a company that can help me solve that!
BitRock makes open source software easier to use by providing a complete automated solution for Open Source Application Deployment.
Its quite cool and they have a LAMP stack installer here.
Now I am suggesting that someone at Erlang does it too. They have a Lyme stack which is Linux + Yaws + Mnesia + Erlang. (a comparison can be found here)
Anyway, I am still stuck on installing yaws :(
I asked a friend to help until someone makes a cool installer.
For those who haven't heard this song, please go to here.
Its a very cool song about programmers, done in an "open-source" way, in that the guy who wrote the song isn't signed by any record label and hes trying to promote himself through his site.
Truly an anthem song.
SQL is in Fact a Programming Language
While I was updating my blog recently, I read the top part of it which said "My adventures in database developing". I realized that I have really neglected database developing with all my research into other programming languages.
So I was trying to play around with SQL statements to brush up on my skills.
Luckily (or unluckily), someone at my work, a java developer, needed to do a data export and thought that I can do it in 1 SQL line. He said.... and I quote "You only need 1 SQL line. Whats the problem?".
Whenever I hear the "whats the problem" line, there usually is one.
So, with him having great confidence in my SQL-ing ability or a complete misunderstanding of database developing, I was forced to do the data export.
I realized early on that it would take a long time. Just to start off, I needed like 20+ columns, but each column was a row somewhere in 4 different tables.
To make a long story short, I did complete the task and to simply pat myself on the back, I will paste it here. Just to show that, yes, you can program in SQL.
(but next time, please give me rights on the server to do it with several views).
The SQL statement is about 2-3 pages long, by the way and you don't really need to read all of the statementSELECT
CONTACT.CON_ID as contact_id,
contact.CON_EMAIL_LOWER as email,
contact.CON_CREATION_DATE as creation_date,
lastname.CONATTR_VALUE as givenname,
firstname.CONATTR_VALUE as familyname,
languages.CONATTR_VALUE as "LANGUAGE",
gender2.SELVAL_VALUE as gender,
mobile.CONATTR_VALUE as mobilephone,
office.CONATTR_VALUE as officephone,
home.CONATTR_VALUE as home,
address.CONATTR_VALUE as address,
zipcode.CONATTR_VALUE as zipcode,
province2.SELVAL_VALUE as province,
city2.SELVAL_VALUE as city,
intendedpurchase2.SELVAL_VALUE as intendedpurchase,
ownedcarbrand2.SELVAL_VALUE as ownedcarbrand,
minimodel3.SELVAL_VALUE as minimodel,
dealer3.SELVAL_VALUE as dealer,
subject3.SELVAL_VALUE as subject,
SUBSCRIPTION.sub_id as subscription_id,
brochure13.SELVAL_VALUE as brochure1,
brochure23.SELVAL_VALUE as brochure2,
brochure33.SELVAL_VALUE as brochure3,
message2.CAMATTR_VALUE as message
from CONTACT contact
left join CONTACT_ATTRIBUTE lastname on lastname.CONATTR_CON_ID = CONTACT.CON_ID and lastname.CONATTR_ATTR_ID = 1010
left join CONTACT_ATTRIBUTE firstname on firstname.CONATTR_CON_ID = CONTACT.CON_ID and firstname.CONATTR_ATTR_ID = 1020
left join CONTACT_ATTRIBUTE languages on languages.CONATTR_CON_ID = CONTACT.CON_ID and languages.CONATTR_ATTR_ID = 1000
left join
(SELECTION_VALUE gender2 INNER JOIN CONTACT_ATTRIBUTE gender on gender.CONATTR_SEL_ID=gender2.SELVAL_SEL_ID and gender2.SELVAL_LANGUAGE = 'zh' )
on gender.CONATTR_CON_ID = CONTACT.CON_ID and gender.CONATTR_ATTR_ID = 1030
left join CONTACT_ATTRIBUTE mobile on mobile.CONATTR_CON_ID = CONTACT.CON_ID and mobile.CONATTR_ATTR_ID = 1100
left join CONTACT_ATTRIBUTE office on office.CONATTR_CON_ID = CONTACT.CON_ID and office.CONATTR_ATTR_ID = 1110
left join CONTACT_ATTRIBUTE home on home.CONATTR_CON_ID = CONTACT.CON_ID and home.CONATTR_ATTR_ID = 1120
left join CONTACT_ATTRIBUTE address on address.CONATTR_CON_ID = CONTACT.CON_ID and address.CONATTR_ATTR_ID = 1200
left join CONTACT_ATTRIBUTE zipcode on zipcode.CONATTR_CON_ID = CONTACT.CON_ID and zipcode.CONATTR_ATTR_ID = 1210
left join
(SELECTION_VALUE province2 INNER JOIN CONTACT_ATTRIBUTE province on province.CONATTR_SEL_ID=province2.SELVAL_SEL_ID and province2.SELVAL_LANGUAGE = 'zh' )
on province.CONATTR_CON_ID = CONTACT.CON_ID and province.CONATTR_ATTR_ID = 510
left join (SELECTION_VALUE city2 INNER JOIN CONTACT_ATTRIBUTE city on city.CONATTR_SEL_ID=city2.SELVAL_SEL_ID and city2.SELVAL_LANGUAGE = 'zh' )
on city.CONATTR_CON_ID = CONTACT.CON_ID and city.CONATTR_ATTR_ID = 520
left join (SELECTION_VALUE intendedpurchase2 INNER JOIN CONTACT_ATTRIBUTE intendedpurchase on intendedpurchase.CONATTR_SEL_ID=intendedpurchase2.SELVAL_SEL_ID and intendedpurchase2.SELVAL_LANGUAGE = 'zh' )
on intendedpurchase.CONATTR_CON_ID = CONTACT.CON_ID and intendedpurchase.CONATTR_ATTR_ID = 1300
left join (SELECTION_VALUE ownedcarbrand2 INNER JOIN CONTACT_ATTRIBUTE ownedcarbrand on ownedcarbrand.CONATTR_SEL_ID=ownedcarbrand2.SELVAL_SEL_ID and ownedcarbrand2.SELVAL_LANGUAGE = 'zh' )
on ownedcarbrand.CONATTR_CON_ID = CONTACT.CON_ID and ownedcarbrand.CONATTR_ATTR_ID = 1400
left join SUBSCRIPTION on CONTACT.CON_ID=SUBSCRIPTION.SUB_CON_ID
left join
(SELECTION_VALUE minimodel3 INNER JOIN
(CAMPAIGN_ATTRIBUTE minimodel2 INNER JOIN SUBSCRIPTION minimodel on minimodel2.CAMATTR_SUB_ID=minimodel.SUB_ID)
on minimodel2.CAMATTR_ATTR_ID=52001)
on minimodel2.CAMATTR_SEL_ID = minimodel3.SELVAL_SEL_ID and minimodel3.SELVAL_LANGUAGE = 'zh' and minimodel.SUB_CON_ID=CONTACT.CON_ID
and SUBSCRIPTION.SUB_ID in (select distinct(CAMATTR_SUB_ID) from CAMPAIGN_ATTRIBUTE group by CAMATTR_SUB_ID)
left join
(SELECTION_VALUE dealer3 INNER JOIN
(CAMPAIGN_ATTRIBUTE dealer2 INNER JOIN SUBSCRIPTION dealer on dealer2.CAMATTR_SUB_ID=dealer.SUB_ID)
on dealer2.CAMATTR_ATTR_ID=52002)
on dealer2.CAMATTR_SEL_ID = dealer3.SELVAL_SEL_ID and dealer3.SELVAL_LANGUAGE = 'zh' and dealer.SUB_CON_ID=CONTACT.CON_ID
and SUBSCRIPTION.SUB_ID in (select distinct(CAMATTR_SUB_ID) from CAMPAIGN_ATTRIBUTE group by CAMATTR_SUB_ID)
left join
(SELECTION_VALUE subject3 INNER JOIN
(CAMPAIGN_ATTRIBUTE subject2 INNER JOIN SUBSCRIPTION subject on subject2.CAMATTR_SUB_ID=subject.SUB_ID)
on subject2.CAMATTR_ATTR_ID=50001)
on subject2.CAMATTR_SEL_ID = subject3.SELVAL_SEL_ID and subject3.SELVAL_LANGUAGE = 'zh' and subject.SUB_CON_ID=CONTACT.CON_ID
and SUBSCRIPTION.SUB_ID in (select distinct(CAMATTR_SUB_ID) from CAMPAIGN_ATTRIBUTE group by CAMATTR_SUB_ID)
left join
(SELECTION_VALUE brochure13 INNER JOIN
(CAMPAIGN_ATTRIBUTE brochure12 INNER JOIN SUBSCRIPTION brochure1 on brochure12.CAMATTR_SUB_ID=brochure1.SUB_ID)
on brochure12.CAMATTR_ATTR_ID=51001 and brochure12.CAMATTR_SEL_ID=5100101)
on brochure12.CAMATTR_SEL_ID = brochure13.SELVAL_SEL_ID and brochure13.SELVAL_LANGUAGE = 'zh' and brochure1.SUB_CON_ID=CONTACT.CON_ID
and SUBSCRIPTION.SUB_ID in (select distinct(CAMATTR_SUB_ID) from CAMPAIGN_ATTRIBUTE group by CAMATTR_SUB_ID)
left join
(SELECTION_VALUE brochure23 INNER JOIN
(CAMPAIGN_ATTRIBUTE brochure22 INNER JOIN SUBSCRIPTION brochure2 on brochure22.CAMATTR_SUB_ID=brochure2.SUB_ID)
on brochure22.CAMATTR_ATTR_ID=51001 and brochure22.CAMATTR_SEL_ID=5100102)
on brochure22.CAMATTR_SEL_ID = brochure23.SELVAL_SEL_ID and brochure23.SELVAL_LANGUAGE = 'zh' and brochure2.SUB_CON_ID=CONTACT.CON_ID
and SUBSCRIPTION.SUB_ID in (select distinct(CAMATTR_SUB_ID) from CAMPAIGN_ATTRIBUTE group by CAMATTR_SUB_ID)
left join
(SELECTION_VALUE brochure33 INNER JOIN
(CAMPAIGN_ATTRIBUTE brochure32 INNER JOIN SUBSCRIPTION brochure3 on brochure32.CAMATTR_SUB_ID=brochure3.SUB_ID)
on brochure32.CAMATTR_ATTR_ID=51001 and brochure32.CAMATTR_SEL_ID=5100103)
on brochure32.CAMATTR_SEL_ID = brochure33.SELVAL_SEL_ID and brochure33.SELVAL_LANGUAGE = 'zh' and brochure3.SUB_CON_ID=CONTACT.CON_ID
and SUBSCRIPTION.SUB_ID in (select distinct(CAMATTR_SUB_ID) from CAMPAIGN_ATTRIBUTE group by CAMATTR_SUB_ID)
left join
(CAMPAIGN_ATTRIBUTE message2 INNER JOIN SUBSCRIPTION message on message2.CAMATTR_SUB_ID=message.SUB_ID and message2.CAMATTR_ATTR_ID=50002)
on message.SUB_CON_ID=CONTACT.CON_ID
and SUBSCRIPTION.SUB_ID in (select distinct(CAMATTR_SUB_ID) from CAMPAIGN_ATTRIBUTE group by CAMATTR_SUB_ID)
I apologize in advance if I get anyone's feed-reader stuck.
Thank you for reading my blog.
Today, we run into an issue when a client of ours. They wanted to see some of the data on our website and when doing a search, they didnt see all the days they asked for in the search
Why didn't they see all the data? Because they did a search by date and some of the dates were stored via the website in the form of '2008-01-01 00:00:00' and some were stored via the database in the form of '2008-01-01 12:35:49'.
Now, for some magical reason, if you hide the time in the date in your searches, like so:
where signupdate between '2008-01-01' and '2008-01-02'then you might not see all the data between the days 01 to 02.
or
where signupdate between date('2008-01-01 00:00:00') and date('2008-01-02 00:00:00')
Depending on which way your date was stored, you might only see the dates between those 2 dates and not equal to those dates as well.
So for example, if you had:
2008-01-01 00:00:00
2008-01-01 00:00:00
2008-01-02 00:00:00
2008-01-02 00:00:00
2008-01-03 00:00:00
2008-01-03 00:00:00
and you ran the search above, then you would see:
2008-01-01 00:00:00
2008-01-01 00:00:00
(I once worked for another company that booked hotel rooms online. For them when you booked the room, the nights were counted. So if you would do a search, on the website, all the date allocations would be set to midnight - 00:00:00.)
In our situation we had a mix of dates which caused use problems with 1 table and was ok with another.
We decided to solve the problem like this
where signupdate between '2008-01-01 00:00:00' and ' 2008-01-02 23:59:59'
That is, from what I understand, the time a day starts (00:00:00) and the time a day ends (23:59:59). So we figured we covered all the issues with this statement and we are happy with it.
I hope it helps you to if you also run into these kind of problems.

