Monday, December 2, 2019

Thursday, December 31, 2009

Solution for using UTF-8 BibTeX exports from Zotero with LaTeX

When Zotero exports a BibTeX file, the file is in UTF-8 encoding, which causes problems when I use it directly in LaTeX. I dug around online for quite a while but didn't find an ideal solution. The problem is that when the file is converted from UTF-8 to ISO-8859-1, some characters are not recognized, and they display incorrectly in the generated PDF file.
Finally, I found this solution.

1. Locate the Zotero data directory

By default, Zotero data is stored within your Firefox profile in these OS-dependent directories.

On a Mac:
/Users//Library/Application Support/Firefox/Profiles//zotero

On Windows 2000/XP:
C:\Documents and Settings\\Application Data\Mozilla\Firefox\Profiles\\zotero

On Windows Vista:
C:\Users\\AppData\Roaming\Mozilla\Firefox\Profiles\\zotero

On most Linux distributions:
~/.mozilla/firefox//zotero

2. Locate translator file BibTex.js in the $zoterodir/translators directory

3. Open that file and change the following

Zotero.addOption("exportCharset", "UTF-8");

into

Zotero.addOption("exportCharset", "ISO-8859-1");

Now you can export the bib file again from Zotero, and the file will be in ISO-8859-1 encoding, ready for use.
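To double-check an exported file, a small script can list any characters that still cannot be represented in ISO-8859-1. This is only a sketch: the function name unencodable_chars is made up for illustration and is not part of Zotero, and it only reports the offending characters without changing the file.

```python
def unencodable_chars(text, charset="iso-8859-1"):
    """Return (line number, character) pairs that cannot be encoded in charset."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            try:
                ch.encode(charset)
            except UnicodeEncodeError:
                # This character has no representation in the target charset.
                bad.append((lineno, ch))
    return bad


if __name__ == "__main__":
    sample = "author = {M\u00fcller}\nauthor = {Dvo\u0159\u00e1k}"
    for lineno, ch in unencodable_chars(sample):
        print(f"line {lineno}: {ch!r} is not in ISO-8859-1")
```

An empty result means every character in the text fits in the target charset.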

Monday, June 15, 2009

Calculating Percentile

Procedure for calculating the k% percentile:

Assume that we have an array M of n numbers.
(1) Sort M in increasing order (so M_1 is the smallest element) and calculate (n-1)*k%; its integer part is i and its fractional part is j.
(2) Result=(1-j)*M_(i+1) + j*M_(i+2).
Special cases:
if j=0, the result is M_(i+1); if M_(i+1)=M_(i+2), either of them is the result.

Quartiles can be calculated this way:
1st Quartile k%=25%
2nd Quartile k%=50%
3rd Quartile k%=75%
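The procedure above can be sketched in Python as follows. The function name percentile and the choice to pass k as a number between 0 and 100 are assumptions made for this example. Python lists are 0-based, so M_(i+1) in the description corresponds to m[i] in the code.

```python
import math


def percentile(values, k):
    """Return the k-th percentile (0 <= k <= 100) by linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= k <= 100:
        raise ValueError("k must be between 0 and 100")
    m = sorted(values)                 # step (1): sort in increasing order
    pos = (len(m) - 1) * k / 100.0     # (n-1)*k%
    i = math.floor(pos)                # integer part
    j = pos - i                        # fractional part
    if j == 0:
        return m[i]                    # special case: no interpolation needed
    # step (2): (1-j)*M_(i+1) + j*M_(i+2), shifted to 0-based indices
    return (1 - j) * m[i] + j * m[i + 1]


if __name__ == "__main__":
    data = [4, 1, 3, 2]
    print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
```

For the four numbers 1, 2, 3, 4 the position for the 1st quartile is 3*25% = 0.75, so i=0, j=0.75, and the result is 0.25*1 + 0.75*2 = 1.75.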

Thursday, April 2, 2009

Boosting and Perceptron Learning for Ranking

Boosting can be viewed as a greedy algorithm for finding the parameter vector w that minimizes a loss function. Initially w is set to (w0,0,...,0), where w0 is set based on the base model. When applied to ranking problems, the loss function is an upper bound on the number of "ranking errors", a ranking error being a case where an incorrect candidate gets a higher score than a correct candidate. Each iteration considers all the possible features, greedily chooses a single feature, and updates its weight. The number of rounds is decided by cross-validation. The output is the final parameter setting w.

The perceptron algorithm for ranking makes a pass over the training set instead of over the features, storing a parameter vector w(i), i=1,2,...,n, after each training example. The vector is initially all zeros and is modified only when a mistake is made on an example. The update is the difference between the feature representations of the two offending candidates: the correct (first-ranked) candidate and the candidate of that example with the highest score under the current w. Generally w(n) is taken as the final parameter for ranking a new test example. However, training has constructed n parameter settings, each of which picks its own highest-ranked candidate. Letting each of the settings "vote" for a candidate is called the voted perceptron.

The ranking function in both cases is F(x,w)=w*h(x), where h(x) is the feature vector representing x.
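The perceptron variant described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the convention that the correct candidate is stored first in each example, and the tie-breaking rule (a tie counts against the correct candidate, so that the all-zero starting vector can trigger an update) are all assumptions of this sketch.

```python
from collections import Counter


def score(w, h):
    """Ranking function F(x, w) = w * h(x), with h given as a list of floats."""
    return sum(wi * hi for wi, hi in zip(w, h))


def best_candidate(w, candidates):
    """Index of the highest-scoring candidate; ties go to the later index."""
    return max(range(len(candidates)), key=lambda j: (score(w, candidates[j]), j))


def train_ranking_perceptron(examples, dim):
    """One pass over the training set; returns w(1), ..., w(n).

    Each example is a list of feature vectors and candidates[0] is the
    correct candidate (an assumption of this sketch).
    """
    w = [0.0] * dim
    history = []
    for candidates in examples:
        j = best_candidate(w, candidates)
        if j != 0:
            # Mistake: add h(correct) - h(top-scoring incorrect candidate).
            w = [wi + c - o for wi, c, o in zip(w, candidates[0], candidates[j])]
        history.append(list(w))
    return history


def voted_prediction(history, candidates):
    """Let every stored parameter vector vote for its top candidate."""
    votes = Counter(best_candidate(w, candidates) for w in history)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
    history = train_ranking_perceptron(train, dim=2)
    print(history[-1])                       # w(n), the usual final parameter
    print(voted_prediction(history, [[2.0, 0.0], [0.0, 3.0]]))
```

Using history[-1] alone corresponds to taking w(n) as the final parameter. voted_prediction uses all n stored settings.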

Friday, March 27, 2009

Search Engines Beyond Google

Search engines based on natural language processing are increasingly in demand. They aim to meet different information needs more accurately and to support automatic understanding and digestion of content.

Powerset is first applying its natural language processing to search, aiming to improve the way we find information by unlocking the meaning encoded in ordinary human language. Powerset's first product is a search and discovery experience for Wikipedia, launched in May 2008. Powerset's technology improves the entire search process. In the search box, you can express yourself in keywords, phrases, or simple questions. On the search results page, Powerset gives more accurate results, often answering questions directly, and aggregates information from across multiple articles. Finally, Powerset's technology follows you into enhanced Wikipedia articles, giving you a better way to quickly digest and navigate content.

The following article discusses five search engines that may be a good choice when we want specific answers.


Top 5 Non-Google Search Engines

Many times you can't just live your life using only one search engine. There are a lot of good options out there that are just as good as, maybe even better than, Google. Google is great for general information, but when you want more concrete and reliable answers, it may be best to look elsewhere. Here are the top 5 non-Google search engines (sans Yahoo and MSN).

1. Sweet Search

Sweet Search is a new engine provided by FindingDulcinea.com, an encyclopedic guide site. You'll find that it's a lot more selective than your average search engine: for a search that might return millions of results on Google, you'll get 500 from Sweet Search. This isn't necessarily a bad thing. All the results from Sweet Search are reliable: they are hand-selected by findingDulcinea's staff to ensure that they are all high-quality websites. You won't be finding any R. Kelly fansites from Kim in Wisconsin. It also doesn't hurt that Sweet Search prefaces its search results with a handful of relevant guide selections from FindD.

2. Kosmix

The great thing about this page is that it covers all the bases: for anything you search, you get the web results, audio, video, tweets, shopping, images, conversations taking place on sites like Yahoo! Answers and Answerbag, and related searches, all on the same page. Needless to say, it's extremely comprehensive, and if you're searching on a more general level you'll get more than enough information.

3. Ask.com

Ask.com's greatest feature isn't its web search function but its answer search. It has questions and answers catalogued from all kinds of sites like Yahoo! Answers, Ehow, Askville, Answerbag, and Wiki Answers. Those, however, are not the only sites it draws on: it includes really any site where the relevant question, or a question close to it, has been asked or discussed.

4. Silkwise

Whereas Ask.com is an answer search, Silkwise is more of a comprehensive question database. You ask your question and in time it is answered by at least one expert. Every answer you get is nigh guaranteed to be comprehensive and highly detailed.

5. ChaCha

ChaCha is a question and answer site, but with a twist: you text your questions to them and the answers are texted back to you on the fly, written by real people. Unlike sites such as Ask.com or Silkwise, which have only specific questions answered, you can ask literally anything at ChaCha. Of course you are not going to be asking deep philosophical questions, or for a how-to on assembling a car, but for a quick fact check or just a short answer it's great. It doesn't hurt that you can look in the online database for anything that might've already been asked.