Sunday, February 27, 2005

Internet Access via Bluetooth on Linux

About a month ago, I went to Sham Shui Po and found a Bluetooth adaptor that cost $1xx. So what came to mind was:
can I use it to build a wireless network?

Most people use 802.11[abg], but it seems you need to buy an adaptor for each computer to get wireless access, plus a router for the network. So the total cost is much higher than going the Bluetooth way.

Bluetooth also seems to be a common connectivity option for home appliances. Imagine being able to connect to a TV set and control it remotely while you are working in your word processor. :)

P.S.
The drawbacks are:
- low link speed (only 721 kbps for a USB 1.1 Bluetooth 1.1 adaptor)
- limited device count (a Bluetooth piconet supports at most 8 active devices)
- [is the transmitting power at the antenna high?]

This link inspired me to write this post. :)
http://www.osnews.com/story.php?news_id=9834

For Windows:
http://www.whizoo.com/bt_setup/
Palm Quick Answers -- Internet Access via Bluetooth on Linux: "Internet Access via Bluetooth on Linux"

Wednesday, February 23, 2005

Tomcat 5 Chinese Encoding problem...



Working with Tomcat 5.0.19 in practice, we learned that, without modifying the Tomcat source code, data submitted by users through a form is always handled as ISO-8859-1, and the programmer has to convert the strings to Big5 (Traditional Chinese) or GB2312/GBK (Simplified Chinese) himself. In our application we applied toBig5String() to every request.getParameter("xx"); in theory no Chinese-character problems should appear at all, yet in some situations the Chinese text still turned into garbage!

After some analysis we found that the problem lies in how the QueryString is parsed. Back in Tomcat 4.x, whether a form was submitted with GET or POST, the Tomcat server decoded the parameters with the same encoding. In Tomcat 5.x, for some reason, QueryString parsing was split out on its own. We have confirmed that Chinese characters submitted with a GET form, or written directly into the URL, turn into garbage when they reach Tomcat no matter how you convert them afterwards, even if you URL-encode them beforehand.

Some people on the web suggest working around this by Base64-encoding all Chinese text and having the server-side program Base64-decode it again, which keeps the Chinese intact. That does solve the problem, but it forces every page to use POST, and the programmer must always keep track of which parameter arrives via GET and which via POST and decode each one differently. Code like that has no portability at all, let alone being cross-platform or multi-language.

By studying the Tomcat documentation and source code, we found the cause and the fix. Only with the approach below will form-submitted data be decoded consistently as ISO-8859-1; and just following the Tomcat documentation is still not enough, you also have to add the attribute to server.xml.

The solution

First study the documentation file $TOMCAT_HOME/webapps/tomcat-docs/config/http.html; the key points are excerpted below:
URIEncoding: This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, ISO-8859-1 will be used.

useBodyEncodingForURI: This specifies if the encoding specified in contentType should be used for URI query parameters, instead of using the URIEncoding. This setting is present for compatibility with Tomcat 4.1.x, where the encoding specified in the contentType, or explicitly set using the Request.setCharacterEncoding method, was also used for the parameters from the URL. The default value is false.

These two Tomcat attributes are set on the HTTP Connector block in server.xml. To stop Chinese text in the QueryString from turning into garbage, you must set at least one of them.
URIEncoding: set URIEncoding="ISO-8859-1" so that the QueryString uses the same character encoding as the POST body.
useBodyEncodingForURI: this exists for compatibility with Tomcat 4.x. It takes "true" or "false" and means "should the QueryString be decoded with the same character encoding as the POST body?". Setting it to true also gives you the ISO-8859-1 behaviour.
We recommend using URIEncoding, since useBodyEncodingForURI is only there for Tomcat 4.x compatibility. According to the documentation, Tomcat should default to ISO-8859-1 even when neither attribute is set, so why does the problem still occur? A look at the Tomcat source code makes it clear.

// This is the code Tomcat uses to decode the QueryString,
// in the org.apache.tomcat.util.http.Parameters class.
private String urlDecode(ByteChunk bc, String enc)
        throws IOException {
    if (urlDec == null) {
        urlDec = new UDecoder();
    }
    urlDec.convert(bc);
    String result = null;
    if (enc != null) {
        bc.setEncoding(enc);
        result = bc.toString();
    } else {
        CharChunk cc = tmpNameC;
        cc.allocate(bc.getLength(), -1);
        // Default encoding: fast conversion
        byte[] bbuf = bc.getBuffer();
        char[] cbuf = cc.getBuffer();
        int start = bc.getStart();
        for (int i = 0; i < bc.getLength(); i++) {
            cbuf[i] = (char) (bbuf[i + start] & 0xff);
        }
        cc.setChars(cbuf, 0, bc.getLength());
        result = cc.toString();
        cc.recycle();
    }
    return result;
}


Pay particular attention to the else branch above (originally highlighted in red): when Tomcat finds that no encoding has been set for the QueryString, it does not default to ISO-8859-1 as the documentation says, but runs the "fast conversion" loop instead, and that is what mangles the Chinese characters. So you really do have to add the URIEncoding attribute in server.xml.

Example Connector configuration:

debug="0"
acceptCount="100"
connectionTimeout="20000"
disableUploadTimeout="true"
port="80"
redirectPort="8443"
enableLookups="false"
minSpareThreads="25"
maxSpareThreads="75"
maxThreads="150"
maxPostSize="0"
URIEncoding="ISO-8859-1"
>
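On the application side, once the Connector decodes the QueryString as ISO-8859-1, the toBig5String() step the post mentions boils down to reinterpreting the ISO-8859-1 bytes as Big5. A minimal sketch, assuming the Connector is configured as above (the parameter name is just an example, and the surrounding method has to handle UnsupportedEncodingException):

// in a servlet or JSP, assuming URIEncoding="ISO-8859-1" on the Connector
String raw = request.getParameter("name");                     // decoded by Tomcat as ISO-8859-1
String big5 = new String(raw.getBytes("ISO-8859-1"), "Big5");  // reinterpret the bytes as Big5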


Monday, February 21, 2005

Solaris Ethernet Drivers for ADMtek & Macronix based chips

Solaris Ethernet Drivers - Main: "Solaris Ethernet Drivers"

Found that Solaris 10 x86 does not detect my NIC, so this URL might be helpful.

Thursday, February 17, 2005

SCO Group is delisted from NASDAQ!

"You have the day!" I would say to SCO.


LWN: Welcome to LWN.net: "SCO Group to be delisted
[Commerce] Posted Feb 17, 2005 14:19 UTC (Thu) by corbet

The SCO Group has put out a press release informing the world that it is being kicked out of the NASDAQ market for failure to comply with the reporting requirements. SCO is appealing the decision. 'The Company has been unable to file its Form 10-K for the fiscal year ended October 31, 2004 because it continues to examine certain matters related to the issuance of shares of the Company's common stock pursuant to its equity compensation plans. The Company is working to resolve these matters as soon as possible and expects to file its Form 10-K upon completion of its analysis.'"

Tuesday, February 15, 2005

Java Forums - Can I serialize Message objects?

Java Forums - Can I serialize Message objects?: "Re: Can I serialize Message objects?
Author: agnesjuhasz Apr 8, 2002 2:30 AM (reply 4 of 4)
Hi Avanish,

I could resolve the serialization this way:

// on the client side
MimeMessage mimemessage = new MimeMessage((javax.mail.Session)null);
// do what you need
...
// put the content of mimemessage into an encoded String, which is Serializable
ByteArrayOutputStream baos = new ByteArrayOutputStream();
mimemessage.writeTo(baos);
byte[] bytearray = baos.toByteArray();
Base64Encoder encoder = new Base64Encoder();
String base64encodedmessage = encoder.encode(bytearray);

// On the server side
// decode the received string
Base64Decoder decoder = new Base64Decoder();
byte[] bytearray = decoder.decodeBuffer(base64encodedmessage );
ByteArrayInputStream bais = new ByteArrayInputStream(bytearray);

Properties mailprops = new Properties();   // java.util.Properties
mailprops.setProperty("mail.from", sender);
Session session = Session.getInstance(mailprops,null);
session.setDebug(debug);

MimeMessage mimemessage = new MimeMessage(session,bais);

Hope this helps.
Agnes"

Monday, February 07, 2005

Inserting some text to TextArea in Internet Explorer...

// input is the TextArea object, insText is the string to insert
input.focus();
var oSel = document.selection;                       // IE-only selection object
if (oSel && oSel.createRange) {
    oSel.createRange().duplicate().text = insText;   // replaces the current selection / caret position
}


For Gecko-based browsers:
// insert at the caret and move the caret to the end of the inserted text
var len = input.selectionEnd;
input.value = input.value.substr(0, len) + insText + input.value.substr(len);
input.setSelectionRange(len + insText.length, len + insText.length);

Saturday, February 05, 2005

howto use "update-alternatives"...

update-alternatives --verbose --install /usr/bin/java java /usr/local/jdk/bin/java 500 --slave /usr/share/man/man1/java.1 java.1 /usr/local/jdk/man/man1/java.1


This registers java as an alternative (priority 500), with its man page as a slave link.

Spam Laws

Here is some information about spam laws around the world. So far, Hong Kong has no such regulations.

Spam Laws: "Spam Laws"

Friday, February 04, 2005

Java Forums - Why no font.properties.zh_HK?

Here is a follow-up about the Hong Kong font settings in JDK 1.4.
Java Forums - Why no font.properties.zh_HK?: "Java Forums - Why no font.properties.zh_HK?"

Java Forums - 1999 or 2001 version of HKSCS

I am looking for information on converting the Hong Kong character set in Java. I tried to import a CSV file with strings in Big5 encoding and found that some of the characters cannot be converted.

Java Forums - 1999 or 2001 version of HKSCS: "Java Forums - 1999 or 2001 version of HKSCS"
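For reference, a minimal sketch of the kind of decoding involved, assuming the JRE in use ships the Big5-HKSCS charset (plain Big5 does not cover the Hong Kong Supplementary Character Set, which is one reason some characters fail to convert); the class name and file argument are just illustrative:

import java.io.*;

public class Big5CsvRead {
    public static void main(String[] args) throws IOException {
        // Decode a CSV file as Big5-HKSCS rather than plain Big5,
        // so HKSCS characters map to real Unicode code points.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(args[0]), "Big5-HKSCS"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}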

Just a simple idea....

- Just thinking of writing a classifier to predict the gender of a given Chinese name.

Thursday, February 03, 2005

Linux: TCP Random Initial Sequence Numbers

Linux: TCP Random Initial Sequence Numbers: "Linux: TCP Random Initial Sequence Numbers"


The following is copied from KernelTrap (the source above). The article is quite well written and explains the implementation issues of TCP sequence numbers in Linux.

From: linux AT horizon.com
Subject: Re: [PATCH] OpenBSD Networking-related randomization port
Date: 29 Jan 2005 07:24:29 -0000

> It adds support for advanced networking-related randomization, in
> concrete it adds support for TCP ISNs randomization

Er... did you read the existing Linux TCP ISN generation code?
Which is quite thoroughly randomized already?

I'm not sure how the OpenBSD code is better in any way. (Notice that it
uses the same "half_md4_transform" as Linux; you just added another copy.)
Is there a design note on how the design was chosen?

I don't wish to be *too* discouraging to someone who's *trying* to help,
but could you *please* check a little more carefully in future to
make sure it's actually an improvement?

I fear there's some ignorance of what the TCP ISN does, why it's chosen
the way it is, and what the current Linux algorithm is designed to do.
So here's a summary of what's going on. But even as a summary, it's
pretty long...

First, a little background on the selection of the TCP ISN...

TCP is designed to work in an environment where packets are delayed.
If a packet is delayed enough, TCP will retransmit it. If one of
the copies floats around the Internet for long enough and then arrives
long after it is expected, this is a "delayed duplicate".

TCP connections are between (host, port, host, port) quadruples, and
packets that don't match some "current connection" in all four fields
will have no effect on the current connection. This is why systems try
to avoid re-using source port numbers when making connections to
well-known destination ports.

However, sometimes the source port number is explicitly specified and
must be reused. The problem then arises, how do we avoid having any
possible delayed packets from the previous use of this address pair show
up during the current connection and confuse the heck out of things by
acknowledging data that was never received, or shutting down a connection
that's supposed to stay open, or something like that?

First of all, protocols assume a maximum packet lifetime in the Internet.
The "Maximum Segment Lifetime" was originally specified as 120 seconds,
but many implementations optimize this to 60 or 30 seconds. The longest
time that a response can be delayed is 2*MSL - one delay for the packet
eliciting the response, and another for the response.

In truth, there are few really-hard guarantees on how long a packet can
be delayed. IP does have a TTL field, and a requirement that a packet's
TTL field be decremented for each hop between routers *or each second of
delay within a router*, but that latter portion isn't widely implemented.
Still, it is an identified design goal, and is pretty reliable in
practice.

The solution is twofold: First, refuse to accept packets whose
acks aren't in the current transmission window. That is, if the
last ack I got was for byte 1000, and I have sent 1100 bytes
(numbers 0 through 1099), then if the incoming packet's ack isn't
somewhere between 1000 and 1100, it's not relevant. If it's
950, it might be an old ack from the current connection (which
doesn't include anything interesting), but in any case it can be
safely ignored, and should be.

The only remaining issue is, how to choose the first sequence number
to use in a connection, the Initial Sequence Number (ISN)?

If you start every connection at zero, then you have the risk that
packets from an old connection between the same endpoints will
show up at a bad time, with in-range sequence numbers, and confuse
the current connection.

So what you do is, start at a sequence number higher than the
last one used in the old connection. Then there can't be any
confusion. But this requires remembering the last sequence number
used on every connection ever. And there are at least 2^48 addresses
allowed to connect to each port on the local machine. At 4 bytes
per sequence number, that's a Petabyte of storage...

Well, first of all, after 2*MSL, you can forget about it and use
whatever sequence number you like, because you know that there won't
be any old packets floating around to crash the party.

But still, it can be quite a burden on a busy web server. And you might
crash and lose all your notes. Do you want to have to wait 2*MSL before
rebooting?

So the TCP designers (I'm now on page 27 of RFC 793, if you want to follow
along) specified a time of day based ISN. If you use a clock to generate
an ISN which counts up faster than your network connection can send
data (and thus crank up its sequence numbers), you can be sure that your
ISN is always higher than the last one used by an old connection without
having to remember it explicitly.

RFC 793 specifies a 250,000 bytes/second counting rate. Most
implementations since Ethernet used a 1,000,000 byte/second counting
rate, which matches the capabilities of 10base5 and 10base2 quite well,
and is easy to get from the gettimeofday() call.

Note that there are two risks with this. First, if the connection actually
manages to go faster than the ISN clock, the next connection's ISN will
be in the middle of the space the earlier connection used. Fortunately,
the kind of links where significant routing delay appear are generally
slower ones where 1 Mbyte/sec is a not too unreasonable limit. Your
gigabit LAN isn't going to be delaying packets by seconds.

The second is that a connection will be made and do nothing for 4294
seconds until the ISN clock is about to wrap around and then start
doing packets "ahead of" the ISN clock. If it then closes the connection
and a new one opens, once again you have sequence number overlap.

If you read old networking papers, there were a bunch of proposals for
occasional sequence number renegotiation to solve this problem, but in the
end people decided to not worry about it, and it hasn't been a problem
in practice.

Anyway... fast forward out of the peace and love decade and welcome to
the modern Internet, with people *trying* to mess up TCP connections.
This kind of attack from within was, unfortunately, not one of the
scenarios that the initial Internet designers considered, and it's
been a bit of a problem since.

All this careful worry about packets left over from an old connection
randomly showing up and messing things up, when we have people *creating*
packets deliberately crafted to mess things up! A whole separate problem.
In particular, using the simple timer-based algorithm, I can connect to
a server, look at the ISN it offers me, and know that that's the same
ISN it's offering to other people connecting at the same time. So I
can create packets with a forged source address and approximately-valid
sequence numbers and bombard the connection with them, cutting off that
server's connection to some third party. Even if I can't see any of
the traffic on the connection.

So people sat down and did some thinking. How to deal with this?
Harder yet, how to deal with this without redesigning TCP from scratch?

Well, if a person wants to mess up their *own* connections, we can't
stop them. The fundamental problem is that an attacker A can figure
out the sequence numbers that machines B and C are using to talk to
each other. So we'd like to make the sequence numbers for every
connection unique and not related to the sequence numbers used on any
other connections. So A can talk to B and A can talk to C and still not
be able to figure out the sequence numbers that B and C are using between
themselves.

Fortunately, it is entirely possible to combine this with the clock-based
algorithm and get the best of both worlds! All we need is a random offset,
unique for every (address, port, address, port) quadruple, to add to
the clock value, and we have all of the clock-based guarantees preserved
within every pair of endpoints, but unrelated endpoints have their ISNs
at some unpredictable offset relative to each other.

And for generating such a random offset, we can use cryptography.
Keep a single secret key, and hash together the endpoint addresses,
and you can generate a random 32-bit ISN offset. Add that to the
current time, and everything is golden. A can connect to B and
see an ISN, but would need to do some serious cryptanalysis to
figure out what ISN B is using to talk to C.

Linux actually adds one refinement. For speed, it uses a very
stripped-down cryptographic hash function. To guard against that
being broken, it generates a new secret every 5 minutes. So an
attacker only has 5 minutes to break it.

The cryptographic offset is divided into 2 parts. The high 8 bits are
a sequence number, incremented every time the secret is regenerated.
The low 24 bits are produced by the hash. So 5 minutes after booting,
the secret offset changes from 0x00yyyyyy to 0x01zzzzzz. This is at
least +1, and at most +0x1ffffff. On average, the count is going up
by 2^24 = 16 million every 300 seconds. Which just adds a bit to the
basic "1 million per second" ISN timer.

The cost is that the per-connection part of the ISN offset is limited
to 24 and not 32 bits, but a cryptanalytic attack is pretty much
precluded by the every-5-minutes thing. The rekey time and the number of
really-unpredictable bits have to add up to not wrapping the ISN space
too fast. (Although the 24 bits could be increased to 28 bits without
quite doubling the ISN counting speed. Or 27 bits if you want plenty
of margin. Could I suggest that as an improvement?)

--- drivers/char/random.c 2004-12-04 09:24:19.000000000 +0000
+++ drivers/char/random.c 2005-01-29 07:20:37.000000000 +0000
@@ -2183,26 +2183,26 @@
#define REKEY_INTERVAL (300*HZ)
/*
* Bit layout of the tcp sequence numbers (before adding current time):
- * bit 24-31: increased after every key exchange
- * bit 0-23: hash(source,dest)
+ * bit 27-31: increased after every key exchange
+ * bit 0-26: hash(source,dest)
*
* The implementation is similar to the algorithm described
* in the Appendix of RFC 1185, except that
* - it uses a 1 MHz clock instead of a 250 kHz clock
* - it performs a rekey every 5 minutes, which is equivalent
- * to a (source,dest) tulple dependent forward jump of the
+ * to a (source,dest) tuple dependent forward jump of the
* clock by 0..2^(HASH_BITS+1)
*
- * Thus the average ISN wraparound time is 68 minutes instead of
- * 4.55 hours.
+ * Thus the average ISN wraparound time is 49 minutes instead of
+ * 4.77 hours.
*
* SMP cleanup and lock avoidance with poor man's RCU.
* Manfred Spraul [email blocked]
*
*/
-#define COUNT_BITS 8
+#define COUNT_BITS 5
#define COUNT_MASK ( (1<<COUNT_BITS)-1 )
-#define HASH_BITS 24
+#define HASH_BITS 27
#define HASH_MASK ( (1<<HASH_BITS)-1 )
static struct keydata {

Anyway, in comparison, the algorithm in your patch (and presumably
OpenBSD, although I haven't personally compared it) uses a clock
offset generated fresh for each connection. There's a global counter
(tcp_rndiss_cnt; I notice you don't have any SMP locking on it) which
is incremented every time an ISN is needed. It's rekeyed periodically,
and the high bit (tcp_rndiss_msb) of the delta is used like the COUNT_BITS
in the Linux algorithm.

The ISN is generated as the high sequence bit, then 15 bits of encrypted
count (with some homebrew cipher I don't recognize), then a constant
zero bit (am I reading that right?), then the 15 low-order bits are
purely random.

It's a slightly different algorithm, but it does a very similar function.
The main downsides are that the sequence number can easily go backwards
(there's no guarantee that consecutive calls will return increasing
numbers since tcp_rndiss_encrypt scrambles the high 15 bits), and
that it's not SMP-safe. Two processors could read and use the same
tcp_rndiss_cnt value at the same time, or (more likely) both call
tcp_rndiss_init at the same time and end up toggling tcp_rndiss_msb twice,
thereby destroying the no-rollback property it's trying to achieve.

Oh, and the single sequence bit in the offsets means that the
TCP ISN will wrap around very fast. Every 10 minutes, or every
60000 TCP connections, whichever comes first.

Regarding the first issue, it's possible that the OpenBSD network stack
takes care to remember all connections for 2*MSL and continues the
sequence number if the endpoints are reused, thereby avoiding a call to
ip_randomisn entirely.

But the second deserves some attention. The Linux code takes some care
to avoid having to lock anything in the common case. The global count
makes that difficult.
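To make the idea concrete, here is a rough, illustrative sketch in Java of the "clock plus keyed hash of the endpoints" scheme the mail describes. It is not the Linux implementation (which uses half_md4_transform, as noted above); the hash function, clock granularity and bit split are simplified stand-ins.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

public class IsnSketch {
    private static final long REKEY_INTERVAL_MS = 5 * 60 * 1000L;
    private static final SecureRandom RNG = new SecureRandom();

    private final byte[] secret = new byte[16];
    private long lastRekey = 0;
    private int keyCount = 0;                 // bumped on every rekey, occupies the top bits

    public synchronized int nextIsn(byte[] srcAddr, int srcPort,
                                    byte[] dstAddr, int dstPort) {
        long now = System.currentTimeMillis();
        if (now - lastRekey > REKEY_INTERVAL_MS) {
            RNG.nextBytes(secret);            // fresh secret every 5 minutes
            keyCount = (keyCount + 1) & 0xff; // the 8-bit "count" part of the offset
            lastRekey = now;
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(srcAddr);
            md.update(dstAddr);
            md.update(new byte[] { (byte) (srcPort >> 8), (byte) srcPort,
                                   (byte) (dstPort >> 8), (byte) dstPort });
            md.update(secret);
            byte[] h = md.digest();
            // low 24 bits: keyed hash of the (addr, port, addr, port) quadruple
            int hash24 = ((h[0] & 0xff) << 16) | ((h[1] & 0xff) << 8) | (h[2] & 0xff);
            int offset = (keyCount << 24) | hash24;
            // roughly a 1 MHz ISN clock, as in the classic scheme; 32-bit wraparound is intended
            int clock = (int) (now * 1000L);
            return offset + clock;
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }
}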

Wednesday, February 02, 2005

[Debian] Linux Journal: compiling javahl for subclipse

[Debian] Linux Journal: compiling javahl for subclipse:
"$ apt-get install libtool autoconf g++ libapr0-dev libneon24-dev
$ tar zxvf subversion-1.1.1.tar.gz
$ cd subversion-1.1.1
$ ./autogen.sh
$ export JAVA_HOME=/usr/lib/j2se/1.4/
$ ./configure --enable-javahl --with-jdk=$JAVA_HOME --with-jikes=$JAVA_HOME/bin/javac
$ make
$ mkdir subversion/bindings/java/javahl/classes  # have problems finding DirEntry.class otherwise
$ make javahl
$ make install-javahl"

Web: Page Containing Non-Secure Item?

Page Containing Non-Secure Item?: "Page Containing Non-Secure Item"

If you see a warning about secure / non-secure items, please see the link above.

Tuesday, February 01, 2005

"Difference between Workflow and Rule engines"

: "Difference between Workflow and Rule engines

There are some blurry lines here. My quick answer is:

Workflow - typically a flow of information and actions associated with people. This term became popular 15-20 years ago with things like expense report approvals, or imaging systems that would route things like credit card receipts and problems to different people to take action on rather than sending paper around with a sign-off sheet on top.

BPM - Business Process Management. This term became popular over the past 5-10 years and typically refers to a combination of people and system oriented processes. So say someone approves an expense report, but then it kicks off a series of actions in several systems like payroll and general ledger.

BPM Engine - keeps track of the state of various items as they pass thru a graph of tasks and actions. Think of drawing a typical flow chart where the BPM engine keeps track of each item flowing thru the flow chart. Go to http://www.jbpm.org for some good overview docs and pointers.

Rules Engine - allows for complex set of rules to be applied to a complex set of data to make decisions. From the http://www.drools.org web site: 'Rule Engines and expert systems are a great way to collect complex decision-making logic and work with data sets too large for humans to effectively use. Rule engines can make decisions based on hundreds of thousands of facts, quickly, reliably and repeatedly.'"
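As a toy illustration of the rules-engine idea only (condition/action rules matched repeatedly against a working set of facts), not the Drools or jBPM API:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Naive forward-chaining loop: fire each rule at most once per fact and repeat
// until nothing new matches. Real engines such as Drools add a Rete network,
// conflict resolution and a rule language on top of this basic idea.
public class TinyRuleEngine {
    public interface Rule {
        boolean matches(Object fact);
        void fire(Object fact, List<Object> facts);   // may assert new facts
    }

    public static void run(List<Object> facts, List<Rule> rules) {
        Set<String> alreadyFired = new HashSet<String>();
        boolean fired = true;
        while (fired) {
            fired = false;
            for (int r = 0; r < rules.size(); r++) {
                Rule rule = rules.get(r);
                // iterate over a snapshot so a rule can safely add facts while firing
                for (Object fact : new ArrayList<Object>(facts)) {
                    String key = r + "#" + System.identityHashCode(fact);
                    if (!alreadyFired.contains(key) && rule.matches(fact)) {
                        alreadyFired.add(key);
                        rule.fire(fact, facts);
                        fired = true;
                    }
                }
            }
        }
    }
}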