江端さんの忘備録(2020-04-07)

2020-04-07 「ITエンジニアではない元東京都知事は、詳しく知らないで発言したのだろう」と思っています。 [長年日記]

『1億人ぐらいのアンケートデータをSNSで収集して、DB(データベース)につっこんで、集計する程度のシステムなら、私の自宅のパソコンでも作れる』ということを、

"A system that merely collects survey responses from about 100 million people over SNS, loads them into a DB (database), and aggregates them can be built even on my PC at home."

「ITエンジニアではない元東京都知事は、詳しく知らないで発言したのだろう」と思っています。

I think the former Governor of Tokyo, who is not an IT engineer, spoke without knowing this in any detail.

多分、この程度のシステム構築なら、1日あればできると思います。

I think a system of this scale could probably be built in a single day.

1億人程度のレコードの記録など、スクリプト言語で処理できますし、データ集計も一人のオペレータが、数分あれば完了する程度です。

Recording about 100 million records can be handled in a scripting language, and a single operator can finish the aggregation in a few minutes.
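
As a rough illustration of this claim, here is a minimal Python sketch using the standard-library sqlite3 module. The table name `answers`, its columns, and the synthetic data are assumptions made for this example, not the schema of the actual LINE survey. The row count is kept small so that it runs quickly; going toward 100 million rows would call for a file-backed database and correspondingly more time.

```python
import random
import sqlite3


def build_survey_db(n_rows, seed=0):
    """Create an in-memory DB and load n_rows synthetic survey answers."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE answers (prefecture INTEGER, has_fever INTEGER)"
    )
    # Hypothetical answer format: prefecture code 1..47 and a yes/no flag.
    rows = ((rng.randint(1, 47), rng.randint(0, 1)) for _ in range(n_rows))
    conn.executemany("INSERT INTO answers VALUES (?, ?)", rows)
    conn.commit()
    return conn


def aggregate(conn):
    """Return (prefecture, respondents, fever_count) for each prefecture."""
    cur = conn.execute(
        "SELECT prefecture, COUNT(*), SUM(has_fever) "
        "FROM answers GROUP BY prefecture ORDER BY prefecture"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_survey_db(100_000)
    for prefecture, respondents, fever in aggregate(conn)[:5]:
        print(prefecture, respondents, fever)
```

The whole "collect, insert, aggregate" flow is one bulk insert and one GROUP BY query, which is why a single operator with a script can plausibly handle it.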

-----

今回、厚生労働省が実施した、

The survey that the Ministry of Health, Labour and Welfare carried out this time,

『無料通信アプリのLINE(ライン)を使った、新型コロナウイルス対策のために厚生労働省と協力して実施した全国アンケート』

"A nationwide survey, conducted in cooperation with the Ministry of Health, Labour and Welfare over the free messaging app LINE, to support measures against the novel coronavirus"

は「珍しく大あたりのシステム」と思っています。

is, I think, a rare "jackpot" of a system.

なぜなら、

This is because the system

■利用者は勿論、システム管理側も、早く、安く、簡単にデータ処理できる

lets not only the users but also the system operators handle the data quickly, cheaply, and easily.

■無記名、匿名性の担保

guarantees that answers are unsigned and anonymous.

■有意な回答が集めやすい(興味のない人間は、誤答を入力する手間をかけるくらいなら、無視するだろう)

makes it easy to collect meaningful answers (people with no interest will simply ignore the survey rather than go to the trouble of entering false answers).

からです。

こんなアホみたいな低コストで、数千万人の単位のデータが収集できる ―― こんな「美味しいシステム」滅多にありません。

At such a ridiculously low cost, it can collect data from tens of millions of people. A "tasty system" like this is rare.

-----

ちなみに、私、3億人分のシミュレーションを、自宅のパソコンで、連続1万回やったことありますけど、その時は、14時間程度かかりました。

By the way, I ran a simulation of 300 million people on my home computer 10,000 times in a row, and it took about 14 hours.

つまり、1回あたり約5秒、1億人分のデータ処理は2秒以内で完了しているということです。

In other words, at roughly 5 seconds per run, data processing for 100 million people completes in under 2 seconds.
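
Taking the stated figures at face value (300 million people per run, 10,000 runs, about 14 hours in total), the per-run time can be checked with a short Python sketch. The function names are made up for this example, and the per-100-million figure assumes the cost scales linearly with the number of people, which the entry does not state.

```python
def seconds_per_run(total_hours, runs):
    """Average wall-clock seconds for one simulation run."""
    return total_hours * 3600 / runs


def seconds_per_100m(total_hours, runs, people_per_run):
    """Average seconds per 100 million people, assuming linear scaling."""
    return seconds_per_run(total_hours, runs) * 100_000_000 / people_per_run


if __name__ == "__main__":
    per_run = seconds_per_run(14, 10_000)
    per_100m = seconds_per_100m(14, 10_000, 300_000_000)
    print(f"{per_run:.2f} s per run, {per_100m:.2f} s per 100 million people")
```

With these inputs, 14 hours is 50,400 seconds, so one run of 300 million people averages about 5.04 seconds, or about 1.68 seconds per 100 million people.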

シミュレーションによるメモリ書込みを、DB書込みと同程度と見なせるかと問われれば、最近のDBの書込みは、メモリ書込みよりも早いことがあります(本当)

If someone asks me whether the memory writes in a simulation can be treated as comparable to DB writes, I can say that DB writes these days are sometimes faster than memory writes (really).

今回のLINEによる、1億人分のDBによる書込みや検索は、CPUトータル時間なら、数秒程度だったかもしれません。

The DB writes and searches for 100 million people in this LINE survey may have taken only a few seconds in terms of total CPU time.

-----

この元東京都知事は、データ集計に「数十人がデータ結果を印刷したり、電卓叩いたり」と、そんな作業をしているイメージを持ってしまったのかもしれません。

The former Governor of Tokyo may have pictured data aggregation as work in which "dozens of people print out the results and punch numbers into calculators."

しかし、そのイメージは正しくありません。

But that image is not correct.

『集計オペレーションと資料作成は、1人のエンジニアが、数時間以内に完了していた』

"The aggregation and the preparation of the materials were completed by a single engineer within a few hours."

と推認できます。

That is what I infer.

なぜなら、私程度のエンジニアでも、1億行のSQL処理くらい、この時間内で完了できるからです。

That's because even an engineer of my level can finish SQL processing of about 100 million rows within that time.

最近の、ビッグデータやらSNSのメッセージ処理のパフォーマンスは、現場にいる私ですら、時々、驚かされるくらいです。

The performance of big data and SNS message processing these days is enough to surprise even me from time to time, and I work in the field.

ですから、彼の発言は、仕方がなかったかもしれません。

So perhaps his remark was understandable.

-----

それでも、自分の得意分野でないことについての発言は、少し時間をかけて自力で調べるか、有識者に尋ねてから発言された方がいいと思います。

Nevertheless, when commenting on something outside one's own field, I think it is better to take a little time to look into it oneself, or to ask an expert, before speaking.

何かや誰かを批判する時は、特に留意すべきでしょう。

We should be especially careful when criticizing something or someone.

自戒を込めて、この日記を記載してます。

I am writing this entry as a reminder to myself as well.
