Enterprise Data Integration · ETL Platform
Your Data Is a Mess. Let's Fix It.
"Your IT team spent three months and still can't finish the migration. Reports show different numbers on every run, because no one knows which system has the right data."
If this sounds familiar, you're not alone. This is exactly the kind of problem we solve.
⚙️ We're not selling software. We're the team that does the cleanup work for you. When we're done, the clean database is handed over to you in full: no subscription, no login.
What is PhantomDecomp? In one sentence: we clean up the data you already have inside your company.
We unify data scattered across your internal systems (SQL Server / MariaDB / Excel / legacy systems) into one clean dataset. If you need to look up external public data (for example, parcel numbers or company registrations), that is our separate product, CloakFetch.
Sound Familiar?
Four Systems, Four Versions of the Truth
Sales uses one system, finance uses another, and customer service has yet another. At every meeting, everyone brings different numbers. The boss asks "which one is right?" and nobody dares answer.
IT Says Six Months and NT$2 Million
You just want to move data from the old system to the new one. Sounds simple, right? Then IT says "the formats differ, there are encoding problems, the fields don't line up" and quotes an astronomical figure you can't make sense of.
Migration Done, Data Ruined
Addresses turned into garbled text, phone numbers lost their area codes, and one person became three records. After six months of migration, the new system's reports are less accurate than the old ones. You wish you had never migrated.
Three Steps. Chaos to Clarity.
Tell Me Your Situation
How many systems do you have? Roughly how much data? Which reports are inaccurate? No documents needed. Just tell me.
We Clean and Unify Your Data
Address correction, phone number completion, duplicate merging, format unification. You don't need to understand the technology. We handle it.
You Get One Clean Database
Data from all your systems is brought together, so everyone sees the same numbers. Reports you can finally trust.
- ✓ Clean integrated database (in the format you specify: SQL Server / MySQL / Excel / DuckDB)
- ✓ Audit log (every record's changes are traceable)
- ✓ Maintenance recommendation document
We have used this method to integrate nearly 500 million records in internal projects. Not theory: work actually done.
The figures above come from our internal validation work. On your project, final quality depends on the condition of your source data. The free 48-hour assessment report will include concrete estimates specific to your data.
What We Actually Do for You
Six modules for the six most common data problems
Bring All Your Systems Together
Wherever your data lives (legacy systems, new systems, Excel files), we bring it all into one place without losing anything along the way. If something goes wrong midway, the job resumes from the last checkpoint instead of starting over.
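The checkpoint-and-resume behavior described above can be illustrated with a minimal sketch. It assumes the progress marker is a small JSON file holding the next row offset; the function name `load_in_batches`, the `next_offset` key, and the `write_batch` callback are hypothetical names chosen for illustration, not the actual PhantomDecomp implementation.

```python
import json
import os
from typing import Callable, Sequence


def load_in_batches(
    rows: Sequence[dict],
    write_batch: Callable[[Sequence[dict]], None],
    checkpoint_path: str,
    batch_size: int = 1000,
) -> int:
    """Write rows in batches and record progress so a rerun can resume."""
    # Pick up the offset saved by a previous (possibly interrupted) run.
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as fh:
            done = json.load(fh)["next_offset"]

    for offset in range(done, len(rows), batch_size):
        batch = rows[offset : offset + batch_size]
        write_batch(batch)
        done = offset + len(batch)
        # Persist progress only after the batch is written. Writing to a
        # temporary file and renaming it avoids a half-written checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump({"next_offset": done}, fh)
        os.replace(tmp_path, checkpoint_path)
    return done
```

Because the checkpoint is saved after each write, a crash between the write and the save would replay that one batch on the next run, so the target write should be idempotent (for example, an upsert keyed on a record ID).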
Fix Data That Looks Right but Isn't
In your customer list, some national IDs are all uppercase while others mix in lowercase letters, some birth dates use CE years while others use ROC (Minguo) years, and the gender field holds "男 / M / 1 / 先生" (four ways of writing "male"). We unify all of it into one clean format.
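As a rough sketch of what this kind of normalization can look like: the functions below uppercase IDs, map gender variants to one code, and convert ROC years to CE by adding 1911. The function names, the female entries in `GENDER_MAP`, and the rule "a year below 1911 is an ROC year" are assumptions made for illustration. The ID check covers only the letter-plus-nine-digits shape, not the checksum.

```python
import re
import unicodedata
from datetime import date
from typing import Optional

# The male variants come from the example above; the female variants are
# assumed counterparts added for illustration.
GENDER_MAP = {
    "男": "M", "m": "M", "1": "M", "先生": "M",
    "女": "F", "f": "F", "2": "F", "小姐": "F", "女士": "F",
}


def normalize_national_id(raw: str) -> Optional[str]:
    """Fold full-width characters, trim, uppercase, and check the shape."""
    text = unicodedata.normalize("NFKC", raw).strip().upper()
    return text if re.fullmatch(r"[A-Z][0-9]{9}", text) else None


def normalize_gender(raw: str) -> Optional[str]:
    """Map the known spellings to 'M' or 'F'; unknown values become None."""
    return GENDER_MAP.get(raw.strip().lower())


def normalize_birth_date(raw: str) -> Optional[date]:
    """Parse year/month/day; years below 1911 are assumed to be ROC years."""
    parts = [int(p) for p in re.findall(r"\d+", raw)]
    if len(parts) != 3:
        return None
    year, month, day = parts
    if year < 1911:
        year += 1911  # ROC year 1 corresponds to 1912 CE
    try:
        return date(year, month, day)
    except ValueError:
        return None
```

Returning `None` for values that cannot be interpreted keeps questionable records visible for review instead of silently guessing.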
Make Every Address Actually Findable
Addresses in old records often still say "台北縣" (Taipei County, which became New Taipei City in 2010), mix full-width and half-width characters, omit lane and alley numbers, or read like handwritten notes. We repair every address until it can be matched, mailed to, and understood by your sales team.
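Two of the fixes mentioned above, folding full-width characters and updating a legacy county name, can be sketched as follows. The lookup table holds only the Taipei County example from the text, and the rule that turns the former township suffix (市 / 鎮 / 鄉) into 區 is a simplification; a real table would need to cover every reorganized area. `clean_address` is a hypothetical name, and this is not the five-stage pipeline itself.

```python
import re
import unicodedata

# Illustrative only: one legacy county, in both common spellings.
LEGACY_COUNTIES = {"台北縣": "新北市", "臺北縣": "新北市"}

# Up to three characters of a township name followed by its old suffix.
_TOWNSHIP = re.compile(r"^([^市鎮鄉區]{1,3})[市鎮鄉]")


def clean_address(raw: str) -> str:
    """Fold character widths, drop whitespace, and update the county name."""
    # NFKC turns full-width digits, letters, and spaces into half-width.
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"\s+", "", text)
    for old, new in LEGACY_COUNTIES.items():
        if text.startswith(old):
            rest = text[len(old):]
            # Townships under the old county became districts (區).
            rest = _TOWNSHIP.sub(r"\1區", rest, count=1)
            return new + rest
    return text
```

Addresses that already use the current name pass through unchanged apart from width folding, so the function can be applied to a mixed column safely.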
Find Hidden Connections Between Customers (Family Members, Shareholders, Related Parties)
Members of the same household, joint shareholders, spouses, inheritance relationships: these links are hidden in your data, and matching them by hand, record by record, takes forever. We build these links automatically from your data, so asset investigations, risk assessments, and customer analysis can see the full relationship network at a glance.
Turn Scattered Land and Building Registration Records Into a Complete Title Map
Land registrations, building registrations, and owner records are scattered across separate files. We link them into one complete title map, with every co-owner split out and every parcel tied to actual people. Title tracing, inheritance derivation, and asset investigations no longer require dizzying manual cross-referencing.
Make Every Phone Number Actually Work
Area codes are missing, mobile and landline numbers share one column, and some entries are only extension numbers, so customer service discovers mid-call that the number is wrong. After cleanup, every phone number is in a format that can be dialed directly.
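A minimal sketch of this cleanup for Taiwanese numbers is shown below. It assumes that "directly dialable" means E.164 (`+886...`), that mobile numbers have the form `09` plus eight digits, and that a missing area code can be supplied by the caller. The names `Phone`, `normalize_phone`, and `default_area_code` are hypothetical, and the landline length check is a rough heuristic, not a full numbering-plan validation.

```python
import re
import unicodedata
from typing import NamedTuple, Optional


class Phone(NamedTuple):
    kind: str                 # "mobile" or "landline"
    e164: str                 # for example "+886912345678"
    extension: Optional[str]  # digits after "#", "ext", or "分機"


_EXT = re.compile(r"(?:#|ext\.?|分機)\s*(\d+)\s*$", re.IGNORECASE)


def normalize_phone(
    raw: str, default_area_code: Optional[str] = None
) -> Optional[Phone]:
    """Split off the extension, classify the number, and emit E.164."""
    text = unicodedata.normalize("NFKC", raw).strip()
    extension = None
    match = _EXT.search(text)
    if match:
        extension = match.group(1)
        text = text[: match.start()]
    digits = re.sub(r"\D", "", text)
    if digits.startswith("886"):
        digits = "0" + digits[3:]  # convert +886 form back to domestic form
    if re.fullmatch(r"09\d{8}", digits):
        return Phone("mobile", "+886" + digits[1:], extension)
    if not digits.startswith("0"):
        if default_area_code is None:
            return None  # area code missing and nothing to fill it with
        digits = default_area_code + digits
    if 9 <= len(digits) <= 10:
        return Phone("landline", "+886" + digits[1:], extension)
    return None
```

Numbers that return `None` are the ones worth flagging for review, for example entries that contain only an extension.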
Situations You Might Recognize
Three real situations we've handled or heard about. See which one looks most like yours.
Two Companies Merged and the Customer Lists Won't Match
A business owner acquired a smaller company and found that the two customer lists fell into chaos as soon as they were combined. The same customer was treated as different people because addresses and phone numbers were written in different formats. The result: duplicate direct mail, duplicate invoices, and customer service calls that reached the other company. We spent 3 weeks reconciling the two sides and delivered one consolidated customer master list with a single record per customer.
20 Years of Unmaintained Customer Data, Nearly Half the Direct Mail Returned
A marketing department sent 50,000 direct-mail pieces, and 20,000 (40%) came back as undeliverable. The addresses still used place names from 20 years ago: "Taipei County", road names from before mergers, renumbered house numbers. We mapped them to the current administrative divisions, updated house numbers automatically, and filled in missing lane and alley details, bringing undeliverable addresses down to 5%.
Three Reports, Three Different Numbers, and the Boss Wants the Truth
Sales says there are 120,000 customers, customer service says 80,000, and finance says 150,000. At every meeting the boss cannot tell which figure is real. We combined the data from all three systems, deduplicated it, aligned it, and fixed the formats, and found there were actually 97,000 unique customers. Since then the boss only needs one report.
The cases above are de-identified; figures and timelines have been adjusted to the actual situations.
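The third scenario comes down to counting distinct customers across systems after their keys are normalized. A minimal sketch, assuming every system exposes a national ID under a hypothetical `national_id` field (real matching usually also has to fall back on names, addresses, and phone numbers):

```python
from typing import Dict, Iterable


def count_unique_customers(*systems: Iterable[Dict[str, str]]) -> int:
    """Count distinct customers across systems by normalized national ID."""
    seen = set()
    for records in systems:
        for record in records:
            # Normalize before comparing so "a123..." and "A123... " match.
            seen.add(record["national_id"].strip().upper())
    return len(seen)


# Hypothetical sample data: five rows, three distinct people.
sales = [{"national_id": "a123456789"}, {"national_id": "B223456789"}]
service = [{"national_id": "A123456789 "}]
finance = [{"national_id": "b223456789"}, {"national_id": "C123456789"}]
unique_total = count_unique_customers(sales, service, finance)
```

Without the normalization step each system's spelling would count as a separate customer, which is one way three departments end up with three different totals.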
Real Case Study
54,000 Garbled Place Names Repaired, Delivered in 1 Week
The Problem
When the client exported data from a legacy system installed 10 years ago, large numbers of place names came out as "??" garbage. Sales staff could not tell which location a record referred to, so customer data could not be grouped by region and marketing could not segment its lists.
What We Did
We matched the records against 17,830 official land section names nationwide and ran two tracks together, automatic inference plus manual spot-check verification, to restore the damaged place names one by one to their original township, district, or road name. Entries that could not be determined were flagged for the client to confirm.
Result
- ✓ 54,000 place names repaired
- ✓ 46 undeterminable entries flagged
- ✓ Timeline: 1 week
"We originally planned to rebuild the records from scratch, which would have taken six months. Then we found the data could be recovered programmatically, leaving only 46 entries that needed human judgment. We can handle those 46 ourselves."
Head of the client's data department (anonymized)
ETL Technical Architecture
End-to-end data flow from heterogeneous sources to unified analytics
- BCP high-speed export
- JSON checkpoint
- Batch parallel load
- ID / Gender / DOB normalization
- Address 5-stage clean
- Phone normalization
- Kinship graph build
- Spouse cross-match
- Owner decomposition
Your Data Deserves to Be Taken Seriously
"Your data problems aren't your fault. These systems were simply never designed to work together."
PhantomDecomp is a project-based service. We plan a tailored integration for your data volume, source formats, and governance goals.
Typical Price Range
NT$150K-800K
Quoted by data volume, number of systems, and cleaning complexity. That is 60-90% less than traditional ETL vendors (NT$2M+).
⛔ No per-seat fees, no monthly fees, no server hosting fees. Once the work is delivered, the engagement is closed.
Typical Timeline
2-8 Weeks
Small projects take 2 weeks; complex multi-system projects take 8 weeks. That is roughly 3-12x faster than a traditional 6-month project.
Tell me how many databases you have and how many records they hold, and within 48 hours you'll get a preliminary integration assessment report, free of charge.
No documents to prepare. One phone call is enough to get started.
Book a Free 30-Minute Consultation
We'll learn about your situation and give you concrete advice
Pick a Time
FAQ
How is PhantomDecomp different from CloakFetch?
CloakFetch "looks up outside data": it queries public Taiwanese company information through an API. PhantomDecomp "organizes your own data": it cleans, normalizes, and integrates the records scattered across your internal systems (SQL Server, MariaDB, CSV, and others) into one clean database.
Do I need to buy expensive servers?
No. We have processed nearly 500 million records on a single ordinary computer. You don't need to buy a cluster or move to the cloud. Your existing equipment is usually enough.
Is my data safe during integration?
Yes. All processing takes place inside your environment, and your data never leaves your network. We can process encrypted databases directly, and checkpoint-and-resume means an interruption does not lose results that were already processed.
For Technical Teams: Real Processing Numbers
The figures below come from validation on real projects (single machine, 32 GB RAM):
- Total integrated volume: 489M records / 25.75 GB (after DuckDB compression)
- SQL Server BCP export: 287M rows in 24 minutes (about 200,000 rows per second)
- MariaDB import speedup: 56x
- Encoding repair: 290M \x00 (NUL) characters removed, 0% data loss
- Address coverage: 99.5%; address health score: 90.3%
- National ID hit rate: 92.5%
Stack: Python + pyodbc + DuckDB + Next.js
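The \x00 cleanup listed above can be pictured with a small sketch: strip NUL characters from every text field and count how many were removed, while leaving row counts and non-text values untouched. The function name `strip_nul_chars` and the dict-per-row layout are assumptions for illustration, not the actual pipeline code.

```python
from typing import Dict, Iterable, List, Tuple


def strip_nul_chars(
    rows: Iterable[Dict[str, object]]
) -> Tuple[List[Dict[str, object]], int]:
    """Remove NUL characters from string fields and count the removals."""
    cleaned: List[Dict[str, object]] = []
    removed = 0
    for row in rows:
        new_row: Dict[str, object] = {}
        for key, value in row.items():
            if isinstance(value, str):
                removed += value.count("\x00")
                value = value.replace("\x00", "")
            new_row[key] = value  # non-text values pass through unchanged
        cleaned.append(new_row)
    return cleaned, removed
```

Comparing the number of rows before and after is a simple way to confirm that the cleanup dropped characters and not records.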
System Architecture: 4-Layer Data Flow
The complete end-to-end architecture, from heterogeneous sources to unified analytics:
① Data Sources
SQL Server (TDE) · MariaDB · CSV/Flat · Legacy DB
▼
② Processing Engines
- Ingestion engine: BCP high-speed export + JSON checkpoint resume
- Governance engine: ID / gender / DOB normalization, 5-stage address cleanup, phone normalization
- Relationship engine: kinship graph construction, spouse cross-matching, owner decomposition
▼
③ Analytical Database
DuckDB · 489M rows / 25.75 GB / 32 GB RAM
▼
④ Application Layer
Next.js · Query UI / Relationship Graph / Statistical Reports / Data Export
What's included in the free 48-hour assessment?
We analyze your existing data sources, formats, and field quality, then produce a concrete integration proposal covering estimated duration, recommended cleaning rules, and expected results. You don't need to provide the full dataset. A sample is enough.