|
What is Data
INTRODUCTION
Data is
the life-blood of every business, but why?
Simply because
customer knowledge lies at the heart of every organisation and used correctly
can make sales, marketing, business and customer development more productive.
This
intelligence however is seldom visible or held in a format that is easy to
interpret and apply. Very often it also has to be derived from some kind of
data transformation or analysis process.
Therefore any
company or organisation that wants to benefit from its in-house intelligence
needs to look to its data.
The following gives a very
brief overview as to some of the basics but if you want further explanation
please give Tech4T a call on +44 (0)1733 890790 and we will give you some free
advice.
WHAT DOES DATA LOOK LIKE?
Data is information
that has been captured and then stored electronically in a variety of ways and
formats. e.g. information (data) captured from:
-
Data
entry screens when processing orders
-
From
Customer Relationship Management (CRM) incorporating sales lead systems
-
Accounts
-
Web-based systems
-
Disparate customer or acquisition databases
-
Technical
support systems
-
Electronic
points of sale (tills that capture customer information)
-
Market
research surveys
-
Spreadsheets
-
Mailing
lists, etc.
HOW IS IT
STORED - TERMINOLOGY
Data is
stored in files as rows and columns; these files are often referred to
as tables or lists.
· A
row is usually referred to as a record or
case (in
statistical terms).
A row could represent a single customer record or a sales transaction - all
information about a single customer or a single sales transaction in a single
row
· A
column is usually referred to as a
field
or a
variable (in
statistical terms).
A column could represent a single piece of information within a row (record).
For example, ‘Title’, ‘FirstName’, ‘Surname’, ‘Address Line 1’, etc., are all
fields
Where data is stored as
text (ASCII), columns are either separated by either a delimiter or are aligned
in fixed positions, known as a fixed field format.
A delimiter is usually a
tab, comma, pipe '|', right square bracket ']' and to avoid columns being
incorrectly split, text - such as address lines - that may contain a character
used as a delimiter, are enclosed with
quotes. Also a
header record is often included which describes the data.
Below is an example
data file of
a comma separated file
with
surrounding quotes and a
header record:
|
"URN","Name","Address1","Address2","Town","Postcode" |
|
"1","Joe Brown","Flat 3,
The White Cottage","Oscar Road","Newtown","BD1
4LW" |
|
"2","Heny
Smith","11
Chestnut Drive","","Peterborough","PE1
2UT" |
Other file
formats are unique to the software being used to store the data and can usually
be identified by the file extension. Text (ASCII) files typically have the
extension .txt or .dat or .tab or .csv. dBASE files
use .dbf, MS Excel .xls, SPSS .sav, MS Access .mdb
and so on.
The
information stored in the columns and rows can be held in varying formats - the
three most common being ‘Text’, ‘Values’ and ‘Dates’.
Often constraints are
applied when information is captured to ensure data is always stored in the
correct format. e.g. forcing UPPER case when capturing a Postcode; Proper case
(upper and lower casing) on names and addresses; forcing the correct number of
decimal places for numeric data such as sales values; selecting items to enter
from a pick list (look-up file or list) to ensure only allowed items can be
added such as a list of products, media source codes, gender descriptions, etc.
The UK Postcode has its own
unique structure and in itself is a powerful marketing, analysis and selection
source.
POSTCODES AND POSTAL
GEOGRAPHY
The Royal Mail maintains a UK-wide
system of postcodes to identify postal delivery areas. As most people know
their postcode, varying organisations use this as their main geographic
reference when collecting data. This reference can be related to any geographic
unit used for statistical production, such as a district or electoral ward. In
addition, when de-duplicating customer and prospect name and address records,
the Postcode can be used to help qualify the match process. Postal geography is
thus very valuable.
The UK Postcode
has 5 components. 4 are visible for addressing purposes as set out below, the
5th - delivery point suffix (or address key) is used by the Royal Mail to
define a unique address delivery point.
Postcode structure
Postcodes are alphanumeric
references comprising an outward code of 2-4 characters and an inward code of 3
characters. For example:
PO16 7DZ
PO16 = outward code 7DZ = inward code
The postcode is structured
hierarchically, supporting 4 levels of geographic unit:
As of September 2004, the breakdown
is as follows:
|
Geographic
unit |
Number in UK |
Approx households per unit |
| PO - Postcode
area (one or two characters) |
124 |
221,774 |
PO16 - Postcode
district (one or two numbers)
NB. Sometimes presented as a number followed by
a single character in London for example - EC1W |
2,934 |
9,373 |
| PO16 7 -
Postcode sector |
9,903 |
2,777 |
| PO16 7DZ - Unit
postcode |
1.76 million
approx |
15.63 |
These 1.76 million postcodes cover
over 27.5 million delivery points and comprise 1.55 million small user and 0.21
million large user postcodes (see below).
Unit postcodes
Unit postcodes are the base unit of
postal geography and fall into two types:
Large user postcodes: allocated to
single addresses receiving at least 500 mail items per day (e.g. business
addresses).
Small user postcodes: collections of
(usually) adjacent addresses. A single small user postcode may contain up to
100 addresses, but 15 is a more typical number.
Note: It is possible for large
buildings with many separate delivery points (e.g. a tower block) to have more
than one unit postcode within the building.
Postcode structure recoding
As postcode components can vary in length, to
enable appropriate alignment for sorting (for de-duplication or data analysis)
a fixed structure is preferred. One option is as follows:
Left justify
Postcode area (one or two characters) in positions 1 and 2
Right justify
postcode district (1 or 2 digits or 1 digit and a letter for London) in
position 3 and 4
Sector (single
digit) goes in position 5
Unit, 2 letters,
go in position 6 and 7
B1 2NZ becomes B__12NZ
B11 3NQ becomes B_113NQ
DATABASE
A
database comprises a collection of files (or tables) that are linked in a
particular way to eliminate repeating data and facilitate extraction of
selected information to form a particular customised ‘view’ or to create a new
file containing just the selected fields and data.
Where multiple databases
need to be incorporated in a company wide information system, a large database
is created to act as a central store - taking its data feeds from company-wide
operational databases. This is termed a data warehouse. A
subset of this, designed for a particular department or a specific application, is termed a data mart.
The
usual database structure is termed a relational (or normalised) database where,
for example, an address for a company is held only once in one table (or file),
and all contacts within that company (stored in a separate table) relate to
that address record.
The example above shows a few
fields from a customer record (one record per customer) linked to multiple
sales transactions.
The two
tables link on what is termed a URN - unique reference number. This is the main
way a record (or case) is identified - by its URN.
For
analytics and fast data interrogation, however, (to find, for example, all
customers who spent over £1000 last month) the less tables that need to be
interrogated will result in a faster result.
Therefore data files need to be combined in
as few tables as possible with summarised and repeated data, This is also how
data is best presented for statistical analysis.
This process,
termed de-normalisation, can also include file flattening (reducing for
example, a number of sales transactions into a single customer summarised
record).
Technologies4Targeting usually de-normalise data for analysis and
targeting.
USING YOUR
DATA
The process of
transforming your data into intelligence can be complex and best left to data
experts. Our website denotes much of its space to this process but the
following steps are usually what most companies need to follow:
-
Define your business and marketing goals
-
Identify what intelligence you need to meet
your goals
-
Work out what data might give you the basis
of the intelligence you need and identify all relevant sources
-
Extract the data - samples to start with -
and work out how it does or doesn't link together. Then once you are clear
what you need, extract and audit the data, identifying and labelling what
each piece of information is, what it relates to and why it was captured if
possible
-
Then validate and clean any addresses and
standardise formats of all other fields - especially any coding or
descriptions
-
Next you need to remove duplicate records
and merge the varying data files, even if they don't link and remembering not
to lose any information from removed records, or, leaving any transactions
that were previously linked to a deleted contact without a parent!
-
Now enhance the data using external lists
and suppression files
-
Next comes the transformation process to
prepare the data for analysis. Hopefully you will now have a single view of
all data that, with the right analytics, can provide the intelligence you
need
-
Nearly done. Finish the analysis - keeping
in mind your business goals - and create the output. This could be simple
reports but more likely it will a file of some kind to help you apply your
findings
-
Finally make the intelligence work for you
company - whatever it takes!
|