Binary Files

INTRODUCTION:

Welcome to the third part of this file manipulation series. As you have probably guessed from the title, this one covers binary file access. What are binary files? The answer comes in more than one flavor. Roughly speaking, any file that you can't open in a text editor and read is either a random access file or a binary file. Here are a few examples of such files:

  • Executable Files: These are, of course, files with the .EXE or .COM extension that can be executed. Compilers also produce object files, which are binary files too, and there are many more examples.
  • Word Processor Files: You'd think a Word document isn't a binary file, right? Guess again. Open a Word document in good old Notepad and you'll see that there isn't much in there you can actually read.
  • Database and Spreadsheet Files: There is more than one database format. These files are designed to hold, manage, and retrieve data as quickly as possible. They are binary files too, and they can be opened in binary mode the same way an executable can.
  • Bitmaps and Other Graphic Files: Although most image file formats have a specific structure that lets a viewer display the picture, they are still binary files and can be opened and read just like any other binary file.

All this to say that binary files are used for many reasons, so there is more than one structure behind them. The important thing to know is that if you have the right information (a description of the file's format), you can open and read any of these binary files.

ADVANTAGES OF USING BINARY FILES:

Since so many software makers use binary files for their storage (documents, databases, spreadsheets, graphics, vector graphics and everything else), there must be a reason they chose binary over other formats. There's more than one reason, actually. Here are the main advantages:

  • Access Speed: Because a binary file is just a sequence of bytes (no imposed structure, no fields, no text lines), you can jump straight to any byte position in it with SEEK, without reading everything that comes before.
  • Reading Speed: When you read from a binary file, the data type of the variable you read into determines how many bytes are read. It's really that simple: an INTEGER variable reads 2 bytes from the file in QuickBasic, and 4 bytes in 32-bit FreeBasic (8 bytes in 64-bit FreeBasic). Have you noticed that reading and writing a binary file works a lot like reading and writing memory? If so, you understand why binary files are faster than sequential files (and, in most cases, random access files): the bytes go straight into your variables, with no conversion to and from text as sequential files require.
  • Data Size: This is an advantage mainly over sequential files. Sequential files store numbers as text, use commas to separate fields, and end each line with a carriage return and line feed. Binary files have none of these: all data is stored in its native size, one value right after the other.
  • Data Flexibility: This is an advantage mainly over random access files. Random access files often waste space, because every record takes the maximum size of its structure so that it can later hold a value of that maximum length. Binary files don't have to work that way, although you can still lay them out in fixed-size records if you want to.
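The reading-speed and data-size points can be made concrete with a small sketch. It is written in Python only because the standard `struct` module spells out byte layouts explicitly. It assumes QuickBasic's sizes (INTEGER = 2 bytes, format `h`; LONG = 4 bytes, format `l`) and little-endian byte order. It stores the two values used in the example program below as raw bytes and as a comma-separated text line of the kind a sequential file would hold, then compares the sizes.

```python
import struct

# Assumed QuickBasic sizes, little-endian ("<" = standard sizes, no padding):
# "h" = 2-byte INTEGER, "l" = 4-byte LONG
BINARY_LAYOUT = "<hl"

def binary_bytes(first, second):
    """Pack the two values back to back, the way PUT would write them."""
    return struct.pack(BINARY_LAYOUT, first, second)

def sequential_bytes(first, second):
    """Store the same values as text: a comma between fields, CR+LF at the end."""
    return f"{first},{second}\r\n".encode("ascii")

binary = binary_bytes(4, 13040392)
text = sequential_bytes(4, 13040392)
print(len(binary), len(text))                        # 6 bytes versus 12 bytes
print(struct.calcsize("<h"), struct.calcsize("<l"))  # bytes per INTEGER, per LONG
```

The per-type byte counts are what GET and PUT rely on: reading a LONG always consumes exactly 4 bytes, whatever the value.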

WORKING WITH BINARY FILES:

To work with binary files, you do need to plan things ahead, at least a bit. If you think back to the advantages above, you can see why: the variables you choose to read from or write to a binary file determine the actual number of bytes that will be read or written. A little planning will help you stay on top of things, especially when what is being read or written changes.

Just like with random access files, you'll use the OPEN and CLOSE statements, plus GET to read from the file and PUT to write to it. Here is a first small example to set the command syntax straight. It opens a file in binary mode, writes 2 values to it, reads them back in, and finally closes the file.

-------------------------------START OF PROGRAM---------------------------------

' ---------------------------------
' Work variables for this example
' ---------------------------------
DIM FileHandle AS INTEGER
DIM FirstField AS INTEGER ' READS/WRITES 2 BYTES (4 IN 32-BIT FREEBASIC, 8 IN 64-BIT)
DIM SecondField AS LONG   ' READS/WRITES 4 BYTES (ALSO 4 IN FREEBASIC)

' ------------------------
' First we open the file
' ------------------------
FileHandle = FREEFILE
OPEN "TEST.BIN" FOR BINARY AS #FileHandle

' ------------------------
' Next we write 2 values
' ------------------------
FirstField = 4
SecondField = 13040392
PUT #FileHandle, , FirstField
PUT #FileHandle, , SecondField

' ------------------------------------------------------
' We clear the variables to test the reading of values
' ------------------------------------------------------
FirstField = 0
SecondField = 0

' ---------------------------------------------------
' We position ourselves on the first byte (to read)
' ---------------------------------------------------
SEEK #FileHandle, 1
GET #FileHandle, , FirstField
GET #FileHandle, , SecondField

' ------------------------------------------
' We print the variables we have just read
' if it worked, 4 and 13040392 should show
' ------------------------------------------
PRINT FirstField, SecondField

' ------------------------------
' We close the file, of course
' ------------------------------
CLOSE #FileHandle

--------------------------------END OF PROGRAM----------------------------------

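This little example is pretty much self-explanatory, so I won't comment on it too much. Notice the change in the OPEN statement, which now reads OPEN "FileName.Ext" FOR BINARY AS #FileHandle. I use FREEFILE here as well; it returns the next available file number, which is good practice with binary files just like with any other file type. SEEK #FileHandle, 1 moves back to the first byte of the file (byte positions in BASIC files start at 1), so the GET statements read the values back in the order they were written. Naturally, at the end of the program I close the file.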
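For comparison, here is the same write, rewind, and read-back sequence as a Python sketch. This is an assumption-based translation, not part of the original program. The `h` and `l` formats stand for QuickBasic's 2-byte INTEGER and 4-byte LONG, little-endian. Python file positions start at 0, so `seek(0)` corresponds to `SEEK #FileHandle, 1`. An in-memory `io.BytesIO` stands in for TEST.BIN so the sketch runs anywhere; `open("TEST.BIN", "w+b")` would use a real file instead.

```python
import io
import struct

INTEGER = struct.Struct("<h")  # QuickBasic INTEGER: 2 bytes (assumed)
LONG = struct.Struct("<l")     # LONG: 4 bytes

def round_trip(f, first_field, second_field):
    """Write two values, rewind, and read them back, like the BASIC example."""
    f.write(INTEGER.pack(first_field))                # PUT #FileHandle, , FirstField
    f.write(LONG.pack(second_field))                  # PUT #FileHandle, , SecondField
    f.seek(0)                                         # SEEK #FileHandle, 1
    first = INTEGER.unpack(f.read(INTEGER.size))[0]   # GET #FileHandle, , FirstField
    second = LONG.unpack(f.read(LONG.size))[0]        # GET #FileHandle, , SecondField
    return first, second

with io.BytesIO() as f:
    print(*round_trip(f, 4, 13040392))
```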

A MORE PRACTICAL USE OF BINARY FILES:

What you saw in the first sample program probably has you wondering what the real use of binary files is. It's important to know that you can read or write binary files with any data type, including user defined types that you create, and you can mix and match data types. For example, you could easily have a TYPE / END TYPE structure for the header part of the file and a list of LONG values for the rest of the file. It really depends on your needs. That's the beauty of binary files: you can mix and match types of data, and as long as you know how you wrote them, you will always be able to read the data back the way it's supposed to be read.
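Here is a minimal Python sketch of that mix-and-match idea: a fixed-size header followed by a list of LONG values whose count the header records. The 8-byte signature `MYDATA  ` and this header layout are hypothetical, chosen only for illustration. Sizes assume a 4-byte LONG, little-endian.

```python
import io
import struct

HEADER = struct.Struct("<8sl")  # hypothetical header: 8-byte signature + LONG count
LONG = struct.Struct("<l")      # each body entry: one 4-byte LONG

def write_file(f, values):
    """Header first, then the LONG values one after the other."""
    f.write(HEADER.pack(b"MYDATA  ", len(values)))
    for v in values:
        f.write(LONG.pack(v))

def read_file(f):
    """Read the header, then exactly as many LONGs as it says are there."""
    f.seek(0)
    signature, count = HEADER.unpack(f.read(HEADER.size))
    values = [LONG.unpack(f.read(LONG.size))[0] for _ in range(count)]
    return signature, values

with io.BytesIO() as f:
    write_file(f, [10, -20, 300000])
    print(read_file(f))
```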

In this section we'll create a little example that you can use as a template for your own creations. Most binary files created by software manufacturers are composed of 2 parts: the header and the rest of the file. The rest of the file can be one big block of data (as in graphic file formats) or a series of repeating records (as in the well-known dBASE file format, the DBF file). Some formats have 3 or more parts as well. Our example will use TYPE definitions to create a generic database structure somewhat similar to dBASE's DBF format. Note that we won't be creating indexes or relationships; we'll simply create a format to store the data we need.

DATABASE FORMAT DEFINITION:

When you create a file format whose contents can change, you have to ask yourself what you will need to know in order to control the rest of the file. In other words, you need to decide what goes in the header part of the database file. Here's a user defined type that would typically be adequate for our needs:

' ------------------------------------------------------
' This Structure holds the database header information
' ------------------------------------------------------
TYPE HeaderInformation
    DatabaseName AS STRING * 30
    DatabaseVersion AS LONG
    Created AS STRING * 10     ' STRING representing a date
    Modified AS STRING * 10    ' STRING representing a date
    FieldCount AS INTEGER
    RecordCount AS LONG
    HeaderLength AS INTEGER    ' Length of this header structure
    RecordLength AS LONG       ' Length of the values taken by a record
    RecordOffset AS LONG       ' Position of the start of the 1st record
END TYPE
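Because every field in HeaderInformation has a fixed size, the header always occupies the same number of bytes. The Python sketch below spells out that layout with the `struct` module. It assumes QuickBasic's sizes (LONG = 4 bytes, INTEGER = 2 bytes, `STRING * n` = n space-padded bytes), little-endian order, and no padding between fields. FreeBasic may lay TYPEs out differently, so check LEN(DBHeader) there rather than assuming 70. `fixed` and `pack_header` are hypothetical helper names.

```python
import struct

# HeaderInformation, fields back to back with no padding (assumed QuickBasic layout):
# STRING * n -> "ns", LONG -> "l" (4 bytes), INTEGER -> "h" (2 bytes)
HEADER = struct.Struct("<30sl10s10shlhll")

def fixed(text, width):
    """Mimic STRING * width: pad with spaces, truncate if too long."""
    return text.encode("ascii").ljust(width, b" ")[:width]

def pack_header(name, version, created, modified, field_count,
                record_count, record_length, record_offset):
    # HeaderLength is filled in from the structure's own size
    return HEADER.pack(fixed(name, 30), version, fixed(created, 10),
                       fixed(modified, 10), field_count, record_count,
                       HEADER.size, record_length, record_offset)

print(HEADER.size)  # 70 under the assumed layout
```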

Next we need a structure to hold each field definition. For each field we need to know its number, its name, its type, and the number of bytes it occupies in a record. The type is stored as a one-character code; the values themselves will live in the records, not in the field definitions. The structure looks like this:

' -------------------------------------------------
' This Structure holds Field Specific Information
' -------------------------------------------------
TYPE FieldInformation
    FieldNumber AS LONG
    FieldName AS STRING * 32
    FieldType AS STRING * 1    ' T=Text, I=Integer, L=Long, D=Double
    FieldLength AS INTEGER
END TYPE
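The same approach gives the size of one field definition. This Python sketch uses the same assumptions (QuickBasic sizes, little-endian, no padding between fields); `fixed`, `pack_field`, and `unpack_field` are hypothetical names.

```python
import struct

# FieldInformation: LONG number, STRING * 32 name, STRING * 1 type, INTEGER length
FIELD = struct.Struct("<l32s1sh")

def fixed(text, width):
    """Mimic STRING * width: pad with spaces, truncate if too long."""
    return text.encode("ascii").ljust(width, b" ")[:width]

def pack_field(number, name, type_code, length):
    return FIELD.pack(number, fixed(name, 32), fixed(type_code, 1), length)

def unpack_field(raw):
    number, name, type_code, length = FIELD.unpack(raw)
    # Strip the space padding that STRING * n adds
    return number, name.decode("ascii").rstrip(), type_code.decode("ascii"), length

print(FIELD.size)  # bytes per field definition under the assumed layout
```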

Once the structure of the database file is defined, you can use a user defined type, as in this example, to load and save the records themselves. Depending on your needs, you could use a series of variables of the desired types instead. Since we're building a database example here, a user defined type is the most appropriate choice for this kind of binary file usage.

' -----------------------------------------------------------------------
' This Structure holds Employee Information read from the database file
' -----------------------------------------------------------------------
TYPE EmployeeInformation
    EmployeeNumber AS LONG
    EmployeeName AS STRING * 40
    Address1 AS STRING * 50
    Address2 AS STRING * 30
    City AS STRING * 20
    State AS STRING * 30
    ZipCode AS LONG
    Telephone AS STRING * 14
    Fax AS STRING * 14
    HourlyRate AS DOUBLE
END TYPE
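And for a full employee record: under the same assumptions (QuickBasic sizes, little-endian, no padding), EmployeeInformation is 214 bytes, which is the number the FieldLength values should add up to. `EMPLOYEE`, `TEXT_WIDTHS`, `pack_employee`, and `unpack_employee` are hypothetical names used only in this Python sketch.

```python
import struct

# EmployeeInformation in TYPE order: LONG, STRING*40, *50, *30, *20, *30,
# LONG, STRING*14, *14, DOUBLE
EMPLOYEE = struct.Struct("<l40s50s30s20s30sl14s14sd")
TEXT_WIDTHS = {1: 40, 2: 50, 3: 30, 4: 20, 5: 30, 7: 14, 8: 14}  # index -> width

def pack_employee(values):
    """values: a 10-item tuple in TYPE order; text fields get space padding."""
    fields = []
    for i, v in enumerate(values):
        if i in TEXT_WIDTHS:  # STRING * n field
            v = v.encode("ascii").ljust(TEXT_WIDTHS[i], b" ")[:TEXT_WIDTHS[i]]
        fields.append(v)
    return EMPLOYEE.pack(*fields)

def unpack_employee(raw):
    """Turn 214 raw bytes back into a tuple, trimming the string padding."""
    return tuple(
        v.decode("ascii").rstrip() if isinstance(v, bytes) else v
        for v in EMPLOYEE.unpack(raw)
    )

print(EMPLOYEE.size)
```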

We now need variables to hold everything required to create the database file itself: one for the file handle, one for the header, and an array for the field definitions. In this example we use a fixed-size array, but we could use a dynamic array if the number of fields needed to change. The field array is declared SHARED so that the AddField subroutine below can fill it in.

' ----------------------------------------------
' Needed Variable Declarations for this module
' ----------------------------------------------
DIM DBHeader AS HeaderInformation
DIM SHARED DBFields(1 TO 10) AS FieldInformation ' 10 fields; SHARED so AddField can see it
DIM CurrentEmployee AS EmployeeInformation
DIM DatabaseHandle AS INTEGER
DIM Counter AS INTEGER
DIM WorkLength AS INTEGER
DIM TotalRecordLength AS INTEGER

Now that we have all this defined, we will go about defining the structure and saving that definition as the beginning of the binary file. To do so, we'll need to make sure that all the fields we plan on defining are indeed defined and that the fields in the header section are filled prior to writing it to the binary file.

First we'll create a subroutine that will serve to assign values to our field definition array. This step will save us many lines of repetitive coding when we go about actually adding the field information for the database.

SUB AddField (TheNumber AS LONG, TheName AS STRING, TheType AS STRING, Length AS INTEGER)
    ' DBFields must be declared with DIM SHARED at module level
    DBFields(TheNumber).FieldNumber = TheNumber
    DBFields(TheNumber).FieldName = TheName
    DBFields(TheNumber).FieldType = TheType
    DBFields(TheNumber).FieldLength = Length
END SUB

Next comes the actual assignment of the fields that define the structure of our database. In this example we already know the number of fields. However, we could be dealing with a structure created by someone else, in which case we'd need to determine everything by reading the header, then the field definitions, and finally the records in the database area. For now, let's just create the fields we'll be using here.

' -------------------------------------------
' Add the definitions to the DBFields Array
' -------------------------------------------
CALL AddField(1, "EmployeeNumber", "L", 4)   ' LONG: 4 bytes
CALL AddField(2, "EmployeeName", "T", 40)
CALL AddField(3, "Address1", "T", 50)
CALL AddField(4, "Address2", "T", 30)
CALL AddField(5, "City", "T", 20)
CALL AddField(6, "State", "T", 30)
CALL AddField(7, "ZipCode", "L", 4)          ' LONG: 4 bytes
CALL AddField(8, "Telephone", "T", 14)
CALL AddField(9, "Fax", "T", 14)
CALL AddField(10, "HourlyRate", "D", 8)
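A cheap safety check after defining the fields is to confirm that each numeric field's length matches its type and that all the FieldLength values add up to the size of the record TYPE; otherwise RecordLength in the header won't describe the records actually written. The Python sketch below assumes the EmployeeInformation layout, with the two LONG fields given type code L and 4 bytes. `check_fields`, `FIELDS`, and the `SIZE_OF` table are hypothetical.

```python
import struct

# (name, type code, length) as they would be passed to AddField
FIELDS = [
    ("EmployeeNumber", "L", 4), ("EmployeeName", "T", 40),
    ("Address1", "T", 50), ("Address2", "T", 30),
    ("City", "T", 20), ("State", "T", 30),
    ("ZipCode", "L", 4), ("Telephone", "T", 14),
    ("Fax", "T", 14), ("HourlyRate", "D", 8),
]
SIZE_OF = {"I": 2, "L": 4, "D": 8}  # assumed QuickBasic sizes per type code

def check_fields(fields, record_format):
    """Raise ValueError on any mismatch; return the total record length."""
    for name, code, length in fields:
        if code in SIZE_OF and SIZE_OF[code] != length:
            raise ValueError(f"{name}: type {code} needs {SIZE_OF[code]} bytes, not {length}")
    total = sum(length for _, _, length in fields)
    expected = struct.calcsize(record_format)
    if total != expected:
        raise ValueError(f"fields add up to {total} bytes, record is {expected}")
    return total

print(check_fields(FIELDS, "<l40s50s30s20s30sl14s14sd"))
```

A 2-byte "I" entry for a LONG field, or a text field tagged "I", makes the check fail instead of silently producing a header that disagrees with the records.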

Now that our field definition is complete we have everything we need to fill up our database Header with the needed information.

' -------------------------------------------------
' Evaluate a few values before assigning DBHeader
' -------------------------------------------------
WorkLength = LEN(DBHeader)
FOR Counter = 1 TO UBOUND(DBFields)
    TotalRecordLength = TotalRecordLength + DBFields(Counter).FieldLength
NEXT Counter

' --------------------------------------------------------------
' Populate the DBHeader structure with appropriate information
' --------------------------------------------------------------
DBHeader.DatabaseName = "Employee"
DBHeader.DatabaseVersion = 1
DBHeader.Created = DATE$
DBHeader.Modified = DATE$
DBHeader.FieldCount = UBOUND(DBFields)   ' 10 fields were defined
DBHeader.RecordCount = 0
DBHeader.HeaderLength = WorkLength
DBHeader.RecordLength = TotalRecordLength
' Records start right after the header and all the field definitions
DBHeader.RecordOffset = WorkLength + UBOUND(DBFields) * LEN(DBFields(1)) + 1
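The record offset follows directly from the layout: the header comes first, then one FieldInformation entry per field, and the first record starts on the next byte. Because every record has the same length, the position of record n can be computed the same way, which is what makes jumping straight to a record cheap. This is a Python sketch assuming QuickBasic's sizes with no padding (which give a 70-byte header, 39-byte field entries, and 214-byte records) and BASIC's 1-based positions. The function names are hypothetical.

```python
import struct

HEADER_SIZE = struct.calcsize("<30sl10s10shlhll")           # LEN(DBHeader)
FIELD_SIZE = struct.calcsize("<l32s1sh")                    # LEN(one FieldInformation)
RECORD_SIZE = struct.calcsize("<l40s50s30s20s30sl14s14sd")  # LEN(CurrentEmployee)

def record_offset(field_count):
    """1-based position of the first record: after the header and every field definition."""
    return HEADER_SIZE + field_count * FIELD_SIZE + 1

def record_position(n, field_count):
    """1-based position of record n (records are numbered from 1)."""
    return record_offset(field_count) + (n - 1) * RECORD_SIZE

print(record_offset(10), record_position(2, 10))
```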

At this point, all the structures are filled with the information we need and are ready to be saved to the database file. This is done quite simply in this example. If you prepare properly in your own binary file projects, writing any file format you invent can be just as easy. Here's the code that writes the header and the field definitions:

' ----------------------------------------------------------------
' Write the contents of the file definition to the database file
' ----------------------------------------------------------------
DatabaseHandle = FREEFILE
OPEN RTRIM$(DBHeader.DatabaseName) + ".df" FOR BINARY AS #DatabaseHandle
PUT #DatabaseHandle, , DBHeader
FOR Counter = 1 TO UBOUND(DBFields)
    PUT #DatabaseHandle, , DBFields(Counter)
NEXT Counter

It doesn't get much shorter than this, does it? When your data structures are prepared, it really is that simple. Next we'll add a few records for the sake of having data to work with. Once again, I'll use a user defined type to represent the data I'll be reading. I can do that here because I know the structure, but in a more complete database application you'd have to code a dynamic means of reading and accommodating any data type (perhaps an array of strings, each converted into the data type it should be treated as). We'll create 2 records in this example.

' ----------------------------------------------------------------------
' Assign values to the First Employee to be Saved to the database file
' ----------------------------------------------------------------------
CurrentEmployee.EmployeeNumber = 1
CurrentEmployee.EmployeeName = "Stephane Richard"
CurrentEmployee.Address1 = "I don't know"
CurrentEmployee.Address2 = "And I never will"
CurrentEmployee.City = "Somewhere"
CurrentEmployee.State = "Out There"
CurrentEmployee.ZipCode = 12345
CurrentEmployee.Telephone = "(000) 000-0000"
CurrentEmployee.Fax = "(000) 000-0000"
CurrentEmployee.HourlyRate = 37.5

' ---------------------------------------------
' Write this information to the database file
' ---------------------------------------------
PUT #DatabaseHandle, DBHeader.RecordOffset, CurrentEmployee

' -----------------------------------------------------------------------
' Assign values to the Second Employee to be Saved to the database file
' -----------------------------------------------------------------------
CurrentEmployee.EmployeeNumber = 2
CurrentEmployee.EmployeeName = "Kristian Virtaken"
CurrentEmployee.Address1 = "Ain't got a clue"
CurrentEmployee.Address2 = "blueberry hill"
CurrentEmployee.City = "South of North"
CurrentEmployee.State = "down here"
CurrentEmployee.ZipCode = 12121
CurrentEmployee.Telephone = "(111) 111-1111"
CurrentEmployee.Fax = "(111) 111-1111"
CurrentEmployee.HourlyRate = 38.5 ' more than fair on salaries.

' ---------------------------------------------
' Write this information to the database file
' ---------------------------------------------
PUT #DatabaseHandle, , CurrentEmployee

' --------------------------------------------------------------------
' Let's not forget to update the Record Count in the database header
' --------------------------------------------------------------------
DBHeader.RecordCount = 2
PUT #DatabaseHandle, 1, DBHeader ' Save the whole header structure
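Updating the header in place works because the header always sits at byte 1 and has a fixed size, so rewriting it can't disturb the field definitions or records that follow. Here is a Python sketch of the same step, assuming the QuickBasic header layout (70 bytes, little-endian, no padding). `update_record_count` is a hypothetical name, and the trailing `b"RECORDS..."` bytes merely stand in for the rest of the file.

```python
import io
import struct

HEADER = struct.Struct("<30sl10s10shlhll")  # HeaderInformation, assumed layout
RECORD_COUNT = 5                            # index of RecordCount in the header

def update_record_count(f, new_count):
    """Equivalent of setting DBHeader.RecordCount, then PUT #h, 1, DBHeader."""
    f.seek(0)
    fields = list(HEADER.unpack(f.read(HEADER.size)))
    fields[RECORD_COUNT] = new_count
    f.seek(0)                      # back to byte 1 (offset 0 in Python)
    f.write(HEADER.pack(*fields))  # overwrites only the header's bytes

blank = HEADER.pack(b"Employee".ljust(30), 1, b"01-01-2000", b"01-01-2000",
                    10, 0, 70, 214, 461)
f = io.BytesIO(blank + b"RECORDS...")
update_record_count(f, 2)
print(HEADER.unpack(f.getvalue()[:HEADER.size])[RECORD_COUNT], f.getvalue()[HEADER.size:])
```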

At this point, the database file contains everything it needs to describe itself. What do we want to do next? How about reading the information back to make sure it was saved correctly? To do that, we'll position ourselves at the RecordOffset value from the DBHeader structure and loop RecordCount times, reading each record into CurrentEmployee and displaying it. Again, everything is prepared with the user defined types we created; when you can do that before handling the file, you can definitely shorten both your code and your coding time. Here is the code to do the positioning and reading of the records:

' -------------------------------------------------------------------
' Position our file pointer and loop RecordCount times to read/print
' -------------------------------------------------------------------
SEEK #DatabaseHandle, DBHeader.RecordOffset
FOR Counter = 1 TO DBHeader.RecordCount
    GET #DatabaseHandle, , CurrentEmployee
    PRINT "EMPLOYEE NUMBER: " + LTRIM$(STR$(CurrentEmployee.EmployeeNumber))
    PRINT "EMPLOYEE NAME..: " + CurrentEmployee.EmployeeName
    PRINT "HOURLY RATE....: ";
    PRINT USING "##.##"; CurrentEmployee.HourlyRate
NEXT Counter

' ------------------------------
' And we close the file handle
' ------------------------------
CLOSE #DatabaseHandle
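Here is the reading loop as a Python sketch, under the same layout assumptions (QuickBasic sizes, little-endian, no padding). It builds a small file in memory with a header, 10 blank field definitions, and two records. It then reads the header, seeks to RecordOffset (minus 1, since Python positions start at 0), and reads RecordCount records. `make_record` and `read_employees` are hypothetical helpers; `make_record` leaves the address and phone fields blank for brevity.

```python
import io
import struct

HEADER = struct.Struct("<30sl10s10shlhll")
FIELD = struct.Struct("<l32s1sh")
RECORD = struct.Struct("<l40s50s30s20s30sl14s14sd")

def pad(text, width):
    """Mimic STRING * width: pad with spaces, truncate if too long."""
    return text.encode("ascii").ljust(width, b" ")[:width]

def make_record(number, name, rate):
    """Hypothetical helper: blank address/phone fields, ZIP code 0."""
    return RECORD.pack(number, pad(name, 40), pad("", 50), pad("", 30), pad("", 20),
                       pad("", 30), 0, pad("", 14), pad("", 14), rate)

def read_employees(f):
    """Read the header, SEEK to RecordOffset, GET RecordCount records."""
    f.seek(0)
    header = HEADER.unpack(f.read(HEADER.size))
    record_count, record_offset = header[5], header[8]
    f.seek(record_offset - 1)  # BASIC positions are 1-based, Python's are 0-based
    employees = []
    for _ in range(record_count):
        rec = RECORD.unpack(f.read(RECORD.size))
        employees.append((rec[0], rec[1].decode("ascii").rstrip(), rec[9]))
    return employees

# Build a small file in memory: header, 10 blank field definitions, 2 records
offset = HEADER.size + 10 * FIELD.size + 1
f = io.BytesIO()
f.write(HEADER.pack(pad("Employee", 30), 1, pad("", 10), pad("", 10), 10, 2,
                    HEADER.size, RECORD.size, offset))
f.write(bytes(10 * FIELD.size))
f.write(make_record(1, "Stephane Richard", 37.5))
f.write(make_record(2, "Kristian Virtaken", 38.5))

for number, name, rate in read_employees(f):
    print(f"EMPLOYEE NUMBER: {number}\nEMPLOYEE NAME..: {name}\nHOURLY RATE....: {rate:5.2f}")
```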

Fairly simple, isn't it? Although I'm not the worst at writing tutorials, I can't claim to be 100% clear in everything I explain. You can email me any comments, questions, or even requests for new tutorials, and I'll see what I can do. I hope you enjoyed this long but, I hope, useful tutorial. I've certainly enjoyed writing it.

Stephane Richards

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-Share Alike 2.5 License.